The precision with which numbers can be represented is determined by the word length in the fixed point format, and by the number of bits in the mantissa in the floating point format. In a 32 bit DSP processor the mantissa is usually 24 bits, so the precision of a floating point DSP is the same as that of a 24 bit fixed point processor. But floating point has one further advantage over fixed point: because the hardware automatically scales each number to use the full word length of the mantissa, the full precision is maintained even for small numbers.

There is a potential disadvantage to the way floating point works. Because the hardware automatically scales and normalises every number, the errors due to truncation and rounding depend on the size of the number. If we regard these errors as a source of quantisation noise, then the noise floor is modulated by the size of the signal. Although the modulation can be shown to be always downwards (that is, a 32 bit floating point format always has noise which is less than that of a 24 bit fixed point format), the signal dependent modulation of the noise may be undesirable. Notably, the audio industry prefers to use 24 bit fixed point DSP processors over floating point because it is thought by some that the floating point noise floor modulation is audible.

The precision directly affects quantisation error. The largest number which can be represented determines the dynamic range of the data format. In fixed point format this is straightforward: the dynamic range is the range of numbers that can be represented in the available word length. For floating point format, though, the binary point is moved automatically to accommodate larger numbers, so the dynamic range is determined by the size of the exponent.
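
As a numerical check on the precision discussion above, the following minimal sketch compares rounding errors in the two formats. It is an illustration under stated assumptions, not a model of any particular processor: the function names are hypothetical, the fixed point word is taken as 24 bits including sign with values in the range -1 to 1, the floating point mantissa is taken as 24 bits of magnitude, and exponent limits and saturation are ignored.

```python
import math

def quantise_fixed(x, bits=24):
    """Round x (assumed in [-1, 1)) to a signed fixed point grid of `bits` bits."""
    step = 2.0 ** -(bits - 1)          # one LSB: the same size for every value
    return round(x / step) * step

def quantise_float(x, mantissa_bits=24):
    """Round x to `mantissa_bits` significant bits, ignoring exponent limits."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= |m| < 1
    scale = 2.0 ** mantissa_bits
    return math.ldexp(round(m * scale) / scale, e)

# Compare the rounding error for a large and a small input value.
for x in (0.3, 0.3 * 2.0 ** -12):
    err_fixed = abs(quantise_fixed(x) - x)
    err_float = abs(quantise_float(x) - x)
    print(f"x={x:.3e}  fixed error={err_fixed:.3e}  float error={err_float:.3e}")
```

Under these assumptions the fixed point error is bounded by half an LSB whatever the size of the input, while the floating point error is bounded by a fixed fraction of the input itself. That is the signal dependent noise floor described above: the floating point error shrinks as the signal shrinks.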
For an 8 bit exponent, the dynamic range is close to 1,500 dB, so the dynamic range of a floating point format is enormously larger than that of a fixed point format.

While the dynamic range of a 32 bit floating point format is large, it is not infinite, so it is possible to suffer overflow and underflow even with a 32 bit floating point format. A classic example of this can be seen by running fractal (Mandelbrot) calculations on a 32 bit DSP processor: after quite a long time, the fractal pattern ceases to change because the increment size has become too small for a 32 bit floating point format to represent.

Most DSP processors have extended precision registers within the processor. The diagram shows the data path of the Lucent DSP32C processor. Although this is a 32 bit floating point processor, it uses 40 and 45 bit registers internally, so results can be held to a wider dynamic range internally than when written to memory.
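
The dynamic range figures and the "increment too small" effect can be reproduced with a short sketch. The helper names here are hypothetical, and the dynamic range is taken simply as 20 log10 of the ratio spanned by the word (fixed point) or by the exponent alone (floating point). IEEE 754 single precision is used as a stand-in 32 bit format; a particular DSP may use a different 32 bit layout, so the exact thresholds are an assumption.

```python
import math
import struct

def fixed_range_db(word_bits):
    """Ratio of largest to smallest magnitude of a fixed point word, in dB."""
    return 20.0 * word_bits * math.log10(2.0)

def float_range_db(exponent_bits):
    """Range spanned by the exponent alone, 2**(2**exponent_bits), in dB."""
    return 20.0 * (2 ** exponent_bits) * math.log10(2.0)

def to_float32(x):
    """Round a Python float to the nearest IEEE 754 single precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

print(f"24 bit fixed point: {fixed_range_db(24):.0f} dB")
print(f"8 bit exponent:     {float_range_db(8):.0f} dB")

# An increment smaller than the spacing of 32 bit values near 1.0 is lost,
# so repeatedly adding it no longer changes the result.
x = to_float32(1.0)
tiny = 2.0 ** -25
print(to_float32(x + tiny) == x)
```

With these definitions a 24 bit fixed point word spans about 144 dB and an 8 bit exponent about 1,541 dB, consistent with the "close to 1,500 dB" figure. The final comparison shows why an iterative calculation can stall: once the increment falls below the resolution available at the current value, adding it has no effect.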
