Typical DSP operations require many simple additions and multiplications.
Each addition or multiplication requires us to fetch two operands, perform the operation, and store the result.
To fetch the two operands in a single instruction cycle, we need to be able to make two memory accesses simultaneously.
Actually, a little thought will show that since we also need to store the result - and to read the instruction itself - we really need more than two memory accesses per instruction cycle.
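As a rough illustration of why operand fetches dominate, the sketch below runs a multiply-accumulate loop of the kind found in many DSP algorithms and counts the operand fetches it makes. The function name and the counting scheme are assumptions made for this example; it models the idea only, not any particular processor.

```python
def dot_product_with_access_count(samples, coeffs):
    """Multiply-accumulate two equal-length sequences.

    Returns (result, operand_fetches). Each step fetches two operands:
    one sample and one coefficient. Instruction fetches and the final
    store of the result are not counted here.
    """
    if len(samples) != len(coeffs):
        raise ValueError("samples and coeffs must have the same length")

    acc = 0
    operand_fetches = 0
    for x, c in zip(samples, coeffs):
        operand_fetches += 2  # one fetch for x, one for c
        acc += x * c          # one multiplication and one addition
    return acc, operand_fetches


result, fetches = dot_product_with_access_count([1, 2, 3], [4, 5, 6])
print(result, fetches)
```

With three samples the loop makes six operand fetches, so keeping up with one multiply-accumulate per instruction cycle needs two operand accesses per cycle before the instruction fetch is even considered.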
For this reason DSP processors usually support multiple memory accesses in the same instruction cycle. It is not possible to access two different memory addresses simultaneously over a single memory bus. There are two common methods to achieve multiple memory accesses per instruction cycle:
The Harvard architecture has two separate physical memory buses. This allows two simultaneous memory accesses.
The true Harvard architecture dedicates one bus to fetching instructions, with the other available to fetch operands. This is inadequate for DSP operations, which usually involve at least two operands. So DSP Harvard architectures usually permit the 'program' bus to be used for operand access as well. Note that it is often necessary to fetch three things - the instruction plus two operands - and two buses cannot support this in a single cycle. So DSP Harvard architectures often also include a cache memory which can be used to store instructions which will be reused, leaving both Harvard buses free for fetching operands. This extension - Harvard architecture plus cache - is sometimes called an extended Harvard architecture or Super Harvard ARChitecture (SHARC).
The Harvard architecture requires two memory buses. This makes it expensive to bring off the chip - for example a DSP using 32 bit words and with a 32 bit address space requires at least 64 pins for each memory bus - a total of 128 pins if the Harvard architecture is brought off the chip. This results in very large chips, which are difficult to design into a circuit.
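The pin count above is simple arithmetic: each external bus needs one pin per data bit and one per address bit. A minimal sketch, assuming control and power pins are ignored (which is why the figures are lower bounds); the function name is hypothetical:

```python
def external_bus_pins(data_bits, address_bits, num_buses):
    """Lower bound on pins needed to bring memory buses off the chip.

    Counts only data and address lines; control signals, power and
    ground pins are deliberately left out.
    """
    return (data_bits + address_bits) * num_buses


print(external_bus_pins(32, 32, 1))  # one bus
print(external_bus_pins(32, 32, 2))  # both Harvard buses
```

For 32 bit words and a 32 bit address space this gives 64 pins for one bus and 128 pins for two, matching the figures in the text.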
Even the simplest DSP operation - an addition involving two operands and a store of the result to memory - requires four memory accesses (three to fetch the two operands and the instruction, plus a fourth to write the result). This exceeds the capabilities of a Harvard architecture. Some processors get around this by using a modified von Neumann architecture.
The von Neumann architecture uses only a single memory bus.
This is cheap, requiring fewer pins than the Harvard architecture, and simple to use because the programmer can place instructions or data anywhere throughout the available memory. But it does not permit multiple memory accesses.
The modified von Neumann architecture allows multiple memory accesses per instruction cycle by the simple trick of running the memory clock faster than the instruction cycle. For example the Lucent DSP32C runs with an 80 MHz clock: this is divided by four to give 20 million instructions per second (MIPS), but the memory clock runs at the full 80 MHz. Each instruction cycle is divided into four 'machine states' and a memory access can be made in each machine state, permitting a total of four memory accesses per instruction cycle.
In this case the modified von Neumann architecture permits all the memory accesses needed to support addition or multiplication: fetch of the instruction; fetch of the two operands; and storage of the result.
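The DSP32C figures can be checked with a short calculation. This is a sketch of the arithmetic only, assuming one memory access per machine state as described above; the function names are made up for the example.

```python
def instruction_rate_mips(clock_mhz, states_per_instruction):
    """Instruction rate when each instruction takes a fixed number of clock states."""
    return clock_mhz / states_per_instruction


def accesses_per_instruction(states_per_instruction, accesses_per_state=1):
    """Memory accesses available in one instruction cycle."""
    return states_per_instruction * accesses_per_state


print(instruction_rate_mips(80, 4))   # 80 MHz clock, four machine states
print(accesses_per_instruction(4))    # one access per machine state
```

An 80 MHz clock divided into four machine states gives 20 MIPS and four memory accesses per instruction: enough for the instruction fetch, two operand fetches and the result store.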
Both Harvard and von Neumann architectures require the programmer to be careful of where in memory data is placed: for example with the Harvard architecture, if both needed operands are in the same memory bank then they cannot be accessed simultaneously.
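The effect of data placement can be sketched with a toy model. It assumes each memory bank serves one access per cycle and that accesses to different banks proceed in parallel; the bank names and the function are hypothetical, and real processors have further constraints.

```python
from collections import Counter


def cycles_to_fetch(operand_banks):
    """Cycles needed to fetch operands, given the bank each one lives in.

    Accesses to different banks overlap; accesses to the same bank
    must be made one after another.
    """
    counts = Counter(operand_banks)
    return max(counts.values()) if counts else 0


print(cycles_to_fetch(["program", "data"]))  # operands in different banks
print(cycles_to_fetch(["data", "data"]))     # both operands in the same bank
```

In this model, operands placed in different banks are fetched in one cycle, while two operands in the same bank take two cycles.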