Page 76 - System on Package_ Miniaturization of the Entire System
Introduction to System-on-Chip (SOC)
related to the logical and physical architecture of the memory subsystem. We list some
of these options here based on Figure 2-10:
• Type of memory: SRAM, ROM, flash, embedded DRAM (eDRAM), and
Ferroelectric DRAM (FEDRAM) at both the L1 and L2 levels. The decision
depends on specific application requirements, availability in a given technology
node, performance, and cost.
• For L1 and L2
o Size (kbits)
o Unified (program and data), program-only, data-only, or a combination
o Number of physical blocks and size of each block
o For each block, a choice between a denser (but slower) and a faster (but larger)
memory
o Single-port versus dual-port or multiport memory
o For each physical block: MUX factor, which determines performance and
aspect ratio
o Cache, mapped, or a combination
o For a cache: type of cache, line size, etc.
o Clock rate relative to the central processing unit (CPU) and number of wait
states, which may vary for each physical block
• For external memory interface (EMIF)
o Type(s) of memory to be interfaced
o Size and number of physical blocks of the off-chip memory
o Width of the EMIF interface (16, 32, or 64 bits)
o Clock rate
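The options listed above can be captured in a small configuration model, which is a convenient starting point for comparing candidate designs. The following Python sketch is illustrative only: the class names, field names, and numeric values are assumptions made for this example and do not come from Figure 2-10 or from any particular device. The peak bandwidth estimate assumes one transfer of the full interface width per clock and ignores protocol overhead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryBlock:
    """One physical on-chip memory block (all fields are illustrative)."""
    kind: str              # e.g. "SRAM", "ROM", "flash", "eDRAM"
    size_kbits: int
    ports: int = 1         # 1 = single-port, 2 = dual-port
    mux_factor: int = 4    # hypothetical default
    wait_states: int = 0   # extra CPU cycles per access


@dataclass
class MemoryLevel:
    """L1 or L2: a set of physical blocks plus an organization choice."""
    name: str
    contents: str          # "unified", "program", or "data"
    mode: str              # "cache", "mapped", or "mixed"
    blocks: List[MemoryBlock]

    def total_kbits(self) -> int:
        # Total capacity is the sum over all physical blocks
        return sum(b.size_kbits for b in self.blocks)


@dataclass
class Emif:
    """External memory interface, described by width and clock rate."""
    width_bits: int        # 16, 32, or 64
    clock_mhz: float

    def peak_bandwidth_mbytes_per_s(self) -> float:
        # Assumes one full-width transfer per clock, no protocol overhead
        return self.width_bits / 8 * self.clock_mhz


# Hypothetical configuration: two 64-kbit SRAM blocks used as an L1 data cache
l1d = MemoryLevel("L1D", "data", "cache",
                  [MemoryBlock("SRAM", 64), MemoryBlock("SRAM", 64)])
emif = Emif(width_bits=64, clock_mhz=100.0)

print(l1d.total_kbits())                    # 128
print(emif.peak_bandwidth_mbytes_per_s())   # 800.0
```

A structure of this kind makes each design decision an explicit parameter, so alternative configurations can be enumerated and compared rather than discussed informally.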
Since the performance and throughput need to be met in the context of an application,
the memory subsystem design involves working closely with the applications team.
While there can be multiple feasible solutions, an optimal solution is one in which the
CPU, memory, and I/O bandwidths are balanced such that none of them becomes a
bottleneck. This requires building a model (software simulator) of the instruction set
architecture, the memory subsystem, the direct memory access (DMA), the external
memory interface, and the off-chip memory. A cycle-accurate model is desirable, but
cycle accuracy conflicts with the need for simulation fast enough to support
performance analysis over a reasonably large number of cycles. The design, application,
and software development tools teams have to work closely to make the right trade-offs
and adopt appropriate levels of abstraction for different system components.
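At the highest level of abstraction, the balance among CPU, memory, and I/O bandwidths can be checked with a simple analytical estimate before any detailed simulation is run. The Python sketch below is a deliberately coarse model, not a cycle-accurate simulator: it assumes the three resources operate independently, and all workload and platform numbers are hypothetical values chosen for illustration.

```python
def resource_times(workload, platform):
    """Seconds each resource needs for one unit of work (e.g., one frame)."""
    return {
        "cpu": workload["cpu_cycles"] / platform["cpu_hz"],
        "memory": workload["mem_bytes"] / platform["mem_bytes_per_s"],
        "io": workload["io_bytes"] / platform["io_bytes_per_s"],
    }


def bottleneck(workload, platform):
    """Return the name of the slowest resource and all per-resource times."""
    times = resource_times(workload, platform)
    name = max(times, key=times.get)
    return name, times


# Hypothetical per-frame demand and platform capacity (assumed values)
workload = {"cpu_cycles": 12e6, "mem_bytes": 20e6, "io_bytes": 0.4e6}
platform = {"cpu_hz": 600e6, "mem_bytes_per_s": 800e6, "io_bytes_per_s": 50e6}

name, times = bottleneck(workload, platform)
print(name)   # memory
```

With these assumed numbers the memory path needs more time per frame than the CPU or the I/O, so adding CPU speed alone would not raise throughput. A balanced design is one in which the three times are close to each other.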
The challenges in arriving at an optimal memory subsystem increase further if the
SOC targets applications that are based on different core algorithms. As an example,
Table 2.1 shows different applications targeted by the DM642 digital media processor
and the key algorithms for each of the applications. The CPU, memory, and I/O
bandwidth requirements vary across these applications. The memory subsystem is
dictated by the application with the most stringent performance requirement. For the
other applications, the CPU can be run slower (e.g., 500 MHz instead of 600 MHz) at a
lower supply voltage, which reduces the power dissipation.
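The power benefit of running slower at a lower supply voltage can be estimated with the standard first-order relation for CMOS dynamic power, P = a C V^2 f, where a is the switching activity, C the switched capacitance, V the supply voltage, and f the clock frequency. The Python sketch below applies it to the 600 MHz and 500 MHz operating points mentioned above. The supply voltages (1.4 V and 1.2 V) are assumed values for illustration and are not given in the text; leakage power is ignored.

```python
def dynamic_power_ratio(v_new, f_new, v_ref, f_ref):
    """Ratio of dynamic power at (v_new, f_new) to that at (v_ref, f_ref).

    Uses P proportional to V^2 * f, assuming switching activity and
    switched capacitance are unchanged between the two operating points.
    """
    return (v_new / v_ref) ** 2 * (f_new / f_ref)


# Frequencies from the text; voltages are assumed for illustration only
ratio = dynamic_power_ratio(v_new=1.2, f_new=500e6, v_ref=1.4, f_ref=600e6)
print(round(ratio, 3))   # 0.612
```

Under these assumptions the dynamic power at the slower operating point is roughly 61 percent of the reference, most of the saving coming from the quadratic dependence on voltage rather than from the lower frequency.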
Just as the memory subsystem can be optimized for a given software implementation
of an application, the software implementation can also be optimized for a given
memory subsystem. The memory subsystem therefore needs to be designed concurrently
with the application development.