Page 367 - A Practical Guide from Design Planning to Manufacturing

Silicon Debug and Test 337

Breakpoint mechanisms give even better control of clocking during
tests. Breakpoints monitor the processor state for specific events. Such
events might include a particular type of instruction, a certain
number of branch mispredicts or cache misses, or a particular exception.
The signaling of a breakpoint then causes the processor clock to stop
automatically so that the scan state can be captured and read. This
allows for the scan capture to be tied to a particular processor behavior
rather than a particular clock cycle. Nested trigger events allow capture
to occur after a specific sequence of events that indicates a particular
complex failure.
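
The nested-trigger idea can be sketched in software as a small state machine. This is only an illustrative model, not how any particular processor implements breakpoints: the per-cycle state dictionary, the event names (`cache_miss`, `mispredicts`), and the `NestedBreakpoint` class are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical model: one dict of event flags/counters per clock cycle.
State = Dict[str, int]


@dataclass
class NestedBreakpoint:
    """Fires once every condition in `stages` has been seen, in order."""
    stages: List[Callable[[State], bool]]
    armed_stage: int = 0

    def observe(self, state: State) -> bool:
        """Advance at most one stage per cycle; True once the last stage has matched."""
        if self.armed_stage < len(self.stages) and self.stages[self.armed_stage](state):
            self.armed_stage += 1
        return self.armed_stage == len(self.stages)


def run_until_break(trace: List[State], bp: NestedBreakpoint) -> Optional[int]:
    """Return the cycle at which the clock would stop, or None if never triggered."""
    for cycle, state in enumerate(trace):
        if bp.observe(state):
            return cycle  # clock stops here; scan state would be captured and read
    return None


# Assumed trace: a cache miss on cycle 1, a third mispredict on cycle 3.
trace = [
    {"cache_miss": 0, "mispredicts": 0},
    {"cache_miss": 1, "mispredicts": 0},
    {"cache_miss": 0, "mispredicts": 2},
    {"cache_miss": 0, "mispredicts": 3},
]
bp = NestedBreakpoint(stages=[
    lambda s: s["cache_miss"] == 1,    # first event in the sequence
    lambda s: s["mispredicts"] >= 3,   # only checked after the first has fired
])
stop_cycle = run_until_break(trace, bp)
```

Because the second condition is only evaluated after the first has matched, the same events arriving in the opposite order would not stop the clock.
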

Scan is appropriate for areas of the die that are a mix of logic and
sequentials. Scan does not work well for memory arrays, which are
almost entirely made up of sequentials. For large arrays, the area penalty
of making all the array elements scannable would be excessive. However,
it is very difficult to write functional tests that will stimulate all the
memory cells of all the on-die arrays. A common compromise is adding
array freeze and dump circuits. These allow the array state to be frozen
by a breakpoint event. This prevents any further writes to the array. Then
the array dump allows all the bits of the array to be read directly at the
pins. These values are compared to expected values to check the array
for defects.
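
A toy model of freeze and dump might look like the following. The `DumpableArray` class and its method names are hypothetical, chosen only to mirror the three steps described above: freeze the array, dump its contents, and compare them with expected values.

```python
from typing import List


class DumpableArray:
    """Toy model of an on-die array with freeze and dump hooks (illustrative names)."""

    def __init__(self, rows: int, width: int) -> None:
        self.width = width
        self.cells = [0] * rows
        self.frozen = False

    def write(self, row: int, value: int) -> None:
        if self.frozen:
            return  # once frozen, all further writes are ignored
        self.cells[row] = value & ((1 << self.width) - 1)

    def freeze(self) -> None:
        """Would be driven by a breakpoint event in this model."""
        self.frozen = True

    def dump(self) -> List[int]:
        """Return every row, standing in for reading the bits out at the pins."""
        return list(self.cells)


def mismatched_rows(dumped: List[int], expected: List[int]) -> List[int]:
    """Indices of rows whose dumped value differs from the expected value."""
    return [i for i, (d, e) in enumerate(zip(dumped, expected)) if d != e]


arr = DumpableArray(rows=4, width=8)
arr.write(0, 0xAA)
arr.write(1, 0x55)
arr.freeze()
arr.write(2, 0xFF)  # arrives after the freeze, so it is dropped
suspect_rows = mismatched_rows(arr.dump(), [0xAA, 0x55, 0x00, 0x00])
```
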

For very large arrays, dumping the entire contents of the array may take
excessive test time. Instead, the circuitry to test the array is built on die
as part of the array itself. This is called built-in self-test (BIST). BIST
requires a stimulus generator, which writes values into the array, and a
response analyzer, which checks for the correct read values. BIST adds
significant area and complexity, but for the largest arrays on die the area of
a BIST controller may be tiny compared to that of the array itself.
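
The division of labor between stimulus generator and response analyzer can be illustrated with a short simulation. The text does not say which patterns a real BIST controller writes; the alternating-bit pattern and its complement used here are an assumption chosen for simplicity, and the `FaultyArray` stuck-at fault model is likewise hypothetical.

```python
from typing import Dict, List, Optional, Tuple


class FaultyArray:
    """Toy memory; `stuck_at` maps (row, bit) to a forced bit value (assumed fault model)."""

    def __init__(self, rows: int, width: int,
                 stuck_at: Optional[Dict[Tuple[int, int], int]] = None) -> None:
        self.rows, self.width = rows, width
        self.stuck_at = stuck_at or {}
        self.cells = [0] * rows

    def write(self, row: int, value: int) -> None:
        self.cells[row] = value & ((1 << self.width) - 1)

    def read(self, row: int) -> int:
        value = self.cells[row]
        for (r, bit), forced in self.stuck_at.items():
            if r == row:
                if forced:
                    value |= 1 << bit      # bit stuck at 1
                else:
                    value &= ~(1 << bit)   # bit stuck at 0
        return value


def run_bist(array: FaultyArray) -> List[int]:
    """Write and read back two complementary patterns; return the failing rows."""
    mask = (1 << array.width) - 1
    alternating = sum(1 << b for b in range(0, array.width, 2))  # 0b...0101
    failing = set()
    for pattern in (alternating, alternating ^ mask):
        for row in range(array.rows):        # stimulus generator: write phase
            array.write(row, pattern)
        for row in range(array.rows):        # response analyzer: read and compare
            if array.read(row) != pattern:
                failing.add(row)
    return sorted(failing)


bad_rows = run_bist(FaultyArray(rows=8, width=8, stuck_at={(3, 2): 0}))
```

Using a pattern and its complement drives every bit to both 0 and 1, which is enough to expose the simple stuck-at faults in this model; it would not necessarily catch other defect types.
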

Some DFT circuits allow defects not only to be detected but also to be
bypassed in order to allow the part to be shipped. The simplest method
for doing this is to disable the part of the die that has the defect. A single
processor die might be sold as two different server products, one with
multiprocessor support and one without. If a die has a defect that affects
only multiprocessor functionality, it may still be sold but only as the
product that does not support this feature. The same die might also be
sold as a desktop product with half the on-die cache disabled. If the
cache is designed so that either half of the full array can be disabled, any
die with a single cache defect can still be sold with the defective half
disabled.
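
The binning decision described above can be written as a small lookup. Everything named here is hypothetical: the defect labels, the product names, and in particular the assumption that both server products require the full cache while the desktop product needs only one working half.

```python
from typing import List, Set


def sellable_products(defects: Set[str]) -> List[str]:
    """Return the products a die could still be sold as, given its defect locations.

    Assumed defect labels: "core", "mp_logic", "cache_half_0", "cache_half_1".
    """
    if "core" in defects:
        return []  # nothing in this model can bypass a core defect

    cache_ok = [half not in defects for half in ("cache_half_0", "cache_half_1")]
    products = []
    if all(cache_ok):
        if "mp_logic" not in defects:
            products.append("server_mp")       # multiprocessor support enabled
        products.append("server_up")           # multiprocessor support disabled
    if any(cache_ok):
        products.append("desktop_half_cache")  # one cache half disabled
    return products


mp_defect_bins = sellable_products({"mp_logic"})
cache_defect_bins = sellable_products({"cache_half_1"})
```
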

The full cache size can be supported despite defects by adding redundant
rows. These are extra rows of memory cells that are enabled to
replace defective cells. When a defect is found, fuses are set on the die that
cause cache accesses to the defective row to be directed to the redundant
row instead. For processors where on-die cache makes up more than half
