Page 137 - The Art of Designing Embedded Systems

second, even when sitting in an idle loop. Smaller device geometries mean
that sometimes only a handful of electrons represent a one or zero. A
single-bit failure, for a fleetingly transient bit of time, is disaster.
Yet these failures and glitches are exceedingly rare. Our embedded
systems, and even our desktop computers, switch trillions of bits without
the slightest problem.
Problems can and do occur, though, due more often to hardware or
software design flaws than to glitches. A watchdog timer (WDT) is a good
defense for all but the smallest of embedded systems. It’s a mechanism that
restarts the program if the software runs amok.
The WDT typically resets the processor after a few hundred milliseconds
unless the firmware restarts it first. It’s up to the code to tickle the
timer frequently, restarting the countdown interval. A code crash means
the timer counts down without interruption; at time-out, hardware resets
the CPU, ideally bringing the system back on-line.
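
The tickle-or-reset behavior described above can be sketched in C. This is only a software model for illustration, not any real part's interface: the 300 ms time-out, the `wdt_t` structure, and every function name here are assumptions. On real hardware the countdown lives in a peripheral and the "kick" is usually a register write defined by the vendor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed time-out, standing in for "a few hundred milliseconds". */
#define WDT_TIMEOUT_MS 300u

/* Hypothetical software model of a hardware watchdog timer. */
typedef struct {
    uint32_t remaining_ms;  /* time left before the watchdog fires */
    bool     reset_fired;   /* latched once the countdown reaches zero */
} wdt_t;

/* Restart the countdown interval: this is the "tickle". */
static void wdt_kick(wdt_t *w)
{
    assert(w != NULL);
    w->remaining_ms = WDT_TIMEOUT_MS;
}

static void wdt_init(wdt_t *w)
{
    assert(w != NULL);
    w->reset_fired = false;
    wdt_kick(w);
}

/* Advance simulated time; on time-out, latch the reset request. */
static void wdt_advance(wdt_t *w, uint32_t elapsed_ms)
{
    assert(w != NULL);
    if (w->reset_fired) {
        return;
    }
    if (elapsed_ms >= w->remaining_ms) {
        w->remaining_ms = 0;
        w->reset_fired = true;   /* real hardware would assert RESET here */
    } else {
        w->remaining_ms -= elapsed_ms;
    }
}

/* Model a main loop of `passes` iterations, each taking `pass_ms`.
 * The loop kicks the watchdog on every pass before index `hang_at`;
 * from `hang_at` on it behaves like crashed code and never kicks.
 * Returns true if the watchdog reset the (simulated) CPU. */
static bool run_main_loop(uint32_t passes, uint32_t pass_ms, uint32_t hang_at)
{
    wdt_t w;
    wdt_init(&w);
    for (uint32_t i = 0; i < passes && !w.reset_fired; i++) {
        if (i < hang_at) {
            wdt_kick(&w);
        }
        wdt_advance(&w, pass_ms);
    }
    return w.reset_fired;
}
```

With these assumed numbers, a healthy loop of 10 ms passes never lets the timer expire, while a loop that stops kicking, or whose passes take longer than the time-out, gets reset.
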
The first rule of watchdog design is to drive the CPU’s reset input,
not an interrupt (such as NMI). A WDT time-out means that something
awful happened, something that may have left the CPU in an unpredictable
scrambled state. Only RESET is guaranteed to bring the part back on-line.
The non-maskable interrupt is seductive to some designers, especially
when the pin is unused and there’s a chance to save a few gates. For
better or worse, NMI (like all other interrupt inputs) is not fail-safe.
Confused internal logic will shut down NMI response on some CPUs.
On other chips a simple software problem can render the non-maskable
interrupt unusable. The 68K, for example, will crash if the stack
pointer assumes an odd value. If you rely on the WDT to save the day,
driving an interrupt while SP is odd results in a double bus fault, which
puts the CPU in a dead state until it’s reset.
Next, think through the litigation potential of your system.
Life-threatening failure modes mean you’ve got to beware of simple
watchdog timers! If a single I/O instruction successfully keeps the WDT
alive, then there’s a real chance that the code might crash but continue
to tickle the timer. Some companies (Toshiba, for example) require a more
complex sequence of commands to the timer; it’s equally easy to create a
PLD yourself that requires a fiendishly complex WDT sequence.
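
To make the multi-command idea concrete, here is a minimal C model of a watchdog that restarts only after a complete key sequence arrives in order. The three key values, the `keyed_wdt_t` type, and the function names are all hypothetical; they do not describe Toshiba's parts or any other vendor's actual protocol. Dropping all progress on an out-of-sequence write is also an assumption of this sketch, and a real design might force an immediate reset instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key sequence; a real device defines its own values. */
static const uint8_t WDT_KEYS[] = { 0x55, 0xAA, 0x3C };
#define WDT_KEY_COUNT (sizeof WDT_KEYS / sizeof WDT_KEYS[0])

typedef struct {
    size_t   next_key;  /* index of the key expected next */
    uint32_t restarts;  /* how many times the countdown was restarted */
} keyed_wdt_t;

static void keyed_wdt_init(keyed_wdt_t *w)
{
    assert(w != NULL);
    w->next_key = 0;
    w->restarts = 0;
}

/* Model one write to the watchdog's command register.
 * Returns true only when this write completes the whole sequence. */
static bool keyed_wdt_write(keyed_wdt_t *w, uint8_t value)
{
    assert(w != NULL);
    if (value != WDT_KEYS[w->next_key]) {
        w->next_key = 0;        /* out of sequence: discard all progress */
        return false;
    }
    w->next_key++;
    if (w->next_key < WDT_KEY_COUNT) {
        return false;           /* sequence still incomplete */
    }
    w->next_key = 0;
    w->restarts++;              /* full sequence seen: restart countdown */
    return true;
}

/* Feed a buffer of register writes; report how many restarts resulted. */
static uint32_t keyed_wdt_feed(const uint8_t *writes, size_t n)
{
    keyed_wdt_t w;
    keyed_wdt_init(&w);
    for (size_t i = 0; i < n; i++) {
        (void)keyed_wdt_write(&w, writes[i]);
    }
    return w.restarts;
}
```

In this model, crashed code that keeps hammering the same single value at the register never completes the sequence, so the countdown is never restarted and the time-out still occurs.
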
It’s also a very bad idea to put the WDT reset code inside of an
interrupt service routine. It’s always intriguing, while debugging, to find
your code crashed but one or more ISRs still functioning. Perhaps the ser-
