Page 137 - The Art of Designing Embedded Systems


                       second, even when sitting in an idle loop. Smaller device geometries mean
                       that sometimes only a handful of electrons represent a one or zero. A
                       single-bit failure, however fleeting, is a disaster.
                            Yet these failures and glitches are exceedingly rare. Our embedded
                       systems, and even our desktop computers, switch trillions of bits without
                       the slightest problem.
                            Problems can and do occur, though, due more often to hardware or
                       software design flaws than to glitches. A watchdog timer (WDT) is a good
                       defense for all but the smallest of embedded systems. It’s a mechanism that
                       restarts the program if the software runs amok.
                            The WDT usually resets the processor after a time-out of a few
                       hundred milliseconds unless the firmware restarts it first. It’s up
                       to the firmware to tickle the timer frequently, restarting the
                       countdown interval each time. A code crash means the timer counts
                       down without interruption; at time-out, hardware resets the CPU, ideally
                       bringing the system back on-line.
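
                            The countdown-and-tickle behavior described above can be
                       sketched as a small host-side model in C. This is only an
                       illustration under stated assumptions: the names wdt_t, wdt_init,
                       wdt_kick, and wdt_advance, and the 300 ms interval, are hypothetical
                       and do not correspond to any particular part's registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed time-out for illustration; real watchdogs vary widely. */
#define WDT_TIMEOUT_MS 300u

/* Host-side model of a watchdog timer (hypothetical, not real hardware). */
typedef struct {
    uint32_t remaining_ms;   /* time left before the watchdog fires */
    bool     reset_asserted; /* latched once the countdown expires  */
} wdt_t;

/* The "tickle": restart the countdown from the full interval. */
void wdt_kick(wdt_t *w)
{
    assert(w != NULL);
    w->remaining_ms = WDT_TIMEOUT_MS;
}

/* Power-up state: no reset pending, countdown freshly started. */
void wdt_init(wdt_t *w)
{
    assert(w != NULL);
    w->reset_asserted = false;
    wdt_kick(w);
}

/* Advance the model by elapsed_ms of wall-clock time.
 * Returns true once the countdown has expired, which is the point at
 * which real hardware would be driving the CPU's RESET input. */
bool wdt_advance(wdt_t *w, uint32_t elapsed_ms)
{
    assert(w != NULL);
    if (elapsed_ms >= w->remaining_ms) {
        w->remaining_ms = 0;
        w->reset_asserted = true;
    } else {
        w->remaining_ms -= elapsed_ms;
    }
    return w->reset_asserted;
}
```

                            In this model, healthy firmware calls wdt_kick() well inside
                       every interval, so wdt_advance() never returns true. Crashed code
                       stops calling it, and the countdown runs out.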
                            The first rule of watchdog design is to drive the CPU’s reset
                       input, not an interrupt (such as NMI). A WDT time-out means that
                       something awful happened, something that may have left the CPU in an
                       unpredictable scrambled state. Only RESET is guaranteed to bring the
                       part back on-line.
                            The non-maskable interrupt is seductive to some designers,
                       especially when the pin is unused and there’s a chance to save a few
                       gates. For better or worse, NMI, like all other interrupt inputs, is
                       not fail-safe. Confused internal logic will shut down NMI response on
                       some CPUs.
                            On other chips a simple software problem can render the
                       non-maskable interrupt unusable. The 68K, for example, will crash if
                       the stack pointer assumes an odd value. If you rely on the WDT to save
                       the day, driving an interrupt while SP is odd results in a double bus
                       fault, which puts the CPU in a dead state until it’s reset.
                            Next, think through the litigation potential of your system.
                       Life-threatening failure modes mean you’ve got to beware of simple
                       watchdog timers! If a single I/O instruction successfully keeps the WDT
                       alive, then there’s a real chance that the code might crash but continue
                       to tickle the timer. Some companies (Toshiba, for example) require a
                       more complex sequence of commands to the timer; it’s equally easy to
                       create a PLD yourself that requires a fiendishly complex WDT sequence.
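
                            One way to picture a multi-command watchdog is as a small
                       state machine that restarts the countdown only after the complete
                       key sequence arrives in order. The sketch below is a host-side
                       model in C, not Toshiba's scheme or any real PLD design: the key
                       values 0x55, 0xAA, 0x3C and the names wdt_seq_t, wdt_seq_init, and
                       wdt_seq_write are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key sequence; a real part or PLD defines its own. */
static const uint8_t WDT_KEYS[] = { 0x55, 0xAA, 0x3C };
#define WDT_KEY_COUNT (sizeof WDT_KEYS / sizeof WDT_KEYS[0])

/* Model of the sequence detector in front of the watchdog counter. */
typedef struct {
    size_t next_key; /* index of the key expected on the next write */
} wdt_seq_t;

void wdt_seq_init(wdt_seq_t *w)
{
    assert(w != NULL);
    w->next_key = 0;
}

/* Model one write to the watchdog port.
 * Returns true only when this write completes the whole sequence,
 * i.e. when the countdown would be restarted. Any wrong value throws
 * away all progress, including the wrong value itself. */
bool wdt_seq_write(wdt_seq_t *w, uint8_t value)
{
    assert(w != NULL);
    if (value != WDT_KEYS[w->next_key]) {
        w->next_key = 0;      /* out-of-order write: start over */
        return false;
    }
    if (++w->next_key == WDT_KEY_COUNT) {
        w->next_key = 0;      /* full sequence seen: re-arm detector */
        return true;
    }
    return false;             /* correct so far, but not yet complete */
}
```

                            With a detector like this, runaway code that happens to repeat
                       a single I/O write never completes the sequence, so the timer still
                       expires.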
                            It’s also a very bad idea to put the WDT reset code inside of an
                       interrupt service routine. It’s always intriguing, while debugging, to find
                       your code crashed but one or more ISRs still functioning. Perhaps the ser-