
13.4   Dependable programming  361


                                          that an attacker may exploit by presenting the program with unexpected inputs
                                          that are not rejected by the system.

Some standards for safety-critical systems development completely prohibit the
use of these constructs. However, such an extreme position is not normally practical.
All of these constructs and techniques are useful, though they must be used with
care. Wherever possible, their potentially dangerous effects should be controlled by
using them within abstract data types or objects. These act as natural ‘firewalls’
limiting the damage caused if errors occur.


Guideline 5: Provide restart capabilities

Many organizational information systems are based around short transactions where
processing user inputs takes a relatively short time. These systems are designed so
that changes to the system’s database are only finalized after all other processing has
been successfully completed. If something goes wrong during processing, the
database is not updated and so does not become inconsistent. Virtually all
e-commerce systems, where you only commit to your purchase on the final screen,
work in this way.
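
The commit-at-the-end pattern described above can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, not a prescribed design: the `orders` and `order_lines` tables, the `place_order` function and the `prices` lookup are all hypothetical names introduced for the example.

```python
import sqlite3


def make_database():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_lines (order_id INTEGER, item TEXT, price REAL)")
    conn.commit()
    return conn


def place_order(conn, customer, items, prices):
    """Record an order; the database changes only if every step succeeds."""
    try:
        cur = conn.cursor()
        cur.execute("INSERT INTO orders (customer) VALUES (?)", (customer,))
        order_id = cur.lastrowid
        for item in items:
            # An unknown item raises KeyError and aborts the whole order.
            cur.execute(
                "INSERT INTO order_lines (order_id, item, price) VALUES (?, ?, ?)",
                (order_id, item, prices[item]),
            )
        conn.commit()    # finalize only after all processing has completed
        return order_id
    except Exception:
        conn.rollback()  # discard partial changes so the database stays consistent
        raise
```

If any step raises an exception, the rollback discards the partially recorded order, so a failed purchase leaves no trace in either table.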

User interactions with e-commerce systems usually last a few minutes and
involve minimal processing. Database transactions are short, and are usually
completed in less than a second. However, other types of systems such as CAD systems
and word processing systems involve long transactions. In a long transaction system,
the time between starting to use the system and finishing work may be several
minutes or hours. If the system fails during a long transaction, then all of the work may
be lost. Similarly, in computationally intensive systems such as some e-science
systems, minutes or hours of processing may be required to complete the computation.
All of this time is lost in the event of a system failure.

In all of these types of systems, you should provide a restart capability that is
based on keeping copies of data that is collected or generated during processing. The
restart facility should allow the system to restart using these copies, rather than
having to start all over from the beginning. These copies are sometimes called
checkpoints. For example:


1.  In an e-commerce system, you can keep copies of forms filled in by a user and
    allow them to access and submit these forms without having to fill them in again.

2.  In a long transaction or computationally intensive system, you can automatically
    save data every few minutes and, in the event of a system failure, restart with the
    most recently saved data. You should also allow for user error and provide a way
    for users to go back to the most recent checkpoint and start again from there.
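
As an illustration of the second case, the sketch below checkpoints a long-running computation and resumes from the most recently saved data after a failure. It rests on several assumptions made for the example: the state is small enough to store as JSON, checkpoints are taken every `every` iterations rather than every few minutes, and the names `save_checkpoint`, `load_checkpoint`, `sum_of_squares` and the `fail_at` parameter (used only to simulate a failure) are hypothetical.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write the state atomically so a crash never leaves a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp_path, path)  # rename over the previous checkpoint


def load_checkpoint(path, default):
    """Return the most recently saved state, or the default if none exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def sum_of_squares(n, path, every=1000, fail_at=None):
    """Long-running computation that resumes from the last checkpoint."""
    state = load_checkpoint(path, {"next": 0, "total": 0})
    for i in range(state["next"], n):
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated system failure")
        state["total"] += i * i
        state["next"] = i + 1
        if state["next"] % every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state["total"]
```

Calling `sum_of_squares` again with the same `path` after a failure repeats only the iterations performed since the last checkpoint, not the whole computation.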


If an exception occurs and it is impossible to continue normal operation, you can
handle the exception using backward error recovery. This means that you reset the state
of the system to the saved state in the checkpoint and restart operation from that point.
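
A minimal sketch of backward error recovery follows, assuming the system state fits in memory and can be copied with `copy.deepcopy`. The `Document` class and its methods are hypothetical and stand in for any long-transaction system, such as an editor.

```python
import copy


class Document:
    """Hypothetical editor state with checkpoint-based recovery."""

    def __init__(self):
        self.lines = []
        self._checkpoint = copy.deepcopy(self.lines)

    def checkpoint(self):
        """Save a copy of the current state."""
        self._checkpoint = copy.deepcopy(self.lines)

    def restore(self):
        """Reset the state to the most recent checkpoint."""
        self.lines = copy.deepcopy(self._checkpoint)

    def apply(self, operation):
        """Run an operation; on any exception, fall back to the checkpoint."""
        try:
            operation(self.lines)
            return True
        except Exception:
            self.restore()  # backward error recovery
            return False
```

Note that `restore` discards everything done since the last checkpoint, not only the failed operation, so how often checkpoints are taken determines how much work can be lost.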