Page 65 - How Cloud Computing Is Transforming Business and Why You Cant Afford to Be Left Behind

THE AMORPHOUS CLOUD



                 that Google operation managers are alerted, the failing server
                 is identified, and its workload is moved elsewhere before the
                 battery is exhausted. I suspect, but do not know, that all this
                 happens automatically. A human somewhere notes the server
                 outage. At some point during a regular maintenance sweep,
                 the power supply unit is replaced and the server is brought
                 back online, or perhaps the entire server is replaced when it
                 reaches a certain age.
                     Google officials have talked about how they’ve designed
                 their data center expecting such component failures. When
                 there are tens of thousands of servers working together, such
                 failures, which are infrequent for the home computer user,
                 start to occur on a regular basis. Disk drives fail, power
                 supplies fail, network interface cards fail, other components
                 seize up, and the server grinds to a halt.
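To see why scale turns a rare event into a routine one, consider a rough back-of-the-envelope sketch. The three-year mean time between failures and the fleet sizes below are illustrative assumptions, not figures from Google, and the calculation assumes that servers fail independently at a constant rate.

```python
HOURS_PER_YEAR = 24 * 365


def expected_failures_per_hour(num_servers: int, mtbf_years: float) -> float:
    """Average server failures per hour across a fleet.

    Assumes independent failures and a constant per-server failure
    rate of 1 / MTBF (a simplification chosen for illustration).
    """
    per_server_rate_per_hour = 1.0 / (mtbf_years * HOURS_PER_YEAR)
    return num_servers * per_server_rate_per_hour


if __name__ == "__main__":
    # Hypothetical fleet sizes: one home computer versus large clusters.
    for fleet in (1, 1_000, 30_000):
        rate = expected_failures_per_hour(fleet, mtbf_years=3.0)
        print(f"{fleet:>6} servers: about one failure every {1 / rate:,.1f} hours")
```

Under these assumptions a single machine fails once in years, while a fleet of tens of thousands sees a failure roughly every hour, which is the shift in perspective the data center designers describe.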
                     In a paper outlining many aspects of the cloud data
                 center, Urs Holzle, senior vice president of engineering at
                 Google, and Luiz Barroso, Google distinguished engineer, say,
                 “An application (such as a search engine) running across thousands
                 of machines may need to react to failure conditions on an
                 hourly basis.” Holzle and Barroso have given us a major clue
                 to the rise of cloud computing: it achieves new economies
                 of scale yet remains broadly available to multitenant users
                 because it’s being managed by software, not humans, and it
                 achieves fault tolerance in that software, not the hardware.
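The idea of putting fault tolerance in software rather than hardware can be illustrated with a minimal sketch. It is not Google's implementation; the `Node` class, the `NodeDown` exception, and `query_with_failover` are hypothetical names invented here to show the general pattern of routing around a failed machine.

```python
class NodeDown(Exception):
    """Raised when a (hypothetical) node cannot serve a request."""


class Node:
    """A stand-in for one server in a cluster."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy

    def handle(self, query: str) -> str:
        if not self.healthy:
            raise NodeDown(self.name)
        return f"{self.name} answered {query!r}"


def query_with_failover(nodes: list[Node], query: str) -> str:
    """Try each replica in turn, skipping any that has failed."""
    for node in nodes:
        try:
            return node.handle(query)
        except NodeDown:
            # The software absorbs the failure; the caller never sees it.
            continue
    raise RuntimeError("all replicas are down")


if __name__ == "__main__":
    cluster = [Node("node-a", healthy=False), Node("node-b")]
    print(query_with_failover(cluster, "cloud"))
```

The hardware here is allowed to be fallible. The request still succeeds as long as any one replica is healthy, and no human has to intervene at the moment of failure.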
                     For example, Google has designed its search engine
                 operation with the expectation that one or more nodes
                 within the cluster will fail. Rather than try to build infallibility
                 into the hardware, it has kicked the responsibility upstairs to


