Page 170 - How Cloud Computing Is Transforming Business and Why You Can’t Afford to Be Left Behind
MANAGEMENT STRATEGIES FOR THE CLOUD REVOLUTION
incident. Apparent Networks, a network performance monitoring
company, however, uses more than one account per cloud data
center to track network performance. In the
northern Virginia outage, Apparent Networks had 20 accounts
running virtual machines with EC2, and 6 of the machines, or
30 percent, were unavailable. Its executives are careful to say
that they can’t tell whether a similar percentage of all cus-
tomer accounts were affected by the outage.
“On the whole, Amazon is extremely consistent” in both
steady data center operations and reporting incidents as they
occur, said Javier Soltero, CTO of management products at
VMware. He is the former CEO of Hyperic, the company behind
the open source system that monitors cloud services
and is the basis for the free service at www.CloudStatus.com.
In the Amazon outage, he concedes, “We see a gap,” or a delay
between the occurrence of the incident and the time it was re-
ported. Whether that was due to the staff workload required
to fix the problem, a preference for getting a handle on an in-
cident before saying anything, or some other reason, “only
people at Amazon know for sure,” he said.
But the failure of both primary and backup power sup-
plies in EC2 should teach the unwary customer a lesson: keep
your recovery system in the cloud in a separate zone from your
primary system.
Like other EC2 terms, a zone means something specific to
Amazon Web Services, but there’s not necessarily a clear defi-
nition of the zone involved in this incident. The explanation I
received in a December 12 e-mail from Amazon said: “Avail-
ability Zones are distinct locations that are engineered to be