Notes from the Cloud Academy: RAIC – Redundant Arrays of Inexpensive Cloud services

We have been running the Cloud Academy roundtables in several European countries. I’d like to share some of the more interesting questions, debates and insights around a number of topics, starting today with RAIC—Redundant Arrays of Inexpensive Cloud Services. Other topics will include:

  • A TV industry analogy: Competition for the IT department
  • Cloud Shortcuts: Can the Cloud make (internal) IT more agile?
  • Service Level Management and the Cloud
  • Cloud R&R – Retained responsibilities for IT
  • Elastic Services: Everybody wants to be a manager  

Redundant Arrays of Inexpensive Cloud services
Today’s post discusses whether we can ensure the performance and availability of public cloud services. I’m not sure we can. Public cloud services are a bit like the weather: we are lucky if we can predict what it is going to be like, but we cannot manage or change it because we don’t control the underlying elements. The same holds true when trying to “manage” public cloud services.

So what do we do? Give up on public cloud services altogether? No, that would be throwing out the baby with the bathwater. Instead, we can follow a method we have been using in IT for a long time. If we cannot count on a certain item to always be available, we make sure we have a failover option.

The best example comes from storage. At some point, people realized that even the most expensive disks failed now and then. So they developed a strategy in which the failure of an individual disk matters much less. The result was RAID, a redundant array of inexpensive disks that, transparently to the user, serves the requested data from other disks in the array when one of the disks fails. In typical IT fashion, we used the name RAID 0 for a configuration with no redundancy at all (striping only), RAID 1 for mirroring across two disks, and so on. The benefit of the redundant RAID levels is that the predicted availability increases significantly by adding marginally more redundant capacity.

How do we apply a similar “redundant array” approach to cloud services? The idea of contracting for two email services or two CRM systems is counter-intuitive for most IT folks, since for years we have striven to standardize on one of each. And the reality is that if half the company uses one email system and the other half another, 50% of the people are still down if one fails. So instead of looking at email in isolation, we should look at all the employee communication options. These may include email, instant messaging, VoIP, even social media functions similar to Facebook or Twitter. If these are based on different technologies and sourced from different vendors, the chance of them all being down at the same time is extremely small.
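
To put rough numbers on that last claim, here is a minimal sketch of the arithmetic. It assumes the channels fail independently of each other, which is only plausible if they really do come from different vendors and run on different technologies. The availability figures and the function name `combined_availability` are made up for illustration, not measurements of any real service.

```python
from math import prod


def combined_availability(availabilities):
    """Probability that at least one channel is up, assuming independent failures."""
    # All channels are down at once only if every single one is down.
    return 1.0 - prod(1.0 - a for a in availabilities)


# Hypothetical availability figures, for illustration only
channels = {"email": 0.99, "instant_messaging": 0.99, "voip": 0.98}

print(f"{combined_availability(channels.values()):.6f}")  # prints 0.999998
```

Three services that are each only 98 to 99% available add up to a communication capability that is down far less often than any one of them. If the channels share a dependency (the same network link, the same identity provider), the independence assumption breaks and the real figure will be lower.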

Using chat or instant messaging as a backup for email is not how we traditionally think in IT (and challenging such traditional thinking is exactly the idea of the Cloud Academy), but it aligns with the next generation of IT users. An example: teenagers (like the two living in my home) instantly switch from MSN to Google chat or to Hyves or Facebook, or even to Hotmail or text messaging, if the service they are using is behaving strangely. They are not particularly interested in whether a particular service is down; their only interest is whether they can continue to communicate with their friends.
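
The teenagers' strategy can be written down in a few lines. The sketch below is only an illustration: `send_with_failover` and the two stub senders are hypothetical names, and a real sender would call the actual email or chat provider.

```python
def send_with_failover(message, channels):
    """Try each (name, send) pair in order; return the name of the first that works."""
    failed = []
    for name, send in channels:
        try:
            send(message)
            return name
        except Exception:
            # This channel is misbehaving, so move on to the next one.
            failed.append(name)
    raise RuntimeError(f"all channels failed: {failed}")


# Hypothetical stub senders, standing in for real services
def email_down(message):
    raise ConnectionError("email service unavailable")


def chat_ok(message):
    pass  # a real sender would hand the message to the chat provider here


used = send_with_failover(
    "Meeting moved to 3pm",
    [("email", email_down), ("chat", chat_ok)],
)
print(used)  # prints chat
```

The caller does not care which service delivered the message, only that one of them did.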

Of course, since today’s IT departments proactively monitor the infrastructure and know the status of systems, they rarely get a call saying “all systems are down.” But that’s not true with external cloud services. We need to find an alternative early-warning system, something like a weather report on the status of the external cloud services our users depend upon. An interesting site in this context is http://www.unifiedmonitoring.com/.
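
A very small version of such a weather report could look like the sketch below. It assumes each external service exposes some URL that can be requested to see whether it responds; the `example.com` addresses and the names `http_probe` and `weather_report` are placeholders, not real endpoints or an existing tool.

```python
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # Covers connection errors, timeouts and HTTP error responses.
        return False


def weather_report(services, probe=http_probe):
    """Map each service name to 'up' or 'down' using the supplied probe."""
    return {name: ("up" if probe(url) else "down") for name, url in services.items()}


# Placeholder endpoints, for illustration only
services = {
    "email": "https://mail.example.com/health",
    "chat": "https://chat.example.com/health",
}

# A stub probe keeps the sketch runnable offline; omit it to make real HTTP checks.
report = weather_report(services, probe=lambda url: "chat" in url)
print(report)  # prints {'email': 'down', 'chat': 'up'}
```

A single probe from one location only tells you whether the service is reachable from there, so a real early-warning system would check from several places and over time.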

So what conclusions did we reach in our (sometimes heated) Cloud Academy debate?

Using public cloud services is another step in giving up control of the underlying components. Years ago, when companies bought their first computers, they were expected to program these themselves in Assembler. Later, they bought higher-level language compilers, followed by complete off-the-shelf software packages, followed now by infrastructure and software as a service. With each step, IT has lost some control, but in exchange we are no longer required to do all the work.

We do, however, have to make conscious decisions about when to cede control. This differs by industry, type of application and possible risk. Using public cloud services in many cases already makes sense today. But when using them, we need some way to monitor availability and outcome so that we can make smart, pragmatic tradeoffs and take precautions for when the services are not available.
