by Devin Moore

Does Your Datacenter Have An SLA?

from the prove-it dept

I have serious concerns about whether mission-critical applications are having their service level agreements (SLAs) met in datacenters, whether the hosting is in-house, third-party supported, or any other form of datacenter-based hosting. First, consider the alternative: the server sits in a room next to your expert developers. Sure, it's probably a Sarbanes-Oxley (SOX) violation, but I can tell you this much: that server will not go down often, and if it does, you can be sure that it will be restored as fast as humanly possible. That's the advantage of having an expert babysit your system. If you have two experts in different geographic locations, each babysitting a server in case the other goes down, then you have about the best support possible. For large systems, however, that arrangement may not be practical.

But how do you know that a datacenter-hosted app has this type of support? First, you need to know for sure what the SLA spells out in terms of support and monitoring. Look for this in your SLA:

"If your app encounters event W, person X will do Y about that specific event within Z amount of time"

I guarantee that anything less specific than that, or anything equally specific that isn't written into the SLA, will not be honored. Vague commitments equal no response: why would the datacenter host expose itself to liability by initiating a response that wasn't specified in writing? Only specific, measurable responses with named responsible parties are binding, and once those are in writing, the host can be held accountable for any failure to respond as specified.
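To make the W/X/Y/Z template concrete, here is a minimal Python sketch that records one SLA commitment as a structured clause. The class name `SlaClause`, its fields, and the example values are hypothetical, not taken from any real SLA. The `is_specific` check only catches blank parts and a missing deadline; it cannot tell whether the wording is genuinely concrete, which still takes a careful read of the contract.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaClause:
    """One 'event W -> person X does Y within Z' commitment (hypothetical model)."""
    event: str            # W: the triggering event, e.g. "database unreachable"
    responsible: str      # X: a named person or role, not just "support"
    action: str           # Y: the specific action they commit to take
    deadline: timedelta   # Z: maximum time from the event to the action

    def is_specific(self) -> bool:
        """True only if every text part is filled in and the deadline is positive."""
        parts = (self.event, self.responsible, self.action)
        return all(p.strip() for p in parts) and self.deadline > timedelta(0)

    def __str__(self) -> str:
        # Render the clause back into the article's sentence template.
        minutes = int(self.deadline.total_seconds() // 60)
        return (f"If your app encounters {self.event}, {self.responsible} will "
                f"{self.action} within {minutes} minutes")


# Example values are illustrative assumptions only.
clause = SlaClause(
    event="a failed health check",
    responsible="the on-call database administrator",
    action="page the customer and begin failover",
    deadline=timedelta(minutes=15),
)
vague = SlaClause(event="an outage", responsible="", action="respond",
                  deadline=timedelta(0))

print(clause)
print(clause.is_specific(), vague.is_specific())
```

Any clause that fails even this crude check is, by the argument above, one you should not expect the host to honor.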

So assume you have an acceptable SLA in place and you know what the host is supposed to do. How can you be sure they'll actually do what they say they'll do? You need to know before you count on the hosted app for anything mission-critical, so while that app is still running somewhere else (i.e. being babysat by an expert), set out to prove that the support can respond by staging various types of failures. You could tell the host about the staged failures in advance, but then they would staff up and respond to the test specifically, which tells you little about how they handle an ordinary day. I would stage failures without telling the host that they are tests. After all, from the host's perspective, any failure is a failure. Measure the response closely and check whether the SLA was honored as written. Any failure to honor it, for any reason, is a strong indication that the host is not prepared to honor the SLA, and that could cost you your mission-critical app.
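A minimal Python sketch of scoring one unannounced drill against a single clause, assuming you log when you injected the failure, when the promised action was actually observed, and who performed it. `StagedFailure`, `check_drill`, and the sample timings are hypothetical names and data for illustration, not part of any real monitoring tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class StagedFailure:
    """One unannounced test failure and what the host actually did (hypothetical record)."""
    event: str                          # which SLA event was staged
    injected_at: datetime               # when you triggered the failure
    responded_at: Optional[datetime]    # when the promised action was observed, or None
    responder: Optional[str]            # who actually responded, or None


def check_drill(failure: StagedFailure, promised_by: str,
                deadline: timedelta) -> list[str]:
    """Return every way this drill fell short of the clause; empty means it was honored."""
    if failure.responded_at is None:
        return ["no response observed"]
    problems = []
    elapsed = failure.responded_at - failure.injected_at
    if elapsed > deadline:
        problems.append(f"responded after {elapsed}, limit was {deadline}")
    if failure.responder != promised_by:
        problems.append(f"responder was {failure.responder!r}, SLA names {promised_by!r}")
    return problems


# Usage with made-up timings: the host answered late, and not the named person.
t0 = datetime(2009, 12, 1, 2, 0)
late = StagedFailure("failed health check", t0, t0 + timedelta(minutes=40), "night operator")
print(check_drill(late, "on-call DBA", timedelta(minutes=15)))
```

Returning a list of shortfalls rather than a single pass/fail keeps the evidence for each missed commitment, which is what you would want on hand when raising the failure with the host.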

Do not let a complicated roll-over scheme or automated monitoring convince you that the datacenter can respond to any event with seamless coverage for your mission-critical app. An inexperienced datacenter admin simply hitting the wrong button can send any app to Davy Jones' locker in a big hurry. If you truly want mission-critical backup performance, ask yourself what would happen if the datacenter were completely unresponsive. For example, what if it were hit by a hurricane and completely wiped out? How soon could you be back up and running, and at what capacity? If you can't answer that, you had better find an answer before some unpredictable event knocks out your one server running everything.

Reader Comments


  • mermaldad, 8 Dec 2009 @ 5:36am

    Define your acronyms

    Nice article, but please define your acronyms:

    SLA = Service Level Agreement
    SOX = Sarbanes-Oxley Act


  • Music Search Engine, 19 Mar 2010 @ 6:15am

    That's a good point about the SLA. However, even with an SLA, there are outstanding concerns about data privacy: not only making sure that customer data is not leaked and that government regulations on that topic are met, but also making sure that corporate data (beyond the customer data) is not inadvertently leaked, especially when the cloud provider runs a multi-tenant cloud.
    This of course ignores customers' need to be able to change cloud providers. It appears the market is shaping up to be, at least for the moment, several non-interoperable islands.

