Does Your Datacenter Have An SLA?

from the prove-it dept

I have serious concerns about whether mission-critical applications are actually having their SLAs met in datacenters, whether they are hosted in-house, supported by a third party, or run under any other form of datacenter-based hosting. First, consider the alternative: the server sits in a room next to your expert developers. Sure, it's probably a SOX violation, but I can tell you this much: that server will not go down often, and if it does, you can be sure that it will be restored as fast as humanly possible. That's the advantage of having an expert babysit your system. If you have two experts in different geographic locations, each babysitting a server in case the other goes down, then you have about the best support possible. For large systems, however, this arrangement may not be practical.

But how do you know that a datacenter-hosted app has this type of support? First, you need to know for sure what the SLA spells out in terms of support and monitoring. Look for this in your SLA:

"If your app encounters event W, person X will do Y about that specific event within Z amount of time"

I guarantee that anything less specific than that, or anything just as specific that isn't written into the SLA, will not be honored. Vague responses equal no responses, because why would the datacenter host open itself up to liability by initiating a response that wasn't specified in writing? Specific, measurable responses with named responsible parties must be honored under the SLA, or the datacenter host can be held accountable for any failure to respond as specified.
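To make the "W, X, Y, Z" test concrete, here is a minimal sketch in Python of how you might write down each promise from your SLA and flag the vague ones. The names SlaClause and is_specific and both sample clauses are my own illustrations, not taken from any real contract or tool.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class SlaClause:
    """One 'event W, person X, action Y, within Z' promise (hypothetical model)."""
    event: str            # W: the triggering event
    responder: str        # X: the named responsible party
    action: str           # Y: what they will do about it
    deadline: timedelta   # Z: how long they have to do it


def is_specific(clause: SlaClause) -> bool:
    """True only if W, X and Y are filled in and Z is a positive duration."""
    texts = (clause.event, clause.responder, clause.action)
    return all(t.strip() for t in texts) and clause.deadline > timedelta(0)


# Illustrative clauses only; the wording is assumed, not quoted from an SLA.
vague = SlaClause("outage", "", "best-effort support", timedelta(0))
specific = SlaClause(
    event="web server stops answering HTTP requests",
    responder="on-call datacenter admin",
    action="restart the service and notify the customer",
    deadline=timedelta(minutes=15),
)
```

Here is_specific(vague) is False because no responder is named and no time limit is given, while is_specific(specific) is True. Passing this check says nothing about whether the promise will be kept, only that it is concrete enough to be measured.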

So assume you have an acceptable SLA in place, and you know what they're supposed to do. How can you be sure they'll actually do the things they say they'll do? You need to know before you count on your apps for something mission-critical, so while the mission-critical app is still running somewhere else (i.e., being babysat by an expert), you set out to prove that the support can respond -- by staging various types of failures. You could tell the host about the staged failures, but then they'll know to staff up and respond appropriately. I would stage failures and not tell the host that they are a test. After all, from the host's perspective, any failure is a failure. Be sure to measure the response closely and check whether the SLA was honored as written. Any failure to honor it, for any reason, should be a strong indication that the host is not prepared to honor the SLA, which could eventually cost you your mission-critical app.
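Measuring the response can be as simple as logging two timestamps per staged failure and comparing the gap to the time the SLA allows. The following Python sketch assumes you record those timestamps yourself; DrillResult, honored, summarize and the sample times are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class DrillResult:
    """Outcome of one staged failure (hypothetical record format)."""
    failure_staged_at: datetime
    host_responded_at: Optional[datetime]  # None means the host never responded
    allowed: timedelta                     # response time promised in the SLA

    def honored(self) -> bool:
        """True if the host responded within the promised time."""
        if self.host_responded_at is None:
            return False
        elapsed = self.host_responded_at - self.failure_staged_at
        return elapsed <= self.allowed


def summarize(results: List[DrillResult]) -> Tuple[int, int]:
    """Return (number honored, number missed) across all staged failures."""
    missed = sum(1 for r in results if not r.honored())
    return len(results) - missed, missed


# Made-up drill data: one timely response, one late, one ignored.
start = datetime(2009, 12, 1, 9, 0)
drills = [
    DrillResult(start, start + timedelta(minutes=10), timedelta(minutes=15)),
    DrillResult(start, start + timedelta(minutes=40), timedelta(minutes=15)),
    DrillResult(start, None, timedelta(minutes=15)),
]
honored_count, missed_count = summarize(drills)
```

With this made-up data, honored_count is 1 and missed_count is 2. By the standard argued above, even a single miss is reason to doubt that the host is prepared to honor the SLA.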

Do not let a complicated roll-over setup or automated monitoring persuade you that the datacenter can respond to any event with seamless mission-critical app coverage. An inexperienced datacenter admin simply hitting the wrong button can send any app to Davy Jones' locker in a big hurry. If you truly want mission-critical backup performance, ask yourself what would happen if the datacenter were completely unresponsive. For example, what if it were hit by a hurricane and completely wiped out? How soon could you be back up and running, and at what capacity? If you can't answer that, you'd better find an answer before some unpredictable event knocks out your one server running everything.



Reader Comments

  1.  
    mermaldad, Dec 8th, 2009 @ 5:36am

    Define your acronyms

    Nice article, but please define your acronyms:

    SLA = Service Level Agreement
    SOX = Sarbanes-Oxley Act

     


  2.  
    moore850, Dec 10th, 2009 @ 11:43am

    Re: Define your acronyms

    Thanks. In the future, I'll tack definitions on the bottom of the article.

     


  3.  
    Music Search Engine, Mar 19th, 2010 @ 6:15am

    That's a good point about the SLA. However, even with an SLA, there are outstanding concerns about data privacy: not only making sure that the customer's data is not leaked and that government regulations on that topic are met, but also making sure that corporate data (beyond the data about the customer) is not inadvertently leaked, especially when the cloud provider runs a multi-tenant cloud.
    This is, of course, ignoring the ability customers need to change cloud providers. It appears the market is setting up to be, at least for the moment, several non-interoperable islands.

     


