Sunday, Dec 11, 2011

First Steps on Amazon's CloudWatch

In this post we will go over some of the services offered by Amazon Web Services, focusing on CloudWatch in particular. I'm assuming that you either have some app running on AWS or at least know the rudiments of how to launch one.

I don’t need no CloudWatch: my app is bullet-proof

Yes, you have thought out every contingency, every possible happenstance, and every little accident that the universe could throw at your app. You have plans named after most letters of the alphabet. Your app is so strong that it will outlive cockroaches in case of a nuclear attack. However, there's still that devil-spawn butterfly down there in Brazil, waiting for the TechCrunch feature about your app to go live so it can flap its multicolored wings and bring it to its sorry knees. Or better yet, that butterfly and its buddies are waiting for you to showcase the app to potential investors.

Back in the old days (the 1990s), the best practices for keeping a server up and running included smearing garlic on the sides of the server rack, hanging clovers and horseshoes on the door and lighting a candle to Edward Murphy. This list is not exhaustive. Now, with the advent of cloud computing, we don’t even know where the servers are, let alone have access to them without getting shot at or eaten by giant Dobermans.

OK, I need CloudWatch; how do I go about it?

Out of the box, the guys at AWS give you five metrics at a frequency of one sample every 5 minutes. For free. These metrics are CPU, Disk Reads, Disk Writes, Network In, and Network Out and are stored for two weeks. To have a look, log in to your AWS console and select the EC2 tab. Once you see the instance you're interested in, click on it: on the bottom half of the console, click the tab labeled "Monitoring". Do it, I'll wait here.

While we are on this screen, go to the "Description" tab and write down the instance id; we'll use it later. To get a collective impression of how things are running, select multiple instances on the top half of the console. The "Description" tab will show nothing useful, but the "Monitoring" section will render an aggregation of metrics, each instance with its own color.

There isn’t much you can do with these metrics other than get a feel for how your fleet is doing. Think of it as licking your finger to get a sense of which way the wind is blowing.

In order to close the feedback loop, we need a way to take action based upon conditions we can measure. Enter CloudWatch. At the top of the console, go to the "CloudWatch" tab and select "All Metrics" on the Navigation panel. In the "Viewing" combo box, select "EC2: Instance Metrics". You'll see a list of metrics; we're after the one reading "CPUUtilization" in the MetricName field, with the instance id you wrote down earlier (told you we'd need it) in the InstanceId field. Clicking on it will bring forth a more detailed graph and some nifty buttons on the right side. You can play for a while and go reminiscing with your instance, going back and forth in time, up to two weeks ago. The fun part begins with the "Create Alarm" button.

In the alarm creation wizard you can set up a threshold condition that will trigger the alarm. Once you set this threshold you can define what you want the alarm to do. Alarms can be in one of three possible states: OK, ALARM and INSUFFICIENT_DATA. When the threshold condition has held for the whole period of time you set, the alarm goes from OK to ALARM. This state transition can also initiate an action. Notice that actions are performed only on state transitions.

Let's have a look at this with an example: say you set an alarm whose threshold is 90% CPU for 30 consecutive minutes. During the first 29 minutes that the instance sits at 90% CPU, the alarm is still in the OK state. Worse still, even if the CPU is pegged at 100% for those 29 minutes, the alarm will remain in the OK state. Once the alarm has transitioned to the ALARM state, any action defined for this transition is taken. The alarm will remain in this state until the condition no longer holds, i.e. the CPU has been below 90% for at least one sample in the last 30 minutes.
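To make the "actions only on transitions" point concrete, here is a toy model of the example above. It is a sketch under my own assumptions, not how CloudWatch evaluates alarms internally: the `CpuAlarm` class and its `feed` method are made up for illustration, it assumes one sample per minute, and it treats a full window of missing samples as INSUFFICIENT_DATA.

```python
from collections import deque

OK, ALARM, INSUFFICIENT_DATA = "OK", "ALARM", "INSUFFICIENT_DATA"


class CpuAlarm:
    """Toy threshold alarm (hypothetical; not the real CloudWatch evaluator)."""

    def __init__(self, threshold, periods, on_transition=None):
        self.threshold = threshold
        self.window = deque(maxlen=periods)  # keeps only the latest samples
        self.state = OK  # the example in the post starts from OK
        self.on_transition = on_transition or (lambda old, new: None)

    def feed(self, sample):
        """Record one sample (None means no data arrived) and re-evaluate."""
        self.window.append(sample)
        full = len(self.window) == self.window.maxlen
        if full and all(s is None for s in self.window):
            new_state = INSUFFICIENT_DATA
        elif full and all(s is not None and s >= self.threshold
                          for s in self.window):
            new_state = ALARM
        else:
            new_state = OK
        if new_state != self.state:
            old, self.state = self.state, new_state
            self.on_transition(old, new_state)  # fires on transitions only
        return self.state


fired = []
alarm = CpuAlarm(threshold=90, periods=30,
                 on_transition=lambda old, new: fired.append((old, new)))

for _ in range(29):
    alarm.feed(100)      # 29 minutes at 100% CPU...
print(alarm.state)       # OK

alarm.feed(100)          # ...the 30th sample tips it over
print(alarm.state)       # ALARM

alarm.feed(100)          # still in ALARM: no new action
alarm.feed(50)           # one sample below the threshold
print(fired)             # [('OK', 'ALARM'), ('ALARM', 'OK')]
```

Note that the callback ran twice in total, once per transition, no matter how many samples stayed above the threshold.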

Here's where CloudWatch's sampling frequency becomes relevant: you can have 1 sample every 5 minutes for free, or 1 sample per minute for a monthly fee per instance monitored (at the time of this writing, it was around $3.50/month/instance). If you are sampling once every 5 minutes and you set the alarm condition for 10 minutes, you are actually evaluating only two samples. Conversely, if you wait for many samples at that rate, things can get very bad before you notice. In short: spend the three fifty and sleep better.
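The arithmetic behind that trade-off is simple enough to check on the back of a napkin, or in a few lines of Python. The two helper functions below are mine, purely for illustration:

```python
def samples_evaluated(window_minutes, sample_interval_minutes):
    """How many samples fall inside an alarm's evaluation window."""
    return window_minutes // sample_interval_minutes


def minutes_until_alarm(samples_required, sample_interval_minutes):
    """Shortest time a breach must last before the alarm can fire."""
    return samples_required * sample_interval_minutes


print(samples_evaluated(10, 5))   # 2  (free tier, 10-minute condition)
print(samples_evaluated(10, 1))   # 10 (paid tier, same condition)
print(minutes_until_alarm(6, 5))  # 30 (six 5-minute samples = half an hour)
```

Two samples is a thin basis for waking somebody up, and six samples at the free rate means half an hour of trouble before anyone hears about it.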

 

Would you rather give $3.50 to the Loch Ness monster?

There's also the third state: INSUFFICIENT_DATA. The alarm enters this state when, as the name implies, it is not getting data from the instance. Setting an action for a transition to this state can be immensely valuable: it basically tells you that your instance is dead.
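If you would rather script the alarm than click through the wizard, the definition might look roughly like the sketch below. I'm assuming the boto3 library here (the post itself only uses the console), and the instance id and topic ARN are made-up placeholders. The function only builds the parameters; the actual call is left in a comment so nothing touches your account by accident. The notification target is an SNS topic, which we get to in a moment.

```python
def cpu_alarm_definition(instance_id, topic_arn, threshold=90.0,
                         minutes=30, period_seconds=60):
    """Build keyword arguments for boto3's CloudWatch put_metric_alarm."""
    return {
        "AlarmName": f"high-cpu-{instance_id}",  # naming scheme is my own
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Average",
        "Period": period_seconds,  # 60 assumes the paid 1-minute sampling
        "EvaluationPeriods": (minutes * 60) // period_seconds,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Notify on the transition to ALARM...
        "AlarmActions": [topic_arn],
        # ...and on the transition to INSUFFICIENT_DATA (dead instance?).
        "InsufficientDataActions": [topic_arn],
    }


# Hypothetical identifiers, for illustration only.
params = cpu_alarm_definition(
    "i-12345678",
    "arn:aws:sns:us-east-1:123456789012:database-machines",
)
print(params["EvaluationPeriods"])  # 30

# With credentials configured, the alarm could then be created with:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

On the free 5-minute sampling you would pass `period_seconds=300`, which turns the same 30 minutes into 6 evaluation periods.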

Regarding the actions that can be taken upon transitions: the most obvious is of course sending an email. To set this up you need to define a channel of sorts in the Simple Notification Service (SNS) tab. You create a topic ("database-machines") and assign subscriptions to it (email addresses of people on your team). More interesting alternatives to email are HTTP, SMS and SQS. With HTTP you can set up a web service somewhere that reacts to the alert. And with SQS you can dispatch messages through Amazon's Simple Queue Service; how cool is that?
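For the HTTP option, the web service on the receiving end mostly has to unwrap what SNS posts to it. Here is a minimal sketch of that unwrapping, assuming the usual SNS JSON envelope with the CloudWatch alarm details as a JSON string in its `Message` field. The function name and the sample payload are mine, and a real endpoint would also have to confirm the subscription and verify the message signature, which this sketch skips.

```python
import json


def parse_alarm_notification(body):
    """Extract the state change from an SNS-delivered alarm notification.

    Returns None for anything that is not a plain notification
    (for example a subscription confirmation).
    """
    envelope = json.loads(body)
    if envelope.get("Type") != "Notification":
        return None
    message = json.loads(envelope["Message"])  # alarm details, JSON-in-JSON
    return {
        "alarm": message["AlarmName"],
        "old_state": message["OldStateValue"],
        "new_state": message["NewStateValue"],
        "reason": message.get("NewStateReason", ""),
    }


# A made-up payload, trimmed to the fields the sketch reads.
sample_body = json.dumps({
    "Type": "Notification",
    "TopicArn": "arn:aws:sns:us-east-1:123456789012:database-machines",
    "Message": json.dumps({
        "AlarmName": "high-cpu-i-12345678",
        "OldStateValue": "OK",
        "NewStateValue": "ALARM",
        "NewStateReason": "Threshold crossed",
    }),
})

event = parse_alarm_notification(sample_body)
print(event["alarm"], event["old_state"], "->", event["new_state"])
# high-cpu-i-12345678 OK -> ALARM
```

From there, reacting to the alert is up to you: page someone, restart a service, or launch a replacement instance.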

All that is pretty cool… what else can CloudWatch do?

I'll write about custom metrics and auto scaling in upcoming posts. Stay tuned!

 

 
