newbie admin pitfalls of using Amazon AWS

August 27, 2012

Collected from my own experiences and those of a largely honest group of admins at an AWS user group meeting (yes, it’s like AA: we all say our name and begin our story). This post will not make sense to those who have never actually used Amazon AWS. If you have had a dab at it, or your boss asked you to go figure out AWS and “set us up a server or you have no chance to survive”, then this might help you.

Drink Driving

1. Treating Amazon AWS resources as a regular Data Center or Dedicated Host provider.

AWS is just that, a web service. All servers are virtual, disks are theoretical, and sysadmins are asleep. Do not put all your beers in one basket. Also, there is no basket, only snapshots and AMIs. The idea that servers are not unique and do not exist as physical hardware is hard to grasp. This leads to elaborate single-server setups backed completely by EBS. Both EBS and instances (not servers, as they are not) can and will fail - Yoda (It was either him or me.)

Best practice, both in and out of AWS, is to assume that after you have done your perfect CentOS6 LAMP setup the whole thing will disappear. Write Bash scripts, WMI, images, whatever, to be able to recreate everything on new hardware from scratch.
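
To make the "recreate everything from scratch" idea concrete, here is a minimal sketch that renders a bootstrap script you could hand to a fresh instance. The helper name `render_bootstrap` and the package list are my own illustrative assumptions, not a blessed recipe; adjust them to whatever your LAMP box actually needs.

```python
# Hypothetical helper: renders a bootstrap script for a fresh CentOS 6 box.
# The package and service names below are illustrative assumptions.

LAMP_PACKAGES = ["httpd", "mysql-server", "php", "php-mysql"]
LAMP_SERVICES = ["httpd", "mysqld"]


def render_bootstrap(packages, services):
    """Return a shell script that rebuilds the box from nothing."""
    lines = [
        "#!/bin/bash",
        "set -e  # stop at the first failed step",
        "yum install -y " + " ".join(packages),
    ]
    for service in services:
        # CentOS 6 uses SysV init: enable at boot, then start right away.
        lines.append("chkconfig %s on" % service)
        lines.append("service %s start" % service)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_bootstrap(LAMP_PACKAGES, LAMP_SERVICES))
```

Keep the script in source control next to your code. If the instance vanishes, the recipe does not.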

Treating AWS resources as if you were in a data center with a bunch of noisy servers and your expensive SAN disks is just an accident waiting to happen. Psychological help is advised.

2. “I know, I will use NFS”

That’s just the alcohol talking. When faced with a multi-tenanted architecture, most sysadmins and developers realize they need to share “these” files between “those” autoscale servers. The noobie sysadmin pronto hooks up another “server” with a large “harddisk” and NFS shares the hell out of it: “voila, i made bomb”. The problem here is that you cannot scale EBS (which is network attached storage at best), and neither can you scale your NFS server. The entire setup is henceforth considered a fail and a waste of startup funding. The solution is to code against S3, databases and memcache. For files that do not change often, like source code, there is source control (github? codesion? take your pick), which can just as easily be checked out on every system at specific events (push/pull). But please do not commit your user-uploaded files to source control as well.
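
A rough sketch of what "code to use S3" might look like for user uploads. I am assuming a client object that behaves like `boto3.client("s3")` and its `put_object` call; the function names and the key layout are made up for illustration.

```python
import hashlib


def upload_key(user_id, filename, data):
    """Build an S3 key from the content hash, so every instance derives the same key."""
    digest = hashlib.sha1(data).hexdigest()
    return "uploads/%s/%s-%s" % (user_id, digest[:12], filename)


def save_upload(s3_client, bucket, user_id, filename, data):
    """Store the upload in S3 instead of on a shared disk and return its key.

    s3_client is assumed to behave like boto3.client("s3").
    """
    key = upload_key(user_id, filename, data)
    s3_client.put_object(Bucket=bucket, Key=key, Body=data)
    return key
```

Store the returned key in your database and any autoscaled instance can fetch the file later, with no NFS box to babysit.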

3. We have now added ELB, bring it on “spiky traffic”

Rum is widely assumed to bring on unrealistic and sometimes fatal acts of bravery. ELB is a slow autoscale system and takes time to “warm up” and serve requests. If you have mountain-shaped traffic spikes, it’s best to email Amazon support with your peak traffic data and ask them to “pre warm” your ELB. They know what to do. It is also advisable to check on Amazon SES rates, DynamoDB table limits, RDS and so on, which are not autoscale resources. Setting up SNS alerts for thresholds is a must, and they should fire a bit before the “you shall not pass” messages start hitting users. I use 70% as a good number for DynamoDB rates, RDS CPU usage, connection usage etc.
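
The 70% rule can be sketched like this. The parameter names in the dictionary follow CloudWatch's `put_metric_alarm` call as I understand it; the function names, the alarm naming scheme and the 5 minute period are my own assumptions.

```python
def should_alert(current, limit, percent=70):
    """True once usage reaches the given percentage of the hard limit."""
    return current * 100 >= limit * percent


def dynamodb_read_alarm(table, provisioned_rcu, topic_arn, period=300, percent=70):
    """Build keyword arguments shaped for CloudWatch put_metric_alarm.

    ConsumedReadCapacityUnits is summed over the period, so the threshold
    is provisioned units per second * seconds in the period * percent.
    """
    return {
        "AlarmName": "%s-read-capacity-%d-percent" % (table, percent),
        "Namespace": "AWS/DynamoDB",
        "MetricName": "ConsumedReadCapacityUnits",
        "Dimensions": [{"Name": "TableName", "Value": table}],
        "Statistic": "Sum",
        "Period": period,
        "EvaluationPeriods": 1,
        "Threshold": provisioned_rcu * period * percent / 100,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "AlarmActions": [topic_arn],  # the SNS topic that pages you
    }
```

With something like boto3 you would pass the result along as `cloudwatch.put_metric_alarm(**kwargs)`; the same arithmetic applies to RDS CPU or connection counts with a different namespace and metric.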

4. Haha, 2 instances before product launch and 100+ after launch. I’ll just sit here and refresh my dashboard…

The cops will get to you first, period. Amazon AWS has resource limits. It is unlikely you can allocate an unlimited number of resources. It is best to email support and find out. EC2 instances are limited to 20 per account by default. Therefore, if you need more while getting flooded by traffic, you are out of luck, as support can take a day at best to lift your limits, justifiable reasons aside. If you do find yourself having exceeded the instance limit and needing more right now, Spot Instances can come to your rescue. They do not count towards your account quota limits, and so far we have not seen any limits on them. They will buy you the time.
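
The arithmetic of that emergency, as a small sketch. The default of 20 comes from the limit mentioned above; the function name is hypothetical, and the idea that spot requests are unlimited is our observation, not a guarantee.

```python
def plan_launch(needed, running, on_demand_limit=20):
    """Split a launch request into (on_demand, spot) instance counts.

    Whatever does not fit under the on-demand limit is pushed to spot,
    assuming spot instances do not count towards that limit.
    """
    headroom = max(on_demand_limit - running, 0)
    on_demand = min(needed, headroom)
    spot = needed - on_demand
    return on_demand, spot


if __name__ == "__main__":
    # 2 instances before launch, 100 more wanted after launch.
    print(plan_launch(needed=100, running=2))
```

If the spot number looks scary, that is your cue to email support about limits before launch day, not during it.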

5. I will use ELB as internal proxy, now have a neat gun of a MySQL load balancer as well.

Why would you snail mail your AA meeting flyers when you have everyone’s email address and facebook groups? We need to talk about this problem of yours. ELBs are essentially gateways and can be compared to your home modem. They are not routers and cannot determine whether you are talking to a PC on the home network or trying to SSH tunnel to your office. Therefore you are adding lag to your traffic with an additional round trip. It’s like sharing files using DropBox between two home computers on the same LAN. Notice that the ELB does not seem to have an internal domain name, unlike instances. VPC ELBs operate differently, and we are not going to discuss that with someone like you till you recover from this hangover.

6. Let’s use a single zone for all instances for autoscaling and save on the stupid inter-AZ data transfer cost. I should get a bonus for this.

Getting free drinks is no excuse. You are obviously not making progress here.

ELB will look at your instance zones and set up one ELB endpoint per zone (you only see one ELB for simplicity). If you have only one zone, you, my friend, are going to have a problem: you also have only a one-zone ELB. It is quite common to see a single zone, or sometimes more than one, experience issues like network latency, faults etc. I personally was once baffled when 2 of the zones had no instances to spare and everyone was spot bidding above the on-demand price just to stay in the same zone. I should get some of the free drinks those guys got, but I am trying to be sober here while I attend this night beach party.
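
A small sketch of why one zone is a bad bet. It spreads a fleet evenly over the zones you give it and then asks how many instances survive if the busiest zone goes dark. The function names are hypothetical and the zone names are only examples.

```python
def spread_across_zones(instance_count, zones):
    """Distribute instances as evenly as possible over the given zones."""
    if not zones:
        raise ValueError("need at least one availability zone")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones take one additional instance each.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def worst_case_survivors(plan):
    """Instances left standing if the most loaded zone fails."""
    return sum(plan.values()) - max(plan.values())


if __name__ == "__main__":
    one_zone = spread_across_zones(6, ["us-east-1a"])
    three_zones = spread_across_zones(6, ["us-east-1a", "us-east-1b", "us-east-1c"])
    print(worst_case_survivors(one_zone), worst_case_survivors(three_zones))
```

With a single zone the worst case leaves you with nothing, which is a steep price for saving on inter-AZ transfer.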

7. I save cost of EBS IO by starting an instance store and then dd copying from EBS attached volumes to instance store. Why do I have x and y problems?

You are obviously in the wrong meeting. You need to get in touch with the nearest hospital or something. Why, just why?

8. We use resources from more than one REGION to serve a single request.

Good Luck on your new adventures. You can add gambling to your list of problems.

More..coming too soon.