EC2 isn't 50% slower
I don’t want to start a nerdfight here, but it might be inevitable.
Valleywag ran a story today about how Amazon’s EC2 instances are running at 50% of their stated speed/capacity. They based the story on a blog post by Ted Dziuba, of Persai and Uncov fame, whose writing I really love.
Problem is, this time, he’s just wrong. Completely full of FAIL.
I’ll get to that in a minute, but first, let me explain what I think is happening: Amazon’s done a poor job at setting user expectations around how much compute power an instance has. And, to be fair, this really isn’t their fault – both AMD and Intel have been having a hard time conveying that very concept for a few years now.
All of the other metrics – RAM, storage, etc – have very fixed numbers. A GB of RAM is a GB of RAM. Ditto storage. And a megabit of bandwidth is a megabit of bandwidth. But what on earth is a GHz? And how do you compare a 2006 Xeon GHz to a 2007 Opteron GHz? In reality, for mere mortals, you can’t. Which sucks for you, me, and Amazon – not to mention AMD and Intel.
Luckily, there’s an answer – EC2 is so cheap, you can spin up an instance for an hour or two and run some benchmarks. Compare them yourself to your own hardware, and see where they match up. This is exactly what I did, and why I was so surprised to see Ted’s post. It sounded like he didn’t have any empirical data.
Admittedly, we’re pretty insane when it comes to testing hardware out. Rather than trust the power ratings given by the manufacturers, for example, we get our clamp meters out and measure the machines’ power draw under full load. You’d be surprised how much variance there is.
There was one data point in a thread linked from Ted’s post that had me scratching my head, though, and I began to wonder if the Small EC2 instances actually had some sort of problem. (We only use the XLarge instance sizes) This guy had written a simple Ruby script and was seeing a 2X performance difference between his local Intel Core 2 Duo machine and the Small EC2 instance online. Can you spot the problem? I missed it, so I headed over to IRC to find Ted and we proceeded to benchmark a bunch of machines we had around, including all three EC2 instance sizes.
Bottom line? EC2 is right on the money. Ted’s 2.0GHz Pentium 4 performed the benchmark almost exactly as fast as the Small (aka 1.7GHz old Xeon) instance. My 866MHz Pentium 3 was significantly slower, and my modern Opteron was significantly faster.
So what about that guy with the Ruby benchmark? Can you see what I missed, now? See, he’s using a Core 2 Duo. The Core line of processors has completely revolutionized Intel’s performance envelope, and thus, the Core processors preform much better for each clock cycle than the older Pentium line of CPUs. This is akin to AMD, which long ago gave up the GHz race, instead choosing to focus on raw performance (or, more accurately, performance per watt).
Whew. So, what have we learned?
- All GHz aren’t created equal.
- CPU architecture & generation matter, too, not just GHz
- AMD GHz have, for years, been more effective than Intel GHz. Recently, Intel GHz have gotten more effective than older Intel GHz.
- Comparing old pre-Core Intel numbers with new Intel Core numbers is useless.
- “top” can be confusing at best, and outright lie at worst, in virtualized instances. Either don’t look at it, or realize the “steal %” column is other VMs on your same hardware doing their thing – not idle CPU you should be able to use
- Benchmark your own apps yourself to see exactly what the price per compute unit is. Don’t rely on GHz numbers.
- Don’t believe everything you read online (threads, blogs, etc) – including here! People lie and do stupid things (I’m dumb more often than I’m not, for example). Data is king – get your own.
Hope that clears that up. And if I’m dumb, I fully expect you to tell me so in the comments – but you’d better have the data to back it up!
(And yes, I’m still prepping a monster EC2 post about how we’re using it. Sorry I suck!)