
EC2 isn't 50% slower

February 27, 2008

I don’t want to start a nerdfight here, but it might be inevitable. :)

Valleywag ran a story today about how Amazon’s EC2 instances are running at 50% of their stated speed/capacity. They based the story on a blog post by Ted Dziuba, of Persai and Uncov fame, whose writing I really love.

Problem is, this time, he’s just wrong. Completely full of FAIL.

I’ll get to that in a minute, but first, let me explain what I think is happening: Amazon’s done a poor job of setting user expectations around how much compute power an instance has. And, to be fair, this really isn’t their fault – both AMD and Intel have been having a hard time conveying that very concept for a few years now.

All of the other metrics – RAM, storage, etc – have very fixed numbers. A GB of RAM is a GB of RAM. Ditto storage. And a megabit of bandwidth is a megabit of bandwidth. But what on earth is a GHz? And how do you compare a 2006 Xeon GHz to a 2007 Opteron GHz? In reality, for mere mortals, you can’t. Which sucks for you, me, and Amazon – not to mention AMD and Intel.

Luckily, there’s an answer – EC2 is so cheap, you can spin up an instance for an hour or two and run some benchmarks. Compare them yourself to your own hardware, and see where they match up. This is exactly what I did, and why I was so surprised to see Ted’s post. It sounded like he didn’t have any empirical data.
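A minimal sketch of what such a do-it-yourself comparison could look like, assuming Python is available on both your own machine and the instance. The `cpu_workload` function is a hypothetical stand-in for your real application, not the Ruby script discussed below, and the helper names are made up for illustration:

```python
import time

def cpu_workload(n=200_000):
    """Hypothetical CPU-bound stand-in: sum of truncated square roots."""
    total = 0
    for i in range(n):
        total += int(i ** 0.5)
    return total

def best_of(func, repeats=5):
    """Run func several times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    # The minimum is the run least disturbed by other activity on the box.
    return min(timings)

def relative_speed(local_seconds, remote_seconds):
    """How many times faster the local machine finished than the remote one."""
    return remote_seconds / local_seconds

if __name__ == "__main__":
    print(f"best of 5 runs: {best_of(cpu_workload):.4f} s")
```

Run the same script on each machine and feed the two timings to `relative_speed`. Swapping `cpu_workload` for a slice of your actual job matters more than anything else in the sketch.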

Admittedly, we’re pretty insane when it comes to testing hardware out. Rather than trust the power ratings given by the manufacturers, for example, we get our clamp meters out and measure the machines’ power draw under full load. You’d be surprised how much variance there is.

There was one data point in a thread linked from Ted’s post that had me scratching my head, though, and I began to wonder if the Small EC2 instances actually had some sort of problem. (We only use the XLarge instance sizes.) This guy had written a simple Ruby script and was seeing a 2X performance difference between his local Intel Core 2 Duo machine and the Small EC2 instance online. Can you spot the problem? I missed it, so I headed over to IRC to find Ted and we proceeded to benchmark a bunch of machines we had around, including all three EC2 instance sizes.

Bottom line? EC2 is right on the money. Ted’s 2.0GHz Pentium 4 performed the benchmark almost exactly as fast as the Small (aka 1.7GHz old Xeon) instance. My 866MHz Pentium 3 was significantly slower, and my modern Opteron was significantly faster.

So what about that guy with the Ruby benchmark? Can you see what I missed, now? See, he’s using a Core 2 Duo. The Core line of processors has completely revolutionized Intel’s performance envelope, and thus, the Core processors perform much better for each clock cycle than the older Pentium line of CPUs. This is akin to AMD, which long ago gave up the GHz race, instead choosing to focus on raw performance (or, more accurately, performance per watt).

Whew. So, what have we learned?

  • All GHz aren’t created equal.
  • CPU architecture & generation matter, too, not just GHz.
  • AMD GHz have, for years, been more effective than Intel GHz. Recently, Intel GHz have gotten more effective than older Intel GHz.
  • Comparing old pre-Core Intel numbers with new Intel Core numbers is useless.
  • “top” can be confusing at best, and outright lie at worst, in virtualized instances. Either don’t look at it, or realize the “steal %” column (labeled “st” in top’s CPU summary line) is other VMs on your same hardware doing their thing – not idle CPU you should be able to use.
  • Benchmark your own apps yourself to see exactly what the price per compute unit is. Don’t rely on GHz numbers.
  • Don’t believe everything you read online (threads, blogs, etc) – including here! People lie and do stupid things (I’m dumb more often than I’m not, for example). Data is king – get your own.
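For the “steal %” point in the list above, here is a rough sketch of how that figure can be derived by hand on a Linux guest. It assumes the aggregate `cpu` line of `/proc/stat` lists tick counters in the order user, nice, system, idle, iowait, irq, softirq, steal, as on reasonably recent kernels. The function names are made up for illustration:

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in fields[1:]]

def steal_percent(before, after):
    """Share of CPU ticks between two samples that the hypervisor took away."""
    deltas = [b - a for a, b in zip(before, after)]
    if len(deltas) < 8:
        return 0.0  # assumption: older kernels without a steal field report none
    # Assumed field order: user nice system idle iowait irq softirq steal.
    # Guest fields (if present) are skipped; they are already counted in user/nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[7] / total

def read_cpu_ticks(path="/proc/stat"):
    """Read the current aggregate counters (Linux only)."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())
```

Calling `read_cpu_ticks()` twice a second or so apart and passing both results to `steal_percent` gives a number comparable to what top reports. A high value means your neighbors are busy, not that you have spare CPU.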

Hope that clears that up. And if I’m dumb, I fully expect you to tell me so in the comments – but you’d better have the data to back it up!

(And yes, I’m still prepping a monster EC2 post about how we’re using it. Sorry I suck!)

  1. rick
    February 27, 2008 at 8:49 pm

    yeah, but the real question is, why don’t you ever hit #uncov anymore? ;)

  2. February 28, 2008 at 12:05 am

    Nice post, Don! Thanks for your real-world perspective.

  3. Wolfgang
    February 28, 2008 at 1:15 am

    “A GB of RAM is a GB of RAM. Ditto storage. And a megabit of bandwidth is a megabit of bandwidth.”

    That must be why “500 GB” hard drives can store at most 465.7 GB — and that includes all the overheads from file systems and necessary reserve spaces to handle fragmentation gracefully.

    And why many 1 MBit ADSL connections carry at most “1000^2 bit/s” which includes protocol overhead (often enough including things like Ethernet-over-PPP).

    And try to get 4.7 GB on a single layer DVD, instead of “salesman GBs” (1000^3 Bytes).

    RAM chips deliver 1024 bytes per KB — but they used to deliver more.
    Remember parity bits and faked parity bits? These parity bits could do a terrific job today for ECC. Hard drives have ECC, processor caches have ECC, DSL has ECC, CDs have ECC, DVDs have ECC … RAM has not.

    “Comparing old pre-Core Intel numbers with new Intel Core numbers is useless.”

    That was, more or less, always so.
    Compare an 8086 vs 80286 vs 80386.
    Look at “clock cycles” for your assembler instruction, and see how CPUs became more and more economical. Not to speak of gaining more expressive vocabulary (both with wider registers, so one could do 16 and 32-bit ops and with new commands).
    ANDing 2 registers: 3, 2 and 1 clock step, respectively. Just a factor of 2 or 3 …

    Some operations got slower, e.g. rotating bits through carry (a circular shift of carry+register): 2,2,9 for bit in a register …

  4. February 28, 2008 at 7:32 am

    I’ll give you a different (real world) perspective.

    First, I think EC2 and the other Amazon services are awesome. But they are certainly not a match for everybody and each application. Since your post is about gigahertzes and performance I will also talk about it from that perspective.

    I can tell you the following about performance. We’re running highly optimized (C++) code for image processing. It’s been profiled to death, tuned for Intel and AMD and given love with the right compiler options. Apart from algorithm improvements, it basically runs as fast as possible at the moment.

    We came to the following conclusions:

    A ‘small’ EC2 instance is about *eight* times slower than an $800 USD bare bones rackmountable Intel Core 2 Quad 2.4 GHz / 2GB memory / 80GB disk server. Yes, you can get them that cheap.

    Keeping a small instance up and running on EC2 24/7/365 will cost you about $800. But that is for 1/8th of the performance. Do the math.

    The newer fast EC2 instances are absolutely faster than the small instances. Our benchmarks show that they are basically as fast as the Core 2 Quad 2.4 GHz. But then they are also 4 or 8 times more expensive per month. If you are doing it for just the CPU power then the prices become insane.

    So for us the math was simple: If you need a lot of these then getting some rackspace of your own and picking cheap hardware beats EC2 in a big way.

    You will probably say that there is a large overhead of managing such an infrastructure. But really, if you design it properly and assume things will break (failure is an option) then you should be fine and have minimal maintenance costs. So far it works extremely well for us.

    We are probably a corner case use-case but I think it is still a nice perspective on things.

    S.

  5. February 28, 2008 at 8:43 am

    @Stefan Arentz:

    Doesn’t sound like you have a performance problem, it sounds more like you have a $ / CPU complaint. (Which in your case may be totally valid, and IMHO, more important than the actual performance. But I was only talking about raw performance, not the cost of that performance).

    I don’t think there’s anyone who’d question that doing a fixed EC2 cluster is more expensive than building your own fixed cluster. The only two things that sway the situation in EC2’s favor are:

    – free transfer into/out of EC2, S3, and the other services
    – elasticity. You can use 1000 CPUs for 1 hour if you’d like, and never use EC2 again.

    Those two are easily the reason we use EC2, since we’re clearly capable of building, scaling, and managing our own infrastructure.

    I do think that 8X slower than a Core 2 @ 2.4GHz sounds high. I would have expected more like 3-4X. What’s your use case? Do you have a benchmark I can try myself? All I care about is real, measured, empirical data.

  6. February 28, 2008 at 9:03 am

    I made a mistake: the same code actually ran 4 times slower on a small EC2 instance. This was six months ago. Our code is now much faster but if the EC2 hardware is the same then you can still expect the same 4x ratio.

    I can’t disclose the benchmarks or numbers but the application is pattern recognition.

    S.

  7. February 28, 2008 at 9:18 am

    When I say four times slower I mean 1 EC2 small *core* vs 1 Core2Duo *core* btw. So the Core 2 Quad is in effect 16 times faster when fully utilized. I’ll try to produce some real numbers later at the airport to kill time.

  8. February 28, 2008 at 9:21 am

    @Stefan:

    Yeah, so 4X I totally buy. That sounds right.

    If, even given the free S3/EC2 transfers and the elasticity, that doesn’t work for you – don’t use it.

    There are tons of things inside of SmugMug we don’t put on EC2 because it just doesn’t make sense. But there are plenty that we do. :)

  9. February 28, 2008 at 9:36 am

    @Don

    Take a (smug)mug of coffee and go ahead writing about your EC2 experience. We are all waiting for your monster post :)

  10. February 28, 2008 at 9:38 am

    Don,
    What did you use for benchmarks? Can you maybe just give a quick word on what software/process you used?

    Thanks,
    – Chris

  11. February 28, 2008 at 9:49 am

    @Chris Munns:

    For our internal real-world benchmarks, the process is simple:

    – grab a variety of photos & videos
    – time how long they take to render

    Since that’s all our EC2 instances do, it’s the only metric we care about.

    As for this particular test case, though, we used the simple Ruby benchmark linked in the thread above.

  12. February 28, 2008 at 10:45 am

    @Don

    “Doesn’t sound like you have a performance problem, it sounds more like you have
    a $ / CPU complaint. (Which in your case may be totally valid, and IMHO, more
    important than the actual performance. But I was only talking about raw
    performance, not the cost of that performance).”

    This is isomorphic to a performance problem. Most startup companies that use EC2
    don’t have an infinite budget. So $ for $ you’re running at 1/Nth the speed on EC2.

    This is assuming that your application is slower on EC2.

    We don’t/can’t use EC2 because we’re too close to the hardware. SSD+RAID tuning,
    etc. prevents us from thinking about virtualizing our hardware. That and the
    price is about 4x more for EC2 when you factor in bandwidth.

    ServerBeach does a great job for us…. basically almost identical to EC2 in
    that you can bring new hardware up in about 1-2 hours. You can’t shut it down
    easily like EC2 though..

  13. February 28, 2008 at 4:15 pm

    @Kevin:

    In some respects, $ for $ for raw performance, using your own hardware could still be more effective, but as Don mentioned, the ability to spawn 1000 instances for an hour is where EC2 wins. In a more real world scenario where your $ for $ performance mattered, you could build your infrastructure around your own hardware with the capacity to leverage EC2 under high loads. Building your own hardware to grow by 250% (if each instance is 1/4th a normal one) can be very costly, but with EC2, there are no upfront costs.

    Overall, it is probably better to put stuff where performance matters on your own hardware, and put stuff where scalability/elasticity matters on EC2, or to allow overflow for performance stuff on EC2.

  14. damien morton
    March 1, 2008 at 4:40 pm

    Seems to me the only sensible and cost-effective way to use EC2 is to have your own hardware to handle the baseload, while using EC2 to handle any spikes in usage.

  15. April 1, 2008 at 9:00 pm

    (And yes, I’m still prepping a monster EC2 post about how we’re using it. Sorry I suck!)

    So where’s this fabled EC2 post? ;-)

  16. April 2, 2008 at 10:23 am

    What about the GREEN factor? The cost of power? I pay 26 US cents for each kWh here in CA. So a computer running 24/7/365 can cost more than its weight in $ in power alone by the end of the year.

    AC cost?
    Power cost?
    Floor space?
    My Time doing the installs?
    Toxic waste when the new version comes out in 6 mo….

    When you factor all of the costs into a computer the hardware cost is just a small part for small farms. As the Farm grows it tips the scale the other way a bit. But I would still say that run time cost is the big number, not the cost of the hardware (think lease over time of the Hardware cost at 10% cost of cash to make the numbers easy).

    Your $800.00 system costs: ($800.00 * .1) / 12 = $6.67 cost of HW/mo.
    300 watts: 0.3 kW * $0.26/kWh * 24 h * 30 days = $56.16 cost of system power alone/mo.
    AC cost? who knows?
    So at $73.00/mo for EC2, we CA users are bucks ahead!!!!!
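The arithmetic in this comment can be checked with a short sketch. Every input is taken from the comment itself ($800 hardware at a 10% annual cost of cash, a 300 W draw, $0.26 per kWh, a 30-day month, and $73.00/mo for EC2). The function names are made up for illustration, and cooling, floor space, and labor are deliberately left out because the comment gives no numbers for them:

```python
def monthly_hardware_cost(price_usd, annual_rate):
    """Monthly cost of capital tied up in the hardware."""
    return price_usd * annual_rate / 12

def monthly_power_cost(watts, usd_per_kwh, hours_per_day=24, days=30):
    """Electricity cost for a machine drawing a constant load."""
    return watts / 1000 * usd_per_kwh * hours_per_day * days

hardware = monthly_hardware_cost(800.00, 0.10)   # about 6.67
power = monthly_power_cost(300, 0.26)            # about 56.16
own_total = hardware + power                     # about 62.83
ec2_monthly = 73.00                              # figure quoted in the comment

# What cooling, space, and labor would have to add up to per month
# before EC2 comes out cheaper under these assumptions.
headroom = ec2_monthly - own_total               # about 10.17

print(f"own hardware + power: ${own_total:.2f}/mo, headroom: ${headroom:.2f}/mo")
```

With only hardware and power counted, the self-hosted box comes to roughly $62.83 a month, so the conclusion depends on the unquantified costs (AC, floor space, install time) exceeding about $10 a month.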

    BTW: CA power in my area is the all time high for the US I think!!!

  17. April 2, 2008 at 12:03 pm

    @erik

    To be honest, I’m waiting for a specific EC2 announcement before posting it, since it changes what I’ve written (but not posted).

    As soon as they announce it (and no, sorry, I can’t talk about what it is), I’ll finish up my entry.

    Sorry for the delay!

  18. April 2, 2008 at 1:34 pm

    @Don

    Cool – no problem. I’m just a wee bit curious :-)

  19. chris
    April 7, 2008 at 7:08 pm

    Not to be overlooked is the quarter gig network drop on each instance. It may not have an SLA or be guaranteed (there was a network outage today), but anecdotally I have found the Internet connection alone worth $70/mo.

  20. Hightechrider
    May 19, 2009 at 7:20 am

    My own benchmarking comparing a dedicated GoDaddy Core 2 Duo server against an Amazon EC2: High-CPU Medium Instance shows that Amazon EC2 is about 30% slower and costs the same per month. This is on a fairly complex image rendering task.

  21. November 6, 2009 at 3:59 pm

    Love it! You got me so excited to get one and start shooting video!
