HDD IOPS limiting factor – seek or rpm?


October 8, 2007 Don MacAskill

Any storage experts out there? Can you forward this to any you may know?

An interesting thread developed in the comments on my post about Dell’s MD3000 storage array regarding theoretical maximum random IOPS to a single HDD. I’m hoping by bringing it up to the blog level, we can get some smart people who know what they’re talking about (ie, not me) to weigh in.

I’ve always believed that for a small random write workload, the revolutions per minute (rpm) of the drive was the biggest limiting factor. I think I’ve believed this for a few reasons:

It seems logical that the biggest “time waster” in seek time is probably rpm anyway. Even if the drive arm has found the right position on the platter, it likely has to wait some amount of time, up to a full revolution, before it can write.
rpm is a “fixed” number, and thus easier to calculate with than seek, which is more variable. So taking the easy way out, one of my favorite hobbies, seemed appropriate.

Using this theory, a 7200rpm drive can do a theoretical maximum of 120 IOPS, and a 15K drive can do 250. Note that these are fully-flushed non-cached writes to the spinning metal, with no buffering or write combining. Over the years, my own tests seem to have validated this theory, and so I’ve just always believed it to be gospel.
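The arithmetic behind those two figures can be sketched in a few lines of Python. This is only an illustration of the full-revolution assumption described above; the function name `iops_full_revolution` is made up for the example.

```python
def iops_full_revolution(rpm: float) -> float:
    """IOPS if every random write costs one full platter revolution.

    Assumption for illustration only: seek time is ignored, and each
    I/O waits exactly one revolution before it can be written.
    """
    revolutions_per_second = rpm / 60.0
    return revolutions_per_second  # one I/O completes per revolution


for rpm in (7200, 15000):
    print(f"{rpm} rpm -> {iops_full_revolution(rpm):.0f} IOPS")
```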

Tao Shen, though, commented that my assumption is wrong, and that seek time is the limiting factor that matters, not rpm, and that faster drives can deliver more IOPS than my rpm math. He posits that a 15K drive with a 2ms seek time can do 500 IOPS. Now, he may have access to better drives than I do, since I think our fastest are 3.5ms (best case scenario), not 2ms. That’s what the latest-and-greatest Seagate Cheetah 15K.6 drives seem to do, too.

So which is it? Am I totally smoking crack? Is he? Or is the truth that seek time and rpm are so intimately tied together that separating them is impossible?

How does one calculate theoretical maximum IOPS?

Categories: datacenter

Comments (10)

Jeff Bonwick

October 8, 2007 at 7:00 pm

You’re right that rotation dominates, but you’re off by a factor of 2 in the calculation. On average, you have to wait half a revolution, not a whole revolution, for a random I/O. Thus a 7200 RPM drive should be able to deliver 240 IOPS, and a 15K RPM drive should deliver 500. However, this is a little optimistic because when you’re waiting for a 270 degree rotation, seek time doesn’t matter, but when you’re waiting for a 10 degree rotation it does — in fact, if you can’t seek fast enough, you’ll end up waiting for 370 degrees in that case. And the seek distance matters too, because track-to-track seeks are very fast, while larger seeks take longer (because you have to decelerate the head, find the sync marks, and wait for the arm to stop ringing).
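The interplay between seek and rotation described here can be sketched with a toy model. The Python below is an illustration under stated assumptions, not a drive simulator: the target sector is equally likely to be at any angle, every seek takes the same fixed time, and the platter keeps turning while the arm moves. The name `mean_service_ms` and the sample seek times are invented for the example.

```python
import math


def mean_service_ms(rpm: float, seek_ms: float, steps: int = 3600) -> float:
    """Average time per random I/O, in ms, under the toy model above."""
    rev_ms = 60_000.0 / rpm
    total = 0.0
    for i in range(steps):
        # Time until the target sector next passes under the head.
        wait_ms = (i + 0.5) / steps * rev_ms
        if seek_ms > wait_ms:
            # The arm arrives too late, so the I/O waits extra whole revolutions.
            wait_ms += math.ceil((seek_ms - wait_ms) / rev_ms) * rev_ms
        total += wait_ms
    return total / steps


for seek_ms in (0.0, 1.0, 3.3):
    ms = mean_service_ms(15000, seek_ms)
    print(f"seek {seek_ms} ms -> {ms:.2f} ms per I/O, {1000 / ms:.0f} IOPS")
```

With a zero seek time the model gives the 500 IOPS ceiling for a 15K drive. Any nonzero seek makes some of the short-rotation I/Os miss their sector and pay a full extra revolution, which pulls the average down.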

With a strong enough magnet and current, you can make seeks as fast as you need them. The limiting factor is rotation because the heads float on an air bearing, and the outer edge of a 3.5″ 15k RPM platter moves at 155 MPH — a category 5 hurricane, or Mach 0.21. There are all sorts of problems with the airflow if you spin much faster. (BTW, that’s why 2.5″ drives can spin faster — airspeed is linear in diameter.) So you first decide the rotation speed, then select (typically) the cheapest armature that can deliver comparable seek times. You can get incremental improvement with faster seeks (fewer low-angle-of-rotation I/Os will blow a rotation), but it’s a case of rapidly diminishing returns with rapidly increasing hardware costs and power consumption.
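As a rough check on the rim-speed figure, here is a small Python sketch. It assumes the platter diameter equals the nominal 3.5″ form factor, as the comment does (real platters may differ), and it assumes a speed of sound of about 761 mph, which is my own round number for sea level. The function name is made up for illustration.

```python
import math

INCHES_PER_MILE = 63_360
SPEED_OF_SOUND_MPH = 761.0  # assumed value, roughly sea level at 15 °C


def rim_speed_mph(diameter_in: float, rpm: float) -> float:
    """Linear speed of the platter's outer edge in miles per hour."""
    circumference_in = math.pi * diameter_in
    inches_per_hour = circumference_in * rpm * 60
    return inches_per_hour / INCHES_PER_MILE


mph = rim_speed_mph(3.5, 15000)
print(f"{mph:.0f} mph, Mach {mph / SPEED_OF_SOUND_MPH:.2f}")
```

This comes out at about 156 mph, in line with the figure quoted above. Because the speed is linear in diameter, a 2.5″ platter at the same rpm has an edge speed of 2.5/3.5 of that.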
Jason Watkins

October 9, 2007 at 12:41 am

http://citeseer.ist.psu.edu/738362.html

Max iops is going to depend on your particular pattern of access. Theoretical max would be if you’re so very lucky that the blocks can be accessed in exactly the order they happen to fly under the head. So, that would be the product of revolutions and density.

For real workloads, both seek and spin matter. From a given block, there’s a set of blocks that are accessible within some given constant time. This is because you can move the seek head while you wait for rotation. So if you could figure out the locality of your access patterns, you could predict IOPS.

1 / average seek time would seem a reasonable practical guess.
Tao Shen

October 9, 2007 at 1:38 am

@Don:

This is interesting:
http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=2010150014+1035507779&name=15%2c000+RPM

According to newegg, that’s the list of 15K drives they have:

And some spec they publish:

Average Latency: 2ms
Average Seek Time: 3.3ms
Average Write Time: 3.8ms

Supposedly, a 15K rpm drive does 250 rps; the inverse of that is 4ms per revolution. Jeff is right that, on average, you wait for half a revolution (180 degrees), or 2ms. In newegg’s spec list, that figure is what is called “Average Latency”. All 15K drives give 2ms for average latency, since it is a simple stat that follows from the RPM.

Now the seek time of 3.3ms is a function of the armature, and I also suppose it is the worst-case guaranteed speed. Correct me if I am wrong: in the best case, the head is already at the exact position and is basically waiting for the platter to spin to the location. There is also the “multiple platter” factor. Most 15K drives have at least a two-platter design (?), so the system-on-chip controller on the drive optimizes the seek across the platters for random IO.

As to the MD3000’s cached writes…yes they bundle the writes and reads together, but what they are really doing is operating the drives at a higher IO queue depth. If you look at the storagereview charts at different IO queue depth levels (1, 2, 4, and so on up to 128), you will see that at higher IO queue depth, some drives deliver higher total IOPS performance. It is a function of how well the firmware of the drive’s system-on-chip controller is optimized across multiple platters.

So when I said that 15K drives have a “maximum theoretical IOPS limit” of 500 due to 2ms seek, I used the wrong word and I apologize for it. I should have said rotational latency. Seek performance can be helped a little by firmware optimization and IO queue depth, but because of the rotational latency, 15K drives can never do more than 500 IOPS on average, provided that all IOs are random.

Storagereview said that at 128 IO queue depth, most 15K SCSIs do 350-380 IOPS, and the Seagate one on top did 417, which is really good.

In the case of the DELL MD3000, the 15 drives are configured in 14 drive RAID 10, with 1 swap. So you will get maximum read performance of 14 drives, and maximum write performance of 7 drives. Let’s assume that the 6000 IOPS measured are mostly reads spread across 14 drives, gives you 428 average IOPS per drive for reading, which is within the reach of good 15K SAS drives.
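The per-drive arithmetic in that last paragraph can be sketched as below. This is a simplification under the comment's own assumptions (reads spread evenly over all 14 drives, writes limited to the equivalent of 7 drives because each write lands on both halves of a mirror pair). The function name is hypothetical.

```python
def per_drive_iops(total_iops: float, drives_serving: int) -> float:
    """Average load per drive if the I/O is spread evenly."""
    return total_iops / drives_serving


# 6000 IOPS measured on the array, as quoted in the comment.
reads = per_drive_iops(6000, 14)   # every drive in the RAID 10 set can serve reads
writes = per_drive_iops(6000, 7)   # writes only get 7 drives' worth of throughput

print(f"all reads:  {reads:.1f} IOPS per drive")
print(f"all writes: {writes:.1f} IOPS per drive")
```

If the 6000 IOPS were all writes, each drive would need far more than the 500 IOPS rotational ceiling, which is consistent with the guess that the measured load was mostly reads or was being combined in the controller cache.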
Tao Shen

October 9, 2007 at 2:20 am

Same analysis can be done for the 7200rpm drives:

Average Latency: 4.2ms
Average Seek Time: 8.9ms
Average Write Time: 10.9ms

7200rpm = 120rps, or 8.333ms per revolution, or 4.1666ms average latency for a 180 degree rotation.

But on those “nonenterprise drives”, they pair them up with an armature that on average does 8.9ms seek, giving you a maximum of 112 IOPS due to seek, or a maximum of 240 IOPS due to pure rotation.

It’s interesting: in a RAID configuration, depending on the optimization done on the raid controller, the real world performance tests seem to fall in between: [112, 240] for 7200rpm, [303, 500] for 15k rpm.
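Those two ranges come straight from the spec numbers quoted in these comments. A minimal Python sketch, assuming the lower bound is 1 / average seek time and the upper bound is 1 / average rotational latency (the function name is invented for the example):

```python
def iops_bounds(rpm: float, avg_seek_ms: float) -> tuple[float, float]:
    """Return (seek-limited IOPS, rotation-limited IOPS) for one drive."""
    avg_latency_ms = 60_000.0 / rpm / 2   # half a revolution on average
    seek_bound = 1000.0 / avg_seek_ms
    rotation_bound = 1000.0 / avg_latency_ms
    return seek_bound, rotation_bound


for rpm, seek_ms in ((7200, 8.9), (15000, 3.3)):
    low, high = iops_bounds(rpm, seek_ms)
    print(f"{rpm} rpm, {seek_ms} ms seek -> [{low:.0f}, {high:.0f}] IOPS")
```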
Joseph Kirby

October 9, 2007 at 10:23 am

You’re assuming the drive needs to do linear reads/writes, but if it needs to wait a full rotation for a sector it may be able to pick up some other sector before that time.

For example, if it is asked to write sectors 1, 10, 5, it may write them in the order 1, 5, 10.
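A minimal sketch of that kind of reordering, assuming all requested sectors sit on the same track and the drive knows which sector is currently under the head. The function name and the 12-sector track are hypothetical, and real firmware also has to account for seeks between tracks.

```python
def rotational_order(requested, head_sector, sectors_per_track):
    """Sort requests by how soon each sector passes under the head."""
    return sorted(
        requested,
        key=lambda sector: (sector - head_sector) % sectors_per_track,
    )


print(rotational_order([1, 10, 5], head_sector=0, sectors_per_track=12))
```

With the head at sector 0 this services the requests as 1, 5, 10 within a single revolution, instead of spending most of a second revolution going back for sector 5.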
Joseph Kirby

October 9, 2007 at 10:24 am

PS: drives also have more than one platter.
Karl Schulmeisters

October 30, 2007 at 4:43 pm

It’s not “MAX IOPS”, it is “TYPICAL” or “AVERAGE MAX” IOPS.

As has been pointed out before, the “best case” scenarios beat these numbers significantly. Even with “random” IO you still have “locality of reference” (i.e., when writing a 1 GByte image, you aren’t interleaving those writes with data from other sources on a byte-by-byte or even block-by-block basis).

As has been pointed out, BEST CASE is no-seek required, no spin latency before the read/write. And WORST CASE is seeking from one side of the platter to the other and then waiting 359deg of rotation.

Clearly the “average case” then is a seek of 1/2 the platter (where the “average seek time” comes from) and a 180deg revolution. But this assumes completely “dumb” IO on the part of the OS. On servers, with large disk-buffer caching, it is quite possible to write algorithms that are much more clever about how they lay down data.

For example the controller can “scatter-gather” out of the cache buffer and organize the I/O to minimize the seek distance. Of course you have to add in weighting factors like “age” so that a bit of data out on the far reaches doesn’t get “starved”, but this isn’t rocket science. For some Operating Systems, this is part of the difference between the “Server” versions and “desktop” versions.

Also as pointed out RAID controllers make this calculation even more complex.

But that said, all things being equal, your TYPICAL IOPS is going to be the inverse of (Average Latency + Average Seek).
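Plugging in the spec numbers quoted earlier in this thread gives a feel for what that rule of thumb predicts. A short Python sketch, with a made-up function name and times in milliseconds:

```python
def typical_iops(avg_latency_ms: float, avg_seek_ms: float) -> float:
    """Typical random IOPS as the inverse of (average latency + average seek)."""
    return 1000.0 / (avg_latency_ms + avg_seek_ms)


# Spec values as quoted in the comments above.
print(f"15K rpm:  {typical_iops(2.0, 3.3):.0f} IOPS")
print(f"7200 rpm: {typical_iops(4.2, 8.9):.0f} IOPS")
```

That works out to roughly 189 IOPS for the 15K drive and 76 IOPS for the 7200rpm drive, before any help from queueing, caching, or request reordering.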

Note also that writing IS slower than reading, because reading relies purely on Spintronics http://en.wikipedia.org/wiki/Spintronics based sensing, whereas writing requires that you generate higher currents to influence the actual magnetic surface. BUT because you can’t effectively vary the speed of the disk, you ALWAYS spin the disk at the speed of maximum writing.
huh?

June 13, 2008 at 3:56 am

what are you talking about people? may i ask? should i need high RPM and high seek time for my computer to run faster? in layman’s word please
complexxL9

January 27, 2009 at 11:55 am

so as far as I understand, to know what IOps capability an HDD has I can use
1000 / (seek time[ms] + latency[ms])= IOps

but how do I measure what IOps my system needs. Is it writes/s+reads/s when monitoring in perfmon?
auction

December 9, 2009 at 7:37 am

well, i couldn't get it clearly, could you please make it more clear. I am not a technical person. So, all I see when i go for buying hard drive is for RPM and Capacity.