Sun Honeymoon Update: Storage
As I mentioned in my review of the Sun X2200 M2 servers we got recently, which we absolutely loved, Sun’s storage wasn’t impressive at all. In fact, it was downright bad. But before I get into the gory details, I feel compelled to mention that I believe Sun’s future, including storage, is bright. Their servers rock, they’re innovating all over the place, and for the most part, the people at Sun have been fantastic to work with – even when they’re being told their storage hardware sucks. That’s impressive. Now, on with the show:
The storage arrays I’m blogging about here are Sun StorEdge 3320 SCSI arrays. For more on why we chose this particular model, you can read about my on-going search for The Perfect DB Storage Array. The bottom line, though, for us is that Speed Matters. Their list price is quite expensive, but Sun was willing to work with us on the price, and we managed to get things into a reasonable ballpark. Reasonable, that is, as long as they performed. 🙂
First, some details. These boxes were destined to be part of our DB layer, with the first few going in as new storage for replication slaves. The goal was to maintain a high number of small (4K) i/o operations per second (IOPS) with an emphasis on writes, since scaling reads is easier (add slaves) than scaling writes (only so many spindles you can add, etc). In this particular case, the writes were being delivered from a MySQL master using InnoDB running Linux on 3 effective 7200rpm spindles, so the Sun array, on paper, should be able to keep up, no sweat. If your needs differ, our story might not be useful – test for yourself.
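To show why the array should have kept up "on paper", here is a rough back-of-the-envelope sketch of random IOPS per spindle. It uses the common rule of thumb that one small random i/o costs about one average seek plus half a disk rotation. The seek times (8.5 ms for 7200rpm, 3.5 ms for 15K) and the function names are my assumptions for illustration, not measurements from our hardware, and the model ignores RAID write penalties, controller caches, and queueing.

```python
# Back-of-the-envelope random IOPS estimate for spinning disks.
# The seek times used below are assumed ballpark figures, not measurements.


def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: half a revolution, in milliseconds."""
    return 0.5 * 60_000.0 / rpm


def estimate_iops(rpm: float, avg_seek_ms: float, spindles: int = 1) -> float:
    """Rough random IOPS: one i/o costs one average seek plus half a rotation.

    Ignores RAID write penalties, write caches, and command queueing.
    """
    service_time_ms = avg_seek_ms + rotational_latency_ms(rpm)
    return spindles * 1000.0 / service_time_ms


if __name__ == "__main__":
    # Hypothetical inputs: 3 effective 7200rpm spindles vs. a single 15K drive.
    master = estimate_iops(7200, avg_seek_ms=8.5, spindles=3)
    one_15k = estimate_iops(15000, avg_seek_ms=3.5)
    print(f"3 x 7200rpm spindles: ~{master:.0f} IOPS")
    print(f"1 x 15K spindle:      ~{one_15k:.0f} IOPS")
```

Under these assumptions a single 15K spindle does a bit more than twice the random i/o of a 7200rpm one, so an array with several 15K drives has plenty of headroom over 3 effective 7200rpm spindles.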
Installing and configuring them was an adventure. Craig Meakin, our Server Surgeon, was tasked with installing them and immediately ran into a snag. When configured for DHCP management access (which is how they were set up out of the box, exactly how we like them), they wouldn’t actually DHCP an IP address. It took someone at Sun wading through 4 different manuals to determine that not only did the array have to be toggled to DHCP, but you also had to write “DHCP” in the IP address spot to make it work. Strike one.
(As an aside, one of Sun’s engineers also told us, after we’d bought them and installed them in the rack, that these storage arrays don’t come with battery backed write caches. Given how expensive they are, I was shocked and furious, but quickly got verification that they do, indeed, have BBWC.)
We brought one online and moved a DB slave snapshot over which was a few hours out-of-date and started replication so it could catch up with the master. Obviously, it wasn’t live and in production, so it was mostly spooling and committing writes from the master, only doing reads as needed for updates and whatnot. A very light load, in other words. With interest, we started timing how fast it would catch up, since it should scream. We were betting at least 2X (15K drives, after all) faster than the master, and on par with our other 15K SCSI slaves. Instead, we measured more than 4X *slower* than our slaves. Strike two.
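For a sense of what those numbers mean for catch-up time, here is a minimal sketch of the arithmetic. It assumes the master keeps writing at a steady rate while the slave replays, and that the slave applies writes at some fixed multiple of that rate. The function name and the example inputs are hypothetical, and real replication is burstier than this.

```python
import math


def catch_up_hours(lag_hours: float, apply_ratio: float) -> float:
    """Hours until a lagging slave catches up with a still-writing master.

    lag_hours:   how far behind the snapshot is, in hours of master writes.
    apply_ratio: slave apply speed as a multiple of the master's write rate
                 (2.0 means the slave replays writes twice as fast as the
                 master generates them).

    Each hour the slave clears apply_ratio hours of backlog while the master
    adds one more, so the backlog shrinks by (apply_ratio - 1) per hour.
    """
    if lag_hours < 0 or apply_ratio < 0:
        raise ValueError("lag_hours and apply_ratio must be non-negative")
    if lag_hours == 0:
        return 0.0
    if apply_ratio <= 1.0:
        # At or below the master's write rate the slave never catches up.
        return math.inf
    return lag_hours / (apply_ratio - 1.0)


if __name__ == "__main__":
    # Hypothetical example: a snapshot 3 hours behind.
    for ratio in (2.0, 1.5, 0.5):
        print(f"apply ratio {ratio}: {catch_up_hours(3.0, ratio)} hours")
```

The point of the sketch is that apply speed relative to the master matters a lot: a slave that only barely outpaces the master takes a very long time to close a gap of a few hours, and one that is slower than the master never does.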
Ok, no worries. Obviously this is a new array, and we did something terribly stupid setting it up. Sun support to the rescue, right? So we opened up tickets, dumped our config and all other relevant details to them. Nada. Oh, they came back with lots of suggestions and things to try. But none of them helped. Next step was to grab detailed system i/o statistics on production slaves which worked and Sun SE3320s that didn’t, so Sun could compare. And compare they did – their data showed a 6X performance differential between our production slaves (which had $700 off-the-shelf LSI SCSI MegaRAID cards in them) with 15K disks and Sun’s hardware. Sun was 6X slower. Final verdict? “System is performing as designed.” Strike three – they’re out!
Frantically, since the entire reason we had gone with Sun was because Rackable had shipped us a bunch of broken units and we were now months behind on an expansion project, we called Dell and ordered some PowerVault MD3000 SAS arrays. I always give Dell props for fast, efficient delivery, and knock them for a lack of innovation. In this case, they not only got us the gear fast, but the MD3000 turned out to be a fantastic DAS device and nearly perfect for our needs. Thank goodness!
Normally, that would be the end of our little tale, but as luck would have it, when Sun realized they’d laid an egg with the SE3320, they rushed us an engineering sample of their then-unannounced StorEdge 2540 array. The good news? It performed neck-and-neck with the Dell array and uses SAS drives, which we prefer over SCSI. The bad news? It wasn’t out yet and we needed storage yesterday. I believe the new arrays are out now, and if we hadn’t gotten the Dells, I would buy the 100% SAS version, the StorEdge 2530, rather than the 2540, for use in our datacenters.
So now we’ve got fantastic Sun servers attached to fantastic Dell storage. And our little franken-servers are as happy as can be. Fast, too.