Archive
Amazon announces SimpleDB (in Beta)
Sweet! Amazon finally took the wraps off of SimpleDB. They've been working on this for a while, and as you can probably tell, it's a natural fit with S3 and EC2. There's a great write-up about it over on inside looking out.
This is nearly a perfect solution for some of our data-related scaling challenges, except for two issues:
- Physical proximity. Some of my datacenters aren't close to Amazon's, so the actual time to query SimpleDB is query time plus latency. This isn't a problem if you're doing all your queries from EC2, but we're not there yet (we'd like to be, but a few pieces are missing. SimpleDB is one of those pieces, so we're getting closer…). Amazon has promised me they're working on the speed-of-light issue. 😉
- Attribute size limits. We have some data fields that are longer than 1024 bytes (most aren't and would work fine). We've thought about chunking the data up to get around this (a rough sketch of what that might look like follows this list), which is a possibility, but it gets messy. Storing them in S3 is both overkill and probably too slow – if I need to get a few thousand photo captions *fast*, doing it through S3 isn't optimal. If we could solve the latency problem I already mentioned, I'd be fine storing that specific data in some other store and working around it that way.
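To make the chunking idea concrete, here's a rough sketch of how an oversized field might get split across multiple attributes and stitched back together on read. The helper names and the numbered-attribute scheme are purely illustrative – nothing Amazon provides – and a real version would need to respect the 1024-byte limit in bytes, not characters:

```python
CHUNK = 1024  # SimpleDB's per-attribute size limit at launch

def split_into_attributes(name, value):
    """Break one long string into {'caption_000': ..., 'caption_001': ...}."""
    # (Splits by character for simplicity; multi-byte UTF-8 data would need a
    # byte-aware split to truly stay under the 1024-byte cap.)
    return {
        f"{name}_{i:03d}": value[off:off + CHUNK]
        for i, off in enumerate(range(0, len(value), CHUNK))
    }

def join_attributes(name, attrs):
    """Reassemble the chunks in order when reading the item back."""
    keys = sorted(k for k in attrs if k.startswith(name + "_"))
    return "".join(attrs[k] for k in keys)

# Usage sketch: write the dict's key/value pairs as the item's attributes,
# then rebuild the field after fetching the item.
item_attrs = split_into_attributes("caption", "some very long caption..." * 100)
caption = join_attributes("caption", item_attrs)
```

It works, but you can see why I called it messy – every read and write now has to know about the chunking scheme.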
On the plus side, SimpleDB should be screaming fast, incredibly scalable, and almost all of our SQL queries would work with no changes other than syntax. Like many of you, I’m sure, we’re using much of our RDBMS as a fairly simple data store and aren’t using many advanced RDBMS capabilities. All of those queries could just use SimpleDB and then we could devote our DB iron to just the rare complex queries. We’re not alone – tons of web apps are gonna love this.
I’m thrilled to see the Amazon AWS stack continue to grow, and I’m shocked that they have as big of a lead as they do. I would have thought Microsoft / Google / Sun / whomever would have been out with some competition by now. It’s gonna happen – but I never would have guessed it would take this long.
Oh, and while I have your attention – SmugMug is now a fairly heavy user of EC2 and I have a write-up coming. So check back later if that’s of interest.
Companies That Listen: Sun
I'm a sucker for companies that listen to their customers. I'm sure you are too. How many times have you gotten a product that's nearly perfect but is missing that final touch? Or worse, the product just doesn't live up to expectations? Don't you usually feel helpless in the face of some huge software/electronics/car/whatever company? I know I do.
For example, the monopolistic cable company I’m forced to use, Comcast, hasn’t figured out how to deliver TV to my house for more than a month (isn’t that sorta what they do?) – and I’m helpless!
I’m happy to report that Sun listens to their customers. Really, truly, listens. Even to small ones like me. Even to small ones like me who complain loudly when a product isn’t right (but who cheer equally loudly when it is).
As you may have gathered from Jonathan Schwartz’s blog post ‘The Internet As Customer’, we were one of the attendees at Sun’s information gathering event, and it was fascinating.
One of my biggest takeaways (other than that Sun listens to their customers) is that Sun’s customer base is amazingly schizophrenic. Check out this small cross sample of some of them:
- Some customers don’t want to buy Sun hardware unless they’ve embraced Linux (like, say, us). Others are freaked out that Sun is embracing Linux and are afraid it shows a lack of commitment to Solaris. (Wonder what they think about the new Windows deal? 🙂 )
- Some customers wouldn’t even be customers if it weren’t for AMD/Intel support (us again). Others see this as the death knell for Sun’s custom hardware and are worried.
- Some customers don’t want to use Sun technologies unless they’re open source (us yet again). Others think Sun’s giving away the farm and that proprietary software (and hardware!) is the only way to survive.
- Some of us can’t stand the complicated buying process and just want ‘Amazon for servers’ through a web UI (can you guess if this is us?). Others love having complicated, but complete and thorough, ordering channels.
- A few of them worry that a focus on Java could possibly mean a de-emphasis of datacenter technologies (we don’t use Java, but this isn’t a fear I share). Others wish Sun would just focus on the most important thing to them, Java, and get rid of all this boring datacenter muck already!
I hope you get the general idea – and I’m so super glad that I don’t have to deal with a customer base nearly this broad and fractured. Whew! I don’t know how they do it!
A few quick notes:
- This was an incredibly expensive event for Sun. Not in the the-food-must-have-cost-a-fortune sense of the word, but in the sheer-man-hours sense of the word. Going to the event, I knew Jonathan was speaking for an hour or so on the first day. I assumed that, being a busy guy with a multi-billion-dollar business to run, he’d speak and then leave to go run Sun. How wrong I was. Jonathan stayed the entire time, as did Scott McNealy, and an amazing braintrust of top executives and engineering talent. I completely believe it was absolutely worth it for much of Sun’s brainpower to be focused on listening to their customers – but honestly, I was surprised to see them actually do it.
- About 6 months ago, we asked Sun for a product that would be incredibly difficult to design, but would dramatically change how we build datacenters. They nodded, said they’d look into it, and we crossed our fingers. Apparently we weren’t the only ones, because it’s coming – and it’s far better than we had initially asked for.
- One of the attendees, who spends obscene, ungodly amounts of money with IBM, can't even get engineering staff on the phone. Apparently, IBM has a big sales force that's trained to buffer customers away from the engineers. Ugh. It's an attitude like that which ensured IBM came in dead last in our vendor shoot-out. They literally didn't want our business. Thank goodness Sun gets me in front of technical people when I need it.
- I only read the dress code requirements after arriving. They said “Business” for the meetings. Since I don’t even own any “Business” clothes, that was a problem. T-shirt, Crocs, and a baseball cap it was! (And, of course, no-one cared. Or they were polite enough not to say anything 🙂 )
All in all, I’m still feeling pretty dang good about our decision to go with Sun for our servers. An emphasis on innovation and willingness to listen to their customers is a winning strategy in my book.
I get SLAs now. Duh.
Ok, so I guess I’m a total n00b. In hindsight, SLAs make a lot of sense after all. The whole point isn’t to compensate SmugMug for our loss, it’s to make it unprofitable for the service provider to keep making the same mistakes.
In other words, let's say Amazon's margins on S3 are 15%. (I have no data; I'm just picking that number out of the air.) If Amazon has a serious problem during a month, they have to cough up 25% to all their customers – so they lose 10% instead of making 15%.
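Here's that back-of-the-envelope math spelled out, using the same made-up numbers:

```python
revenue = 100.0     # arbitrary monthly S3 revenue
margin = 0.15       # assumed profit margin (invented for the example, as noted above)
sla_credit = 0.25   # SLA credit paid out after a serious outage

profit_good_month = revenue * margin                          # +15
profit_bad_month = revenue * (margin - sla_credit)            # 15 - 25 = -10

print(profit_good_month, profit_bad_month)   # 15.0 -10.0
```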
That’s pretty major incentive – and it now totally makes sense why SLAs are so highly valued.
Carry on.
Amazon S3 SLA is here! (Nirvanix dies?)
Amazon has finally released and put into effect their SLA for S3. I know a lot of my readers will be thrilled about this. 🙂
I’ve gotten a few questions about Nirvanix in the past month or so, especially about the fact that they offer an SLA (and that S3 didn’t). I think this probably puts the final nail in Nirvanix’ coffin because:
- Why would you trust Nirvanix, a no-name company, with your precious data?
- Worse, they’re affiliated with MediaMax/Streamload in some way, who have a reputation of poor service. (I’ve even seen reports of data loss at Streamload, though I haven’t bothered to check).
- Just how much is an SLA worth when there’s nothing behind it to back it up?
- They’re more expensive than Amazon. Um, duh.
SLAs don't mean a lot to us anyway, as I've said before, because:
- Everything fails sometimes.
- The SLA payment is rarely comparable to the pain and suffering your customers had to deal with.
But I know it’s very important to lots of people, so I expect there’s cheering and dancing in the streets. 🙂
UPDATE: I get SLAs now. Sorry for being dumb.
HDD IOPS limiting factor – seek or rpm?
Any storage experts out there? Can you forward this to any you may know?
An interesting thread developed in the comments on my post about Dell’s MD3000 storage array regarding theoretical maximum random IOPS to a single HDD. I’m hoping by bringing it up to the blog level, we can get some smart people who know what they’re talking about (ie, not me) to weigh in.
I’ve always believed that for a small random write workload, the revolutions per minute (rpm) of the drive was the biggest limiting factor. I think I’ve believed this for a few reasons:
- It seems logical that the biggest "time waster" in servicing a random write is probably rotational delay anyway. Even if the drive arm has found the right position on the platter, it likely has to wait some amount of time, up to a full revolution, before it can write.
- rpm is a “fixed” number, and thus easier to calculate, than seek which is more variable. So taking the easy way out, one of my favorite hobbies, seemed appropriate.
Using this theory, a 7200rpm drive can do a theoretical maximum of 120 IOPS, and a 15K drive can do 250. Note that these are fully-flushed non-cached writes to the spinning metal, with no buffering or write combining. Over the years, my own tests seem to have validated this theory, and so I've just always believed it to be gospel.
Tao Shen, though, commented that my assumption is wrong, that seek time is the limiting factor that matters, not rpm, and that faster drives can deliver more IOPS than my rpm math suggests. He posits that a 15K drive with a 2ms seek time can do 500 IOPS. Now, he may have access to better drives than I do, since I think our fastest are 3.5ms (best case scenario), not 2ms. That's what the latest-and-greatest Seagate Cheetah 15K.6 drives seem to do, too.
So which is it? Am I totally smoking crack? Is he? Or is the truth that seek time and rpm are so intimately tied together that separating them is impossible?
How does one calculate theoretical maximum IOPS?
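So we're all at least arguing about the same arithmetic, here's a sketch of my rpm-only math, Tao's seek-only math, and the combined estimate (average seek plus half a revolution of rotational latency). The combined formula is the usual textbook back-of-the-envelope, not something either of us claimed, so take it as one more candidate answer rather than the verdict:

```python
def iops_rpm_only(rpm):
    """My model: every random write waits a full revolution."""
    return rpm / 60.0                       # revolutions per second

def iops_seek_only(seek_ms):
    """Tao's model: only seek time matters."""
    return 1000.0 / seek_ms

def iops_combined(rpm, seek_ms):
    """Textbook estimate: average seek + average rotational latency
    (half a revolution)."""
    half_rev_ms = 30000.0 / rpm             # (60 / rpm) / 2 seconds, in ms
    return 1000.0 / (seek_ms + half_rev_ms)

print(iops_rpm_only(7200))         # 120   -- my 7200rpm number
print(iops_rpm_only(15000))        # 250   -- my 15K number
print(iops_seek_only(2.0))         # 500   -- Tao's 2ms-seek number
print(iops_combined(15000, 3.5))   # ~182  -- 15K drive with a 3.5ms seek
```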
Dell MD3000 – Great DAS DB Storage
So I’ve written about storage before, specifically our quest for The Perfect DB Storage Array and how Sun’s storage didn’t stack up with their excellent servers. As you can probably tell, I spend a lot of my time thinking about and investigating storage – both small-and-fast for our DBs and huge-and-slower (like S3) for our photos.
I believe we’ve finally found our best bang-for-the-buck storage arrays: Dell MD3000. Here’s a quick rundown of why we like them so much, how to configure yours to do the same, and where we’re headed next:
- The price is right. I have no idea why these companies (everyone does it) continue to show expensive prices on their websites and then quote you much much cheaper prices, but Dell is no exception. Get a quote, you’ll be shocked at how affordable they really are.
- DAS via SAS. If you’re scaling out, rather than up, DAS makes the most sense and SAS is the fastest, cheapest interconnect.
- 15 spindles at 15K rpm each. Yum. Both fast and odd. Why odd? Because you can make a 14 drive RAID 1+0 and have a nice hot spare standing by.
- 512MB of mirrored battery-backed write cache. Use write-back mode to have nice fast writes that survive almost all failure scenarios.
- You can disable read caching. This is a big one. Given we have relatively massive amounts of RAM (32GB on server vs 512MB on controller) *and* that the DB is intelligent at reading and pre-fetching precisely the stuff it wants, read caching is basically useless. Not only that, but it harms performance by getting in the way of writes – we want super-fast non-blocking writes. That’s the whole point.
- You can disable read-ahead prefetching. Again, our DB does its own pre-fetching already, so why would we want the controller trying to second guess our software? We don’t.
- The stripe sizes are configurable up to 512KB. This is important because if you're going to read, say, a 16KB page for a DB, you want to involve only a single disk as often as you can. The bigger the stripes, the better the odds are of only using a single disk for each read (some rough odds in the sketch after this list).
- The controller ignores host-based flush commands by default. Thank goodness. The whole point of a battery-backed write-back cache is to get really fast writes, so ignoring those commands from the host is key.
- They support an 'Enhanced JBOD' mode whereby you can get access to the "raw" disks as their own LUNs (in this case, 15), but writes still flow through the write-cache. Why is this cool? Because you can move to 100% server-controlled software storage systems, whether they're RAID or LVM or whatever. More on this below…
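Here's the rough math behind that stripe-size bullet. It assumes reads land at effectively random offsets relative to stripe boundaries – real-world alignment will shift the exact percentages, so treat this as a sketch of the trend, not gospel:

```python
# Odds that a small read touches only one disk are roughly 1 - read_size/stripe_size
# when offsets are effectively random relative to stripe boundaries.
READ_KB = 16

for stripe_kb in (64, 128, 256, 512):
    p_single_disk = 1 - READ_KB / stripe_kb
    print(f"{stripe_kb:>4}KB stripes: ~{p_single_disk:.0%} of 16KB reads hit one disk")

# 64KB stripes: ~75% ... 512KB stripes: ~97%
```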
Ok, sounds good, you're thinking, but how do I get at all these goodies? Unfortunately, you have to use a lame command-line client to handle most of this stuff and it's a PITA. However, you asked, so here you go (commands can be combined):
- disable read cache: set virtualDisk["1"] readCacheEnabled=FALSE
- disable read pre-fetching: set virtualDisk["1"] cacheReadPrefetch=FALSE
- change stripe size: read the docs for how to do this on new virtualDisks, but to do online changing of existing ones – set virtualDisk["1"] segmentSize=512
- Enhanced JBOD: Just create 15 RAID 0 virtual disks! 🙂
- BONUS! modify write cache flush timings: set virtualDisk["1"] cacheFlushModifier=60 – This is an undocumented command that changes the cache flush timing to 60 seconds from the default of 10 seconds. You can also use words like 'Infinite' if you'd like. I haven't played with this much, but 10 seconds seems awfully short, so we will.
Wishlist? Of course I have a wishlist. Don’t I always? 🙂
- This stuff should be exposed in the GUI. The stripe size setting, especially, should be easily selectable when you're first setting up your disks. It's just dumb that it's not.
- Better documentation. After a handy-dandy Google search, it appears as if the Dell MD3000 is a rebranded LSI/Engenio array, which lots of other companies also appear to have rebranded, like the IBM DS4000. But the Engenio docs are more thorough, which is how I found the cacheFlushModifier setting. (On a side note, why do these companies hide who’s building their arrays? They don’t hide that Intel makes the CPUs… Personally, I’d rather know)
- Faster communication. I asked Dell quite a while ago for information on settings like these and had to wait a while for a response. I imagine this might be related to the Engenio connection – Dell may have just not known the answers and had to ask.
- Bigger stripe sizes. I’d love to benchmark 1MB or bigger stripes with our workload.
- Better command-line interface. Come on, can’t we just SSH into the box and type in our commands already?
Ok, so where are we going next?
- ZFS. I believe the ‘Enhanced JBOD’ mode (15 x RAID-0) would be perfect for ZFS, in a variety of modes (striped + mirrored, RAID-Z, etc). So we’re gonna get with Sun and do an apples-to-apples comparison and see what shakes out. Our plan is to take two Sun X2200 M2 servers, hook them up to a Dell MD3000 apiece, run LVM/software RAID on one and ZFS on the other, then put them under a live workload and see which is faster. My hope is that ZFS will win or be close enough that it doesn’t matter. Why? Because I love ZFS’s data integrity and I believe COW will let us more easily add spindles and see a near-linear speed increase.
- Flash. We’ve been playing around with the idea of flash storage (SSD) on our own for awhile, and have been talking to a number of vendors about their approaches. It’s looking like the best bet may be to move from a two-tier storage system (system RAM + RAID disks) to a three-tier system (system RAM + flash storage + RAID disks) to dramatically improve I/O. If we come across anything that works in practice, rather than theory, I’ll definitely let you know.
- MySQL. We’ve now got boxes which appear to not be CPU-bound *or* I/O bound but are instead bounded by something in software on the boxes, either in MySQL or Linux. Tracking this down is going to be a pain, especially since it’s out of my depth, but we’ve gotta get there. If anyone has any insight or ideas on where to start looking, I’m all ears. We have MySQL Enterprise Platinum licenses so I can probably get MySQL involved fairly easily – I just haven’t had time to start investigating yet.
Also, you might want to check out this review of the MD3000 as well; he's gone more in-depth on some of the details than I have.
Finally, I’m hoping other storage vendors perk up and pay attention to the real message here: Let us configure our storage. Provide lots of options, because ‘one size fits all’ is the exception, not the rule.
Sun’s announcement today that they’re unifying Storage and Servers under Systems is a good move, I think, but they’ve still got work to do. I believe (and everyone at Sun has heard this from me before) that their storage has been failing because it’s not very good. I hope this change does make a difference – because Jonathan’s right that storage is getting to be more important, not less.
UPDATE: One of the Dell guys who works with us (and helped us get all the nitty gritty details to configure these low-level settings) just posted his contact info in the comments. Feel free to call or email him if you have any questions.
Datacenter love: Equinix
I write a lot about products and companies that have potential, but aren’t quite perfect, like Amazon Unbox on TiVo and lots of Sun stuff. But this week’s outage at 365 Main, a datacenter in San Francisco (which we don’t use), reminded me that there are a few products and companies we love that I don’t say nearly enough about. So I’ll start with our datacenter, Equinix, and try to post about some of the others, too.
SmugMug got its start with 3 old used VA Linux boxes (dual 700MHz Pentium IIIs with 2GB of RAM, which are still in production today and have been our most reliable boxes) from a dead dotcom, which we threw into a friend's cheap rack at Hurricane Electric. Once the money started flowing in, and we ran into HE's power constraints and poor bandwidth, we hunted around for datacenter space. Equinix had the very best reputation among the Operations crowd here in Silicon Valley, so we gave them a shot and pulled out of Hurricane Electric.
I should warn you up front that there’s a little “sticker shock” when you first talk with Equinix (ok, and every time you need to buy more stuff from them, it returns), but in the end, it’s well worth it. It turns out that in life, some things are worth paying for. Datacenter space is certainly one of those things (and we feel like photo sharing is too!).
In the ~4 years we’ve been with Equinix, we’ve had only one major problem: They sold our power out from under us (to Yahoo) which forced us to move from one of their locations to another. Ugh. Datacenter moves, especially with hundreds of terabytes of disks, really suck. Luckily, thanks to decent system architecture and some magic from Amazon S3, we were able to do 99% of our move during normal business hours over the course of a month with no impact on our users.
In all fairness to Equinix (though this is no excuse), they weren’t the only datacenter that had poorly prepared for the ‘Power is King’ change in the datacenter landscape that happened a few years back. Plenty of other companies with other providers tell me the same story, so we’re not alone. Datacenters all over the place used to sell you mostly based on space (square footage) rather than power (watts). They all got burned when CPU and server vendors started getting really fast & dense gear. Nowadays, almost the entire negotiation is regarding power and everyone has empty dead space in their rented cages. Such is life.
On the bright side, everything else about Equinix rocks:
- Power. I’m surprised to hear all of the horror stories out of 365 Main because I assumed they were as good as Equinix has been for us. We haven’t had a single power-related outage in all of the years we’ve been there. It just works – and it’d better, that’s the biggest reason we use a datacenter.
- Metro cross-connects. If you’re hosted in multiple Equinix datacenters in a single metro area, like we are, you can get cheap (a few hundred bucks per month) GigE cross-connects wired between your various locations.
- Support. I’m still surprised every time we need to use Equinix’s support staff and they’re actually super-knowledgeable and helpful. I’m talking about hardcore networking and routing questions. BGP, whatever, you name it – they know it. Better than we do.
- Equinix Direct. I’m always surprised when I talk to other Equinix customers who don’t know about this gem. It’s a way to provision your IP transit providers on a month-by-month basis with no minimum commits or contracts. You pick your providers and pay-as-you-go. Pretty sweet. We’re already directly multi-homed on GigE with multiple providers, but we mix in Equinix Direct to have access to still more. Best thing? ED doesn’t add an extra BGP hop, so your routes still look fast (as opposed to someone like InterNAP who adds an extra BGP hop to do similar stuff).
- Security. 5 biometric scanners are between you and your cage when you enter the building, with live security on hand 24/7. Stuff like this is fairly common at high-end datacenters, but it’s important, so I’m mentioning it anyway.
- Bandwidth providers. Equinix is a carrier-neutral facility, and basically everyone has connectivity there, so you can easily pick whomever you’d like to carry your traffic.
Of course, they do all of the other myriad things a datacenter is supposed to do. One of the reasons I haven’t blogged about them in the past is because they just work – and they work so well, I just don’t spend much time thinking about them.
Which, of course, is the way it’s supposed to be. 🙂
(Now, of course, I’ve jinxed the whole thing like Red Envelope and our datacenters are going to explode in a Martian Invasion. Sorry about that!)
Silent data corruption on AMD servers
One of my readers, Yusuf Goolamabbas, let me know about a nasty silent data corruption on AMD servers with 4GB or more of RAM running Linux. Yikes! This is the sort of thing that keeps me up at night. Yusuf linked me to two bugs on the subject, one at kernel.org and another at Red Hat.
Lots of servers from a variety of manufacturers seem to be affected. It looks like a combination of some problem with Nvidia’s hardware (I’m not an expert, so maybe it’s AMD’s fault, but it doesn’t sound that way to me) and the Linux kernel not doing stuff properly with GART pages. Other OSes don’t seem to be affected, either because they don’t use the hardware iommu or they do things correctly in the first place.
One sucky thing? Apparently Red Hat’s fix isn’t out yet for RHEL5 or RHEL4. Ugh. You can force the kernel to use software iommu instead, but I’m glad I’m not affected.
Most of our servers have over 4GB of RAM, and as you no doubt know, we're pretty in love with our SunFire x2200 servers, most of which have 4GB – 32GB of RAM. So I fired off a frantic email late last night to Sun, asking them if our servers have the problem.
The good news? They don’t! Whew. Maybe I’ll get some sleep tonight… 🙂
FYI, there are some Sun servers (and plenty from every other vendor, too) that are affected. Here’s a link to Sun Alert 102790 with more info. Sun was also good enough to send along info on a similar-sounding, but different, issue in Sun Alert 102323.
My next question for Sun will be about how ZFS would handle silent data corruption like this, since it’s supposed to be quite resilient to strange hardware behavior. My bet is that this is likely outside of the scope of things ZFS can detect and avoid (I think it’s awesome at read error detection, but I’m not sure how it could tell that a write doesn’t contain the right data. But then, I’m not as smart as they are 🙂 )
Anyway, hope this info helps some of you out. I know I’d want to know about this stuff.
Sun Honeymoon Update: Storage
As I mentioned in my review of the Sun X2200 M2 servers we got recently, which we absolutely loved, Sun’s storage wasn’t impressive at all. In fact, it was downright bad. But before I get into the gory details, I feel compelled to mention that I believe Sun’s future, including storage, is bright. Their servers rock, they’re innovating all over the place, and for the most part, the people at Sun have been fantastic to work with – even when they’re being told their storage hardware sucks. That’s impressive. Now, on with the show:
The storage arrays I’m blogging about here are Sun StorEdge 3320 SCSI arrays. For more on why we chose this particular model, you can read about my on-going search for The Perfect DB Storage Array. The bottom line, though, for us is that Speed Matters. Their list price is quite expensive, but Sun was willing to work with us on the price, and we managed to get things into a reasonable ballpark. Reasonable, that is, as long as they performed. 🙂
First, some details. These boxes were destined to be part of our DB layer, with the first few going in as new storage for replication slaves. The goal was to maintain a high number of small (4K) i/o operations per second (IOPS) with an emphasis on writes, since scaling reads is easier (add slaves) than scaling writes (only so many spindles you can add, etc). In this particular case, the writes were being delivered from a MySQL master using InnoDB running Linux on 3 effective 7200rpm spindles, so the Sun array, on paper, should be able to keep up, no sweat. If your needs differ, our story might not be useful – test for yourself.
Installing and configuring them was an adventure. Craig Meakin, our Server Surgeon, was tasked with installing them and immediately ran into a snag. When configured for DHCP management access (which is how they were set up out of the box, exactly how we like them), they wouldn't actually DHCP an IP address. It took someone at Sun wading through 4 different manuals to determine that not only did the array have to be toggled to DHCP, but you also had to write "DHCP" in the IP address spot to make it work. Strike one.
(As an aside, one of Sun’s engineers also told us, after we’d bought them and installed them in the rack, that these storage arrays don’t come with battery backed write caches. Given how expensive they are, I was shocked and furious, but quickly got verification that they do, indeed, have BBWC.)
We brought one online, moved over a DB slave snapshot that was a few hours out-of-date, and started replication so it could catch up with the master. Obviously, it wasn't live and in production, so it was mostly spooling and committing writes from the master, only doing reads as needed for updates and whatnot. A very light load, in other words. With interest, we started timing how fast it would catch up, since it should scream. We were betting at least 2X faster (15K drives, after all) than the master, and on par with our other 15K SCSI slaves. Instead, we measured more than 4X *slower* than our slaves. Strike two.
Ok, no worries. Obviously this is a new array, and we did something terribly stupid setting it up. Sun support to the rescue, right? So we opened up tickets, dumped our config and all other relevant details to them. Nada. Oh, they came back with lots of suggestions and things to try. But none of them helped. Next step was to grab detailed system i/o statistics on production slaves which worked and Sun SE3320s that didn’t, so Sun could compare. And compare they did – their data showed a 6X performance differential between our production slaves (which had $700 off-the-shelf LSI SCSI MegaRAID cards in them) with 15K disks and Sun’s hardware. Sun was 6X slower. Final verdict? “System is performing as designed.” Strike three – they’re out!
Frantically, since the entire reason we had gone with Sun was because Rackable had shipped us a bunch of broken units and we were now months behind on an expansion project, we called Dell and ordered some PowerVault MD3000 SAS arrays. I always give Dell props for fast, efficient delivery, and knock them for a lack of innovation. In this case, they not only got us the gear fast, but the MD3000 turned out to be a fantastic DAS device and nearly perfect for our needs. Thank goodness!
Normally, that would be the end of our little tale, but as luck would have it, when Sun realized they'd laid an egg with the SE3320, they rushed us an engineering sample of their not-yet-announced (then) new StorEdge 2540 array. The good news? It performed neck-and-neck with the Dell array and uses SAS drives, which we prefer over SCSI. The bad news? They weren't out yet and we needed storage yesterday. I believe they are out now, and I would buy the 100% SAS version, the StorEdge 2530, rather than the 2540, for use in our datacenters if we hadn't gotten the Dells.
So now we’ve got fantastic Sun servers attached to fantastic Dell storage. And our little franken-servers are as happy as can be. Fast, too.
Speed Matters.
As subscribers to my blog have probably already guessed, we spend an inordinate amount of time at SmugMug trying to optimize for speed. As a media-heavy website, that's a difficult thing to do and there are a lot of pieces. A typical gallery page at SmugMug contains 16 photos (though it may contain thousands), plus all of the other graphic elements on the page, JavaScript includes (we use lots of JS), CSS includes, and the page HTML itself.
We’ve long tracked our own internal “page render” time, but once it leaves our servers, it gets more difficult to track. There’s a huge, nasty mess of networking equipment and providers between our servers and each customer. There are paid services that will track some of this for you, but that doesn’t tell you what the actual customer experience is like. We have employees in Utah, Idaho, Ohio, Virginia, New York, London, the Netherlands, and Australia so we can get a decent idea, but nothing beats aggregate data.
Enter Alexa with their excellent service and the data it provides. They get a lot of publicity for their Traffic Rank and Reach stats, but those hardly help us at all (we have tens of thousands of customers who use their own custom domains, for example, among other problems). The stat I really love is the Speed rating. Since Alexa aggregates data from millions of people all over the world, across all page views on a site (heavy and light), we can get a really good view of just how fast our site is:
The usual disclaimers about statistics, particularly Alexa’s, apply: We don’t know exactly what they’re measuring, how much or often they’re measuring it, and how many people are actually measured. But we do know that Alexa’s Speed rating has directly correlated to feedback we get from our customers, and most importantly, our customer satisfaction. That’s good enough for me.
Now, like any Alexa statistic, such as Traffic Rank, it’s best viewed in relation to other sites, rather than alone. So here’s a bunch of photo-sharing sites, both ‘larger’ and ‘smaller’ than SmugMug, and their Speed ratings in rough order of ‘size’ according to Alexa’s Traffic Rank (again, Traffic Rank is notoriously flawed, but we have to order by *something*):
Now, we're not perfect. SmugMug, like every other site on the net, has problems. But we try very, very hard to keep the site speedy and responsive – and I think both the stats above and our customer satisfaction speak volumes. It's only fair to note that some of those sites handle more page and photo requests per day than we do – but we left the "small site" size behind long ago, so I wouldn't discount our size too much. It's also only fair to note that, with the possible exception of PBase, they all have massive financial resources in comparison to ours.
We have a huge laundry list of things we can do to speed the site up even more, so look for us to shave more milliseconds off your page load times as we go forward. And I have to thank our Ops team, Andrew Gibbons – Director of Operations & Craig Meakin – Server Surgeon, and programming team, Jimmy Thompson – Web Superhero & Lee Shepherd – SmugSorcerer. Couldn’t have gotten below 1 second without them!
If there’s enough interest, I can do a follow-up post on lots of the tricks we use to get there. I don’t think we do anything earth-shattering, but lots of small things add up. Let me know.


