Archive

Posts Tagged ‘google’

Great idea! Google *should* open their index!

July 15, 2010 26 comments
Raised bridge on the Chicago River by Art Hill

Raised bridge on the Chicago River by Art Hill

tl;dr: Serving dozens (hundreds?) of crawlers is expensive. We could use an open index. Google’s?

Just read Tom Foremski’s Google Exec Says It’s A Good Idea: Open The Index And Speed Up The Internet article. And I have to say, it’s a great idea!

I don’t have hard numbers handy, but I would estimate close to 50% of our web server CPU resources (and related data access layers) go to serving crawler robots. Stop and think about that for a minute. SmugMug is a Top 300 website with tens of millions of visitors, more than half a billion page views, and billions of HTTP / AJAX requests (we’re very dynamic) each month. As measured by both Google and Alexa, we’re extremely fast (faster than 84% of sites) despite being very media heavy. We invest heavily in performance.

And maybe 50% of that is wasted on crawler robots. We have billions of ‘unique’ URLs since we have galleries, timelines, keywords, feeds, etc. Tons of ways to slice and dice our data. Every second of every day, we’re being crawled by Google, Yahoo, Microsoft, etc. And those are the well-behaved robots. The startups who think nothing of just hammering us with crazy requests all day long are even worse. And if you think about it, the robots are much harder to optimize for – they’re crawling the long tail, which totally annihilates your caching layers. Humans are much easier to predict and optimize for.

Worst part about the whole thing, though? We’re serving the exact same data to Google. And to Yahoo. And to Microsoft. And to Billy Bob’s Startup. You get the idea. For every new crawler, our costs go up.

We spend significant effort attempting to serve the robots quickly and well, but the duplicated effort is getting pretty insane. I wouldn’t be surprised if that was part of the reason Facebook revised their robots.txt policy, and I wouldn’t be surprised to see us do something similar in the near future, which would allow us to devote our resources to the crawlers that really matter.

Anyway, if a vote were held to decide whether the world needs an open-to-all index, rather than all this duplicated crawling, I’d vote YES! And SmugMug would get even faster than it is today.

On a totally separate, but sorta related issue, Google shouldn’t have to do anything at all to their algorithms. Danny Sullivan has some absolutely brilliant satire on that subject.

My MySQL keynote slides and video

April 15, 2010 4 comments

Been asked a few times in the last few days about where my slides are from my MySQL keynote from *last* year.

Ooops.

Um, yeah.  Sorry about that.  Here’s a link to ‘The SmugMug Tale’ slides, and you can watch the video below:

Sorry for the extreme lag.  I suck.

The important highlights go something like this:

  • Use transactional replication.  Without it, you’re dead in the water. You have no idea where a crashed slave was.
  • Use a filesystem that lets you do snapshots.  Easily the best way to do backups, spin up new slaves, etc. I love ZFS.  You’ll need transactional replication to really make this painless.
  • Use SSDs if you can. We can’t afford to be fully deployed on SSDs (terabytes are expensive), but putting them in the write path to lower latency is awesome.  The read path might help, too, depending on how much caching you’re already doing.  Love hybrid storage pools.
  • Use Fishworks (aka Open Storage) if you can.  The analytics are unbeatable, plus you get SSDs, snapshots, ZFS, and tons of other goodies.
  • Use transactional replication. This is so important I’m repeating it.  Patch it into MySQL (Google, Facebook, and Percona have patches) or use XtraDB if you use replication.  We use the Percona patch.

Holler in the comments if something in the presentation isn’t clear, I’ll answer.  Apologies again.

Shameless plug – we’re hiring. And it’s a blast.

Nasty Safari bug not fixed since December :(

March 26, 2009 33 comments
A rotten little apple

A rotten little apple by Ashley Harding

Apple has had a nasty Safari bug since December which breaks SmugMug, Facebook, Gmail, and lots of US banks.

3 months later, it’s still not fixed. Your only option is to use Firefox if you’re affected.

Apple’s known about the problem since December, and has lots of internal bugs on the issue (30+ I last heard). (For my Apple readers, here’s our bug on the subject and the one it was marked as a duplicate of).

I’ve done everything I know how to get this resolved – Apple employees have been internally working on our behalf, we have an AppleCare Enterprise Support case open (#332101), I tried to open an ADC Premier case (it was denied because they don’t “provide code-level support for content creation issues” whatever that means). Still no luck.

Apparently this is fixed in Mac OS X 10.5.7. We have the latest seed, so we’re going to find out, but 10.5.7 is likely a month or more away from shipping, so expect this stuff to be be broken at least until then. Use Firefox.

Safari happens to be my favorite browser, so this is especially disheartening. The good news is you may not be affected. Not everyone running 10.5.6 with Safari is, for some reason, but lots of you are. You’ll know you are if you see SmugMug galleries which appear to be empty (but aren’t) or see ugly white pages with undecipherable error messages. For that, I apologize – I really wish we could help but we can’t. You can help yourself, though – use Firefox.

Put another way, Eric Schmidt, Google’s CEO, sits on Apple’s board of directors. Gmail has also been broken for 3 months. Apparently he’s powerless too. 😦

Great things afoot in the MySQL community

December 23, 2008 37 comments

tl;dr: The MySQL community rocks. Percona, XtraDB, Drizzle, SSD storage, InnoDB IO scalability challenges.

For anyone who lives and dies by MySQL and InnoDB, things are finally starting to heat up and get interesting. I’ve been banging the “MySQL/InnoDB scales poorly” drums for years now, and despite having paid Enterprise licenses, I haven’t been able to get anywhere. I was pretty excited when Sun bought MySQL since their future is intrinsically tied to concurrency, but things have been pretty slow going over there this year.

But the community has finally taken up arms and is fighting the good fight. It’s (finally!) a great time to be a MySQL user because there’s been lots of recent progress. Here’re some of my favorites (and highlights of work left to do):

PERCONA

I can’t sing Percona’s praises enough. They’re probably the most knowledgeable MySQL experts out there (possibly even including Sun). Absolutely the best bang for the buck in terms of MySQL service and support – better than MySQL’s own offering. (If I had to guess why that is, I’d bet that MySQL/Sun don’t want to step on Oracle’s toes by fixing InnoDB – but >99% of what we need is related to InnoDB. Percona has no such tip-toeing limitations.) Let me quickly count the ways they’ve helped me in the last few months:

  • They knew of a super obscure configuration setting “back_log“. Have you ever heard of it? I hadn’t. But we started seeing latency on MySQL connections (up to *3 seconds*!) on systems that hadn’t changed recently (exactly 3 seconds sounded awfully suspicious, and sure enough, it was TCP retries). After going through every single kernel, network, and MySQL tuning parameter I know (and I know a lot), I finally called Percona. They dug in, investigated the system, and unearthed ‘back_log’ within an hour or two. Popped that into my configuration and boom, everything was fine again. Whew!
  • We have servers that easily exceed InnoDB’s transaction limits. Did you know InnoDB has a concurrent transaction limit of 1024? (Technically, 1024 INSERTs and 1024 UPDATEs. But INSERT … ON DUPLICATE KEY UPDATE manages to chew up one of each). I know all about it – I’ve had bugs open with MySQL Enterprise for more than 2 years on the issue. What’s more, these are low-end systems – 4 cores, 16GB of RAM – and they’re no-where near CPU or IO bound. It took MySQL months to figure out what the problem was (years, really, to figure out all the final details like the different undo logs for INSERT vs UPDATE). Their final answer? It’ll be fixed in MySQL 6. 😦 Note that 5.1 *just* went GA after years and years. On the other hand, it took Percona one weekend to diagnose the problem, and 13 days to have a preliminary patch ready to extend it to 4072 undo slots. Talk about progress! (And yes, we want Percona to release the patch to the world)
  • Solving the CPU scaling problems. These have been plaguing us for years (we have had some older four-socket systems for awhile … now with quad-core, it’s even worse), and thanks to Google and Percona, this problem is well on its way to being solved. We’re sponsoring this work and can’t wait to see what happens next.
  • XtraDB. This is the biggy. So big it deserves its own heading….

XTRADB

Oracle’s done a terrible job of supporting the community with InnoDB. The conspiracy theorists can all say “I told you so! Oracle bought them to halt MySQL progress” now – history supports them. Which is a shame – Heikki is a great guy and has done amazing work with InnoDB, but the fact remains that it wasn’t moving forward. The InnoDB plugin release was disappointing, to say the least. It addressed none of the CPU or IO scalability issues the community has been crying about for years.

Luckily, Percona finally did what everyone else has been too afraid to do – they forked InnoDB. XtraDB is their storage engine, forked from InnoDB (and then turbocharged!). We’re not running it in production yet, but we are running all of the patches that went into XtraDB and I can tell you they’re great. We’re sponsoring more XtraDB development (and yes, we made sure Percona will be contributing anything they build for us back to the community) with Percona, and I’m sure that’ll continue.

DRIZZLE

I’ve already blogged a bit about Drizzle, but it sure looks like Drizzle + XtraDB might be a match made in heaven. Drizzle can be though of as a MySQL engine re-write with an eye towards web workloads and performance, rather than features. MySQL 4.1, 5.0, and 5.1 added a lot of features that bloated the code without offering anything really useful to web-oriented workloads like ours, so the Drizzle team is ripping all that stuff back out and rethinking the approaches to the things that are being left in. Very exciting.

SSD STORAGE

The advent of “cheap enough” super-fast SSD storage is finally upon us. I’ve got Sun S7410 storage appliances in production and they’re blazingly fast. I have a very thorough review coming, but the short version is that even with NFS latencies, we’re able to do obscene write workloads to these boxes (let alone reads). 10000+ write IOPS to 10TB of mirrored, crazy durable (thanks ZFS!) storage is a dream come true. Once you mix in snapshots, clones, replication, and Analytics – well, it just doesn’t get much better than this.

(Don’t get sticker shock looking at the web pricing – no-one pays anything even remotely like that. Sign up for Startup Essentials if you can, or talk to your Sun sales rep if you can’t, and you can get them much cheaper. I nearly had a heart attack myself until I got “real” pricing. Tell them I sent you – enough Sun people read this blog, it might just help 🙂 ).

STILL NEEDED…

So, all in all, there’s been an awful lot of progress this year, which is great. CPUs are finally scaling under InnoDB, and we finally have storage that isn’t bounded by physical rotation and mechanical arms. Unfortunately, great CPU scaling plus amazing IO capabilities isn’t something InnoDB digests very well. As is common in complicated systems, once you fix one bottleneck, another one elsewhere in the system crops up. This time, it’s IOPS. It was eerie reading Mark Callaghan’s post about this last night – I’d come to the exact same conclusions (from an Operations point of view rather than code-level) just yesterday.

Bottom line: Despite having ample CPU and ample IO, InnoDB isn’t capable of using the IO provided. You can bet we’ll be working with Percona, Google and Sun (read: sitting back and admiring their brilliant work while writing the occasional check and providing production workload information) to look into fixing this.

In the meantime, we’re back to the old standbys: replication and data partitioning. Yes, we’re stacking lots of MySQL instances on each S7410 to maximize both our IOPS and our budget. Fun stuff – more on that later. 🙂

UPDATE: Just occurred to me that there are plenty of *new* readers to my blog who haven’t heard me praise Google and their patches before. Mark Callaghan’s team over at Google definitely deserves a shout-out – they’ve really been a catalyst for much of this work along with Percona.

Hot technologies I care about – Sep '08

September 17, 2008 30 comments
Iron Worker by ikegami

photo by: ikegami

I’ve been too busy to blog lately, and for that I apologize.  But here’s a quicky detailing the technologies (internet related and not) I’m excited about right now:

  • Drizzle.  For years now, I’ve felt that MySQL has been doing in a direction in opposition to my use case.  Stored procedures, views, etc etc have added bloat and complexity without offering me anything useful.  Turns out I’m not alone – and thus Drizzle was born.  To say I’m *super* excited about this is a serious understatement.
  • Google & Percona’s MySQL patches.  While I wait for Drizzle, I’m stuck dealing with terrible concurrency issues in MySQL/InnoDB that force us to partition data way before we really should have to, making our system more complex.  It’s crazy having a server keel over when it shouldn’t be either CPU-bound *or* IO-bound but that’s life with MySQL and InnoDB these days – or at least, it was until Google and Percona fixed what I couldn’t get MySQL to fix with our Platinum Enterprise subscriptions.  Open source rules!
  • Flash storage.  I really wish I could talk about this some more (pesky NDAs), but there are datacenter changes coming that are more dramatic than anything I’ve seen in 14 years of working on them. I hope I’ve talked to everyone in the space (and from the companies I’ve talked to, one of them seems to be the *very* clear winner for this upcoming round), but if you’re a storage vendor working on flash appliances and I haven’t talked to you, ping me.  We’re a bleeding edge customer and we’ll put your stuff in production faster than you can deliver it to us.  🙂
  • ZFS.  Regardless of flash storage, ZFS is the filesystem of choice – head and shoulders over everything we’ve used or heard of.  The advent of flash just makes this even more compelling.  The downside?  It’s not on Linux.  😦
  • OpenSolaris.  ZFS is so incredible, my hand has been forced, and we’re about to put our first OpenSolaris system into production.  OpenSolaris is, in theory, the Solaris kernel (think ZFS, DTrace, SMF, high concurrency, etc) with the GNU-like userland (think Linux-like).  In practice, it’s still extremely painful for a Linux expert and Solaris n00b like me to use – even on a single-purpose machine like a MySQL server.  Only ZFS makes the pain worth it.  For development, it’s basically unusable for Linuxers (it’s probaby fabulous for Solaris guys – lucky ducks).
  • Nexenta.  Unlike OpenSolaris, Nexenta *is* the Solaris kernel plus GNU userland.  Unfortunately, it’s not backed by Sun or anyone else I have any relationship with.  Sun has been absolutely the very best technology vendor we’ve ever dealt with in terms of support, technical knowledge, and just plain listening to us, so that’s a big issue.  I wish Sun had taken Nexenta’s approach (or would just buy them or offer support or something).  If OpenSolaris continues to be painful, we may fall back on Nexenta instead – remember, ZFS is the driving factor here.
  • Amazon Web Services competitors.  They’ve been promising they’d be coming out for years now and I’m shocked they’ve given Amazon this much runway.  But I believe a few more are getting very close (can’t say more, again, pesky NDAs).  Now, we’re extremely happy with Amazon, so we have no plans to switch, but competition is good for everyone – and Amazon is a fierce competitor.  Plus there are still gaps in Amazon’s strategy, and if I can mix & match to plug some of those gaps, awesome – sign me up.
  • Memcached.  This one’s been on my list for years, and it’s still way up there.  Binary protocol on the verge of shipping, nice patch to resolve some networking issues we’ve seen, and talk about scabability.  If you’re building web apps and this isn’t a core part of your infrastructure, you’re doing it wrong.
  • Big RAM.  4GB DIMMs are dirt cheap, so if you’re not loading your DB and Memcached boxes to the gills, you’re missing the boat.  Cheap 2-socket 64GB (and relatively cheap 128GB at 4-sockets) are here.
  • Sun Fire X4140 and X4440.  The best 1U (2-socket) and 2U (4-socket) servers on earth.  Despite being late to the game with quad-core, Opteron RAM performance kills Xeon, so these are the servers we’re buying.  You can load them to the gills with 4GB DIMMs, enjoy the dual-power supplies (yes, in the 1U box too), and crank out some great stuff.
  • OpenSocial, Y!OS, etc.  The big boys are finally getting real about getting open and cross-pollinating data and I think we’re finally nearing an inflection point.  We’re hiring a Sorcerer to do nothing but think and build in this space.  I’m sure magic will ensue.
  • Nikon D90 and Canon 5D MkII.  Nikon’s taken the photography world by storm with amazing high-ISO performance, and Canon just announced a DSLR that shoots full 1080p video.  Both look amazing and both are game-changers.
  • Onkyo TX-SR806.  I’m an A/V junkie and this thing is amazing.  5 HDMI inputs (need more?), THX Ultra2 Plus (the low-volume enhancements are *awesome* with young kids sleeping at home), automatic room EQ, decodes every modern audio encoding, etc.  I don’t even use the amplifier section (I have separates), but it’s turning out to be the best Pre/Pro I’ve ever owned.  Sounds fabulous on my gear.
  • iPhone App Store.  That thing is a game changer, and we’re barely seeing the tip of the iceberg.  All the other players have to respond – which is great for you and I.  And talk about a platform that’s a dream to develop on!
So there you have it.  Those are the most important pieces of tech I’m watching these days.  I’ll *definitely* be writing up our ZFS experiments as they come along and I have interesting data to share.  Stay tuned.  
 
Oh, and if you’re curious about what I *wish* was on the list, there’s really only one thing:  iTunes syncing.  I have two desktops (one at my office, one at home) and two laptops, plus my wife has accounts on my computers.  Keeping those all in sync so that when I update a playlist at the office, the update is waiting for me at home, is a nightmare.  I’d pay lots of money if someone could solve that – seems like iTunes + AWS + a smart coder = solved, no?  Wish I had some time….

Thoughts on Google App Engine

April 8, 2008 23 comments

First:  Very cool.

Next:  I think it’s interesting that Google has basically taken a sniper scope out and aimed it at a specific cloud computing target.  App Engine is only for web applications.  No batch computing, no cron jobs, no CPU/disk/network access, etc.  

I think this is very smart of Google.  Rather than attacking Amazon head-on, Google has realized there’s a huge playing field for cloud computing, and are attempting to dominate another portion of it, one where they have a lot of expertise.  Very good business move, imho.

Will we use it?  I wouldn’t be surprised.  I’ve long thought that we’ll continue to mix in web services from a variety of providers, and it looks like App Engine can solve a slice of our datacenter need that other providers don’t yet provide.  

I’m more than a little concerned, though, by how much vendor lock-in there is with App Engine.  At first glance, it doesn’t look like the apps will be portable at all.  If I want to switch providers, or add in other providers so I’m not relying solely on Google, I’m outta luck.  

I’m hopeful other languages get supported, too.  I think Python is great – don’t get me wrong – but we have a lot more experience with other languages, so there’ll be a learning curve.

Finally, I’m dying to find out what pricing for an application of our scale will look like.  I can see some immediate, obvious things I’d like to try to do on App Engine, but the beta limits aren’t gonna cut it for us.  😦

Will it replace Amazon?  It sure doesn’t look like it from where I sit.  In fact, I don’t see this as much of a competitor to Amazon Web Services.  There’s some overlap in some small area (hosted web apps on EC2), but I doubt that’s the bulk of Amazon’s business.  As I said, we’ll likely end up using both (and other providers as they come along, too).

My favorite bit?  In theory, Google has solved the data scaling problem.  I don’t mean raw binary (blob) storage, which S3, SmugFS, MogileFS, and plenty of other things have solved, but the “database” scaling problem.  Every popular web app runs into this problem, and it’s typically solved with a combination of memcached, federation, and replication.  But it’s messy.  In theory, Google has automated that piece for us.  I can’t wait to play with it and see if that’s true.

I also can’t wait to see who else is going to wade into this fray.  Sun?  Microsoft?  Yahoo?  IBM?  

Bring it on!