Great idea! Google *should* open their index!
tl;dr: Serving dozens (hundreds?) of crawlers is expensive. We could use an open index. Google’s?
Just read Tom Foremski’s article, “Google Exec Says It’s A Good Idea: Open The Index And Speed Up The Internet.” And I have to say, it’s a great idea!
I don’t have hard numbers handy, but I would estimate close to 50% of our web server CPU resources (and related data access layers) go to serving crawler robots. Stop and think about that for a minute. SmugMug is a Top 300 website with tens of millions of visitors, more than half a billion page views, and billions of HTTP / AJAX requests (we’re very dynamic) each month. As measured by both Google and Alexa, we’re extremely fast (faster than 84% of sites) despite being very media heavy. We invest heavily in performance.
And maybe 50% of that is wasted on crawler robots. We have billions of ‘unique’ URLs since we have galleries, timelines, keywords, feeds, etc. Tons of ways to slice and dice our data. Every second of every day, we’re being crawled by Google, Yahoo, Microsoft, etc. And those are the well-behaved robots. The startups who think nothing of just hammering us with crazy requests all day long are even worse. And if you think about it, the robots are much harder to optimize for – they’re crawling the long tail, which totally annihilates your caching layers. Humans are much easier to predict and optimize for.
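To make the caching point concrete, here is a toy simulation, not a measurement of our real traffic. It runs two made-up request streams through a small LRU cache: "human" traffic where 90% of requests go to 500 popular pages, and a "crawler" that walks the long tail and touches every URL exactly once. The cache size, URL counts, and traffic mix are all assumptions picked for illustration.

```python
import random
from collections import OrderedDict


def hit_rate(requests, capacity):
    """Replay a request stream through an LRU cache and return the hit ratio."""
    cache = OrderedDict()
    hits = 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)          # mark as most recently used
        else:
            cache[url] = True
            if len(cache) > capacity:
                cache.popitem(last=False)   # evict the least recently used entry
    return hits / len(requests)


# All numbers below are illustrative assumptions, not real traffic data.
NUM_URLS = 100_000      # total distinct URLs on the hypothetical site
HOT_PAGES = 500         # pages that humans visit most
CAPACITY = 1_000        # cache slots
NUM_REQUESTS = 50_000

rng = random.Random(42)

# Humans: 90% of requests hit the hot pages, 10% wander anywhere.
human_requests = [
    rng.randrange(HOT_PAGES) if rng.random() < 0.9 else rng.randrange(NUM_URLS)
    for _ in range(NUM_REQUESTS)
]

# Crawler: walks the long tail, requesting each URL exactly once.
crawler_requests = list(range(NUM_REQUESTS))

human_rate = hit_rate(human_requests, CAPACITY)
crawler_rate = hit_rate(crawler_requests, CAPACITY)

print(f"human hit rate:   {human_rate:.2%}")
print(f"crawler hit rate: {crawler_rate:.2%}")
```

In this sketch the crawler never gets a cache hit, because it never asks for the same URL twice, while the human stream is served mostly from cache. Real crawlers and real caches are messier, but the shape of the problem is the same.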
Worst part about the whole thing, though? We’re serving the exact same data to Google. And to Yahoo. And to Microsoft. And to Billy Bob’s Startup. You get the idea. For every new crawler, our costs go up.
We spend significant effort attempting to serve the robots quickly and well, but the duplicated effort is getting pretty insane. I wouldn’t be surprised if that was part of the reason Facebook revised their robots.txt policy, and I wouldn’t be surprised to see us do something similar in the near future, which would allow us to devote our resources to the crawlers that really matter.
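For the curious, here is roughly what a "only the crawlers that matter" robots.txt policy could look like, checked with Python's standard `urllib.robotparser`. This is a hypothetical sketch: it is not our actual robots.txt, nor Facebook's, and the `BillyBobBot` user agent and the example URL are invented for illustration. Keep in mind that robots.txt is only a request, so it only helps with robots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: let one named crawler in, ask everyone else to stay out.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Example URL and crawler names are assumptions for this sketch.
url = "https://example.com/gallery/12345"

googlebot_allowed = parser.can_fetch("Googlebot", url)
startup_allowed = parser.can_fetch("BillyBobBot", url)

print("Googlebot allowed:  ", googlebot_allowed)
print("BillyBobBot allowed:", startup_allowed)
```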
Anyway, if a vote were held to decide whether the world needs an open-to-all index, rather than all this duplicated crawling, I’d vote YES! And SmugMug would get even faster than it is today.
On a totally separate, but sorta related issue, Google shouldn’t have to do anything at all to their algorithms. Danny Sullivan has some absolutely brilliant satire on that subject.
Yahoo adds SmugMug support!

Photo: “Yahoo! in cloud OR Hadoop? (Яху в облаках)” by Alexander & Natalie
tl;dr: Yahoo adds SmugMug support to Profiles. Windows Live coming. Lots of other services, too.
Wow, what a pleasant surprise! Woke up this morning to this story on TechCrunch about 20 new services Yahoo had added to Yahoo Profiles (here’s mine). Lo and behold, SmugMug is one of them! In fact, in Yahoo’s blog post about the new features, SmugMug was the one mentioned for photos. Cool!
As far as I know, we haven’t talked to Yahoo about this at all – which is part of what makes this so great. Microsoft was supposed to have rolled something like this out to Windows Live profiles a while ago, but I still haven’t seen it drop. We’re very excited about that, too, but the two companies’ approaches were very different: Microsoft came over, chatted with us about the product, then had us sign a contract to participate. That was months ago, and I have no idea when it’s actually coming. Yahoo, on the other hand, seems to have just built it and shipped it.
I can see the arguments for both approaches: Microsoft is probably being extra careful about privacy, and working through their internal rules and regulations about re-using user generated content. Yahoo, on the other hand, is scrambling to catch up now as the underdog. I assume Yahoo realized that SmugMug already has strong privacy controls around our feeds and simply hit the gas – full speed ahead.
Either way, what’s especially heartening is the number of sites, services, and pieces of software that now support SmugMug. At The Crunchies last week, we weren’t nominated (we won for Best Design last year), but it still felt like we were winning – many of the winners use or integrate with us: Google Reader, Windows Live Mesh, Cooliris, lots of companies using Amazon Web Services, lots of apps on the iPhone 3G, and FriendFeed. Very cool.
(And all of that despite what we *know* is terrible and/or nonexistent documentation around our feeds. Yes, we’ll work on that.)