Archive
Great idea! Google *should* open their index!
tl;dr: Serving dozens (hundreds?) of crawlers is expensive. We could use an open index. Google’s?
Just read Tom Foremski’s article “Google Exec Says It’s A Good Idea: Open The Index And Speed Up The Internet.” And I have to say, it’s a great idea!
I don’t have hard numbers handy, but I would estimate close to 50% of our web server CPU resources (and related data access layers) go to serving crawler robots. Stop and think about that for a minute. SmugMug is a Top 300 website with tens of millions of visitors, more than half a billion page views, and billions of HTTP / AJAX requests (we’re very dynamic) each month. As measured by both Google and Alexa, we’re extremely fast (faster than 84% of sites) despite being very media heavy. We invest heavily in performance.
And maybe 50% of that is wasted on crawler robots. We have billions of ‘unique’ URLs since we have galleries, timelines, keywords, feeds, etc. Tons of ways to slice and dice our data. Every second of every day, we’re being crawled by Google, Yahoo, Microsoft, etc. And those are the well-behaved robots. The startups who think nothing of just hammering us with crazy requests all day long are even worse. And if you think about it, the robots are much harder to optimize for – they’re crawling the long tail, which totally annihilates your caching layers. Humans are much easier to predict and optimize for.
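To make that concrete, here’s a toy simulation of what long-tail crawling does to an LRU cache (all numbers are invented; this isn’t our real stack). Humans concentrate their requests on a small set of popular pages; a crawler sweeps the whole URL space, so almost every request misses:

```python
import random
from collections import OrderedDict

URLS = 1_000_000     # distinct 'unique' URLs on the site (made up)
CACHE_SIZE = 10_000  # rendered pages the cache can hold (made up)
REQUESTS = 100_000

def hit_rate(requests):
    """Replay a request stream against a simple LRU cache."""
    cache, hits = OrderedDict(), 0
    for url in requests:
        if url in cache:
            hits += 1
            cache.move_to_end(url)          # mark as recently used
        else:
            cache[url] = True
            if len(cache) > CACHE_SIZE:
                cache.popitem(last=False)   # evict least recently used
    return hits / len(requests)

# Humans: heavily skewed toward popular pages (rough Pareto-style skew).
humans = [int(random.paretovariate(1.2)) % URLS for _ in range(REQUESTS)]
# Crawlers: a uniform sweep across the long tail.
robots = [random.randrange(URLS) for _ in range(REQUESTS)]

print(f"human hit rate:   {hit_rate(humans):.0%}")   # high
print(f"crawler hit rate: {hit_rate(robots):.0%}")   # ~CACHE_SIZE/URLS, i.e. ~1%
```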
Worst part about the whole thing, though? We’re serving the exact same data to Google. And to Yahoo. And to Microsoft. And to Billy Bob’s Startup. You get the idea. For every new crawler, our costs go up.
We spend significant effort attempting to serve the robots quickly and well, but the duplicated effort is getting pretty insane. I wouldn’t be surprised if that was part of the reason Facebook revised their robots.txt policy, and I wouldn’t be surprised to see us do something similar in the near future, which would allow us to devote our resources to the crawlers that really matter.
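(For the curious: a whitelist-style robots.txt, in the spirit of what Facebook did, looks roughly like this. This is a sketch, not our actual file, and the user-agent list is just illustrative.)

```
# Hypothetical whitelist-style robots.txt (illustrative only)

# Let the big crawlers in everywhere (an empty Disallow means allow all)
User-agent: Googlebot
Disallow:

User-agent: Slurp
Disallow:

User-agent: msnbot
Disallow:

# Everyone else: crawl nothing
User-agent: *
Disallow: /
```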
Anyway, if a vote were held to decide whether the world needs an open-to-all index, rather than all this duplicated crawling, I’d vote YES! And SmugMug would get even faster than it is today.
On a totally separate, but sorta related issue, Google shouldn’t have to do anything at all to their algorithms. Danny Sullivan has some absolutely brilliant satire on that subject.
Yahoo adds SmugMug support!

Photo: “Yahoo! in cloud OR Hadoop?” (“Yahoo in the clouds”) by Alexander & Natalie
tl;dr: Yahoo adds SmugMug support to Profiles. Windows Live coming. Lots of other services, too.
Wow, what a pleasant surprise! Woke up this morning to this story on TechCrunch about 20 new services Yahoo had added to Yahoo Profiles (here’s mine). Lo and behold, SmugMug is one of them! In fact, in Yahoo’s blog post about the new features, SmugMug was the one mentioned for photos. Cool!
As far as I know, we haven’t talked to Yahoo about this at all – which is part of what makes this so great. Microsoft was supposed to have rolled something like this out to Windows Live profiles a while ago, but I still haven’t seen it drop. We’re very excited about that, too, but the two companies’ approaches were very different: Microsoft came over, chatted with us about the product, then had us sign a contract to participate. That was months ago, and I have no idea when it’s actually coming. Yahoo, on the other hand, seems to have just built it and shipped it.
I can see the arguments for both approaches: Microsoft is probably being extra careful about privacy, and working through their internal rules and regulations about re-using user generated content. Yahoo, on the other hand, is scrambling to catch up now as the underdog. I assume Yahoo realized that SmugMug already has strong privacy controls around our feeds and simply hit the gas – full speed ahead.
Either way, what’s especially heartening is the number of sites, services, and pieces of software that now support SmugMug. At The Crunchies last week, we weren’t nominated (we won for Best Design last year), but it still felt like we were winning – many of the winners use or integrate with us: Google Reader, Windows Live Mesh, Cooliris, lots of companies using Amazon Web Services, lots of apps on the iPhone 3G, and FriendFeed. Very cool.
(And all of that despite what we *know* is terrible and/or nonexistent documentation around our feeds. Yes, we’ll work on that.)
Seattle/Redmond dinner update
Sitting in the Virgin America part of the SFO International terminal. Talk about an awesome terminal. Can’t wait to fly Virgin, too – my first time (hah!). I doubt anyone on the plane throws down in DOOM the way I do, so I expect a river of tears in the aisle.
Wish I could be at the Google Campfire tonight, cuz the news is awesome, but Microsoft got to me first, so up to the frigid North I go… Hope this MS stuff is as good as it sounds.
It turns out the Seattle Photography Group is meeting on Wednesday night. Cool! So I’m going to that. You should come. 🙂
Afterwards, I’m hoping we’ll head out for some food. If you want to get food with us, please leave a comment so we have at least a rough head count. Attending the SPG meeting first isn’t mandatory, but I’m sure you’d be welcome there too.
On Tuesday, I already have dinner plans, but wouldn’t mind hanging out with some geeks afterwards. Doubt I’m up for going into Seattle, though, so if you’re in Redmond or Bellevue, holler.
(I’ll try to email everyone who sent emails, commented, and twittered – but if you haven’t heard from me, re-comment because I’m lame)
Thoughts on the new IE compatibility switch
Over on IEBlog and A List Apart, they detail a new flag for the upcoming IE8 that would let you “lock” the browser down to older rendering behavior if your site depends on the old, broken quirks of IE6 or IE7.
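For reference, the proposed switch is a meta tag (it can also be sent as an HTTP response header). As described in those posts, something like this would freeze a page in IE7’s rendering behavior even when viewed in IE8:

```html
<!-- Proposed IE8 version targeting, as described on IEBlog / A List Apart:
     tell the browser to render this page with the IE7 engine. -->
<meta http-equiv="X-UA-Compatible" content="IE=7" />
```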
This is a bad idea. The Safari team has a great write-up about why they think it’s a bad idea, which I agree with, but I also have an additional take:
Pages and sites that are likely to care about this are poorly written and poorly maintained. Microsoft created this problem themselves when they let IE6 sit idle for more than half a decade, and now they have to deal with it. Instead of letting site owners flag their sites as broken (that’s effectively what this switch does), why not finally force them to fix their sites and improve the browsing experience for everyone (not to mention improve the stability, speed, and maintainability of IE’s own codebase)?
If someone owned a car but didn’t know how to drive it properly, would we bend the driving laws to let them on the road? Of course not. Reasonable adherence to standards, and a steady push forward, are the only things keeping the web browser mess from descending into pure chaos.