Archive
Great idea! Google *should* open their index!
tl;dr: Serving dozens (hundreds?) of crawlers is expensive. We could use an open index. Google’s?
Just read Tom Foremski’s article, “Google Exec Says It’s A Good Idea: Open The Index And Speed Up The Internet.” And I have to say, it’s a great idea!
I don’t have hard numbers handy, but I would estimate close to 50% of our web server CPU resources (and related data access layers) go to serving crawler robots. Stop and think about that for a minute. SmugMug is a Top 300 website with tens of millions of visitors, more than half a billion page views, and billions of HTTP / AJAX requests (we’re very dynamic) each month. As measured by both Google and Alexa, we’re extremely fast (faster than 84% of sites) despite being very media heavy. We invest heavily in performance.
And maybe 50% of that is wasted on crawler robots. We have billions of ‘unique’ URLs since we have galleries, timelines, keywords, feeds, etc. Tons of ways to slice and dice our data. Every second of every day, we’re being crawled by Google, Yahoo, Microsoft, etc. And those are the well-behaved robots. The startups who think nothing of just hammering us with crazy requests all day long are even worse. And if you think about it, the robots are much harder to optimize for – they’re crawling the long tail, which totally annihilates your caching layers. Humans are much easier to predict and optimize for.
Worst part about the whole thing, though? We’re serving the exact same data to Google. And to Yahoo. And to Microsoft. And to Billy Bob’s Startup. You get the idea. For every new crawler, our costs go up.
We spend significant effort attempting to serve the robots quickly and well, but the duplicated effort is getting pretty insane. I wouldn’t be surprised if that was part of the reason Facebook revised their robots.txt policy, and I wouldn’t be surprised to see us do something similar in the near future, which would allow us to devote our resources to the crawlers that really matter.
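To make the robots.txt idea concrete, here is a minimal sketch of an allowlist-style policy, checked with Python's standard `urllib.robotparser`. The crawler names, rules, and `example.com` URLs are illustrative assumptions, not SmugMug's or Facebook's actual policy. Keep in mind that robots.txt is advisory, so it only helps with crawlers that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: welcome a couple of named crawlers, turn everyone else away.
# The user-agent names and rules below are assumptions for illustration only.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: bingbot
Disallow:

User-agent: *
Disallow: /
"""


def allowed(user_agent: str, url: str) -> bool:
    """Return True if the sketch policy above lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("Googlebot", "bingbot", "BillyBobBot"):
        print(agent, allowed(agent, "https://example.com/gallery/1"))
```

An empty `Disallow:` line means "nothing is off limits" for that crawler, while the final `User-agent: *` block is the catch-all that applies to every crawler not named above it.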
Anyway, if a vote were held to decide whether the world needs an open-to-all index, rather than all this duplicated crawling, I’d vote YES! And SmugMug would get even faster than it is today.
On a totally separate, but sorta related issue, Google shouldn’t have to do anything at all to their algorithms. Danny Sullivan has some absolutely brilliant satire on that subject.
I owe Apple an apology
In my last post, I wrote that Apple wasn’t giving App developers access to the high quality 720p video recordings from your Library on iPhone 4.
I was wrong.
The documentation wasn’t clear and we made a bad assumption. The other developers we talked to all concurred that they couldn’t get access to the high-quality Library videos, either. For years, Apple didn’t let developers get access to the full-resolution photos from your Library, which they now permit, so we assumed that’s what was going on here, too. Thank goodness we were wrong.
Sorry Apple!
Go grab the latest SmugShot and enjoy blur-free videos. 🙂
Upload iPhone 4 HD Video over the air!
Seems to be quite a bit of noise online about how you can’t upload HD video from your awesome new iPhone 4 over the air. Even Steve Jobs has weighed in.
I have good news – you can do it today. Easily. Just install SmugShot, sign up for a free trial of SmugMug (you’ll get a nice discount if you sign up through SmugShot), and upload HD video to your heart’s content. You’ll need a Power or Pro account, but you can use either free for 14 days.
Go wild!
One caveat: Apple doesn’t let us get access to the high-res videos from your Library. So you’ll need to film your HD movies using SmugShot. We’re hoping this gets fixed – all versions of iOS prior to 4 didn’t let you get access to high-res photos via your Library either, but they fixed that in iOS 4. I’m assuming they’ll do the same for video at some point (and Steve seems to imply it, too). Update: this is fixed in the latest version of SmugShot, and it was our fault, not Apple’s!
(For existing SmugMug family members, yes, this means Power Users can now upload 1080p HD video to their accounts. As always, we’re listening.)