Web Accelerator - the other reason

by @jehiah on 2005-05-16 04:07UTC

It didn’t occur to me until 9pm, when I was halfway from Portsmouth, VA to Annapolis, MD, that Cringely was wrong. It might not even be that he was wrong entirely, but that he was only looking at one side of the coin.

Cringely, in his most recent article about Inflection Points, talks about the new Google Web Accelerator and gives some reasons why Google released it.

The second inflection point this week was made by Google with its Google Web Accelerator. The company has generally downplayed the Accelerator as simple research – a test that required a few thousand users. But it is much more than that. First, Google hasn’t yet announced a beta, and then changed its mind about what it’s beta testing. Every Google service that has begun as a beta turns eventually into an official extension of the Googleplex. Froogle, their comparison buying service, isn’t going away, nor are image and video search or GMail. The same goes for this Google Web Accelerator.

…

But why? Why spend all this money, make this heroic effort, just to make web surfing twice as fast? The first reason is because Google can do it. The company likes big stretches like this. The second reason is because everybody else CAN’T do it. The technology required is so breathtaking and audacious that even a Microsoft or IBM wouldn’t dare to try it and certainly Yahoo won’t. The best Yahoo can hope for is that Google fails, which they probably won’t. And the final reason for doing this is because it co-opts every ISP and web page owner. If surfing can be doubled in speed for nothing, of course nearly everyone will go for it. But that means every AOL customer becomes a de facto Google customer and this page becomes a de facto Google service that costs them nothing to produce.

Google is offering a proxy to make web browsing twice as fast. I can buy that, but why do they only need a few thousand users right now? And is that really research worthy of all those brilliant minds?

Google does do cool things, but I think they normally serve a more worthy purpose than making web browsing faster for some of us. After all, running a proxy service can create [and has created] a lot of problems for some of those serving up webpages. Perhaps faster browsing isn’t their real intent at all.

Let’s step back a minute and look at what other problems we know Google is researching, keeping in mind their philosophy of “do no evil.”

While Google is first and foremost (from a revenue perspective) an ad hosting company, its bread and butter (and name recognition) still comes from search. One of the big things Google is wrestling with when it comes to relevant search results is SPAM results: websites that give Google one page but serve users a different page altogether, thereby deceptively gaining a position in search results. These SPAM results degrade the search experience because end users get irrelevant results.

Why can’t they fix this problem, you ask? They have a Report a Spam Result page on their site! As you can see, though, this is a manual process, and while it’s good to have, it does not scale to cover a search collection of 8 billion pages. Why don’t they automate the process of discovering spam in their results? Because, sadly, they have pledged to “do no evil,” and that leaves them limited options.

A little fundamental knowledge of how the web works is in order before we can explain this fully. Every page request that Google’s spider makes when building its index includes the user-agent string “GoogleBot”. This lets webmasters know that a search engine, and not a real user, is crawling their site.
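For readers who haven’t seen it from the server side: the user-agent is just an HTTP request header, and the server can inspect it however it likes. Here is a minimal sketch in Python, assuming a simple case-insensitive substring match. Google’s crawler actually spells its token `Googlebot` (for example `Googlebot/2.1 (+http://www.google.com/bot.html)`), and the function name `looks_like_googlebot` is hypothetical:

```python
from typing import Mapping


def looks_like_googlebot(headers: Mapping[str, str]) -> bool:
    """Return True if the request's User-Agent header mentions Googlebot.

    This only checks what the client *claims* to be: the header is set by
    the client and is trivially forged, so it proves nothing about who
    actually sent the request.
    """
    # HTTP header names are case-insensitive, so normalise them before lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    user_agent = normalised.get("user-agent", "")
    # Case-insensitive substring match on the crawler's product token.
    return "googlebot" in user_agent.lower()


if __name__ == "__main__":
    crawler = {"User-Agent": "Googlebot/2.1 (+http://www.google.com/bot.html)"}
    browser = {"User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US)"}
    print(looks_like_googlebot(crawler))  # True
    print(looks_like_googlebot(browser))  # False
```

Because the check trusts whatever the client sends, it is a label, not an authentication mechanism.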

The problem, as it always seems to be, is that some websites abuse this information: when they see the “GoogleBot” user-agent, they return a page skimmed from other sites, and when the user-agent is anything else, they return an ad-infested search gateway page. Google has trouble dealing with this because of its very nature: its spider never sees the page real users get, only the skimmed one.
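The abuse described above (usually called cloaking) boils down to a single branch on the user-agent. Here is a toy model in Python, where the page bodies and the name `cloaked_response` are made up for illustration and no real site’s code is implied:

```python
from typing import Mapping

# Hypothetical page bodies for an imaginary cloaking site.
PAGE_FOR_CRAWLER = "<html><body>Article text skimmed from other sites</body></html>"
PAGE_FOR_VISITOR = "<html><body>Sponsored links! Search here! More ads</body></html>"


def cloaked_response(headers: Mapping[str, str]) -> str:
    """Return different HTML depending on whether the client claims to be Googlebot."""
    # Find the User-Agent header regardless of how its name is capitalised.
    user_agent = next(
        (value for name, value in headers.items() if name.lower() == "user-agent"),
        "",
    )
    if "googlebot" in user_agent.lower():
        return PAGE_FOR_CRAWLER  # the copy that ends up in the search index
    return PAGE_FOR_VISITOR  # the copy a real person actually lands on


if __name__ == "__main__":
    print(cloaked_response({"User-Agent": "Googlebot/2.1"}) == PAGE_FOR_CRAWLER)  # True
    print(cloaked_response({"User-Agent": "Mozilla/5.0"}) == PAGE_FOR_VISITOR)  # True
```

A spider that always announces itself only ever exercises the first branch, so nothing in its own fetch reveals the second.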

How can the Google Web Accelerator help with this? It gives Google legitimate access to what a normal user sees when browsing the web (i.e., pages requested without the “GoogleBot” user-agent). With this information, Google can compare those pages to the copies its search engine spider collected, and if there is more than a reasonable discrepancy, it knows to blacklist the site.
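The comparison step could be as simple as a text-similarity score with a cutoff. Here is a minimal sketch using Python’s standard `difflib.SequenceMatcher`. The crude tag stripping, the 0.5 threshold and the name `looks_cloaked` are all assumptions for illustration, not a claim about what Google actually does:

```python
import difflib
import re


def visible_text(html: str) -> str:
    """Crudely strip tags and collapse whitespace so markup noise doesn't dominate."""
    text = re.sub(r"<[^>]+>", " ", html)
    return " ".join(text.split())


def similarity(crawler_copy: str, visitor_copy: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the visible text is identical."""
    matcher = difflib.SequenceMatcher(None, visible_text(crawler_copy), visible_text(visitor_copy))
    return matcher.ratio()


def looks_cloaked(crawler_copy: str, visitor_copy: str, threshold: float = 0.5) -> bool:
    """Flag a page whose spider copy and visitor copy are less similar than the threshold."""
    return similarity(crawler_copy, visitor_copy) < threshold


if __name__ == "__main__":
    indexed = "<p>How to repot a fern: loosen the roots and move it to a larger pot.</p>"
    served = "<p>BUY NOW!!! 50% OFF!!!</p>"
    print(looks_cloaked(indexed, indexed))  # False: identical copies
    print(looks_cloaked(indexed, served))   # True: almost nothing in common
```

Scoring the visible text rather than demanding an exact match tolerates small legitimate differences such as rotating ads or timestamps; where the threshold sits trades false alarms against missed cloaking.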

It has very little to do with watching what a user gets, or how quickly they get it - though those are nice benefits. I believe it has everything to do with watching what the website serves up as a result.