January 31, 2010

Google's Algorithms

Cracking the Google Algorithm, and Understanding Search Patents with Ted “tedster” Ulle, Stuntdubl (Jan 28)

Outside of Google itself, Ted Ulle of WebMasterWorld is the expert on Google's algorithms. In this interview he answers questions about the "5 most significant algorithm changes" and the "top 5 changes in the next 5 years".

Of particular interest:

+ "Phrase-based indexing, as described in the 2006 patents, brought a deeper level of semantic intelligence to the search results."

+ "Geo-located results began to create different rankings even for various areas of the same US and UK city somewhere around 2005 or so."

+ "Google’s user "intention engine" has had a major effect, and that rolled out in a big way in 2009. This was coupled with a kind of automated taxonomy of query terms."

This was especially interesting because it suggests that Google clusters results in the background without showing the "taxonomy labels"; instead, it selects results from the clusters. See Ted's post from August 2009:

I've been studying one of the "phrase-based indexing" patents that Google filed, in particular Automatic taxonomy generation in search results using phrases [patft1.uspto.gov]. It's giving me new thoughts on how search results can be blended to include representatives from different clusters, or different taxonomies related to the original query phrase.

Walking through the patent's logic: a search phrase is associated with several clusters of web pages. Each one of those clusters is a group that includes some other phrase, in addition to the requested keyword phrase. This assumes that the phrases that create a cluster are groups of words that offer what the patent calls "information gain".

This patent would automatically create a taxonomy label for each cluster, based on that second phrase. A given web page could be a member of more than one cluster, and therefore be part of several different taxonomies related to the principal search term.

From Webmaster World - Google Search News - Blended Results, QDF and User Intention at Google
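To make Ted's walkthrough of the patent more concrete, here is a toy sketch of the idea. It is not Google's method and does not reproduce the patent's formulas. The "information gain" score used here is an assumed stand-in: observed co-occurrence of two phrases divided by the co-occurrence expected if they were independent. The threshold `min_gain`, the round-robin `blend` step, and the sample pages are all hypothetical. The sketch shows the three points from the quote: pages are grouped by a second phrase that co-occurs with the query phrase, that phrase serves as the cluster's taxonomy label, and one page can belong to several clusters.

```python
from collections import defaultdict
from itertools import zip_longest


def information_gain(docs, a, b):
    """Hypothetical score: observed co-occurrence of phrases a and b,
    divided by the count expected if the two phrases were independent."""
    n = len(docs)
    count_a = sum(1 for phrases in docs.values() if a in phrases)
    count_b = sum(1 for phrases in docs.values() if b in phrases)
    both = sum(1 for phrases in docs.values() if a in phrases and b in phrases)
    if count_a == 0 or count_b == 0:
        return 0.0
    expected = count_a * count_b / n
    return both / expected


def build_clusters(docs, query, min_gain=1.0):
    """Group pages containing `query` by each other phrase they contain.
    The second phrase doubles as the cluster's taxonomy label, and a page
    can land in several clusters. Weakly related phrases are dropped."""
    clusters = defaultdict(list)
    for url, phrases in docs.items():
        if query not in phrases:
            continue
        for phrase in phrases - {query}:
            if information_gain(docs, query, phrase) > min_gain:
                clusters[phrase].append(url)
    return dict(clusters)


def blend(clusters, limit):
    """Hypothetical blending step: take one representative from each
    cluster in turn (labels sorted for a stable order), skipping pages
    that were already chosen from another cluster."""
    chosen, seen = [], set()
    columns = [clusters[label] for label in sorted(clusters)]
    for row in zip_longest(*columns):
        for url in row:
            if url is not None and url not in seen:
                seen.add(url)
                chosen.append(url)
                if len(chosen) == limit:
                    return chosen
    return chosen


# Made-up sample index: each page is reduced to the set of phrases it contains.
DOCS = {
    "a.com": {"jaguar", "car", "engine"},
    "b.com": {"jaguar", "car"},
    "c.com": {"jaguar", "cat", "rainforest"},
    "d.com": {"jaguar", "cat"},
    "e.com": {"car", "engine"},
    "f.com": {"rainforest", "cat"},
    "g.com": {"weather"},
    "h.com": {"news"},
    "i.com": {"jaguar", "car", "cat"},
}

if __name__ == "__main__":
    clusters = build_clusters(DOCS, "jaguar")
    print(clusters)               # "car" and "cat" clusters; i.com sits in both
    print(blend(clusters, 4))     # alternates between the two clusters
```

With this data, "engine" and "rainforest" co-occur with "jaguar" no more often than chance would predict, so they never become labels, while "car" and "cat" do. The blended list then alternates between the car-related and cat-related pages, which is one simple way to read Ted's point about results including "representatives from different clusters".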

+ "The beginnings of sentiment analysis may begin to show up in the next few years. I expect to see it first on the level of rating for where content falls on a fact-to-opinion spectrum. Full sentiment analysis (rating content on a "favorable-to-critical" opinion spectrum) is already in use for some social media monitoring, but that is probably too big a technical challenge to expect Google to go with it in the general search results."

Posted by Gwen at January 31, 2010 03:13 PM