May 08, 2008

Analyzing Search Engine Results

In teaching the current run of the Mastering Web Searching course, I've discovered that the major search engines have been changing how they handle words and syntax.

Word Variants and use of +

Google has been stemming words for some time - enter the word smudge and Google will bold smudges and smudging in the search results. In many cases it looks for the singular, the plural, and other word variants such as the gerund. The variants can be a bit bizarre - it considers principal a variant of principle.

If you don't want Google to do this, use the + sign -- e.g., +smudge.

Live operates the same way, as do Yahoo and Ask now. In all cases, use + if you wish to stop it. However, having these extra terms automatically does make the query richer and usually improves results.
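If you build queries in a script, the + trick is easy to apply mechanically. Here is a minimal Python sketch; the function name require_exact is my own invention, not anything the engines provide, and it only assumes the + prefix behaves as described above.

```python
def require_exact(query, exact_terms):
    """Prefix the listed terms with + so the engine skips word variants.

    Hypothetical helper: it only rewrites the query string; it does not
    contact any search engine.
    """
    wanted = {term.lower() for term in exact_terms}
    words = []
    for word in query.split():
        # Add + only to the requested terms, and never double it up.
        if word.lower() in wanted and not word.startswith("+"):
            words.append("+" + word)
        else:
            words.append(word)
    return " ".join(words)


print(require_exact("concerts Vancouver June 2008 listings", ["listings"]))
# prints: concerts Vancouver June 2008 +listings
```

Terms you leave out of the list stay open to stemming, which is usually what you want.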

Boolean at Yahoo

Yahoo Search has dropped its acceptance of conventional Boolean operators other than OR. It used to accept AND NOT and NOT. No more - now it even bolds NOT in the search results, which suggests it is being treated as an ordinary search term. Therefore, use the minus sign to exclude.
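For anyone with saved searches written in the old style, a small Python sketch of the conversion might look like this. The function name boolean_to_minus is hypothetical, and I am assuming operators are typed in capitals and that AND can simply be dropped because the engine already requires all terms by default.

```python
def boolean_to_minus(query):
    """Rewrite NOT / AND NOT as a minus sign on the following term.

    Assumptions: operators are uppercase, AND is implicit and can be
    dropped, and OR is left alone because it is still accepted.
    """
    words = []
    negate_next = False
    for token in query.split():
        if token == "AND":
            continue  # implicit in the query, so drop it
        if token == "NOT":
            negate_next = True  # exclude whatever term comes next
            continue
        if negate_next:
            token = "-" + token
            negate_next = False
        words.append(token)
    return " ".join(words)


print(boolean_to_minus("jaguar AND NOT car"))
# prints: jaguar -car
```

A quoted phrase after NOT comes out as -"big cat", since the minus sign lands in front of the opening quote.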

Ask.com - To Exclude a Domain

Ask.com does not let us exclude a site when using site as the prefix. Maybe this has been true for some time and I didn't notice. Enter -- native american -site:gov -- and you get gov sites rather than having them removed. But they do disappear with -gov. It's possible that Ask strips out all results with .gov in the domain name. That seems to be the case with aboriginals in australia -gov, and also with canada parliament -gc.

Sponsored Ads

Speaking of Ask.com, sponsored results are filling the middle section. A smart answer will still appear first, if available, followed by as many as five sponsored ads, and then another set at the end of the page. Makes you rush to change options to show 50 or 100 results per page. Too bad the cookies don't hold to the next session. The three-pane design is nice, but not at the cost of having to deal with sponsored ads front and center all the time.

Counting Results

Google's counter has been out of whack the past few days. I've seen several examples, but this one in which the * is used for answer format was the most dramatic.

[Image: Google counter. Showing only 2 pages.]

Google said there were 388,000 hits for information literacy is critical to * (unbelievable), but on page 2 of the results it changed the count to 71.

Have to wonder about Yahoo too. Results count is often much higher than Google's and some figures are truly astonishing.

For concerts Vancouver June 2008 listings, Google has 153,000 and Yahoo has 2.3 million. Interesting also that adding the word listings had Yahoo increase from 1.5 million. It's possible that this has something to do with the stemming / word variants picking up lists and list at both engines. Use the + sign to get more reasonable numbers - concerts Vancouver June 2008 +listings -- Yahoo has 703,000 (still very high) and Google 30,100. I don't know whose counter is wrong - maybe both.

We get a better sense of relative size from a single phrase search - "religious melancholia" -- Google has 2,260 vs Yahoo's 683.
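To see how far apart the two counters are, it helps to put the figures above side by side as ratios. This is just a quick Python sketch using the counts reported in this post; the dictionary layout and function name are my own.

```python
# Result counts as reported in this post (May 2008).
counts = {
    "concerts Vancouver June 2008 listings": {"google": 153_000, "yahoo": 2_300_000},
    "concerts Vancouver June 2008 +listings": {"google": 30_100, "yahoo": 703_000},
    '"religious melancholia"': {"google": 2_260, "yahoo": 683},
}


def yahoo_to_google_ratio(entry):
    """Return how many Yahoo hits were reported per Google hit."""
    return entry["yahoo"] / entry["google"]


for query, entry in counts.items():
    print(f"{query}: Yahoo/Google = {yahoo_to_google_ratio(entry):.2f}")
```

Yahoo reports roughly 15 times Google's count for the first query and about 23 times for the + version, yet only about a third of Google's count for the phrase search. The ratio swings too widely to say much about relative index size from the multi-word queries.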

Thanks to the students of the MWS class for these examples.

Posted by Gwen at May 8, 2008 06:21 PM