
WSG Newsletter:
Fallout from Yahoo! Search

Issue: June 9, 2004

In April 2004, Yahoo pulled the plug on AltaVista and Alltheweb to cries of consternation from all who write about web search. The sites for these once-eminent search engines still exist, but the databases of indexed web pages and many of their search features are no more. Today searches at AltaVista, Alltheweb, and all the Inktomi-based search services are almost indistinguishable from Yahoo.

What remains? What alternatives do we have for what was lost? What is the larger significance?

Some History

Yahoo built the new engine using parts of Inktomi search technology to create a database that is likely as large as Inktomi’s 3 billion pages and possibly close to Google’s 4 billion.

Yahoo, like other search services, struggled for revenue during the dotcom bust and the waning of banner advertisements. Overture (previously GoTo) hit on the idea of selling small text ads that were keyword-activated. Sites that wanted to be found would bid for keywords. It was the magic formula for great fortunes. Google, which was then known for its "pure search", followed up with its own money-making version for paid listings.

Yahoo charged sites for being listed in the directory but it needed to do more. Its history of acquisitions began in 2002 when it bought Inktomi. This engine powered other services, notably MSN Search and Hotbot in the US and the UK.

Early in 2003 Overture bought AltaVista, a former search star that suffered when searchers discovered Google, and Alltheweb, once the showcase for FAST Search technology. People feared then that AltaVista and Alltheweb would be neglected as Overture exploited the technologies for better placement of the paid listings. There was little time to find out. Yahoo gobbled up Overture only a few months later (October 2003). In February 2004 it launched Yahoo! Search with a new crawler - a blend of the Inktomi, Alltheweb, and AltaVista crawlers - and a new database.

Today versions of Yahoo! Search are used at AltaVista, Alltheweb, and, through the former Inktomi customers, MSN, Hotbot, Hotbot.UK, and Lycos. It’s a web-search monoculture.

Comparing Results

Yahoo will return the largest number of results for any given query, while the others will return about a quarter of Yahoo's number. It’s possible that the satellite engines don’t include pages that Yahoo considers very similar – the kind of results that bring up the message "In order to show you the most relevant results, we have omitted some entries very similar to the ones already displayed."

Descriptions of results tend to be the same across all engines, and consist of excerpts showing the keywords roughly in context. Occasionally AltaVista and Alltheweb may have longer extracts taken from the first paragraph.

On the matter of ranking, Yahoo and AltaVista seem to be most similar. In a search for baseball taxonomy the first 10 hits were identical at the two engines. Alltheweb, Hotbot, and MSN seem to be in a second group where Alltheweb had 8 of the 10 hits Hotbot listed, and Hotbot had 6 out of 10 at MSN.

Diagram showing overlap among Yahoo-based search engines

Seeing the degree of overlap in results is easily done at Thumbshots Ranking (ranking.thumbshots.com). This site will compare the first 100 results from Alltheweb, AltaVista, Google, MSN, Teoma, Yahoo, and Wisenut and determine the percentage of common results.

On the query baseball taxonomy AltaVista had a 75% overlap with Yahoo, and Alltheweb had a 75% overlap with MSN.
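
An overlap percentage of this kind can be approximated as the share of one engine's results that also appear in another's. The following is a minimal Python sketch, not the Thumbshots method (which is not described here). It assumes each engine's results are already available as a list of URLs; the function name and the sample URLs are hypothetical.

```python
def overlap_percent(results_a, results_b):
    """Percentage of entries in results_a that also appear in results_b."""
    if not results_a:
        return 0.0
    seen_in_b = set(results_b)
    shared = sum(1 for url in results_a if url in seen_in_b)
    return 100.0 * shared / len(results_a)


# Hypothetical top-4 result lists from two engines
engine_a = ["a.example/1", "a.example/2", "b.example/1", "c.example/1"]
engine_b = ["a.example/1", "b.example/1", "c.example/1", "d.example/1"]

print(overlap_percent(engine_a, engine_b))  # 75.0
```

This measure ignores rank order: two engines can share 75% of their results and still place them in quite different positions.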

Why are they different at all?

Mostly it’s because they use different ranking algorithms. Rob Sullivan at Li’l Engine reported on tests that turned up cases where a site could rank in the top 10 at one engine and not be in the top 1000 at another. An example of this turned up in the baseball taxonomy search. There was a product catalog, mybulletline.com (which looked suspiciously like spam), with multiple entries for View Taxonomy. Yahoo and AltaVista ranked several of these pages in the top 20, while the others placed them further back at 50 and higher.

Some differences may also be due to paid inclusion. Barry Lloyd at the Search Engine Guide discussed this when describing Yahoo’s program called SiteMatch by which a company pays to have its web site indexed more deeply and frequently. Inktomi, Overture, Alltheweb and AltaVista had their own paid inclusion plans. When Yahoo replaced these with SiteMatch it didn’t convert old contracts. They are just being allowed to play out to the end of their terms at the individual engines.


The Gains


  1. Yahoo became a better search portal. On any search it can show a variety of content from the web search, the directory, the image database, news, special collections such as movies, travel and much more.
  2. Yahoo.com will show the cached copy of the indexed page. Yahoo Canada (yahoo.ca) will not.
  3. The search syntax is adequate for searching in title or at a site or for a particular filetype.
  4. It indexes the standard set of filetypes (pdf, doc, xls, ppt, txt, html) and also RSS news feed formats (xml, rdf, rss) and can identify the feeds – useful to those who use RSS newsreaders to pick up news and weblogs.
  5. Yahoo can handle Boolean operators but it doesn’t advertise this. Use upper case – AND, OR, NOT – but you can get away with leaving out the AND. AltaVista and Alltheweb have better interfaces and help files for using Boolean.
  6. Not documented but inherited from Inktomi is the ability to look for pages that have a special feature such as audio or images.
  7. Yahoo has also put a lot of effort into creating shortcuts to maps, phone numbers, definitions, weather, calculations and many more. These code words can save time, but always be aware of the source Yahoo uses. You might have a better one in your bookmark list.
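
Because the Boolean operators must be in upper case, a query typed in lower case can be tidied before it is submitted. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that words inside double-quoted phrases should be left alone.

```python
import re

OPERATORS = {"and", "or", "not"}


def uppercase_operators(query):
    """Upper-case the Boolean operators in a query, leaving quoted phrases untouched."""
    # Each token is either a complete "quoted phrase" or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    return " ".join(t.upper() if t.lower() in OPERATORS else t for t in tokens)


print(uppercase_operators('feline and thyroid not "cats and dogs"'))
# feline AND thyroid NOT "cats and dogs"
```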

The Losses

Diversity

Dan Giancaterino, writing in ResourceShelf.com, may have been the first to describe the new search scene as Search Engine Monoculture (April 7, 2004). Searchers lost diversity in search features and databases. Different ranking algorithms aren't sufficient, especially since there will often be an overlap of more than 40%.

Proximity

AltaVista was the only engine to provide proximity operators – NEAR for words within 10 words of each other, and WITHIN for a number you specified. Its Advanced search was the best available for constructing sophisticated searches using Boolean operators.

We can fake proximity at Google by using the * as a wildcard word in phrases. “Three * ducks” will find three blind ducks, three wood ducks, three rubber ducks. Each * represents one word, and you can use as many * as you wish. There is also the Google API Proximity Search tool – GAPS - http://www.staggernation.com/cgi-bin/gaps.cgi. It will look for words up to three apart, either in the order given or in either order.

The asterisk will work at Yahoo too, as will use of a stop word as in “three a ducks”. Yahoo doesn’t index common words like a, an, the even when they are in phrases. [Common word as wildcard also works at Alltheweb, but neither it nor the * is accepted at AltaVista at the moment. This changes back and forth - it might work for you.] Added July 24, 2004
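
Since each * stands for exactly one word, covering a range of distances means trying one phrase per gap size. A minimal Python sketch of generating those phrases follows; the function name and parameters are hypothetical, and how an engine treats the resulting phrases is as described above, not something the code can guarantee.

```python
def wildcard_phrases(first, second, max_gap):
    """Return quoted phrases with 0 to max_gap '*' placeholders between two words."""
    phrases = []
    for gap in range(max_gap + 1):
        words = [first] + ["*"] * gap + [second]
        phrases.append('"' + " ".join(words) + '"')
    return phrases


print(wildcard_phrases("three", "ducks", 2))
# ['"three ducks"', '"three * ducks"', '"three * * ducks"']
```

Each phrase can then be searched in turn. The word order is fixed, so the reverse order needs a second call with the two words swapped.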

Truncation

AltaVista was the only search engine to allow the * as a truncation operator for picking up word variants or plural forms. Yahoo used to allow it on searches on the directory. Open Directory Project stands out as the only service that still supports *.

How then to easily pick up word variants and plural forms? Use the Google synonym operator ~. This will search and rank on terms that are strongly associated with your search term. ~canadian will pick up Canadians, Canada, canada’s. ~elder will find senior, older, elderly.

Prisma Terms

AltaVista also had the wonderful Prisma Terms that would cluster results for easier narrowing. There is still a refining option but it is pale and feeble in comparison. Now it is more like a set of related terms based on common phrases, and it is displayed only for very broad queries. The same phrases are offered at Yahoo (as Related) and at Alltheweb (Refine your search).

Gigablast does a much better job grouping documents by shared terms through Gigabits. For the query – feline thyroid conditions – Gigablast can identify hyperthyroidism, hypothyroidism, thyroid disease, and thyroid conditions in cats. Yahoo has nothing.

Flash Files

Alltheweb used to index Flash files. It was easy to find tutorials or animations done in Flash for different subjects.

Google picked this up quietly. Use filetype to limit the search to swf files – cooking lesson filetype:swf

Yahoo will let you look for pages that link to a Flash file. Use feature:flash. Added July 24, 2004


The Remains

Why then would anyone continue to use any of these clones? Because they still have some unique features.

Languages

Note: In June 2004 Yahoo! added a translation facility to Yahoo.com. Translate-this-page will show for French, German, Portuguese, Spanish, Italian and Russian (for translation into English). It will translate pages successively but has trouble displaying diacriticals and images. It's not as good as AltaVista's but getting there. Added July 24, 2004

AltaVista handles languages better than Yahoo. It’s easier to search either all languages or only pages in English (or another language you specify through Settings). The Babel Fish translation tool can handle Chinese, Dutch, English, French, German, Greek, Italian, Japanese, Portuguese, Russian, and Spanish.

However, Google goes one better with its language tools. It doesn’t have the choice that AltaVista offers (only English, French, German, Italian, Spanish and Portuguese) but it will progressively translate pages as you click through the links.

Alltheweb will automatically detect your language according to your country. You can change this through Customize Preferences. But, it won’t translate.

Boolean

The Advanced Search pages at Alltheweb and AltaVista are much more Boolean friendly with boxes for the search queries and help links.

Dates

AltaVista and Alltheweb let you specify a particular date range, rather than being limited to the set periods at Yahoo. Hotbot actually displays the date for each search result as well as allowing searches by date. This isn’t a big issue since dates are notoriously unreliable on the Web, but being able to note the date can be helpful.

Geography vs Country

Alltheweb has geographic region – Africa, Asia, Canada, and others. Yahoo has country.

Information on Sites

ATW URL Investigator

Alltheweb’s URL Investigator is still the best tool for getting information on a site. Enter the URL of the page to get a description, links to ownership, links from other pages, and a link to the Wayback Archive for earlier versions. [No More - Alltheweb dropped this sometime in June 2004.]

Multimedia

Altavista search bar showing tabs

AltaVista has one of the best search interfaces for searching images, audio and video.

For images one can constrain the search to photos or graphics. (Buttons / banners is a third choice but is ineffective alone.) Other choices are for the collection - News, Corbis, the Web; and size of file. Try finding photos of the Peace Tower in Ottawa from the News collection.

Audio can be searched by format, duration (less than or over a minute) and source (web vs news). AltaVista describes the file and points to the page where one will find it. Try this for bird songs.

Similarly one can search for different video formats by duration and source. Watch videos about the bluejay.

Alltheweb uses the same multimedia collections for images and video (though not audio). Search parameters for both are under Advanced Search and differ from AltaVista’s. For example, under Video there is the choice of downloads or streamed files. For images, format can be jpeg, gif or bmp.

MSN Search and Hotbot don’t search multimedia directly but the Advanced Search form does make it easy to search for pages that link to multimedia content such as image formats, audio, video, shockwave and others.

You can make Yahoo do the same thing by using a search statement with the field name feature – bluejay feature:audio – but it’s not reliable.


Conclusion

In the May 2004 analysis of search engine global market share by OneStat, AltaVista still had 1.7%, down from 2.2% the year before. It is slipping but so is Yahoo – from 21.7% in May 2003 to 21.1% in May 2004. Google, on the other hand, continued its climb to 56.4%.

It is too early to know whether Yahoo will succeed in winning over Google users. Both companies are putting their R&D efforts into making sponsored listings more relevant to the query and more local to the searcher. The last significant new feature was Google's synonym operator, although Yahoo has done some useful work on shortcuts. In fact, as we’ve seen, Yahoo cut good features when it scuttled the Alltheweb and AltaVista search engines.

Whether one uses Yahoo, AltaVista, or Alltheweb may be a matter of taste or even habit. AltaVista and Alltheweb have a simpler search interface, some nice customizing options, and will return fewer results - which may be a good thing. Yahoo has said that it will use both as experimental engines for new ranking algorithms. This may hurt rather than help. Experimental search engines aren’t generally popular with searchers – searchers like to know what to expect.
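
The two declines look similar in percentage points but are very different relative to where each engine started. The following small Python sketch works through the arithmetic, using only the OneStat figures quoted above; the function name is hypothetical.

```python
def share_change(before, after):
    """Return (change in percentage points, change relative to the starting share in %)."""
    points = round(after - before, 1)
    relative = round(100 * points / before, 1)
    return points, relative


print(share_change(2.2, 1.7))    # AltaVista: (-0.5, -22.7)
print(share_change(21.7, 21.1))  # Yahoo: (-0.6, -2.8)
```

On these figures AltaVista lost more than a fifth of its share in a year, while Yahoo lost under 3% of its own.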

WSG Research Tutorials

Learn more about Yahoo! Search in WSG More Searching. Check the WSG Search Guide for Yahoo.


Exploring Search Engine Overlap by Chris Sherman (June 9, 2004) ClickZ - About the Thumbshots Ranking Tool.

Different results in Altavista/MSN/Yahoo! Rob Sullivan. Li'L Engine (April 13, 2004) Attributes the differences to relevance ranking algorithms.

Sorting Out SiteMatch by Barry Lloyd. Search Engine Guide (April 23, 2004)

Google Gains Overall, Competition Builds Niches by Robyn Greenspan. ClickZ Stats (June 2, 2004)

Canadians Are More Active Online Searchers Than Their U.S. Counterparts, According to comScore Networks Press Release (May 13, 2004) Canadians favor Google much over Yahoo, whereas Americans use them almost equally.

Web Search - Yahoo by Gary Price. ResourceShelf (March 2, 2004) Reports on Yahoo!'s announcement to index content of several public databases that would otherwise be invisible. Includes NPR audio files, Library of Congress, Project Gutenberg. Added July 24, 2004

The New Yahoo! Search by Greg R. Notess. Online (July / August 2004) Recaps changes and looks to future. Added July 24, 2004


Newsletter by Gwen Harris.


Copyright Gwen Harris
A service to subscribers of WebSearchGuide (http://www.websearchguide.ca)

