WSG Newsletter:
Fallout from Yahoo! Search
Issue: June 9, 2004
In April 2004, Yahoo pulled the plug on AltaVista and Alltheweb to cries of
consternation from all who write about web search. The sites for these
once-eminent search engines still exist, but the databases of indexed web pages
and many of their search features are no more. Today, searches at AltaVista,
Alltheweb, and all the Inktomi-based search services are almost
indistinguishable from those at Yahoo.
What remains? What alternatives do we have for what was lost? What is the
larger significance?
Some History
Yahoo built the new engine using parts of Inktomi search technology to
create a database that is likely as large as Inktomi's 3 billion pages
and possibly close to Google's 4 billion.
Yahoo, like other search services, struggled for revenue during the dotcom
bust and waning of banner advertisements. Overture (previously GoTo) hit on the
idea of selling small text ads that were keyword activated. Sites that wanted
to be found would bid for keywords. It was the magic formula for great
fortunes. Google, which was then known for its "pure search", followed
up with its own money-making version of paid listings.
Yahoo charged sites for being listed in the directory but it needed to do
more. Its history of acquisitions began in 2002 when it bought Inktomi. This
engine powered other services, notably MSN Search and Hotbot in the US and the UK.
Early in 2003 Overture bought AltaVista, a former search star that suffered
when searchers discovered Google, and Alltheweb, once the showcase for FAST
Search technology. People feared then that AltaVista and Alltheweb would be
neglected as Overture exploited the technologies for better placement of the
paid listings. There was little time to find out. Yahoo gobbled up Overture
only a few months later (October 2003). In February 2004 it launched Yahoo!
Search with a new crawler - a blend of the Inktomi, Alltheweb, and AltaVista
crawlers - and a new database.
Today, versions of Yahoo! Search are used at AltaVista, Alltheweb, and,
through the former Inktomi customers, MSN, Hotbot, Hotbot.UK, and Lycos. It's
a web-search monoculture.
Comparing Results
Yahoo will return the largest number of results for any given query, while
the others will return about a quarter of Yahoo's number. It's possible
that the satellite engines don't include pages that Yahoo considers very
similar, the kind of results that bring up the message: "In order to
show you the most relevant results, we have omitted some entries very similar
to the ones already displayed."
Descriptions of results tend to be the same across all engines, and consist
of excerpts showing the keywords somewhat in context. Occasionally AltaVista and
Alltheweb may have longer extracts taken from the first paragraph.
On the matter of ranking, Yahoo and AltaVista seem to be most similar. In a
search for baseball taxonomy the first 10 hits were identical at the two
engines. Alltheweb, Hotbot, and MSN seem to be in a second group, where
Alltheweb had 8 of the 10 hits Hotbot listed, and Hotbot had 6 out of 10 at
MSN.
The degree of overlap in results is easy to see
at Thumbshots Ranking (ranking.thumbshots.com). This site will compare the first 100
results from Alltheweb, AltaVista, Google, MSN, Teoma, Yahoo, and Wisenut and
determine the percentage of common results.
On the query baseball taxonomy AltaVista had a 75% overlap with
Yahoo, and Alltheweb had a 75% overlap with MSN.
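For readers who want to check overlap figures like these by hand, here is a minimal Python sketch. It assumes overlap is measured as the share of one engine's result URLs that also appear in the other engine's list; the exact method Thumbshots uses is not documented here, and the function name and sample URLs are hypothetical.

```python
def overlap_percentage(results_a, results_b):
    """Percentage of distinct URLs in results_a that also appear in results_b.

    Assumption: overlap is measured relative to the first list. The actual
    method used by any comparison site may differ.
    """
    urls_a = set(results_a)
    if not urls_a:
        return 0.0
    shared = urls_a & set(results_b)
    return 100.0 * len(shared) / len(urls_a)


# Hypothetical result lists, shortened to four hits each for illustration.
engine_a = ["a.example/1", "a.example/2", "b.example/1", "c.example/1"]
engine_b = ["a.example/1", "b.example/1", "c.example/1", "d.example/1"]

print(overlap_percentage(engine_a, engine_b))  # 75.0
```

Three of the four URLs in the first list also appear in the second, which gives 75%.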
Why are they different at all?
Mostly it's because they use different ranking algorithms. Rob Sullivan
at Lil Engine reported on tests that turned up cases where a site
could rank in the top 10 at one engine and not be in the top 1000 at another
engine. An example of this turned up in the baseball taxonomy search. There was
a product catalog mybulletline.com (which looked suspiciously like spam) with
multiple entries for View Taxonomy. Yahoo and AltaVista ranked several of these
pages in the top 20, while the others placed them further back at 50 and
higher.
Some differences may also be due to paid inclusion. Barry Lloyd at the
Search Engine Guide discussed this when describing Yahoo's program called
SiteMatch, by which a company pays to have its web site indexed more deeply and
frequently. Inktomi, Overture, Alltheweb and AltaVista had their own paid
inclusion plans. When Yahoo replaced these with SiteMatch it didn't
convert old contracts. They are just being allowed to play out to the end of
their terms at the individual engines.
The Gains
- Yahoo became a better search portal. On any search it can show a variety of
content from the web search, the directory, the image database, news, special
collections such as movies, travel and much more.
- Yahoo.com will show the cached copy of the indexed page. Yahoo Canada
(yahoo.ca) will not.
- The search syntax is adequate for searching in title or at a site or for a
particular filetype.
- It indexes the standard set of filetypes (pdf, doc, xls, ppt, txt, html)
and also rss news feed formats (xml, rdf, rss) and can identify the feeds
useful to those who use RSS newsreaders to pick up news and weblogs.
- Yahoo can handle Boolean operators but it doesn't advertise this. Use
upper case AND, OR, NOT, though you can get away with leaving out the
AND. AltaVista and Alltheweb have better interfaces and help files for using
Boolean.
- Not documented, but inherited from Inktomi, is the ability to look for pages
that have a special feature such as audio or images.
- Yahoo has also put a lot of effort into
creating shortcuts to maps, phone numbers, definitions,
weather, calculations and many more. These code words can save time, but always
be aware of the source Yahoo uses. You might have a better one in your bookmark
list.
The Losses
Diversity
Dan Giancaterino, writing in ResourceShelf.com, may have been the first to
describe the new search scene as
Search Engine Monoculture (April 7, 2004). Searchers lost
diversity in search features and databases. Different ranking algorithms aren't
sufficient, especially since there will often be an overlap of more than 40%.
Proximity
AltaVista was the only engine to provide proximity operators: NEAR for
words within 10 words of each other, and WITHIN for a number you specified. Its
Advanced search was the best available for constructing sophisticated searches
using Boolean operators.
We can fake proximity at Google by using the * as a wildcard word in phrases.
The phrase three * ducks will find blind ducks, wood ducks, rubber ducks. Each *
represents a word. You can use as many * as you wish. There is also the Google
API Proximity Search tool GAPS:
http://www.staggernation.com/cgi-bin/gaps.cgi. It will look
for words up to three apart and allow words in that order or in either order.
The asterisk will work at Yahoo too, as will use of a stop word, as in
three a ducks. Yahoo doesn't index
common words like a, an, the even when they are in phrases. [Common word
as wildcard also works at Alltheweb, but neither it nor the * is accepted at
AltaVista at the moment. This changes back and forth - it might work for you.]
Added July 24, 2004
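Since each * stands for exactly one word, covering a range of distances means running one phrase query per gap size. The following Python sketch builds that set of query strings. The function name is hypothetical, and wrapping each phrase in double quotes is an assumption about how the engine expects phrases to be entered.

```python
def wildcard_phrases(first, second, max_gap):
    """Build phrase queries with 1..max_gap '*' placeholders between two words.

    Each '*' stands for one word, so a gap of N asterisks matches the two
    words when exactly N words separate them. The quoting is an assumption.
    """
    queries = []
    for gap in range(1, max_gap + 1):
        stars = " ".join(["*"] * gap)
        queries.append(f'"{first} {stars} {second}"')
    return queries


for query in wildcard_phrases("three", "ducks", 3):
    print(query)
```

Running each generated query in turn approximates a "within N words, in this order" search; it does not cover the reverse word order.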
Truncation
AltaVista was the only search engine to allow the * as a truncation operator
for picking up word variants or plural forms. Yahoo used to allow it on
searches on the directory. Open Directory Project stands out as the only
service that still supports *.
How then to easily pick up word variants and plural forms? Use the Google
synonym operator ~. This will search and rank on terms that are strongly
associated with your search term. ~canadian will pick up Canadians, Canada,
canadas. ~elder will find senior, older, elderly.
Prisma Terms
AltaVista also had the wonderful Prisma Terms that would cluster results
for easier narrowing. There is still a refining option but it is
pale and feeble in comparison. Now it is more like a set of related terms based
on common phrases and it is displayed only for very broad queries. The same
phrases are offered at Yahoo (as Related) and at Alltheweb (Refine your
search).
Gigablast does a much better job grouping documents by shared terms through
Gigabits. For the query feline thyroid conditions Gigablast can identify
hyperthyroidism, hypothyroidism, thyroid disease, and thyroid conditions in
cats. Yahoo has nothing.
Flash Files
Alltheweb used to index Flash files. It was easy to find tutorials or
animations done in Flash for different subjects.
Google picked this up quietly. Use filetype to limit the search to swf files:
cooking lesson filetype:swf
Yahoo will let you look for pages that link to a Flash file. Use
feature:flash. Added July 24, 2004
The Remains
Why then would anyone continue to use any of these clones? Because they
still have some unique features.
Languages
Note: In June 2004 Yahoo! added a translation facility to Yahoo.com.
Translate-this-page will show for French, German, Portuguese, Spanish, Italian
and Russian (for translation into English). It will translate pages
successively but has trouble displaying diacriticals and images. It's not as
good as AltaVista's but getting there. Added July 24, 2004
AltaVista handles languages better than Yahoo. It's easier to search
either all languages or only pages in English (or another language you specify
through Settings). The Babel Fish translation tool can handle Chinese, Dutch,
English, French, German, Greek, Italian, Japanese, Portuguese, Russian, and
Spanish.
However, Google goes one better with its language tools.
It doesn't have the choice that AltaVista offers (only English, French,
German, Italian, Spanish and Portuguese) but it will progressively translate
pages as you click through the links.
Alltheweb will automatically detect your language according to your country.
You can change this through Customize Preferences. But it won't
translate.
Boolean
The Advanced Search pages at Alltheweb and AltaVista are much more Boolean
friendly with boxes for the search queries and help links.
Dates
AltaVista and Alltheweb let you specify a particular date range, rather than
be limited to the set periods at Yahoo. Hotbot actually displays the date for
each search result and also allows searching by date. This isn't a big
issue since dates are notoriously poor on the Web, but being able to note the
date can be helpful.
Geography vs Country
Alltheweb lets you limit a search by geographic region: Africa, Asia, Canada, and others.
Yahoo limits by country.
Information on Sites

Alltheweb's URL Investigator is still the best tool for getting
information on a site. Enter the url of the page to get a description, links to
ownership, links from other pages, and a link to the Wayback Archive for
earlier versions. [No More - Alltheweb dropped this sometime in June
2004.]
Multimedia
AltaVista has one of the best search interfaces for searching images, audio
and video.
For images one can constrain the search to photos or graphics. (Buttons /
banners is a third choice but is ineffective alone.) Other choices are for the
collection - News, Corbis, the Web; and size of file. Try finding photos of the
Peace Tower in Ottawa from the News collection.
Audio can be searched by format, duration (less than or over a minute) and
source (web vs news). AltaVista describes the file and points to the page where
one will find it. Try this for bird songs.
Similarly one can search for different video formats by duration and source.
Watch videos about the bluejay.
Alltheweb uses the same multimedia collections for images and video (though
not audio). Search parameters are under Advanced Search for images and video
(not audio). They are different from AltaVista's. For example, under Video
there is the choice of downloads or streamed files. For images, format can be
jpeg, gif or bmp.
MSN Search and Hotbot don't search multimedia directly but the Advanced
Search form does make it easy to search for pages that link to multimedia
content such as image formats, audio, video, shockwave and others.
You can make Yahoo do the same thing by using a search statement with the
field name feature, as in bluejay feature:audio, but it's not
reliable.
Conclusion
In the May 2004 analysis of search engine global market share by OneStat,
AltaVista still had 1.7%, down from 2.2% the year before. It is slipping, but
so is Yahoo, from 21.7% in May 2003 to 21.1% in 2004. Google, on the
other hand, continued its climb to 56.4%.
It is too early to know whether Yahoo will succeed in winning over Google
users. Both companies are putting their R&D efforts into making sponsored
listings more relevant to the query and more local to the searcher. The last
significant new feature was Google's synonym operator, although Yahoo has done
some useful work on shortcuts. In fact, as we've seen, Yahoo cut good
features when it scuttled the Alltheweb and AltaVista search engines.
Whether one uses Yahoo, AltaVista, or Alltheweb may be a matter of taste or
even habit. AltaVista and Alltheweb have a simpler search interface, some nice
customizing options, and will return fewer results - which may be a good thing.
Yahoo has said that it will use both as experimental engines for new ranking
algorithms. This may hurt rather than help. Experimental search engines
aren't generally popular with searchers; searchers like to know what
to expect.