WSG Newsletter: AltaVista Update
Issue: March 4, 2002
"First you say you will and then you won't.
Then you say you do and then you don't.
You're undecided now. Well, what are you going to do?"
If ever a song applied to a search engine, this one fits AltaVista.
AltaVista has been tweaking and fiddling with its search engine since the
beginning of the year. It has a new look and a new logic, but questionable
freshness and reliability.
Update: March 14, 2002 - AltaVista has
changed the Advanced Search to include the option to sort the results by
selected words.
Update: August 29, 2002 - Sometime in June,
AltaVista converted completely to an AND engine, to the great relief of its
users.
The Look
The look is now tabbed and much better for it. It is much easier to
check a search against the Web, the multimedia collection, News, and
AltaVista's LookSmart directory.
Advanced Search
However, Search Assistant, with its forms-based input, has been merged with
Advanced Search for boolean searching. The result is neither fish nor fowl. The
forms-based input covers searching for all the words, any, phrase, or none, and
also location, but not title, backward links, or related pages - all
useful features at AltaVista.
Boolean
There is a box for a "free-form boolean query" (use of the word
"free-form" is an interesting touch). This is similar to the old Advanced. For a
couple of weeks AV dropped the sort-by feature but restored it in mid-March.
Searchers should identify the words they most want to see - these will control
the sort. Use this to take several slices through the results set. (Updated March
14, 2002)
The free-form boolean used in AltaVista Advanced is different
from the boolean in Basic. In Basic, the operators must be in upper case:
AND, OR, AND NOT, NEAR. In Advanced they can be in either case (though this does
change).
Proximity
Advanced also accepts some undocumented proximity operators. Greg Notess
lists these in
his review of AltaVista. We can ask for words or phrases to
be within n words of each other: within 2, within 5, within 10, or
any number (within 10 is the equivalent of Near). There is also a Before
operator (<), for one word before another, which can be combined with Near
(~) like this: origin <~ bicameral mind. This makes AltaVista
Advanced the tool of choice when we need to tighten a search by looking for
words near each other.
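To make the proximity operators concrete, here is a minimal Python sketch of what "within n" and "before and near" could mean for the text of a single page. It is only an illustration: the function names are hypothetical, and since the operators are undocumented, the distance rule used here (the difference in word position) is an assumption rather than AltaVista's confirmed behaviour.

```python
import re

def _positions(words, term):
    """Return every index at which term occurs in the word list."""
    return [i for i, w in enumerate(words) if w == term]

def within(text, a, b, n=10):
    """True if a and b occur within n words of each other, in either order."""
    words = re.findall(r"\w+", text.lower())
    return any(abs(i - j) <= n
               for i in _positions(words, a.lower())
               for j in _positions(words, b.lower()))

def before_near(text, a, b, n=10):
    """True if a occurs before b and no more than n words ahead of it."""
    words = re.findall(r"\w+", text.lower())
    return any(0 < j - i <= n
               for i in _positions(words, a.lower())
               for j in _positions(words, b.lower()))

title = "The Origin of Consciousness in the Breakdown of the Bicameral Mind"
print(within(title, "origin", "bicameral", 10))    # True: 8 words apart
print(within(title, "origin", "bicameral", 5))     # False
print(before_near(title, "origin", "bicameral"))   # True
print(before_near(title, "bicameral", "origin"))   # False
```

Tightening n from 10 to 5 drops the match, which is the effect a searcher is after when a plain Near returns too much.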
Case Sensitivity
The two also vary in case sensitivity. In Basic, phrases are case sensitive.
"Tales and Fables" will pick up fewer results than "tales and
fables". The quotation marks can be used on a single word to pick up upper
case, e.g. "ASCII". Advanced is case sensitive all the time - no
quotation marks are needed to find ASCII. When capitalization is not used,
AltaVista searches for both cases.
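The rule in that last sentence can be sketched in a few lines of Python. This is an illustrative model only, not AltaVista's code: `case_match` is a hypothetical name, and the sketch assumes the rule is applied word by word.

```python
def case_match(query_word, page_word):
    """Model of the rule: a lower-case query word matches any case,
    while a query word containing capitals must match exactly."""
    if query_word == query_word.lower():
        return query_word == page_word.lower()
    return query_word == page_word

print(case_match("ascii", "ASCII"))   # True: lower case finds both cases
print(case_match("ASCII", "ascii"))   # False: capitals are matched exactly
print(case_match("ASCII", "ASCII"))   # True
```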
Is it All Or Any?
Notice: August 29, 2002 - AltaVista now
defaults to a true AND in its Web search. The following counts no longer apply
for web searching. OR still applies for Directory and Images.
Most search engines default to looking for all the words in a query.
AltaVista plays at being both an AND and an OR engine. For months it would look
for ALL words when there were 3 or 4 entered, and was more likely to pick up
ANY words beyond that. During January 2002, AltaVista looked for ALL words
regardless of number. In fact, John Ellis, Sr. VP of Engineering at AltaVista,
confirmed this with Tara Calishain of ResearchBuzz in February (see
ResearchBuzz Feb 6 13, 2002), saying, "Extensive
testing on our index over the past few months has shown that ANDing of the
query terms provides users with better overall results. This change is the
latest step in AltaVista's continuing mission to provide users with the best
search results on the web." But by March 1, 2002 it had switched back
to a mix.
Some of the puzzling counts are due to AltaVista's ability to
identify phrases.
Consider the following search statements.
| Word Search | Results |
| roch voisine | 2658 results |
| "roch voisine" | 2042 results. Drops with the phrase. |
| roch voisine concert | 2165 hits. Looks like an AND but a true AND is 398 hits. |
| "roch voisine" concert | 1805 hits. Also looks like an AND but ALL words would have been 324 hits (see below). |
| +"roch voisine" concert | 1805 results. This requires roch voisine and makes concert optional; it gives the same results as without the +. A true OR engine would have shown 2042 hits. |
| "roch voisine" +concert | 2.5 million results. But put the + on concert and the results soar. |
| +"roch voisine" +concert | 324 results. Put the + on both terms to get the AND and results drop. |
| Title Search | Results |
| title:"roch voisine" | 27 results |
| title:"roch voisine" concert | 244,916 results. No AND here. We must add the AND ourselves. |
| title:"roch voisine" AND concert | 6 results |
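The counts above are easier to judge against a model of what a strict engine would return. The sketch below is a toy, with an invented four-page index and a hypothetical `search` function; it is not how AltaVista works internally. Terms marked with + are required, and the `default` argument decides whether unmarked terms are combined with AND or OR when nothing is required.

```python
# Toy index: page id -> set of terms on that page (invented data).
PAGES = {
    1: {"roch voisine", "concert"},
    2: {"roch voisine"},
    3: {"concert"},
    4: {"concert", "tickets"},
}

def search(query_terms, default="OR"):
    """Return the sorted ids of pages matching the query.

    query_terms: list of terms; a leading '+' marks a required term.
    default: how unmarked terms combine when no term is required.
    """
    required = [t[1:] for t in query_terms if t.startswith("+")]
    optional = [t for t in query_terms if not t.startswith("+")]
    hits = []
    for page_id, terms in PAGES.items():
        if not all(t in terms for t in required):
            continue
        if required or not optional:
            hits.append(page_id)   # optional terms would only affect ranking
        elif default == "AND" and all(t in terms for t in optional):
            hits.append(page_id)
        elif default == "OR" and any(t in terms for t in optional):
            hits.append(page_id)
    return sorted(hits)

print(search(["roch voisine", "concert"], default="OR"))    # [1, 2, 3, 4]
print(search(["roch voisine", "concert"], default="AND"))   # [1]
print(search(["+roch voisine", "concert"]))                 # [1, 2]
print(search(["+roch voisine", "+concert"]))                # [1]
```

In this model, requiring only the phrase returns exactly the pages that contain the phrase, which corresponds to the 2042 figure the table says a true OR engine would have shown.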
The WEB Directory is a full OR.
chomsky linguistics: 7726 results. This is roughly the sum of chomsky
(85) and linguistics (7674).
+chomsky linguistics: 85 results. This requires chomsky and ranks
linguistics at the top. It gets the same number as chomsky alone (as it should).
+chomsky +linguistics: 33 results. This is the AND search and could be
written as chomsky AND linguistics.
The Web search accepts AND, OR, AND NOT, but not NEAR.
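The Directory figures can be checked with simple set arithmetic: the pages matching either word equal the two single-word counts minus the pages that contain both, which would otherwise be counted twice. A short Python check using the counts reported above (the function name is just for illustration):

```python
def or_count(count_a, count_b, count_both):
    """Size of (A OR B) by inclusion-exclusion."""
    return count_a + count_b - count_both

chomsky, linguistics, both = 85, 7674, 33
print(or_count(chomsky, linguistics, both))   # 7726
```

The result matches the 7726 reported for chomsky linguistics, which is what a full OR should produce.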
NEWS Search is an AND
Chomsky alone finds 5 stories on March 4, 2002. Linguistics has 6. Together
chomsky linguistics finds 2.
The News search ignores AND, OR, AND NOT. It will accept the minus sign
(-) to exclude stories.
IMAGE Search is an OR.
Chomsky has 211 images, and linguistics has 819. Together they have 1023
results (close enough). Put a + sign in front of either chomsky or linguistics
or both, and you'll get 5 results. Whichever you do, it switches to an AND.
Image search will accept boolean.
What are we to make of this?
What we do to broaden or narrow a search query is influenced by whether a
search engine looks for all or any words. It may be best to continue to treat
AltaVista as an OR engine. If the results are very high, require the key words
and concepts with the + sign (or connect the words with AND) as well as using
more specific terms. If the results are too few, use fewer or more general
words and/or construct some alternatives using the Boolean OR. At AltaVista,
only the News search won't recognize AND and OR.
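The narrowing and broadening advice can be written down as two small helpers. These are hypothetical Python functions that only build query strings in the syntax described in this article; they do not contact any search engine, and the alternative term "tour" is invented for the example.

```python
def quote(term):
    """Wrap multi-word terms in quotation marks so they are searched as a phrase."""
    return f'"{term}"' if " " in term else term

def narrow(terms):
    """Require every term with a leading + sign."""
    return " ".join("+" + quote(t) for t in terms)

def broaden(alternatives):
    """Offer alternatives joined with the Boolean OR."""
    return " OR ".join(quote(t) for t in alternatives)

print(narrow(["roch voisine", "concert"]))   # +"roch voisine" +concert
print(broaden(["concert", "tour"]))          # concert OR tour
```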
Display
The description AltaVista displays for a page is now dynamically generated.
It will use part or all of a meta description when available and will create
the rest from text in the page that seems to best relate to the query.
In this search for "Margaret Atwood" poetry, AltaVista
finds:
"By Brittney Goodman Margaret Atwood is my favorite author.
Her works include the ... Alias Grace. The following are some links concerning
Margaret Atwood and her works. The official M. Atwood ..."
URL: http://www.moorhead.msus.edu/chenault/atwood.htm
The actual page has "Margaret Atwood WWW Resources" by Brittney Goodman. The
first paragraph is:
"Margaret Atwood is my favorite author. Her works include the
novels, Surfacing, The Edible Woman, The Handmaid's Tale, Lady Oracle, and The
Robber Bride, and also some fine poetry and short works. I've included links
concerning her latest novel, Alias Grace. The following are some links
concerning Margaret Atwood and her works"
AltaVista Canada
There is some good news about AltaVista Canada but not enough for Canadian
searchers to rely on this search engine. Google.ca is much better.
AltaVista Canada seems to have recovered its ability to identify Canadian
sites regardless of domain. It is also showing the number of results again
(that had been suspended for a time).
However, it seems to have dropped many government pages. The option on the
front page to search Government pages has not worked for several weeks
(AltaVista has not responded to questions about this). More troubling is that a
search at AltaVista World (and Canada) for host:.gc.ca (to find the number of
pages indexed that are in the Government of Canada domain) produces only 46,988
pages compared to 1.8 million in November 2001. That includes French and
English.
AltaVista Canada still has the old AltaVista Advanced Search, where one can
use sort-by to hand rank results for both Canada and World. The undocumented
proximity operators work here too - within and before (use <~ to look for
a word before another but within 10 words).
By my count, AltaVista Canada has indexed 11.4 million pages. This seems low.
When Telus was co-owner they claimed at least 14 million in 1999.
The breakdown of my count was: .ca 8,278,968 / .com 2,171,775 / .net 401,643 /
.org 562,936 / .info 6 / .biz 4
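The per-domain figures do add up to the 11.4 million total. A quick check in Python, using only the numbers listed above (the variable names are just for illustration):

```python
# Per-domain page counts as listed in the breakdown.
counts = {
    ".ca": 8_278_968,
    ".com": 2_171_775,
    ".net": 401_643,
    ".org": 562_936,
    ".info": 6,
    ".biz": 4,
}

total = sum(counts.values())
print(total)                                   # 11415332
print(f"{counts['.ca'] / total:.1%} in .ca")   # 72.5% in .ca
```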
Freshness does not look good either. AltaVista Canada can't find
stories at the Globe and Mail or the National Post on Israel with a last
modified date later than February 3, 2002.
Will AltaVista Survive?
Searchers have been leaving AltaVista in droves. During January, Jupiter
Media Metrix watched where web searchers searched. Only 5.7% used AltaVista,
whereas 24.5% went to Google and 36.3% used MSN (since searches in the
location bar of the IE browser were part of the count, searchers may not have
intentionally used MSN). The report has some flaws - specifically, it
counts unique users rather than real traffic - but the message is clear:
AltaVista is slipping. In part this is because Google really is so good,
but other factors surely contributed: the gyrations of the last year, the
summer months when they didn't update the databases, the use of paid listings
(Products and Services), and the practice of paid inclusion and the bad press
surrounding it.
Today it is more of a niche tool, to be used when we wish to do more
complicated boolean constructions or narrow a search by using the proximity
operators to find words close together. AltaVista is also the only one that
really handles case well. For these reasons, as well as the News search and
the multimedia collections, it has a place in our tool kit. This could change
if AltaVista fails to keep its databases fresh.