WSG Newsletter: Search News from Internet Librarian
Issue: November 17, 2002
Search gurus, mavens and wizards were plentiful at the Internet Librarian conference (November 4-6, 2002) held in Palm Springs, California. On the podium were Danny Sullivan and Chris Sherman of Search Engine Watch; Gary Price, the Virtual Acquisition Shelf man; Greg Notess, author and librarian; and Avi Rappoport, search tools analyst. Reva Basch, editor of the Super Searcher series, moderated a panel discussion, "Super Searcher Stories: Tips and Techniques." While Internet Librarian is not exclusively about Internet search, search engines and search strategies do draw the crowds. The following is a synthesis of the major themes (with some additional comment), together with updates and tips from the masters.
The Engines are Getting Better
Relevance is quite high; people do get answers from the first page. Chris Sherman and Danny Sullivan put several engines through a torture test. They ran 10 searches at 8 engines to see how well each engine would do at returning known "perfect pages" for a set of search terms. The sites had been selected by SEW from submissions as the best for a topic. From these they chose a variety of types (corporate, government, personal, non-profit) and picked two or three keywords for each as search terms. The engines were Ask Jeeves, AllTheWeb, AltaVista, Google, Inktomi, Lycos, MSN Search, and Yahoo.
Engines scored a point if they had the target site in the first 10 results and half a point for a related page. All engines did very well: Google, Yahoo, and MSN Search led at 9.5, while Altavista was the lowest with 6.5. (Ask Jeeves improved its score by upgrading Teoma, its search engine. The same will likely be true of Altavista after its recent changes.)
| Engine | Score | Engine | Score |
|---|---|---|---|
| Google | 9.5 | Inktomi | 8.5 |
| Yahoo | 9.5 | Lycos | 8.5 |
| MSN | 9.5 | Ask Jeeves | 8.0 |
| AlltheWeb | 9.0 | Altavista | 6.5 |
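The scoring rule is simple enough to sketch. The Python below is a minimal illustration, assuming each query earns at most one point (a target hit outranks a related page) and that URLs compare as plain strings; the function names and sample runs are hypothetical, not SEW's actual test data.

```python
from collections.abc import Sequence

# One test query: (result URLs in rank order, target URL, related URLs).
Run = tuple[Sequence[str], str, set[str]]

def score_query(results: Sequence[str], target: str, related: set[str], cutoff: int = 10) -> float:
    """1 point if the target is in the first `cutoff` results, 0.5 if only a
    related page is there, otherwise 0 (assumed: at most one point per query)."""
    top = results[:cutoff]
    if target in top:
        return 1.0
    if any(url in related for url in top):
        return 0.5
    return 0.0

def score_engine(runs: list[Run]) -> float:
    """Total score for one engine across all test queries."""
    return sum(score_query(results, target, related) for results, target, related in runs)

# Hypothetical runs for a single engine.
runs: list[Run] = [
    (["a.com", "target1.org", "b.com"], "target1.org", set()),               # hit: 1.0
    (["c.com", "target2.gov/about"], "target2.gov", {"target2.gov/about"}),  # related: 0.5
    (["d.com"], "target3.net", set()),                                        # miss: 0.0
]
print(score_engine(runs))  # 1.5
```

With 10 queries the maximum is 10, which is why the top engines' 9.5 means nine hits plus one related page (or an equivalent mix).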
Out of curiosity, they also tested Overture and its pay-for-placement listings. None of the perfect pages turned up in the pay-for-placement results (although 4 did turn up in the Inktomi results that Overture runs).
The Search Engine "Perfect Page" Test
www.searchenginewatch.com/searchday/02/sd1104-pptest.html
Perfect Page Test: Criteria and Detailed Results
www.searchenginewatch.com/searchday/02/pptest-details.html
Another significant improvement has been the big bite taken out of the Invisible Web, thanks in large part to the indexing of more document types. Google was the first to index PDF documents. AlltheWeb followed, and in November 2002 Altavista joined in. This is a tremendous aid to people searching for government and corporate papers and publications.
Google indexes several other popular formats: Excel (xls), Word (doc), PowerPoint (ppt), and Rich Text Format (rtf). AlltheWeb indexes text in Flash files, handy when looking for instructional materials.
Googlification of the Web
Google is everywhere. It powers Alexa, AOL, Netscape, iWON, BBCi in the UK, and Sympatico in Canada, and it makes up most of the search results at Yahoo. It's also the engine at the new MyWay.com portal, just launched as a direct competitor to Yahoo.
Danny Sullivan recapped some figures from Nielsen//NetRatings that show Google very much in the lead.
Based on August 2002 data, searchers spent 15 million hours at Google itself and another 4.2 million at AOL and 6.2 million at Yahoo, both Google-powered. Combining these, Google received 76% of the total estimated 33.5 million hours, far exceeding MSN's 15% or Ask Jeeves' 6%. AlltheWeb didn't even make it onto the charts.
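The 76% figure can be checked directly from the hours quoted. A quick sketch in Python, using only the August 2002 numbers above:

```python
# Millions of search hours, August 2002 (Nielsen//NetRatings, as quoted above).
TOTAL_HOURS = 33.5
google_powered = {"Google": 15.0, "AOL": 4.2, "Yahoo": 6.2}

combined = sum(google_powered.values())  # 25.4 million hours
share = combined / TOTAL_HOURS           # roughly 0.758

print(f"{combined:.1f} million hours = {share:.0%} of the total")
# 25.4 million hours = 76% of the total
```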
Sullivan suggests that there may be some backlash against Google. Certainly, Google has been singled out in legal actions and most recently received flak for complying with the laws of France and Germany by excluding racist sites.
Nielsen//NetRatings Search Engine Ratings (Sept 17, 2002)
www.searchenginewatch.com/reports/netratings.html
Google: Can The Marcia Brady Of Search Stay Sweet? (Sept 3, 2002)
searchenginewatch.com/sereport/02/09-google.html
Directories in Trouble
Tied in with the soaring success of Google is the decline of directories. Danny Sullivan charted the presence of human-compiled directories against crawler-based search services. From 1999 to 2001 the directories reigned: Yahoo, Looksmart, Open Directory, Ask Jeeves's knowledge base, and NBCi's Snap. In the last few months this has been reversed. Sullivan believes that the success of Google prodded other search engines to catch up. The cost of an editorial staff has surely been a factor too.
Signs of decline are clear. The most dramatic was Yahoo's new look, which shows off its portal parts and makes search, and especially its directory, secondary. Search results are predominantly from Google, with just a few marked as selected for the Yahoo directory. It's a shame: Sullivan considers the one-line descriptions written by Yahoo editors to be much better than the scrambled text provided by Google.
As well, there has been the commercialization of listings at Yahoo and Looksmart and, though the speakers did not mention it, the very tired appearance and performance of the Open Directory Project.
Karen Schneider, coordinator of the Librarians' Index to the Internet (lii.org), is bringing together librarians from the scholarly, librarian-built directories in a project named Fiat Lux, with the objective of building "a Yahoo! with values and a brain."
Creating a Yahoo! with Values (July 15, 2002), Library Journal
libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA232358
Teoma Pushes Forward
Teoma introduced several changes in the first half of November: fielded
searches on language, title, site, and url; spell check; and the beginning of
Boolean logic. An Advanced Search page is planned.
Paul Gardi, Vice President at Ask Jeeves, described Teoma's community approach to web search. Teoma identifies local subject communities within a set of search results: clusters of pages that are highly inter-linked and share similar words. Teoma labels each with a frequently occurring phrase and lists these communities under Refine. These clusters will often have a hub page, one page that links to many others. Such pages are link collections and are listed as Resources. Lastly, Teoma ranks results according to text analysis, popularity (links), and status. They call this "subject-specific popularity," by which the more highly regarded (more linked-to) sites within a subject community are ranked more highly. At Ask Jeeves / Teoma, relevance is the most important consideration. Gardi feels that Teoma's three-dimensional search provides the most options to searchers.
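To make the hub and subject-specific popularity ideas concrete, here is a toy sketch in Python. It is a simplification under stated assumptions, not Teoma's method: the community is given in advance rather than discovered, popularity is just the count of in-links from other community members, and a hub is any page linking to at least three members. The threshold, function names, and sample sites are hypothetical.

```python
from collections import Counter

def subject_popularity(links: dict[str, set[str]], community: set[str]) -> list[tuple[str, int]]:
    """Rank community members by in-links from *other* members of the same community."""
    counts: Counter[str] = Counter()
    for src in community:
        for dst in links.get(src, set()):
            if dst in community and dst != src:
                counts[dst] += 1
    # Highest count first; ties broken alphabetically for a stable order.
    return sorted(((page, counts[page]) for page in community), key=lambda pc: (-pc[1], pc[0]))

def find_hubs(links: dict[str, set[str]], community: set[str], min_out: int = 3) -> list[str]:
    """Pages linking to at least `min_out` other members: candidate 'Resources'."""
    return sorted(
        page for page in community
        if len((links.get(page, set()) & community) - {page}) >= min_out
    )

community = {"hub.edu", "a.org", "b.org", "c.org"}
links = {
    "hub.edu": {"a.org", "b.org", "c.org"},
    "a.org": {"b.org"},
    "c.org": {"b.org"},
}
print(subject_popularity(links, community))
# [('b.org', 3), ('a.org', 1), ('c.org', 1), ('hub.edu', 0)]
print(find_hubs(links, community))  # ['hub.edu']
```

Note the two roles come apart: the hub scores lowest on popularity (nobody links to it) yet is the best starting point, which is why Teoma lists Resources separately from ranked results.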
New search options are:
- site: limits results to a domain, e.g., site:utoronto.ca mcluhan
- intitle:"marshall mcluhan" looks for at least one of the words in the title. Can be combined with site: and inurl:.
- inurl:mcluhan
- Boolean OR: "marshall mcluhan" OR "harold innis". Note that nesting and logical combination with other terms are not supported: A AND (B OR C) results in (A AND B) OR C.
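The OR limitation behaves like a parser that ignores parentheses and gives OR lower precedence than the implied AND. The Python sketch below models that observed effect under those assumptions; it handles single words only (no quoted phrases), and the function names are hypothetical. It is a guess at the behaviour, not Teoma's code.

```python
def parse_flat(query: str) -> list[list[str]]:
    """Hypothetical flat parser: parentheses are ignored, OR splits the query
    into alternatives, and the words in each alternative are ANDed together."""
    tokens = query.replace("(", " ").replace(")", " ").split()
    groups: list[list[str]] = [[]]
    for tok in tokens:
        if tok == "OR":
            groups.append([])          # start a new alternative
        elif tok != "AND":             # an explicit AND is treated like a space
            groups[-1].append(tok.lower())
    return [g for g in groups if g]

def matches(doc_terms: set[str], groups: list[list[str]]) -> bool:
    """True if the document contains every word of at least one alternative."""
    return any(all(term in doc_terms for term in group) for group in groups)

groups = parse_flat("mcluhan AND (media OR innis)")
print(groups)                      # [['mcluhan', 'media'], ['innis']]
print(matches({"innis"}, groups))  # True, although 'mcluhan' is missing
```

The intended query, mcluhan AND (media OR innis), should reject a page mentioning only Innis; the flat reading accepts it. Until nesting is supported, it is safer to run each alternative as its own search.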
Greg Notess's Teoma Review (Nov 11, 2002)
www.searchengineshowdown.com/features/teoma/review.html
Altavista Starts to Catch Up
Altavista has had a rocky time keeping its database fresh and relevant. The new AV Prisma has been the one feature to win praise: AV picks frequently occurring terms from the best-matching pages and presents them as possible refinements. Clicking on one adds it to the search query. It's available for two levels of search.
But Phoenix will rise again (as Altavista calls its index). On November 12
AV added pdf documents, increased its overall index to over 1 billion items,
promised more frequent indexing of pages (50% of the ones viewed), took on a
new logo, simplified the front page, and mercy upon mercies, banned the pop-ups
and pop-unders.
Altavista is the only engine to offer the NEAR operator to request proximity
and allows one to tweak relevance ranking (through Sort By on the Advanced
Search).
Best Features
Features become more important as search engines become roughly the same
size in indexed pages and use variations of link analysis.
Important ones to consider:
- Search by filetype, especially PDF. Available at Google, AlltheWeb, and Altavista.
- Spell check
- Personalization. AlltheWeb has skins, so you can work with the colours you like.
- Power commands. Altavista and AlltheWeb are stronger than Google for field searching (title, site, link, etc.). Teoma is building its own.
- Search folders that group search results by topic.
- Proximity operators at Altavista.
Tips
- Use specialized tools, advised Gary Price and Mary Ellen Bates. Think of the type of tool you need, find it, bookmark it. As an example, Gary Price and Chris Sherman showed Flight Tracker at Trip.com, a tool for finding real-time information about flights in the air.
- Use the Web to find facts, when the topic is new, and when you have more time than money, advised Reva Basch. Greg Notess says the Web is excellent for everyday answers. Absolutely: in a flash, you can find airline schedules, tax rules, store locations, best sellers, and even the time of day. To this, Paula Hane adds points of view, opinion, and breaking news.
- Use the Wayback Machine (www.archive.org) to check earlier versions of a page. Paula Hane uses it to follow up on claims made in press releases.
- Change search terms. In the perfect-page test, Chris Sherman found that the choice of query terms is very important: "cop jokes" found a target site but "police humor" did not.
- Use a variety of engines. The perfect-page test also showed that the engines do rank differently. One site might get a second-page position at one engine and a first-page position at another.
- Use search engines that have topic clustering or folders to get an idea of where to go next, a tip from Marydee Ojala.
- Keep your page of results as an anchor page and open results in a new window. Greg Notess has found that search results can change from one moment to the next. (iLor Hydralinks is a tool that can help one work more easily from the main page.)
- Reload the last page of results, just in case the search engine terminated the search early. Greg Notess finds this brings up a few more results.
- Go beyond the first 10 hits. In fact, Greg recommends setting the display of results to 100 per page to make scanning easier and faster.
- Don't bother with dates. Gigablast is the only engine to show both the date a page was spidered and the date it was modified. Google will show spidered dates on recently indexed pages, but it has even dropped the date from its cached copies. Dates are only useful when searching Google Groups (for newsgroups).
- Save any page that has information you think you'd like to refer to again. Gary Price says pages on the Web are ephemeral. SurfSaver (www.surfsaver.com) from AskSam is a good tool for this.
- Seek out authorities on a subject using Teoma and checking the Resources listed. Gary Price recommends this as a way to build a collection.
Other News
- Wisenut, which was bought by Looksmart in early 2002, is being upgraded. The index has been refreshed somewhat, and significant improvements are expected in the new year.
- Lycos is revamping Hotbot. Rumour has it that it will become a meta-search engine.
Conclusion
Has web searching changed in the last year? Reva Basch thinks so. She took the year off and on her return saw that searchers could do much more. We can see this for ourselves. The indexes are larger, it's much easier to find answers to everyday questions, relevance (as several have noted) is better, and specialized search tools for news and multimedia have improved. But search engines are sure to reach a limit in what they can crawl and index. And, as Danny Sullivan explained, link manipulation has become an issue. What next? Some say more specialized tools and, likely, personalization. Check back in a year.