
WSG Newsletter: Search News from Internet Librarian

Issue: November 17, 2002

Search gurus, mavens and wizards were plentiful at the Internet Librarian conference (November 4 – 6, 2002) held in Palm Springs, California. On the podium were Danny Sullivan and Chris Sherman of Search Engine Watch; Gary Price, the virtual acquisition shelf man; Greg Notess, author and librarian; and Avi Rappoport, search tools analyst. Reva Basch, editor of the Super Searcher series, moderated a panel discussion of Super Searcher Stories: Tips and Techniques. While Internet Librarian is not exclusively about Internet search, search engines and search strategies do draw the crowds. The following is a synthesis of the major themes (with some additional comment) together with updates and tips from the masters.

The Engines are Getting Better

Relevance is quite high. People do get answers from the first page. Chris Sherman and Danny Sullivan put several engines through a “torture test”. They ran 10 searches at 8 engines to see how well the search engines would do returning known “perfect pages” for a set of search terms. The sites had been selected from submissions by SEW as best for a topic. From these they selected a variety of types (corporate, government, personal, non-profit) and determined two or three keywords as search terms. The engines were Ask Jeeves, AllTheWeb, AltaVista, Google, Inktomi, Lycos, MSN Search, and Yahoo.

Engines scored a point if they had the target site in the first 10 results and half a point for a related page. All engines did well: Google, Yahoo, and MSN Search each scored 9.5, while Altavista was the lowest with 6.5. (Ask Jeeves improved its score by upgrading Teoma, its search engine. The same will likely be true of Altavista after its recent changes.)

Engine        Score
Google        9.5
Inktomi       8.5
Yahoo         9.5
Lycos         8.5
MSN           9.5
Ask Jeeves    8.0
AlltheWeb     9.0
Altavista     6.5
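
The scoring rule described above (one point for the target page in the first 10 results, half a point for a related page) can be sketched in Python. This is only an illustration of the rule as reported here; the function names and the sample URLs are hypothetical and are not taken from the actual test.

```python
def score_query(top_results, target_url, related_urls):
    """Score one query: 1.0 for the target page, 0.5 for a related page."""
    top_ten = top_results[:10]  # only the first 10 results count
    if target_url in top_ten:
        return 1.0
    if any(url in top_ten for url in related_urls):
        return 0.5
    return 0.0


def score_engine(runs):
    """Sum per-query scores; each run is (results, target, related pages)."""
    return sum(score_query(r, t, rel) for r, t, rel in runs)


# Hypothetical data for illustration only, not the real test queries.
runs = [
    (["a.example/target", "b.example"], "a.example/target", []),
    (["c.example/about", "d.example"], "c.example/target", ["c.example/about"]),
    (["e.example"], "f.example/target", ["f.example/about"]),
]
total = score_engine(runs)  # one hit, one related page, one miss
```

With 10 searches per engine, the best possible total under this rule is 10.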

Out of curiosity, they also tested Overture and its pay-for-placement listings. None of the perfect pages turned up in pay-for-placement (although 4 did turn up in the Inktomi search that Overture runs).

The Search Engine "Perfect Page" Test
www.searchenginewatch.com/searchday/02/sd1104-pptest.html

Perfect Page Test: Criteria and Detailed Results
www.searchenginewatch.com/searchday/02/pptest-details.html

Another significant improvement has been the big bite taken out of the Invisible Web, thanks, in large part, to the indexing of more document types on the Web. Google was the first to index pdf documents. AlltheWeb followed, and in November 2002 Altavista joined. This is a tremendous aid to people searching for government and corporate papers and publications.

Google indexes several other popular formats – Excel (xls), Word (doc), PowerPoint (ppt), and Rich Text Format (rtf) – and Alltheweb indexes text in Flash files, which is handy when looking for instructional materials.

Googlification of the Web

Google is everywhere. It powers Alexa, AOL, Netscape, iWON, BBCi in the UK, Sympatico in Canada, and makes up most of the search results at Yahoo. It’s also the engine at the new MyWay.com portal, just launched and a direct competitor to Yahoo.

Danny Sullivan recapped some figures from Nielsen / NetRatings that show Google very much in the lead.

Based on August 2002 data, searchers spent 15 million hours at Google itself, another 4.2 million at AOL, and 6.2 million at Yahoo, both Google-powered. Combined, Google received 76% of the estimated total of 33.5 million hours, far exceeding MSN’s 15% or Ask Jeeves’s 6%. Alltheweb didn’t even make it onto the charts.
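
The 76% figure can be checked from the numbers quoted above. A quick Python sketch of the arithmetic follows; the variable names are my own, and the hours are in millions as reported.

```python
# Hours in millions, as quoted from the Nielsen//NetRatings figures above.
google_direct = 15.0
aol = 4.2
yahoo = 6.2
total_hours = 33.5

# Google-powered searching = Google itself plus the two portals it powers.
google_powered = google_direct + aol + yahoo
share = google_powered / total_hours * 100

print(f"{google_powered:.1f} million hours, {share:.0f}% of total")
```

This gives roughly 25.4 million Google-powered hours, which rounds to the 76% share cited.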

Sullivan suggests that there may be some backlash against Google. Certainly, Google has been singled out in legal actions and most recently received flak for complying with the laws of France and Germany to exclude racist sites.

Nielsen//NetRatings Search Engine Ratings (Sept 17, 2002)
www.searchenginewatch.com/reports/netratings.html

Google: Can The Marcia Brady Of Search Stay Sweet? (Sept 3, 2002)
searchenginewatch.com/sereport/02/09-google.html

Directories in Trouble

Tied in with the soaring success of Google is the decline of directories. Danny Sullivan charted the presence of human-compiled directories against crawler-based search services. For the period 1999 to 2001 the directories reigned: Yahoo, Looksmart, Open Directory, Ask Jeeves’s knowledge base, and NBCi’s Snap. In the last few months this has been reversed. Sullivan believes that the success of Google prodded other search engines to catch up. The cost of an editorial staff has surely been a factor too.

Signs of decline are clear. The most dramatic was Yahoo’s new look, which shows off its portal parts and makes search, and especially its directory, secondary. Search results are predominantly from Google with just a few marked as selected for the Yahoo directory. It’s a shame. Sullivan considers the one-line descriptions written by Yahoo editors to be much better than the scrambled text provided by Google.

As well, there has been the commercialization of listings at Yahoo and Looksmart, and, not mentioned, the very tired appearance and performance of Open Directory Project.

Karen Schneider, coordinator of the Librarians’ Index to the Internet (lii.org), is bringing together the librarians behind the scholarly, librarian-built directories in a project named Fiat Lux. The objective: “to build a Yahoo! with values and a brain”.

Creating a Yahoo! with Values (July 15, 2002) Library Journal
libraryjournal.reviewsnews.com/index.asp?layout=article&articleid=CA232358

Teoma Pushes Forward

Teoma introduced several changes in the first half of November: fielded searches on language, title, site, and url; spell check; and the beginning of Boolean logic. An Advanced Search page is planned.

Paul Gardi, Vice President at Ask Jeeves, described Teoma’s community approach to web search. Teoma identifies “local subject communities” within a set of search results. These are clusters that are highly inter-linked and have similar words. Teoma labels each with a frequently occurring phrase and lists these communities under Refine. These clusters will often have a hub page – one page that links to many others. These are link collections and are listed as Resources. Lastly, Teoma ranks results according to text analysis, popularity (links), and status. They call this subject-specific popularity: the more highly regarded (linked) sites within a subject community are ranked more highly. At Ask Jeeves / Teoma, relevance is the most important consideration. Gardi feels that Teoma’s three-dimensional search provides the most options to searchers.

Teoma - one search, three responses

New search options are:

  • site: limit to a domain, e.g., site:utoronto.ca mcluhan
  • intitle:"marshall mcluhan" - looks for at least one of the words in the title. Can be combined with site: and inurl:.
  • inurl:mcluhan
  • Boolean OR -- "marshall mcluhan" OR "harold innis". Note that nesting and logical combination with other terms is not supported. A AND (B OR C) results in (A AND B) OR C.
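
To see why the lack of nesting matters, the two readings of the query can be compared over every combination of terms. The Python sketch below assumes only what is stated above, that A AND (B OR C) is treated as (A AND B) OR C; it says nothing about how Teoma works internally, and the function names are my own.

```python
from itertools import product


def intended(a, b, c):
    # What the searcher means: A AND (B OR C)
    return a and (b or c)


def as_evaluated(a, b, c):
    # What the text says actually happens: (A AND B) OR C
    return (a and b) or c


# Each tuple says whether a page contains term A, B, and C.
# Collect the cases where the two readings disagree.
mismatches = [
    (a, b, c)
    for a, b, c in product([False, True], repeat=3)
    if intended(a, b, c) != as_evaluated(a, b, c)
]

for a, b, c in mismatches:
    print(f"A={a}, B={b}, C={c}: wanted {intended(a, b, c)}, got {as_evaluated(a, b, c)}")
```

The two readings differ only for pages that contain C but not A. Those pages are returned even though the searcher wanted A on every result, so a query of this shape tends to bring back extra, off-topic hits rather than miss anything.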

Greg Notess Teoma Review (Nov 11, 2002)
www.searchengineshowdown.com/features/teoma/review.html

Altavista Starts to Catch Up

Altavista has had a rocky time keeping its database fresh and relevant. The new AV Prisma has been the one feature to win praise. AV picks frequently occurring terms from the best matching pages and presents these as possible refinements. Clicking on one will add it to the search query. It’s available for two levels of search.
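
The general idea behind this kind of refinement (pick terms that occur often in the best matching pages, then append the chosen one to the query) can be sketched in a few lines of Python. This is not Altavista’s actual method, which is not described here; the stopword list, the function names, and the sample pages are all assumptions made for illustration.

```python
from collections import Counter
import re

# A tiny, hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "on"}


def suggest_refinements(query, top_pages, limit=5):
    """Return terms that appear in the most top-ranked pages."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for text in top_pages:
        # Count each term once per page so one long page cannot dominate.
        terms = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(terms - STOPWORDS - query_terms)
    return [term for term, _ in counts.most_common(limit)]


def refine(query, term):
    """Clicking a suggestion adds it to the query."""
    return f"{query} {term}"


# Hypothetical page texts standing in for the best matching pages.
pages = ["jaguar car dealer", "jaguar car review", "jaguar cat habitat"]
best = suggest_refinements("jaguar", pages, limit=1)
new_query = refine("jaguar", best[0])
```

Counting pages rather than raw occurrences is one simple design choice among several; weighting by rank or by phrase would be equally plausible.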

But Phoenix will rise again (as Altavista calls its index). On November 12 AV added pdf documents, increased its overall index to over 1 billion items, promised more frequent indexing of pages (50% of the ones viewed), took on a new logo, simplified the front page, and mercy upon mercies, banned the pop-ups and pop-unders.

Altavista is the only engine to offer the NEAR operator for proximity searching, and it allows one to tweak relevance ranking (through Sort By on the Advanced Search page).

Best Features

Features become more important as search engines become roughly the same size in indexed pages and use variations of link analysis.

Important ones to consider:

  • Search by filetype, especially pdf. Available at Google, Alltheweb, and Altavista.
  • Spell check
  • Personalization. Alltheweb has skins – work with the colours you like.
  • Power commands. Altavista and Alltheweb are stronger for field searching (title, site, link, etc.) than Google. Teoma is building its own.
  • Search folders that group search results by topic.
  • Proximity operators at Altavista.

Tips

  • Use specialized tools, advised Gary Price and Mary Ellen Bates. Think of the type of tool you need, find it, bookmark it. As an example, Gary Price and Chris Sherman showed Flight Tracker at Trip.com, a tool for finding real-time information about flights in the air.

  • Use the Web to find facts, when the topic is new, and “when you have more time than money”, advised Reva Basch. Greg Notess says the Web is excellent for everyday answers. Absolutely, in a flash, you can find airline schedules, tax rules, store locations, best sellers, and even the time of day from the Web. To this, Paula Hane adds point of view, opinion, and breaking news.

  • Use the Wayback machine (www.archive.org) to check earlier versions of a page. Paula Hane uses it to follow up on claims made in press releases.

  • Change search terms. In the perfect-page test, Chris Sherman found that choice of query terms is very important. “cop jokes” found a target site but “police humor” did not.

  • Use a variety of engines. The perfect-page test also showed that the engines do rank differently. One site might get a second page position at one engine, and first page at another.

  • Use search engines that have topic clustering or folders in order to get an idea of where to go next – a tip from Marydee Ojala.

  • Keep your page of results as an anchor page – open results in a new window. Greg Notess has found that search results can change from one moment to the next. (iLor Hydralinks is a tool that can help one work more easily from the main page.)

  • Reload the last page of results, just in case the search engine terminated the search early. This works for Greg Notess to bring up a few more results.

  • Go beyond the first 10 hits. In fact, Greg recommends setting the display of results to 100 per page to make it easier and faster to scan.

  • Don’t bother with dates. Gigablast is the only engine to show both the date spidered and the date modified. Google will show spidered dates on recently indexed pages, but it has dropped even the date on its cached copies. Dates are only useful when searching Google Groups (for newsgroups).

  • Save any page that has information you think you’d like to refer to again. Gary Price says pages on the Web are ephemeral. SurfSaver (www.surfsaver.com) from AskSam is a good tool for this.

  • Seek out authorities on a subject using Teoma and checking the Resources listed. Gary Price recommends this as a way to build a collection.

Other News

  • Wisenut, which was bought by Looksmart in early 2002, is being upgraded. The index has been refreshed somewhat and significant improvements are expected in the new year.

  • Lycos is revamping Hotbot. Rumour has it that it will become a meta-search engine.

Conclusion

Has web searching changed in the last year? Reva Basch thinks so. She took the year off and on her return saw that searchers could do much more. We can see this for ourselves. The indexes are larger; it’s much easier to find answers to everyday questions; relevance, as several have noted, is better; and specialized search tools for news and multimedia have improved. But search engines are sure to reach a limit in what they can crawl and index. And, as Danny Sullivan explained, link manipulation has become an issue. What next? Some say more specialized tools and likely personalization. Check back in a year.


Start Your Engines Report from Internet Librarian 2002 Conference by Aaron Schmidt, SLS Illinois (Dec 2002)


 

 

 


Newsletter by Gwen Harris - in the audience and at the lectern at Internet Librarian 2002.


Copyright Gwen Harris
A service to subscribers of WebSearchGuide (http://www.websearchguide.ca)

