Web Search Guide
Web Searching   

Web Pages

The information you are searching for may be on the home page (the first, or front, page of a Web site), or it may be two or more levels deep in the site. Most Web sites are constructed in levels: the top levels provide an overview and direction, and the lower levels hold the actual content.

 
Subject directory vs search engine indexing
 

Many organizations have a large number of pages flowing from their home page. For example, Health Canada (hc-sc.gc.ca) has over 321,000 pages according to Google.

Subject directories mostly list the home page of a web site. Some organizations will have many sub-sections, each with its own home page. A university, for example, will have a main home page, and additional home pages for each of the schools - science, humanities, engineering - and possibly their departments. All of these might be listed in the subject directory under their own classifications. Open Directory has categorized over 4.6 million pages - largely web sites.

Search engines, on the other hand, gather words from web pages. Google said in 2008 that it knows of one trillion items on the web, although it doesn't index them all. Yahoo is probably competitive, and the others are smaller.

The total size of the web is unknown, and it is very hard even to come up with a good guess. According to Netcraft.com there were 225 million web hostnames as of August 2009 (though this was a drop of 13.7 million sites from the previous month). As a guess, if each hostname had 500 pages, there could be around 112 billion web pages. Suffice it to say, the search engines capture just a small fraction of it.
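
The back-of-envelope estimate above can be checked with a few lines of Python. This is only a sketch: the hostname count is the Netcraft figure quoted above, while the pages-per-hostname values are guesses, and the function name estimate_pages is made up for this illustration.

```python
# Rough estimate of the number of web pages from a hostname count.
# The pages-per-hostname figures are guesses, not measurements.

def estimate_pages(hostnames, pages_per_hostname):
    """Return the estimated total number of pages."""
    return hostnames * pages_per_hostname

HOSTNAMES_AUG_2009 = 225_000_000  # Netcraft figure quoted in the text

for guess in (50, 500):
    total = estimate_pages(HOSTNAMES_AUG_2009, guess)
    print(f"{guess} pages per hostname -> {total:,} pages")
```

The result swings by a factor of ten depending on the per-hostname guess, which is why any single figure for the size of the web should be treated with caution.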


TIP: When you are starting your search, keep in mind that the words and descriptions found on home pages will describe the main topic and likely be fairly general, whereas the words and descriptions found on individual pages will be the guts of the content, more descriptive, technical, and precise.


Invisible Web

But search engines do not reach all parts of the Web or necessarily index all pages at a site. In fact, they may index only a fraction of the content of the Web. The Invisible Web, or Deep Web as it is also called, consists largely of databases not easily indexed by the search engines, pages deep in a web site that don't get crawled, areas that exclude the robots, file formats that the search engines ignore (although there are very few of these now), and services for subscribers only (often for a fee). No one has a reliable estimate of its size, but it is very big.
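
As an illustration of "areas that exclude the robots", a site can publish a robots.txt file telling crawlers which paths to stay out of. The sketch below uses Python's standard urllib.robotparser module to test a few addresses against such rules. The rules, the site www.example.com, the crawler name ExampleBot, and the helper can_crawl are all invented for this example and do not describe any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for an imaginary site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
Disallow: /search
"""

def can_crawl(url, agent="ExampleBot"):
    """Return True if the rules above let the given crawler fetch the URL."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)

for url in (
    "http://www.example.com/about.html",
    "http://www.example.com/members/report.html",
    "http://www.example.com/search?q=stars",
):
    status = "may be crawled" if can_crawl(url) else "excluded from crawling"
    print(url, "->", status)
```

Pages under the excluded paths stay on the Web and can be visited directly, but a well-behaved search engine will not crawl them, so their content does not get indexed.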

The trick to finding information on the Invisible Web is to think big. Where would one likely find the answer? Examples: for definitions of words, look in a dictionary; for specifics on a drug, find a drug database; for information on stars and the universe, look for a virtual planetarium.


Definitions

Home Page - First page of a site. Also known as the front page, start page, or welcome page. The home page for the Professional Learning Centre at the Faculty of Information at the University of Toronto is
plc.ischool.utoronto.ca

Web Site - Complete set of pages and files. A site can have many thousands of pages.

University of Toronto at www.utoronto.ca is a web site. So is PLC at plc.ischool.utoronto.ca.

Web Page - Individual page of text. Usually has images. Often will link to other pages. A Web page will have a unique address (the URL).

PHP, ASP - You'll often see pages that show as xxx.asp or xxx.php. These are the marks of a web page that is delivered to you dynamically, using the PHP or ASP scripting languages to extract the information from a database.

PDF - Portable Document Format, created through Adobe Acrobat software. Many companies prefer to publish reports in PDF for better display.

Invisible Web - Resources on the Web that search engines don't reach. Could be many times larger than the public web.

Deep Web - Synonym for invisible web and may be more descriptive - refers to pages that are on the web but too deep in a site or database to be indexed by a search engine.
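
The Home Page, Web Site, and Web Page definitions above can be illustrated by splitting a URL into its parts with Python's standard urllib.parse module. This is a rough sketch: the helper describe_url is invented for this example, the longer address is hypothetical (only www.utoronto.ca comes from the text), and counting path segments is only an approximation of how many levels deep a page sits.

```python
from urllib.parse import urlparse

def describe_url(url):
    """Split a URL into its site and a rough count of levels below the home page."""
    parts = urlparse(url)
    # Non-empty pieces of the path, e.g. "/a/b/page.html" -> ["a", "b", "page.html"]
    segments = [s for s in parts.path.split("/") if s]
    return {
        "site": parts.netloc,
        "levels_deep": len(segments),
        "is_home_page": len(segments) == 0,
    }

# The first address is from the text; the second path is hypothetical.
for url in (
    "http://www.utoronto.ca/",
    "http://www.utoronto.ca/science/physics/courses.html",
):
    print(url, "->", describe_url(url))
```

Both addresses belong to the same web site, but only the first is its home page. The second is an individual web page a few levels down, the kind of page where the more specific content tends to live.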

 

Where to next?

Learn more about Using Subject Directories.

 
