Archiving the web

University of Toronto Libraries has been archiving web content in four areas through its relationship with the Internet Archive and its page-capturing service, Archive-It.

These are available at Archive-It.

  • Canadian Government Information
  • Canadian Labour Unions
  • Canadian Political Parties and Political Interest Groups
  • University of Toronto Web Archives

The collections are searchable and one can refine by format.

To see the list of sites included, enter the collection and click on the collection name. There are excellent filters for narrowing the search: subject, creator, year, language.

University of Toronto - Archive - Canadian Government Information

World War One in the Illustrated London News

Archive material from the Illustrated London News and the “Great Eight illustrated magazines” from 1914 to 1919 will be available at the Illustrated First World War site.

Browse the wartime pages of the Illustrated London News in a new online archive, First World War Centenary (Aug 13)

The project means that for the first time in 100 years, the public will be able to browse the wartime pages of The Illustrated London News and its sister titles; discover paintings, illustrations and sketches by war artists; and read articles, many of which have not been seen since they were first published.

The website has:

  • A timeline of the war
  • ILN articles that seem to be timed to the current week 100 years ago
  • War artists who were illustrators
  • A blog

WW1 Prisoners of War database

Here’s another source of war records for researchers of World War One: the names of, and information about, people held in prisoner-of-war camps.

New, Free Website Has Millions of World War I Prisoner of War Records, Genealogy Insider (Aug )

Records were collected by and are made available through the International Committee of the Red Cross.

Records include the ledger entries for prisoners, some postcards or pictures of camps, and a few personal accounts.

Must have been a mammoth job to digitize it all – see the video at

Riches in the Internet Archive

The Internet Archive is best known for the Wayback Machine, its archive of web pages, but it has much more: books, images, music, and specialized collections.

5 Types Of Free Content Riches You Can Dig Up At The Internet Archive by Jessica Coccimiglio, Make Use Of (Jul 16)

Canadians will be interested in the long list of texts and collections from Canadian schools and associations — Canadian Libraries

Digital Commons Network for free, scholarly articles

Those seeking grey literature or scholarly works will want to explore the Digital Commons Network of free, full-text scholarly works. These are sourced from 330 universities and colleges worldwide (although most are in the United States) and curated by the university librarians. Among Canadian universities I noted McMaster University, Wilfrid Laurier, University of Windsor, University of Western Ontario, and Osgoode Hall Law School of York University. There are surely more.

Digital Commons holds “peer-reviewed journal articles, book chapters, dissertations, working papers, conference proceedings, and other original scholarly work.”

The site opens to a multicoloured wheel for visually exploring the disciplines. Continue browsing by spinning the wheel. You may also click on an academic discipline and begin to narrow by journal, author, or keyword search.

Search wheel for exploring the Digital Commons Network

The collection is made available through Berkeley Electronic Press.

This stupendous resource was featured in the Digital Shift – Uncommonly Open: The New Digital Commons Network (June 19, 2013)

This resource was reviewed in the BestBizWeb newsletter.

Wayback Machine Update

Important facts about the Wayback Machine from the Internet Archive.

Wayback Machine Adds 160 Billion Indexed Pages In A Year, Surpasses 400 Billion Indexed Pages, Barry Schwartz, Search Engine Land (May 12)

  • It has over 400 billion indexed pages since 1996
  • It added 160 billion pages in about 14 months (Jan 2013 to May 2014)
  • In October 2013 it added capability to quickly view new content.
  • Individuals can also save specific pages to the archive. See How To Save URLs To The Wayback Machine On Demand, Gary Price, Search Engine Land (May 13)
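For readers who want to script the last point, here is a minimal sketch in Python. It only builds the URLs and makes no network requests. The two URL patterns (the `web.archive.org/save/` prefix and the `archive.org/wayback/available` lookup) are assumptions based on the Internet Archive's public interfaces, not something the articles above spell out, and the function names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed URL patterns for the Wayback Machine; check the Internet Archive's
# own documentation before relying on them.
WAYBACK_SAVE_PREFIX = "https://web.archive.org/save/"
WAYBACK_AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_save_url(page_url: str) -> str:
    """Return the address to visit to ask the Wayback Machine to capture page_url."""
    return WAYBACK_SAVE_PREFIX + page_url


def build_availability_url(page_url: str, timestamp: Optional[str] = None) -> str:
    """Return a lookup URL that asks whether a capture of page_url exists.

    timestamp is optional and uses the form YYYYMMDD, e.g. "20140512".
    """
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return WAYBACK_AVAILABILITY_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_save_url("http://example.com/"))
    print(build_availability_url("example.com", "20140512"))
```

Opening the first address in a browser should trigger a capture, while the second is meant to return a short machine-readable answer about the closest archived copy.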

HathiTrust Digital Library and fair use

The US Court of Appeals (2nd Circuit) has ruled that “full-text book scanning is generally going to be considered ‘fair use’ and protected from claims of copyright infringement”.

Book Scanning Suits Against Google, Others Wind Down With Fair Use Rulings, Greg Sterling, Marketing Land (Jun 11)

This ruling allows the 90+ member libraries of the HathiTrust Digital Library to continue digitizing their collections for access.

The Baylor University Library webpage provides some background on HathiTrust:

HathiTrust was established in 2008 with the mission to “contribute to the common good by collecting, organizing, preserving, communicating and sharing the record of human knowledge.” The original HathiTrust libraries were partners with Google and/or the Internet Archive for the digitization of books in their collections. In part, HathiTrust was created so these libraries could work collaboratively to manage, provide access to, and preserve their digital assets in ways that Google could not.

In search of specialty search engines

Bev Butula at Wisconsin Law Journal wrote about five specialty search engines as alternatives to Google: for science questions, – medical, – economic and financial, – financial, and MagPortal for business (and other) magazine articles.

BEV BUTULA: In search of the best search, Wisconsin Law Journal (Feb 12, 2014)

It’s been some time since I looked at MagPortal – good breakdown of subject areas but doesn’t seem to have many magazines. More for browsing than searching.

Deep Web Technologies blogged about the article because two on that list are theirs ( and – In search of the best search

It named three others that we should add to our lists.

  • – Energy and the Environment.
  • National Library of Energy – the DOE’s National Resource for Energy Literacy, Innovation and Security.
  • – “a federated search portal that aggregates social networks, financial sources, government sources, and news for business researchers”

Europeana 1914-1918

This year is the 100th anniversary of the start of World War 1. We are sure to see many announcements of historical materials being made available online.

Europeana brings together content from 20 countries – film, images, family papers, and memorabilia.

Europeana 1914-1918: A new website that brings all sides of World War One together launches in Berlin, featuring 10,000 items from the British Library’s collections, Press release (Jan 29)

Europeana 1914-1918 is full of original source material – digitised photographs, maps, diaries, newspapers, letters, drawings and other content that can be used by teachers, historians, journalists, students and interest groups to create new resources.