Archiving the Web

I have often wondered about this too: Why Aren’t We Doing More With Our Web Archives?, Kalev Leetaru, Forbes (Jan 13)

Of course there is the Internet Archive, but it deserves much more support. Why doesn’t it get it?

The article has many interesting figures and observations.

As of last October the Archive had preserved more than 510 billion distinct URLs (images, videos, style sheets, scripts, PDFs, Microsoft Office files, etc) from over 273 billion web pages gathered from 361 million websites and taking up more than 15 petabytes of storage. Much of this collection is available through the Archive’s public-facing Wayback Machine that allows you to plug in any URL and see all of the Archive’s snapshots capturing its evolution over the past 20 years.
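To make the "plug in any URL" idea concrete, here is a minimal sketch of how a Wayback Machine address can be put together. It assumes the public `web.archive.org/web/<timestamp>/<url>` address pattern and the `archive.org/wayback/available` lookup endpoint; the function names are hypothetical and nothing here is taken from the Forbes article. It only builds addresses and does not contact the Archive.

```python
from urllib.parse import urlencode

# Assumed public address patterns; check the Archive's own documentation.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_url(page_url: str, timestamp: str = "") -> str:
    """Build a Wayback Machine address for page_url.

    timestamp is YYYYMMDDhhmmss or a prefix of it (for example "2006").
    With no timestamp, "*" is used, which asks for the list of all captures.
    """
    stamp = timestamp or "*"
    return f"{WAYBACK_PREFIX}/{stamp}/{page_url}"


def availability_query(page_url: str, timestamp: str = "") -> str:
    """Build a lookup address asking for the capture closest to timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(snapshot_url("example.com", "2006"))
    print(availability_query("example.com", "20060101"))
```

Pasting the first printed address into a browser should land on the capture nearest to the requested date, if one exists.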

Internet Archive Disaster Plan

The Internet Archive hopes to create another copy of the archive to be stored in Canada, because redundancy will protect against loss. Good idea. They need money to do this. Donations are tax deductible, but I presume that applies to residents of the United States, not Canada. Supporting it is certainly in our interest, since a great deal of Canadian material from websites and digitization projects is stored in the archive. For example, view this page listing Canadian Libraries and the number of items digitized.

Help Us Keep the Archive Free, Accessible, and Reader Private, Brewster Kahle, Internet Archive Blog (Nov 29)

The comments are interesting, although not consistently supportive or elevating. Many headlines attribute this decision to a wish for protection against Trump or the Trump administration. CNET said it straight out: Trump inspires Internet Archive to build replica in Canada.

Several Canadians responded, attracted to the posting by Canada in the title. They seem keen to donate but note the absence of charitable status. A couple of American writers regard Canada with suspicion because Canada restricts freedom of speech through its laws against hate speech and because Prime Minister Trudeau spoke favourably about Fidel Castro.

One interesting point to note is that the owner of a domain can “use their robots.txt file to remove current AND past archives” from the Wayback Machine.

All in all, the Internet Archive is an extremely valuable resource to Canadians, especially for historical research. We do need to help keep it safe from whatever disaster could befall it. It’s just good disaster planning.
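For readers who have not seen a robots.txt file, the sketch below shows what such an exclusion might look like and how a crawler would interpret it. It assumes the Archive's crawler identifies itself as `ia_archiver`, which is an assumption on my part and not something the quoted comment states; the helper name `is_allowed` and the example addresses are hypothetical. It uses Python's standard `urllib.robotparser` and does not fetch anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish.
# "ia_archiver" is assumed to be the user agent the Archive honours.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/recipes/soup.html"
    print(is_allowed(ROBOTS_TXT, "ia_archiver", page))  # blocked crawler
    print(is_allowed(ROBOTS_TXT, "SomeOtherBot", page))  # no rule applies
```

The file only expresses the owner's request. Whether past captures are also hidden is a policy decision made by the archive, not something the file itself enforces.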

Digital Public Library of America takes the lead

The Digital Public Library of America becomes an even richer resource for researchers thanks to the newly inked agreement by the Library of Congress to be a “content hub partner”. This could make DPLA everyone’s first stop for finding materials on US cultural history.

Library of Congress, Digital Public Library of America To Form New Collaboration, Dick Eastman, Eastman’s Online Genealogy Newsletter (Nov 29)

Quoted – “The Digital Public Library of America is a portal — effectively, a searchable catalog—that aggregates existing digitized content from major sources such as libraries, archives, museums and cultural institutions. It provides users with links back to the original content-provider site where the material can be viewed, read or, in some cases, downloaded.”

Searching the Internet Archive

The Internet Archive is getting easier to search. Keyword and site combinations have worked in Google (e.g. your terms), but now the Archive offers faceted filtering by media type and topic, as well as full-text search. The full-text search is still in beta, and we all know the problems with text converted by optical character recognition, so tread carefully.
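As a rough illustration of the two approaches, the sketch below builds a Google-style site-restricted query and a metadata search address filtered by media type. It assumes Google's `site:` operator and the Archive's `advancedsearch.php` endpoint with its `mediatype` field; the function names are hypothetical. Note that this endpoint searches item metadata, not the beta full-text index described in the announcement, and the code only builds strings without sending any request.

```python
from urllib.parse import urlencode

# Assumed metadata search endpoint; it is not the beta full-text search.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def google_site_query(terms: str, site: str = "archive.org") -> str:
    """Return a query string restricting a web search to one site."""
    return f"site:{site} {terms}"


def archive_search_url(terms: str, mediatype: str = "", rows: int = 10) -> str:
    """Build a metadata search address, optionally filtered by media type."""
    query = terms
    if mediatype:
        # Narrow the results to one facet, e.g. "texts" or "audio".
        query = f"({terms}) AND mediatype:{mediatype}"
    params = [
        ("q", query),
        ("fl[]", "identifier"),  # fields to return for each item
        ("fl[]", "title"),
        ("rows", str(rows)),
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(google_site_query("canadian libraries"))
    print(archive_search_url("canada", mediatype="texts", rows=5))
```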

Searching Through Everything, Internet Archive Blog, Oct 26

Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.

LLRX for information professionals

LLRX (Law and Technology Resources for Legal Professionals), an important web journal for legal researchers, has been redesigned into a fresh and contemporary WordPress site. Sabrina Pacifici, the founder and publisher, wrote, “Your support is appreciated, and I will continue to maintain LLRX as a community of best practice and knowledge sharing for a wide range of professionals who are critical members of organizations in all sectors.” LLRX offers a monthly edition of new articles, guides and topical resources comprised of comprehensive, reliable and wide ranging topical content to support actionable projects, research, teaching/training/learning components for professionals and students in law, academia, the public, private, and advocacy sectors. [Source]

Pacifici also blogs her own findings and observations on a variety of legal topics and information resources in beSpacific.

Digital History

The availability of digital resources on the Web raises new issues for historical research. The February 2016 issue of the American Historical Review has five articles concerning “reviewing digital history.” The Introduction by Alex Lichtenstein discusses the state of digital history today and introduces reviews of two new sites.

“Both of these exchanges remind us that the burgeoning world of digital scholarship deserves fuller critical engagement through these kinds of in-depth reviews.”

Background and issues are given in Googling History: The AHR Explores Implications of Using Digital Sources for Historians (May 16) on the American Historical Association blog.

That article refers to a list of online collections of primary sources, Digital Primary Sources. It is very eclectic and useful for browsing to see the types available, and you might find something related to your research interest.

BBC Recipes

The BBC’s decision to remove 11,000 recipes from its website reminds us of the ephemeral nature of web content and its consequences.

BBC unveils shake-up of online services including recipes website, BBC (May 17) — “The BBC has announced that a number of websites, including BBC Food and Newsbeat, are to close as part of plans to save £15m.”

The UK Web Archive blog reported that it will archive the recipes from the food pages: Saving BBC Recipes Website (May 17). So will three other archiving services: the Internet Archive, the Library of Alexandria and the National Library of Iceland.

The BBC announced the next day (May 18) that it would move most of the content to BBCGoodFood, its commercial food site. Publishers are not happy. BBC’s recipes U-turn is a cynical move, say its rivals, observed the Guardian (May 18).

Mike Jeffs at Branded3 looked at the change from another point of view: How will removing BBC recipes affect search? Other recipe sites will rank more highly in searches, especially on niche items. BBCGoodFood, which already has greater visibility than the BBC food section, will gain 1.8 million keywords to optimize on.