I have often wondered this too – Why Aren’t We Doing More With Our Web Archives?, Kalev Leetaru, Forbes (Jan 13)
Of course there is The Internet Archive, but it deserves much more support. Why doesn’t it get it?
The article has many interesting figures and observations.
As of last October the Archive had preserved more than 510 billion distinct URLs (images, videos, style sheets, scripts, PDFs, Microsoft Office files, etc) from over 273 billion web pages gathered from 361 million websites and taking up more than 15 petabytes of storage. Much of this collection is available through the Archive’s public-facing Wayback Machine that allows you to plug in any URL and see all of the Archive’s snapshots capturing its evolution over the past 20 years.
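As a rough illustration of "plugging in any URL", here is a minimal sketch that builds Wayback Machine addresses without making any network request. It assumes the commonly seen `web.archive.org/web/<timestamp>/<url>` address pattern and the `archive.org/wayback/available` lookup endpoint; the function names are hypothetical and exist only for this example.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed address patterns; check the Archive's own documentation before relying on them.
WAYBACK_PREFIX = "https://web.archive.org/web"
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def snapshot_listing_url(page_url: str) -> str:
    """Address of the page that lists all snapshots of page_url."""
    return f"{WAYBACK_PREFIX}/*/{page_url}"


def snapshot_url(page_url: str, timestamp: str) -> str:
    """Address of the snapshot closest to timestamp (YYYYMMDD or longer)."""
    return f"{WAYBACK_PREFIX}/{timestamp}/{page_url}"


def availability_query(page_url: str, timestamp: Optional[str] = None) -> str:
    """Query string asking whether a snapshot of page_url exists."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(snapshot_listing_url("example.com"))
    print(snapshot_url("example.com", "20160101"))
    print(availability_query("example.com", "20160101"))
```

Pasting the first printed address into a browser should show the calendar of captures for that site, if any exist.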
The Internet Archive hopes to create another copy of the archive, to be stored in Canada, because redundancy will protect against loss. Good idea. They need money to do this. Donations are tax deductible, but I presume that applies to residents of the United States, not Canada. Supporting it is certainly in our interest, since a great deal of Canadian material from websites and digitization projects is stored in the archive. For example, view this page listing Canadian Libraries and the number of items digitized.
Help Us Keep the Archive Free, Accessible, and Reader Private, Brewster Kahle, Internet Archive Blog (Nov 29)
The comments are interesting, although not consistently supportive or elevating. Many headlines frame this decision as protection against Trump or the Trump administration. CNET said it straight out: Trump inspires Internet Archive to build replica in Canada.
Several Canadians responded, attracted to the posting by “Canada” in the title. They seem keen to donate but note the absence of charitable status. A couple of American writers regard Canada with suspicion because Canada restricts freedom of speech through its laws against hate speech and because Prime Minister Trudeau spoke favourably about Fidel Castro.
One interesting point to note is that the owner of a domain can “use their robots.txt file to remove current AND past archives” from the Wayback Machine.
All in all, the Internet Archive is an extremely valuable resource for Canadians, especially for historical research. We do need to help keep it safe from whatever disaster could befall it. It’s just good disaster planning.
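To make the robots.txt point concrete, here is a small sketch using Python's standard `urllib.robotparser` to check what a given robots.txt file permits. The user-agent name `ia_archiver` is an assumption (it is the name commonly associated with the Archive's crawler), and the sample file and helper function are hypothetical. The sketch only shows how the rules are read; it says nothing about how the Archive itself applies them to past captures.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one crawler from the whole site.
# "ia_archiver" is an assumed user-agent name, used here for illustration.
SAMPLE_ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/recipes/soup.html"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "ia_archiver", page))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "SomeOtherBot", page))
```

A site owner who adds a rule like this can therefore affect what visitors see in the archive, which is worth keeping in mind when a once-available page disappears.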
The Digital Public Library of America becomes an even richer resource for researchers thanks to the newly inked agreement by the Library of Congress to be a “content hub partner”. This could make DPLA everyone’s first stop for finding materials on US cultural history.
Library of Congress, Digital Public Library of America To Form New Collaboration, Dick Eastman, Eastman’s Online Genealogy Newsletter (Nov 29)
Quoted – “The Digital Public Library of America is a portal — effectively, a searchable catalog—that aggregates existing digitized content from major sources such as libraries, archives, museums and cultural institutions. It provides users with links back to the original content-provider site where the material can be viewed, read or, in some cases, downloaded.”
The invaluable Internet Archive has added two search features: faceted filtering – media type, topics and subjects; and full text searching across 9 million text items (but in beta).
Searching Through Everything, Internet Archive blog post (Oct 26)
The Internet Archive is getting easier to search. Keyword and site combinations have long worked in Google (e.g. site:archive.org your terms), but now the Archive offers faceted filtering by media type and topic, and full-text search. The full-text search is in beta, and we all know the problems with text converted by optical character recognition, so tread carefully.
Searching Through Everything, Internet Archive Blog (Oct 26)
Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.
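For readers who still prefer the site-restricted Google search mentioned above, here is a minimal sketch that builds such a query address. It assumes Google's ordinary `https://www.google.com/search?q=...` address form; the function name is hypothetical, and nothing here touches the Archive's own search.

```python
from urllib.parse import urlencode

GOOGLE_SEARCH = "https://www.google.com/search"  # assumed search address


def site_search_url(site: str, terms: str) -> str:
    """Build a search address restricted to one site, e.g. archive.org."""
    query = f"site:{site} {terms}"
    return f"{GOOGLE_SEARCH}?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(site_search_url("archive.org", "canadian libraries"))
```

The colon and spaces are percent-encoded automatically, so the terms can be typed as they would be in the search box.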
The Internet Archive has been preserving the Web’s past for 20 years. Defining Web pages, Web sites and Web captures, Internet Archive Blog (Oct 23)
As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.
LLRX – Law and Technology Resources for Legal Professionals – an important web journal for legal researchers – has been redesigned into a fresh and contemporary WordPress site. Sabrina Pacifici, the founder and publisher, wrote, “Your support is appreciated, and I will continue to maintain LLRX as a community of best practice and knowledge sharing for a wide range of professionals who are critical members of organizations in all sectors.”
LLRX.com offers a monthly edition of new articles, guides and topical resources comprised of comprehensive, reliable and wide ranging topical content to support actionable projects, research, teaching/training/learning components for professionals and students in law, academia, the public, private, and advocacy sectors. [Source]
Pacifici also blogs her own findings and observations on a variety of legal topics and information resources in beSpacific.
THOMAS, the US government site launched in 1995 to provide online access to legislative and Congressional information, has been replaced by Congress.gov with more content and features.
Time to Say Goodbye to THOMAS, In Custodia Legis (April 28)
The availability of digital resources on the Web raises new issues for historical research. The February 2016 issue of the American Historical Review has five articles concerning “reviewing digital history.” The Introduction by Alex Lichtenstein discusses the state of digital history today and introduces reviews of two new sites.
“Both of these exchanges remind us that the burgeoning world of digital scholarship deserves fuller critical engagement through these kinds of in-depth reviews.”
Background and issues are given in Googling History: The AHR Explores Implications of Using Digital Sources for Historians (May 16) on the American Historical Association blog.
That article refers to a list of online collections of primary sources, Digital Primary Sources. It is very eclectic and useful for browsing to see the types of collections available, and you might find something related to your research interest.
The BBC decision to remove 11,000 recipes from its website reminds us of the ephemeral nature of web content and the consequences.
BBC unveils shake-up of online services including recipes website, BBC (May 17) — “The BBC has announced that a number of websites, including BBC Food and Newsbeat, are to close as part of plans to save £15m.”
The UK Web Archive blog reported that it will archive the recipes from the food pages: Saving BBC Recipes Website (May 17). So will three other archiving services: the Internet Archive, the Library of Alexandria and the National Library of Iceland.
The BBC announced the next day (May 18) that it would move most of the content to BBCGoodFood, its commercial food site. Publishers are not happy: BBC’s recipes U-turn is a cynical move, say its rivals, the Guardian (May 18).
Mike Jeffs at Branded3 looked at the change from another point of view in How will removing BBC recipes affect search? Other recipe sites will rank more highly in searches, especially on niche items. BBCGoodFood, which already has greater visibility than the BBC food section, will gain 1.8 million keywords to optimize on.