Internet Archive Disaster Plan

The Internet Archive hopes to create another copy of the archive, to be stored in Canada, because redundancy will protect against loss. Good idea. They need money to do this. Donations are tax deductible, but I presume that applies to residents of the United States, not Canada. Certainly it’s in our interest, since a great deal of Canadian material from websites and digitization projects is stored in the archive. For example, view this page listing Canadian Libraries and the number of items digitized.

Help Us Keep the Archive Free, Accessible, and Reader Private, Brewster Kahle, Internet Archive Blog (Nov 29)

The comments are interesting, although not consistently supportive or elevating. Many headlines present this decision as protection against Trump or the Trump administration. CNET said it straight out: Trump inspires Internet Archive to build replica in Canada

Several Canadians responded, attracted to the posting by “Canada” in the title. They seem keen to donate but note the absence of charitable status. A couple of American writers regard Canada with suspicion because it restricts freedom of speech through its laws against hate speech and because Prime Minister Trudeau spoke favourably about Fidel Castro.

One interesting point to note is that the owner of a domain can “use their robots.txt file to remove current AND past archives” from the Wayback Machine.
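
For the curious, here is roughly what such a robots.txt rule looks like and how it can be checked. This is only a sketch under stated assumptions: it assumes the Wayback Machine's crawler identifies itself as ia_archiver (the user-agent name traditionally associated with it, not something stated in the post), and the example.org URL and the is_blocked helper are made up for illustration. It uses Python's standard urllib.robotparser module.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks one crawler to stay away from the whole site.
# "ia_archiver" is an assumed user-agent name for the Wayback Machine crawler.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Hypothetical page on a hypothetical site.
PAGE = "https://example.org/page.html"

blocked = is_blocked(ROBOTS_TXT, "ia_archiver", PAGE)
other = is_blocked(ROBOTS_TXT, "SomeOtherBot", PAGE)

print("ia_archiver blocked:", blocked)
print("SomeOtherBot blocked:", other)
```

The check only shows what the file asks of a crawler. Whether past captures are also hidden is a policy decision on the archive's side, as the quoted comment describes.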

All in all, the Internet Archive is an extremely valuable resource to Canadians, especially for historical research. We do need to help keep it safe from whatever disaster could befall it. It’s just good disaster planning.

AI-Based Scholarly Search Engines

Google Scholar has competition in two AI-based scholarly search engines: the new Semantic Scholar, strong in the sciences, and the relaunched Microsoft Academic, with content from many fields of study.

AI science search engines expand their reach “Semantic Scholar triples in size and Microsoft Academic’s relaunch impresses researchers”, by Nicola Jones, Nature (Nov 11)

[Semantic Scholar] A free AI-based scholarly search engine that aims to outdo Google Scholar is expanding its corpus of papers to cover some 10 million research articles in computer science and neuroscience, its creators announced on 11 November. Since its launch last year, it has been joined by several other AI-based academic search engines, most notably a relaunched effort from computing giant Microsoft.

Scholarly Databases

Students and researchers can learn more about scholarly databases through Beyond Citation. Started as a project of the CUNY Graduate Center's Digital Praxis Seminar in 2014, its objective is to “aggregate information about academic databases to encourage critical thinking about how these resources affect scholarship”. In other words, it’s important to be aware of the limitations of Google Books and of the other twelve databases reviewed. At the very least, this is a good starting point for learning of the existence and qualities of these databases. But updates and additions to the site seem to have stopped in August 2015. The Twitter feed for @beyondcitation, however, is alive and active.

Academic Search

Academic researchers need to know about citation research. This article in Online Searcher will shed some light on how Google Scholar really works.

Set Your Cites High: The Value of Quality Citation Information by Amy Affelt and David Pauwels (Sept/Oct 2016)

In Google Scholar, “Dates and citation counts are estimated and are determined automatically by a computer program.” Really!

This poses problems for information professionals who need the exact number of cited references.

The article describes and examines Google Scholar, Microsoft Academic Search, HeinOnline, Ebsco databases, Scopus, and Web of Science. Identifying and tracking down the cited references will take more than one approach or one tool.

Digitizing orphan works

Orphan works are books and articles that are still under copyright but whose copyright holder can’t be found. The Harvard Library is looking for ways to “solve the legal complexities of the orphan works problem by identifying no-risk or low-risk ways to digitize and distribute orphan works under U.S. copyright law”. It recently released David Hansen’s study “Digitizing Orphan Works: Legal Strategies to Reduce Risks for Open Access to Copyrighted Orphan Works”.

Libraries, Orphan Works, and the Future of Copyright by Nancy K Herther, Information Today (Oct 4)

The article provides background on the current state of copyright law, with some comparison of the US to Canada and the UK. In the UK, a 2013 law “allows the government to grant firms or organisations the right to use orphaned material, providing ‘a diligent search’ for the copyright owner is first carried out. It also allows for the creation of an organisation that might levy licensing fees on behalf of absent content creators—and which would pay out to rights holders who subsequently discover their work has been sold.”

Hopefully the Harvard Library will succeed in its goal “to help clear the way for U.S. universities, libraries, archives, museums, and other cultural institutions to digitize their orphan works and make the digital copies open access.”

LLRX for information professionals

LLRX (Law and Technology Resources for Legal Professionals), an important web journal for legal researchers, has been redesigned into a fresh and contemporary WordPress site. Sabrina Pacifici, the founder and publisher, wrote, “Your support is appreciated, and I will continue to maintain LLRX as a community of best practice and knowledge sharing for a wide range of professionals who are critical members of organizations in all sectors.” LLRX offers a monthly edition of new articles, guides and topical resources comprised of comprehensive, reliable and wide ranging topical content to support actionable projects, research, teaching/training/learning components for professionals and students in law, academia, the public, private, and advocacy sectors. [Source]

Pacifici also blogs her own findings and observations on a variety of legal topics and information resources in beSpacific.

Dealing with null results

Love this line from Greg Notess’ article Tips for Avoiding, or Celebrating, Zero Search Results in Information Today: “Only librarians like to search; everyone else likes to find”.

Notess examines the reasons for, and the significance of, a null set of results. Mostly, searchers need to know the structure and scope of the database. Literary and academic databases are very different from Google, and specialty searches, such as those for patents, take special skills.

Digital Archiving

Digital archiving has become urgent as more records begin in digital format and older ones are digitized. Jan Zastrow, in Information Today, introduces Top 10 Digital Archives Blogs (July 5). There is much for the archivist or records manager to investigate here, and something for the individual interested in personal archives and genealogy.

Here’s a list of bests to help you sift through the noise—online journals, blogs, and RSS and Twitter feeds—to keep you abreast of what’s happening in the quickly evolving world of digital archives, electronic records, digital preservation and curation, personal archiving, digital humanities, and more. Some are sponsored by august institutions, while others are more informal, idiosyncratic offerings from thought leaders in the industry. A caveat: These are all U.S.-centric, English-language sources, which do not span the universe of ideas about digital cultural heritage globally (for that, get started at the World Digital Library;


It’s not often that we come across an article that compares Web of Science, Scopus, Google Scholar, and Microsoft Academic for citation analysis. Google Scholar Digest posted this study by Anne-Wil Harzing of Middlesex University, UK (June 13, 2016): Microsoft Academic versus Google Scholar, Scopus, and Web of Science: Anne-Wil Harzing’s case.

This article assesses Microsoft Academic coverage through a detailed comparison of the publication and citation record of a single academic for each of the four main citation databases: Google Scholar, Microsoft Academic, the Web of Science, and Scopus. Overall, this first small-scale case study suggests that the new incarnation of Microsoft Academic presents us with an excellent alternative for citation analysis. If our findings can be confirmed by larger-scale studies, Microsoft Academic might well turn out to combine the advantages of broader coverage, as displayed by Google Scholar, with the advantage of a more structured approach to data presentation, typical of Scopus and the Web of Science. If so, the new Microsoft Academic service would truly be a Phoenix arisen from the ashes.

Spoiler alert: Microsoft Academic was found to be stronger than Web of Science for publication and citation coverage, and at least equal to Scopus. “Only Google Scholar outperforms Microsoft Academic in terms of both publications and citations.”