Riches in the Internet Archive

The Internet Archive is best known for the Wayback Machine, its archive of past web pages, but it holds much more: books, images, music, and specialized collections.

5 Types Of Free Content Riches You Can Dig Up At The Internet Archive by Jessica Coccimiglio, Make Use Of (Jul 16)

Canadians will be interested in the long list of texts and collections from Canadian schools and associations in the Canadian Libraries collection.

Bing, Cortana, and Academic Search

Bing will integrate results from academic sources into its general web search, as explained in Bing & Cortana To Get Academic Search Integration At A Whole New Level by Barry Schwartz, Search Engine Land (Jul 15)

In a jab at Google, Microsoft said that “instead of treating scholarly information as a separate search engine – as competitors” do (clearly a reference to Google Scholar), Bing will make academic data “a first-class citizen in Bing search results.”

Cortana, Microsoft’s personal assistant technology, is the agent. A full description is at Making Cortana the Researcher’s Dream Assistant, Inside Microsoft Research.

That’s great – but is Bing adding records to its Academic Search?

Digital Commons Network for free, scholarly articles

Those seeking grey literature or scholarly papers will want to explore the Digital Commons Network of free, full-text scholarly works. These are sourced from 330 universities and colleges worldwide (although most are in the United States) and curated by university librarians. Among Canadian universities I noted McMaster University, Wilfrid Laurier, University of Windsor, University of Western Ontario, and Osgoode Hall Law School of York University. There are surely more.

Digital Commons holds “peer-reviewed journal articles, book chapters, dissertations, working papers, conference proceedings, and other original scholarly work.”

The site opens to a multicoloured wheel for visually exploring the disciplines. Continue browsing by spinning the wheel, or click on an academic discipline and narrow your results by journal, author, or keyword search.

Search wheel for exploring the Digital Commons Network

The collection is made available through Berkeley Electronic Press.

This stupendous resource was featured in The Digital Shift article “Uncommonly Open: The New Digital Commons Network” (June 19, 2013).

This resource was reviewed in the BestBizWeb newsletter.

Reference Works Online

A reference work that is online may not be what you expect based on the print version: there may be issues of authority, ease of use, incompleteness, and even the reader’s privacy. This article points out the weaknesses to watch for and the areas of improvement.

The Ebb and Flow of Reference Products, by Denise Beaubien Bennett, Online Searcher, July/August 2014 Issue

Of interest:

New items available and, more importantly, findable online include items formerly known as “grey literature” or “vertical file” material from organizations and associations, such as local brochures, statistical reports, and policy papers.

HathiTrust Digital Library and fair use

The US Court of Appeals for the Second Circuit has ruled that “full-text book scanning is generally going to be considered ‘fair use’ and protected from claims of copyright infringement.”

Book Scanning Suits Against Google, Others Wind Down With Fair Use Rulings, Greg Sterling, Marketing Land (Jun 11)

This ruling allows the 90-plus member libraries of the HathiTrust Digital Library to continue digitizing their collections and making them accessible.

The Baylor University Library webpage provides some background on HathiTrust:

HathiTrust was established in 2008 with the mission to “contribute to the common good by collecting, organizing, preserving, communicating and sharing the record of human knowledge.” The original HathiTrust libraries were partners with Google and/or the Internet Archive for the digitization of books in their collections. In part, HathiTrust was created so these libraries could work collaboratively to manage, provide access to, and preserve their digital assets in ways that Google could not.

More about Google Scholar

Google has never been forthcoming about the workings of Google Scholar. In this posting, Aaron Tay, senior librarian at the National University of Singapore Libraries, draws on his reading and experimentation to set out eight things all users should know.

8 surprising things I learnt about Google Scholar, Musings about Librarianship (June 11, 2014)

1. Google Scholar indexes the entire article even if the text is accessible only through a paywall. Your results may show the title, but not the text.

2. As many have shown, GS has good recall, but poor precision – you get lots, but can’t surgically search for exactly what you need.

3. Queries are limited to 256 characters, presumably to match the 32-word restriction in Google.

4. Google has never published a list of the journals, or even the publishers, it includes. Is this because it “harvests at the article level”?

5. The site: operator does not give an accurate count; the posting explains why. BTW, it’s not accurate in Google either.

6. GS recognizes some tags.

There are a couple of others. To which I add one more: GS might not have an article from a small publisher at all, even though a PDF of the article is findable through Google. Webmasters take note: make sure GS knows about your journal.
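Point 3 can be turned into a quick check before pasting a long query into Google Scholar. This is a minimal, hypothetical sketch: the function name check_query is my own, and the 256-character and 32-word limits are taken from the points above as reported, not from any official Google documentation, so they may have changed.

```python
# Hypothetical pre-flight check for long search queries.
# ASSUMPTION: limits as reported in the post (256 characters for Google Scholar,
# 32 words for Google web search); not taken from official documentation.
GS_MAX_CHARS = 256
GOOGLE_MAX_WORDS = 32


def check_query(query: str) -> list[str]:
    """Return a list of warnings; an empty list means the query fits both limits."""
    warnings = []
    if len(query) > GS_MAX_CHARS:
        warnings.append(f"{len(query)} characters exceeds {GS_MAX_CHARS}")
    words = query.split()  # operators such as OR count as words here
    if len(words) > GOOGLE_MAX_WORDS:
        warnings.append(f"{len(words)} words exceeds {GOOGLE_MAX_WORDS}")
    return warnings


if __name__ == "__main__":
    short = '"open access" site:worldbank.org'
    long_query = " OR ".join(f'"term{i}"' for i in range(40))
    print(check_query(short))  # []
    print(check_query(long_query))
```

A long string of OR-ed synonyms is the usual way to blow past these limits, so splitting such a search into several shorter queries is the practical workaround.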

Reviewing Google Scholar

The Google Scholar Digest reports on two revealing analyses of the depth, or coverage, of Google Scholar.

How many academic documents are visible and freely available on the Web? by Granada & Valencia (June 12).

Digest of The number of scholarly documents on the public web. Khabsa, M. & Giles, C. Lee. PLOS One v.9, n.5.

The Digest recapped the four main research questions.

  1. Estimated number of scholarly documents in English circulating on the web: 114 million
  2. Number of English-language documents in Google Scholar: 99.3 million, or 87%
  3. Estimated number available for free: 27 million, or roughly one in four
  4. Differences between scientific fields and disciplines: significant, about 50% for computer science; 45% for multidisciplinary fields and for economics and business; declining to 10% for engineering, agricultural science, and materials science
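The percentages in this list can be checked against the raw counts. A minimal sketch using only the figures recapped by the Digest (in millions); the helper name share is illustrative.

```python
# Figures as recapped by the Google Scholar Digest, in millions of documents.
total_english = 114.0      # estimated English-language scholarly documents on the web
in_google_scholar = 99.3   # of those, estimated to be in Google Scholar
freely_available = 27.0    # of those, estimated to be freely available


def share(part: float, whole: float) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


print(share(in_google_scholar, total_english))  # 87.1
print(share(freely_available, total_english))   # 23.7
```

So 27 of 114 million is just under 24%, and “one in four” is a slight rounding up.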

Findings are preliminary.

This makes the next article, on how much of the World Bank’s database of reports Google Scholar has indexed, all the more interesting.

The World Bank’s policy reports in Google Scholar. Are they visible, cited, and downloaded?, by Granada & Valencia (June 12)

Digest of Which World Bank reports are widely read? World Bank Policy Research Working Paper, n. 6851, by Doemeland, Doerte & Trevino, James (May 2014)

The authors found that GS had indexed 74.5% of the reports classified as “Economic and Sector Work” or as “Technical Assistance” for 2008-2012.

“The most suggestive results of this work concerning our object of study (scientific knowledge about Google Scholar) are the empirical evidences provided on the wide and diverse coverage of Google’s academic search engine. They confirm something well-known: Google Scholar, unlike other traditional bibliographic databases that are mainly focused on indexing journal articles and conference proceedings, collects instances of all the types of documents produced in the scientific domain (articles, conference proceedings, books and book chapters), as well as the academic circles (doctoral theses, master’s or undergraduate theses, teaching materials) and of special interest in this work, the professional world (patents, scientific/technical reports).”

But the Google Scholar Digest noted that the World Bank launched its Open Knowledge Repository in 2012. Does GS index 75% of that? It appears not: only 17.1%.

As a preliminary conclusion, we found that, even though Google Scholar gathers more document types than any other database, the visibility of World Bank reports in Google Scholar is far from being complete. And this is only considering the material deposited in the official repository, not to mention the remaining material that may be allocated in other subdomains of the World Bank.

Microsoft Academic Search – status?

Is Microsoft still supporting scholarly research through Microsoft Academic Search (MAS)? An article referenced in the Google Scholar Digest blog indicated that updates had all but stopped in 2013.

Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search dead? by Enrique Orduña-Malea and others (Apr 29, 2014)

The authors compared Google Scholar (GS) and MAS for coverage, journals, and usage. MAS fared poorly: the authors identified a dramatic drop in the number of documents indexed, “from 2,346,228 in 2010 to 8,147 in 2013.” They noted that this seems to have gone unnoticed. Does no one use MAS?
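To put that drop in perspective, here is a quick calculation on the two figures quoted by the authors; nothing beyond those two numbers is assumed.

```python
# Documents indexed by Microsoft Academic Search, as quoted by Orduña-Malea et al.
indexed_2010 = 2_346_228
indexed_2013 = 8_147

drop = indexed_2010 - indexed_2013          # absolute decline in documents indexed
pct_drop = 100 * drop / indexed_2010        # decline as a share of the 2010 count

print(f"{drop:,} fewer documents, a {pct_drop:.2f}% drop")
# 2,338,081 fewer documents, a 99.65% drop
```

In other words, fewer than 4 documents in every 1,000 indexed in 2010 had a 2013 counterpart.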

That would seem a shame, since MAS has many attractive features, far exceeding those of the very rough GS. Updates haven’t entirely stopped, but they are hardly substantial. My search for articles published in 2014 came up with a paltry 1,554, most of them in the multidisciplinary category. Examining the Journal of the ACM shows no new documents since 2010.

Microsoft Academic Search - indexed in 2014 as of June 17, 2014

In spite of the small numbers, the search options and display are so attractive that one could still make good use of this resource to research academic sources and documents. One example is learning more about an academic organization, as in this search for the University of Toronto.

Microsoft Academic Search - University of Toronto

Open Access update

This article describes the growth in acceptance of open access journals and repositories and the economics of open access publishing.

Open Access: Progress, Possibilities, and the Changing Scholarly Communications Ecosystem, by Abby Clobridge, Online Searcher (March/April 2014)

Of interest: “Their study, produced for the European Commission DG Research & Innovation, found that by the end of 2012, nearly half of all peer-reviewed, scholarly research published in 2008 was freely-available on the web in some form.”

Virtual Library on International Resources

The WWW Virtual Library (VLWWW) is still a resource worth checking. It is as old as the Web itself and began as a “virtual library” of recommended resources, with each topic developed and maintained by academics.

Professor Wayne A. Selcher at Elizabethtown College in Pennsylvania has been one of these experts. His collection covers International Resources. He reports that it “now has over 2000 carefully selected, annotated links in 35 international affairs categories. It is frequently maintained and should be of use to, students, professors, researchers, and website visitors, among others.” Explore the collection at

This guide is a premier tool for finding the best resources related to international affairs, for digging into the authoritative resources in this specialty, and for getting out of the Google search trap. It will direct you to scholarly journals, search guides, specialized search engines, news sources, and other web resources. Unfortunately, the subject guide/directory is a dying breed because of the labour needed to check, assess, and stay abreast. Make use of this one while you can.