Information Retrieval Techniques

It takes experience and discipline to be a skilled researcher. Judith Tinnes, in this article on the art of searching for terrorism literature, presents a thorough and clear description of the research process and the types of online resources available, and provides sound advice on how to track down the best material. It is the best article I have ever read on information retrieval techniques: it instructs the reader in matters of approach and tools, and includes detail on techniques such as snowballing, citation searching, and citation analysis. You don’t need to be researching terrorism literature to benefit from this, but the topic does give it an extra frisson.


The Art of Searching: How to Find Terrorism Literature in the Digital Age
by Judith Tinnes, Perspectives on Terrorism, Vol 7, No 4 (2013)

This guide provides an overview of information retrieval techniques for locating high-quality literature on terrorism and counter-terrorism. Starting from general considerations on conducting a literature search – taking into account the specifics of terrorism studies – instructions are provided on how to find particular literature types by using different search methods and information retrieval systems, followed by information on how to refine a search by employing focused search strategies. The explanations are enriched with numerous links to recommendable resources. The included examples are focused on terrorism studies, but the general search mechanics can be applied to other research domains as well.

Reference Works Online

Reference sources have been moving online for several years. This article in Online Searcher takes a good hard look at the pros and cons of online reference products. Ease of access has greatly improved, but authority may not be clear and the online version of a reference work may not be complete.

The Ebb and Flow of Reference Products, by Denise Beaubien Bennett, Online Searcher (July / August 2014)

“Have reference sources eroded in quality in the online era? We’re all aware of the challenges facing us and our users in vetting (or not) the authority and credibility of free sources available online. But the quality of contemporary vetted sources is worth examining in its own right.”

Many excellent online reference works are commented upon in this article.

Riches in the Internet Archive

The Internet Archive is best known for the Wayback Machine and its archived web pages, but it has much more – books, images, music, and specialized collections.

5 Types Of Free Content Riches You Can Dig Up At The Internet Archive by Jessica Coccimiglio, Make Use Of (Jul 16)

Canadians will be interested in the long list of texts and collections from Canadian schools and associations — Canadian Libraries

Bing, Cortana, and Academic Search

Bing will integrate results from academic sources in the general web search – as explained in Bing & Cortana To Get Academic Search Integration At A Whole New Level by Barry Schwartz, Search Engine Land (Jul 15)

Adding a blow to Google, Microsoft added that “instead of treating scholarly information as a separate search engine – as competitors,” clearly implying Google here. Microsoft Bing will make the academic data as “a first-class citizen in Bing search results.”

Cortana, Microsoft’s personal assistant technology, is the agent. A full description is at Making Cortana the Researcher’s Dream Assistant, Inside Microsoft Research.

That’s great – but is Bing adding records to its Academic Search?

Digital Commons Network for free, scholarly articles

Those seeking grey literature or scholarly articles will want to explore the Digital Commons Network of free, full-text scholarly works. These are sourced from 330 universities and colleges worldwide (although most are in the United States) and curated by the university librarians. Among Canadian universities I noted McMaster University, Wilfrid Laurier, University of Windsor, University of Western Ontario, and Osgoode Hall Law School of York University. There are surely more.

Digital Commons holds “peer-reviewed journal articles, book chapters, dissertations, working papers, conference proceedings, and other original scholarly work.”

The site opens to a multicoloured wheel for visually exploring the disciplines. Continue browsing by spinning the wheel. You may also click on an academic discipline and begin to narrow by journal, author, or keyword search.

Search wheel for exploring the Digital Commons Network

The collection is made available through Berkeley Electronic Press (bepress.com).

This stupendous resource was featured in the Digital Shift – Uncommonly Open: The New Digital Commons Network (June 19, 2013)

This resource was reviewed in the BestBizWeb newsletter.

Reference Works Online

A reference work that is online may not be what you expect based on the print version: there may be issues of authority, ease of use, incompleteness, and even the reader’s privacy. This article points you to the weaknesses to watch for and the areas of improvement.

The Ebb and Flow of Reference Products, by Denise Beaubien Bennett, Online Searcher, July/August 2014 Issue

Of interest:

New items available and, more importantly, findable online include items formerly known as “grey literature” or “vertical file” material from organizations and associations, such as local brochures, statistical reports, and policy papers.

HathiTrust Digital Library and fair use

The US Court of Appeals (2nd Circuit) has ruled that “full-text book scanning is generally going to be considered ‘fair use’ and protected from claims of copyright infringement.”

Book Scanning Suits Against Google, Others Wind Down With Fair Use Rulings, Greg Sterling, Marketing Land (Jun 11)

This ruling allows 90+ member libraries of the HathiTrust Digital Library to continue their work to digitize their collections for access.

The Baylor University Library webpage provides some background on HathiTrust:

HathiTrust was established in 2008 with the mission to “contribute to the common good by collecting, organizing, preserving, communicating and sharing the record of human knowledge.” The original HathiTrust libraries were partners with Google and/or the Internet Archive for the digitization of books in their collections. In part, HathiTrust was created so these libraries could work collaboratively to manage, provide access to, and preserve their digital assets in ways that Google could not.

More about Google Scholar

Google has never been forthcoming about the workings of Google Scholar. In this posting, Aaron Tay, senior librarian at the National University of Singapore Libraries, figures out from his reading and experimentation eight things all users should know.

8 surprising things I learnt about Google Scholar, Musings about Librarianship (June 11, 2014)

1. Google Scholar indexes the entire article even if the text is only accessible through a paywall. Google itself, by contrast, may have the title, but not the text.

2. As many have shown, GS has good recall, but poor precision – you get lots, but can’t surgically search for exactly what you need.

3. Query is limited to 256 characters – presumably matches the 32 word restriction in Google.

4. Google has never listed the journals included, or even the publishers. Is this because it “harvests at the article level”?

5. site: operator does not give an accurate count. There are reasons for this as the posting explains. BTW – it’s not accurate in Google either.

6. GS recognizes some tags.

And a couple of others. To which I add one more – GS might not have an article at all from a small publisher, but a pdf of the article will be findable through Google. Webmasters take note – make sure GS knows about your journal.
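The query-length limit and the site: operator mentioned in the list above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the 256-character figure is the one reported in the posting, not a documented Google value, and the helper names `build_query` and `fits_limit` and the domain `example.org` are hypothetical.

```python
# Assumed limit, taken from point 3 above; not an official Google figure.
MAX_QUERY_CHARS = 256


def build_query(terms, site=None):
    """Join search terms into one query string.

    Multi-word terms are wrapped in double quotes so they are searched as
    phrases; an optional site: restriction is appended at the end.
    """
    parts = [f'"{t}"' if " " in t else t for t in terms]
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def fits_limit(query, limit=MAX_QUERY_CHARS):
    """Return True if the query is no longer than the assumed limit."""
    return len(query) <= limit


query = build_query(["terrorism", "literature search"], site="example.org")
print(query)
print(len(query), fits_limit(query))
```

Checking the length before pasting a long Boolean query into the search box is a cheap way to notice when part of it might be silently dropped.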

Reviewing Google Scholar

The Google Scholar Digest reports on two revealing articles of analysis about the depth – or coverage – of Google Scholar.

How many academic documents are visible and freely available on the Web? by Granada & Valencia (June 12).

Digest of The number of scholarly documents on the public web. Khabsa, M. & Giles, C. Lee. PLOS One v.9, n.5.

The Digest recapped the four main research questions.

  1. Estimated number of academic papers in English circulating on the web: 114 million
  2. Number of documents written in English in Google Scholar: 99.3 million or 87%
  3. Estimate of number available for free: 27 million or one in four.
  4. Differences between scientific fields and disciplines: Significant – about 50% for computer science; 45% for multidisciplinary and economics and business; declining to 10% for engineering, agricultural science, material science

Findings are preliminary.
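The shares quoted in the list above can be checked with simple arithmetic. The sketch below uses only the figures reported in the Digest (in millions of documents); the variable names are illustrative.

```python
# Figures as reported in the Digest, in millions of documents.
total_on_web = 114.0      # estimated English-language academic documents on the web
in_google_scholar = 99.3  # of those, the number found in Google Scholar
freely_available = 27.0   # of those, the number available for free

gs_share = in_google_scholar / total_on_web
free_share = freely_available / total_on_web

print(f"Google Scholar coverage: {gs_share:.1%}")  # 87.1%
print(f"Freely available: {free_share:.1%}")       # 23.7%, roughly one in four
```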

This makes the next article, about how much of the World Bank’s database of reports Google Scholar has indexed, all the more interesting.

The World Bank’s policy reports in Google Scholar. Are they visible, cited, and downloaded?, by Granada & Valencia (June 12)

Digest of Which World Bank reports are widely read? World Bank Policy Research Working Paper, n. 6851, by Doemeland, Doerte & Trevino, James. (May 2014)

The authors found that GS had indexed 74.5% of the reports classified as “Economic and Sector Work” or as “Technical Assistance” for 2008-2012.

“The most suggestive results of this work concerning our object of study (scientific knowledge about Google Scholar) are the empirical evidences provided on the wide and diverse coverage of Google’s academic search engine. They confirm something well-known: Google Scholar, unlike other traditional bibliographic databases that are mainly focused on indexing journal articles and conference proceedings, collects instances of all the types of documents produced in the scientific domain (articles, conference proceedings, books and book chapters), as well as the academic circles (doctoral theses, master’s or undergraduate theses, teaching materials) and of special interest in this work, the professional world (patents, scientific/technical reports).”

But Google Scholar Digest noted that the World Bank launched its Open Knowledge Repository in 2012. Does GS index 75% of it? It appears not – only 17.1%.

As a preliminary conclusion, we found that, even though Google Scholar gathers more document types than any other database, the visibility of World Bank reports in Google Scholar is far from being complete. And this is only considering the material deposited in the official repository, not to mention the remaining material that may be allocated in other subdomains of the World Bank.

Microsoft Academic Search – status?

Is Microsoft still supporting scholarly research through Microsoft Academic Search (MAS)? An article referenced in the Google Scholar Digest blog indicated that updates had all but completely stopped in 2013.

Empirical Evidences in Citation-Based Search Engines: Is Microsoft Academic Search dead? by Enrique Orduña-Malea and others (Apr 29, 2014)

The authors compared Google Scholar (GS) and MAS for coverage, journals, usage. Results were poor from MAS – so much so that the authors identified a very dramatic drop in the number of documents indexed “from 2,346,228 in 2010 to 8,147 in 2013.” They noted that this seems to have gone unnoticed. Does no one use MAS?
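To put the size of that drop in perspective, the two figures quoted by the authors can be turned into a percentage decline. This is a quick arithmetic sketch using only those two numbers; the variable names are illustrative.

```python
# Documents indexed by MAS per year, as quoted from the article.
indexed_2010 = 2_346_228
indexed_2013 = 8_147

# Fraction of the 2010 volume that is missing in 2013.
decline = 1 - indexed_2013 / indexed_2010

print(f"Decline from 2010 to 2013: {decline:.2%}")  # 99.65%
```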

That would be a shame, since MAS has many attractive features, far exceeding the very rough GS. Updates haven’t entirely stopped, but they are hardly substantial. My search for articles published in 2014 came up with a paltry 1,554, most of them in the multidisciplinary category. Examining the Journal of the ACM shows no new documents since 2010.

Microsoft Academic Search – indexed in 2014 as of June 17, 2014

In spite of the small numbers, the search options and display are so attractive that one could still make good use of this resource to research academic sources and documents. One example is learning more about an academic organization in this search for university of toronto.

Microsoft Academic Search – University of Toronto