Digitizing early newspapers

Many countries have digitization programs for their newspapers. In the US, the Digital Public Library of America (DPLA) is investigating providing access to all of those papers from a single place.

DPLA Announces Knight Foundation Grant to Research Potential Integration of Newspaper Content, DPLA

In other countries: “Other national digital libraries including Trove in Australia and Europeana have undertaken efforts to make full-text newspaper discovery a priority. Europeana recently launched Europeana Newspapers by aggregating 18 million historic newspaper pages. The intent of the DPLA staff is to engage the state newspaper projects, as well as Trove and Europeana Newspapers, over the next year as we consider the viability of a US-based newspaper aggregation. DPLA will also engage with the International Image Interoperability Framework (IIIF) community to discuss how IIIF may play a role in centralized newspaper discovery.”

What’s Canada doing?
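The IIIF reference in the quote above is the technical glue here: the Presentation API describes a digitized object as a JSON manifest, and the Image API serves each page at any region, size, and rotation via a URL template, which is what makes a centralized viewer across many institutions' newspapers feasible. Here is a minimal Python sketch, assuming a hypothetical manifest URL and the Presentation API 2.x structure (v3 manifests use items rather than sequences/canvases):

```python
import requests

# Hypothetical manifest URL -- substitute one from any IIIF-capable library.
MANIFEST_URL = "https://example.org/iiif/newspaper-issue-1/manifest"

manifest = requests.get(MANIFEST_URL, timeout=30).json()

# Presentation API 2.x: a manifest contains sequences of canvases,
# one canvas per digitized page.
for canvas in manifest["sequences"][0]["canvases"]:
    label = canvas.get("label", "?")
    image_service = canvas["images"][0]["resource"]["service"]["@id"]
    # Image API URL template: {id}/{region}/{size}/{rotation}/{quality}.{format}
    thumbnail = f"{image_service}/full/!200,200/0/default.jpg"
    print(label, thumbnail)
```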

Google Scholar and OAPEN

Google Scholar is indexing the nearly 2,500 Open Access books hosted at OAPEN.

Google Scholar Indexes Open Access Books, Press Release, Knowledge Unlatched (October 28, 2015)

Read more about it at Knowledge Unlatched, starting with this excerpt from its page on benefits to readers.

Knowledge Unlatched has made it possible for anyone in the world with an Internet connection to read books published through the Knowledge Unlatched project for free. A PDF Open Access version of unlatched books has been posted on OAPEN and HathiTrust immediately upon publication. The Open Access version does not carry DRM restrictions.

Semantic Scholar

Semantic Scholar is a new search engine that uses machine learning to extract key concepts from papers. For now, its corpus consists of computer science papers.

Academic Search Engine Grasps for Meaning, Will Knight, MIT Technology Review (Nov 2)

Etzioni says the goal for Semantic Scholar is to go further by giving computers a much deeper understanding of new scientific publications. His team is developing algorithms that will read graphs or charts in papers and try to extract the values presented therein. “We want ultimately to be able to take an experimental paper and say, ‘Okay, do I have to read this paper, or can the computer tell me that this paper showed that this particular drug was highly efficacious?’”
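Semantic Scholar can also be queried programmatically. A minimal sketch, assuming the publicly documented Semantic Scholar Graph API (which postdates the article and is not mentioned in it):

```python
import requests

# Public Graph API search endpoint; rate limits apply without an API key.
resp = requests.get(
    "https://api.semanticscholar.org/graph/v1/paper/search",
    params={"query": "question answering", "fields": "title,year", "limit": 5},
    timeout=30,
)
resp.raise_for_status()
for paper in resp.json().get("data", []):
    print(paper.get("year"), paper["title"])
```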

GDELT Project

Imagine being able to conduct data analysis of 3.5 million books. This may now be possible through the GDELT Project. GDELT stands for Global Database of Events, Language, and Tone; the project has been capturing world events in a very large dataset.

3.5 Million Books 1800-2015: GDELT Processes Internet Archive and HathiTrust Book Archives and Available In Google BigQuery, The GDELT Project (Sept 12)

More than a billion pages stretching back 215 years have been examined to compile a list of all people, organizations, and other names, fulltext geocoded to render them fully mappable, and more than 4,500 emotions and themes compiled. All of this computed metadata is combined with all available book-level metadata, including title, author, publisher, and subject tags as provided by the contributing libraries.
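For anyone who wants to try this, the collections are exposed as public BigQuery tables. Here is a minimal sketch using the google-cloud-bigquery client; the dataset, table, and column names below are illustrative placeholders only, so check GDELT's BigQuery documentation for the real schema:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # authenticates with your default Google Cloud credentials

# Placeholder schema: the table and column names here are illustrative,
# not verified against GDELT's published datasets.
sql = """
    SELECT BookMeta_Title AS title
    FROM `gdelt-bq.internetarchivebooks.1920`
    WHERE Themes LIKE '%EDUCATION%'
    LIMIT 10
"""

for row in client.query(sql).result():
    print(row.title)
```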

ProQuest content to be indexed in Google Scholar

ProQuest Scholarly Content Now Discoverable in Google Scholar, PRNewswire (Aug 11)

ProQuest has marked another milestone in ease of access to its rich research content. The full text of its scholarly content – including journals and working papers – is now indexed in Google Scholar, enabling Google Scholar users to seamlessly discover and access their library’s ProQuest collections. Efficiency and productivity for both ProQuest and Google Scholar users is improved, while libraries benefit from increased usage for their subscribed collections.

Best Biz Web Newsletter and Site

There were more good items in the Best Biz Web Newsletter this month. The newsletter is free, but you must subscribe to receive it. If you have any interest in business resources, sign up now at Best of the Business Web. When you visit, check the blog – Thinking Out Loud – for thoughtful postings by Robert Berkman on the research process.

Of interest to me in the June newsletter were:

CORE (Connecting Repositories), which aggregates open access research outputs from repositories and journals worldwide. CORE provides “services for different stakeholders including academics and researchers, repository managers, funders and developers”. A short sketch of querying its API appears after this list.

Lies, Damn Lies and Viral Content at the Tow Center for Digital Journalism, which describes and links to a report by Craig Silverman on “How News Websites Spread (and Debunk) Online Rumors, Unverified Claims and Misinformation.” Beware the viral story.

Journalists today have an imperative—and an opportunity—to sift through the mass of content being created and shared in order to separate true from false, and to help the truth to spread. This report includes a set of specific and, where possible, data driven recommendations for how this anti-viral viral strategy can be executed.
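As noted in the CORE item above, here is a minimal sketch of querying CORE's developer API. It assumes the current v3 REST endpoint and a registered API key; endpoint paths and response fields may differ from this sketch, so consult CORE's API documentation:

```python
import requests

API_KEY = "YOUR_CORE_API_KEY"  # free registration at core.ac.uk

resp = requests.get(
    "https://api.core.ac.uk/v3/search/works",  # assumed v3 search endpoint
    params={"q": "open access repositories", "limit": 5},
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
for work in resp.json().get("results", []):
    print(work.get("title"))
```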