Digitization and Preservation

Two gems from ResearchBuzz for people with an interest in archives and digitization.

The Twitter Archive at the Library of Congress: Challenges for information practice and information policy by Michael Zimmer, First Monday (July 6). Five years into the project to archive public tweets, the Library of Congress has still not been able to provide access, or even news on the project’s status. Suffice it to say, there are huge challenges.

This paper explores the challenges faced by the Library that have prevented the timely realization of this valuable archive, divided into two categories: challenges involving practice, such as how to organize the tweets, how to provide useful means of retrieval, how to physically store them; and challenges involving policy, such as the creation of access controls to the archive, whether any information should be censored or restricted, and the broader ethical considerations of the very existence of such an archive, especially privacy and user control.

Not fade away ... how robots are preserving our old newspapers, by Nicola Davis, The Guardian (July 5)

A fascinating account of the work of the British Library to digitize 300 years of old, crumbling newspapers. There is a lot of high tech involved. As well, new imaging software is enabling glimpses into otherwise unreadable blobs such as the Great Parchment Book, damaged in the Guildhall fire of 1786. The artifact was kept in the hope that someday it could be examined – and that day has come. It’s not all smooth, though: how will all these digitized materials be organized, preserved, and managed?

There’s plenty of work to be done. A study in 2014 found that, on average, only 17% of collections in heritage institutions across Europe has been digitised in some form. But if digitisation offers new opportunities it also provides fresh headaches. “Libraries, archives, museums don’t have the capacity to look after this digital data long term,” says Terras. And with standards for the documentation, archiving and accessing of data – official and personal – still being thrashed out, Terras is concerned we could be creating a timebomb. “There is a huge danger that future historians will be spending a large amount of time trying to piece together stuff which just doesn’t exist.”

Does Google favour its own services?

Google has again been accused of promoting its own services over others in search results. The latest study was sponsored by Yelp, which has complained about this before.

Study Claims Google Is Delivering “Degraded” Search Results, Adding Steam To EU Antitrust Case, Search Engine Land (June 29)

According to a report from the Wall Street Journal, researchers from Columbia University and Harvard Business School claim Google is delivering a “degraded version of its search engine,” outranking its own services over more relevant results for local searches on restaurants and hotels.

Best Biz Web Newsletter and Site

There were more good items in the Best Biz Web Newsletter this month. The newsletter is free, but you must subscribe to receive it. If you have any interest in business resources, sign up now at Best of the Business Web. When you visit, check the blog – Thinking Out Loud – for thoughtful postings by Robert Berkman on the research process.

Of interest to me in the June newsletter were:

CORE – Connecting Repositories — aggregates open access research outputs from repositories and journals worldwide. CORE provides “services for different stakeholders including academics and researchers, repository managers, funders and developers”.

Lies, Damn Lies and Viral Content at the Tow Center for Digital Journalism, which describes and links to a report by Craig Silverman on “How News Websites Spread (and Debunk) Online Rumors, Unverified Claims and Misinformation.” Beware the viral story.

Journalists today have an imperative—and an opportunity—to sift through the mass of content being created and shared in order to separate true from false, and to help the truth to spread. This report includes a set of specific and, where possible, data driven recommendations for how this anti-viral viral strategy can be executed.


RIP IPL2 and Infomine

Two subject directories held in high regard for several years have closed.

IPL2.org was a collection of resources aimed at public library clientele and was maintained by students at a consortium of Library and Information schools in the U.S.A. As of June 2015 it will no longer be updated.

Infomine was compiled by librarians at the University of California, Riverside and at its peak held 26,000 expert-selected resources, including “substantive databases, electronic journals, guides to the Internet for most disciplines, textbooks and conference proceedings”. It was closed completely in December 2014.

These were both excellent resources for many years. But directories are labour-intensive for the people maintaining them and, to some degree, for those using them.

Problems with Google Answers

Google’s expanded use of answers at the top of search results has some problems. As always, the searcher has to know enough to vet the results received from a search engine.

When Google Gets It Wrong: Direct Answers With Debatable, Incorrect & Weird Content, Search Engine Land (June 17)

The addition of more direct answer content is fraught with problems as Google’s algorithms attempt to find answers to tricky queries. With no human review process in place for the results, the opportunity grows for debatable, incorrect and sometimes completely inappropriate content showing up as a top search result.

Google gives preference to mobile-friendly

ResearchBuzz picks up the darnedest items — What Google’s Algorithm Change Means for Library Websites (Public Libraries Online, June 9) —

“On April 21, Google changed its algorithm to give preference to mobile-friendly sites on searches performed on mobile devices. This means that sites that aren’t designated as ‘mobile-friendly’ by Google sink to the bottom in mobile search results while sites that do pass the test appear toward the top.”

The article advises libraries on how to make their websites more mobile-friendly.

Of interest – “WordPress, for example, offers WPtouch, a plug-in that automatically enables a mobile theme for visitors reaching you by way of their phones.”