The Internet Archive is getting easier to search. Keyword and site combinations have long worked in Google (e.g. site:archive.org your terms), but the Archive now offers faceted filtering by media type and topic, plus full-text search. The full-text search is still in beta, and we all know the problems of converting scanned text with optical character recognition, so tread carefully.
Searching Through Everything, Internet Archive Blog, Oct 26
Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.
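For anyone who wants to script the older Google route mentioned above, here is a minimal sketch of building a site-restricted query URL. The function name `site_search_url` is my own invention for illustration, and the sketch assumes Google's familiar `q` query parameter; it only builds the link and does not fetch any results.

```python
from urllib.parse import urlencode


def site_search_url(terms, site="archive.org"):
    """Build a Google search URL limited to one site via the site: operator.

    Hypothetical helper for illustration; it constructs the URL only.
    """
    query = f"site:{site} {terms}"
    # urlencode escapes the colon and turns spaces into plus signs
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("prospect research"))
# prints https://www.google.com/search?q=site%3Aarchive.org+prospect+research
```

Pasting the printed link into a browser runs the same search as typing site:archive.org prospect research into Google.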
At SearchResearch, Dan Russell, Google’s search guru, provides an excellent worked example of investigating “How Healthy is the Mediterranean?” (Oct 26). He outlines four strategies (which I paraphrase): get basic information and leads from Wikipedia, use scholarly resources, find a scientific journal, and find an association or organization that is an authority.
Looking for training in researching prospective donors as part of prospect development? Helen Brown of the Helen Brown Group offers several leads in her post Expanding Your Prospect Research Horizons (Oct 27), including websites, the APRA association, and books.
Of particular interest to Canadians, she notes a new book, Prospect Research in Canada.
Most excitingly, our cousins up North have just released a brand new book called Prospect Research in Canada! Edited by Tracey Church and Liz Rejman, two leaders (and forces of nature!) in our field, the book features expert authors taking a soup-to-nuts overview of fundraising research with a special focus on – but not limiting themselves solely to – Canada.
It’s in beta, but we can now run keyword searches on the Internet Archive’s Wayback Machine, the best (and often only) way to see a web page as it used to be.
Beta Wayback Machine – Now with Site Search!, Internet Archive Blogs (Oct 24)
With this new beta search service, users will now be able to find the home pages of over 361 Million websites preserved in the Wayback Machine just by typing in keywords that describe these sites (e.g. “new york times”).
The Internet Archive has been preserving the Web’s past for 20 years.
Defining Web pages, Web sites and Web captures, Internet Archive Blog (Oct 23)
As of today, the Internet Archive officially holds 273 billion webpages from over 361 million websites, taking up 15 petabytes of storage.
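To put those figures in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes a decimal petabyte (10^15 bytes) and treats the quoted totals as exact, although the Archive says "over" 361 million websites, so the real per-site average is somewhat lower. The variable names are mine.

```python
# Figures quoted by the Internet Archive (Oct 2016)
PAGES = 273_000_000_000          # 273 billion captured webpages
SITES = 361_000_000              # "over 361 million" websites
STORAGE_BYTES = 15 * 10**15      # 15 petabytes; assumes 1 PB = 10^15 bytes

pages_per_site = PAGES / SITES
bytes_per_page = STORAGE_BYTES / PAGES

print(f"{pages_per_site:.0f} pages per site on average")
print(f"{bytes_per_page / 1000:.0f} KB per page on average")
# prints:
# 756 pages per site on average
# 55 KB per page on average
```

These are averages only; a handful of very large sites will account for far more than 756 captures each.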
Academic researchers need to know about citation research. This article in Online Searcher will shed some light on how Google Scholar really works.
Set Your Cites High: The Value of Quality Citation Information by Amy Affelt and David Pauwels (Sept/Oct 2016)
In Google Scholar, “Dates and citation counts are estimated and are determined automatically by a computer program.” Really!
This poses problems for the information professional who needs the exact number of cited references.
The article describes and examines Google Scholar, Microsoft Academic Search, HeinOnline, Ebsco databases, Scopus, and Web of Science. Identifying and tracking down cited references will take more than one approach or tool.
This doesn’t sound good for the information professional working at a desktop – Google will have two indexes: one for mobile users for quick response, and a desktop version that will be less current.
Within months, Google to divide its index, giving mobile users better & fresher content, Barry Schwartz, Search Engine Land (Oct 13)
“Google is going to create a separate mobile index within months, one that will be the main or “primary” index that the search engine uses to respond to queries. A separate desktop index will be maintained, one that will not be as up-to-date as the mobile index.”
Maybe it would be a good idea to break the habit of searching Google all the time.
Although Google has denied that ranking of search results is influenced by a user’s social connections, Google is certainly extremely interested in the methods and the possibilities.
Exploring a newly-granted Google patent around social signals, Dave Davies, Search Engine Land (Oct 7)
After considerable analysis of the patent, Davies concludes, “In short, according to this patent, what people you’re connected to recommend, like and engage with could be used to impact your rankings.” If Google is doing this, the source of the information isn’t clear yet.
Also of great interest:
Another aspect of Google search that we need to be constantly aware of is that RankBrain now applies to all queries. Essentially, this means that artificial intelligence (AI) is interpreting all queries to some degree. While at this time the AI implementation revolves more around using machine learning to understand the nature of the query (and likely type of content and format being sought), its rollout to all queries and the promotion of John Giannandrea to Head of Search at Google marks the push into AI control over larger portions of the Google algorithm.
Orphan works are books and articles that are still under copyright but whose copyright holder can’t be found. The Harvard Library is looking for ways to “solve the legal complexities of the orphan works problem by identifying no-risk or low-risk ways to digitize and distribute orphan works under U.S. copyright law”. It recently released David Hansen’s study, “Digitizing Orphan Works: Legal Strategies to Reduce Risks for Open Access to Copyrighted Orphan Works”.
Libraries, Orphan Works, and the Future of Copyright by Nancy K Herther, Information Today (Oct 4)
The article provides background on the current state of copyright law, with some comparison of the US to Canada and the UK, where a 2013 law “allows the government to grant firms or organisations the right to use orphaned material, providing ‘a diligent search’ for the copyright owner is first carried out. It also allows for the creation of an organisation that might levy licensing fees on behalf of absent content creators—and which would pay out to rights holders who subsequently discover their work has been sold.”
Hopefully the Harvard Library will succeed in its goal “to help clear the way for U.S. universities, libraries, archives, museums, and other cultural institutions to digitize their orphan works and make the digital copies open access.”
LLRX – Law and Technology Resources for Legal Professionals – an important web journal for legal researchers – has been redesigned into a fresh and contemporary WordPress site. Sabrina Pacifici, the founder and publisher, wrote, “Your support is appreciated, and I will continue to maintain LLRX as a community of best practice and knowledge sharing for a wide range of professionals who are critical members of organizations in all sectors.”
LLRX.com offers a monthly edition of new articles, guides and topical resources comprised of comprehensive, reliable and wide ranging topical content to support actionable projects, research, teaching/training/learning components for professionals and students in law, academia, the public, private, and advocacy sectors. [Source]
Pacifici also blogs her own findings and observations on a variety of legal topics and information resources in beSpacific.