AI-Based Scholarly Search Engines

Google Scholar has competition in two AI-based scholarly search engines: the new Semantic Scholar, strong in the sciences, and the relaunched Microsoft Academic, with content from many fields of study.

AI science search engines expand their reach: “Semantic Scholar triples in size and Microsoft Academic’s relaunch impresses researchers”, Nicola Jones, Nature (Nov 11)

[Semantic Scholar] A free AI-based scholarly search engine that aims to outdo Google Scholar is expanding its corpus of papers to cover some 10 million research articles in computer science and neuroscience, its creators announced on 11 November. Since its launch last year, it has been joined by several other AI-based academic search engines, most notably a relaunched effort from computing giant Microsoft.

Searching the Internet Archive

Internet Archive is getting easier to search. Keyword and site combinations have long worked in Google (e.g. site:archive.org your terms), but now the Archive offers faceted filtering by media type and topic, plus full-text search. The full-text search is still in beta, though, and we all know the problems with text converted by optical character recognition, so tread carefully.

Searching Through Everything, Internet Archive Blogs (Oct 26)

Every day, we see an average of 50,000 hits on our search pages, as you, our users, search for title, creator, and various other metadata about the items we’ve archived. But you have long asked when you would be able to search not only across all items but within them as well. For years you’ve been able to search within the text of a single book using our BookReader, but never before have you been able to search across and within all 9 million available text items at the Internet Archive in a single shot. Until now.
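For anyone who still wants the older Google route mentioned above, the site: restriction is easy to build into a link. This is only a minimal sketch: the function name site_search_url and the sample terms are made up for illustration, and it assumes Google's usual search address with the query passed in the q parameter.

```python
from urllib.parse import urlencode


def site_search_url(terms: str, site: str = "archive.org") -> str:
    """Build a Google search link limited to one site with the site: operator.

    Hypothetical helper for illustration; it only builds the link and
    does not contact Google.
    """
    query = f"site:{site} {terms}"
    # urlencode percent-encodes the colon and turns spaces into plus signs.
    return "https://www.google.com/search?" + urlencode({"q": query})


print(site_search_url("apollo 11 flight plan"))
```

Pasting the printed link into a browser should run the same search as typing site:archive.org apollo 11 flight plan into Google.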

Search the Wayback Machine

It’s in beta, but we can do keyword searches on the Internet Archive’s Wayback Machine – the best (and often only) way to see a web page as it used to be.

Beta Wayback Machine – Now with Site Search!, Internet Archive Blogs (Oct 24)

With this new beta search service, users will now be able to find the home pages of over 361 Million websites preserved in the Wayback Machine just by typing in keywords that describe these sites (e.g. “new york times”).
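The keyword search finds a site; once you know a page's address, you can also go straight to an old capture. The sketch below builds such a link, assuming the Wayback Machine's familiar address pattern of a date stamp followed by the original URL, and assuming the Archive redirects to the capture closest to that date. The function name wayback_url and the sample date are illustrative only.

```python
from datetime import date


def wayback_url(page_url: str, when: date) -> str:
    """Build a Wayback Machine link for a page as of a given date.

    Hypothetical helper for illustration; it only builds the link and
    does not check that a capture exists.
    """
    # Assumption: a YYYYMMDD stamp is enough, and the Archive picks
    # the capture nearest to it.
    timestamp = when.strftime("%Y%m%d")
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


print(wayback_url("http://www.nytimes.com/", date(2001, 9, 12)))
```

If no capture exists near the requested date, the link may land on a much earlier or later version, so check the date shown on the archived page.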

Google – two indexes

This doesn’t sound good for the information professional working at a desktop – Google will have two indexes: one for mobile users for quick response, and a desktop version that will be less current.

Within months, Google to divide its index, giving mobile users better & fresher content, Barry Schwartz, Search Engine Land (Oct 13)

“Google is going to create a separate mobile index within months, one that will be the main or “primary” index that the search engine uses to respond to queries. A separate desktop index will be maintained, one that will not be as up-to-date as the mobile index.”

Maybe it would be a good idea to break the habit of searching Google all the time.

Searching privately

Helen Brown gives us a good reason to use a search engine that does not track searches or pitch ads – Because it’s none of their business (Sept 8). Of importance to the professional researcher, the filtering done by search and ranking algorithms may cloud results. Solution – use tools that don’t track but do have a broad reach. She offers a comparison of 13 search engines that shows, for each engine, whether there are ads, personalized results, or tracking.

Some particularly worth noting are:

Disconnect Search – web version of the browser add-on. Operates through a proxy server to direct your queries to the search engine and the results back to you. See the short video about the browser add-on. Also see Disconnect Search: Google In Private, Information Week (Mar 2014).

DuckDuckGo – Bing-based but more of a meta search engine. Does not log any personally identifiable information.

Startpage – does not store personal history. But even more valuable is that search results can be viewed through the IxQuick Proxy. See StartPage Proxy Explained.

Oscobo in the UK claims to store no personal data. I suspect the web search results come from the Bing database. It also searches Twitter.

I have also used Carrot2, a meta-search engine developed in Europe (mostly Poland) that clusters search results by topic. Its web search uses Google and Bing. Carrot2 doesn’t promise privacy, but as an intermediary it blocks personalization.

But for full privacy you may want to consider access through a virtual private network. Paul Gil at About.com explains why: 10 Reasons to Use a VPN for Private Web Browsing.

Mobile searches over 50%

Desktop search has been declining for a couple of years and is now down to less than 50% of searches. On the desktop, Google still dominates with 63% of market share in the US, while Bing, Yahoo, and Ask still hang on, according to ComScore. Mobile search is a different story – Google is up to 94% in the US.

Billions served: PC search is down but query volume is way up for Google, Greg Sterling, Search Engine Land (Aug 31)

My Activity at Google

There may be some advantages to being able to see your activity on all the Google properties – and on pages that serve up Google ads. This would especially be the case if you are researching a topic across media and need to keep a trail. Or you need to confirm something you found earlier.

Google’s new My Activity page lets you see all your Google history in one place, Napier Lopez, The Next Web (June 28)

Nonetheless, it’s a bit scary to realize that Google could track all activity rather than just web search and therefore deliver more ads. But it might also be true that the ads will be better directed. “Mainly, you can control which kind of ads show up everywhere, across various devices and websites.”

You can find this through “My Account” – or go directly to https://myactivity.google.com/