March 07, 2010

Forward look for Google Search

Peter Norvig offers an insider's look at Google Research during SMX West, Search Engine Watch (Mar 3)

"... Peter Norvig, who spearheads Google's wide-ranging research efforts, offered a behind-the-scenes look at the cool technology projects Google is developing for future products and services."

Among the 21 projects Norvig showed the audience were Image Swirl, Google Squared, clustering, and attribute extraction.

Posted by Gwen at 02:18 PM

March 04, 2010

Google doing real-time indexing?

Google To Begin Indexing The Internet In Real-Time? by Alex Wilhelm, The Next Web (March 4)

Is real-time indexing a good thing?

"In a move that might rewrite the entire search market, Google is rumored to be creating a system that will let allow web publishers to submit content to Google for search indexing in real-time."

The system would be a kind of PubSubHubbub, the protocol that moves syndicated content quickly from publishers into feed readers.

"This move by Google, if it comes to fruition, would be a super-PubSubHubBub, not just moving your content into Google Reader at light speed, but also into the hands of the tens of millions of people searching Google every few hours. It would be a bigger move towards a real-time web than Twitter will ever be."

Google Index to Go Real Time, Marshall Kirkpatrick, ReadWriteWeb (Mar 3)

Apparently there are significant benefits, according to Google's Brett Slatkin.

"PuSH is much more computationally efficient for Google but Slatkin says that even more important is the impact of such a move for small publishers. Right now many small sites get visited by Google maybe once a week. With a PuSH system in place, they would be able to get their content to Google automatically right away.

A richer, faster, more efficient internet would be good for everyone, but the benefits in search wouldn't be limited to Google, either. The PubSubHubbub is an open protocol and the feeds would be as visible to Yahoo and Bing as they would be to Google."
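For the curious, the publisher's side of PubSubHubbub is small. The Python sketch below builds (but does not send) the form-encoded POST a publisher makes to a hub after updating a feed; the hub then fetches the feed and pushes the new entries to subscribers. The hub and feed URLs are hypothetical placeholders, and the hub.mode=publish and hub.url parameter names follow our reading of the PubSubHubbub 0.3 draft spec, so check the spec before relying on them.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical endpoints, for illustration only.
HUB_URL = "https://hub.example.com/"
FEED_URL = "https://blog.example.com/feed.atom"


def build_publish_ping(hub_url, feed_url):
    """Build (but do not send) a PubSubHubbub 'publish' notification."""
    body = urlencode({"hub.mode": "publish", "hub.url": feed_url}).encode("ascii")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_publish_ping(HUB_URL, FEED_URL)
    print(req.get_method(), req.full_url)
    print(parse_qs(req.data.decode("ascii")))
    # To actually notify a hub you would call urllib.request.urlopen(req).
```

The point of the design is that the publisher only says "this feed changed"; the hub does the fetching and fan-out, which is why a crawler no longer needs to poll small sites on a schedule.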

Posted by Gwen at 01:03 PM

March 02, 2010

Book: Search Patterns

Search is the Web's fun and wicked problem, Mac Slocum, O'Reilly Radar (Feb 19)

Mac Slocum interviewed Peter Morville, author of the new book "Search Patterns", which looks at the next wave of search.

"He shows how "weird ideas" will shape search's future, and he also reveals the one recent innovation that unlocked a watershed moment for search (it's not what you'd expect)."

+ "web search works well for basic lookup" - a la Google - but not for much else
+ "Search is a complex, adaptive system and an iterative, interactive experience."

+ watch for emerging technologies as the base for changes in search

+ "search works best as a conversation"

+ "Social search" is not a threat to Google, but it is changing the web search experience

+ users have to take some responsibility - information literacy is critical - "I'm convinced that information literacy is among the most important subjects we can teach our kids. They must learn where to search and how to evaluate what they find."

+ "Plus, search isn't only about findability. We created a searcher's edition of the user experience honeycomb to argue that search must also be useful, usable, desirable, accessible, credible, and valuable."

+ autocomplete - "new life in Web and mobile search."
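Autocomplete, the last point above, is easy to sketch. This minimal Python example assumes a pre-sorted list of past queries (the list is hypothetical) and returns the ones that start with what the user has typed so far. A real engine would rank candidates by popularity or personal history rather than alphabetically.

```python
from bisect import bisect_left


def autocomplete(sorted_terms, prefix, limit=5):
    """Return up to `limit` terms from `sorted_terms` that start with `prefix`.

    Because the list is sorted, bisect finds the first candidate in O(log n);
    matches are then contiguous, so we scan forward until the prefix stops matching.
    """
    i = bisect_left(sorted_terms, prefix)
    matches = []
    while i < len(sorted_terms) and len(matches) < limit and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


# A tiny hypothetical query log, sorted alphabetically.
TERMS = sorted([
    "search patterns", "search engine", "search user interfaces",
    "semantic web", "serendipity", "seo",
])

print(autocomplete(TERMS, "sea"))  # ['search engine', 'search patterns', 'search user interfaces']
```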

The book's website, http://searchpatterns.org/, offers Chapter 1 free, plus pages about behaviour patterns, design patterns, and some illustrations.


Posted by Gwen at 04:54 PM

February 27, 2010

Location Aware Browsing

Location Aware Browsing

Some sites can tell you more and personalize your results if they know where you are. Firefox has "location aware browsing" for this.

Firefox assures us that privacy is protected: you, as the user, get the last word.

Firefox says this on its location aware browsing page.

"Your privacy is extremely important to us, and Firefox never shares your location without your permission. When you visit a page that requests your information, you’ll be asked before any information is shared with the requesting website and our third-party service provider."

konsrtr is such a site: it shows upcoming concerts for a city, with images and videos. (Unfortunately it's not searchable, and the music events are all of the bands-in-town kind.) It does recognize Toronto.
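Once the browser shares a position (with your permission), a site still has to turn latitude and longitude into something useful, such as a city. A minimal Python sketch, assuming the site keeps a small table of supported cities (the list is hypothetical and the coordinates approximate): pick the city with the smallest great-circle (haversine) distance. The browser side, Firefox's location support, runs in JavaScript and is not shown.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical table of supported cities: (latitude, longitude) in degrees.
CITIES = {
    "Toronto": (43.65, -79.38),
    "Montreal": (45.50, -73.57),
    "Vancouver": (49.28, -123.12),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def nearest_city(position):
    """Return the supported city closest to the reported position."""
    return min(CITIES, key=lambda name: haversine_km(position, CITIES[name]))


# A position a browser might report for someone in Mississauga.
print(nearest_city((43.59, -79.64)))  # Toronto
```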

Charles Knight commented on konsrtr's simple design - When Less is More – concert search engine konsrtr (Feb 26)

Posted by Gwen at 01:46 PM

February 18, 2010

Kngine - Stunning new Semantic Search Engine

Kngine - Web 3.0 Search Engine

I nearly missed this new search engine that aims to "unlock meaning" in search. Amazingly, this search engine comes to us from Cairo, Egypt. The about page says -

"We are working on next generation of searching technologies to unlock meaning; rather than indexing the document in Inverted Index fashion, Kngine tries to understand the documents and the search queries in order to provide customized meaningful search result."

"Our goal is to build Web 3.0 Web Search Engine on the advances of Web Search Engine, Semantic Web, Data Representation technologies -- a new form of Web Search Engine that will unleash a revolution of new possibilities."

The Tour page shows the ways this approach can assist in searches.

* Read Perception Words with Multiple Meanings.
* Smart Information.
* Available Results. NEW !
* Concept List (List of things).
* Answer your questions.
* Comparisons.
* Updated Information (Weather, Stock, Currency Price, and Sport Matches Results).
* Link the data, and view direct data.

There could be a Canadian connection: one of the sample questions is "When did the Toronto Dominion Centre open?" (but the link Kngine provides for it runs a bad search; somebody is not good with detail).

Kngine gets its reference and question-answering information from Freebase, web results from Google, and maps from Google.

You have to keep your queries high level to see the concepts. The query exploring the Canadian Arctic shows none, but the query Canadian Arctic identifies one concept and several "nearest" concepts. The query green living shows the full treatment of a topic.

Kngine screenshot Feb 2010

The search choices are Web, Web with full information, and Photos. I haven't seen any difference between the two Web options. The concept analysis doesn't apply to Photos, which seems to be a standard photo search.

I'm quite surprised that Kngine can answer questions like top 10 countries for oil production or top 10 countries for wheat, with a clickable map, no less.

Since the web results come from Google, we can use Google's syntax. This somewhat defeats the purpose of a semantic search engine, but it may help when you want pages from one domain: for example gov for the US Government (site:gov) or ca for Canada (site:ca).
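A site: restriction like the ones above is essentially a test on each result's hostname. This Python sketch shows our approximation of that behaviour, not Google's implementation: a URL matches site:gov if its host is gov itself or ends in .gov. The result URLs are examples only.

```python
from urllib.parse import urlsplit


def matches_site(url, site):
    """True if the URL's host equals `site` or ends with '.' + site."""
    host = (urlsplit(url).hostname or "").lower()
    site = site.lower().lstrip(".")
    return host == site or host.endswith("." + site)


def restrict_to_site(urls, site):
    """Keep only the result URLs that fall under `site`."""
    return [u for u in urls if matches_site(u, site)]


results = [
    "http://www.eia.gov/",
    "http://www.statcan.gc.ca/",
    "http://en.wikipedia.org/wiki/Wheat",
    "http://www.nrcan.gc.ca/",
]
print(restrict_to_site(results, "gov"))  # ['http://www.eia.gov/']
print(restrict_to_site(results, "ca"))   # ['http://www.statcan.gc.ca/', 'http://www.nrcan.gc.ca/']
```

Matching on the dot boundary matters: a plain "ends with gov" test would wrongly accept a host such as evilgov.com.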

Kngine is very promising. For now it handles high-level topics very well and can answer some kinds of factual questions. I don't know how far we can push that, but it does a very good job on population of Toronto.

A note on the site reveals that "Kngine contains 1.2+ billion of pieces of data about more than 8 million concepts".

This is one to use and watch. Let's hope Kngine succeeds.

Posted by Gwen at 01:16 PM

February 17, 2010

Entity Cube - Uncanny

Entity Cube from Microsoft Research Asia is an extremely interesting experiment that works with entities and a faceted view.

From the site: "EntityCube is a research prototype for exploring object-level search technologies, which automatically summarizes the Web for entities (such as people, locations and organizations) with a modest web presence."

" EntityCube is just a prototype service for collating and summarizing the specific entity information appearing in the Internet web content. Its technology is built to help users to perform Web data mining for text contents around the names and get the search results based on the search of available web contents. "

It's very good on names of people with a strong web presence. See this profile of Stephen Abram, or of Elizabeth May, leader of the Green Party. It shows related names and academic references (though on most searches the academic references seem completely unrelated).

See Entity Cube, Experiencing Information (Dec 19, 2009) for further comments.

Posted by Gwen at 10:27 PM

Google Personalized Results

Is Google Getting Too Personal?, SEOMoz (Feb 16)

Dr Pete ran some searches to find out how much Google actually personalizes results.

Key finding: "This data suggests that being logged out has very little impact on rankings, assuming that you're on the same machine with the same IP. Move to a new machine/IP, and the difference is much more substantial."
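One simple way to quantify "impact on rankings" is to compare two top-N lists for the same query, say logged in and logged out. The Python sketch below reports how much of the baseline list survives and how far each shared URL moved. It is our own illustration with made-up result lists, not Dr Pete's methodology.

```python
def rank_changes(baseline, other):
    """Compare two ranked result lists (URLs, best first).

    Returns (overlap, moves): overlap is the share of baseline URLs that also
    appear in `other`; moves maps each shared URL to its change in position
    (positive means it moved down in `other`).
    """
    pos_a = {url: i for i, url in enumerate(baseline)}
    pos_b = {url: i for i, url in enumerate(other)}
    shared = [url for url in baseline if url in pos_b]
    overlap = len(shared) / len(baseline) if baseline else 0.0
    moves = {url: pos_b[url] - pos_a[url] for url in shared}
    return overlap, moves


logged_in = ["a.com", "b.com", "c.com", "d.com"]
logged_out = ["a.com", "c.com", "b.com", "e.com"]
overlap, moves = rank_changes(logged_in, logged_out)
print(overlap)  # 0.75
print(moves)    # {'a.com': 0, 'b.com': 1, 'c.com': -1}
```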

Posted by Gwen at 09:30 PM

February 04, 2010

Semantic Search in Search Engines Today

Web 3.0 and Semantic Search by Abhishek Gattani, AltSearchEngines (Jan 31)

Guest writer Abhishek Gattani of Kosmix gives his take on the semantic web. Kosmix has been using semantic analysis and technology to present search results, and he notes that Google, Yahoo and Bing have recently been adopting aspects of it.

He wrote - "Semantic web is about annotating facets and attributes associated with web content and linking data. In other words, semantic web is about teaching machines to read web pages, which are designed to be read by humans. So how can semantics improve search?"

One manifestation of semantic understanding is the new ability of search engines to present the structure of a web page. "For instance, for events we [Kosmix] list the date, time, event snippet, and even ticket prices, which really let you decide if you should be clicking to book a ticket or it is not aligning with your schedule and budget."

Semantic technologies also come into play in ranking results, for example through the occurrence of related words.

Posted by Gwen at 03:24 PM

January 31, 2010

Trend to Semantic Web Search

Bing, Google, And The Enigmatic T2: The Race For A Complete Semantic Search Engine, Erick Schonfeld, TechCrunch (Jan 22)

By introducing recipe search, Bing is getting closer to semantic search, but others are also working to do this well.

"Bing is big on guided search (showing relevant search categories to help narrow results), but this goes one step further towards semantic search (the ability to index and search the Web by different facets). Recipes are just the beginning, and it’s not just Bing. Google and a handful of startups, including Evri, Hakia, and Radar Networks, are hard at work on making semantic search a reality. The race is on to bring this type of semantic filtering for nearly every category of search across the Web."
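Faceted search, as in Bing's recipe search, comes down to filtering items on attribute values and counting how many results remain under each value so the interface can offer further refinements. A minimal Python sketch over hypothetical recipe records (the facet names are our assumption, not Bing's schema):

```python
from collections import Counter

# Hypothetical recipe records; the facet names are our own choice.
RECIPES = [
    {"name": "Pad Thai", "cuisine": "Thai", "course": "main"},
    {"name": "Green Curry", "cuisine": "Thai", "course": "main"},
    {"name": "Mango Sticky Rice", "cuisine": "Thai", "course": "dessert"},
    {"name": "Tiramisu", "cuisine": "Italian", "course": "dessert"},
]


def facet_filter(items, **selected):
    """Keep items whose attributes match every selected facet value."""
    return [item for item in items
            if all(item.get(facet) == value for facet, value in selected.items())]


def facet_counts(items, facet):
    """Count the remaining items under each value of a facet."""
    return Counter(item[facet] for item in items)


thai = facet_filter(RECIPES, cuisine="Thai")
print(facet_counts(thai, "course"))  # Counter({'main': 2, 'dessert': 1})
print([r["name"] for r in facet_filter(RECIPES, cuisine="Thai", course="dessert")])
```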

Posted by Gwen at 11:26 PM

Google's Algorithms

Cracking the Google Algorithm, and Understanding Search Patents with Ted “tedster” Ulle, Stuntdubl (Jan 28)

Outside of Google itself, Ted Ulle of WebmasterWorld is the expert on Google's algorithms. In this interview he answers questions about the five most significant algorithm changes and the top five changes to expect in the next five years.

Of particular interest:

+ "Phrase-based indexing, as described in the 2006 patents, brought a deeper level of semantic intelligence to the search results."

+ "Geo-located results began to create different rankings even for various areas of the same US and UK city somewhere around 2005 or so."

+ "Google’s user "intention engine" has had a major effect, and that rolled out in a big way in 2009. This was coupled with a kind of automated taxonomy of query terms."

This was especially interesting because it suggests that Google is clustering results in the background without showing the "taxonomy labels"; instead, it selects results from the clusters. See Ted's post from August 2009:

I've been studying one of the "phrase-based indexing" patents that Google filed, in particular Automatic taxonomy generation in search results using phrases [patft1.uspto.gov]. It's giving me new thoughts on how search results can be blended to include representatives from different clusters, or different taxonomies related to the original query phrase.

Walking through the patent's logic: a search phrase is associated with several clusters of web pages. Each one of those clusters is a group that includes some other phrase, in addition to the requested keyword phrase. This assumes that the phrases that create a cluster are groups of words that offer what the patent calls "information gain".

This patent would automatically create a taxonomy label for each cluster, based on that second phrase. A given web page could be a member of more than one cluster, and therefore be part of several different taxonomies related to the principal search term.

From Webmaster World - Google Search News - Blended Results, QDF and User Intention at Google

+ "The beginnings of sentiment analysis may begin to show up in the next few years. I expect to see it first on the level of rating for where content falls on a fact-to-opinion spectrum. Full sentiment analysis (rating content on a "favorable-to-critical" opinion spectrum) is already in use for some social media monitoring, but that is probably too big a technical challenge to expect Google to go with it in the general search results."
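A much-simplified Python sketch of the clustering Ted describes in his August 2009 post: take the pages matching the query phrase, group them by each other "good" phrase they contain, and label each cluster with that second phrase. A page can land in several clusters. The list of good phrases is supplied by hand here; in the patent such phrases are selected by their "information gain", which this sketch does not compute. The pages and phrases are hypothetical.

```python
from collections import defaultdict


def cluster_by_second_phrase(pages, query_phrase, good_phrases):
    """Group pages containing query_phrase by each other good phrase they contain.

    pages maps page id -> text; query_phrase and good_phrases are lowercase.
    A page can join several clusters, and each cluster is labelled by its
    second phrase (plain substring matching, a deliberate simplification).
    """
    clusters = defaultdict(list)
    for page_id, text in pages.items():
        text = text.lower()
        if query_phrase not in text:
            continue
        for phrase in good_phrases:
            if phrase != query_phrase and phrase in text:
                clusters[phrase].append(page_id)
    return dict(clusters)


# Hypothetical pages and phrase list.
PAGES = {
    "p1": "Jaguar dealer prices and Jaguar XF reviews",
    "p2": "The jaguar is a big cat of the rainforest",
    "p3": "Jaguar XF engine specs from a dealer",
}
GOOD_PHRASES = ["jaguar", "dealer", "big cat", "rainforest", "jaguar xf"]

for label, members in cluster_by_second_phrase(PAGES, "jaguar", GOOD_PHRASES).items():
    print(label, members)
```

A search engine could then blend in the top page from each cluster (the car pages and the big-cat page) rather than showing only the single most popular interpretation.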

Posted by Gwen at 03:13 PM

January 25, 2010

Search from a User's Perspective

Bing’s Stefan Weitz: Rethinking The Search Experience, by Gord Hotchkiss, Search Engine Land (Jan 22)

In a series of articles, Gord Hotchkiss will explore where web search might be going from a user's perspective. He begins by talking with Stefan Weitz, Director of Bing Search, about using search to make decisions.

Weitz described the main concept behind Bing as a decision engine.

" The Decision Engine was built around three big areas. The first was providing great core results. That’s the standard “block and tackle,” keyword to keyword algo based search. ... The Decision Engine comes when you add in the other two big things.

The first is that organization of results to help people explore topics that they don’t understand. Can we do a better job with related searches? Can we organize results using categorized search. Can we semantically break down the 160 million results in a way that makes more sense. The third thing is how can we provide tools that help you make decisions? We focused in the initial release on the travel, the shopping, the local, the health. We built fairly complex computer science tools to help you when you do decide you want to a search engine and book that trip to Florida. What can we do differently that will help you get that job done faster? In the travel vertical, it’s the Farecast technology, the ranking within airfares… all those types of things. That’s how it practically manifests in the engine and it is designed to respond to those data points I mentioned earlier."

Posted by Gwen at 04:41 PM

December 29, 2009

Making Search Engines Kid Friendly

Helping Children Find What They Need on the Internet, by Stefanie Olsen, International Herald Tribune (Dec 25)

Search engines have been mainly designed for adults who are equipped to think up keywords. Children haven't reached that stage. Several search engines are trying to serve them better, and in doing so will probably help adults too.

"Children’s choices of search engines differ only slightly from the preferences of adults. Google ranks most popular among children, followed by Yahoo, Google Image search, Microsoft’s Bing and Ask.com, according to the research firm Nielsen. (Among adults, Bing is ahead of Google Image.) "

From this may come new and better visual aids and search prompts.

Posted by Gwen at 02:44 AM

December 27, 2009

Google Expands Search with Synonyms

How Google May Expand Searches Using Synonyms for Words in Queries, SEO by the Sea (Dec 22)

Google sometimes searches on words related to the ones you use, and this newly granted patent confirms it.

"A patent granted to Google this week explores how the search engine might expand the search terms that searchers use to include synonyms in searches, to make it easier for searchers to locate information on the Web. In the Ft. Wayne example, this could mean that Google would look for pages on the Web that were relevant for both [web hosting Fort Wayne] and [web hosting Ft. Wayne]."

This posting describes the process for finding the synonyms (or related words) and evaluating the quality of the words in context.


"What does this mean for you as a searcher or as a site owner if Google is using this process?

For searchers, it might mean that Google may add pages to your search results based upon words it perceives as synonyms to words you used in your query. Search for something while including the words “District of Columbia” in your search, and you may see also see pages that use “Washington, D.C.” or “D.C.” instead of “District of Columbia.” "
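Query expansion with synonyms can be sketched in a few lines of Python. The synonym table here is hand-written and hypothetical; the patent describes how Google might discover such pairs and judge them in context, which this sketch does not attempt. Terms are matched as whole words so that fort does not match inside comfort.

```python
import re

# Hypothetical synonym table (lowercase); the patent learns and checks such pairs.
SYNONYMS = {
    "ft.": ["fort"],
    "fort": ["ft."],
    "district of columbia": ["washington, d.c.", "d.c."],
}


def expand_query(query):
    """Return the lowercased query plus variants with one synonym substituted."""
    q = query.lower()
    variants = [q]
    for term, alternatives in SYNONYMS.items():
        # Whole-word match: no letter, digit or underscore directly before or after.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)")
        if not pattern.search(q):
            continue
        for alt in alternatives:
            variant = pattern.sub(lambda _m, alt=alt: alt, q)
            if variant not in variants:
                variants.append(variant)
    return variants


print(expand_query("web hosting Ft. Wayne"))  # ['web hosting ft. wayne', 'web hosting fort wayne']
print(expand_query("comfort food"))           # ['comfort food']
```

The engine would then retrieve pages for every variant and merge them, which is how a [web hosting Ft. Wayne] search can surface pages that only say "Fort Wayne".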

Posted by Gwen at 01:57 AM

December 08, 2009

What's Behind Snippets

How a Search Engine May Choose Search Snippets By Bill Slawski, SEO by the Sea (Dec 4)

Search engines today show a snippet from each page. It might be a summary of the page, the section where your search terms occur, the page's description from DMOZ (true at Google), or the publisher's meta description (rare).

Yahoo has a patent that should make those snippets more relevant.

"The Yahoo patent filing tells us that it could look at the following for each line on a page, to come up with a score for each line to use as a snippet:

* A query-independent relevance for each line of text – a degree to which the line of text of the document summarizes the document.
* A query-dependent relevance of each of the lines of text – a relevance of the line of text to the query.
* The intent behind a query."
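The three factors could be combined into one score per line. This Python sketch is our own illustration, not Yahoo's method: the query-independent "summary" scores are supplied as hypothetical inputs, the query-dependent part is the share of query words a line contains, and query intent is modelled, by assumption, as a choice of weights between the two.

```python
import re


def tokenize(text):
    """Lowercase word tokens (letters and digits only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def best_snippet_line(lines, query, summary_scores, w_summary=0.4, w_query=0.6):
    """Return the line with the highest weighted score.

    summary_scores[i] is a hypothetical query-independent score (0..1) for how
    well lines[i] summarizes the page. The query-dependent score is the
    fraction of query terms that appear in the line. The two weights stand in
    for query intent (an assumption of this sketch).
    """
    q_terms = set(tokenize(query))

    def score(i):
        overlap = len(q_terms & set(tokenize(lines[i]))) / len(q_terms) if q_terms else 0.0
        return w_summary * summary_scores[i] + w_query * overlap

    return lines[max(range(len(lines)), key=score)]


LINES = [
    "Acme Widgets has made industrial widgets since 1950.",
    "Our store hours are 9 to 5, Monday to Friday.",
    "Contact us for widget pricing.",
]
SUMMARY = [0.9, 0.2, 0.3]

print(best_snippet_line(LINES, "widget pricing", SUMMARY))
print(best_snippet_line(LINES, "widget pricing", SUMMARY, w_summary=0.9, w_query=0.1))
```

With the default weights the pricing line wins; shifting the weight toward the summary score (say, for a navigational query) picks the first line instead.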

Posted by Gwen at 12:00 AM

December 02, 2009

Book on Search Interfaces

Search User Interfaces by Marti Hearst, Cambridge University Press 2009.

Marti Hearst has been involved in the Flamenco Project and knows a lot about search behaviour and designing search interfaces. This book is available online for free.

"This book has two intended primary audiences. The first is academic researchers, graduate students, and those teaching graduate level courses in information retrieval, user interfaces, and other information management-related topics. The second intended audience is practitioners who design and build search interfaces."

There are also some webcast videos from the course on Search Engines: Technology, Society, and Business (2005)

The book was mentioned in A Roundup Of 2009’s Best SEO Books by Chris Sherman, Search Engine Land.

Check that article for short reviews of four titles on search engine optimization.

Posted by Gwen at 02:28 AM

November 09, 2009

Whatever Happened To?

Not all good products succeed.

+ Groxis, maker of the visual search tool Groker, closed early in 2009. Groker served as a front end for Yahoo (in a demo) and was used by the Internet Public Library.

+ Siderean Software, notable for the relational navigation once used by LII.org, is floundering. Its website is still up, but the latest news is from 2008.

Source: Information Today (Oct 2009)

Posted by Gwen at 07:36 PM

November 07, 2009

Microformats demystified

Microformats: What, How, and Why by Steven Bradley, vanseodesign (Nov 3)

Steven Bradley to the rescue: in this post he describes microformats clearly and succinctly. Microformats are markup (code) conventions that make it easier to "share and reuse information across different applications and websites."

"The goal of this post is to introduce you to the what, how, and why of microformats and point you in the right direction so you can begin using them where appropriate."

Microformats are used for various purposes, but one that will matter to searchers is in the creation of rich snippets.

"Google recently introduced the idea of rich snippets into search results. Rich snippets make use of microformats to add additional details about your site in the snippet below your link in search results. Reviews about your products or contact information for ordering might be included directly on the search results page."
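To see what microformat markup looks like and how a program can read it, here is a small hCard fragment for a hypothetical person, read by a simplified Python parser built on the standard library's html.parser. It collects the text of a few hCard class names (fn, org, locality, tel). It is a sketch, not a full microformats parser: it ignores the enclosing vcard boundary, multiple cards, and the finer points of the spec.

```python
from html.parser import HTMLParser

PROPS = {"fn", "org", "locality", "tel"}             # hCard class names this sketch collects
VOID = {"br", "hr", "img", "input", "link", "meta"}  # tags that never get an end tag


class HCardParser(HTMLParser):
    """Collect the text of elements whose class is one of PROPS."""

    def __init__(self):
        super().__init__()
        self.stack = []  # for each open element: its hCard property, or None
        self.raw = {}    # property -> accumulated text

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return  # void elements never close, so keep them off the stack
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(next((c for c in classes if c in PROPS), None))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Credit text to the innermost enclosing element that carries a property.
        for prop in reversed(self.stack):
            if prop:
                self.raw[prop] = self.raw.get(prop, "") + data
                break

    def card(self):
        """Return the collected properties with whitespace normalized."""
        return {k: " ".join(v.split()) for k, v in self.raw.items()}


MARKUP = """
<div class="vcard">
  <span class="fn">Jane Doe</span>,
  <span class="org">Example Public Library</span>,
  <span class="adr"><span class="locality">Toronto</span></span>
  <span class="tel">+1-555-0100</span>
</div>
"""

parser = HCardParser()
parser.feed(MARKUP)
parser.close()
print(parser.card())
```

The class names carry the meaning, so the same page stays readable for people while a search engine can lift out the name, organization and phone number for a rich snippet.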

Posted by Gwen at 06:48 PM

November 02, 2009

Cognition and syntactic parser

Cognition Technologies Parses Its Way to a Better Understanding of Language, at Altsearchengines (Nov 1)

Cognition is doing a lot of work on semantic mapping to make natural language processing effective. This posting at Altsearchengines talks about a new syntactic parser.

"Cognition Technologies has recently added an advanced syntactic parser module to its language-understanding technology. “What does that mean?”, you ask? It means that Cognition can now “parse”, or break down, the component parts of sentences to deliver an even more accurate and complete understanding of the content. Since words often have more than one meaning, the ability to parse sentences enhances the technology’s ability to understand the context and sentence structure of the material being analyzed."

Try it at http://www.cognition.com/. This is one of the directions search technology is taking.

Posted by Gwen at 10:13 PM

October 30, 2009

Wikipedia has a role in Semantic Web

Two posts from the Kosmix Blog about Semantic Web.

Basically, Digvijay Lamba observes "that Wikipedia can provide a global and ever improving vocabulary [for] bloggers and other content creators to provide richer context around what they write." In other words, Wikipedia can help identify entities and provide context.

Why Wikipedia Can Make a Giant Leap Ahead for the Semantic Web

Wikipedia and the Semantic Web – Part 2

The future:

"In the end, we have to take baby steps in our goal for rich semantic annotation of Web content. Automated tools are already attempting to do this for content that has already been created. Will the automated methods improve fast enough that there will never be a need for content creators to annotate? Or will having a vocabulary and an easy method of annotation give enough advantage to the content creators that we will see widespread adoption? My guess is that the answer lies somewhere in between."

Posted by Gwen at 12:55 PM

October 18, 2009

How important is PageRank at Google?

Google Removes PageRank Data From Webmaster Tools, Search Engine Roundtable (Oct 15)

Search Engine Roundtable reports that Google has dropped PageRank information from Google Webmaster Tools, presumably to encourage webmasters to attend to other ways of improving rankings. Why, then, does PageRank remain on the Google Toolbar for searchers to use?

But I don't think searchers use it much, if at all. In my own analysis of search results, albeit loosely done, high PageRank very often does not rule the top 3 results. In fact, I was beginning to suspect that PageRank is much less of a factor than people have thought. It may be that Google really has changed its algorithms so much that PR, or at least the "many links" understanding of it, has been downgraded. This posting in Search Engine Roundtable might be more evidence of that.

However, Google's technology page puts PageRank first. It is clear that this is no longer a simple links-in algorithm (if it ever was):

"PageRank reflects our view of the importance of web pages by considering more than 500 million variables and 2 billion terms. Pages that we believe are important pages receive a higher PageRank and are more likely to appear at the top of the search results.

PageRank also considers the importance of each page that casts a vote, as votes from some pages are considered to have greater value, thus giving the linked page greater value. We have always taken a pragmatic approach to help improve search quality and create useful products, and our technology uses the collective intelligence of the web to determine a page's importance."
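For reference, the textbook PageRank computation published by Brin and Page treats each link as a vote whose weight is the linking page's own rank divided by its number of outlinks, which is the "importance of each page that casts a vote" idea quoted above. A minimal Python power-iteration sketch with the customary damping factor of 0.85 and a hypothetical three-page graph; it shows the original idea only, not whatever Google runs today.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a small link graph.

    links maps each page to the list of pages it links to. A page with no
    outlinks spreads its rank evenly over all pages. Ranks sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)  # each outlink carries an equal vote
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
print({page: round(score, 3) for page, score in pagerank(graph).items()})
```

C ends up on top because it receives votes from both other pages, and A outranks B because A's single inbound vote comes from the highest-ranked page.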

Posted by Gwen at 03:01 PM

October 09, 2009

Evolution of Search from Boolean to Entity

An Evolution of Search, by John D. Holt and David J. Miller, ASIS&T (Oct/Nov 2009)

John D. Holt and David J. Miller, senior architects in the LexisNexis Risk and Information Analytics Group, review how information search and retrieval technologies have progressed since the early days of Boolean search. The tug-of-war is still between precision (returning only relevant results) and recall (returning all of the relevant results). Search has now evolved to entity search: retrieval by the attributes of the information item.
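Precision and recall have simple definitions, and a quick calculation shows the trade-off the authors describe. In this Python sketch (hypothetical document sets), a narrow Boolean query returns only relevant documents but misses most of them, while a broad one finds them all along with plenty of noise.

```python
def precision_recall(retrieved, relevant):
    """Precision = share of retrieved items that are relevant.
    Recall = share of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


relevant = {f"doc{i}" for i in range(10)}              # 10 relevant documents
narrow = {"doc0", "doc1", "doc2"}                       # precise query
broad = relevant | {f"noise{i}" for i in range(30)}     # comprehensive query

print(precision_recall(narrow, relevant))  # (1.0, 0.3)
print(precision_recall(broad, relevant))   # (0.25, 1.0)
```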

"This paper provides a brief review of some of the earlier stages of search evolution in the context of the evolutionary pressures of the concurrent improvement of both precision and recall. "

Most especially note the conclusion:

Entity search is another step in the evolution of information retrieval systems. Entity search builds upon Boolean and relevance ranking techniques. Entity search provides improvements in both precision and recall over traditional Boolean and relevance ranked search techniques.

Boolean search techniques require the researcher to be knowledgeable of the words and expressions used in the document or record collection. Precise results can be obtained, but at the cost of a significant drop in recall. Recall can be achieved, but only at a significant drop in precision.

Relevance ranking via statistical techniques can be used to improve apparent precision in some cases. However, the statistical techniques do not apply well to searching structured and semi-structured data with attribute values.

The linking or clustering of the documents or records into sets of references that describe an entity can be used for much more than just reporting on an entity. The information from the set can be used in some cases to improve recall by broadening the search. Alternatively, and more powerfully, the entity can become the object of the search.

A search expression that specifies a set of attribute values can be used when the entity is the object of the search. Both precision and recall are improved. Precision is improved because the entities returned are all consistent with the attribute values supplied in the search. Recall is improved because the combination of entity values specified in the search expression need not appear in any particular underlying reference document or record.
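The recall argument in the last paragraph is easier to see with a toy example. In this Python sketch, source records that have already been linked to entities (the record-linkage step is assumed, and the records are hypothetical) are merged per entity. A search for a city plus a phone number then finds entity E1 even though no single record contains both values. It illustrates the idea only; it is not the authors' system.

```python
from collections import defaultdict

# Hypothetical source records, already linked to an entity id.
RECORDS = [
    {"entity": "E1", "name": "J. Smith", "city": "Dayton"},
    {"entity": "E1", "name": "John Smith", "phone": "555-0101"},
    {"entity": "E2", "name": "John Smith", "city": "Boston"},
]


def build_entities(records):
    """Merge each entity's records into a set of known values per attribute."""
    entities = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for field, value in rec.items():
            if field != "entity":
                entities[rec["entity"]][field].add(value)
    return entities


def entity_search(entities, **criteria):
    """Return ids of entities whose merged attributes include every requested value."""
    return [eid for eid, attrs in entities.items()
            if all(value in attrs.get(field, set()) for field, value in criteria.items())]


entities = build_entities(RECORDS)
print(entity_search(entities, city="Dayton", phone="555-0101"))  # ['E1']
print(entity_search(entities, name="John Smith", city="Dayton"))  # ['E1']
```

The second query shows the precision side: both entities are named John Smith, but only one is consistent with all the attribute values supplied.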

Posted by Gwen at 03:08 PM

October 06, 2009

In-Depth on Google Search

Business Week ran a lengthy series on Google Search in which Silicon Valley bureau chief Rob Hof interviewed CEO Eric Schmidt and the major heads of search technology.

The main article was Can Google Stay on Top of the Web? (Oct 1)

Below are the four interviews with the company's search gurus plus one with CEO Eric Schmidt.

Matt Cutts: How Google Deals With Web Spam, Rob Hof, Business Week (October 04)

This interview with Google's Matt Cutts tells us more about how Google search departments work together: ranking, spam control, and ads. It's all part of their mission to deliver quality results.

Evaluating search results:

+ "we’ve built up a lot of evaluation metrics"

Understanding search intent:

+ "We try to do a lot so we can understand queries better. Some people will mistype queries, so we try to do a real good spell-check system. A lot of people will type in synonyms, like "automobile" instead of "cars" when the name of the business is Cars R Us. So we try to take the query as a suggestion."

+ "We used to require an absolute perfect match, but over time we’ve gotten better at spelling, morphology, synonyms, all these sorts of things like stemming, where somebody types in “runners” and maybe they meant “runner,” or “running.”"

Delivering freshness:

+ "But in general, Google is fresher. Google is not only fresher but more comprehensive. Those are three key things: freshness; comprehensiveness (you want to crawl as much of the Web as possible); and relevance (core ranking and Web spam). And you want the user experience to be really clean."

Detecting hacked sites:

+ "We write detectors. We’ve written classifiers—an algorithm, a heuristic that essentially takes a bunch of signals and tries to say yes, this site has been hacked or no, it hasn’t, and at what level of the directory and things like that."

Other articles in this series on Google search:

Google's Udi Manber: Search Is About People, Not Just Data

Udi Manber is VP of technology for search.

Excerpts:

Q: Can you give me a sense of the types of methods you use to improve search?

A: Humans are involved, formulas are involved, experiments are involved. We often do A/B tests, give one set of people an algorithm, give another set of people another set of algorithms and see how they behave. We measure lots of things, not just clicks.

Q: So you have to determine what does change and focus on indexing that?

A: We have to determine from the query whether it can benefit from something in real-time. Like “history of the Renaissance.” It’s possible that somebody on Twitter just mentioned that. But a) it’s not that likely and b) it’s probably not what you want. You want the best article on the Renaissance. So time is not as important on that kind of query.

But search for “earthquake” and time is much more important. Or a particular celebrity that had news in the last five minutes. So we have to change the algorithm based on the query. We do that now.
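A/B tests like the ones Manber describes need a stable way to split users into groups. One common approach, sketched here in Python as an illustration rather than a description of Google's infrastructure, hashes the user and experiment names together so each user always lands in the same group. The experiment and user names are hypothetical.

```python
import hashlib


def bucket(user_id, experiment, treatment_share=0.5):
    """Assign a user to 'A' (control) or 'B' (treatment), the same way every time."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # First 6 bytes (48 bits) scaled to [0, 1); 48 bits fit exactly in a float.
    fraction = int.from_bytes(digest[:6], "big") / 2 ** 48
    return "B" if fraction < treatment_share else "A"


assignments = [bucket(f"user{i}", "new-ranking-formula") for i in range(10_000)]
print(assignments.count("B") / len(assignments))  # roughly 0.5
```

Including the experiment name in the hash means a user who is in group B for one experiment is not automatically in group B for every other experiment.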

Google Search Guru Singhal: We Will Try Outlandish Ideas

Amit Singhal looks after ranking algorithms. His team ran 6,000 experiments last year which led to roughly 500 changes in how search works.

Google's Scott Huffman: Many More Search Features Coming

Scott Huffman's team evaluates the effects of every proposed change to Google. Last year there were 6,000 experiments.

"Huffman explained in detail how Google runs all those experiments—which include the use of hundreds of human evaluators in addition to Google’s massive computer infrastructure."

Google uses people and statistical analysis of clicks to evaluate the results. It especially works on relevance for a country or locale.

Excerpts:

Q: What does the evaluation unit do?

A: We try to measure every possible which way we can think of how good is Google, how good are our search results, how well are they serving our users. And we break that down all kinds of ways—by 100 locales [country plus language pairs], by different genres (product queries, health queries, local queries, long queries, queries that don’t happen very often, queries that are very popular) times how are we doing on those in France and Switzerland and other places

Q: Can you give me a sense for how you approach evaluation?

A: We use two main kinds of evaluation data. One kind is we have human evaluators all over the world for whom we have a workflow system. They come to it and are fed things to evaluate. A typical thing is: Here is a query, you’re speaking French in Switzerland, here’s a URL, tell us on some kind of scale or some set of flags and description how good of a URL is that for that query.

The other data source we use is live experimentation with our users. A typical example where we use that more is for user interface changes to search. It’s hard to guess what people’s reaction will be to any particular UI change.

Q: How are personalized search results evaluated—any differently?

A ... Another thing that we spend a lot of time on is at the country level. Many countries speak English, but when I type in, say “bank,” I want pretty different answers if I’m in the U.S. vs. the U.K. vs. India vs. Australia. And today Google gives you very different answers for those. It also applies inside the country—in Dallas and Atlanta, you’ll get different results for “First Baptist Church.” Those kind tend to be a little trickier for us.

How Google Plans to Stay Ahead in Search

"CEO Eric Schmidt discusses how Google is handling challenges from Microsoft and upstarts Twitter and Facebook—and why search remains its priority "

Q: You said recently that you worry about where growth for a large company such as Google comes next. Where will that growth come from, and what does that say about what Google will be in five to 10 years?

A: We are first and foremost a search company. Of course, search changes. Location will become more important, for example. As long as we can be first to invent the new solutions to search, we'll be fine. We're still investing a lot in search and search quality. In our case, growth will come from businesses we're already in.

Posted by Gwen at 02:11 PM

October 02, 2009

Concept Approach

Organizing the Web around Concepts by Mitul Tiwari, Kosmix Blog (Sept 30)

Identifies the next wave of web search as being one which reorganizes the "Internet by topic or concept". Examples of the concept orientation are "Freebase, Google Squared, DBLife, and Kosmix topic pages".

Kosmix sees web pages as falling into three types: search pages, topic/concept pages, and articles. Searchers benefit from seeing concepts related to those in their query.

People (editors) organize pages by concept - this is the essence of a directory. But today the topical approach is done algorithmically through:

+ concept extraction
+ relationship mining
+ linking data with concepts
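The three algorithmic steps above can be sketched in a few lines. This is only an illustration and is not how Kosmix or the other projects actually work. It assumes a small, hypothetical hand-built dictionary of concept phrases, and it treats co-occurrence in the same document as the relationship signal.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept dictionary: surface phrase -> canonical concept ID
CONCEPTS = {
    "barack obama": "Barack_Obama",
    "obama": "Barack_Obama",
    "white house": "White_House",
    "washington": "Washington_DC",
}

def extract_concepts(text):
    """Concept extraction: find dictionary phrases in the text, longest phrases first."""
    lowered = text.lower()
    found = set()
    for phrase in sorted(CONCEPTS, key=len, reverse=True):
        if phrase in lowered:
            found.add(CONCEPTS[phrase])
            lowered = lowered.replace(phrase, " ")  # avoid re-matching shorter sub-phrases
    return found

def mine_relationships(documents):
    """Relationship mining: count how often two concepts appear in the same document."""
    pairs = Counter()
    for doc in documents:
        for a, b in combinations(sorted(extract_concepts(doc)), 2):
            pairs[(a, b)] += 1
    return pairs

def link_to_concepts(documents):
    """Linking data with concepts: index each document under the concepts it mentions."""
    index = {}
    for doc_id, doc in enumerate(documents):
        for concept in extract_concepts(doc):
            index.setdefault(concept, []).append(doc_id)
    return index

docs = [
    "Barack Obama spoke at the White House.",
    "Obama returned to Washington.",
    "Tourists visit the White House in Washington.",
]
print(link_to_concepts(docs))
print(mine_relationships(docs).most_common())
```

The concept index is what a topic page would be built from. The co-occurrence counts give a crude "related concepts" list for each topic.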

Concluded - "In short, organizing the web around concepts is a promising area and a stepping stone to bring meaning behind the web data."

Posted by Gwen at 08:16 PM

September 11, 2009

Factors in Ranking Search Results

17 Ways Search Engines Judge the Value of a Link, Rand Fishkin, SEOmoz Blog (Sept 10)

Illuminating article on the most important factors to a search engine in ranking results. It opens with the importance of links between domains.

"As you've likely noticed, search engines have become more and more dependent on metrics about an entire domain, rather than just an individual page. It's why you'll see new pages or those with very few links ranking highly, simply because they're on an important, trusted, well-linked-to domain. In the ranking factors survey, we called this "domain authority" and it accounted for the single largest chunk of the Google algorithm (in the aggregate of the voters' opinions). Domain authority is likely calculated off the domain link graph, which is unique from the web's page-based link graph (upon which Google's original PageRank algorithm is based). In the list below, some metrics influence only one of these, while others can affect both."
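The quote suggests that domain authority comes from a domain-level link graph rather than the page-level graph. The sketch below shows one way to build that graph. It collapses page-to-page links into host-to-host links, drops links within a single host, and then runs ordinary PageRank. It treats each host name as a "domain" and uses hypothetical links. It does not reproduce SEOmoz's or Google's actual formula.

```python
from urllib.parse import urlparse

def domain_graph(page_links):
    """Collapse page-to-page links into a host-to-host graph, ignoring internal links."""
    graph = {}
    for src, dst in page_links:
        a, b = urlparse(src).netloc, urlparse(dst).netloc  # host name stands in for "domain"
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        if a != b:
            graph[a].add(b)
    return graph

def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes with no out-links spread their rank evenly."""
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank

links = [
    ("http://a.com/1", "http://trusted.org/x"),
    ("http://b.com/2", "http://trusted.org/y"),
    ("http://c.com/3", "http://trusted.org/z"),
    ("http://trusted.org/x", "http://trusted.org/y"),  # internal link, ignored
    ("http://trusted.org/x", "http://a.com/1"),
]
ranks = pagerank(domain_graph(links))
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Under this model, a brand-new page on trusted.org can inherit that host's score before it has any links of its own. That matches the behaviour the article describes.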

Posted by Gwen at 05:52 PM

September 10, 2009

Search Evolving

Choose Your Own Adventure: Alternatives to Bing and Google by Jason W Bunyan, DMB (July 30)

This article responds to the question "Is Bing better than Google?" but is really about whether, and how much, search will evolve.

Nice quote from Elaine Toms, drawing on an analogy by Sue Feldman:

"Search has a long way to go, according to Dalhousie University Canada Research Chair in Management Informatics and associate professor Dr. Elaine Toms. “Sue Feldman used a wonderful analogy: we have ovens, microwaves, toasters and barbeques, which all have heating technology, and we use each for its specific purpose. So why not multiple search tools? In her example, I think the common outdoor fire is the current search engine. And we have not yet developed tools for task-specific environments that need and use rich information. [Compare] what a scholar needs and what a health consumer needs. We are not even close.”"

Posted by Gwen at 07:07 PM

September 03, 2009

HealthBase with Content Intelligence

HealthBase--medical search engines maturing by Elizabeth Armstrong Moore, CNet (Sept 2)

HealthBase uses a "content intelligence platform" as semantic technology to understand health content.

"Culling through 10 million health articles and sorting search results on two types of data, "conditions" and "treatments," into manageable subsets, HealthBase includes "causes of," "treatments for," "complications of," and "pros and cons of treatment." Content sources are also provided and ranked. And Jens Tellefsen vice president of marketing and product strategy, said it might include user collaboration akin to Digg's voting articles up or down in the near future."

For more about Content Intelligence see Is Content Intelligence the New Business Intelligence?

"Content intelligence is about creating new content and information services derived from a company’s own premium content, and then optionally combining and enriching it with insights from the Internet, resulting in new sets of content that can power new and differentiated information services. But how is this achieved? By using semantic technologies to mine the breadth and depth of relevant, targeted information from the Web, or proprietary or enterprise sources."

Postscript

Comments from Gary Price - Netbase Debuts HealthBase Demo (Sept 2)

Posted by Gwen at 01:27 PM

September 02, 2009

Future of Image Search

It’s semantic – easier solution to annotate and search images, ICT Results (Aug 27)

Indicates the direction of image search: a mix of text mining (surrounding text and file name), object identification, and face identification, plus semantic annotation or additional assigned terms.
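The sketch below shows how these signals might be combined into annotation terms for one image. The text-mining part tokenizes the file name and the surrounding text. The `detected_labels` argument stands in for the output of an object or face recognizer, which is not implemented here. The weights and the stopword list are hypothetical choices.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "at", "img", "jpg", "jpeg", "png", "dsc"}

def tokens(text):
    """Lowercase alphabetic tokens, minus stopwords and file-format noise."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def annotate_image(filename, surrounding_text, detected_labels=(), top_n=5):
    """Score candidate terms from the file name, nearby text, and (hypothetical) detector labels."""
    scores = Counter()
    for t in tokens(filename):
        scores[t] += 2          # assumed weight: file names are short and usually deliberate
    for t in tokens(surrounding_text):
        scores[t] += 1          # assumed weight: nearby text is noisier
    for label in detected_labels:
        scores[label.lower()] += 3  # assumed weight: trust visual recognition most
    return [term for term, _ in scores.most_common(top_n)]

print(annotate_image("eiffel_tower_night.jpg",
                     "The Eiffel Tower lit up at night in Paris.",
                     detected_labels=["tower"]))
```

The resulting terms could then be mapped to concepts in an ontology. That step is the "semantic annotation" the article points to.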

Posted by Gwen at 01:26 PM

July 30, 2009

Semantic Search Explained

Search: The Last Frontier by Barbara Brynko, Information Today (June 2009) - via AllBusiness.com

Report from the two-day Infonortics conference in Boston in April 2009. This conference is always cutting edge, and semantic search was the main topic.

Why Semantic Search?

Since searchers have begun wading through the quagmire of information, their needs have changed and so have their tolerance levels. There are many times when page-ranking results just don't produce what users are searching for on the web. Dmitri Soubbotin from Semantic Engines elaborated on three reasons users need semantic search. First, he says users deal with insufficient relevance of traditional search results; users just spend too much time searching for information but not always finding what they want. Second, users are pressed for time and have short attention spans; users want relevant information retrieved quickly. Third, most users only look at the first page of the results and don't even peek at the useful sources beyond. Far too many users say they will "settle for what I have here," he says.

But what's really under the hood? Instead of using ranking algorithms as Google does to try to predict relevancy for the user, semantic search uses the science of the meanings in language to produce point-on results. Natural language processing, linguistics, and text mining can be matched against an ontology that works especially well for verticals. Homogeneous content yields better results; there's just "less noise" and less disambiguation for users to deal with.

After all, the goal of the web is to extract more relevant results and to retrieve accurate answers for users while discovering additional content and digging deeper for pertinent data.

A search engine such as Sensebot provides an overview of a topic's hard facts interspersed in text results. Users receive a multidocument summary and links that go beyond simple information search and retrieval.
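The report does not say how Sensebot builds its multidocument summary. The sketch below uses a generic frequency-based extractive method instead. It pools the sentences from several result documents, scores each sentence by how common its words are across the whole set, and keeps the top distinct sentences.

```python
import re
from collections import Counter

def split_sentences(text):
    """Naive sentence splitter on terminal punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(sentence):
    """Lowercase words longer than three letters, as a crude stand-in for stopword removal."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if len(w) > 3]

def summarize(documents, max_sentences=3):
    """Extractive summary: pick sentences whose words are frequent across all documents."""
    sentences = [s for doc in documents for s in split_sentences(doc)]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        ws = content_words(sentence)
        return sum(freq[w] for w in ws) / (len(ws) or 1)

    chosen, seen = [], set()
    for s in sorted(sentences, key=score, reverse=True):
        if s.lower() not in seen:  # skip sentences repeated verbatim across documents
            chosen.append(s)
            seen.add(s.lower())
        if len(chosen) == max_sentences:
            break
    return chosen

docs = [
    "Semantic search uses meaning. Keyword search uses strings.",
    "Semantic search extracts concepts from text. Semantic search uses meaning.",
]
print(summarize(docs, max_sentences=2))
```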

For Diane Burley of Nstein, "Search is so yesterday. ... It's now all about the finding."

But to make the process of finding information easier, we need to take a look at how people seek information, how they orient themselves, and what their sources of frustration may be, she says.

"Until users are inconvenienced, they don't see the value in the search process," she says. If concepts and entities are extracted, links give users more reason to stay on a site and make it easier for them to mine and to aggregate results, even across different languages and country borders.

Bringing Clarity to the Mix

For semantic search to work effectively, users need to maximize relevance and minimize disambiguation. Kathleen Dahlgren from Cognition Technologies explored approaches to tagging, ontology, syntax parsing, and a semantic map. The most common words are the most ambiguous, she says, using the word "lemon" as an example. A word string for "lemon" produces a number of possible definitions: It could be a citrus fruit, a poorly manufactured item, a yellow color property, or a behavioral property.

But word definitions are just part of the puzzle for semantic technology. Add concepts to the mix (lemons are typically yellow) and personal ignorance (pythons are dangerous, but what are pythons anyway?) and social ignorance (the sun revolves around the Earth), and users have the beginnings of a deeper search. In semantic analysis, the word is not only defined by its relevancy, it also takes into account the other words that are present in the sentence and as part of the context of the complete text. Less disambiguation means more-relevant results and a better understanding for the user.
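Using the surrounding words to pick a word's meaning is the idea behind the classic Lesk algorithm. The simplified version below chooses the sense of "lemon" whose gloss shares the most words with the sentence. This is not Cognition's semantic map. The sense inventory and glosses are hypothetical, and the "behavioral property" sense is left out for brevity.

```python
import re

# Hypothetical sense inventory for "lemon": sense -> words typical of its context
SENSES = {
    "fruit": "citrus fruit juice sour peel tree squeeze tea",
    "defective product": "car bought dealer broke defective repair warranty",
    "color": "yellow color paint shade dress pale",
}

def words(text):
    return set(re.findall(r"[a-z]+", text.lower()))

def disambiguate(sentence, senses=SENSES):
    """Simplified Lesk: choose the sense whose gloss overlaps most with the context."""
    context = words(sentence)
    return max(senses, key=lambda sense: len(context & words(senses[sense])))

print(disambiguate("The used car I bought from that dealer was a lemon"))
print(disambiguate("Squeeze a lemon into your tea"))
```

Real systems use much richer context than word overlap, but the principle is the same: the other words in the sentence narrow down which meaning is intended.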


Other topics:

+ Image-driven search and visualization
+ Mobile - voice search
+ Enterprise search - and e-discovery
+ Meaning extraction
+ Aids for engaging in a dialogue with the user

Posted by Gwen at 11:41 PM

July 11, 2009

Linked Data and Semantic Web

Semantic Web is getting much more attention. Richard MacManus interviewed Tim Berners-Lee at MIT in July.

Part 1: Linked Data - this is the base - "The Semantic Web and Linked Data connect because when we've got this web of linked data, there are already lots of technologies which exist to do fancy things with it. But it's time now to concentrate on getting the web of linked data out there."

Part 2: Search Engines, User Interfaces for Data, Wolfram Alpha, And More... - Tim Berners-Lee describes how search will work:

"So I think people will search using a search text engine, and find a webpage. On the front of the webpage they'll find a link to some data, then they'll browse with a data browser, then they'll find a pattern which is really interesting, then they'll make their data system go and find all the things which are like that pattern (which is actually doing a query, but they'll not realize it), then they'll be in data mode with tables and doing statistical analysis, and in that statistical analysis they'll find an interesting object which has a home page, and they'll click on that, and go to a homepage and be back on the Web again. "
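Berners-Lee's "find all the things which are like that pattern" is, in linked-data terms, a query over subject-predicate-object triples. The sketch below does this kind of pattern matching in memory, similar in spirit to a SPARQL basic graph pattern. It is not a SPARQL engine, and the `ex:` identifiers are hypothetical placeholders rather than real URIs.

```python
# A tiny set of (subject, predicate, object) triples; identifiers are hypothetical
TRIPLES = {
    ("ex:Boston", "ex:type", "ex:City"),
    ("ex:Boston", "ex:locatedIn", "ex:USA"),
    ("ex:Paris", "ex:type", "ex:City"),
    ("ex:Paris", "ex:locatedIn", "ex:France"),
    ("ex:MIT", "ex:locatedIn", "ex:USA"),
}

def match(pattern, triples=TRIPLES):
    """Bindings for one pattern; terms starting with '?' are variables."""
    results = []
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                if binding.setdefault(p, t) != t:  # same variable must bind the same value
                    break
            elif p != t:
                break
        else:
            results.append(binding)
    return results

def query(patterns, triples=TRIPLES):
    """Join several patterns: each solution is extended by matches of the next pattern."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for sol in solutions:
            bound = tuple(sol.get(p, p) for p in pattern)  # substitute already-bound variables
            for b in match(bound, triples):
                next_solutions.append({**sol, **b})
        solutions = next_solutions
    return solutions

# "Things like this pattern": cities located in the USA
print(query([("?x", "ex:type", "ex:City"), ("?x", "ex:locatedIn", "ex:USA")]))
```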

Posted by Gwen at 12:33 AM

June 30, 2009

Advanced Google Custom Search

Advanced Custom Search Configuration, Google Custom Search Blog (June 29)

Presentation at Google I/O by Nick Weininger on Advanced Custom Search Configuration. [46 minutes]

Key tools for building and presenting - including new features for rich snippets and microformats.

Shows About.com's uses of CSEs on topics. Also - the Google Blogger search gadget for creating a search on the blog's domain of interest.

In the last 20 minutes, Adobe showed a use case of Custom Search for community help.

Posted by Gwen at 11:20 AM

June 18, 2009

Search Heads discuss Semantic Search

Search leaders debate semantics, by Tom Krazit, Webware (June 17)

"Panelists from the four major search engines--Google, Yahoo, Bing, and Ask.com--joined Web search start-ups TrueKnowledge and Hakia at the W3C's Semantic Technology Conference to discuss the rise of semantic technology as the engine behind the still nascent Internet search industry. Semantic search, or the idea of divining a user's true intent from how they enter their queries and how Web data is structured, is an unfamiliar concept to the majority of Web surfers who tend to think Internet search is actually pretty good as it is."

Semantic technology for search is about:

+ structuring data - Andrew Tomkins, chief scientist at Yahoo Search: "Today on any major search engine, you'll see structured information about a restaurant," he said, meaning basic things like phone numbers, address, or maybe a link to a map of its location. All of those things require agreement on standards to make it happen.

+ analyzing the meaning of plain text

+ answering questions - "The goal of all this work is to make search more intuitive, more like asking a friend or colleague a question, said Riza Berkan, CEO of semantic start-up Hakia. 'We believe search is going to move to more conversational techniques,' he said."
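Tomkins's restaurant example depends on an agreed markup standard. The hCard microformat is one such standard, using class names like `fn`, `tel`, `street-address` and `locality`. The sketch below pulls those fields out of a page with Python's standard `html.parser`. It handles only a handful of hCard properties, and the restaurant markup is a hypothetical example.

```python
from html.parser import HTMLParser

# A few hCard property class names; a full parser would cover the whole vocabulary
FIELDS = {"fn", "org", "tel", "street-address", "locality"}
VOID = {"br", "img", "hr", "meta", "input", "link"}  # tags that never get a closing tag

class HCardParser(HTMLParser):
    """Collect text from elements whose class attribute names an hCard property."""

    def __init__(self):
        super().__init__()
        self.card = {}
        self.stack = []  # hCard property (or None) for each open element

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(next((c for c in classes if c in FIELDS), None))

    def handle_endtag(self, tag):
        if tag not in VOID and self.stack:
            self.stack.pop()

    def handle_data(self, data):
        # Attribute text to the innermost enclosing element that carries a property
        prop = next((p for p in reversed(self.stack) if p), None)
        if prop and data.strip():
            self.card[prop] = (self.card.get(prop, "") + " " + data.strip()).strip()

html = """
<div class="vcard">
  <span class="fn">Joe's Diner</span>
  <div class="adr">
    <span class="street-address">12 Main St</span>,
    <span class="locality">Boston</span>
  </div>
  <span class="tel">617-555-0100</span>
</div>
"""
parser = HCardParser()
parser.feed(html)
print(parser.card)
```

Because the fields are labelled in the markup, the engine can show a phone number or map link without guessing from plain text.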

Posted by Gwen at 12:06 PM

June 16, 2009

The Common Tag

Yahoo! Announces Common Tag: Like The Meta Keywords Tag, But Even Better, Vanessa Fox, SearchEngineLand (Jun 15)

Common Tag is an effort to create a semblance of structured data (or semantic tagging), but it is hard to know yet what will come of it. It somewhat replicates the simpler meta keywords tag and social bookmarking tags.

"Not only does Common Tag seem to replicate the purpose of the meta keywords tag, it seems to also replicate Delicious-style tagging and external anchor text."

Posted by Gwen at 12:14 AM

June 06, 2009

Google's Evaluators

Google and the Evolution of Search I: Human Evaluators, by John Paczkowski, Digital Daily (June 3)

Are people involved in adjusting the ranking of Google's search results?

"Google, for example, employs a vast team of human search “Quality Raters” (You’ll find a copy of an old training manual here). Spread out around the world, these evaluators, mostly college students, review search returns against established criteria–testing different algorithms and see which works “best” in predicting the quality of a site (though not directly judging the quality of any individual site itself).

They’re aided by Google’s own registered users, who can now, when logged into their Google accounts, promote and delete sites from their own search returns according to their preferences."

Would be helpful to have an estimate of the number of registered users who bother to adjust the rankings.

This is a three-part series of interviews with Scott Huffman, engineering director of the search evaluation team; senior Google software engineer Matt Cutts; and Google Fellow Amit Singhal.


Amit Singhal closed with "AS: I believe that the role of the human evaluator in search will be there until we can understand language by computers, which is a far distance from where we are today. You know, we have made great advances but by no means is our language understanding technology close to saying this person really meant to get this document or not."

Posted by Gwen at 01:00 AM

June 04, 2009

Infonortics 2009 - presentations

Presentations from the April 2009 Infonortics conference held in Boston are available for viewing. Most are pdfs. There are also interviews with the speakers by Stephen Arnold.

Topics

+ Semantic approach to search
+ Semantic web
+ Visualization of results
+ Classification
+ Voice search
+ E-discovery
+ Enterprise search


Posted by Gwen at 05:00 PM

May 20, 2009

Is Ask Smarter

Ask.com Searches Smarter, Ask.com Blog (May 19)

It's a case of blowing its own horn, but Ask answers some questions quite well by being able to search structured data. This blog entry points to an article by Jennifer Zaino - Ask.com Answers the Data Extraction Question at SemanticWeb.com.

"Ask.com is putting a focus on the structured data search problem, helping searches extract web data that is often not in text but in database tables and XML feeds where keyword searches don’t cut it. For example, a table might have data points around the words Toyota, Prius, and hybrid, and price, but if you ask most search engines to what is the price of a 2009 Toyota hybrid Prius that table won’t come up because those keywords aren’t together in the table."
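The Prius example shows why a structured approach helps. The query words are spread across different cells of one row, and the word "price" names a column rather than appearing in the data. The sketch below matches query terms against cell values row by row and takes the requested attribute from a column name in the question. This is not Ask's system. The rows are hypothetical and the prices are placeholders, not real figures.

```python
import re

# Hypothetical rows from a car table; prices are placeholders, not real figures
CARS = [
    {"year": 2009, "make": "Toyota", "model": "Prius", "type": "hybrid", "price": 100},
    {"year": 2009, "make": "Toyota", "model": "Camry", "type": "sedan", "price": 200},
    {"year": 2009, "make": "Honda", "model": "Insight", "type": "hybrid", "price": 300},
]

def answer(question, rows):
    """Match query terms to cell values per row; return the column the question asks for."""
    terms = set(re.findall(r"[a-z0-9]+", question.lower()))
    # The requested attribute is a column name mentioned in the question (e.g. "price")
    attribute = next((col for col in rows[0] if col in terms), None)
    best, best_hits = None, 0
    for row in rows:
        # Count query terms that appear as cell values anywhere in the row,
        # no matter which columns they sit in or how far apart they are
        hits = sum(1 for value in row.values() if str(value).lower() in terms)
        if hits > best_hits:
            best, best_hits = row, hits
    return best[attribute] if best and attribute else None

print(answer("What is the price of a 2009 Toyota hybrid Prius?", CARS))
```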

Interesting - but Ask has concentrated on consumer interests, and the consumer is pretty loyal to Google.

Posted by Gwen at 02:29 AM

May 17, 2009

Changes in the Search Scene

Changes to search tools are coming on fast and furious this spring.

New search engines aspire to supplement Google by John D Sutter, CNN (May 12) notes some themes or trends.

+ "Some sites, like Twine and hakia, will try to personalize searches, separating out results you would find interesting, based on your Web use.

+ Others, like Searchme, offer iTunes-like interfaces that let users shuffle through photos and images instead of the standard list of hyperlinks.

+ Kosmix bundles information by type -- from Twitter, from Facebook, from blogs, from the government -- to make it easier to consume."

+ Wolfram Alpha crunches data

+ community ranking (Wikia) is fading

+ real-time search is at Twitter and Twitter-related engines - and more will do this

+ social search has a future (even if community ranking doesn't) - Twine

+ Google's "show options" - new ways of viewing results

Posted by Gwen at 04:51 PM

May 13, 2009

Infonortics 2009

Infonortics Search Engine Meeting, Boston, April 27-28, 2009 - this is one of the preeminent conferences of the year on search and information technology. Presentations are available for many of the sessions. Topics include:

+ several on semantic web
+ visualization of search results
+ classifying images
+ mobile search
+ e-discovery
+ natural language based text mining
+ text analytics
+ information seeking process

Very rich - spend an hour or two.

Posted by Gwen at 02:41 AM

April 15, 2009

Semantic Search Engines to Try

9 Semantic Search Engines That Will Change the World of Search, by Arun Radhakrishnan, Search Engine Journal (April 13)

We all hope that semantic technology will change search so that results are closer to what we mean (and not necessarily what we said). This article describes nine contenders - briefly but well.

+ Hakia - uses "concept relations", a list of possible queries connected to answers, and ranking based on sentence analysis.

+ Kosmix - creates a "dashboard of content" - though I prefer to say dossier on a topic.

+ Exalead image search (not the web search) - narrow selection by facet (I'm not sure how "semantic" that is).

+ Sensebot - creates a summary of the top results

+ Cognition Search - maps the English language - has some trial content areas (legal, health, wiki, bible)

+ Lexxe - prefers short questions - and then it applies its natural language analysis. The clustering helps.

+ Swoogle - searches "semantic web" documents created in RDF. Useful mainly to the specialist.

+ Factbites - returns results with understandable sentences (one of my favourite engines).

+ Powerset - studies the meaning of sentences rather than word relationships. It built its pilot using Wikipedia. Now that it is owned by Microsoft, we expect that some of the technology will be used to improve Live Search.

Interesting point: "The appeal of semantic search engines is that the content of a page alone decides its utility. This means lesser spam and of course more relevant ads. It would be harder to game a semantic web engine."

Posted by Gwen at 08:42 PM

April 03, 2009

Insight into Federated Search

Federated Search Blog (by Sol Lederman) has a series of interviews with "federated search luminaries": Erik Selberg, Michael Bergman, and Todd Miller. A fourth, with Kate Noerr of MuseGlobal, is on its own page.

From the Michael Bergman interview

"Search engines work best in the discovery phase, when searching is a fast, give-and-take, contact sport. Real-time performance is important and interaction and testing are the user mode. I frankly feel deep Web search is not terribly useful or helpful in this phase. Identifying candidate searchable databases can be very important in this phase, but that can be accomplished from a search engine for databases such as CompletePlanet or the DQM rather than going to the site directly (reserving deep Web search for the purposeful harvest mode.)

Once the researcher has got a good bead on their capture requirements, harvesting and the deep Web come to the fore. But, this can be scheduled, and need not meet a real-time criterion. "
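For the real-time discovery phase, a federated engine broadcasts one query to several sources at once and merges the results it gets back. The sketch below shows that basic pattern. The two sources are hypothetical stand-ins, and rescaling each source by its top score is just one simple choice among many for making scores comparable.

```python
from concurrent.futures import ThreadPoolExecutor

def search_catalog(query):
    # Hypothetical source that scores results on its own 0-100 scale
    return [("Catalog record: " + query, 80), ("Related catalog record", 40)]

def search_journals(query):
    # Hypothetical source that scores results on a 0-1 scale
    return [("Journal article on " + query, 0.9), ("Older article", 0.2)]

def normalize(results):
    """Rescale one source's scores to 0..1 (divide by its top score) so sources can be merged."""
    if not results:
        return []
    top = max(score for _, score in results) or 1
    return [(title, score / top) for title, score in results]

def federated_search(query, sources, timeout=5.0):
    """Broadcast the query to every source in parallel, then merge by normalized score."""
    merged = []
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = [pool.submit(source, query) for source in sources]
        for future in futures:
            # result() raises TimeoutError for a slow source; a real system might skip it instead
            merged.extend(normalize(future.result(timeout=timeout)))
    return sorted(merged, key=lambda item: item[1], reverse=True)

for title, score in federated_search("deep web", [search_catalog, search_journals]):
    print(f"{score:.2f}  {title}")
```

The harvest mode Bergman describes would run the same kind of fan-out on a schedule. It would store the results rather than wait on them interactively.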

Posted by Gwen at 04:08 PM

Semantic Web

Video of My Semantic Web Talk by Nova Spivack (Feb 2008)

Where we were and where we are going, from Web 1.0 to Web 4.0: what we do and how we search. Nova Spivack of Radar Networks speaks to a group of students about the semantic web. He points to the weaknesses of Google and describes alternatives: the linguistic approach (expensive), the semantic web (using metadata to describe items), and artificial intelligence (ontological and reasoning engines). Some approaches "make the software smarter", while others lean more toward "making the data smarter".


Nova Spivack - Semantic Web Talk from Nicolas Cynober on Vimeo.

Posted by Gwen at 02:54 PM

Search Trends

Google Next Victim Of Creative Destruction? (GOOG) by John Borthwick, Business Week (Feb 8)

John Borthwick, who watched AOL fall from grace as an innovator, offered this observation: "I now see search as fragmenting and Twitter search doing to Google what broadband did to AOL."

(Mind you, as commenters on the article pointed out, Borthwick is CEO of betaworks, a Twitter shareholder.)

Search has moved into two main streams: video (YouTube and more) and real-time (Twitter watching).

Video:

* "YouTube generates domestically close to 3BN searches per month — it’s a bigger search destination than Yahoo. "

* "44% of YouTube views happen in the embedded YouTube player (ie off YouTube.com) and late last year they added search into the embedded experience. YouTube is clearly a very different search experience to Google.com. "

* "Video search now represents 26% of Google’s total search volume."

Notificator (the electronic message board)

This really means getting the buzz of the moment whether it's about friends or events and developments.

"Yet at http://search.twitter.com the conversations are right there in front of you. The same holds for any topical issues — lipstick on pig? — for real time questions, real time branding analysis, tracking a new product launch — on pretty much any subject if you want to know whats happening now, search.twitter.com will come up with a superior result set."

It's the social context that is important - people you know (or know of), people you trust.

The post refers to an article by Gerry Campbell on the role of social inference in search. Search is broken – really broken. (Feb 6)

"Our daily lives are rich with social inference, and they happen in real time. Search from Google, Yahoo… you name it – they are all based on published (e.g. considered, thought-through) documents that take minutes-to-weeks to update in the search index."

Campbell wants to see "Realtime search, using social inference for discovery, ranking and prioritization."

Posted by Gwen at 01:31 PM

March 27, 2009

Parsing the URL for Keywords

Do Search Engines Look at Keywords in URLs? by William Slawski, SEO by the Sea (Mar 26)

Judging from this Yahoo patent application, search engines do consider words in the URL.

"Keywords may also be extracted from the URLs of pages, by using an algorithm that can break the URL into components, understanding the structure of those URLs, and removing candidate keywords from the different parts found within the URL."
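The sketch below illustrates the patent's idea of breaking a URL into components and pulling candidate keywords out of each part. It is not Yahoo's algorithm. It splits the host, path and query string on non-alphanumeric characters and drops numbers and a hypothetical list of boilerplate tokens.

```python
import re
from urllib.parse import urlparse

# Hypothetical list of URL boilerplate that rarely carries meaning
URL_STOPWORDS = {"www", "com", "org", "net", "html", "htm", "php", "index", "id"}

def url_keywords(url):
    """Split a URL into host, path, and query components, then into candidate keywords."""
    parts = urlparse(url)
    words = []
    for piece in (parts.netloc, parts.path, parts.query):
        for token in re.split(r"[^A-Za-z0-9]+", piece):
            token = token.lower()
            if token and not token.isdigit() and token not in URL_STOPWORDS:
                words.append(token)
    return words

print(url_keywords("http://www.example.com/gardening/tomato-growing-tips.html?id=42"))
```

A real system would also need to split run-together words such as "tomatotips", which this sketch does not attempt.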

Posted by Gwen at 11:24 AM

March 25, 2009

Hakia's Aspect Categorization

Automated Categorization of Search Results, a New Era? Hakia (Mar 23)

Hakia calls the categorization that we see in the Galleries "aspect categorization":


Aspect categorization is different than what some search engines are already doing. For example, dividing the SERP into Web Results, Videos, News, Images, etc., is not aspect categorization. However, when the categories are related to the query, such as Obama’s Speeches and Quotes, Obama’s Fans, etc., (for the query Obama) then it is aspect categorization.

Posted by Gwen at 12:50 PM

March 20, 2009

Google Search Customization

How Searchers’ Queries Might Influence Customized Google Search Results by William Slawski, SEO by the Sea (Mar 19)

Slawski presents a possible explanation for how Google personalizes results by considering earlier queries by you and similar ones by others.

You might see this message from Google:

Recent Searches: "You or someone else recently searched for infinity auto using this browser."

Possible ways that searches might be found to be related:

+ "if they are typed in by a searcher consecutively"
+ "if they are performed by a searcher with a certain period of time, such as within 30 minutes of one another"
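Both criteria can be checked against a time-ordered query log for one browser. The sketch below pairs queries that are either consecutive or within 30 minutes of each other, which is the window given in the quote. The log entries are hypothetical, and this is only an illustration of the two rules, not Google's implementation.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=30)

def related_pairs(log):
    """log: (timestamp, query) tuples for one browser, sorted by time.
    Pair queries that are consecutive or fall within WINDOW of each other."""
    pairs = set()
    for i, (t1, q1) in enumerate(log):
        for j in range(i + 1, len(log)):
            t2, q2 = log[j]
            if j == i + 1 or t2 - t1 <= WINDOW:
                if q1 != q2:
                    pairs.add((q1, q2))
            else:
                break  # log is sorted, so every later query is outside the window too
    return pairs

log = [
    (datetime(2009, 3, 19, 10, 0), "infinity auto"),
    (datetime(2009, 3, 19, 10, 5), "infiniti dealers"),
    (datetime(2009, 3, 19, 10, 20), "infiniti g37 price"),
    (datetime(2009, 3, 19, 15, 0), "weather boston"),
]
for pair in sorted(related_pairs(log)):
    print(pair)
```

Note that the "consecutive" rule alone links "infiniti g37 price" to the much later "weather boston". This shows why a time window is a useful second check.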

Google has more on this -- Features: Search customization details

Posted by Gwen at 10:20 PM

March 09, 2009

Wolfram Alpha

British search engine 'could rival Google' by Bobbie Johnson, Guardian (Mar 9)

Watch for a new natural language search engine called Wolfram Alpha to be released in May 2009. Stephen Wolfram, a British scientist, aims to succeed with natural language not through ontologies (as in the semantic web) but through computation.

Wolfram describes the approach as being to "explicitly implement methods and models, as algorithms, and explicitly curate all data so that it is immediately computable" in his blog entry Wolfram|Alpha Is Coming.

We'll see.

Posted by Gwen at 12:06 PM

March 08, 2009

Semantic GoPubMed

Go3R - search technology from Germany that sorts results into facets. The information page must be striking a humorous pose with its opening paragraph:

"The project aims at developing a knowledge-based search engine for alternative methods to animal experiments in order to provide optimal search options for alternatives to animal experimentation. The first step consists of developing an ontology for the knowledge domain of alternative methods to animal experiments. Such an ontology represents a system of knowledge which permits logical deductions as a result of the numerous relationships between terms describing alternative methods it contains - in rough analogy to the possible connections between synapses in the brain."
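The simplest "logical deduction" an ontology allows comes from following "is a" relations transitively. The sketch below uses that to expand a search for "alternative method" to every term known to be a kind of alternative method. The ontology fragment is hypothetical and is not taken from Go3R.

```python
# Hypothetical ontology fragment: term -> parent terms ("is a" relations)
IS_A = {
    "reconstructed human epidermis test": ["in vitro method"],
    "in vitro method": ["alternative method"],
    "QSAR model": ["in silico method"],
    "in silico method": ["alternative method"],
    "Draize rabbit eye test": ["animal test"],
}

def ancestors(term, is_a=IS_A):
    """All terms reachable through 'is a' links (transitive closure)."""
    seen, stack = set(), list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen

def expand_query(concept, is_a=IS_A):
    """Every term the ontology lets us deduce is a kind of `concept`."""
    return {term for term in is_a if concept in ancestors(term, is_a)}

print(sorted(expand_query("alternative method")))
```

A search for "alternative method" can then match documents that mention only "QSAR model". Nothing in those documents states the link, but the ontology supplies it.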

Putting aside the 'animal experiments', you can experiment with pubmed searches at GoPubMed.

Posted by Gwen at 11:38 PM

March 07, 2009

Marissa Mayer with Charlie Rose

Marissa Mayer On Charlie Rose: The Future Of Google, Future Of Search by Michael Arrington, Tech Crunch (Mar 6)

Marissa Mayer, Google's VP of Search Product and User Experience, spoke to Charlie Rose about search and technology. This posting has the transcript and video (duration: 54 minutes).

Opening question: Is it fair to say that search is in its infancy? - YES

Posted by Gwen at 04:14 PM

Personalized Search at Microsoft

Understanding Intentions and Microsoft Search Personalization by Bill Slawski, SEO by the Sea (Mar 6)

Kumo - means spider or cloud in Japanese - and is the code name for a new version of Live Search. Let's hope that Microsoft is going for cloud - the spider theme is tired.

Bill Slawski has studied a patent filing that points to Microsoft's intent in understanding the searcher's intent.

"The basic premise is that when two different people are searching for the same query term, chances are that the answers that they are trying to find or the sites that they might want to see are different, and that a search engine might be able to help each of those searchers find what they are looking for based upon past experience, and past searches and search result selections."

The search engine will need to get to know you to do this: cookies, a history of search queries (words used), and result clicks (what you looked at).

The nugget: "If you tend to search using the same or a substantially similar query term or phrase, and tend to select the same page or pages in response to that search, don’t be surprised if at some point it might be highlighted or bolded or placed at the top of the search results in the future."

[Google does this today in its personalized search.]
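The nugget describes a simple mechanism: remember which results a user keeps choosing for a given query, and move those results up next time. The sketch below is a minimal illustration of that idea and not Microsoft's or Google's method. It uses word-set normalization as a crude stand-in for "substantially similar" queries, and the `min_clicks` threshold is a hypothetical parameter.

```python
from collections import Counter

class ClickHistory:
    """Per-user memory of which results were clicked for which (normalized) query."""

    def __init__(self):
        self.clicks = Counter()

    @staticmethod
    def normalize(query):
        # Crude "substantially similar": same words, ignoring case and order
        return " ".join(sorted(query.lower().split()))

    def record(self, query, url):
        self.clicks[(self.normalize(query), url)] += 1

    def rerank(self, query, results, min_clicks=2):
        """Move URLs this user has repeatedly chosen for this query to the top."""
        key = self.normalize(query)
        favored = [u for u in results if self.clicks[(key, u)] >= min_clicks]
        favored.sort(key=lambda u: self.clicks[(key, u)], reverse=True)
        return favored + [u for u in results if u not in favored]

history = ClickHistory()
history.record("python tutorial", "docs.python.org")
history.record("Python Tutorial", "docs.python.org")
print(history.rerank("tutorial python", ["w3schools.com", "docs.python.org", "realpython.com"]))
```

The "It's cold in here" caveat that follows applies directly. This rule assumes that past clicks reflect future intent.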

Bottom Line: "The question though is, whether past searches are a good indication of intent for searches in the future? Sometimes the statement “It’s cold in here,” isn’t an invitation to a hug, but rather a request to turn up the thermostat. "

Also - First screenshot of Microsoft's Kumo by Ina Fried, CNet News (Mar 2)

Has the text of an internal Microsoft memo - "Announcement: Internal Search Test Experience" - and a screenshot of Kumo that shows: topical groupings (one word), related searches, your history; and it's in the universal style with a mix of web, images, and videos. Looks like there is some organization of results perhaps along the idea of what Kosmix does - to create a package.

Posted by Gwen at 03:20 PM

March 06, 2009

Trust and Authority

Dear Monica, We Changed our Algo - Google's Matt Cutts by Andrew Goodman, Traffick (Mar 5)

Has some information on Google's recent tweaks to ranking algorithms to give "trusted" sites a small boost.

"Matt [Cutts] says that Google doesn't think brand when it thinks about quality and authority ("if we did, you'd see Mitsubishi Eclipse ranking #1 for [eclipse]"), but this is disingenous. Indirectly, when you take that VW example, they are thinking brand when they take a shortcut that calls the VW.com domain "known information" and put a higher threshold of "track record required" on pages of sites that aren't as known and trusted. "

Video with Matt Cutts and comments are in this posting at Search Engine Roundtable:
Google Confirms Algorithm "Change" But Down Plays Brand Push

Posted by Gwen at 04:07 PM

March 02, 2009

Varieties of Semantic Search

Top 5 Semantic Search Engines, Pandia (Feb 16)

Semantic search is defined in this article as being able to "make sense of search results based on context" - to identify the concepts.

Makes the excellent point that "Semantic search has the power to enhance traditional web search, but it will not replace it. A large portion of queries are navigational and semantic search is not a replacement for these. Research queries, on the other hand, will benefit from semantic search."

Describes five candidates - but it is a mixed bag.

+ hakia - general web search

+ Sensebot - summarizes search results - demo on the web but better as a plugin

+ Powerset - prototype - searches Wikipedia. Best for a defined subject area rather than overall web. Incidentally, this is owned by Microsoft.

+ DeepDyve - digs more deeply into scientific databases. Author does not say why this is considered a semantic tool. It might be because it has a more-like-this option (and one presumes it does more than just match on text); and can cluster results based on concept analysis - but only DeepDyve Pro ($) users will see this.

+ Cognition - has done a "semantic mapping of the English language". Demos are available, including one using Wikipedia.

Posted by Gwen at 05:55 PM

February 26, 2009

Microsoft Experimental Search

Microsoft plans Google-killing search site - Experimental search offering out this summer, by Nancy Gohring, Computerworld UK (Feb 25)

Microsoft is still trying to make its mark in search. It has the Live Search database - now for an interface. This article previews a new site called Viveri where developers will test out new ideas.

"The site will serve Live Search results and is being built using Silverlight, Microsoft's technology for designing online user interfaces. "

The first aim is to dig into databases (everyone wants to do deep web now).

"One technology aims to better deliver search results from vertical search engines. When a user types a search item into the field, a typical list of results pops up. But on the right hand side of the screen several boxes appear. Each box contains results from within a specific domain that is relevant to the search term. The domain could be, for instance, Amazon.com, Craigslist, Consumer Reports or WebMD, depending on relevancy. "

So - this is and will be a direction for web search developments. I wonder if we are prepared to handle all the information that will be extracted from this "deep web", and what that will do to relevance ranking algorithms that have been quite finely honed?

Posted by Gwen at 07:13 PM

February 06, 2009

Case for Semantic Search

Riza Berkan: The Search for Quality on the Web, AltSearchEngines (Feb 5)

Hakia CEO, Riza Berkan, argues that semantic search technology (based on analysis of meaning) is far superior to the established statistical relevance-ranking method. The example given of the old approach is Google, of course, and the point is made that the old style is the basis for search engine marketing and its associated revenues. Berkan does not acknowledge Google's work with semantic technologies or personalization, both of which are throwing SEM off.

Berkan posits that semantic technology will address information quality issues and describes in clear language how it works.

"The underlying idea behind semantic technology is to teach computers how the world operates. For example, when a computer encounters the word “bill,” it would know that “bill” has 15 different meanings in English. When the computer encounters the phrase “killed the bill,” it would deduce that “bill” can only be a proposed law submitted to a legislature, and that “kill” could mean only “stop.”"

The promised benefit is that results will reflect meaning and be independent of popularity (i.e., the number of links to a site).

"The answer is simple: precision. Once computers can handle natural languages with semantic precision, high-quality information will not need to become popular before it reaches the end user, unlike what is required by Web search today."

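The "bill" example Berkan gives can be illustrated with a simplified Lesk-style overlap count: choose the sense whose typical context words best match the sentence. This is a minimal sketch under stated assumptions, not Hakia's method; the four-sense inventory, its cue words (SENSES) and the disambiguate helper are all hypothetical.

```python
import re

# Hypothetical sense inventory for "bill": each sense is paired with a few
# cue words that tend to appear near it. Real systems use far richer resources.
SENSES = {
    "proposed law": {"legislature", "congress", "senate", "vote", "killed", "passed", "law"},
    "invoice": {"pay", "paid", "restaurant", "electricity", "amount", "due"},
    "bird's beak": {"bird", "duck", "feathers", "beak", "peck"},
    "banknote": {"dollar", "twenty", "wallet", "cash", "note"},
}


def disambiguate(sentence: str) -> str:
    """Return the sense of "bill" whose cue words overlap most with the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    # Score each sense by how many of its cue words occur in the context;
    # ties go to the sense listed first.
    scores = {sense: len(cues & words) for sense, cues in SENSES.items()}
    return max(scores, key=scores.get)


print(disambiguate("The senate killed the bill after a long debate."))  # → proposed law
print(disambiguate("We paid the electricity bill late."))  # → invoice
```

When no cue word appears, every score is zero and the first sense wins by default, which hints at why real semantic engines need far more linguistic knowledge than a handful of keywords.
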
Posted by Gwen at 11:00 AM

February 04, 2009

Google Fills in Forms

How Google crawls the deep web by Greg Linden (Jan 31)

Refers to a paper in which Google describes how it fills in web forms to query databases.

"This paper describes a system for surfacing Deep-Web content; i.e., pre-computing submissions for each HTML form and adding the resulting HTML pages into a search engine index."

Interestingly, there was also this today --

Google: "We're Not Doing a Good Job with Structured Data" by Sarah Perez, Read Write Web (Feb 2)

"Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means."

Yahoo and Google are both working on automating the extraction of information from databases on the Web.
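As a rough illustration of "pre-computing submissions for each HTML form", the sketch below enumerates combinations of candidate values for a form that submits via GET and turns each combination into a crawlable URL. It assumes the candidate values are already known; choosing good values and deciding which input combinations are worth submitting is the hard part the paper addresses, and this sketch does not attempt it. The surface_urls helper and the cars.example.com form are hypothetical.

```python
from itertools import islice, product
from urllib.parse import urlencode


def surface_urls(action: str, inputs: dict[str, list[str]], limit: int = 100) -> list[str]:
    """Pre-compute GET submissions for one HTML form.

    inputs maps each form field name to candidate values (for example, the
    options of a <select> menu, or seed keywords for a text box). Each
    combination becomes one URL a crawler could fetch and index like an
    ordinary page. limit caps the count, since the product grows quickly.
    """
    names = list(inputs)
    combos = product(*(inputs[name] for name in names))
    return [f"{action}?{urlencode(dict(zip(names, values)))}"
            for values in islice(combos, limit)]


# Hypothetical used-car search form with two inputs.
urls = surface_urls(
    "https://cars.example.com/search",
    {"make": ["ford", "honda"], "zip": ["94043", "10001"]},
)
for u in urls:
    print(u)
```

The first URL printed is https://cars.example.com/search?make=ford&zip=94043; the resulting result pages would then be indexed alongside ordinary static pages.
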

Posted by Gwen at 12:45 AM

January 24, 2009

Ask.com "Semantic" Technologies

AnswerFarm Technology from Ask.com, Ask.com Blog (Jan 14)

Ask.com has been blogging about its "semantic" technologies. This post gives us some idea of how the Q&A works.

"The technology behind the Ask Q&A channel is called AnswerFarm technology. We built it by crawling and extracting question/answer pairs from across the web – more than 100 million question/answer pairs from several hundred thousand sources – and it is, no doubt, the most comprehensive and diverse repository of question/answer pairs in the world."

From Semantic Search Technology Advances from Ask.com we learn that Ask can get into databases with its DADS technology - Direct Answers from Databases.

"With DADS, we no longer rely on text-matching simple keywords, but rather we parse users’ queries and then we form database queries which return answers from the structured data in real time. Front and center. Our aspiration is to instantly deliver the correct answer no matter how you phrased your query."

But they are "trialing" all this good stuff on sports, and specifically NASCAR.
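The parse-then-query idea behind DADS can be sketched as matching a query against a pattern and filling a parameterized SQL query. Everything here is an assumption for illustration: the winners table, the single regular-expression pattern and the answer helper are hypothetical, and the post does not describe how Ask's parser actually works.

```python
import re
import sqlite3

# Hypothetical structured data: a tiny in-memory table of race winners.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE winners (race TEXT, year INTEGER, driver TEXT)")
db.executemany("INSERT INTO winners VALUES (?, ?, ?)", [
    ("daytona 500", 2008, "Ryan Newman"),
    ("daytona 500", 2007, "Kevin Harvick"),
])

# One hypothetical query pattern; a real system would need many of them.
PATTERN = re.compile(r"who won (?:the )?(?P<race>.+?) (?:in )?(?P<year>\d{4})\??$")


def answer(query: str):
    """Parse the query into a database lookup; None means no direct answer."""
    m = PATTERN.match(query.lower().strip())
    if not m:
        return None  # fall back to ordinary keyword search
    row = db.execute(
        "SELECT driver FROM winners WHERE race = ? AND year = ?",
        (m["race"], int(m["year"])),
    ).fetchone()
    return row[0] if row else None


print(answer("Who won the Daytona 500 in 2008?"))  # → Ryan Newman
```

Using placeholders (?) rather than pasting the user's words into the SQL string keeps the generated database query safe from injection.
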

Posted by Gwen at 01:41 PM

Failure Bomb Still Blowing Up

Obama Is “Failure” At Google & “Miserable Failure” At Yahoo by Danny Sullivan, Search Engine Land (Jan 22)

That "miserable failure" bomb never seems to go away. This detailed explanation by Danny Sullivan shows how complicated redirected URLs can make things. Sullivan found that Obama was coming up in Google for the word 'failure' (that seems to have stopped since) and at Yahoo for 'miserable failure', which is still true as of Jan 24. Frankly, the White House IT staff should just hire Sullivan to fix it.

Posted by Gwen at 01:20 PM

Will Google Discern Meaning?

Google To Push Semantic Search In 2009? by Matt McGee, Search Engine Land (Jan 23)

Matt McGee spotted a few words spoken by Google CEO Eric Schmidt that suggest Google will do better at understanding "the meaning of your phrase rather than just the words that are in that phrase".

Posted by Gwen at 01:08 PM

January 21, 2009

InQuira for natural foreign language search

Natural (foreign) language search, Enterprise Search Center (Dec 17, 2008)

"InQuira has introduced Version 8.1 with Multilingual Dictionary (MLD) of its namesake Web self-help software, which is said to improve searching for content for multinational audiences and to facilitate the ability of users to review, translate and post content in their native language."

Posted by Gwen at 06:48 PM

January 15, 2009

All new eLibrary

New eLibrary Interface Shows the Path for ProQuest by Barbara Quint, NewsBreaks (Jan 15)

Detailed description of eLibrary - market, content, and new features.

"The eLibrary service (www.proquestk12.com/productinfo/elibrary.shtml) leads ProQuest’s outreach to the K–12 and community college market. " ... "A new interface platform launched by eLibrary throughout this year will introduce special features such as Smart Content and Content Creators that support the multimedia, multisource eLibrary content and user experience."

Of interest - there will be editors working to provide "best of" answers.

"The Smart Content feature, which Beach says was called the "aggretorial" internally, has teams of editors preparing overviews of the most-queried and most-studied topics, a "smart page" that provides top document and multimedia picks, plus suggestions for other research lines. Smart Content combines eLibrary’s 65 million documents with editorially prepared content. This means what users need first, they see first. The editors bring best-of material "above the fold" to provide comprehensive, foundational understanding of the topic as well as pathways for further exploration. For most-queried and most-studied topics searched, results sets will display a "smart page" that provides biographical, historical, and other contextual information, in addition to eLibrary editors’ top document and multimedia picks."

And with Content Creators, users will be able "to create customized web applications within the eLibrary system. "

Best of both worlds - editors organizing material, and users customizing personal views or collaborating.

Posted by Gwen at 02:47 PM

January 13, 2009

Semantics

Did Someone Just Expose Semantic Data?, Dr. Riza C Berkan, CEO, Hakia Blog (Jan 12)

CEO Dr. Riza C Berkan of Hakia takes issue with Marshall Kirkpatrick's posting, Did Google Just Expose Semantic Data in Search Results?

Google seems to be able to answer questions like "capital city of oregon" or "marlene dietrich's husband", possibly based on some analysis of language. Is Google figuring out meaning?

It could be (and likely is) a simple extraction.

These examples don't show deep understanding of a subject or any attempt to present possible meanings to the searchers.

Posted by Gwen at 01:25 AM

January 09, 2009

Relevance of Search Results

Google Tech Talk: Reconsidering Relevance by Daniel Tunkelang, The Noisy Channel (Jan 8, 2009)

Daniel Tunkelang, Chief Scientist at Endeca, has posted slides from a presentation on Reconsidering Relevance. "We’ve become complacent about relevance", he says. Perhaps search has become a kind of "fast food": we are too easily satisfied by the results we get from web search engines and don't appreciate that there is deeper and better content. Exploring information through tags and facets is one important way searchers can search (and learn) more effectively.

Posted by Gwen at 04:40 AM

December 16, 2008

Search in 2100

The future of search by Digvijay Lamba, Kosmix blog (Dec 11)

Opens with what search was like in 1900, compares it to today, and looks forward to 2100.

Prediction is that "... we are clearly moving in a direction where a machine will automatically create the perfect article that precisely and completely covers the searched topic."

Of course, this is exactly what Kosmix tries to do.

Posted by Gwen at 06:05 PM

December 05, 2008

Abstracts of search results coming to Yahoo Search

Yahoo technology will offer abstracts of search results by John Ribeiro (IDG News Service) via PCWorld (Dec 5)

This will be a dramatic improvement for Yahoo Search - "Yahoo India is developing information extraction technology that will offer abstracts of URLs when users do a search."

"The Bangalore lab is working in the area of automated information extraction, which involves going into the URLs, going through billions of pages, and extracting the relevant information, he said."

Posted by Gwen at 01:33 PM

November 22, 2008

Future of Semantic Web

How the Semantic Web Will Change Information Management: Three Predictions by Silver Oliver, FUMSI (Oct 2008)

Semantic Web means adding structure - making the web of unstructured data more accessible through explicit connections along the lines of Dublin Core metadata. Silver Oliver makes three predictions - the third points to a changing role for the information professional - probably in "modeling the domain of information we are dealing with".

Posted by Gwen at 03:03 PM

October 20, 2008

Google Does Understand

Google and the Real Search for Meaning on the Web By Saul Hansell, New York Times (July 17)

Some insight into how Google works at finding meaning is revealed (somewhat) by posts from Amit Singhal of Google's search quality group. Essentially, Google can derive concepts from context.

Key post in the Google Blog - Technologies behind Google ranking

Posted by Gwen at 12:53 AM

October 10, 2008

Yahoo BOSS for Academics

Academics sink teeth into Yahoo search service by Stephen Shankland, Webware (Oct 10)

"... Yahoo, is trying to give a little more power back to the professors and grad students through a program called BOSS (Build Your Own Search Service). The service lets academics and start-ups build their own search sites around Yahoo's search engine for free, manipulating results however they want. "

Posted by Gwen at 02:41 PM

October 05, 2008

Search and data entities

How a Search Engine Might Add Related Information about People, Places, and Things into Search Results, Seo by the Sea (Sept 23)

Yahoo has filed a patent that shows interest (and maybe intention) in identifying "data entities" in search results and expanding on them.

"This kind of expansion of search results, to include names of people, places, events, and things found in a search for an original search query is described in a patent filing from Yahoo. While it doesn’t presently appear in use, it’s a possible approach from the search engine."

Posted by Gwen at 01:35 PM

Microsoft Norway

Microsoft shifts R&D for search engine technology to Norway, domain-b (Sept 30)

Microsoft's announcement that search engine technology will be centered in Norway could be significant.

"Software giant Microsoft Corporation has decided to move its main centre for search engine technology to Norway. This was announced by Microsoft's Steve Ballmer in Oslo after a meeting with Prime Minister Jens Stoltenberg."

Norway is the home of Fast Search & Transfer, a specialty information technology company, which Microsoft recently acquired.

Posted by Gwen at 01:27 PM

September 20, 2008

Semantic Engines - Cognition and Eeggi

Two new semantic engines: Cognition and Eeggi by Rafe Needleman, Webware (Sept 18)

Two new semantic search engines: Cognition (not entirely new; it has three demo applications and promotes CognitionSearch for the enterprise) and Eeggi (at a very early stage, with a small demo).

"Rather, they are databases and algorithms that hold the structure of language (in both cases, the English language). At the most basic level semantic engines tell you what's synonymous with what. At the advanced end of the spectrum they know how grammatically similar phrases like "take a seat," "take a stand," and "take a lollipop," mean completely different things. "

Posted by Gwen at 02:08 AM

September 16, 2008

SemantiFind

New SemantiFind Enhances Search Engine Experience, Newsbreaks (Sept 15)

"Semanti Corp., a web services provider offering "find" technology to enhance the results of search engines, announced its flagship product, SemantiFind (www.semantifind.com), at the recent DEMOfall08 conference in San Diego. SemantiFind is a web service that enhances the search engine experience by letting users indicate the precise meaning of their search queries."

Requires registration.

Posted by Gwen at 04:29 AM

July 24, 2008

Results are all Search 1.0

Search 3.0 - Web Search in Evolution by Yihong Ding, Alt Search Engines (Jul )

Distinguishes between types of links in search results according to whether they are 1.0 type resources (hard coded links to pages), 2.0 (tagged, threads, dynamic), 3.0 (???).

"Applying this criterion to evaluate the current Web search engines, we may surprisingly find that almost all of them belong to just Search 1.0, no matter how they have labeled themselves. Some search engines may have great records on its performance (such as Google), some search engines claim to be advanced in integrating new technologies (such as Hakia), or some search engines declare to bring revolutionary new experiences to the world (such as Powerset). "

Posted by Gwen at 02:02 PM

Semantic Search: Hakia v Google

If Google had Semantic Technology…by Dr. Riza C Berkan, Hakia blog (July 18th, 2008 )

Semantic technology for search such as used at Hakia is completely different from the link-based technology used by Google (and others).

In this posting Dr. Berkan of Hakia compares search results of Hakia to Google. "If Google had semantic technology", then why doesn't it do better on the query examples?

Main point: "We have no idea how Google’s algorithm works, and it does a great job in so many ways. But, one thing is clear. The results show no sign of systematic performance to understand the meaning of concepts. They don’t show ranking based on quality. They don’t show aspect categorization beyond statistical clustering. They don’t show question type detection."

Posted by Gwen at 01:33 PM

July 14, 2008

Semantic Search vs Links

SEO For Semantic Search Engines by Pierre Far, Search Engine Land (Jul 14)


Compares three semantic search engines to Google: Powerset, Hakia, and Cognition. As the writer notes, these engines are still in beta and very young. Still, they shine in their own way and can "beat" Google on the search examples.

Conclusion: "One way or another, semantic search engines will be part of the future of search engines in terms of natural language queries and indexing. This is new to our industry and we have to sit up and pay attention. Failure to do so may mean that you will miss the next big thing."

Posted by Gwen at 10:13 PM

July 09, 2008

Yahoo Algorithms

Eric Enge Interviews Yahoo's Priyank Garg by Eric Enge, Stone Temple Consulting (July 7)

Some aspects of how Yahoo ranks results are demystified in this interview.

+ anchor text matters -- "What we look for are links that would be naturally useful to users in context, and that add to their experience browsing on the Web."

+ importance of links in ranking is declining -- "New sources of data and new features that Yahoo! has built and developed have made our ranking algorithm better. Consequently, as a percentage contribution to our ranking algorithm, links have been going down over time."

+ Yahoo evaluates the quality of the site and looks for signs of "spamminess".

+ social media sites are being figured in - such as del.icio.us

+ Yahoo uses human editors to watch for new spam techniques. Claims -- "We show the least spam among the search engines, because both of our techniques are in action. Our spam detection techniques run on every page, every time we crawl it. Those detection algorithms are fed directly into our ranking function, where the spam detection is actually pretty high in importance." -- I'm not sure Yahoo has the least spam.

Posted by Gwen at 12:27 PM

June 14, 2008

How Google Does Blended Search

How Google Universal Search and Blended Results May Work Seo By The Sea (

A patent sheds some light on how Google can return "a mix of results from different types of searches, including Web pages, news stories, images, videos, book listings, and others"

Posted by Gwen at 03:46 PM

June 13, 2008

Dynamic Database Content

Indexing Dynamic Databased Content by Stephen Arnold, Beyond Search (April 20, 2008)

Considers the question of "deep web" content in response to Google's announcement that it will begin to index forms to extract and index information from dynamic databases.

Has some startling figures on the growth in use of dynamic databases for web site design -- "Today, more than half of the Web sites created each month are dynamic, and the ratio of static to dynamic sites is changing. I can’t reproduce the data I obtained from one of my clients, but I can highlight two facts. First, the number of dynamic sites is growing more rapidly than this time last year. At some point in 2009, static sites will be in minority, essentially becoming brochureware that no one will pay much attention to. Second, the people operating dynamic sites want to protect their data from aggregators. Once structured data have been sucked out of a dynamic site, the value of the information decreases sharply."

Mentions BrightPlanet and Deep Web Technologies as two companies that can dig past subscriber logins and through query forms.

Posted by Gwen at 12:33 AM

June 07, 2008

Powerset for broad topics

May 2008 InfoTip: Powerset.com Mary Ellen Bates (May 2008)

Sees potential in Powerset, the new semantic search tool that helps make sense of search results by identifying the facts and creating summaries.

"PowerSet is best used for those searches that cover a number of topics or areas. It's not perfect, and it only searches Wikipedia, but I find it an exciting new approach in the efforts of search engines to make sense out of web content."

Matt Larkin has some comments on Powerset too: Smarter isn't better...yet, Traffick.com (June 6)

"While it’s silly not to consider a search engine that “understands” us an exciting prospect, the effectiveness of existing methods makes me wonder if we “need” semantic search yet. Powerset claims it works best for research, for those not searching for specific items but instead seeking general information on a topic. Well, which types of users are most likely to use search for research and inductive gathering of information? Anyone in the educational field. The last time I checked, they have large internal databases through which they can gather boatloads of literature on their topics of study, be it government documents, journal articles or online writings. In other words, they’re doing just fine. Is there a demand yet for a smarter search?"

Posted by Gwen at 03:17 PM

June 05, 2008

What search engines won't index

Search Illustrated: Spider Traps Search Engine Land (Jun 3)

A new infographic shows why a search engine may be unable to index a site. Session-based coding and SEO-unfriendly CMS systems are two reasons such pages stay invisible.

Posted by Gwen at 12:53 AM

June 04, 2008

X Y Axis for Semantic Search

Semantic Search: The Myth and Reality by Alex Iskold, Read Write Web (May 29)

On a grid of structured data versus query complexity, where do Google, SearchMonkey, Freebase, and the two semantic engines, Hakia and Powerset, sit? Alex Iskold has done the analysis: fascinating reading that helps to place these tools.

Makes the point that we are misled by the user interface - the simple search box used nearly universally hides too much.

"Semantic search is an upcoming technology that has set the expectations way too high. We have all been misled into thinking that these technologies are here to dethrone Google by delivering better search results. Neither of those things are true. What is true, however is that semantic search is going to be big and it is going to help us answer questions that we simply cannot answer today - complex, inferencing queries asked over the entire web as if it was a database."

Posted by Gwen at 12:29 AM

May 30, 2008

EVRI - Another Way

Evri building a data graph of the Web by Dan Farber, Webware (May 28)

Evri may guide us through a new search experience for navigating through information resources graphically.

CEO Neil Roseman said, "We read sentences, extracting the subject, objects and verbs, and map to other content on the Web."

"Evri creates profile pages, which are like search results, that include a variety of lenses for an entity, such as top connections (entities most closely associated with the target entity), people, location, products, organizations, and events."

Evri may become available in beta in early summer.

Posted by Gwen at 03:19 PM

Meta Tag Use in Search

Meta-Tag Optimization Tips: A Search Usability Perspective by Shari Thurow, Search Engine Land (May 29)

Meta-tag content can matter for ranking results of non-web pages - and specifically video. And the meta descriptions are sometimes shown in the search results.

"Some commercial web search engines use meta-tag content to determine page relevancy. Some do not. Most of the time on a text-based document, meta-tag descriptions and keywords are not used to determine whether or not a page ranks." ... "Meta-tag keywords and descriptions become more important when the search engines are not able to determine (or have a difficult time determining) the "aboutness" of a file, such as a video file."

Article distinguishes between navigational searches (get to the site or a page - the url is important) and informational (get the answer immediately in the snippet).

Posted by Gwen at 03:06 PM

May 29, 2008

Inside SearchMonkey

Making the Web Searchable: The Story of SearchMonkey by Alex Iskold, Read Write Web (May 29)

Notes from talk by Peter Mika on Yahoo's SearchMonkey search platform initiative.

"The motivating question for Mika's presentation was: How can we make web search better by leveraging web annotation? There are many kinds of annotations, but Mika focused on simple data and lightweight semantics, and began by reviewing the history and evolution of annotations to explain how we got to where we are today."

Posted by Gwen at 02:53 PM

May 22, 2008

Google's Search Quality

Google Offers Peek At How It Controls Search Quality by Eric Zeman, Information Week (May 21)

Udi Manber, VP of engineering at Google, Search Quality, has begun a series of postings on the Google Search Blog about the Search Quality team and its work (full posting).

This InformationWeek article has a few excerpts of the high points. It's not simple and Google works hard at it. We know that from articles that appear from time to time, and from the results.

"The most famous part of our ranking algorithm is PageRank, an algorithm developed by Larry Page and Sergey Brin, who founded Google. PageRank is still in use today, but it is now a part of a much larger system. Other parts include language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes, and so on), query models (it's not just the language, it's how people use it today), time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time), and personalized models (not all people want the same thing)."

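
For readers curious about the "most famous part", here is a minimal sketch of textbook PageRank computed by power iteration, in the spirit of Brin and Page's original description. It is not Google's current ranking system, which, as the quote says, combines PageRank with many other models. The damping factor of 0.85 is the value the original paper reports using; the three-page link graph is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank by power iteration.

    links maps each page to the pages it links to. A page with no outgoing
    links (a "dangling" page) spreads its rank evenly over all pages.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page first receives the "random jump" share.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: distribute its rank over every page.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


# Made-up three-page link graph: a links to b and c, b to c, c back to a.
r = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
print(sorted(r, key=r.get, reverse=True))  # → ['c', 'a', 'b']
```

Page c ranks highest because it collects links from both other pages; the scores always sum to 1, so they can be read as the share of time a random surfer spends on each page.
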
Posted by Gwen at 12:55 PM

May 21, 2008

Search Engines - Related Words

How Search Engines May Substitute Other Search Terms for Yours, SEO by the Sea (May )

Search engines sometimes expand on your search terms by using some "related words". We see that at Google, Yahoo, Ask and Live. Bill Slawski explains some of the process and refers to a new patent filing by Yahoo on generating and using substitute words.

+ Use query logs to see reformulations of queries with those words.
+ Use a dictionary.
+ Use statistics on "other phrases that tend to show up in documents with the original query".
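
The first method in the list, mining query logs for reformulations, can be sketched as counting one-word swaps between consecutive queries in a searcher's session. This is an illustrative assumption, not the procedure in Yahoo's patent filing; the session log and the mine_substitutions helper are hypothetical.

```python
from collections import Counter


def mine_substitutions(sessions: list[list[str]]) -> Counter:
    """Count single-word substitutions between consecutive queries.

    When a searcher rewrites a query by swapping exactly one word
    (e.g. "cheap hotels" -> "discount hotels"), the swapped pair is
    evidence that the two words can stand in for each other.
    """
    pairs = Counter()
    for queries in sessions:
        for before, after in zip(queries, queries[1:]):
            a, b = before.lower().split(), after.lower().split()
            if len(a) != len(b):
                continue  # only consider same-length rewrites
            diffs = [(x, y) for x, y in zip(a, b) if x != y]
            if len(diffs) == 1:
                pairs[diffs[0]] += 1
    return pairs


# Hypothetical session log: each inner list is one searcher's queries in order.
log = [
    ["cheap hotels paris", "discount hotels paris"],
    ["cheap flights", "discount flights", "discount flights rome"],
    ["cheap hotels", "budget hotels"],
]
print(mine_substitutions(log).most_common(2))
# → [(('cheap', 'discount'), 2), (('cheap', 'budget'), 1)]
```

Pairs that recur across many sessions become candidate substitutes; in practice they would still need checking against a dictionary or co-occurrence statistics, the other two methods listed.
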

Posted by Gwen at 01:13 PM

May 16, 2008

Yahoo's SearchMonkey

Yahoo beckons coders to gussy up search results by Stephen Shankland, Webware (May 15)

We'll start to see new search aids and applications thanks to SearchMonkey. This is an "application foundation": developers will build on Yahoo search results.

"The company will offer developer tools to let programmers start using SearchMonkey, technology to make search results more elaborate and, the company hopes, more useful. SearchMonkey lets programmers write applications that can turn dry textual listings in search results into a much more elaborate display, and Yahoo hopes its search business will benefit."

Of interest: "SearchMonkey also is interesting because it fits into the broader sweep of Internet history. Tim Berners-Lee, who initially developed the protocols behind the World Wide Web, has for more than a decade been advocating a move toward a more advanced sequel called the Semantic Web. SearchMonkey specifically takes advantage of Web site features designed to fulfill some of the promise of the Semantic Web."

Article explains microformats.

Related: Eric Enge interviews Yahoo's Andrew Tomkins, Stone Temple Consulting (Apr 28)

"This interview expands upon the keynote presentation that Andrew gave at SES NY on the future of search. The presentation covered some very interesting ideas on how to improve the presentation of results within a search results page. The discussion relates to the initiative Yahoo referred to as "SearchMonkey"."

Posted by Gwen at 12:44 PM

May 14, 2008

Personalized Search - A possible method

A Personalized Search Using Advanced Search Operators SEO By the Sea (May 13)

"A personalized search method described in a Yahoo patent application published last week collects information about a searcher’s interests from their search history, their browsing history, and their interests listed in profiles from places like MySpace and other social networks."

Yahoo's interest in this worried the readers of this blog entry.

Posted by Gwen at 02:08 PM

How search engines spot duplicates

Search Illustrated: How A Search Engine Determines Duplicate Content, Eastern by Elliance, Search Engine Land (May 13)

Love this series. "This week's infographic shows how search engines make distinctions between original and duplicate content"
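
The infographic itself is not reproduced here. One widely used technique for spotting near-duplicate text is to break each document into overlapping word sequences ("shingles") and compare the sets with Jaccard similarity; the sketch below shows that idea, under the assumption that an engine does something similar. The shingle length w=3 and the 0.5 threshold are arbitrary illustrative choices.

```python
import re


def shingles(text: str, w: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping w-word sequences ("shingles") in text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by the size of the union."""
    if not a and not b:
        return 1.0  # two empty documents count as identical
    return len(a & b) / len(a | b)


def near_duplicate(doc1: str, doc2: str, threshold: float = 0.5) -> bool:
    """Treat two documents as near-duplicates when their shingles overlap enough."""
    return jaccard(shingles(doc1), shingles(doc2)) >= threshold


original = "the quick brown fox jumps over the lazy dog"
copied = "The quick brown fox jumps over the lazy cat"
print(jaccard(shingles(original), shingles(copied)))  # → 0.75
print(near_duplicate(original, copied))               # → True
```

Comparing every pair of pages this way would be far too slow at web scale; sketching methods such as MinHash estimate the same Jaccard similarity cheaply, but they are beyond this example.
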

Posted by Gwen at 01:52 PM

May 12, 2008

Powerset Beta

Powerset brings the Semantic Web to Wikipedia By Dan Farber, Webware (May 11)

Powerset is now in public beta showing off its semantic search capabilities on Wikipedia.

"Amid speculation that Microsoft is looking to make an acquisition, Powerset launched a public beta of its Wikipedia search engine. It brings a new, rich semantic dimension via natural language query processing to Wikipedia that greatly improves the search and reading experience."

Longer term plan is to index and analyze 20 billion documents. Might it do that with Microsoft's help?

Posted by Gwen at 11:47 AM

April 30, 2008

WWW2008 Research Papers

Several Search Engine Land entries cover search technology papers presented at the WWW2008 conference --

WWW2008: Search Research Paper Roundup - research papers from WWW2008, the 17th International World Wide Web Conference that concern image search at Google, local aspects of web search, using search history to identify relevant information sources, using query patterns, wisdom of crowds, tag-based social interest discovery, and several more.

Also Microsoft Paper: Improving Search Results By Mining Web Surfing Activity by Danny Sullivan - Summarizes a new research paper from Microsoft about how surfing behavior -- as logged by a search toolbar -- can be used to improve search results.

Yahoo Paper: Finding The Local "Center" Of Search Queries by Danny Sullivan -- "A new research paper from Yahoo and Cornell University -- with search legend Jon Kleinberg as one of the coauthors -- provides a fascinating look at how a search query such as "red sox" or "hurricane deal" can be centered around a physical location -- including one that changes over time."

Posted by Gwen at 03:18 AM

April 25, 2008

SearchMonkey will change results listings

Yahoo! Launches SearchMonkey Developer Tool in Limited Preview by Vanessa Fox, Search Engine Land (Apr 24)

SearchMonkey is a tool for developers and site owners to use to enhance the listings of their pages in Yahoo. Article describes two types and provides instructions on how to set these up.

- enhanced with extra links and navigation
- infobar with additional information

Will this improve quality of search results?

"How does this impact the future of the web and search? On first glance, it appears to be a strong move to advance the semantic web and the beginning of a whole new way to view search results beyond ten blue links. But at least for the short term, Yahoo! is taking things more slowly than that. Their plan seems to be to use the presentation applications that others create as a test of search quality compared to the current results. Will searchers choose to opt in to these enhanced listings? (With Google's Subscribed Links, that answer seems to be no.) Will the additional meta data from the semantic web be useful or spammy? This may be a test to find that out."

Posted by Gwen at 03:56 PM

April 15, 2008

About Powerset

Powerset: Don’t call us a search engine by Chris Morrison, Venture Beat (Apr 10)

Powerset could make 2008 a significant year in semantic search.

"It appears 2008 might well be shaping up to be the year that semantic technology kicks off: Semantic search engine Hakia has begun licensing its technology, the intelligent organizer Twine is readying for launch, and now natural language search engine Powerset is also considering a near-term launch, as TechCrunch recently noted."

It won't be a Google killer (no one is asking for that), but it could fill a gap and meet a need.

"The answer is all in possibilities. Google is still the best way to hunt through vast numbers of silos (web pages) containing information when you’re looking for a specific fact. No new technology will seriously challenge that ability for a year or two, at least. But a technology like Powerset could short-circuit Google’s process by just giving you the damn fact, already instead of listing relevant websites."

Posted by Gwen at 01:29 AM

April 13, 2008

Google's Guidelines to Assessing Quality

The Secret Google Quality Raters’ Handbook Pandia (Apr 2)

Information about adjustments that Google makes in ranking results - what they deem as important on a page - was leaked in March. Pandia has a summary of the guidelines that are given to human editors in assessing quality of selected sites.

There are degrees of quality: vital (the official site), useful (good and authoritative content), and relevant (but not as good as useful). Everything else is rated not relevant or flagged for problems with its content.

Google watches for "thin affiliates" - essentially spam -- "A thin affiliate is a site that gives you no original content and that only provides copied descriptions of products with affiliate links." If this is so, why does so much of this turn up? The article has a list of other types of spam and PPC that Google tries to winnow out.

Posted by Gwen at 07:50 PM

April 01, 2008

Blog about Federated Search

Federated Search Blog

Sol Lederman is the master blogger on Federated Search. He explains --

"This blog exists to serve the federated search industry. This includes vendors, customers, and potential customers. While Deep Web Technologies (DWT) is sponsoring this blog, i.e. they are paying me to manage the blog and produce content for it regularly, don’t dismiss this blog as a marketing piece for Deep Web. My intent is to produce quality original content that educates all of us about the offerings of all federated search providers, addresses the issues and concerns of vendors and their customers, and keeps us all abreast of happenings in the industry."

This is one of the best looking / designed weblogs I've seen in a long time. It covers several interesting aspects of search: collaborative search, incremental results, deep web, verticals, federated search in libraries.

There is a sense of humour too, as in this April 1, 2008 posting, "Google to stop crawling the web: will federate it instead."

Posted by Gwen at 12:38 PM

March 31, 2008

Using Google Sets

How Google Sets Works SEO by the Sea (Mar 30)

Google Sets was developed in Google Labs a few years ago. It allows you to “automatically create sets of items from a few examples.”

A newly published patent reveals how it works.

The simple explanation of how the program works is that Google attempts to identify lists on the web as it crawls pages. It may look for these lists by considering:

* HTML tags for unordered lists, ordered lists, definition lists and headings,
* Items placed in a table,
* Items separated by commas or semicolons,
* Items separated by tabs,
* Other ways.
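To make the idea concrete, here is a minimal Python sketch of the two steps: harvesting candidate lists from HTML (`<ul>`, `<ol>`, `<dl>` and table cells) and from comma-, semicolon- or tab-separated text, then expanding a few seed examples by counting which other items share lists with them. The class and function names and the simple co-occurrence score are my own assumptions for illustration; the post doesn't describe the patent's actual scoring, and headings and the "other ways" aren't handled here.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class ListExtractor(HTMLParser):
    """Collect items from <ul>, <ol>, <dl> and <table> blocks as candidate lists."""

    CONTAINERS = {"ul", "ol", "dl", "table"}
    ITEMS = {"li", "dt", "dd", "td", "th"}

    def __init__(self):
        super().__init__()
        self.lists = []    # finished candidate lists (two or more items each)
        self._stack = []   # items gathered so far for each open container
        self._text = None  # text fragments of the item being read, or None

    def _flush(self):
        """Store the item being read, if any, in the innermost open container."""
        if self._text is not None and self._stack:
            item = " ".join("".join(self._text).split())
            if item:
                self._stack[-1].append(item)
        self._text = None

    def handle_starttag(self, tag, attrs):
        if tag in self.CONTAINERS:
            self._flush()
            self._stack.append([])
        elif tag in self.ITEMS and self._stack:
            self._flush()
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in self.ITEMS:
            self._flush()
        elif tag in self.CONTAINERS and self._stack:
            self._flush()
            items = self._stack.pop()
            if len(items) >= 2:
                self.lists.append(items)


def split_delimited(line):
    """Treat text separated by commas, semicolons or tabs as a candidate list."""
    items = [part.strip() for part in re.split(r"[,;\t]", line) if part.strip()]
    return items if len(items) >= 2 else []


def expand_set(seeds, lists, top_n=5):
    """Rank non-seed items by how many seeds share a list with them (hypothetical score)."""
    wanted = {s.lower() for s in seeds}
    scores = Counter()
    for items in lists:
        seed_hits = len({i.lower() for i in items} & wanted)
        if seed_hits == 0:
            continue
        for item in items:
            if item.lower() not in wanted:
                scores[item] += seed_hits
    return [item for item, _ in scores.most_common(top_n)]


page = """
<ul><li>Margaret Atwood</li><li>Alice Munro</li><li>Carol Shields</li></ul>
<ol><li>Alice Munro</li><li>Margaret Laurence</li><li>Margaret Atwood</li></ol>
<table><tr><td>Toronto</td><td>Ottawa</td></tr></table>
"""
parser = ListExtractor()
parser.feed(page)
lists = parser.lists + [split_delimited("Margaret Atwood, Mavis Gallant; Carol Shields")]
print(expand_set(["Margaret Atwood", "Alice Munro"], lists))
# ['Carol Shields', 'Margaret Laurence', 'Mavis Gallant']
```

The list of Toronto and Ottawa is harvested too, but it never shares a list with the seeds, so it contributes nothing to the expanded set.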


One person commenting on this post also recommended Google Adwords Keyword tool for finding related terms.

I tried Google Sets on three Canadian women writers: of the 15 names in the set it returned, 12 were Canadian authors. Not bad.

Posted by Gwen at 01:29 PM

March 29, 2008

Are Yahoo's Microformats the Future?

How Yahoo Could Avoid Microsoft - Part 1 Andrew Goodman, Traffick.com (March 28)

Written for search engine marketers, but has some interesting musings on the breakdown of Google's PageRank system and the possibilities of Yahoo's use of microformats (or open formats) for selective tagging. This becomes more important in the face of a deluge of user generated content (UGC).

"The reality of the massive growth in web content (most of it user-gen) is - something must change so that search engines work better with formatted, quality content, rather than their own proprietary, generic, semi-intuiting way of trying to sort out what's what. Google long ago broke with the majority of "troglodyte metadata" conventions, but nothing really solid has risen to take its place (Google Base is a failure). I see the new adoption of contemporary open formats by Yahoo as a big step in an evolution towards a more usable web, much more so than, say, the SiteMaps protocol."

Goodman promises more in Part 2 -- "I'll explain how startups like Mahalo are on the right track, but ultimately, utterly wrong. I'll talk about how Yahoo has it right, if they move forward in a certain direction. And I'll discuss their target audience and the potential that yes, they could still come back to be a credible alternative to Google in many markets. ..."

Posted by Gwen at 04:21 PM

March 27, 2008

Improving Search Results

Yahoo and the Future of Search by Eric Enge, Search Engine Watch (Mar 26)

Yahoo is introducing ways for webmasters to describe their websites more fully, ways that should be less subject to the abuse that rendered meta tags nearly unusable.

"Some of these include:

* Microformats
* RDFa and eRDF markup
* OpenSearch
* Atom/RSS Feeds

Yahoo says this information won't be used to affect ranking results. Yahoo wants to use the information to provide a better search listing in their results. "

Methods that have been developed for vetting local search submissions could be applied to web sites in general.

"Trust-based systems play a critical role in that. Keyword meta tags may be dead as a ranking signal, but there's no reason why a search engine can't implement something new and more robust (such as an extension of the Microformats protocol) to allow the Webmaster to provide lots of information about their site."

Posted by Gwen at 01:09 PM

March 17, 2008

Ranking Results

Search Engine Ranking Factors V2 SEOmoz.org (2007)

Excellent reference for understanding the key factors in search results ranking. Lists top 10 positive factors, most controversial factors, and top 5 negative factors.

"This document represents the collective wisdom of 37 leaders in the world of organic search engine optimization. Together, they have voted on the various factors that are estimated to comprise Google's ranking algorithm (the method by which the search engine orders results). The result is a resource of incredible value - although not every one of the estimated 200+ ranking elements are included, it is my opinion that 90-95% of the knowledge required about Google's algorithm is contained below."

Posted by Gwen at 01:50 PM

March 02, 2008

Hakia and semantic search

A Chat with Hakia’s CEO Dr. Riza C. Berkan Natalya Murakhver, AltSearchEngines (Feb 29)

In this interview, Dr. Riza C. Berkan, Co-Founder & Chief Executive Officer of Hakia, described semantic search:

"Semantic search introduces “understanding” where the algorithm analyzes both the Web page and the query to match and rank meaning. To give an example, if you are looking to find out the answer to the question, “What drug treats headache?,” you have to enter various combinations of these words to be able to search all relevant text, such as “drug, treat, headache” ; “drug treat migraine”; “drug help headache”; “Tylenol treat headache”: etc. You get the drift. When responding to the same question, semantic search can deliver a search result that states “aspirin helps migraine” where no words match but the concepts do."

Also mentioned Hakia's Galleries - "For short, discovery type queries, hakia brings categorized results (galleries) to offer a wide range of aspects of the search term."

More on Hakia, a meaning-based search engine, at Pandia (Aug 2007)

Posted by Gwen at 04:51 PM

February 27, 2008

Yahoo Open Source Search

Yahoo Set to Open Search Engine to Third Parties Heather Havenstein, Computerworld via PC World (Feb 26)

"New open-source application programming interfaces will allow Web site owners to add information directly to the Yahoo Search results Web page."

Example given is of a restaurant adding information about itself that goes beyond the address and information snippet.

"Code-named "Search Monkey," the new open-source application programming interfaces (API) will allow Web site owners to add information such as ratings and reviews, images, deep links and other data directly to the Yahoo Search results Web page."


Google does something similar with Subscribed Links -- "allows users to create custom search results that users can add to their own Google search pages. Matt Cutts, a Google software engineer and head of Google's Webspam team, noted that Subscribed Links, which Google debuted in 2006, allows users to "display links to your services, answer questions, and calculate useful quantities and more."

Both seem to be mainly about local search for commercial businesses and services, but you could see schools, libraries, governments and social services making use of this.

Also see Yahoo Announces Open Search Platform TechCrunch

Has an example of "a screenshot of a different search, for “hillary clinton.” The New York Times has altered the result to include links to other election news, debate analysis, and added data for current delegate count and total money raised."

Posted by Gwen at 12:19 PM

Berners-Lee and Web 3.0

Sir Tim Berners-Lee: Semantic Web is open for business, Paul Miller, ZDNet UK (Feb 26)

Update on the progress of the use of semantic web concepts in data integration. Links to a 20 minute podcast with Tim Berners-Lee.

"We spent some time (almost 15 minutes, from about 20 minutes in, for those listening along) talking about the ways in which data holders will gain benefits from their data being visible to a new generation of Semantic Web applications -"

Interesting comment by Berners-Lee about Web 2.0 and social networking sites -- "Now if you look at the social networking sites which, if you like, are traditional Web 2.0 social networking sites, they hoard this data. The business model appears to be, ‘We get the users to give us data and we reuse it to our benefit. We get the extra value."

Web 3.0 will change that -- “Web 2.0 is a stovepipe system. It’s a set of stovepipes where each site has got its data and it’s not sharing it. What people are sometimes calling a Web 3.0 vision where you’ve got lots of different data out there on the Web and you’ve got lots of different applications, but they’re independent. A given application can use different data. An application can run on a desktop or in my browser, it’s my agent. It can access all the data, which I can use and everything’s much more seamless and much more powerful because you get this integration. The same application has access to data from all over the place.”

Posted by Gwen at 11:40 AM

February 26, 2008

Recommendation Systems

Rethinking Recommendation Engines by Alex Iskold, ReadWriteWeb (Feb 26)

Recommendation systems are often put forward as a form of personalizing search results or as an example of social search (recommendations from community). Alex Iskold identifies three types of systems (personal, social and fundamental) and the main difficulties with each.

Key -- "Building a recommendation engine is a complex endeavor, which we discussed here a year ago. But in addition to being a technical challenge, there are also fundamental psychological questions: do people want recommendations and if so, then when are they open to them? Perhaps an even bigger question is: what happens when the user receives one or more bad recommendations? How tolerant will they be?"

Posted by Gwen at 12:24 PM

How search engines work

How Search Really Works - an ongoing series by Ruud Hein at Search Engine People.

Short, pithy and illustrated descriptions of the under-the-hood operation of search engines. The Keyword Density Myth is especially informative.

Posted by Gwen at 12:17 PM

February 16, 2008

Semantic Web - Structure on the fly

11 Things To Know About Semantic Web Bernard Lunn, ReadWriteWeb (Feb 16)

Lunn gives a shape to the semantic web (aka Web 3.0) in this article.

His definition of Web 3.0 is “the combination of Web 2.0 mass collaboration with structured databases”. Instead of data-modelled relational databases, structure will be obtained "on the fly".

"Structure on the fly is done by people adding structure as they use the service and by engines that automatically create structure from unstructured content."

Vertical search will be the first place where the semantic web shows up; he calls it the "pragmatist’s Semantic Web".

"Vertical Search businesses use whatever techniques they need - basic search engines, scrapers, APIs, human editors - to create some meaningful/useful structure in a single domain. Over time these cobbled together pragmatic solutions will be replaced by a semantic web platform, probably by an API that enables human editors to leverage their valuable domain expertise".

Stephen Abram sees this as "made for librarians' skills" - Semantic Web - Web 3.0, Stephen's Lighthouse

Posted by Gwen at 01:02 PM

February 09, 2008

Search Engine Re-Indexing Patterns

A three-year study on the freshness of Web search engine databases Lewandowski, Dirk, E-Prints in Library and Information Science (Jan 19, 2008)

This paper looked at the index freshness of Google, Yahoo and MSN/Live, and found their practices uneven and certainly not 100% fresh.

Study methods: "We conducted a test of the updates of 40 daily updated pages and 30 irregularly updated pages, respectively. We used data from a time span of six weeks in the years 2005, 2006, and 2007. "

Findings: "We found that the best search engine in terms of up-to-dateness changes over the years and that none of the engines has an ideal solution for index freshness. Frequency distributions for the pages’ ages are skewed, which means that search engines do differentiate between often- and seldom-updated pages. This is confirmed by the difference between the average ages of daily updated pages and our control group of pages. Indexing patterns are often irregular, and there seems to be no clear policy regarding when to revisit Web pages. A major problem identified in our research is the delay in making crawled pages available for searching, which differs from one engine to another."

Of interest:

The authors surmise from the findings that MSN updates its entire index within a certain time span, whereas Google updates the pages it considers important frequently and leaves the rest for later.

In 2007 the pages in Google's index had a mean age of 14.8 days and a median of 6 days; for MSN the figures were 9.3 and 9 days.

In re-indexing the German Wikipedia, MSN showed a regular update pattern. Google suffered from a two-day lag between the time a page was crawled and the time it showed in search results. Yahoo was erratic.

The paper concludes: "all search engines investigated have large shortcomings in updating their databases. None of the engines offers the ideal solution for the user (ie a comprehensive database of the Web that is updated according to the actual updates of the pages themselves). We found that none of the engines provides up-to-date copies even for the daily updated pages".

Posted by Gwen at 01:14 AM

February 08, 2008

Discovery Aids on the Web

From Keyword Search to Exploration: How Result Visualization Aids Discovery on the Web Kules, B., Wilson, M., Schraefel, M., Shneiderman, B., Human-Computer Interaction Lab, University of Maryland (February 2008)

Discusses ways to meet the needs of the searcher who is engaged in exploratory search, situations "in which users need to learn, discover, and understand novel or complex topics". Looks at information retrieval models and the use of classification and data visualization, with examples from tools in use on the Web.

"This monograph offers fresh ways to think about search-related cognitive processes and describes innovative design approaches to browsers and related tools. For instance, while key word search presents users with results for specific information (e.g., what is the capitol of Peru), other methods may let users see and explore the contexts of their requests for information (related or previous work, conflicting information), or the properties that associate groups of information assets (group legal decisions by lead attorney)."

Posted by Gwen at 02:26 PM

February 07, 2008

Indexing and PageRank

PageRank Is The Primary Google Search Ranking Factor Andy Beard, Niche Marketing (Feb

Long article, with many comments, that examines the importance of PageRank (or inbound links) in getting pages to show in search results.

Several revealing points.

+ "the toolbar PageRank has very little to do with rankings, and is manually manipulated based on Google's commercial goals." - so don't bother with the PR you see in the Google Toolbar.

+ There are many aspects to ranking through links: within a site, to a site, topical relevance of inbound links, popularity of the linking site, and others.

+ The point - "no page rank, no google juice - no index".

+ Not all content is indexed; content that is indexed will be dropped as it gets older, and links from old content won't keep it in the index. "To be in Google's index, pages really have to have a certain undefined amount of juice, no matter what other factors you gain merit for."

+ Pages that are deeper in the structure tend not to get indexed or are dropped from the index. A flat structure seems to be better.

"Drew if you divert juice into various archive pages, and keep them flat as if they are sitemaps, and also have an HTML sitemap, you can keep far more pages in the primary index."

+ To be found the page must be indexed, and to be indexed it must be linked to.

"Sure it is a little bit of a chicken and egg thing, but if you have a 750 word article and you want it to appear in the Google search results, the primary factor is that it receives enough juice to get in the index in the first place.

Other factors govern what it will rank for and how high in the results. PageRank can also affect that, at least possibly, but it is the only thing that is 100% required to appear in results, providing you have a document that can be addressed using some kind of URI."

+ Beard questions whether Google has really ended its practice of a supplemental index. Google introduced a supplemental index, separate from the main one, to store more "unusual" documents (the supplemental index also held duplicate content), and it showed supplemental results only when there were few results from the main index. Supposedly Google has merged the two; at least it no longer labels results as supplemental, but Beard's tests raise some questions about the numbers.

+ John Honeck says that PageRank also determines how Google treats the pages -- "Crawling speed, indexing speed, updating frequency, and even the statistics that they display for a site in webmaster tools is determined by PageRank and not relevance or quality."
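For anyone wondering what "juice" means mechanically, here is a sketch of the classic published PageRank calculation: power iteration with a damping factor of 0.85, the value used in Brin and Page's original paper, in its normalized form so the scores sum to 1. It is a simplification for illustration, not Google's current ranking (and it has nothing to do with the toolbar number); the site structure is hypothetical. Note how a page that nothing links to ends up with only the baseline share, which is the "no page rank, no google juice" point.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Classic PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    Scores are normalized so they sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the same baseline share from "random jumps".
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outlinks spreads its rank over every page.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


site = {  # hypothetical site structure
    "home": ["about", "archive"],
    "about": ["home"],
    "archive": ["home", "old-post"],
    "old-post": ["home"],
    "orphan": ["home"],  # links out, but nothing links to it
}
ranks = pagerank(site)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:10s} {score:.3f}")
```

The "orphan" page settles at exactly the baseline of (1 - 0.85) / 5 = 0.03, while "home", which everything links to, collects the most.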

Lots to digest.

Posted by Gwen at 12:34 PM

January 21, 2008

Advancing Filtering

Advancing Advanced Search by Stephen Turbek, Boxes and Arrows (Jan 16)

Advanced search at search engines and sites has never worked well for a variety of reasons. But presenting filtering aids helps searchers refine a search for greater specificity. Presents some examples from product shopping searches. Excellent comments on the nature of search and the search interface follow.

Posted by Gwen at 08:14 PM

January 16, 2008

Extract Text From Images

Google Looks to Tech That Recognizes Text in Images Heather Havenstein, Computerworld via PCWorld (Jan 4)

Google has filed a patent for extracting text from images. This would help in indexing images of book pages, documents, product packaging, and much else.

"The search giant in June filed a patent application for technology that can recognize text in images. It could be used to retrieve text from video or from photographs that may show up as part of a street scene."

Posted by Gwen at 08:30 PM

December 26, 2007

Cloud Computing

The Two Flavors of Google by Stephen Baker, Business Week (Dec 13)

Describes cloud computing and its two leading platforms: MapReduce, which is proprietary to Google, and Hadoop, which is open source.

"Why are search engines so fast? They farm out the job to multiple processors. Each task is a team effort, some of them involving hundreds, or even thousands, of computers working in concert. As more businesses and researchers shift complex data operations to clusters of computers known as clouds, the software that orchestrates that teamwork becomes increasingly vital. The state of the art is Google's in-house computing platform, known as MapReduce. But Google (GOOG) is keeping that gem in-house. An open-source version of MapReduce known as Hadoop is shaping up to become the industry standard."


A battle could be shaping up between the two leading software platforms for cloud computing, one proprietary and the other open-source.
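The MapReduce idea itself is simple enough to show in a few lines. This single-process Python sketch runs the three phases (map, shuffle, reduce) on the classic word-count example. The function names are hypothetical, and a real framework such as Hadoop spreads each phase across many machines and recovers from failures, which this sketch does not attempt.

```python
from collections import defaultdict
from itertools import chain


def map_phase(text):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in text.lower().split()]


def shuffle(pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the values collected for one key."""
    return key, sum(values)


def mapreduce(documents):
    # In Hadoop or Google's system each phase would run on many machines.
    mapped = chain.from_iterable(map_phase(text) for text in documents.values())
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


docs = {"page1": "the web is big", "page2": "the web keeps growing"}
print(mapreduce(docs))
# {'the': 2, 'web': 2, 'is': 1, 'big': 1, 'keeps': 1, 'growing': 1}
```

Because each map call touches only one document and each reduce call only one key, both phases can be farmed out to separate processors; only the shuffle needs to move data between them.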

Posted by Gwen at 03:03 AM

December 23, 2007

Google's Search Practices

Google uses human evaluators to improve search results Pandia (Dec 20)

Pandia recaps the main points from an interview with Peter Norvig, director of research at Google published in the Technology Review. It clarifies several points:

+ Google is working to understand "concepts" but doesn't plan on "natural language" queries

+ Google has started to apply personalized search to News.

+ Google does employ humans to look at queries and results and make adjustments to the algorithms and possibly raise the ranking of a site that has the answer or to block spam.

Q&A: Peter Norvig - The evolution of Web search. by Kate Green, Technology Review (Jan/Feb 2008)

Posted by Gwen at 06:25 PM

December 19, 2007

Snippet Generation

How does Google Pick Snippets for Your Pages to Show in Search Results? SEO by the Sea (Dec 19)

Google's search results consist of a title, a URL, and a description. The description is "A summary of the page in the form of a snippet or snippets, taken from either a meta description tag, or a description of the page from a directory like the DMOZ, or actual text from the page itself."

"A patent granted to Google today describes some of the process behind the choosing of text from a page to summarize the content of that page in relation to the keywords that it was found for in a search."

Some points: snippets need to be small, they may be pre-generated, keywords and surrounding text will be important, there is some weighting of possible snippets based on number of words or textual meaning.
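A rough illustration of query-dependent snippet selection: slide a fixed-size window over the page text and keep the window that covers the most distinct query terms, breaking ties by total matches and then by position. This scoring is my own simplified assumption, not the method in Google's patent, which weighs factors the post only hints at; the function names and the 12-word default window are arbitrary.

```python
import re


def normalize(word):
    """Lowercase a word and strip punctuation so "Atwood," matches "atwood"."""
    return re.sub(r"\W+", "", word.lower())


def best_snippet(text, query, window=12):
    """Pick the run of `window` words covering the most query terms."""
    words = text.split()
    terms = {normalize(t) for t in query.split()}
    if len(words) <= window:
        return " ".join(words)
    best_start, best_key = 0, (-1, -1)
    for start in range(len(words) - window + 1):
        matches = [normalize(w) for w in words[start:start + window]
                   if normalize(w) in terms]
        # Prefer more distinct query terms, then more total matches;
        # the earliest window wins a tie.
        key = (len(set(matches)), len(matches))
        if key > best_key:
            best_start, best_key = start, key
    return " ".join(words[best_start:best_start + window])


page = ("Welcome to our site. We sell many things. Our Canadian fiction shelf "
        "features Margaret Atwood and Alice Munro novels. Contact us today.")
print(best_snippet(page, "canadian fiction atwood", window=8))
# things. Our Canadian fiction shelf features Margaret Atwood
```

A production system would also snap the window to sentence boundaries and could pre-generate candidate snippets, as the post mentions.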

Posted by Gwen at 06:41 PM

December 15, 2007

Future of Search 2010

Search 2010: Thoughts on the Future of Search by Leading Experts Enquiro Research (Dec 11, 2007)

"On December 11, 2007 leading experts on search met to discuss the future. In fact they met to share their thoughts on the future of Search in the year 2010." Webex Webinar - 65 minutes

Gord Hotchkiss hosted a panel discussion with several prominent search people from the main engines.

Participants:

Marissa Mayer - VP, Search Products and User Experience, Google
Larry Cornett - VP, Search Experience, Yahoo
Justin Osmer - Senior Product Manager, Live Search, Microsoft
Daniel Read – Senior VP of Site Product Management and User Experience, ASK
Jakob Nielsen - User Advocate and Principal of Nielsen Norman Group
Chris Sherman - Executive Editor, Search Engine Land
Greg Sterling - Founding Principal, Sterling Market Intelligence.

Projections:

+ more dramatic universal search / blended search. (Google and Microsoft). Ask says it will be about the "interface".

+ Yahoo - search engines will be able to understand "user intent".

+ Greg Sterling - personalization and more structure.

+ Jakob Nielsen - mobile phone

+ Compares a "discovery" search for the singer-songwriter Feist at each engine: Ask, Google, Live, Yahoo. All try to identify intent and provide "contextual relevance". The idea of disambiguation is being adopted by all engines; even Google sometimes shows related searches at the bottom of the page.

+ Marissa Mayer (Google) commented on improvements in creating snippets at Google.

+ Another test question - when is iphone coming to Canada? Ask was weakest for current material, other three were stronger. But all poor in the summaries.

+ Yahoo, through Search Assist, tries to have a conversation with the searcher.

+ Mayer doesn't think there will be a great breakthrough in figuring out user's intent in the next year.

+ Promise of personalization - "how much traction can we get in disambiguation through personalization?" Related Search at Ask (Daniel Read) begins to help the user refine the search; combine this with patterns in the use of verticals. May see a payoff in three years (Justin Osmer, Microsoft).

+ Local search will advance further through personalization - and benefit mobile.

+ How people are looking at pages - they divide the page into sections and deal with them independently. E-shaped scanning rather than F-shaped. More attention to the left than the right. Google uses the "golden triangle" to hold tightly to the left margin. But new challenges arise as images are introduced - images and videos disrupt the scanning. When Ask interleaved video and images with web results, they saw some loss of ease of use and therefore moved to a three-column layout. Enquiro, in its studies of the same query at Google and Ask, found people spent the same amount of time.

+ How to put important marketing messages in front, and keep the balance between sponsored results and organic results? Marissa Mayer says there are sophisticated algorithms for selecting and ranking, and some new methods of presentation - richer content such as video and maps. Richer-format ads are coming, with higher relevance and more "brand relevance". Still, there is a danger that people will be turned off. Nielsen advises targeting ads based on personalization elements or simple profiling.

+ How will community patterns influence results? Larry Cornett at Yahoo has several social services - finds that people will trust others. Gord asked "Will there be a merging of Facebook and search?" Greg Sterling - social search in a vertical may work better because of shared interests and values. Jakob Nielsen - social networks will be too small - need to be able to capture across a large number of people - Facebook etc won't cut it.

Posted by Gwen at 01:30 AM

December 04, 2007

Personalization scenarios

Google on Desktop Search and Personal Information Management SEO by the Sea (Dec 2)

Bill Slawski sketches out some scenarios based on his readings of Google patents.

"You sit down at your computer, and start working on a document, and visiting the Web to find information.

A program on your computer considers the way that you move your mouse, and the speed at which you type, and recognizes you as one of the people who use that computer, and looks through your past computing sessions to see what kinds of things you are interested in, what web pages you may have visited, which documents you’ve printed, whether you prefer HTML or PDF documents when given a choice."

Posted by Gwen at 05:31 PM

November 30, 2007

Semantic Apps

10 Semantic Apps to Watch Richard MacManus, Read / Write Web (Nov 29)

Highlights 10 semantic applications - they "all try to determine the meaning of text and other data, and then create connections for users". Describes this as being top-down - analyze the text, or bottom-up - embed meta-data.

Several are still in private testing. Hakia is one of the few search engines that, while still in beta, is open to the public.

Of particular interest - a Firefox extension named Gnosis from ClearForest.

"The Firefox extension is called Gnosis and it enables you to "identify the people, companies, organizations, geographies and products on the page you are viewing." With one click from the menu, a webpage you view via Gnosis is filled with various types of annotations. For example it recognizes Companies, Countries, Industry Terms, Organizations, People, Products and Technologies. Each word that Gnosis recognizes, gets colored according to the category."

Posted by Gwen at 03:38 AM

November 18, 2007

Sensebot Sample

Sensebot gets an upgrade Pandia (Nov 15)

Update on Sensebot - the engine that "takes results from Google, Yahoo! and [or] Live and summarizes them into one concise digest on the topic of your query."

There is a test page where you can choose the engine. Search results are in sentences.

Posted by Gwen at 05:50 PM

Text Mining with Power Text tools

Searching the web using text mining, Pandia (Nov 14)

"What if you could get a search engine to summarize all the information found for you?"

Power Text has two text mining search engines - iResearch Reporter and NewsFeed Researcher. Demo versions of both are available.

Pandia describes both and has some additional information about text mining.

Posted by Gwen at 05:41 PM

Different Kind of Personalization

Google and Personalization in Rankings by Bill Slawski, SEObytheSea (Nov 16)

Slawski detects a possible behind-the-scenes move to personalization that works not from search history or expressed likes and dislikes, but from overall searcher activity in selecting certain results.

"We often talk about the ranking of Web pages with terms like PageRank or relevancy, meaning how relevant terms on a page might be to a query used by a searcher.

Many patent filings coming from Google refer to statistical models, like a probabilistic model that can learn about how words are related to each other, and how pages might be similar. Those models might tell us something about searchers. "

Information would be obtained from "user query sessions".

Posted by Gwen at 12:37 AM

November 13, 2007

TrueKnowledge Video

TrueKnowledge Demos Its Semantic Search Engine Marshall Kirkpatrick, Read/Write Web (Nov 7)

Has a video that demos the new TrueKnowledge search. Search engine is still under wraps.

"TrueKnowledge combines natural language analysis, an internal knowledge base and external databases to offer immediate answers to various questions. Instead of just pointing you to web pages where the search engine believes it can find your answer, it will offer you an explicit answer and explain the reasoning patch by which that answer was arrived at. There's also an interesting looking API at the center of the product. "Direct answers to humans and machine questions" is the company's tagline."

TrueKnowledge will be inviting users to add to the "knowledge" by adding what they know. The video briefly describes the process. But any contribution from users invites the Wikipedia (and Yahoo Answers) problem of authority, correctness, and worth.

Posted by Gwen at 02:26 PM

November 09, 2007

True Knowledge Natural Language

True Knowledge Launches Natural Language Search Engine Michael Arrington, TechCrunch (Nov 8)

The new True Knowledge from the UK "aims to give appropriate answers to natural language queries, even if key query terms are not included in the data being indexed. Current search engines are unable to return appropriate results for these queries."

True Knowledge is using structured databases - it isn't indexing the web. "Results can be returned based on inference of the intended meaning. So a question about if someone is married or not can be answered even if there is no specific structured data about that question."

True Knowledge is based in Cambridge, UK.

Posted by Gwen at 09:59 PM

November 01, 2007

Update on Quaero

The Quaero project - new European search technology Pandia (Oct 30)

Pandia will be running a series of articles about European search engines beginning with Quaero, being developed by the French.

"Quaero is to take part in this market [blogs, podcasts, multimedia], by developing technologies for finding, accessing, manipulating and processing multimedia and multilingual content."

Posted by Gwen at 12:29 AM

October 13, 2007

How Search Engines Operate

Rewriting the Beginner's Guide - Part I: How Search Engines Operate SEOMoz (Oct 10)

What a good series this will be - "For the next few weeks, my blog posts will primarily consist of re-authoring and re-building the Beginner's Guide to Search Engine Optimization, section by section". Starts with How Search Engines Operate.

Posted by Gwen at 12:04 AM

October 11, 2007

Using Advanced Commands

Web Searching with Advanced Commands Genie Tyburski, Virtual Chase (Oct 11)

"This article examines ... using advanced search commands to manipulate or improve search results."

Nice to have all these advanced commands in one place with examples from an expert on how to use them.

I'm not sure, however, that * will work well as a wildcard at Ask or Live. It does at Google, and will at Yahoo when inside a phrase (eg "three * mice").
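Here is a small sketch of how a phrase containing * can be matched, assuming * stands for exactly one word (engines differ, and some let it stand for several). The function name is hypothetical; the phrase is turned into a regular expression in which * becomes "one run of word characters".

```python
import re


def phrase_wildcard_match(phrase, text):
    """Match a phrase query in which each * stands for exactly one word."""
    parts = [r"\w+" if token == "*" else re.escape(token)
             for token in phrase.lower().split()]
    pattern = r"\b" + r"\W+".join(parts) + r"\b"
    return re.search(pattern, text.lower()) is not None


print(phrase_wildcard_match("three * mice", "Three blind mice, see how they run"))  # True
print(phrase_wildcard_match("three * mice", "three mice"))  # False
```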

At Exalead you can use NEAR to get words within 16 words of each other, and specify the nearness of the words with NEAR/number eg NEAR/2.
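A proximity operator like NEAR/number can be sketched by recording each word's position and checking whether any pair of positions is close enough. This assumes distance means the difference in word positions; how Exalead actually counts it isn't stated, and the function name is hypothetical. The default of 16 mirrors plain NEAR.

```python
import re


def near(text, first, second, distance=16):
    """True if `first` and `second` occur within `distance` word positions."""
    words = [re.sub(r"\W+", "", w.lower()) for w in text.split()]
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(abs(i - j) <= distance for i in first_pos for j in second_pos)


sentence = "Search engines crawl and index the web"
print(near(sentence, "search", "web"))               # True  (6 positions apart)
print(near(sentence, "search", "web", distance=2))   # False (like NEAR/2)
```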

Also, Live.com has a partial stemming capability in its new "related terms" feature - it can do a reasonable job of picking up the singular on a plural word (eg - markets and market), and sometimes finds combinations (health care and healthcare).

Websearchguide has comparison charts for these engines, and a section on the use of syntax.

Posted by Gwen at 01:45 PM

September 18, 2007

Powerset Labs

Powerset: Move Over, Google by Robert Hof, BusinessWeek (Sept 17, 2007)

Powerset, a new "natural language" engine that hopes to challenge Google, has set up Powerset Labs and is asking people to help improve its search before its launch in 2008.

Where Google and others do keyword search, Powerset will - "analyze the actual meaning of words and phrases that it indexes on the Web. It then will analyze the linguistic meaning of the query and find the best matches between the two—theoretically, at least, producing more meaningful results. "Our system reads every single sentence in every single document and extracts meaning from them," says Powerset Chief Executive Barney Pell. "

Natural language engines thrive on words - the more the better. Searchers will have to change their ways from the skimpy 1 to 3 word queries they use today.

View to the future: "Google executives have said that natural-language search could be years away from practical use and that linguistic analysis hasn't produced notably better results so far, which Powerset disputes. At the same time, there's little doubt Google's search wizards are examining the possibilities and are positioned to take swift advantage if the technology pans out. But even if Google isn't threatened by the competition anytime soon, it's clear the search game is far from over."

See the demo about Powerlabs. The Powerset Labs demo site will use Wikipedia as its database during the trial.

Posted by Gwen at 11:37 AM

September 07, 2007

Future of the Web and Search

10 Future Web Trends by Richard MacManus, Read / Write Web (Sep 6)

MacManus gives a time frame of 10 years for these changes, but I think some will be well developed in the next 5.

+ Semantic Web for making connections between blocks of information - it has always been thought that this will require metadata.

+ Artificial Intelligence for computers to do what humans do - especially in seeing patterns.

+ Virtual worlds - live in them, create them.

+ Mobile web and location aware devices.

+ Attention Economy - "personalized news, personalized search, alerts and recommendations to buy"

+ Web sites as web services - starting to see this in widgets.

+ Online video / tv - get the television programming you want.

+ Rich Internet Apps (RIA)

+ International web - China, Korea, India - surely growth areas but will they use US-based sites and services?

+ Personalization - more and more of this unless people fear for privacy and turn it off.

At SEOMoz, randfish added his thoughts on trends in Where are Search Engines Most Likely To Innovate?: more query intent detection, more use of social features, and more verticals, though the search engine has to recommend them. Some of this is good. I'd like to see some figures on how much searchers use suggested phrases, especially those based on a log of queries, but I'd rather have an engine that can make sense of the pages returned in a result set.

Posted by Gwen at 12:24 PM

August 12, 2007

Search in 2010

Search In The Year 2010 by Gord Hotchkiss, Search Engine Land (Aug 10)

Hotchkiss brought together his dream team of 8 people to look down the pipe at what search will be like in three years. Among them were Jakob Nielsen, usability guru; Marissa Mayer, Google VP for interface design; Michael Ferguson from Ask; Larry Cornett at Yahoo; Justin Osmer at Live Search; and Chris Sherman, Greg Sterling, and Danny Sullivan, all SE pundits.

Topics:

+ Search results page - maybe more mixed content.

+ Personal portal page - can results come back organized into a portal-like page? Will iGoogle be able to do that some day?

+ Social experience - much about Stumbleupon.

+ Personalization - Chris Sherman said, "I don't really see any kind of dramatic breakthrough on the horizon. I think as long as we’re limited to the current search form factor, if you will, where we’re encouraged to do the slot machine approach, where we punch in a few keywords, pull the lever and hope to hit the jackpot."

It's hard to do and some people will worry about privacy, but all the search engines are going to work on this anyway.

+ Usefulness as part of the algorithm - Maybe, if searchers will agree to indicate what is useful to them.

+ Contextual search - Chris Sherman challenges search engines to come up with "search by example".

+ Semantic search - none of the participants talked about linguistic analysis and meaning extraction. Instead there is the interesting idea from Mayer that results might be presented with different views - on a map, on a timeline - with models taken from what has been developed in local search.

+ Hands-on - replace advanced search with buttons and sliders. But Jakob Nielsen thought users would ignore those too, as they did with the sliders on MSN.

Nielsen observed, "The basic information foraging theory, which is, I think, the one theory that basically explains why the web is the way it is, says that people want to expend minimal effort to gain their benefits. And this is an evolutionary point that has come about because the people, or the creatures, who don’t exert themselves, are the ones most likely to survive when there are bad times or a crisis of some kind."

Interesting.

Posted by Gwen at 10:01 PM

Next Generation Search

The Ultimate Search Engine by J. Nicholas Hoover, InformationWeek (August 4, 2007)

"Google, Microsoft, Yahoo and others are developing next-generation technologies that automate and personalize information search."

Nicholas Hoover is rather harsh in describing the effectiveness of search engines today, suggesting that users must "dumb down their queries with the pidgin language understood by first-generation search engines", but he does provide a good overview of trends in search technology and design. Query making has always been working with words - thinking of them, combining them, getting into the mind of the writer. The main change today is that search engines are adding capabilities to understand the words and make some suggestions, or to group the results so that the searcher can try some new words. But there are other developments, which I've noted below from the article.

"Search results will be more accurate and automatically summarized, with relevance determined by individual preferences. New methods of presentation such as clustering, tag clouds, graphical scales that widen or narrow searches based on parameters, and automated categorization will make it easier to navigate results. And search engines will be enhanced by human intelligence and the wisdom of crowds through tagging, social bookmarking, and shared searches."

+ Learning language - Hakia and Powerset are two that are applying linguistic analysis to interpret and analyze the question, content, and results. Autonomy and IBM are also adopting this.

+ Queryless search - where the search engine anticipates need based on what you are working on. Watson is one tool which watches in the background. StumbleUpon will use Web history to make recommendations as does the new Google Dice.

[Re Google Dice, see Searching without a query]

+ Personalization - iGoogle has personalized pages, including a recommendation service that is based on the user's search history. I've never found it to be useful, but there must be potential.

+ Social skills - essentially means getting answers from other Web users whether you know them or not such as through Yahoo Answers or any of the social bookmarking services.

Google has something like this now in iGoogle: "'magic tabs' present a menu of gadgets and feeds deemed relevant to a search query--the word 'travel,' for example--based on the tabs other Googlers have created."

Collarity identifies communities-of-interest and uses "collaborative filtering" for relevance ranking. It goes to some trouble to pick up suggestions from users it has identified as showing a deep interest in a subject.

+ Results oriented - cites engines that cluster results (Clusty) or otherwise categorize them (Endeca), and those that have smart answers (such as Windows Live being able to show a map). These are both very important, and the examples for smart answers should have included Ask.com.

+ Multifaceted - really multimedia and being better able to discern content from patterns rather than being limited to metatags and surrounding text.

Some of the developments and players mentioned in this article aren't new. Watson, as an example, has been trying for broad acceptance for some time.
Of interest is the update on Watson -- "Watson got a second life in MediaRiver's ClickSurge widgets, which determine important concepts on a Web page and embed relevant links elsewhere on the page." Most Web searchers do not want to download software, and many will be wary of gadgets and widgets too.

Nonetheless, this is a good description of the main vectors in play for improving web search: the query interface and the results.

Posted by Gwen at 12:37 PM

July 23, 2007

Semantic Web Musings

Powerset and hakia - Quest For The Semantic Web by Phil Butler, Read/Write Web (July 20)

Hakia, the meaning-based engine, and Powerset, which promises "semantic search", are quite different, explains Butler, in the index that is used, the processing, and the horsepower.

Of interest, Barney Pell, CEO of Powerset, said that Facebook is one of the key innovations of late and that it "will become one of the primary communications platforms of the future".

Butler opined that "Facebook is one heck of a representation of information for a social network. Essentially, hakia, Powerset, Facebook and others are bending the machines to engage humans. And in a way, Facebook is the semantic Web in a microcosm - but in it's infancy."

I think that is a stretch but perhaps it does depend on how important personalization will turn out to be.

Posted by Gwen at 05:11 PM

July 18, 2007

On Google's Mind

Google's Research Director Peter Norvig On 'The Future Of Search' by Greg Sterling, Search Engine Land (July 17)

Several excerpts from an interview with Peter Norvig, Google's director of research, published in the MIT Technology Review -- The Future of Search - The head of Google Research talks about his group's projects.

+ emphasis on machine translation and speech - will be useful for video search.
+ want to know more about the searcher's intentions.
+ want to be able to understand contents beyond word matching. Google does understand synonyms and place names but can't parse a sentence yet.

Posted by Gwen at 10:09 PM

July 11, 2007

Linguistic Search

New in the Demo Center -- Cognition Linguistic Search, EContent (July 11)

"Linguistic search technology employs a unique mix of linguistics and mathematical algorithms which has, in effect, "taught" the computer the meanings (or associated concepts) of nearly all the words and the frequent phrases within the common English language. Unlike all of the popular search engines in use today, which utilize mathematically-based pattern-matching technology (i.e., they search for a particular word pattern), CognitionSearch understands the meaning of words in context; in both the query and in the document base."

CognitionSearch looks for concepts in your query and identifies alternate meanings. You select from the meanings it finds in order to refine the search. It reminds me of Oingo of long ago. It's a bit too much work for the searcher - would be better if the engine could figure out the meaning from the context.

Posted by Gwen at 01:58 PM

July 09, 2007

Google Ranking Algorithms

Eric Enge interviews Udi Manber about Search Quality, Stone Temple (July 9)

Udi Manber is VP of Engineering at Google. He talked about Google's work to improve quality of search results through changes to the algorithms.

Manber explained, "we [Google] use more than a hundred different parameters. PageRank is still an important parameter, but it's just one parameter. And, there are all kinds of parameters, such as whether the word appears in the title and whether the two words are close together and all the obvious traditional information retrieval parameters. There are many others that we invented and there is the combination of all of them, which is really where the hard work is being done, figuring out when and how to put them all together, of course, all of which is being done in real time."
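
To picture what "the combination of all of them" might look like, here is a minimal sketch of a weighted linear combination of ranking signals. The signal names, the 0 to 1 scaling, and the weights are hypothetical; Manber does not say how Google actually combines its parameters, and a production system would use far more signals and a tuned, possibly non-linear, model.

```python
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical per-document ranking signals, each scaled to 0..1."""
    pagerank: float     # link-based authority
    title_match: float  # 1.0 if the query words appear in the title
    proximity: float    # how close together the query words appear


# Hypothetical weights; a real engine would tune these against relevance judgments.
WEIGHTS = {"pagerank": 0.3, "title_match": 0.4, "proximity": 0.3}


def score(s: Signals) -> float:
    """Combine the signals into a single relevance score with a weighted sum."""
    return (WEIGHTS["pagerank"] * s.pagerank
            + WEIGHTS["title_match"] * s.title_match
            + WEIGHTS["proximity"] * s.proximity)


docs = {
    "high-pagerank page": Signals(pagerank=0.9, title_match=0.0, proximity=0.2),
    "on-topic page": Signals(pagerank=0.4, title_match=1.0, proximity=0.8),
}
ranking = sorted(docs, key=lambda name: score(docs[name]), reverse=True)
print(ranking)  # ['on-topic page', 'high-pagerank page']
```

With these made-up numbers the on-topic page (score 0.76) outranks the high-PageRank page (0.33), which is Manber's point: PageRank is one parameter among many.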

Posted by Gwen at 05:15 PM

Keeping Search History

People often repeat web searches by Greg Linden, Geeking with Greg (Jul 7)

As many as 40% of queries are repeats of earlier searches to refind results or get new ones. If this is the case, then having tools to remember your searches and the results would be useful. This ties into personalization -- "A first and necessary step toward personalization is to start maintaining search and viewing history for each user. "
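
A first step toward the history Linden describes could be as simple as logging each user's queries and flagging repeats. The sketch below is hypothetical (the class name, the case and whitespace normalization, and the repeat-rate definition are assumptions), but it shows how a "refinding" rate like the 40% figure could be measured from a query log.

```python
from collections import Counter


class SearchHistory:
    """Hypothetical per-user search log that flags repeated queries."""

    def __init__(self) -> None:
        self.counts = Counter()  # normalized query -> times issued
        self.total = 0

    @staticmethod
    def normalize(query: str) -> str:
        # Treat case and extra spaces as insignificant (an assumption).
        return " ".join(query.lower().split())

    def record(self, query: str) -> bool:
        """Log a query and return True if the user has issued it before."""
        key = self.normalize(query)
        repeat = self.counts[key] > 0
        self.counts[key] += 1
        self.total += 1
        return repeat

    def repeat_rate(self) -> float:
        """Fraction of logged queries that repeat an earlier query."""
        if self.total == 0:
            return 0.0
        repeats = sum(n - 1 for n in self.counts.values())
        return repeats / self.total


history = SearchHistory()
for q in ["Python tutorial", "weather", "python  tutorial", "weather"]:
    history.record(q)
print(history.repeat_rate())  # 0.5
```

The same log is what a personalization feature would build on, for example by offering a previous result when `record` reports a repeat.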

Posted by Gwen at 04:59 PM

July 08, 2007

Google Ranking

Proof Google is Using Behavioral Data in Rankings, SEOMoz.org (June 12)

Worth a careful read - two points of interest:

+ experiment by Visio "proved that Google was using the data from Google Analytics to improve the ranking algorithm". Google Analytics is a tool to use on a website to see where visitors come from and how they interact with a site.

+ Google's purchase of Feedburner will provide even more data to Google from the feeds to subscribers - "Obviously a site with 10,000 readers is going to have more authority than one with 100 readers! I would say it's a safe bet that this new data will eventually find its way into the ranking algorithms."

Posted by Gwen at 03:16 PM

July 06, 2007

Powerset Preview

Powerset Meets the Press, SEW Blog (Jun 29)

Powerset showed its natural language / semantic search technology to the press in San Francisco. People were impressed. Release is expected in September.

From Powerset: The natural language search mashup platform, by Dan Farber, ZDNet (Jun 28)

Steve Newcomb, COO, is quoted as saying, “Imagine a mashup between Facebook, Digg and Google Apps, but you get to participate in the building of the products that sit on top of our platform. You log into a social network, like you would Facebook, and you get certified to be a Powerlabber. Once certified you can join different interest groups, such as travel, and participate in idea and mashup competitions. QA is embedded and its all bloggable.”

Posted by Gwen at 01:12 PM

The ? in the URL

Search Engine Friendly URL’s explained, SEO (June 27)

The dynamically generated url is no longer a problem for search engines to index - "Google and the other top search engines have figured out how to do deal with dynamic URL’s. There are many sites that rank well using them. Having a “?” or “&” in your URL is not considered a negative or a positive by the search engine algos. What they do have a problem with is session ID’s. Do not use those."
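
The practical advice is to keep ordinary query parameters but drop session IDs. A minimal Python sketch using the standard `urllib.parse` module is below; the set of session parameter names is an assumption, since sites name these parameters differently.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumed names of session-ID parameters; real sites vary.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}


def strip_session_id(url: str) -> str:
    """Drop session-ID query parameters but keep ordinary dynamic ones."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


print(strip_session_id("http://example.com/item.php?id=42&PHPSESSID=abc123&color=red"))
# http://example.com/item.php?id=42&color=red
```

Because `id` and `color` survive, each product still gets a distinct, stable URL; only the per-visitor token is removed, so a crawler no longer sees one page under many addresses.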

Posted by Gwen at 12:59 PM

July 04, 2007

Radar Networks and Semantic Web

What's next for the Internet By Michael V. Copeland, Business 2.0 via CNN Money (July 3)

Nova Spivack is working on making the semantic web real through his company Radar Networks. Here they are working on an artificial intelligence that will make connections between blocks and bits of information to create an order and reveal meaning.

"For Spivack, however, the semantic Web begins now with the data engine and user applications he and his team are prepping for launch -- and ends somewhere in the future with artificially intelligent software agents handling all the online drudgery of your business and professional life."

Radar will be launching "a sort of personal data organizer. It will allow you to bring in e-mail, contacts, photos, video, music --anything digital, really -- from anywhere on the Web, turn it into RDF, and access it in one place."

Posted by Gwen at 01:15 PM

July 03, 2007

Making Sense of Results

Sensebot summarizes search engine results on the fly, Pandia (June 28)

"Sensebot is a new search engine that takes results from Google and Yahoo! and summarizes them into one concise digest on the topic of your query."

Still in beta, but if it succeeds students and researchers will love it.

Posted by Gwen at 07:20 PM

May 30, 2007

Hakia CEO describes Semantic Search

Semantic Search: An antidote for poor relevancy, by Dr. Riza C. Berkan, Founder & CEO, hakia.com, at Read / Write Web (May 29)

Of course, Hakia.com search technology is a form of semantic search.

"The option of "Semantic Search Engine" has yet to be tested. My company hakia, along with others like Powerset, Cognition Search, and Lexxe are taking steps in this new direction. There are challenges with this approach as well. First and foremost, the knowledge of languages must be built in a structure that would allow a scalable and speedy search process. Building such resources is an expensive, tedious, and time consuming endeavor. Then, all the Web pages must be analyzed using this system to prepare for a retrieval platform; another time-consuming process. But when all of this is done properly, the users will start to experience something totally new. Let me emphasize the word "properly" here, which is an entirely new discussion point."

Posted by Gwen at 08:41 PM

May 27, 2007

Edu Domain

The .edu domain, which generally represents universities and colleges (educational institutions) in the United States, could become contaminated with spam. Rebecca at SEOMOZ.org seems to answer yes to her own question in Will .edu Links Ever Lose Their Luster? SEO people look for ways to post material to an edu domain (such as jobs) and to get links. The comments on this are also interesting, with some feeling that edu and gov links offer a boost and another quoting Matt Cutts at Google, who said definitely not.

Posted by Gwen at 02:33 PM

May 26, 2007

Looking to Web 3.0

A Smarter Web by John Borland, Technology Review (Mar/Apr 2007)

A view of the future - "The next wave of technologies might ultimately blend pared-down Semantic Web tools with Web 2.0's capacity for dynamic user-generated connections. It may include a dash of data mining, with computers automatically extracting patterns from the Net's hubbub of conversation. The technology will probably take years to fulfill its promise, but it will almost certainly make the Web easier to use."

Posted by Gwen at 11:23 AM

May 25, 2007

Stacking the Deck at Search Engines

Web Search Results: Something to Keep in Mind, ResourceShelf (May 25)

Refers to an article in Forbes - Google-Proof PR? (May 25) about a company called Reputation Defender that "pads the Web with friendly-sounding content like flattering blog entries, personal sites and other positive pages, and then pushes those sites to the top of the Google".

Gary Price at ResourceShelf points out that people should be taught how to spot this kind of manipulation as part of information literacy. Indeed. He also has pointers on search strategies people can use to (somewhat) circumvent it: use advanced features, write more specific queries, and use more than one tool, preferably specialty tools.

Posted by Gwen at 01:16 PM

May 17, 2007

More on Google Universal

Will Universal Search Mean Universal Domination? , Eric Enge, Searchday (May 17)

Google's new universal results presentation is a complete integration from web, video, news, local and books. Enge points out that this requires a "relevance scoring system that would work on the same numerical scale across all of their properties." ... "The key thing that Google needed to do was to normalize these results, putting them all on a common scale. " ... "But once they succeeded in normalizing and extracting their relevance scoring systems, the rest was relatively easy."
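
Enge does not say how Google normalizes its scores, so the sketch below uses min-max scaling, one common textbook choice, purely as an assumption. Each property's raw scores are rescaled to 0..1 and then merged into a single list; the property names and scores are invented.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one property's raw scores onto a common 0..1 scale."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every result as equally relevant.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def blend(properties: dict[str, dict[str, float]]) -> list[tuple[str, str, float]]:
    """Merge results from several properties into one list, best first."""
    merged = [
        (prop, doc, norm)
        for prop, raw in properties.items()
        for doc, norm in min_max(raw).items()
    ]
    return sorted(merged, key=lambda item: item[2], reverse=True)


# Invented raw scores on incompatible scales.
results = blend({
    "web": {"w1": 12.0, "w2": 4.0, "w3": 8.0},
    "video": {"v1": 0.9, "v2": 0.3},
})
print([doc for _, doc, _ in results])  # ['w1', 'v1', 'w3', 'w2', 'v2']
```

Without the rescaling, every web result (scores 4 to 12) would bury every video result (scores below 1), regardless of how relevant the videos were.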

Posted by Gwen at 11:05 PM

May 08, 2007

Themes to Search Today

Top 17 Search Innovations Outside of Google by Nitin Karandikar, Read / Write Web (May 7)

Excellent round-up of new search technologies roughly grouped into 4 categories - Query Pre-processing; Information Sources; Algorithm Improvement; Results Visualization and Post-processing. Has all the themes and excellent examples.

Posted by Gwen at 11:49 PM

Google Page Rank

What Is Google PageRank? A Guide For Searchers & Webmasters by Danny Sullivan, Search Engine Land (Apr 26)

The underlying concept behind PageRank is that links to a site are like votes, and some votes are more important than others. Google also uses other factors that involve matching on text to rank results, as Sullivan explains.
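
The "links as votes" idea can be written down as the classic power-iteration formulation of PageRank. This is the textbook version with the usual damping factor of 0.85, not Google's production system, and the tiny link graph is made up.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Compute PageRank by power iteration over a small link graph.

    links maps each page to the pages it links to. Each link is a vote,
    and a page's vote is split evenly among the pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

In this graph C receives links from three pages, so it scores highest, and A comes second because C's single outgoing link passes all of C's vote to it. The weight of a vote depends on the rank of the voter, which is the "some votes are more important than others" part.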

PageRank is not something the searcher can see unless using the Google Toolbar and with the ranking meter turned on. People don't turn it on for privacy reasons - Google is tracking what you do - but that may change as people opt for Web History tracking.

Sites in the Google Directory are sorted by PageRank - that's what the green bar is all about. Sadly, Sullivan says that the directory has not been updated with changes from the Open Directory Project for months (maybe years).

Sullivan presents proof that PageRank is not the most important factor in rankings. He also distinguishes between search rank (on the fly ranking) and toolbar ranking (periodic snapshot of the page).

Showing PageRank scores alongside search results could confuse searchers, but knowing the PageRank of a single page may help in assessing its quality.

"PageRank is one of many, many factors used to produce search rankings. Highlighting PageRank in search results doesn't help the searcher. That's because Google uses another system to show the most important pages for a particular search you do. It lists them in order of importance for what you searched on. Adding PageRank scores to search results would just confuse people. They'd wonder why pages with lower scores were outranking higher scored pages.

In contrast, if you're looking at a single page, such as when you are surfing the web, you no longer want the search ranking but rather an idea of how important or reputable that page might be. This is where PageRank makes more sense."

All in all, PageRank is mainly of interest to SEO specialists.

Posted by Gwen at 11:19 AM

Supplemental pages at Google

Why I Love the Google's Supplemental Index, Aaron Wall, SEO Book (May 5)

Supplemental pages that show in Google results are a great concern to search engine marketers. According to this post, Forbes says these are pages that Google "deems to be of low quality or designed to appear artificially high in search results" - and so Google doesn't index them. But Matt Cutts at Google says no - these pages are supplemental because of PageRank - presumably that they don't have sufficient links to them. Aaron Wall in this post posits that it has to do with duplicate content.

Whatever the cause, looks like searchers can ignore those results most of the time.

Danny Sullivan wrote about this last January - "Basically, the supplemental index is a way for Google to hit less important pages in specific instances when it can't find matches in the main index. Trying to search against tens of billions of pages all at once is time consuming and expensive. Far easier to hit just the "best of the web," exactly as Inktomi used to do -- and for exactly the same reasons. But it's a continuing reminder that Google can't do it all. No matter how great those machines are, they have to divide up that index. The "best of the web" might still be tens of billions of pages, but divisions still raise concerns."

From January 2007 Update On Google Indexing & Ranking Issues, Search Engine Land (Jan 11, 2007)
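
Sullivan's description amounts to a tiered lookup: query the main index, and fall back to the supplemental index only when the main index comes up short. The sketch below is a simplification under stated assumptions: the indexes are plain dictionaries keyed by the whole query string, and the `min_results` threshold is hypothetical.

```python
def search(query: str,
           main_index: dict[str, list[str]],
           supplemental_index: dict[str, list[str]],
           min_results: int = 1) -> list[str]:
    """Query the main index first; consult the supplemental index only if needed."""
    results = list(main_index.get(query, []))
    if len(results) < min_results:
        # Only now pay the cost of searching the larger, less important tier.
        extra = supplemental_index.get(query, [])
        results += [doc for doc in extra if doc not in results]
    return results


main = {"pagerank": ["guide.html"]}
supplemental = {"pagerank": ["old-copy.html"], "obscure phrase": ["forum-post.html"]}
print(search("pagerank", main, supplemental))        # ['guide.html']
print(search("obscure phrase", main, supplemental))  # ['forum-post.html']
```

This is why supplemental results tend to appear mainly for rare queries: for common ones the main tier already has enough matches.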

Posted by Gwen at 10:59 AM

May 05, 2007

The Bias in Search

Is Relevance Relevant? Market, Science, and War: Discourses of Search Engine Quality by Elizabeth Van Couvering, Department of Media and Communications, London School of Economics, Journal of Computer-Mediated Communication, 12(3), article 6.

Ouch - search engines are not unbiased or neutral in ranking results, says this writer, who interviewed senior management in several search engine companies between November 2002 and May 2004.

"The evidence presented here suggests that resources in search engine development are overwhelmingly allocated on the basis of market factors or scientific/technological concerns. Fairness and representativeness, core elements of the journalists' definition of quality media content, are not key determiners of search engine quality in the minds of search engine producers. Rather, alternative standards of quality, such as customer satisfaction and relevance, mean that tactics to silence or promote certain websites or site owners (such as blacklisting, whitelisting, and index "cleaning") are seen as unproblematic."

Also see John Battelle for comments - Search Paper: Is Relevance Relevant?

Posted by Gwen at 11:10 AM

April 27, 2007

Page Rank at Google

What Is Google PageRank? A Guide For Searchers & Webmasters, by Danny Sullivan, Searchengineland (Apr 26)

Probably the definitive explanation of page rank to date - "Let's start with how PageRank is used by Google for searchers. First and foremost, it is one of many factors used for ranking pages. You can't see PageRank when you search (ordinarily, that is. further below I'll explain how you CAN see it), but behind the scenes, it helps in part to decide if a page will show up in the top search results or not."

Also explains PageRank for the Directory (Google uses the Open Directory Project (ODP) but is slow to pick up its updates, and ODP itself is poorly maintained).

Main message - "PageRank is one of many, many factors used to produce search rankings. Highlighting PageRank in search results doesn't help the searcher. That's because Google uses another system to show the most important pages for a particular search you do. It lists them in order of importance for what you searched on. Adding PageRank scores to search results would just confuse people. They'd wonder why pages with lower scores were outranking higher scored pages."

Posted by Gwen at 11:53 PM

April 26, 2007

Autonomy Pattern Matching

Autonomy To Reclaim Blinkx, Then Spin It Off, Danny Sullivan, Searchengineland (Apr 25)

"Autonomy is to exercise an option to take over Blinkx, then appears to be spinning some consumer-facing search technology that its owns (and I believe Blinkx was licensing) into an independent company Blinkx, that will go public in London."

Interesting review of the history of Blinkx and of Autonomy with some discussion of textual analysis technology - or "meaning-based search". Refers to a page at Autonomy that describes the IDOL Server and very broadly the technology - "Autonomy's strength lies in advanced pattern-matching techniques (non-linear adaptive digital signal processing), rooted in the theories of Bayesian Inference and Claude Shannon's Principles of Information, that enable identification of the patterns that naturally occur in text, based on the usage and frequency of words or terms that correspond to specific concepts."

Posted by Gwen at 10:57 AM

April 22, 2007

Google Bombing

Google Declares Stephen Colbert As Greatest Living American, Danny Sullivan, Search engine land (Apr 20)

Google changed its ranking algorithms to prevent pages from being ranked high because of terms in the anchor text linking to those pages, or so it was thought. This was to prevent the "miserable failure" bomb against President George Bush (and others). But Stephen Colbert has managed to have himself show up as the greatest living American. Sullivan explains this as "I suspect the answer will be that the link bomb fix Google uses is more sophisticated than just looking to see if the words people are using in links, when a lot of links suddenly point at a page, actually appear on a page." - which doesn't help much. Yahoo, Live, and Ask didn't give Colbert the top spot; Ask didn't list him at all.

Posted by Gwen at 12:24 PM

April 17, 2007

Will Google Categorize?

Google Categories Prototype, Google Blogscoped (Apr 16) - screenshot of Google categorizing (or at least grouping) results - but it was a fleeting glance.

Posted by Gwen at 02:01 PM

April 12, 2007

Power of Links at Google

The power of links – non-indexed pages out-ranking optimised ones. Search Engine War (UK) (Apr 10)

We get a glimpse of Google's ranking algorithm in this posting - "...when a page is linking to another that is blocked by the robots.txt file Google opts to display the link text from the linking page as the result title, which when you think about it is actually quite a serious flaw in their treatment of the robots.txt protocol ... Google is showing priority to the decision of the linker to link, over the content owner who wants it excluded."
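
The behavior described above can be sketched with Python's standard `urllib.robotparser`. A compliant crawler never fetches the blocked page, so the only label it has for that URL is the anchor text from pages linking to it. The robots.txt content, the URLs, and the `result_title` helper are hypothetical; this illustrates the reported behavior and is not Google's code.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "Googlebot"

# Hypothetical robots.txt for example.com.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def result_title(url: str, anchor_text: str, crawled_titles: dict[str, str]) -> str:
    """Pick a display title for a URL that appears in results (hypothetical helper)."""
    if parser.can_fetch(USER_AGENT, url):
        # Allowed: prefer the title the crawler read from the page itself.
        return crawled_titles.get(url, anchor_text)
    # Blocked: the page was never fetched, so only the linker's anchor text is known.
    return anchor_text


crawled = {"http://example.com/public/about.html": "About Example Inc."}
print(result_title("http://example.com/public/about.html", "About us", crawled))
# About Example Inc.
print(result_title("http://example.com/private/report.html", "Annual report", crawled))
# Annual report
```

Note that robots.txt only controls fetching; nothing in it stops an engine from listing the blocked URL with the linker's words, which is the gap the post complains about.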

Posted by Gwen at 01:14 PM

April 10, 2007

Semantic Web Technologies Coming

The semantic web - the next upgrade to the web. Neal Goldman, CEO of Inform, speaks to BusinessWeek Online about semantic web technologies and what they may mean for the Web. Semantic search technology analyzes the use of words in text to make it possible to understand "conceptually" what you want. It links words and phrases together and makes more connections in order to fill out meaning. Personalization, as an added component, will use what is known about your interests to further refine results. The process is a blend of human direction to train the algorithms for word connections, and machine learning.

Watch the BusinessWeek video on The Semantic Web. (April 6)

Neal Goldman speaking about semantic web technologies

This is part of a special report at Business Week - CEO Guide to Technology.

Taming the World Wide Web - A rising tide of companies are tapping Semantic Web technologies to unearth hard-to-find connections between disparate pieces of online data, Rachel King

Some semantic technologies are in use today for special applications in order to make linkages between data. As the article states, "Those tools are the stuff of the Semantic Web, a method of tagging online information so it can be better understood in relation to other data—even if it's tucked away in some faraway corporate database or software program. Today's prominent search tools are adept at quickly identifying and serving up reams of online information, though not at showing how it all fits together. "When you get down to it, you have to know whatever keyword the person used, or you're never going to find it," says Dave McComb, president of consulting firm Semantic Arts."

ZoomInfo is an example of a semantic search engine. "The engine automatically crawls publicly available business information—from corporate Web sites to press releases and electronic news services to SEC filings—adding semantic tags and organizing information so that it can be easily found later."

Article makes the point that we shouldn't call this Web 3.0 as if it is a software release. But there is a progression from what people are doing with tagging today. "In many user-generated sites grouped under Web 2.0, users often tag their own data, be it photos, bookmarks, videos, or other content. "Web 2.0 is the messy way that the Semantic Web is actually happening," says O'Reilly."

Business Week also has a slide show for Weaving a Web Around Technology and a podcast.

Posted by Gwen at 12:31 PM

April 07, 2007

Meta Tags and Search Engines

Do you really need meta tags? You bet by Jennifer Slegg (Apr 4)

Of interest - "Google sometimes uses the description you place in the meta description tag as the snippet when certain criteria is met. This sometimes includes site search or keyword searches when keyword(s) that are searched for also are contained within the meta description."

Posted by Gwen at 12:01 AM

April 03, 2007

Cognition Search - Breakthrough

Cognition Launches New Linguistic Search Engine by Barbara Quint, Newsbreaks (Apr 2)

Seems to be a breakthrough on linguistic search with Cognition.

"Cognition Technologies (www.cognition.com) has launched CognitionSearch, a linguistic search engine that supports ontology, morphology, and synonymy, tapping one of the world's largest computational dictionaries. Initially, the company will market a vertical enterprise service for legal litigation support and for life science and health research. It also offers an open Web service (www.cognitionsearch.com) to demonstrate the technology as applied to MEDLINE and PubMed content, to judicial and legislative sources, and to political blog content."

"In the current launch of the CognitionSearch open Web service, the company selected three subject areas to showcase and demonstrate the technology: health (MEDLINE, PubMed, etc.), legal (U.S. Supreme Court cases, a million Enron emails, etc.), and politics (key political blogs)."

Posted by Gwen at 12:19 PM

March 27, 2007

Google Will "Surface" Information

Google and the deep web by Greg Linden (Mar 23)

Through papers Google has released recently, Greg Linden has gleaned much about Google's intentions on whether to index structured data in order to reach into the "deep web" (aka invisible web).

There is specific mention of the "content that lies hidden behind queryable HTML forms", ie dynamically generated answers. Google envisions generating queries against those databases from the user's keywords and, possibly, anticipating these by "surfacing" information beforehand and adding it to the Google index.

In the followup posting - The end of federated search? (Mar 24) Linden concludes that Google will not do federated search (a meta search of other search engines such as those powering specific databases), but opt for copying what it can.

Federated search would require a "virtual schema" with the domains mapped into a common view. That's not going to happen.

From the quote: "The third limitation is our reliance on structured queries. Since queries on the web are typically sets of keywords, the first step in the reformulation will be to identify the relevant domain(s) of a query and then mapping the keywords in the query to the fields of the virtual schema for that domain. This is a hard problem that we refer to as query routing."
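
The quote describes two steps in query routing: pick a domain, then map the keywords onto that domain's schema fields. Both can be sketched with hand-built vocabularies. The domains, fields and word lists below are invented for illustration. Real routing would need far larger, learned vocabularies, which is why the authors call it a hard problem.

```python
# Hypothetical virtual schemas: domain -> field -> words that signal that field.
SCHEMAS = {
    "used_cars": {
        "make": {"honda", "toyota", "ford"},
        "model": {"civic", "corolla", "focus"},
        "city": {"boston", "seattle"},
    },
    "real_estate": {
        "property_type": {"condo", "house", "apartment"},
        "city": {"boston", "seattle"},
        "bedrooms": {"1br", "2br", "3br"},
    },
}

def route_query(query):
    """Step 1: choose the domain whose vocabulary covers the most keywords.
    Step 2: map each keyword to a field of that domain's schema.
    Returns (domain, structured_query, unmapped_keywords)."""
    keywords = query.lower().split()

    def coverage(domain):
        vocab = set().union(*SCHEMAS[domain].values())
        return sum(k in vocab for k in keywords)

    domain = max(SCHEMAS, key=coverage)
    if coverage(domain) == 0:
        return None, {}, keywords  # no domain recognizes the query

    structured, unmapped = {}, []
    for k in keywords:
        field = next((f for f, vocab in SCHEMAS[domain].items() if k in vocab), None)
        if field is None:
            unmapped.append(k)
        else:
            structured[field] = k
    return domain, structured, unmapped
```

Two of the failure cases that make federated search brittle show up here. A query that no domain recognizes comes back with `None`. A keyword like "cheap" that fits no field is left unmapped.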

A9.com, as Linden explained, was an attempt at large-scale federated searching, and presumably it has failed. And even though its creator, Udi Manber, is now with Google, Google seems to prefer the surfacing approach, probably for performance reasons.

What does this mean for searchers? We're not going to see a search service that can automatically direct us to the best resource and then exploit the structure and organization of that resource to deliver answers. We will get clues from Google, but not the in-depth search that is required for digging into the deep web. Searchers - it is still up to you to find the resources and learn how to use them.

Addendum: There is an excellent discussion of federated search and metasearch in the comments on The end of federated search?

Comments on the value of metasearch (the type that fuses results from different but similar search engines) - "On the other hand, with metasearch, each search engine is working across the same corpus, and the whole point is that duplicate content is a good thing. The more often that independent search engines retrieve the same document, the higher our confidence is that the document is truly relevant."
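
The idea that duplicates count as evidence underlies the CombMNZ fusion rule from the information retrieval literature. CombMNZ adds up each engine's score for a document, then multiplies the total by the number of engines that returned it. The minimal sketch below assumes each engine supplies only a ranked list, so each per-engine score is derived from rank position.

```python
from collections import defaultdict

def comb_mnz(result_lists):
    """Fuse best-first ranked lists with CombMNZ.

    Per-engine score: 1 - rank/len(list), a simple rank-based normalization
    (an assumption; engines that expose real scores could use those).
    Fused score: sum of per-engine scores * number of engines returning the doc.
    """
    total = defaultdict(float)
    hits = defaultdict(int)
    for results in result_lists:
        n = len(results)
        for rank, doc in enumerate(results):
            total[doc] += 1.0 - rank / n
            hits[doc] += 1
    fused = {doc: total[doc] * hits[doc] for doc in total}
    # Highest fused score first; ties broken by document id for stability.
    return sorted(fused, key=lambda d: (-fused[d], d))
```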

Posted by Gwen at 12:46 PM

February 10, 2007

Meaning Search

Powerset Aims to Leapfrog Google by David Needle, Internet News (Feb 9)

In this article about Powerset and its work on a natural-language search engine, we get an example of how it would handle a consumer question.

"Bobrow gave a consumer example of how the Powerset service works. When someone types in "Who was Spielberg married to before Kate Capshaw" Google and others give results related to the movie director Steven Spielberg and actress Kate Capshaw.

"Google doesn't give you the answer, Amy Irving, because it's not part of the question. What you really want is the answer, not hundreds or thousands of links. We give you the answer." "

Hakia is another natural-language processing search engine. It had an answer for the Spielberg question -- "The top of its results page said: "You are very curious today. Spielberg was once married to Amy Irving and is now married to Kate Capshaw." That was followed by links to pages related to Spielberg."

Posted by Gwen at 03:23 PM

February 09, 2007

Chaos Ahead with Personalization

Personalized Search - The Feature No one is Asking For, Graywolf's SEO Blog (Feb 8)

Here's a scary thought: as more search engines apply proprietary personalization routines to ranking results, how do you work online with a customer or a student on a search query? They may get different results from you, and if you move to another computer, the results may differ again. How do you help the customer or teach the student?

Michael Gray gets it - "I’ve never met a single person who’s said “wow searching for something at home gives me different results than searching for something at work, that’s not confusing at all, in fact I think that’s an improvement, why can’t my calculator work like that”."

But the search engines don't seem to understand the chaos they will create by using personal factors to rank results. There will have to be a way to turn these off.

Posted by Gwen at 05:34 PM

Natural language search

In a Search Refinement, a Chance to Rival Google, Miguel Helft, New York Times (Feb 9)

At PARC, scientists have been working on natural language search technologies, and PARC is licensing that technology to others.

"The start-up, Powerset, is licensing PARC’s “natural language” technology — the art of making computers understand and process languages like English or French. Powerset hopes the technology will be the basis of a new search engine that allows users to type queries in plain English, rather than using keywords."

But it will be tough. Powerset doesn't expect to release an engine until late 2007. Many doubt that it will be possible to get an engine to answer real questions such as "what companies did I.B.M. acquire in the last five years". Marissa Mayer, Google’s vice president for search and user experience, was quoted as saying: “Natural language is really hard. I don’t think it will happen in the next five years.”

Posted by Gwen at 01:58 PM

February 03, 2007

How Search Might Evolve

Evolution of a Search Engine by Philipp Lenssen, Google Blogoscoped (Feb 2)

Very interesting article by Lenssen on how search discovery might develop into delivering "knowledge" answers, providing personalized content, and ultimately performing some analysis. Google is the engine under study, but Ask.com may have some of the "knowledge" capabilities today.

"Right now, to answer your queries, Google quotes from the web, and orders the quotes in a list. In the future, Google may combine these quotes into a free-style text for a more direct answer. When the Google AI advances beyond that, it may analyze the texts available to it to come up with conclusions of its own."

Posted by Gwen at 02:11 PM

January 29, 2007

Future of Search

Future of Search: The European View, Frank Watson, SEW blog (Jan 24)

Richard Firminger, Director of Northern European Sales for Yahoo Search Marketing, sees a move toward integrated results drawn from various sources, including social search and natural language search.

"From a single search we will soon be able to receive answers incorporating text (sponsored and algorithmic), video, images and even human knowledge – the latter coming from social search products like Yahoo! Answers, ...."

"Additionally, natural language, or Semantic Search – which enables users to pose queries as a properly phrased question, not with a couple of words – may come to the fore."

Posted by Gwen at 01:57 PM

January 22, 2007

Search Scent in Web Design

Search Scent in the Search Engines by Kevin Lee, ClickZ (Jan 19)

Searchers will be interested in what's on the minds of search-engine marketers who want to improve the searcher's experience (and sell the product). PARC scientists posited that Web users pick up an "information scent" when navigating between sites. Lee converts that idea into a "search scent" that searchers follow toward a particular piece of information, a scent that advertisers and site designers can plant.

"Search scent is an extension of the information scent concept, initially developed by scientists at Xerox Palo Alto Research Center (PARC). Information scent centers on the how users navigate the Web, both within sites and from one site to the next while pursuing information on a specific topic. The research illustrates that humans forage for information on the Internet in much the same way animals follow scent and visual cues to find food. Scent is essentially an application of user interface optimization best practices, and search scent is a specific niche based on the fact searchers are even more wedded to a particular information-gathering mission than surfers or casual browsers."

Posted by Gwen at 01:36 PM

January 13, 2007

Eye-tracking at Microsoft

Eye Tracking in MSN Search, Search Engine Land (Jan 12)

Microsoft may be using eye-tracking methods to assess the effectiveness of result snippets. "Adding information to result snippets significantly improved performance for informational tasks but degraded performance for navigational tasks."

Live Search is going to have to do more than this to improve its worth as a search engine.

Posted by Gwen at 02:15 AM

December 24, 2006

Autonomy for LoC

U.S. LIBRARY OF CONGRESS SELECTS AUTONOMY FOR ITS ENHANCED WEBSITE SEARCH FEATURES
(Dec. 14, 2006)

... "U.S. Library of Congress has selected Autonomy's enterprise search infrastructure platform to offer enhanced search features on several of its websites, including Thomas and the Legislative Information System." Features include a "framework for organizing and managing the legislative information, providing multiple guided navigation paths, and new flexibility in searching."

Posted by Gwen at 02:57 AM

December 17, 2006

Trending to Search 2.0

Search 2.0 - What's Next? Written by Emre Sokullu and edited by Richard MacManus, Read/Write (Dec 13)

Good end-of-year article for looking at trends in search: user interface (sees promise in Snap and the new Live.com), technology (clustering, natural language), and vertical engines.

The comments by others are worth a quick browse.

Posted by Gwen at 06:39 PM

November 26, 2006

Meta-tags Back

Revenge of the meta-tag!, SEOmoz.org (Nov 17) - those meta tags and descriptions can be important in getting indexed at all.

Posted by Gwen at 03:09 PM

November 22, 2006

SiteMap for News Sites

Google adds indexing tools for News portal - "Publishers and webmasters will use a site map to indicate the articles they want Google News to index" By Juan Carlos Perez and Mike Barton, InfoWorld (Nov 21)

"This means that publishers and webmasters will be able to specify through a site map the articles they want Google News to index. A site map is a file that webmasters and publishers put on their sites to guide search engines' automated Web crawlers in properly indexing their Web pages."

Posted by Gwen at 03:37 PM

November 16, 2006

Sitemaps help crawlers

Google, Yahoo, Microsoft partner on open source search protocol "Rivals team on how sites are indexed, easing the game for webmasters, improving search results for users" By Juan Carlos Perez, IDG News Service (November 15, 2006)

An "open source, Sitemap Protocol based on XML (Extensible Markup Language)" could improve indexing of web sites significantly and make some of the "invisible" web visible. Webmasters would create a sitemap that will guide the Web crawlers to index areas of the site. These "site maps are particularly useful in highlighting to crawlers the dynamic Web content that is served up on the fly." In the end, crawlers will be able to do deeper indexing.
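
To show what a webmaster actually writes, this small Python sketch produces a Sitemaps 0.90 file with the standard library's ElementTree. The `urlset`, `url` and `loc` elements come from the sitemaps.org protocol, as do the optional `lastmod`, `changefreq` and `priority`. The page data and the `build_sitemap` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return a Sitemaps 0.90 document as a string.

    `pages` is a list of dicts with a required "loc" and optional
    "lastmod" (W3C date), "changefreq" and "priority" keys.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        # Emit child elements in the order the protocol documents them.
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if tag in page:
                ET.SubElement(url, tag).text = str(page[tag])
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body

print(build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2006-11-16", "changefreq": "daily"},
    {"loc": "http://www.example.com/catalog?item=12&desc=vacation", "priority": 0.8},
]))
```

ElementTree automatically escapes characters such as `&` in URLs, which the protocol requires.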

[Added Nov 19] - Search Engines Unite On Unified Sitemaps System by Danny Sullivan, SEW Blog (Nov 16) - has the complete press release and some comments.

From the press release:

"Las Vegas, November 16, 2006 - In the first joint and open initiative to improve the Web crawl process for search engines, Google, Yahoo! and Microsoft today announced support for Sitemaps 0.90 (www.sitemaps.org), a free and easy way for webmasters to notify search engines about their websites and be indexed more comprehensively and efficiently, resulting in better representation in search indices. For users, Sitemaps enables higher quality, fresher search results. An initiative initially driven by Yahoo! and Google, Sitemaps builds upon the pioneering Sitemaps 0.84, released by Google in June of 2005, which is now being adopted by Yahoo! and Microsoft to offer a single protocol to enhance Web crawling efforts."

Posted by Gwen at 11:30 AM

October 27, 2006

Google Bombing

Republicans hit in 'Google bombing' by Tom Zeller, New York Times via IHT (Oct 26)

"Fifty or so other Republican candidates have also been made targets in a sophisticated "Google bombing" campaign intended to game the search engine's ranking algorithms. By flooding the Web with references to the candidates and repeatedly cross-linking to specific articles and sites on the Web, it is possible to take advantage of Google's formula and force those articles to the top of the list of search results."
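
Textbook PageRank shows why cross-linking works: a page's score is fed by the scores of the pages linking to it. The sketch below implements only that simplified published formula. Google's actual ranking combines many more signals, and Google bombs also leaned heavily on the anchor text of the links, which this sketch ignores. The page names are made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    `links` maps each page to the pages it links to. Pages with no
    outlinks spread their rank evenly over all pages.
    """
    pages = set(links) | {p for targets in links.values() for p in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

web = {"news": ["bio", "critique"], "bio": ["news"], "critique": ["news"]}
# The "bomb": three new blogs all point at the critique page.
bombed = dict(web, **{f"blog{i}": ["critique"] for i in range(3)})
print(pagerank(web)["critique"], pagerank(bombed)["critique"])
```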

Posted by Gwen at 01:49 PM

October 22, 2006

Personalizing Search

Making Search More Relevant, By Bruce Clay, Search Engine Guide - October 18, 2006

"In recognition of these limitations, search engines are constantly innovating to make search more relevant. Some are providing a means to personalize your search results with shared knowledge, some are experimenting with a new and different results page, and others want to improve relevance with the human touch."

Posted by Gwen at 02:32 PM

October 17, 2006

Ranking Search Results Getting Personal

How Can Search Engines Rank Results? Let Bill Count The Ways by Danny Sullivan, SEW Blog (Oct 16) There are over 100 factors a search engine could consider in ranking results. For Sullivan the main takeaway is that "... we are moving further into that world ... where not everyone will see the same search results for the same query." Essentially, ranking search results is getting personal.

Points to an excellent article - 20 Ways Search Engines May Rerank Search Results - by Bill Slawski, SEO by the Sea (Oct 14). The article describes ways that results may be re-ranked after the basics of term matching and link analysis.

Posted by Gwen at 02:11 PM

September 28, 2006

The Search Interface

Why Search Sucks & You Won't Fix It The Way You Think, Danny Sullivan, Daggle (Sep 19) - screenshot tour of search interfaces since the early days of AltaVista, with substantial coverage of design efforts to cluster results and some coverage of the information visualization efforts at Kartoo, Grokker, and Ujiko. What works? People prefer simple keyword entry and results display. One person commented -- "Search is a verbal process for the most part, so effective display of results is going to be blocks or columns of text. In my experience, anything else is just annoying."

Posted by Gwen at 03:30 PM

Personalizing Search

Potential of web search personalization, Geeking with Greg (Sep 27) - Summarizes points from KDD 2006 paper, "A Large-Scale Analysis of Query Logs for Assessing Personalization Opportunities", by Steve Wedig and Omid Madani from Yahoo Research.

Basically there are two ways - "using a searcher's short-term history to change search results, which they call "adjustment", and modifying searcher results using a profile built from their long-term history, which they refer to as "personalization""
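
A small re-ranker can show the difference between the two approaches. It is a sketch built on several assumptions. Each result carries a base score and a few terms. Short-term "adjustment" rewards overlap with the current session's terms, and long-term "personalization" rewards terms weighted in a stored profile. The weights and scoring are invented, not taken from the Wedig and Madani paper.

```python
def rerank(results, session_terms=(), profile=None, alpha=0.5, beta=0.5):
    """Re-rank (doc_id, base_score, terms) tuples and return doc ids.

    adjustment:      alpha * share of a result's terms seen in this session
    personalization: beta * average profile weight of the result's terms
    Both bonuses and their weights are illustrative assumptions.
    """
    profile = profile or {}
    session = set(session_terms)

    def score(item):
        _, base, terms = item
        terms = set(terms)
        size = len(terms) or 1
        short_term = len(terms & session) / size
        long_term = sum(profile.get(t, 0.0) for t in terms) / size
        return base + alpha * short_term + beta * long_term

    return [doc for doc, _, _ in sorted(results, key=score, reverse=True)]
```

Called with no session terms and no profile, it returns the base ranking unchanged, which amounts to the off switch searchers would want.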

Either way, search personalization is something we will be seeing more of, but I fear we won't know when it is being applied. When search engines adopt it, it would be nice if they offered an on/off switch.

Posted by Gwen at 03:18 PM

August 22, 2006

Peter Morville Webcast

Peter Morville, author of Ambient Findability, spoke at the Library of Congress on July 20, 2006. The presentation is available as a webcast and runs for 45 minutes.

"Peter Morville, widely recognized as a founding father of information architecture, discussed his recent book, "Ambient Findability," in a program sponsored by the Science, Technology and Business Division. Morville describes Ambient Findability as a safari of how people search for information and how they now find their way through a world of information overload. His previous book, which he co-authored with Louis Rosenfeld, "Information Architecture" was named "Best Internet Book of 1998." Morville's work has been featured in many publications including Business Week, The Economist, Fortune, MSNBC and The Wall Street Journal. He blogs at findability.org."

Posted by Gwen at 08:07 PM

AI for Search

Spying an intelligent search engine by Stefanie Olsen, CNET (Aug 18)

"While most would agree that Google has set the current standard for Web search, some technologists say even better tools are on the horizon thanks to advances in artificial intelligence."

Medstory applies AI techniques to healthcare. "Rappaport [CEO] won't disclose the secret sauce of the company's technology; however, he said, it's a 24/7 process in computing that connects valuable pieces of information together, such as linking one document that explains symptoms of a disease to another document with analysis of a therapeutic drug for that disease."

AI is also being used at Riya to find photos by matching on characteristics - density, patterns, colours.

Posted by Gwen at 07:14 PM

August 19, 2006

Understanding the Searcher

Microsoft Researchers Inventing New Techniques to Improve Search Engine Accuracy and Relevance --
Papers presented at the 2006 SIGIR conference describe new techniques for analyzing rich patterns of user interactions with search to improve the overall search experience. (Aug 7)

Improving search seems to lie in understanding and anticipating the searcher.

"“Most search engines today use a somewhat two-dimensional approach, matching user queries with the content and link structure of Web pages to return a list of results,” said Eugene Agichtein, a researcher in the Text Mining, Search and Navigation Group within Microsoft Research. “We’re looking at how to add a third dimension — the users themselves — to improve the search experience. By examining click-through and browsing patterns across a large number of users, we are able to learn a great deal about how people interact with search technologies and can thereby improve our accuracy dramatically.”"
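
One simple way to add that "third dimension" is to blend the content/link score with each result's observed click-through rate. The sketch below is an illustration built on assumptions, not Microsoft's method. It smooths the click-through rate toward a prior so that rarely shown pages are not judged on a handful of clicks. The weight and prior values are arbitrary.

```python
def blend_with_clicks(results, click_log, weight=0.3, prior_ctr=0.1, prior_strength=20):
    """Combine a content/link score with a smoothed click-through rate.

    results:   list of (url, content_score) pairs from the "two-dimensional" matcher
    click_log: url -> (clicks, impressions), aggregated over many users
    Returns (url, blended_score) pairs, best first.
    """
    def smoothed_ctr(url):
        clicks, shown = click_log.get(url, (0, 0))
        # Pseudo-counts pull sparse data toward prior_ctr.
        return (clicks + prior_ctr * prior_strength) / (shown + prior_strength)

    scored = [(url, (1 - weight) * s + weight * smoothed_ctr(url)) for url, s in results]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```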

Posted by Gwen at 12:32 PM

How Search Engines Work

Web Search Engines: Part 1 and Part 2, by David Hawking, Computer - How Things Work (June 2006)

"In this two-part series, we go behind the scenes and explain how this data processing "miracle" is possible. We focus on whole-of-Web search but note that enterprise search tools and portal search interfaces use many of the same data structures and algorithms."

"Part 1 of this two-part series (How Things Work, June 2006, pp. 86-88) described search engine infrastructure and algorithms for crawling the Web. Part 2 reviews the algorithms and data structures required to index 400 terabytes of Web page text and deliver high-quality results in response to hundreds of millions of queries each day."
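
The central data structure behind such an index is the inverted index, which maps each term to the list of documents that contain it. Here is a toy in-memory version with Boolean AND search. A web-scale index would compress its postings lists and spread them across many machines.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted postings list of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def search_and(index, query):
    """Ids of documents containing every query term.
    Postings are intersected shortest-first, a common optimization."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    postings = sorted((index.get(t, []) for t in terms), key=len)
    result = set(postings[0])
    for plist in postings[1:]:
        result &= set(plist)
    return sorted(result)
```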

Posted by Gwen at 12:29 PM

May 25, 2006

Semantic Web Part 2

Researchers look to semantic Web to drive Internet -- "Computer scientists discuss ideas for organizing the Internet's growing mass of data" By Jeremy Kirk, IDG News Service via Infoworld (May 24)

The idea of the semantic web is still strong, although it won't be accomplished by adding keywords to metatags. Labelling is still needed, but people have new hopes on how that might be done.

"Labeling information on the Internet involves tagging it with code and then classifying it into a taxonomy. Customized taxonomies and ontologies, or data models, could be created for different subject matters to connect disparate, rich information tucked away on servers.

It's an approach that differs vastly from current search engine technology, which may be able to find all instances of a keyword and rank a document's popularity but not interpret the context. "
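
A common way to represent such labelled information is as subject-predicate-object triples, the model used by the Semantic Web's RDF. The sketch below uses plain Python rather than an RDF library. The health vocabulary is invented, and the `?x` variable syntax is only loosely borrowed from the SPARQL query language.

```python
# Each fact is a (subject, predicate, object) triple; the vocabulary is invented.
TRIPLES = {
    ("aspirin", "is_a", "drug"),
    ("aspirin", "treats", "headache"),
    ("ibuprofen", "is_a", "drug"),
    ("ibuprofen", "treats", "fever"),
    ("headache", "is_a", "symptom"),
}

def match(pattern, triples=TRIPLES):
    """Return variable bindings for one triple pattern.
    Terms starting with '?' are variables, e.g. ("?x", "treats", "headache")."""
    results = []
    for triple in triples:
        binding = {}
        for p, value in zip(pattern, triple):
            if p.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(p, value) != value:
                    break
            elif p != value:
                break
        else:
            results.append(binding)
    return results
```

Because "aspirin treats headache" is stored as a relation rather than as co-occurring keywords, the query can ask what treats a headache, not merely which pages mention both words.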

I, for one, am not holding my breath in expectation that such taxonomies will be developed and used.

Posted by Gwen at 02:36 PM

April 18, 2006

User Behaviour and Search Results at Google

User Behaviour and Google Site Profiles By Jim Hedger, Search Engine Guide - April 17, 2006

There is a sense that Google is relying less on link analysis for ranking results and more on what it figures out about user behaviour and preferences.

"The term “user behaviours” describes any number of actions taken by people while using a Google branded search tool, while visiting a particular site in Google’s index, and while moving from site to site or document to document.

Basically, Google wants to know what its users like and dislike. Those user-judgements have become important factors in how Google ranks sites in its index and in personalized search results shown to registered users. "

Posted by Gwen at 05:46 PM

April 13, 2006

Voice Command Search

Google Talks the Talk for Search by Ben Charny, eWeek (Apr 12) - Will we be able to ask Google questions by voice, talking to our computers or through a cell phone? "Google co-founder Sergey Brin and three others on April 11 were granted a U.S. patent for technology to let the human voice command Internet search engines."

Posted by Gwen at 02:49 PM

March 11, 2006

Infonortics Search Engine Conference

Search Engine Meeting Caters To Serious Seekers by David Gardner, Information Week (Mar 9) The Infonortics Search Engine Conference will be held in Boston from April 24-25.

"The heart of the conference, Collier said, is still centered on new and offbeat developments, some of which are likely to become mainstream in the future, and he noted that Google's participation this year won't deflect from the conference's main objective of getting search freethinkers and pioneers together."

Posted by Gwen at 04:08 PM

February 13, 2006

Towards Niche

A Search Engine For Every Subject. "Google and Yahoo rule, but a flock of upstarts is offering new ways to find info". BusinessWeek Online (Feb 20)

There has been a boom in startups seeking to move search away from the one-box, one-million-hits model. Instead, people might use specialized engines.

"Instead, people may use several different search engines, each tailored to a specific task. One might specialize in blog postings, another in video clips, and a third in general information. The shift may look like the evolution of TV, from a world dominated by the Big Three networks to one in which hundreds of cable channels specialize in topics from cooking to history. "People are looking for targeted, specific information that search engines can't provide," says Michael Yang, CEO of Become.com, a search engine focused on Internet shopping."

"Social search" is one angle. Niche engines with a narrow focus are another, already being applied to shopping, real estate, health, and several other areas.

Posted by Gwen at 10:11 AM

February 10, 2006

New Search Engines

Cos. Tackle Online Searches at Conference by Matthew Fordahl, AP via Yahoo News (Feb 9). Three new search engines that analyse content were presented at the DEMO tech conference:

- Plum "lets users group Web pages, e-mail, music, pictures and files from their desktop computers into online collections that can be kept private or made public for others to find."

- Kaboodle for shopping

- Riya for searching photos.

Posted by Gwen at 01:04 PM

February 03, 2006

Trends in technology for 2006

Search engines to be key technology in 2006: Report by Jack Kapica, Globe Technology (Feb 1)

Deloitte's Technology, Media and Telecommunications Predictions 2006 sees search as being an increasingly important technology as digital content increases.

"The reason for the rising importance of search engines is the increase of the volume of digital content on-line — as much as 20 billion gigabytes in 2006 alone. Search tools will be needed to sift through such a volume of data. Searches will also extend to include data held on devices such as PCs, mobile phones, digital cameras and personal video recorders."

Expect changes -- "Technology will change people's behaviour, in the same way MP3 players now enable owners to carry their entire music collection wherever they go, game consoles created a new leisure category, and mobile devices and broadband connectivity have made working at home a reality, Deloitte says."

Posted by Gwen at 02:19 AM

January 18, 2006

Quaero - could be a long wait

Europe's 'Google killer' goes into hiding -- Project to launch a European search engine imposes 'news blackout' to avoid scrutiny -- By James Niccolai, IDG News Service (Jan 13)

Thomson didn't like all the news coverage surrounding its search engine Quaero and has shut down the web site. "It was unclear how far the work has progressed, but it seems unlikely that users will be searching the Web with Quaero any time soon. The participants are still determining how they will divide up and manage the various parts of the project, according to one source. And Waibel suggested that some of the language technologies he is working on may be years away."

Posted by Gwen at 02:22 AM

January 12, 2006

To Search for Quaero

Quaero, a new search engine being developed in Europe, is getting press, but what is there to see? This project to provide multimedia search solutions is being billed as a future challenger to Google. But there is nothing to see at the moment. The Quaero home page is blocked by a password sign-in (presumably Thomson, the owner, will remove that), and the technology is still under wraps. One thing: it will need to buy a domain. Quaero.com is already in use.

Quaero, the European Developed Multimedia Engine, Gets Press Attention - overview from Gary Price, Search Engine Watch Blog (Jan 11)

European Tech Giants Craft Search Engine by Angela Charlton, AP via Washington Post [registration] (Jan 11) - expresses some doubt about ultimate success -- "Quaero is the latest in a string of largely French-led efforts to compete with America's dominance of the global marketplace, a theme of Chirac's foreign policy."

Posted by Gwen at 02:56 PM

January 06, 2006

Human Component to Search

Search is About Communication Aaron Wall, SEO Book (Jan 6)

Suggests that one good way to improve search results is to get more information from other people, whether they contribute intentionally or as part of the system. States that "Many of the major search and internet related companies are looking toward communication to help solve their problems. They make bank off the network effect by being the network or being able to leverage network knowledge better than the other companies." Several examples are given showing the major search engines moving in this direction. The technical apparatus now in place for ranking may break down under the weight of the web's size and of spam. Sites will thrive if they can build relationships with people.

Posted by Gwen at 02:34 PM

December 23, 2005

Google explains ranking

Google's first Newsletter for Librarians was published on Dec 19 on the topic "How does Google collect and rank results?" Nothing new here, but the explanation is clear and would help new searchers understand the principles.

Posted by Gwen at 02:55 AM

December 20, 2005

Information Extraction

Information Extraction: Distilling Structured Data from Unstructured Text by Andrew McCallum, University of Massachusetts, Amherst, ACM Queue vol. 3, no. 9 - November 2005 -- describes information extraction techniques.

"Information extraction ... is the process of filling the fields and records of a database from unstructured or loosely formatted text. Thus (as shown in figure 1), it can be seen as a precursor to data mining: Information extraction populates a database from unstructured or loosely structured text; data mining then discovers patterns in that database. Information extraction involves five major subtasks (which are also illustrated in figure 2):"

The article includes some examples, such as ZoomInfo.com for extracting information about people from Web sources, CiteSeer.org for citation information from academic papers, and FlipDog.com for job openings.

It comments on the accuracy of automated extraction and looks ahead to future developments, concluding that methods for information extraction will be critical if we are to access what we need in an ever-growing mass of data.
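
Here is a rough illustration of "filling the fields and records of a database" from loose text: a hand-written pattern extractor for a FlipDog-style job ad. The patterns and field names are assumptions made for this example. The machine-learning approaches the article discusses are far more robust than fixed regular expressions.

```python
import re

# Hypothetical patterns for loosely formatted job ads; each has one capture group.
PATTERNS = {
    "title": re.compile(r"^(?:position|job title)\s*:\s*(.+)$", re.I | re.M),
    "location": re.compile(r"(?i:located in)\s+([A-Z][A-Za-z ]*?)(?=[,.;]|$)", re.M),
    "salary": re.compile(r"\$\s?(\d[\d,]*\d|\d)"),
    "email": re.compile(r"([\w.+-]+@[\w-]+(?:\.[\w-]+)+)"),
}

def extract_record(text):
    """Fill one database record from unstructured text.
    Fields that cannot be found are set to None."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        record[field] = match.group(1).strip() if match else None
    # Normalization step: "52,000" becomes the integer 52000.
    if record["salary"] is not None:
        record["salary"] = int(record["salary"].replace(",", ""))
    return record

ad = """Position: Reference Librarian
Located in Ottawa, the library seeks a searcher.
Salary $52,000 per year. Contact jobs@example.org"""
print(extract_record(ad))
```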

Posted by Gwen at 02:30 AM

November 17, 2005

Swickis for hire

Eurekster Introduces Swickis - Community-Powered Search Engines for Personal and Small-Business Websites; Swickis Are a Powerful New Way to Improve Search Relevance and Advertising Revenue by Harnessing the Knowledge of Online Communities, Business Wire via Marketwatch (Nov 16)

Eurekster has developed a new search engine it calls a Swicki to be used on individual web sites. "Swickis automatically learn from search behavior, without collecting or identifying individual user information, to deliver content and advertising that is highly relevant and valuable to a specific community. "

"Publishers are invited to create their own swickis -- free of charge -- with the Eurekster SwickiBuilder at http://swicki.eurekster.com, and can opt to share in the search-related advertising revenue, a feature that will be available soon."

Posted by Gwen at 11:16 AM

Canada's TerroGate

It's the Google of police tools "Canadian experts invent search engine to find, track down terrorists" by Sarah Staples, The Ottawa Citizen (Nov 17) [Thanks to LT for this story.]

Defence R&D Canada has developed a new search engine called TerroGate for tracking down references to terrorism in documents. At present this works on documents that have been collected, but it is to be rolled out to analyze web content and eventually real-time news feeds. The algorithms work with the "vocabulary of terrorism" on five main themes: terrorist tactics, weapons, locations, targets, and groups and individuals. Researchers identified 3,000 terms that are exclusively related to terror.

"TerroGate melds two emerging search trends. An "entity extraction" component sifts through documents tagging relevant words for easy retrieval. And the system is one of a handful in the world capable of performing "conceptual" searches, which don't merely hunt for keywords the way Google or Yahoo do, but also notions more vaguely associated with the keyword."
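
The entity extraction step can be pictured as matching text against a themed vocabulary. This sketch uses a tiny invented word list in place of the roughly 3,000 terms the researchers identified. It tags exact phrases only and does nothing like TerroGate's conceptual search.

```python
import re

# Hypothetical themed vocabulary (a stand-in for the real term list).
VOCABULARY = {
    "tactics": ["hijacking", "kidnapping"],
    "weapons": ["explosives", "rocket launcher"],
    "targets": ["embassy", "airport"],
    "locations": ["harbour district"],
}

def tag_entities(text):
    """Return (start, end, phrase, theme) for each vocabulary hit.
    Longer phrases are tried first so multi-word terms are tagged whole."""
    entries = sorted(
        ((phrase, theme) for theme, phrases in VOCABULARY.items() for phrase in phrases),
        key=lambda e: -len(e[0]),
    )
    pattern = re.compile(r"\b(" + "|".join(re.escape(p) for p, _ in entries) + r")\b", re.I)
    theme_of = {p.lower(): t for p, t in entries}
    return [(m.start(), m.end(), m.group(1), theme_of[m.group(1).lower()])
            for m in pattern.finditer(text)]
```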

The software grew out of a University of Sheffield (England) project on "entity extraction" in the mid-1990s.

There are two commercial systems - "AeroText, by a subsidiary of Lockheed Martin, and ThingFinder, by Inxight Software, Inc., which is used by the U.S. Defence Department and the U.S. army -- but they only annotate generic proper or place names in a document."

Plans for TerroGate include:

- "incorporating link analysis software that analyses relationships between references to terrorism in different documents."
- web crawlers
- displaying results in map form.
- languages other than English

Posted by Gwen at 10:41 AM

November 15, 2005

Google's Dream

The Google Story: An Excerpt "Chapter 26: Googling Your Genes" Washington Post (Nov 14) [subscription] - Excerpt from The Google Story by David A. Vise and Mark Malseed, which goes on sale on Nov 15.

Reveals that Google founders Sergey Brin and Larry Page hope to "empower millions of individuals and scientists with information that will lead to healthier and smarter living through the prevention and cure of a wide range of diseases". Specifically describes a project involving biological and genetic research.

Posted by Gwen at 07:15 PM

November 09, 2005

Morville on Findability

Ambient Findability: Libraries at the Crossroads of Ubiquitous Computing and the Internet By Peter Morville, Online (Nov / Dec)

Peter Morville, author of Information Architecture for the World Wide Web, has a new book - Ambient Findability.

"I envision a future of ambient findability in which we can find anyone or anything from anywhere at anytime. At the heart of this brave new world is a library, or rather a multitude of libraries, that help us find what we need, whether the objects sought (and the libraries themselves) are physical, digital, or in between."

Posted by Gwen at 10:48 AM

October 31, 2005

Autonomy Interprets Web

Autonomy's Consumer Division Announces Creation of Conceptual Index of World Wide Web Press Release (Oct 25) - This would be something to see - "brings next generation retrieval features to the web, including conceptual clustering, implicit query, video search and Autonomy's unique Automatic Query Guidance (AQG). Autonomy's AQG automatically returns categories of results based on the meaning of the query, providing an easy navigation facility directing users to the results they require based on a conceptual and contextual understanding of their query." But it doesn't look like this will be public; it's intended for enterprises.
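
Grouping results into labelled categories can be sketched far more crudely than anything Autonomy would ship. This example measures word overlap (Jaccard similarity) and uses single-pass "leader" clustering. It then labels each group with its most common non-query word. This is a lexical stand-in for conceptual clustering, and the snippets are made up.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "of", "for", "in", "on", "to"}

def terms(text):
    """Content words of a snippet, as a list (duplicates kept for counting)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def jaccard(a, b):
    """Overlap of two term sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_results(query, snippets, threshold=0.2):
    """Each snippet joins the first cluster whose leader it resembles
    (Jaccard >= threshold), else starts a new cluster.
    Returns a list of (label, member_snippets)."""
    clusters = []  # each entry: (leader term set, member snippets)
    for snippet in snippets:
        words = set(terms(snippet))
        for leader, members in clusters:
            if jaccard(words, leader) >= threshold:
                members.append(snippet)
                break
        else:
            clusters.append((words, [snippet]))

    query_words = set(terms(query))
    labelled = []
    for _, members in clusters:
        counts = Counter(w for m in members for w in terms(m) if w not in query_words)
        # Most frequent term wins; ties go to the alphabetically first term.
        label = min(counts, key=lambda w: (-counts[w], w)) if counts else "other"
        labelled.append((label, members))
    return labelled
```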

Posted by Gwen at 01:13 PM

October 18, 2005

Quintura in the wings

Quintura Search - Pandia Search (Oct ) Quintura promises "revolutionary web search software". Software will use dynamic clusterization and semantic maps.

Posted by Gwen at 12:58 PM

September 19, 2005

Look Ahead

Surfwax Offers Look-Ahead Technology for Web Sites Gary Price, SearchDay (Sept 19) -- "Today SurfWax is introducing a dynamic query suggestion tool that can be easily installed and customized on any web site." Price says that "Technology like this has the potential to save a user a large amount of time and aggravation by helping create a more focused and precise query, thereby getting better results. It can also help when a searcher enters general terms when they're looking for something specific."
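
A basic look-ahead can be built by completing whatever the user has typed so far from a log of past queries. The post doesn't describe how SurfWax generates its suggestions, so this is only a sketch, and the query log and counts are hypothetical. A sorted list plus binary search finds all queries sharing the typed prefix, and the most popular ones are offered first.

```python
import bisect
from collections import Counter

class QuerySuggester:
    """Prefix completion over a (hypothetical) log of past queries."""

    def __init__(self, query_counts):
        self.counts = Counter()
        for query, count in query_counts.items():
            self.counts[query.lower()] += count
        self.sorted_queries = sorted(self.counts)

    def suggest(self, prefix, limit=3):
        prefix = prefix.lower()
        # Binary search to the first query >= prefix; matches are contiguous.
        start = bisect.bisect_left(self.sorted_queries, prefix)
        matches = []
        for q in self.sorted_queries[start:]:
            if not q.startswith(prefix):
                break
            matches.append(q)
        # Most popular first; ties broken alphabetically.
        matches.sort(key=lambda q: (-self.counts[q], q))
        return matches[:limit]
```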

Posted by Gwen at 04:28 PM

September 15, 2005

Semantic Web

A Beautiful, Networked World? SAP Info (Aug 29)

"In a conversation with Andreas Blumauer, project manager at the Semantic Web School in Vienna, SAP INFO online illustrates how far the vision of Berners-Lee, the founder of the Web, has already taken form in the real world and the actual advantages of the Semantic Web of the future – far from any technological infatuation."

Posted by Gwen at 01:55 AM

September 03, 2005

Smarter Search Soon

On the Frontier of Search by Terry McCarthy, Time magazine (Aug 28) -- Predicts a future where search engines will be "smarter and more tailored to the individual, embrace video and music--and be accessible from any device with a chip."


+ Singingfish for image and video search is mainstream, but Viisage will recognize faces.
+ Cell phone search facilities to find local services as you walk down the street and even give information on an object you've just pointed the camera-phone at (from Mobot).
+ KnowItAll for getting answers.
+ More tagging and finding through tagging.
+ Blinkx.TV for tracking down video clips.
+ Satellite online maps - Google, MSN, A9.
+ Personalization, starting with Findory for news and now adopted in the new Google Desktop.

Posted by Gwen at 08:27 PM

August 29, 2005

Autonomy moves on China

Autonomy positions itself for content wars By Maija Palmer, FT.com (Aug 28) -- Autonomy, noted for its technology for handling unstructured content, is working with the Chinese to create a service for searching news and video. Prior to this Autonomy had been working with Blinkx for video search.

"Where Google and Yahoo rely on having video clips manually catalogued and tagged so that they can be searched using key words, Autonomy uses voice recognition software – also used by the US Department of Homeland Security to eavesdrop on terrorists – which automatically catalogues every spoken word in hundreds of thousands of hours of footage."

Posted by Gwen at 04:30 AM

August 22, 2005

Search Everywhere

Friday Book Excerpt: More on Perfect Search, John Battelle's SearchBlog (Aug 19) - a preview of the final chapter of his book about search - Search Everywhere.

Posted by Gwen at 06:56 PM

Masters of Information

E-Gang - Eight Masters Of Information Edited By Elizabeth Corcoran, Forbes (Aug 18) - This seventh E-Gang review by Forbes presents "the Masters of Information--those entrepreneurs and companies figuring out how to separate the gold from the gravel on the Web."

+ Barry Diller - IAC/InterActiveCorp, which now owns Ask Jeeves.
+ Caterina Fake and Stewart Butterfield - created Flickr for sharing photos
+ Jeffery Jonas - IBM Entity Analytics
+ Ellen Siminoff - Efficient Frontier for picking keywords for online ads.
+ Peter Norvig - Google's director of search quality
+ Jimmy Wales - father of Wikipedia. It has 2.2 million articles in 100 languages.

The entry about Peter Norvig mentions clustering: "Now Google's statisticians develop algorithms that look at how closely one query links to another and how groups of queries interact. Studying word "clusters" helps determine whether a search term like "Blondie" means the comic strip or the punk-pop band from the 1980s. Norvig's crew also aims to accelerate results by learning which irrelevant words (like "like") to discard when indexing a Web page."

Posted by Gwen at 06:14 PM

August 19, 2005

Deep web through custom data extraction

Diving deep into the Web by Michael Bazeley, Mercury News (Aug 17)

At Glenbrook Networks, the Komissarchik father and daughter team are developing a search engine that will do "custom data extraction" from databases that standard search engines can't touch.

"Komissarchik and her father, Edward Komissarchik, say they have figured out how to analyze the forms on Web pages and understand the type of information the sites are looking for. Then, Glenbrook's Web crawlers use artificial intelligence to walk themselves through sometimes complex Web forms, answering questions, such as the location of their desired job, in the same way a human would."

Posted by Gwen at 12:08 PM

June 30, 2005

Search Labs and Blogs

What's Cooking in Search Engine Labs by Chris Sherman, SearchDay (Jun 30) - Lists the various labs at the major search engines and blogs that discuss developments in search features. Google, Microsoft, Yahoo, Ask Jeeves are here as well as CiteSeer for computer scientists in academia and, at the opposite end, Shopzilla's Robozilla for developments in shopping.

Posted by Gwen at 08:00 PM

June 24, 2005

Louis Monier at Google

Louis Monier On Why He's Going To Google by John Battelle (June 24) - Louis Monier was one of the great search minds at AltaVista when it was the best. After four years at eBay, he is moving again, this time to Google. In explaining the move he said, "So rather than chewing on variations of e-commerce for the next few years, I'm very tempted to play with radically new stuff: satellites images, machine translation, ways to extract knowledge from giant bodies of data ... who knows what else?" This might give us some hints about what Google will be doing.

Posted by Gwen at 04:27 PM

June 23, 2005

New Ranking at MSN?

MSN Search and Learning to Rank by Greg Linden, Geeking with Greg (June 21) - translates into layman's language a paper written by Microsoft's Chris Burges (and others) about using neural networks in relevance ranking - as it seems Microsoft intends to do.

Also see Danny Sullivan's comments in MSN Search Gets Neural Net/RankNet Technology & (Potentially) Awesome New Search Commands. MSN Search may have adopted new ranking algorithms to improve its search, but on the search Sullivan ran, Google, Yahoo and Ask Jeeves did just as well.

Posted by Gwen at 01:46 PM

June 15, 2005

Hierarchies - In or Out?

Google's War on Hierarchy, and the Death of Hierarchical Folders
by John Hiler, Microcontent blog (May 10)

Finds that hierarchical organization of information (subject trees or taxonomies) is under attack from believers in keyword searching, Google in particular. Google, Hiler finds, is anti-hierarchical - witness the lack of folders in GMail and Google Desktop Search.

Article reviews the history of web hierarchies, starting with Yahoo Directory, LookSmart and the Open Directory Project. Google's page-ranking algorithm based on linkages vastly improved relevance (at least for a time), and people left the directories in droves to use Google (tho we should remember that AltaVista was a strong search engine then too). In March 2004 Google sidelined its use of ODP and, according to Hiler, killed directories.

"As Google's Director of Search Quality put it, "We analyzed what people were using, and [directories] become less popular over time. As the web grows, directory structures get harder [for consumers] to use.""

The last is an interesting statement. I don't think directory structures get harder to use at all, and in a world of unmediated search results, some classification is an aid for providing context. However, it is true that manual classification is very labour intensive.

Hiler reviews the history of folders used for organizing email in Outlook, Hotmail and other web mail programs. Then came Google's GMail, where the bins have been reduced to just an inbox and an archive (tho you can add labels). You search your "conversations" by keyword.

Desktop is the third area. People could relate to the filing cabinet metaphor, but who could find the right drawer? Desktop search from Google, MSN and others makes it much easier to search across folders - in fact to ignore folders. Hiler doesn't mention, though, that you might wish to restrict the indexing to specific folders.

Article concludes -- "But Folders rarely solve the core problem that they address - and often create new ones, like forcing you to create new folders just to manage new information. Solutions like Search, Archives, Stars and Labels get more directly at the core problem... and promise that the future of information management will look very different from its past."

Posted by Gwen at 06:30 PM

June 13, 2005

Looking for intelligent search

Enough Keyword Searches. Just Answer My Question by James Fallows, New York Times (June 12). James Fallows finds - "Search engines are so powerful. And they are so pathetically weak." He describes the difficulty of determining the right keywords to find information on changes in California's spending on its schools - and of "trying to outguess the engines". How much better it would be to use something like Aquaint, a project by US federal intelligence bodies, that will handle "advanced question answering for intelligence".

Fallows mentions several engines whose added features he does appreciate - Ask Jeeves for broadening and narrowing results and offering suggestions, Vivisimo for categorizing, Grokker for visual presentation, and his favourite Mr Sapo "because it allows quick, easy comparisons of the results of the same search on virtually any major engine."

While I appreciate his frustration with word guessing, this would be an occasion for bringing in an information professional who knows how and where to look for statistical and specialized databases, free and for-fee. Using Google or any other general purpose search engine with or without search aids would only find some bits and pieces on this question.

In the article he also endorses Roboform for handling the myriad of usernames and passwords. He also mentions that Google Maps' satellite views of some places in the US are camouflaged: the vice-president's compound in Washington DC (though not the White House) and downtown Albany.

Posted by Gwen at 11:24 AM

June 03, 2005

Longhorn Interface

Longhorn goes beyond search By Rafe Needleman, CNet Reviews (June 1) - Advance look at the user interface for Longhorn, the next Windows operating system. Describes folders, search, tags, and visual aids and mentions that many of these are available as add-ons today.

Sees this future: "I'm betting that contextual and audio/visual searching can't be far behind. And at some point in the future, we'll be able to search for documents on our hard disk "about rent" without having to match search terms, or direct our system to find pictures of Grandma given just one picture of her, or find orchestral-sounding music given a sample of it. Whether Microsoft ships these tools first is an open bet; but I'd wager that this is what Google, Yahoo, Apple, and other search companies will try to do to stay ahead."

Also, the next generation of tools will have to be able to handle "digital assets" in general - on mobile devices, online services, digital media - not just the home computer.

Posted by Gwen at 09:47 AM

May 24, 2005

Search Clutter

SearchTHIS: Clutter, Relevancy, & Search - by Kevin Ryan, IMediaConnection (May 10) - there is so much going on in search with new services for multimedia, new and varied applications for social groups, and smart answers (to name a few) that Ryan asks -- "With all of this activity, ... Just how thick with relevant results can a search engine results page become before relevancy gives way to clutter? How will the searching public react to all of these changes? History has taught us that clutter equals disaster in search, and we might just have to take a breath before integrating everything but the kitchen sink into search."

Posted by Gwen at 03:50 PM

May 12, 2005

AI in Search

If Search Engines Could Read Your Mind by Chris Sherman, SearchDay (May 11) - Artificial Intelligence is almost here for search. Sherman tips us off to 20Q.net, a program that asks 20 questions about an object you think of and can often guess the object. It's based on a neural network, as Sherman explains.

"To a certain degree, search engines already employ similar systems. Just as 20Q.net starts out with broad questions (is it animal, mineral, or vegetable) to "prune the tree" of possible branches, search engines do the same thing with the few clues offered by your search terms, eliminating thousands or millions of possibilities before even considering possible matches."
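To make the "prune the tree" idea concrete, here is a toy Python sketch of question-by-question elimination. It is plain rule-based pruning over a hypothetical, hand-made table of yes/no attributes, not the neural network 20Q.net actually uses: each round it asks the question that splits the remaining candidates most evenly, then discards every candidate inconsistent with the answer.

```python
# Hypothetical toy knowledge base: each object maps to yes/no attributes.
OBJECTS = {
    "dog":    {"animal": True,  "mineral": False, "has_fur": True,  "barks": True},
    "cat":    {"animal": True,  "mineral": False, "has_fur": True,  "barks": False},
    "salmon": {"animal": True,  "mineral": False, "has_fur": False, "barks": False},
    "quartz": {"animal": False, "mineral": True,  "has_fur": False, "barks": False},
    "carrot": {"animal": False, "mineral": False, "has_fur": False, "barks": False},
}


def best_question(candidates, asked):
    """Pick the unasked attribute that splits the candidates most evenly."""
    attributes = {a for obj in candidates for a in OBJECTS[obj]} - asked

    def imbalance(attr):
        yes = sum(OBJECTS[obj][attr] for obj in candidates)
        return abs(2 * yes - len(candidates))  # 0 means a perfect half/half split

    return min(sorted(attributes), key=imbalance) if attributes else None


def play(answer_for):
    """Ask questions until one candidate is left; answer_for(attr) returns True/False."""
    candidates = set(OBJECTS)
    asked = set()
    questions = 0
    while len(candidates) > 1:
        attr = best_question(candidates, asked)
        if attr is None:
            break  # nothing left to ask
        asked.add(attr)
        questions += 1
        reply = answer_for(attr)
        # Prune every branch that is inconsistent with the answer.
        candidates = {obj for obj in candidates if OBJECTS[obj][attr] == reply}
    return candidates, questions
```

Thinking of "cat", the sketch gets there in three questions. An even split roughly halves the field each round, which is why 20 yes/no questions can in principle separate about a million (2^20) possibilities.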

Posted by Gwen at 12:53 AM

May 05, 2005

Infonortics Search Engine Meeting 2005

Presentations from the 2005 Search Engine Meeting are Now Available Online - listing of presentations from the Infonortics Search Engine Conference held in April 2005. This is always an excellent conference with analysis and a view to the future. Gary Price has picked out the presentations most focused on search. Full listing is at Search Engine Meeting 2005.

Posted by Gwen at 02:10 PM

April 30, 2005

Stuff I've Seen

Search Engine Watch Forums has a threaded discussion about the future of search and indexing as visualized by the Microsoft Research project, Stuff I've Seen (SIS). In particular, it points to a presentation by Susan Dumais delivered to the Infonortics search engine conference.

http://www.infonortics.com/searchengines/sh05/slides/dumais.pdf (April 2005)

Posted by Gwen at 05:52 PM

April 28, 2005

Overload

PC Users Drowning in Data, Microsoft Says - Ted Bridis, AP in Globe and Mail. (Apr 26)

"Computer storage technology is getting so cheap a person could record every conversation of a lifetime and decades of photographs, but experts must improve search systems so users can make sense of such mind-boggling amounts of information, Microsoft's top research executive said Tuesday."

Posted by Gwen at 06:01 AM

April 14, 2005

Search Algorithms

Search Engine Algorithms & Research By Christine Churchill, SearchDay (April 14) -- a peek into the ways the algorithms work to rank results.

+ Ask Jeeves / Teoma: "Lahiri confirmed that Ask Jeeves looks at the web as a graph and looks at the link relationships between them, attempting to map clusters of related information. By breaking down the web into different communities of information, Ask Jeeves can rely on the "knowledge" from authorities in each community to better understand a query and present more on-topic results to the searcher. If you have a smaller site, but one that is very relevant within your community, your site may rank higher than some larger sites that provide relevant information but are not part of the community."

+ Co-occurrence: identify and use semantic associations between terms.

+ Future: "introduction of probabilistic latent semantic indexing and probabilistic hyper text induced topic search"
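Lahiri's description of treating the web as a graph and leaning on authorities within communities echoes Jon Kleinberg's HITS (hyperlink-induced topic search). As a rough illustration, here is a minimal Python sketch of plain, non-probabilistic HITS on a hypothetical six-page link graph; it is not Ask Jeeves's actual ranking. Pages pointed to by good hubs become authorities, and pages that point to good authorities become hubs.

```python
import math


def hits(links, iterations=50):
    """Kleinberg-style HITS: mutually reinforcing hub and authority scores.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is pointed to by good hubs.
        auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        # A good hub points to good authorities.
        hub = {p: sum(auth[t] for t in links.get(p, ())) for p in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for p in scores:
                scores[p] /= norm
    return hub, auth


# Hypothetical community: three hub pages pointing at topic sites.
links = {
    "hub1": ["siteA", "siteB"],
    "hub2": ["siteA", "siteB"],
    "hub3": ["siteA"],
    "siteB": ["siteC"],
}
hub, auth = hits(links)
```

Here siteA comes out as the strongest authority because every hub in the community points to it, even though nothing about its size is known; that is roughly the effect Lahiri describes for smaller but well-connected sites.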

Posted by Gwen at 02:30 PM

April 13, 2005

Yahoo Labs

Yahoo focuses on research CNet.com (Apr 12) - Competition in the search labs at Google, Yahoo and MSN. Yahoo has hired Usama Fayyad from NASA to head up the Yahoo Research Labs. Of interest: Gary Flake, who used to be principal scientist at Yahoo Research Labs, has moved to Microsoft.

"Yahoo's lab will be developed into a center for innovation with scientists from all over the world, the company said. The lab will tackle scientific problems in search and information navigation, personalization and mobility, Yahoo said. It also will work on designing algorithms to support new technologies."

Yahoo Next is Yahoo's showcase for new tools.

Posted by Gwen at 01:38 PM

April 07, 2005

Google's Changeable Titles

Seroundtable.com noticed that Google is Showing Dynamic Titles. That means the title it shows for a page may vary with the search terms you use. The example given was rustybrick. Search for rustybrick alone and you get the home page with that as the title. Search for rustybrick web and you get another title, in fact the actual title of the page. The first title comes from the entry in the Google Directory for Rustybrick.

Shows that Google is using its directory for something. But what happens if you search for the word in the title with intitle:rustybrick? You get neither version, although Rustybrick does show as a Sponsored link.

Posted by Gwen at 03:41 PM

Semantic Web

The Evolution Of Web Search by David M. Ewalt, Forbes (Apr 6) -- Future of search is in the Semantic Web - tagging and identifying relationships.

"A more familiar example of tagging might be Froogle, Google's comparison shopping service. Retailers who want their Web sites to show up in Froogle searches have to update their product pages with hidden labels on things like price, name and manufacturer. Everyone uses the same tag for price, regardless of what they actually call it, so Google can easily collect product information from thousands of different stores, even if they're in different languages. "

Posted by Gwen at 03:16 PM

April 05, 2005

Tim Bray

A Conversation with Tim Bray in ACM Queue vol. 3, no. 1 - February 2005 -- "Searching for ways to tame the world’s vast stores of information".

Tim Bray, co-founder of Open Text, is director of Web technologies at Sun Microsystems. In this interview he talks about his work with the OED (Oxford English Dictionary) project at the University of Waterloo, Open Text, use of SGML and development of XML and RDF.

Posted by Gwen at 02:17 PM

April 01, 2005

Future of Search

Search For Tomorrow - by Thomas Claburn, Information Week (Mar 28) -- "Google may lead in Web searches, but investment in emerging technologies will open up new ways of searching digital information. Part 3 in the series The Future Of Software"

This is the future: "Google may have the market lead looking for Web pages, but fast-growing business and government investment in emerging IT areas such as Internet phone calls, electronic medical records, and anti-terrorism technology is driving demand for new ways of searching digital information. The goal is to extract information from databases, Web pages, documents, or audio and video clips automatically; recognize the names of people, places, organizations, dates, and dollar amounts; and find the relationships among them. Mining sounds and images for meaning is also important as companies expand call centers and switch to Internet-based phone calls and as the government pours money into IT for intelligence and homeland security."

Posted by Gwen at 05:44 PM

March 25, 2005

Free tagging

Yahoo's game of photo tag -- Stefanie Olsen, Cnet (Mar 22) -- Discusses the free tagging of photos at Flickr, the online photo sharing service that Yahoo just bought, and the possible expansion of such "folksonomies" to a "global categorization of information".

""The future of folksonomies involves meshing these user-generated categorizations with more standardized categorizations, such as the Library of Congress or the Getty Thesaurus of place names, so you could start to connect data to allow more of these associations to be made," Merholz [Peter Merholz, a founder at Adaptive Path] said."

Posted by Gwen at 09:01 PM

March 11, 2005

Semantic Web For Enterprises

Next big step for the Web--or a detour? by Paul Festa, ZDNet (Mar 9)

Speakers at the Semantic Technology Conference discussed whether enterprise applications for the Semantic Web will be the next wave.

"Just as the Web encompassed existing Internet technologies while adding its revolutionary system of hyperlinks, so, they claim, will the Semantic Web give birth to vastly more powerful ways of gleaning information from the world's computer network."

First they have to sell the concept -- "The Semantic Web protocols aim to let computers distinguish different kinds of data. Armed with those distinctions, applications could more automatically trade information, for example between an online address book and a cell phone. A Web site could automatically reconfigure itself on the fly based on the needs of a particular visitor. Search engines could narrow down results with greater precision."

Article points to a few real-world implementations of the Semantic Web. But a world of interchangeable data does come with concerns about security and privacy.

Posted by Gwen at 12:49 PM

March 08, 2005

User Taxonomy

Folksonomies - Cooperative Classification and Communication Through Shared Metadata by Adam Mathes, at Computer Mediated Communication, University of Illinois Urbana-Champaign (December 2004) - examines the user classification done at services like Furl.net, Flickr, and del.icio.us. Argues that "The primary problem with this approach is scalability and its impracticality for the vast amounts of content being produced and used, especially on the World Wide Web." On the other hand, involving users in organizing information may mean picking up new terms and patterns of use earlier.

Posted by Gwen at 04:33 PM

Under Google's Hood

Slashdot has an entry on Google's Technology Explored (Mar 3) gleaned from several articles.

Especially Peeking Into Google By Susan Kuchinskas, InternetNews (Mar 2)

Of interest: "As a search query comes into the system, it hits a Web server, then is split into chunks of service. One set of index servers contains the index; one set of machines contains one full index. To actually answer a query, Google has to use one complete set of servers. Since that set is replicated as a fail-safe, it also increases throughput, because if one set is busy, a new query can be routed to the next set, which drives down search time per box."
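The description suggests a simple picture: each replica set holds the whole index split into shards, one query has to visit every shard in one set, and a front end can send the next query to whichever set is free. Here is a toy Python sketch under those assumptions; the class names, the least-busy routing rule and the tiny sample index are all hypothetical, not Google's actual design.

```python
from dataclasses import dataclass


@dataclass
class ReplicaSet:
    """One complete copy of the index, split into shards on several machines."""
    name: str
    shards: list  # each shard maps a term to a set of document ids
    in_flight: int = 0  # queries this set is currently serving

    def search(self, term):
        # Answering one query means consulting every shard in the set.
        matches = set()
        for shard in self.shards:
            matches |= shard.get(term, set())
        return matches


class Dispatcher:
    """Front end that picks a replica set for each incoming query."""

    def __init__(self, replica_sets):
        self.replica_sets = replica_sets

    def route(self):
        # Every set holds the full index, so any of them can answer;
        # pick the one with the fewest queries in progress.
        return min(self.replica_sets, key=lambda rs: rs.in_flight)

    def query(self, term):
        rs = self.route()
        rs.in_flight += 1
        try:
            return rs.name, sorted(rs.search(term))
        finally:
            rs.in_flight -= 1


def make_shards():
    # Hypothetical index split across two machines.
    return [{"google": {1, 2}, "search": {2}}, {"google": {7}, "news": {8}}]


replicas = [ReplicaSet("set-a", make_shards()), ReplicaSet("set-b", make_shards())]
dispatcher = Dispatcher(replicas)
```

If set-a is busy with three queries, a query for "google" goes to set-b, which still gathers documents 1, 2 and 7 from both of its shards. Adding another replica set adds throughput without changing how any single query is answered.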

Google is also applying machine learning to know that one thing can relate to another even though there isn't an exact match on words. Clustering is part of the process.

"To do this, the system tries to cluster concepts into "reasonably coherent" subclusters that seem related. These clusters, some tiny and some huge, are named automatically. Then, when a query comes in, the system produces a probability score for the various clusters. " Google uses this for its contextual ads and to cluster news stories in Google News. Now - if they would just add it to web search results.

Posted by Gwen at 11:19 AM

March 05, 2005

Data Mining the Internet

Net searcher has its ears to the blog: Faster information on trends promised by prototype tool, by Sarah Staples, CanWest News Service (Feb 28) -- Accenture Technology Labs, in Palo Alto, Calif. has a tool - Online Search -- that "focuses on several thousand influential sources of online news and gossip that have traditionally been less accessible to search algorithms - from chat rooms and bulletin boards, to Usenet groups, fan sites and blogs written by amateur scribes. From those, it identifies hot topics and monitors people's positive or negative reaction to the next new thing."

Posted by Gwen at 11:50 AM

February 28, 2005

Interview with Ask Jeeves

"In conversation with..." Jim Lanzone & Apostolos Gerasoulis of Ask Jeeves/Teoma by Mike Grehan, e-Marketing News (Feb 2005) -- In conversation with Jim Lanzone, Senior Vice President, Search Properties at Jeeves and Apostolos Gerasoulis, founder of the Teoma search engine. Lots of gems in this. Grehan undertook this conversation, reproduced here as a transcript, as part of his research for a book on search engine marketing.

+ Reviews ranking technologies. Grehan refers to another research paper he wrote about the use of link analysis in Google's PageRank and the topic clustering done by Teoma based on Kleinberg's algorithm. Gerasoulis says that Google is only using its PageRank to break ties -- "The importance has diminished because PageRank is just one piece of the ranking algorithm over there. The ranking algorithm is so much more complex now. And PageRank is just used when they want to break ties."

+ Ask Jeeves doesn't intend to absorb Excite, iWon and MyWay, but it might switch these portals over to the Teoma search.

+ Is it wise for Yahoo to index XML feeds and web sites? These three men say not. "It's mixing apples and oranges, the structured data with the unstructured."

+ "Majority of searches on the web are non-commercial".

+ Gerasoulis expects 2005 will be an exciting year for search engines. "Now it's not just about communities, it's about the users. There are new technologies coming in which will change the way that people access information."
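Gerasoulis's claim that PageRank is now "just used when they want to break ties" is easy to picture in code. Below is a minimal Python sketch, assuming a hypothetical four-page link graph and made-up relevance scores: a basic power-iteration PageRank, then a sort in which PageRank only decides between results whose relevance scores are exactly equal. Google's real ranking is far more complex and isn't public.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Basic PageRank by power iteration; dangling pages spread rank evenly."""
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                for t in pages:
                    new[t] += damping * rank[p] / n
        rank = new
    return rank


def rank_results(relevance, pr):
    """Order by relevance score; use PageRank only to break exact ties."""
    return sorted(relevance, key=lambda page: (-relevance[page], -pr[page]))


# Hypothetical link graph and query relevance scores.
links = {"a": ["b"], "b": ["c"], "c": ["b"], "d": ["b"]}
relevance = {"a": 0.9, "c": 0.7, "b": 0.7, "d": 0.5}
pr = pagerank(links)
```

Page "a" stays on top despite having the lowest PageRank, because its relevance is higher; only "b" and "c", tied at 0.7, are ordered by PageRank.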

Posted by Gwen at 12:46 AM

February 26, 2005

Smart Search Might Be On The Way

Web searching made more successful with automated, personalized assistance system - from Penn State, PhysOrg.com (Feb 18) -- search software in the future might give better advice by watching what the searcher does.

Of interest >> "A Penn State researcher has developed software that improves Web searching with a personalized system that offers automated assistance for structuring and refining queries, evaluating search results and finding more relevant information. "Research shows 50 percent of all Web results retrieved are not relevant, pointing to a need for improved searching techniques," said Jim Jansen, assistant professor of information sciences and technology. "This technology enabled a 20-percent performance increase.""

Posted by Gwen at 04:45 PM

Google Algorithm

Google Watchers See Shift In Algorithm by Shankar Gupta, Online Media Daily (Feb 22) -- Signs that Google has changed its relevance ranking algorithm -- "... new formula appears to give more weight to sites that have content, not just sponsored links and a navigation bar. And Google apparently now evaluates the anchor text to determine if it's related to the site content, or is just the same word over and over again--in which case the site's rank would fall."

Posted by Gwen at 04:39 PM

February 01, 2005

Future of Search

In search of more: the ‘friendly’ engines that will manage the data of daily life By Richard Waters, FT.com (Feb 1) Futuristic view of what searching may become.

Of interest -- "Users will want more direct responses to their search queries, the experts acknowledge. "The biggest change we will see in the next five years will be in the way people use computers," says Mr Silverstein. Mobile handsets will become the most common way to find information on the internet, he adds. At that point, most queries would best be made and answered by voice. If the search companies become a more integral part of everyday life, how far will their influence eventually extend - and what impact will they have on other companies that exist to create or distribute information?"

Posted by Gwen at 11:53 PM

January 27, 2005

Future of Search

Seeking Better Web Searches "Deluged with superfluous responses to online queries, users will soon benefit from improved search engines that deliver customized results" By Javed Mostafa, Scientific American (Jan 24) Sweeping article about the trends in search, starting with a review of the ranking algorithms, personalization initiatives and the potential for full customization that will include location, plus advances in searching for images and music.

Conclusion: "By leveraging advances in machine learning and classification techniques that will be able to better understand and categorize Web content, programmers will develop easy-to-use visual mining functions that will add a highly visible and interactive dimension to the search function. Industry analysts expect that a variety of mining capabilities will be available, each tuned to search content from a specialized domain or format (say, music or biological data). Software engineers will design these functions to respond to users' needs quickly and conveniently despite the fact they will manipulate vast quantities of information. Web searchers will steer through voluminous data repositories using visually rich interfaces that focus on establishing broad patterns in information rather than picking out individual records. Eventually it will be difficult for computer users to determine where searching starts and understanding begins."

Posted by Gwen at 03:35 PM

January 15, 2005

Clustering Results

Some bits are turning up about clustering search results including a paper written about why and how to do it -- Learning to Cluster Web Search Results, Microsoft Research Asia. Paper finds that current clustering approaches don't produce good labels, and they propose a new method that uses and ranks "salient names".

"Our method is more suitable for Web search results clustering because we emphasize the efficiency of identifying relevant clusters for Web users. It generates shorter (and thus hopefully more readable) cluster names, which enable users to quickly identify the topics of a specified cluster. Furthermore, the clusters are ranked according to their salience scores, thus the more likely clusters required by users are ranked higher."

Other experimental bits are mentioned in Web Search Clustering from Microsoft (and other Clustering Tools) Search Engine Watch Blog (Jan 11)

Posted by Gwen at 04:02 PM

December 30, 2004

Battelle on 2005

A Look Ahead by John Battelle (Dec 22) - predictions for 2005: the blogosphere will get more fractious; Firefox will win over 15% of the browser market, but Microsoft will release a good upgrade; Yahoo and Google will do even more for merchants - and several more.

Posted by Gwen at 02:09 AM

December 27, 2004

Search Architecture War

What’s Next for Google By Charles H. Ferguson. Technology Review (Jan 2005) Sees that the "search industry is ready for an architecture war" -- "Architecture wars (also known as standards wars) occur because information technology markets require standards in order to manage complexity, communication, and technological change." Google and Microsoft are the main contenders. Examines strategies, past and present, of each and observes Google to be in the more precarious position. Shareholders, take note.

Posted by Gwen at 06:31 PM

Google Spidering URLs

Google Now Indexing Up to Six Url Variables Search Engine Roundtable (Dec 7) Google has been seen spidering URLs that contain 6 variables, showing that it is getting better at penetrating into databases.

Posted by Gwen at 04:13 PM

December 01, 2004

Topic Maps for Search

Searching Smarter, Not Harder by John Gartner, Wired (Nov 30) Some organizations are constructing topic maps to categorize content and show aspects and relationships. An example given was William Shakespeare -- " ... would be mapped to essays about him, his plays and his famous quotes." Topic maps are created by computers and modified by humans. Mentions work in Europe at Ontopia, Mondeca and Empolis to develop commercial applications.

Posted by Gwen at 01:36 PM

November 25, 2004

Personalization of Search - impossible?

Narrowing the search November 22, 2004, By Raul Valdes-Perez, News.com -- Notes several drawbacks to the personalization of search, most particularly that it's difficult to infer interest from what people click on. He sees a better future in clustering techniques. Mind you, Valdes-Perez is CEO and co-founder of Vivisimo, which is responsible for the leading technology for clustering results.

Posted by Gwen at 04:52 PM

Natural Language Processing

"Advanced Search Techniques using Natural Language Processing" by Tony Rose, Freepint Nov 25, 2004 - overview article about work to improve information retrieval using natural language processing techniques.

Posted by Gwen at 03:07 PM

November 18, 2004

MSN Search will use Rosette Linguistics

Basis Technology to Enhance Multilingual Search in New MSN Search Engine Business Wire via CBS Marketwatch (Nov 17)

"Basis Technology today announced that Microsoft Corp. has chosen the Rosette Linguistics Platform to support Web searches in its new MSN search engine." ... "The Rosette Linguistics Platform uses state of the art Natural Language Processing techniques to improve information retrieval, text mining and other applications and apply them to global markets. Rosette provides capabilities like identifying the language of incoming text, providing a normalized representation in Unicode, and locating names, places and other key concepts."

Posted by Gwen at 11:46 PM

November 13, 2004

Microsoft v Google

A Google-Microsoft War by John Dvorak, PC Magazine (Nov 16) Predicts an all-out war between Microsoft and Google with similarities to the Netscape - Microsoft war. Who's Netscape in this competition? Google? Is Google trying to create a browser-centric online environment?

Posted by Gwen at 12:58 PM

October 20, 2004

Matthew Koll on Web Search

A Conversation with Matthew Koll by Gary Price, SearchDay (Oct 18) Matthew Koll, once CEO of Personal Library Software and now of Wondir, spoke to Gary Price about the state of the web search industry. Some comments:

+ Google is in the business of advertising and maybe 50% in information retrieval.
+ Searchers do need specialized tools but "knowing where to look is the first and biggest obstacle to overcome in searching."
+ Future - "voice access and task integration"

Posted by Gwen at 12:46 AM

October 17, 2004

Stochasto for natural language search

The Answer Search natural language search engine "The Norwegian company Stochasto is getting ready to launch their natural language search engine, Answer Search, in English." Pandia (Oct 15) - looks promising but won't be available until Q1 2005.

Posted by Gwen at 03:16 PM

Meta description tag

The Meta Description Tag and Search Engines Jill Whalen. ISEDB.com (Oct 14) "The keywords and phrases you use in your Meta description tag don't affect your page's ranking in the search engines (for the most part), but this tag can still come in handy in your overall SEO campaigns."

The author tested the use of meta description tags at Google, Yahoo, Teoma and MSN.

Google - will use a snippet from the meta description tag if the search term appears both in the page text and in the description tag.

Yahoo - does show the meta description tag on some keyword queries, depending on the occurrence of words in the text (the exact rules are not clear). It will also search on the tag and display the record even if the search term appears only in the description tag. And lastly, on a URL search it shows the meta description tag if available.

Teoma looks at the meta description but does not necessarily display it.

Posted by Gwen at 02:54 PM

October 14, 2004

Natural language search at H-Bot

H-Bot is an "automated historical fact finder" developed at the Center for History and New Media. It responds to natural language questions. When did Scott go to Antarctica? When did Louis Riel die? (But it can't tell you what Louis Riel did.) Interesting.

Reviewed by Tara Calishain -- H-Bot Answers Historical Questions (Oct 12)

Posted by Gwen at 03:06 PM

October 13, 2004

Google's Intentions

Google's Web 2 Demo and the UI Plunge by John Battelle, SearchBlog (Oct 12) Reports on Google's demos at the Web 2 conference for language translation (seemed powerful), named entities and clustering.

Named entity extraction: "essentially identifying semantically important concepts and the meaning wrapped around them".

Also predicts that Google will follow Ask Jeeves, A9, and Yahoo in using search history and personal data to filter and rank search results.

Posted by Gwen at 02:07 PM

October 08, 2004

When Will Google Cluster?

Google Sets Sights on Clustering, Translation By Matt Hicks, EWeek (October 7, 2004) - Finally, the improvement we've all been waiting for. Google previewed work in clustering entities and words at the Web 2.0 conference. Unfortunately a beta version is not available yet.

Posted by Gwen at 08:05 PM

September 23, 2004

New search seeks answers

Search Me: Online Search Shifts from a Navigational Tool to a Customer Service and Educational Tool Tim Carpenter, Senior Analyst, Watchfire GómezPro, Insurance Technology Online (Sep 22)

"... search has taken a different turn in the financial services industry and is being used increasingly as both a customer service and educational tool, with the goal being to answer precise questions rather than to direct users to a specific product or area of the site".

Posted by Gwen at 01:27 PM

September 22, 2004

Flash files can be indexed

We should start seeing more FLASH (swf) files in search results now that Macromedia has made it easier for search engine spiders to read and index the files. Major search engines are said to have adopted the patch provided by Macromedia.

Search Engines Can See the Movies - Macromedia FLASH SDK Internet Search Engine Database (Sept 20)

Posted by Gwen at 02:07 PM

September 13, 2004

Web Search Technical Resource

Web IR & IE - Information Retrieval and Information Extraction. The site has publications, mailing lists, newsgroups, and names of people active in this area.

Reviewed by Chris Sherman in Search Engines 201 (Sept 13)

Posted by Gwen at 06:39 PM

August 15, 2004

Natural Language Promised

Kozoru wants to give relevant answers to your questions Lars Iselid, Pandia (Aug 22) John Flowers hopes to create a natural language search engine - Kozoru - by building up a knowledge database. Good luck.

Posted by Gwen at 05:29 PM

August 09, 2004

Where search is headed

Next-generation search tools to refine results By Michael Kanellos. CNET News.com (Aug 9)

Report from the New Paradigms for Using Computers Conference, held at IBM's Almaden research lab. New ways of searching for information will involve connections either assigned (classification) or discovered (latent). Mentions work by the University of California at Berkeley on Flamenco, which uses faceted classification for searching art and antiques. Also Inxight's software to find connections between people and institutions according to information on the Web. There are also the many projects to index the desktop, especially Microsoft's MyLifeBits. Predicts the end of the file system.

The article has figures on the amount of information in the world:

- 100 million written books
- 2 million to 3 million audio recordings
- 100,000 to 200,000 theatrical movies - 1/2 from India



Posted by Gwen at 09:56 AM

August 04, 2004

Personalization

So Much Information, So Little Relevance by Steve Johnson. Computerworld (Aug 2) - Consumers are more interested in receiving personalized Web services, and the Web services - especially search - are interested in presenting the right advertisements (if not the right search results). Collaborative filtering was an early approach used for recommending music and books, but it is notably error-prone. Attributized Bayesian Choice Modeling (ABCM) is better at understanding why people like the content. Still, it is not for every web site. Companies must know when personalization will be most useful.
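For readers who haven't seen collaborative filtering up close, here is a minimal sketch of its simplest user-based form: find the listener whose ratings most resemble yours, then suggest what they liked that you haven't rated. The ratings, names and the choice of cosine similarity are all assumptions for illustration; this is not how any particular vendor does it, and ABCM is not shown.

```python
import math

# Hypothetical ratings (1-5) of albums by three listeners.
ratings = {
    "ann": {"Kind of Blue": 5, "Blue Train": 4, "Nevermind": 1},
    "bob": {"Kind of Blue": 5, "Blue Train": 5, "A Love Supreme": 5},
    "cal": {"Nevermind": 5, "In Utero": 4, "Kind of Blue": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def recommend(user, ratings):
    """Suggest items rated by the most similar other user but unseen by `user`."""
    others = [o for o in ratings if o != user]
    nearest = max(others, key=lambda o: cosine(ratings[user], ratings[o]))
    unseen = {i: r for i, r in ratings[nearest].items() if i not in ratings[user]}
    return nearest, sorted(unseen, key=unseen.get, reverse=True)


print(recommend("ann", ratings))  # ('bob', ['A Love Supreme'])
```

A small illustration of how this can go wrong: when two users share only one rated item, the cosine above is always 1.0, so a stranger who happens to share a single favourite looks like a perfect match.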

Posted by Gwen at 04:55 PM

July 22, 2004

Gary Flake - Yahoo! Research Labs

Gary Price interviewed Dr Gary Flake, Principal Scientist & Head of Yahoo! Research Labs. Part 1 starts in Behind the Scenes at Yahoo Labs (June 24). Flake describes the work of the Yahoo! Research Lab and reflects on the state of web search engines -- "Today, search engines have almost no understanding of words or language in any significant way." His intention is to get closer to the perfect engine -- "If web search were perfect, then it would produce an answer to every query that would be as good -- or better -- than if the smartest people in the world had as much time, data, and contextual information (about the user) required to fulfill the query; and it would do all of this in a split second."

In Behind the Scenes at Yahoo Labs, Part 2 Flake discusses structured and unstructured data and the possibility of extracting implied data from pages. Personalization is an important development area - he foresees more tailoring of the relevance ranking functions.

Behind the Scenes at Yahoo Labs, Part 3 (July 7) covers a variety of topics - Yahoo! shortcuts as answers, local search, filtering out spam, and new features. Flake is certain that personalization will make the difference.

Posted by Gwen at 01:42 PM

July 21, 2004

Group Think Works

Perhaps good decisions can come from a crowd. That is the message of The Wisdom of Crowds: Why the Many are Smarter Than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations, a new book by James Surowiecki. According to Nigel Waters in his review of the book - A crowdy crystal ball [The Globe and Mail Book Section for July 17, 2004] -- Surowiecki "is able to show that, for certain types of problems, the group is wiser than the individual." He cites as one example Google's ranking according to links or votes from other sites.

Contrast this with the conclusion by Terrence Brooks of the University of Washington's Information School that Google's method for ranking search results replaces the judgements of experts with those of the crowd. [See The Nature of Meaning in the Age of Google Sitelines (July 11)]

But if Surowiecki is right about crowd wisdom maybe Google's approach is actually better.

The nature of meaning in the age of Google by Terrence Brooks, University of Washington. Information Research April 2004

Posted by Gwen at 04:27 PM

Apple's Spotlight

Apple unveils its answer to users' searching questions by Laurie Flynn. International Herald Tribune (June 29) Apple will be introducing an all-purpose search in its next version of the Macintosh Operating System. The search is called Spotlight and will be able to find data anywhere on the hard drive.

Posted by Gwen at 02:51 PM

July 20, 2004

GIS for web sites

"GIS Enabling the Internet" By Chris Kutler. FreePint (July 1, 2004) Geographic identifiers for web sites would greatly help in localizing information. This article describes the current situation, where web site owners must "register" their sites by location. It looks to Dublin Core to establish location as metadata, and to the search engines to use it.

Posted by Gwen at 07:16 PM

June 11, 2004

Gary Flake of Yahoo talked to ResourceShelf

A ResourceShelf Interview: 20 Questions with Gary Flake, Head of Yahoo Research Labs (June 3) Gary Price asked what is wrong with web search today, to which he replied, "Today, search engines have almost no understanding of words or language in any significant way. They exploit the statistical properties of words and links, but in no way is there anything going on akin to understanding. Search engines don't recognize user intent, can't distinguish goal-oriented search from browsing search, and are completely ignorant of the subtleties of how different concepts relate to one another. Moreover, they completely lack wisdom -- i.e., they are very poor at distinguishing between trivia and something profound."

That said, it sounds like Yahoo will be pushing into personalization and expanding content.

Quoting Gary Flake in part 2 of the interview: "My hunch is that personalization will be so good that most users will look back to web search circa 2004 as ridiculously outdated."

Posted by Gwen at 05:12 PM

Views of IR

From IR to Search and Beyond ACM Queue vol. 2, no. 3 - May 2004 by Ramana Rao, Inxight Software -- History of search and information retrieval from the 1960s to the present. Describes several models and considerations. Sees a future with a "richer user model of information space".

Posted by Gwen at 04:41 PM

Google Bombs

There's a running list of Google Bombs at Google Blogscoped -- Googlebomb Watch.

Posted by Gwen at 03:36 PM

June 08, 2004

Arnaud Fischer - Search Technology

What Lies Ahead For Local Search Engine Technology by Andy Beal webpronews.com (June 2004) Andy Beal spoke with Arnaud Fischer, head of the Search & Directory division at Infospace, about developments in local search. Search engines are putting their R&D dollars into finding ways to better deliver results that are specific to the area you are in - especially the advertisements. Infospace is one of these players as a service for yellow pages.

"Geo-targeting Web search content, both organic and paid, requires search engines to better understand users and queries, inferring local intent by extracting geo-signals and leveraging implicit and explicit user profiles."

Fischer also commented on desktop search, saying that "Both Microsoft Longhorn and IBM WebFountain will eventually make search a lot more transparent and integrated to end-users' broader task-centric activities."

Posted by Gwen at 03:28 PM

May 28, 2004

Google Ranking

SEARCH AND DESTROY by James Surowiecki. New Yorker (May 24) - about the manipulation and misuse of Google's ranking system. Considers Google bombs a prank, but search engine optimization a "racket". Prospects are not good. "Google works best when no one knows it’s there—when people are making their own decisions about which sites are useful or good". But that is no longer the case.

Posted by Gwen at 02:07 PM

May 27, 2004

Semantic Web and Social Networks

"The Semantic Web is Your Friend" By Libby Miller and Simon Price in Freepint (May 27) - finds that the semantic web is emerging through social networking software - refers to the Friend Of A Friend (FOAF) project.

Posted by Gwen at 02:04 PM

May 21, 2004

Inside Nutch

Building Nutch: Open Source Search, ACM Queue vol. 2, no. 2 - April 2004 by Mike Cafarella and Doug Cutting, Nutch - about the experimental search engine Nutch and writing "an open source search engine".

Posted by Gwen at 08:23 PM

Endeca Guides Abebooks

Abebooks Selects Endeca - Abebooks, an online marketplace of 12,000 bookstores, will be using Endeca InFront to power search and navigation at its international sites. Endeca uses taxonomies to improve navigation. Barnes and Noble uses Endeca as well.

Gary Price said, "What I like most about Endeca is the ease with which a user can refine their results by simply pointing and clicking the refinements listed on the right side of a results page." ResourceShelf

Posted by Gwen at 06:51 PM

May 19, 2004

Desktop Search

Google Moves Toward Clash With Microsoft By JOHN MARKOFF New York Times (May 19) Google has been testing a "powerful file and text software search tool for locating information stored on personal computers." This puts Google head-to-head with Microsoft, which will have similar functionality in its new Longhorn system. Microsoft's intention seems to be to remove the need for a browser, and the Puffin project is Google's response -- "The disappearance of the Web browser and the integration of both Web search and PC search into the Windows operating system could potentially marginalize Google's search engine. Google, well aware of this threat, hired a Microsoft product manager last year to oversee the Puffin project as part of its strategy to compete with Microsoft's incursion into its territory." But will embedded search also mean advertising? Likely. Are there privacy issues? Yes. The article did not ask whether indexing personal files will slow down a personal computer.

Also available at IHT as Google Invades Microsoft's Turf.

Posted by Gwen at 11:35 AM

May 11, 2004

Web Fountain

Web Search: On to "Sense-Making" by Ben Elgin. Business Week Online (May 6) "IBM's Dan Gruhl and Andrew Tomkins explain how Big Blue's WebFountain technology tries to answer "why" questions"


Of interest:
"Just as intriguing, WebFountain is attempting to bring a time axis to Internet search. Today, search engines provide a snapshot of how the Web views a certain topic. But it's largely a medium without a memory. That makes it next to impossible to spot trends or easily analyze how things shift over time -- which could be compelling information. Imagine the value a marketer would get from an answer to the question: "How have mentions of my brand changed over the last six months?" "

Posted by Gwen at 12:30 PM

May 08, 2004

KnowItAll

Search engine tackles tricky lists New Scientist Print Edition (07 May 04) Work by Oren Etzioni at the University of Washington to create a search engine that can understand sentences well enough to extract lists of scientists, or botanists, or anything.

"Etzioni's ultimate aim is to have KnowItAll answer questions such as "list all British scientists born before 1900". The software cannot do that yet, because it lacks a module that can understand "natural-language" questions of this type. That will come later, he says."

Posted by Gwen at 04:55 PM

Natural Language

Do What I Mean by Robert Cringely. PBS.org (April 22) - MeaningMaster is a technology for natural language queries built on a lexicon of 200,000 words interconnected by meaning. It has been years in the making. The article mentions that both Google and MSN are interested. The first reason for interest, of course, is contextual advertising.

Posted by Gwen at 04:43 PM

May 06, 2004

Web Search problems

Two excellent articles on problems with searching in the April 2004 issue of ACM Queue about Enterprise Search.

Enterprise Search: Tough Stuff - When searching fewer documents, shouldn't it be easier to find what you're looking for? by Rajat Mukherjee and Jianchang Mao, Verity.

Searching vs Finding: How do you help computers find the information people really want? by William Woods, Sun Microsystems Laboratories

Source: TVC Alert

Posted by Gwen at 05:22 PM

Trends in Search Technology

Web Search for Tomorrow by Ben Elgin. Business Week Online (May 6) -

Describes some new developments in search technology to watch for:

- personalization, though it may take another couple of years to get it right.
- trend searching - taking snapshots over time. IBM's WebFountain does this, but it will be a long time before the technology is available for consumer search.
- desktop search. Microsoft has the lead with "Stuff I've Seen" being tested by staff.
- better results through clustering. Vivisimo excels at this. ixMatch is mentioned - has software for corporate use.

Posted by Gwen at 02:10 PM

April 29, 2004

Search Engine Meeting 2004

Presentations from the annual Infonortics Search Engine Meeting, held in The Hague in 2004, are now available for viewing. All will be fascinating but in particular:

The Subtle Side of Retrieval - Elizabeth Liddy, Syracuse University, New York, USA
Search and Guided Navigation for Unstructured Content? - Peter Bell at Endeca
A Holistic Approach to Search - Tuoc Luong, Ask Jeeves
Convera: Fundamental Approaches to Categorisation - Iain Fletcher
Social Software and New Search - Stephen Arnold
Human Intervention in the Search Process - Martin Belam, BBCi Search
Turbo10: The Mechanics of a Deep Net Metasearch Engine - Nigel Hamilton, Turbo10.com, UK

Posted by Gwen at 05:07 PM

April 27, 2004

Web site ranking

Google In Controversy Over Top-Ranking For Anti-Jewish Site By Danny Sullivan, SearchDay (April 25) Sullivan tells a blow-by-blow story of the appearance, disappearance and re-appearance of the anti-Jewish website, Jew Watch, in Google's top-ranked results, all the while trying to divine Google's actions. Bottom line: Google states "The only sites we omit are those we are legally compelled to remove or those maliciously attempting to manipulate our results." Controversy over ranking may be with us for a very long time.

Posted by Gwen at 02:05 PM

April 18, 2004

Desktop Search

Humans vs. Computers, Again. But There's Help for Our Side. By JAMES FALLOWS. New York Times (April 18) - the next great breakthrough in search will be of internal documents - the stuff on our own computers. Fallows looks forward to the day when there is a tool that brings the clarity of Google to this problem. Microsoft says the solution will be in Longhorn. Others are trying too -- OneNote, ADM, askSam, BrainStorm, Chandler, Enfish, InfoSelect, iRider, Lookout, Onfolio, TheBrain and Zoot - and many more.

Posted by Gwen at 07:58 PM

April 15, 2004

Visual Search

Search prototype gets the picture By Michael Kanellos and Stefanie Olsen. CNET News.com (March 30) -- "Researchers at Purdue University have developed a search engine that retrieves results based on an image or a sketch." - reviews standard image search (largely based on text) and the implications of being able to search based on a sketch of a shape. The article was unaware of the work of Idée in Toronto.

Posted by Gwen at 04:29 PM

PNAS Online

Mapping Knowledge Domains is the main subject of the April 6, 2004 issue of PNAS Online - Proceedings of the National Academy of Sciences of the United States of America. Some articles are free.

Of interest:

Extracting knowledge from the World Wide Web by Monika Henzinger and Steve Lawrence - talks about communities on the Web.

The world of geography: Visualizing a knowledge domain with cartographic means by André Skupin

[Source: ResourceShelf Professional Reading]

Posted by Gwen at 04:13 PM

Reputation Management

Winning the Name Game Technology tools are helping companies monitor their reputations on the Internet. Alan R Earls. ComputerWorld (April 5) - features the WebFountain and Factiva collaboration to track corporate reputations by analyzing Factiva's information sources daily. Also Biz360 - it monitors some 50,000 print, online and broadcast sources.

Susan Feldman, a Director at IDC, offered some technical explanation about how these tools work -- "The key to Factiva and some of the other reputation management offerings is text analytics ... That capability lets you look inside documents and pull out the information you need on a specific topic -- it parses the document the way you would parse a sentence in fifth grade" Tools use syntactic analysis - "It can distinguish the difference in meaning between the statements 'Bill hit Fred' and 'Fred hit Bill." "If you want to look for ideas rather than just words, you can store them as a block that includes the subject, object and verb relationship" ... "Then you can match those similar concepts."
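To make the "Bill hit Fred" example concrete, here is a toy sketch, assuming each sentence has already been reduced to a plain three-word subject-verb-object pattern. A real text-analytics product would run a full syntactic parser; the naive split and the `Triple` type below are hypothetical stand-ins, not Factiva's or IDC's method.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """A subject-verb-object relation stored as one block."""
    subject: str
    verb: str
    obj: str


def to_triple(sentence):
    """Naively split a three-word 'Subject verb Object.' sentence.

    Stands in for real syntactic parsing, which would also handle
    articles, modifiers, passive voice and so on.
    """
    subject, verb, obj = sentence.strip().rstrip(".").split()
    return Triple(subject.lower(), verb.lower(), obj.lower())


a = to_triple("Bill hit Fred.")
b = to_triple("Fred hit Bill.")

# A bag-of-words view sees identical words in both sentences...
print(sorted(a) == sorted(b))  # True
# ...but the triples record who did what to whom, so they differ.
print(a == b)  # False

# Matching on the stored relation answers "Who hit Fred?"
facts = [a, b]
who_hit_fred = [t.subject for t in facts if t.verb == "hit" and t.obj == "fred"]
print(who_hit_fred)  # ['bill']
```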

Posted by Gwen at 04:03 PM

Gopher Alive

Gopher is still alive. This was the directory-like service used in the very early, pre-Web days of the Internet for organizing and retrieving documents. There are still 250 active gopher services. Gopher: Underground Technology in Wired (April 12) - John Goerzen in Kansas is doing the most to preserve it. He sees some future for gopher in data exchange.

Posted by Gwen at 03:42 PM

April 14, 2004

Google and Finding Meaning

The nature of meaning in the age of Google by Terrence A. Brooks at the Information School, The University of Washington, Seattle, USA

Brooks argues that Google succeeds with PageRank in dealing with an unruly and wild Web. By aggregating links, it captures fairly well the "subjective sense of Web-page importance" and serves the average searcher. But it cannot extract meaning well. The article looks at efforts to create a Semantic Web and contrasts them with "historical ambitions" such as the World Brain proposed by H.G. Wells in 1937. The Web is unruly and won't be easily wrestled into control through metadata.
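The "links as a plebiscite" idea can be sketched with the textbook power-iteration form of PageRank, assuming the commonly cited damping factor of 0.85. This is only the published idea in miniature: Google's production ranking is secret (as the abstract notes) and far more elaborate, and the four-page web below is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    links maps each page to the list of pages it links to.
    A page with no outgoing links spreads its score evenly over all pages.
    """
    pages = sorted(set(links) | {p for targets in links.values() for p in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link is a vote worth an equal slice of this page's rank.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: distribute its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: most pages link to "hub".
web = {
    "a": ["hub"],
    "b": ["hub", "a"],
    "c": ["hub"],
    "hub": ["a"],
}
scores = pagerank(web)
for page, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```

"hub" collects the most votes and ranks first, while "b" and "c", which nobody links to, keep only the baseline share. Because the scores come purely from link structure, anyone who can manufacture links can move them, which is the spam tension the abstract describes.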

Abstract: "The culture of lay indexing has been created by the aggregation strategy employed by Web search engines such as Google. Meaning is constructed in this culture by harvesting semantic content from Web pages and using hyperlinks as a plebiscite for the most important Web pages. The characteristic tension of the culture of lay indexing is between genuine information and spam. Google's success requires maintaining the secrecy of its parsing algorithm despite the efforts of Web authors to gain advantage over the Googlebot. Legacy methods of asserting meaning such as the META keywords tag and Dublin Core are inappropriate in the lawless meaning space of the open Web. A writing guide is urged as a necessary aid for Web authors who must balance enhancing expression versus the use of technologies that limit the aggregation of their work."

From Information Research Volume 9 No 3 April 2004 published by Professor Tom Wilson of the Department of Information Studies, University of Sheffield.

Posted by Gwen at 03:47 PM

March 29, 2004

New search engines

Beyond Googling: tech industry generating next generation of search engines by ANICK JESDANUN Canadian Press (March 26) Article mentions Mooter, Dipsie (which claims it will get past dynamic web pages), Eurekster (social networking), Superpages, Factiva.

Posted by Gwen at 04:09 PM

ScentTrails

Search tool aids browsing By Kimberly Patch, Technology Research News (March 10/17, 2004) Researchers from Carnegie Mellon University have developed software that will help assess the relevance of links in a search.

"The software, dubbed ScentTrails, shows a user how strongly the links generated by a Web search correlate with the topics she is searching for. The software grades the links a search engine returns by increasing the font size of links that have more connections to relevant pages."

It's still in the prototype stage.
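The font-size grading described in the quote could look something like this. It is a hypothetical illustration, not the Carnegie Mellon code: it assumes each link already comes with a count of connections to relevant pages, and it scales font size linearly between 12 and 24 points, both arbitrary choices.

```python
def font_sizes(link_scores, min_pt=12, max_pt=24):
    """Map each link's relevance score to a font size in points.

    link_scores: dict of URL -> number of connections to relevant pages.
    The highest-scoring link gets max_pt and the lowest gets min_pt.
    """
    if not link_scores:
        return {}
    lo, hi = min(link_scores.values()), max(link_scores.values())
    sizes = {}
    for url, score in link_scores.items():
        if hi == lo:
            sizes[url] = min_pt  # no spread: render every link the same
        else:
            fraction = (score - lo) / (hi - lo)
            sizes[url] = round(min_pt + fraction * (max_pt - min_pt))
    return sizes


# Hypothetical result links with counts of paths to relevant pages.
results = {"/guide": 9, "/faq": 3, "/about": 0}
for url, size in font_sizes(results).items():
    print(f'<a href="{url}" style="font-size:{size}pt">{url}</a>')
```

A linear scale is the simplest choice; a logarithmic one would keep a single heavily linked page from dwarfing everything else.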

Posted by Gwen at 02:41 PM

March 23, 2004

Google search and Orkut

Google to find place for Orkut network in search by Michael Kanellos. CNet (March 22) Google will do a Eurekster thing and integrate social networks into search. "Schmidt [Google CEO Eric Schmidt] said that such services are a natural complement to the sort of automated searches that Google now provides, because it allows visitors to connect to experts or at least to people with knowledge."

Posted by Gwen at 10:19 AM

March 22, 2004

Federated Search

Federated Searching: A Viable Alternative to Web Surfing by Barbara Fiehn. TechNews World (March 22) "A possible solution to the Google-only research approach is making its way into schools via library media center automation systems. Imagine searching your local library media center and other library collections, Web sites, and subscription databases with a single click of the mouse." - examines the good, the bad, and the ugly - there are still some wrinkles to work out.
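The "single click" idea is a fan-out: send one query to several sources at once, then merge what comes back. A minimal sketch using Python's standard `concurrent.futures`; the three source functions are hypothetical stand-ins for a library catalogue, web sites and a subscription database, and the merge is a simple de-duplicating round-robin. Real systems must also reconcile each source's own ranking, which this sketch ignores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import zip_longest


# Hypothetical sources; a real system would call each one's search interface.
def search_catalogue(query):
    return [f"Book: {query} for beginners", f"Book: History of {query}"]


def search_websites(query):
    return [f"Web: {query} homework help", f"Book: History of {query}"]


def search_databases(query):
    return [f"Article: {query} in the classroom"]


SOURCES = [search_catalogue, search_websites, search_databases]


def federated_search(query, sources=SOURCES):
    """Query every source in parallel and interleave their results."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        # pool.map returns results in the same order as `sources`.
        result_lists = list(pool.map(lambda source: source(query), sources))
    merged, seen = [], set()
    # Round-robin: top hit from each source, then the second hits, and so on.
    for row in zip_longest(*result_lists):
        for hit in row:
            if hit is not None and hit not in seen:
                seen.add(hit)
                merged.append(hit)
    return merged


for hit in federated_search("volcanoes"):
    print(hit)
```

The duplicate "History of volcanoes" hit, found in both the catalogue and on the web, appears only once in the merged list.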

Posted by Gwen at 03:12 PM

March 19, 2004

Finding Information

The high cost of not finding information By Susan Feldman of International Data Corporation (IDC). KM World (March 2004) There is a cost to not finding information. Fifty percent of searches are likely abandoned. Studies show that knowledge workers spend up to 35% of their time looking for information. IDC extrapolated from these studies some estimated costs of not finding information. The figures are shocking. Can technology help? Susan Feldman does name a few companies -- "Autonomy autonomy.com ClearForest clearforest.com Convera convera.com Endeca endeca.com FAST fastsearch.com InQuira inquira.com Inxight inxight.com iPhrase iphrase.com Mindfabric mindfabric.com Siderean siderean.com Verity verity.com".

Posted by Gwen at 08:58 PM

March 11, 2004

First Monday Articles

First Monday (March 2004) has two articles:

Finders, keepers? The present and future perfect in support of personal information management by William Jones. Looks at costs of deciding what information to keep or destroy. Wants to "Develop tools that decrease the likelihood that "keeping" mistakes are made in the first place."

Do you "google"? Understanding search engine use beyond the hype by Eszter Hargittai - warns against drawing conclusions about search engine behaviour based on seeing Google as the most popular. Millions of people don't use it.

Posted by Gwen at 12:20 PM

Unintended Consequences of search technology

David Seuss, CEO of the revived Northern Light gave a presentation to Computers in Libraries about "Ten Years into the Web: The Search Problem is Nowhere Near Solved." (March 2004) [Powerpoint]

Opened with a history of access to information beginning with the Puritans in Massachusetts. Noted that there are unintended consequences to information technology. Web search began as a good thing but today we are seeing that "web search results decline with the size of the Web databases". Junk is a big problem. Also, innovations in search are driven by revenue objectives - it's all about improving advertisements. Seuss does have an alternative -- "organize content into many high quality databases for professionally-oriented Web searching".

Spotted at the ResourceShelf.

Posted by Gwen at 11:43 AM

March 09, 2004

Deep Web Meaning

In search of the deep Web by Alex Wright. Salon (March 9)

"Today, the deep Web remains invisible except when we engage in a focused transaction: searching a catalog, booking a flight, looking for a job. That's about to change. In addition to Yahoo, outfits like Google and IBM, along with a raft of startups, are developing new approaches for trawling the deep Web. And while their solutions differ, they are all pursuing the same goal: to expand the reach of search engines into our cultural, economic and civic lives."

Considers the implications of this for publishers and prices.

Posted by Gwen at 10:40 AM

March 01, 2004

AOMI - New Way to Search

Here we go again - a search tool that will learn from us (it says). AOMI's Artificial Intelligence Search Product Will Render Most Traditional Internet Search Technologies Obsolete. PR Newswire via News Alert (March 1, 2004) AOMI is just vaporware so far - it's coming later in 2004.

Posted by Gwen at 07:49 PM

Personalization Efforts

New Web tools aim to customize searches By Michael Bazeley. Mercury News (March 1, 2004) - review of work at Google and Yahoo in personalizing search.

Posted by Gwen at 01:19 PM

February 28, 2004

Personalization of Search

The Future Of Search Engine Technology Andy Beal WebProNews.com (Jan 29) - notes that search engines are trying to "anticipate the intentions of the searcher" but so far that mostly means finding the neighbourhood pizza shop. Argues that the future is based on personalization. "However, in order to achieve this new search nirvana we, as consumers, must quell our fears and trepidations surrounding the protection of our privacy. In order for the search engines to develop technology that will be intuitive and anticipate our every need, we must first relinquish at least some of the privacy that we currently hold so dear. Let’s take a look at some of the ways that search technology could improve and you’ll soon get the idea why it will require us to cooperate with the search engine providers." Has other scenarios of a rosy future in which search improves because the operating system monitors all activities. Let's remember - all solutions bring new problems.

Posted by Gwen at 11:36 AM

February 27, 2004

Dipsie Interview

Gary Price interviewed Jason Wiener, CEO of Dipsie. Dipsie is working on indexing the invisible web - "We can index pages that utilize cookies, database backends, forms and client-side scripting, among others. Our scalable technology will allow us to have over 10 billion pages within our first year alone." Ranking methods will be "language based".

Posted by Gwen at 07:54 PM

February 26, 2004

Ranking at Google and Yahoo

Yahoo Keyword Density Analysis Comparison to Google Research by goRank.com compiled on Feb 17 comparing "keyword density elements of Yahoo's new algorithm with Google's algorithm".

Found that Yahoo seemed to have a preference for more words on a page and more frequent exact word matches. Google's lower figure for keyword density (2% vs Yahoo at 2.8%) may be because it does semantic word matching.

Both engines care about keyword density in the title. (Google = 16.9% and Yahoo = 19.6%).

Link text is a factor too, where Yahoo may prefer less text and better matches in the links.

Bolding could make a small difference. Yahoo likes it.
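Keyword density is usually taken to mean the share of a page's words that match the keyword. A minimal sketch, assuming that simple definition, single-word keywords and a crude regex tokenizer; goRank's exact formula isn't given, so its 2% and 2.8% figures may not have been computed this way. The sample title and body are hypothetical.

```python
import re


def keyword_density(text, keyword):
    """Percentage of words in `text` that exactly match `keyword` (case-insensitive)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w == keyword.lower())
    return 100.0 * hits / len(words)


title = "Cheap flights and cheap hotels"
body = "Compare cheap flights from dozens of airlines. " * 5 + "Book hotels too."
print(f"title: {keyword_density(title, 'cheap'):.1f}%")
print(f"body:  {keyword_density(body, 'cheap'):.1f}%")
```

This prints 40.0% for the five-word title and 13.2% for the 38-word body, which shows why title density figures (16.9% and 19.6% above) run so much higher than whole-page figures: a title has very few words to divide by.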

Posted by Gwen at 12:25 PM

February 24, 2004

Social Searching with Alerts

LeanIndex from 312inc.com might be the solution for anyone who needs to create their own niche search engine that will also alert them to new information.

312, Inc. Launches LeanIndex and LeanSwap, a Powerful Personal and Social Search Solution for Windows, UNIX, Linux and Macintosh Users Press Release (Feb 23)

"LeanIndex personal search engine is simple to use and finds information fast. It runs from a profile created by the user that contains keywords to look for, Web sites to search, the time between searches and how the user wants to be alerted. LeanIndex only searches Web sites the user pre-selects and trusts to keep them up-to-date with reliable news and information."

"LeanIndex simplifies a user’s ability to find what they need allowing them to make better-informed personal and business decisions. Three Twelve’s LeanSwap service creates a new Web community for sharing LeanIndex search profiles, tips, tricks and ideas. “312 created LeanSwap so people searching the Internet can now find other people who have similar interests and exchange ideas, tips and Web information sources,” said Brian Neilson, 312’s co-founder and chief executive officer."

Posted by Gwen at 11:35 AM

February 17, 2004

Web agents like travel agents

Search For Tomorrow by Joel Achenbach, Biz Report (Feb 16) - presents a history of web searching as background to some comments about its future. The future is to be ruled by agents.

"I often use the analogy of Web agents being like travel agents," says James Hendler, a computer science professor at the University of Maryland. "When I go to my travel agent and say where I want to go, they don't usually just say, 'Yes, you can get there.' They give me some options of different ways to get there. They think about some things I might have forgotten. Do I need a car, do I need a hotel reservation? And then they go do it for me."

The article looks to the metadata of the promised Semantic Web to make it easier for search engines to "understand" what they're looking at.



Posted by Gwen at 11:25 AM

February 14, 2004

New search power

Search Beyond Google by Wade Roush. Technology Review (March 2004) [Requires free registration] -- "Google reigns supreme as the search engine of choice—but for how long? A pack of startups—and Microsoft—are developing technologies to find what you want, faster."

Excellent article on the challenges of search in an ever expanding web of information. Notes that Google has reason to be anxious. Page ranking by popularity, while it was a huge boost 2 years ago, is now plagued by spammers and may also not scale well. Many are working on alternatives.

"For example, there’s Teoma, which ranks results according to their standing among recognized authorities on a topic, and Australian startup Mooter, which studies the behavior of users to better intuit exactly what they’re looking for. And then there’s the gorilla from Redmond: Microsoft is turning to search as one of its next big business opportunities. Its researchers are devising a new operating system that melds Google-like search functions into all Windows programs, as well as software that scours the Web for definitive answers to questions you phrase in everyday English. Meanwhile, Yahoo! launched its own research laboratory in January, and Cutting himself is building an open-source alternative to Google (see “Keeping an Eye on Google”). “Nowadays,” he says, “I’m not convinced [Google is] markedly better.”"

Article describes how Mooter works - a clustering search engine that learns from what you look at.

"Mooter analyzes the potential meanings and permutations of the starting keywords and, behind the scenes, ranks the relevance of the resulting Web pages within broad categories called clusters. The user first sees an on-screen “starburst” of cluster names. ... To develop a more precise understanding of what the user is probably looking for, the Mooter engine notes which clusters and links get clicked and uses that information to improve future responses. Suppose a user enters the term “dog,” clicks on a cluster called “breeds,” and then spends a lot of time looking at sites about Schnoodles (a popular Schnauzer-Poodle mix). When the user clicks on a new search result, Mooter will personalize the ranking to reflect this apparent pattern of interest, which might, for example, lead to sites about “dogs” plus “breeds” plus “Schnoodles” appearing higher. A refined set of results appears on every page; the engine continues to adjust the rankings based on the user’s behavior."
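Mooter hasn't published its algorithm. As a rough illustration of the click-feedback idea described above, here is a toy re-ranker. It assumes each result carries a set of cluster labels and that every click adds a fixed boost to pages sharing those labels. The `Result` and `ClickProfile` names and the `boost_per_click` value are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List

@dataclass
class Result:
    url: str
    base_score: float        # relevance from the initial keyword match
    terms: FrozenSet[str]    # hypothetical cluster/topic labels for the page

class ClickProfile:
    """Hypothetical per-session interest profile built from clicks."""

    def __init__(self, boost_per_click: float = 0.5):
        self.boost_per_click = boost_per_click
        self.term_clicks = Counter()

    def record_click(self, result: Result) -> None:
        # Every label on a clicked result counts once as evidence of interest.
        self.term_clicks.update(result.terms)

    def rerank(self, results: List[Result]) -> List[Result]:
        # New score = original relevance + boost for each previously clicked label.
        def score(r: Result) -> float:
            return r.base_score + self.boost_per_click * sum(self.term_clicks[t] for t in r.terms)
        return sorted(results, key=score, reverse=True)

results = [
    Result("a.example/dog-food", 1.0, frozenset({"dog", "food"})),
    Result("b.example/schnoodle", 0.8, frozenset({"dog", "breeds", "schnoodle"})),
    Result("c.example/dog-training", 0.9, frozenset({"dog", "training"})),
]
profile = ClickProfile()
profile.record_click(results[1])  # the user lingers on the Schnoodle page
ranked = profile.rerank(results)
print([r.url for r in ranked])
```

After one click on the Schnoodle page, that page jumps from last to first even though its keyword score was the lowest.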

Another newcomer, Dipsie, intends to index the Deep Web of content in databases.

Teoma has been using its analysis of links between sites to identify web communities.

Posted by Gwen at 10:52 AM

February 10, 2004

Social Networking and Search

The upside down of search: "Commentary: At what point is search too good?" by Bambi Francisco. CBS Marketwatch (Feb 10)

Search can be improved by utilizing social networking, but should we do it? The article talks about Spoke Network and its aim to "make the search process, or at least the searching-for-people process, more personalized and relevant".

"By organizing information based on social networks drawn from members' address books and the people they communicate with through e-mails (and instant messaging in the future, I'm told), Spoke improves upon the average search engine's results. ... On the other hand, the data it pulls together includes information about millions of people who are not members and suggests a dark underside to search precision." Also mentions Vivisimo's clustering as a search technology that will improve search. Concludes -- "The consequence of it all: There is no privacy left. We're more accessible. We're more targeted (Do we really need improved targeting for spam?). The channels to get to us are better defined. "

Posted by Gwen at 10:01 AM

February 07, 2004

Robert Scoble on the Future of Search

Microsoft's plans for a new search engine technology by Andy Beal. Pandia.com (Feb 2004) -- "Guest Writer Andy Beal talks to Robert Scoble from Microsoft about the future of search engine technology, Google and how search will be handled by the next incarnation of Windows. "

Microsoft is working hard at improving searching of the hard drive, but what about the Internet? Robert Scoble sees "social behaviour analysis tools like Technorati becoming far more important". Search engines will also become more specialized (just RSS, just news, etc.), and users want more ways to have search results delivered.

Posted by Gwen at 12:42 PM

February 05, 2004

WebFountain

Monster librarian at work By Dean Takahashi. Mercury News (Feb 5) - says that IBM computers gather 250 million web pages a week as grist for WebFountain's high-powered analysis. WebFountain looks for associations of names and words.

"Now IBM has begun licensing the technology to create ``buzz reports'' for corporate clients. WebFountain scours Web logs, chat rooms, newspaper stories and every other source of information to determine whether the chatter about a new product is good or bad; is a certain rock group on the way up or a one-hit wonder?"

Posted by Gwen at 11:24 AM

February 04, 2004

Building your own spider

Hack Your Own Search Engine Crawler By Chris Sherman. SearchDay (Feb 4) - Reviews the new book Spidering Hacks by Kevin Hemenway and Tara Calishain. The book "offers '100 Industrial Strength Tips and Tools' for creating and running your own spiders. Among these tips and tools, of course, are instructions for creating your own personal web crawler that works much like those used by the major search engines."

Posted by Gwen at 03:01 PM

February 02, 2004

Search Predictions

The Future of Search Engine Technology by Andy Beal. Pandia (Jan 28, 2004) - foresees changes in personalized results - more tuned to your real interests. Related to this will be advertisements in web-based email that are more relevant - especially if Google does go ahead with an email service. Desktop search is sure to develop. (Google deskbar sure is handy.)

Posted by Gwen at 04:17 PM

Tim Bray on Search

On Search, the Series By Chris Sherman. SearchDay (Jan 29) - describes a series of essays that Tim Bray, CEO of Antarctica, has written about search as "almost a virtual textbook on search engine technology ... highly readable, and replete with Tim's personal insights and opinions."

There are 15 installments in On Search, the Series.

Posted by Gwen at 03:41 PM

FAST for the Enterprise

FAST Debuts Enterprise Search Platform by Paula Hane. Newsbreaks (Feb 2) -- "FAST ESP (Enterprise Search Platform) creates a single point of access for all information across an enterprise—in real time, regardless of data format, structure, or location." Susan Feldman, a Director at IDC and author of the article "The Answer Machine" (Jan 2000) said “FAST ESP is the first approximation of an ‘answer machine’ that I have seen.”

Posted by Gwen at 12:14 PM

January 27, 2004

FAST for the Intranet

Fast Search & Transfer Seeks More Customers With New Service by Pete Barlas, Investor's Business Daily (Jan 27)

"Fast's new search service provides a speedy and more efficient system for companies and their customers to retrieve information on Web sites and private intranets. The service also helps businesses abide by federal compliance laws by locating key relevant documents."

Posted by Gwen at 01:17 PM

January 26, 2004

Google Papers

Learning About Search Engines From Google Engineers By Chris Sherman, SearchDay (Jan 26) -- "A new archive of publications by Google employees offers deep insights into many aspects of the search engine's operation. " See Papers Written by Googlers.

Posted by Gwen at 03:40 PM

January 21, 2004

Yahoo Research Lab

Yahoo! bets on search by Stephanie Olsen. Silicon.com (January 21 2004)

Gary Flake, previously of Overture, will head up Yahoo's new Research Lab.

"Much of the research is designed to improve web search and the relevancy of sponsored listings so these companies can win the loyalty of visitors and advertisers. "

"Related to search, for example, the lab will focus on how to personalise the experience for people across the Yahoo! network.

"'We're here to help, not just in one or two areas, but across the whole spectrum of Yahoo! products,' such as finance, news, IM and email, Flake said."

Work is described at the Yahoo Research Lab website - http://labs.yahoo.com/

Posted by Gwen at 01:45 PM

January 08, 2004

IBM Web Fountain

A Fountain of Knowledge: 2004 will be the year of the analysis engine By Stephen Cass, IEEE Spectrum Online (Jan 4, 2004)

Cass describes the intentions and workings of IBM's WebFountain. Search engines list documents with matching words; WebFountain will analyze those documents to make sense of them.

"WebFountain works by converting the myriad ways information is presented online into a uniform, structured format that can then be analyzed. The goal is to provide a general-purpose platform that can allow any number of so-called analytic tools to sift the structured data for patterns and trends. "

WebFountain will convert to structured data the content of web sites, blogs, newsgroups, mailing lists and more.

"WebFountain is not intended for casual surfers. Its target audience includes the business executives who have already shown they are willing to pay for the insights that mining corporate databases can supply. Analytic tools can ferret out patterns in, say, a sales receipt database, so that a retail store might see that people tend to buy certain products together and that offering a package deal would help sales. WebFountain will allow executives to go beyond their own databases and analyze up-to-date information from any online source. "

Factiva has partnered with IBM and will be launching a WebFountain-based service to track the online reputation of companies.


Posted by Gwen at 02:54 PM | Comments (0)

January 07, 2004

Google Bombing

Google's (and Inktomi's) Miserable Failure by Danny Sullivan. Search Engine Report (Jan 6, 2004)

Link analysis for ranking results, the practice Google introduced, seems to have broken down under the strain of Google Bombing. Google Bombing is where bloggers (and others) mischievously or maliciously use links and their anchor text to push a target site (often a spoof) to the top of the rankings. The latest example is the phrase "miserable failure", which brings up the official biography page of George W. Bush. Sullivan finds that Google and Inktomi have failed to counteract the undue influence these blogger bombers have on search results. He notes that Teoma is unaffected.
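To see why the bombing works, here is a toy sketch (not Google's or Inktomi's actual ranking, which combines many signals) that scores pages only by how many inbound links carry the query phrase in their anchor text. The link data and hostnames are invented for illustration.

```python
from collections import defaultdict

# Invented link data: (linking page, target page, anchor text).
links = [
    ("blog1.example", "target.example/bio", "miserable failure"),
    ("blog2.example", "target.example/bio", "miserable failure"),
    ("blog3.example", "target.example/bio", "Miserable Failure"),
    ("news.example", "essay.example/failure", "an essay on miserable failure"),
]

def anchor_text_scores(links, query):
    """Rank targets by the number of inbound links whose anchor text contains the query."""
    q = query.lower()
    scores = defaultdict(int)
    for _source, target, anchor in links:
        if q in anchor.lower():
            scores[target] += 1
    # Highest count first; the target page's own text is never consulted.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(anchor_text_scores(links, "miserable failure"))
# [('target.example/bio', 3), ('essay.example/failure', 1)]
```

The target page never has to contain the phrase at all; three coordinated blog links are enough to put it on top.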

Posted by Gwen at 01:07 PM | Comments (0)

January 06, 2004

Google and Page Rank

Danny Sullivan picked out an article in JimWorld as a gem because it pointed out that the patent for the famous PageRank algorithm is owned by Stanford University. Has Google been trying other ranking algorithms in order to end its dependence on PageRank?

The "Florida Update" ... Exposed ? in JimWorld by J Cokos (Dec 22, 2003)

Posted by Gwen at 11:21 PM | Comments (0)

January 01, 2004

Google Semantics

In the Wake of the "Florida" Update by Karon Thackston. High Rankings Advisor (Dec 31, 2003) -- More about how Google is moving to semantic-based algorithms for ranking results and how this will affect copywriting by search engine optimizers (SEOs). Mentions that Google is picking up more information-based directory sites and information pages, and possibly fewer commercial ones.

"The reports are true... Google IS moving to a semantic-type system. But that doesn't mean keywords are on their way out at all. After the changes are made, Google will be going beyond *just* looking for keywords on your page. They'll want well-written copy... actual language that speaks to your site visitors. That means your copy will take on a more important role than ever before. And that's great news!"

Posted by Gwen at 04:59 PM | Comments (0)

December 16, 2003

Search Economics

How Search Engines Make Money By Grant Crowell, Guest Writer SearchDay (Dec 16) -- report from "Search Economics, Search Monetization Strategies," at the Search Engine Strategies conference in San Jose, August 2003.

Posted by Gwen at 03:20 PM | Comments (0)

Google Ranking

Google's Florida Update: One Month Later By Gord Hotchkiss. SearchEngine Guide (Dec 15) -- It's widely recognized that Google has changed its algorithms for ranking results. Hotchkiss refers to Danny Sullivan's observation that Google could have two systems working now, one for more competitive (commercial) searches and the other for less competitive ones. But Hotchkiss goes a step further and wonders if Google is starting to use the concept technology it acquired from Applied Semantics.

"Applied Semantics Concept Server used language patterns, including semantics and ontology to try to both determine the real meaning of the words on a website page and also to anticipate what people are looking for. It tries to interpret concepts based on the use of words, their proximity and the patterns they occur in. What if Florida was Google's first attempt to start introducing this concept to their search engine?

"The other unique aspect about Concept Server is that it can refine results on an ongoing basis as it becomes 'smarter'. It starts by feeding concepts or results that it feels matches the searchers intentions. If the response isn't positive, it will try to do a better job next time."

Is this what is really at work, and will the system become self-regulating?

Posted by Gwen at 02:18 PM | Comments (0)

November 29, 2003

Meta Tags

Meta Tags - What Are They and Which Search Engines Use Them? By Richard Zwicky. SearchGuild (Nov 28) -- Meta tags are used in creating web pages to provide additional information about the page: author, description, keywords, perhaps copyright information. This article describes what they are and how to use them but doesn't identify which search engines use them. In general, search engines don't use meta tags for retrieving or ranking results, but they may use the description tag for the summary shown with a result.

Posted by Gwen at 09:46 AM | Comments (0)

November 22, 2003

Search Engine Primer

VOICE: A bluffer's guide to search (Nov 12) NetImperative.com -- Ask Jeeves VP of production and technology Chris Martin gives a primer on web search. He identified three challenges search engines must contend with: the user's query, matching the query to indexed pages, and weeding out spam.

Relevancy is determined -- "Through looking at the language and words used in a web page, its context, and discovering associations between them. Secondly, through checking incoming links to a page to assess its link popularity. Discovering domain expert pages through subject specific linkages. Checking where the site is also referenced elsewhere - and 'spidering beyond the page', going to other linked sites, then going back to the original site and checking the association. Finally, through seeing if another search engine is listing the site."

Posted by Gwen at 03:27 PM | Comments (0)

Customization of news searches

Microsoft news site to customise content NewScientist (Nov 18) -- Raul Valdes-Perez, president of Vivisimo, commented on the customization that is to be part of the new news search engine from Microsoft (uk.newsbot.msn.com). Specifically: "Now the way to improve the user experience is to work on the next layer of algorithms that determine the presentation of the 'search and rank' results." Microsoft has not revealed how it will do the personalization; it could be something similar to Amazon's recommender system or a system that looks for more-like-this. Vivisimo is also working on news search and will be introducing a service that "spontaneously clusters links to news articles according to subject."

Posted by Gwen at 03:00 PM | Comments (0)

November 13, 2003

Link Analysis

Sitelines comments on the effects of the use of link analysis by most search engines in ranking search results -- Rich-Get-Richer with Link Analysis (Nov 12)

Posted by Gwen at 01:14 PM | Comments (0)

Google API

Serge Thibodeau explains The Google API's and their uses at ISEDB.com (Nov 11). It's directed to programmers who need to access Google's web search database to build queries.

Posted by Gwen at 12:22 PM | Comments (0)

November 06, 2003

MS Office does Search

Never mind the talk about Microsoft wanting to buy Google or develop its own search engine. Judging from announcements at the ResourceShelf, Microsoft is going full throttle into managing search through MS Office 2003: "Microsoft links Excel to Edgar Online company data" and "eLibrary Integrated Into MS Office 2003". See ResourceShelf Business Research (Nov 4).

Posted by Gwen at 08:26 PM | Comments (0)

Search Algorithms

It's in the algorithms: A glimpse into the future of mapping the Web By Paula MacKinnon. Information Highways (Nov/Dec 2003) - Search engine technologists continue to seek methods for improving the relevance of results. Google is exploring personalization. MITACS (Mathematics of Information Technology and Complex Systems) in Halifax, NS is investigating a focused crawler that takes its clues from the user's Web browsing behaviour. IBM's WebFountain, Nutch, and Netnose are three others entering the fray.

Posted by Gwen at 10:10 AM | Comments (0)

October 24, 2003

Vox Populi Would React to the Average User

Queries Guide Web Crawlers, Technology Research News (Oct 22, 2003)

"Researchers from Contraco Consulting and Software Ltd., T-Online International and Siegen University in Germany have written an algorithm that improves Internet search results by factoring in what people are looking for. ... The algorithm, dubbed Vox Populi, picks up trends by analyzing patterns in people's Web search behavior. The algorithm might flag an increase in queries about soccer near the time of the World Cup, for instance."

Posted by Gwen at 11:53 PM | Comments (0)

Local Search at Google

Local Search Part 2: Google & Mobilemaps Bring Back Geosearching by Danny Sullivan. Searchday (Oct 21) -- "crawler-based methods being used by Google and Mobilemaps to improve local searching when tapping into a web-wide database of content." But it is still all very tentative and experimental. Article reviews the earlier work on being able to find local listings and map them.

But while Google may not have geo-searching entirely figured out for web searches it can do more regional placement for advertisers -- Google Launches Local Search Targeting & Search Forum Spotlight (Oct 24) People will see ads for their local geographic areas first; if there are none, they'll see national ads.
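The fallback rule described here (local ads first, national ads if there are none) is simple enough to sketch. This assumes each ad is tagged either with a region or as national; the `Ad` type and region strings are hypothetical, and Google hasn't published how its targeting actually works.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Ad:
    text: str
    region: Optional[str] = None  # None marks a national ad

def ads_for(user_region: str, ads: List[Ad]) -> List[Ad]:
    """Return ads for the user's region, falling back to national ads."""
    local = [ad for ad in ads if ad.region == user_region]
    if local:
        return local
    return [ad for ad in ads if ad.region is None]

inventory = [
    Ad("Boston plumber", region="Boston"),
    Ad("Nationwide plumbing supplies"),
]
print([ad.text for ad in ads_for("Boston", inventory)])  # ['Boston plumber']
print([ad.text for ad in ads_for("Denver", inventory)])  # ['Nationwide plumbing supplies']
```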

Posted by Gwen at 11:49 PM | Comments (0)

Search Engine Panel

The State of the Search Engine Industry by Dana Todd. SearchDay (Oct 22)

This article is a short account of a panel discussion at the Search Engine Strategies Conference in August 2003. Topics touched on were paid inclusion, vertical engines (travel is doing very well; Singingfish's multimedia search was also mentioned), and mobile search on cell phones and PDAs. Panelists were asked for their wishlists. Greg Notess of Search Engine Showdown asked for "truncation and proximity locators". Brett Tabke of Webmaster World hopes for "a subscription service for an ad-free search environment".

Posted by Gwen at 10:23 PM | Comments (0)

October 22, 2003

Future of Search - Semantic Web

The Web: Search engines still evolving By Gene J. Koprowski. UPI Technology News (Oct 21)

Of interest -- "Using a combination of statistical mathematics, heuristics, artificial intelligence and new computer languages, researchers are developing a "Semantic Web," as it is called, which responds to online queries more effectively. The new tools are enabling users -- now on internal corporate networks and, within a year, on the global Internet -- to search using more natural language queries. "

"Key word searching is common today," Wiener said. "But the next generation of the Web is making documents more contextually relevant. The relevance of each document to a particular topic, or search, will be related by the semantic tagging language that developers are working on now in fields from artificial intelligence to relational databases to statistics. People have been actively pursuing this for two or three years now to evolve the Web. Several efforts are starting to rollout. I predict that in the next six months to a year, you will begin to see semantic relationship searching on the 'Net."

Mentions the work of ClearForest with unstructured data.

Posted by Gwen at 10:57 AM | Comments (0)

October 17, 2003

Semantic Web

August 2009: How Google beat Amazon and Ebay to the Semantic Web by Paul Ford. (July 26, 2002) fTrain.com -- Futuristic article on how Google succeeded in becoming the largest online marketplace, easing out Amazon and eBay by using semantic web constructs. Mentioned by Stephen Downes in the OLWeekly Oct 17, 2003

Posted by Gwen at 05:57 PM | Comments (0)

Labelling Images

Researchers search for faster searches AP via Globe and Mail (Oct 16) -- Carnegie Mellon University researchers are trying to make image search better by attaching labels that have been created through a computer game played by people. Some are skeptical this will work.

Posted by Gwen at 09:10 AM | Comments (0)

October 15, 2003

IBM Web Fountain

IBM WebFountain - taking web search to the next level it-analysis.com (Oct 15)

Describes Web Fountain as a "text analytics system".

Of interest -- "WebFountain runs on an IBM supercomputer and monitors everything on the World Wide Web. WebFountain contains over a petabyte of storage with over 3 billion pages indexed, 2 billion pages stored and the ability to mine 20 million pages a day."

"Web Fountain is not about building a better search engine; it is about identifying patterns, trends and relationships that can be used by businesses to transform the way they work. WebFountain can spot trends in public opinion and popular culture as they emerge and watch them catch hold around the world. WebFountain can be used as a surrogate for public opinion, providing instant, comprehensive virtual market research in the place of newspapers, Web page research or a professional report."

Google also owns pattern-finding, meaning-extracting technology through its recent acquisition of Applied Semantics, but it is using that technology to deliver targeted ads.

Posted by Gwen at 05:06 PM | Comments (0)

Personalization Not Wanted

Study: Personalization not Secret to E-Commerce by Sharon Gaudin. Datamation (Oct 14)

"Jupiter Research released a study today that shows that only 14 percent of consumers say a personalized Web site lead them to buy more often from online stores. And just 8 percent say personalization makes them more apt to visit news, entertainment and content sites more frequently."

The study found that consumers want improvements in site navigation and more contact information. When they go online they have a task in mind, such as finding a particular CD, and they don't want to be bothered by suggestions and distractions.

Surprisingly, the article did not say anything about privacy issues: people not wanting to give the information that would assist in personalization, or not wanting their activity tracked.

However, this was covered in Report slams Web personalization by Paul Festa in CNet News.

"More than 25 percent of consumers surveyed by Jupiter said they avoided Web site customization because of concerns that marketers would misuse the information. A similar proportion avoided registering with a Web site, for the same reasons."

The study indicated that personalization costs too much for little to no gain. However, there were some individual successes, such as Rand McNally, and 35% of the surveyed companies intend to go ahead with personalization plans.

Posted by Gwen at 11:14 AM | Comments (0)

Local Search

The next thrust for web search engines is local search - being able to let you narrow your search to a particular city or even zip code.

Google has a beta site in its lab area for Location Search in the US -- http://labs.google.com/location

The little Gigablast will work with geo-sensitive metatags.

Overture has been testing localized search.

Pandia had an overview article Google and AltaVista test local search (Sept 23)

Danny Sullivan reported on New Developments In Local Search: Part 1, Moves By Overture in SearchDay (Oct 14)

Mainly the effort seems to be to make web search engines serve as yellow pages. But then - why not use yellow pages? Because the search engines want to serve up "localized sponsored matches" - ads for your area. If there are none, the search engine may be able to pull from the yellow pages.

Sullivan said, "Overture has a separate database of listings that involves a small number of its US national advertisers taking part in a pilot program. Additional "backfill" results are also provided by yellow pages and data provider Acxiom."

So far all the work is being done for US locations.

Posted by Gwen at 10:35 AM | Comments (0)

October 06, 2003

Copernic Enterprise Search

Copernic Launches Enterprise Search Product for the SME Market by Paula J. Hane, Information Today Newsbreaks (Oct 6)

"Copernic, a company known for its consumer metasearch product, Copernic Agent, has officially launched its first enterprise search product, Copernic Enterprise Search. ... a product that is specifically designed to meet the needs of the Small-to-Medium-sized Enterprise (SME) and departments of larger enterprises. "

"Copernic Enterprise Search uses advanced linguistic and statistical technologies that can identify the key concepts and the key sentences of indexed documents. It is able to rank a document whose main theme corresponds to search keywords higher than a document that only contains search keywords once or twice. The results ranking can be fine-tuned by altering the weight of different ranking factors. The software also does automatic indexing of new and updated documents in real time, ... "
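The press release doesn't list Copernic's actual factors, but "altering the weight of different ranking factors" usually means something like a weighted sum. Here is a minimal sketch, assuming three made-up factor scores between 0 and 1; the factor names and numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class DocFeatures:
    name: str
    keyword_hits: float   # hypothetical: normalised keyword frequency, 0..1
    theme_match: float    # hypothetical: how well the main concepts match the query, 0..1
    key_sentence: float   # hypothetical: query terms found in extracted key sentences, 0..1

def rank(docs: List[DocFeatures], weights: Dict[str, float]) -> List[str]:
    """Order documents by a weighted sum of their factor scores."""
    def score(d: DocFeatures) -> float:
        return (weights["keyword_hits"] * d.keyword_hits
                + weights["theme_match"] * d.theme_match
                + weights["key_sentence"] * d.key_sentence)
    return [d.name for d in sorted(docs, key=score, reverse=True)]

docs = [
    DocFeatures("mentions-once.doc", keyword_hits=0.6, theme_match=0.1, key_sentence=0.0),
    DocFeatures("main-theme.doc", keyword_hits=0.4, theme_match=0.9, key_sentence=1.0),
]
default = {"keyword_hits": 1.0, "theme_match": 1.0, "key_sentence": 1.0}
keyword_only = {"keyword_hits": 1.0, "theme_match": 0.0, "key_sentence": 0.0}
print(rank(docs, default))       # ['main-theme.doc', 'mentions-once.doc']
print(rank(docs, keyword_only))  # ['mentions-once.doc', 'main-theme.doc']
```

With the concept factors switched on, the document whose main theme matches wins; zero them out and plain keyword counting takes over again.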

Posted by Gwen at 01:54 PM | Comments (0)

September 30, 2003

Semantic Web

The search for 'smart data' pays off - Business will benefit: Customers will be able to find data much quicker by Danny Bradbury, Financial Post (Sept 29, 2003)

Of interest ..

"Wouldn't it be useful to have a network of "smart" information -- data that understood itself? Web searches would be easier if a Web browser was able to search for related concepts rather than just looking for key words without their context." - describes the objective of a semantic Web.

"The biggest problem for semantic Web technology is that it is mostly aimed at specialist applications. It needs a dictionary of concepts called an ontology that will enable it to hook information together. It is relatively easy to create an ontology for a specific subject such as healthcare, aerospace or tigers, but introducing semantic technology on to the wider Web would require a huge ontology, or at least many ontologies linked together." - identifies why the semantic web will be largely limited to specialist applications.

Posted by Gwen at 02:17 PM | Comments (0)

September 26, 2003

DbSurfer

Web searches tap databases By Kimberly Patch, Technology Research News (Sep 24)

Birkbeck University researchers have developed software that makes it possible to search different types of databases / sources at the same time.

"The researchers' software automatically constructs trails across tables in relational databases, according to Wheeldon. The software treats each database row as a virtual Web page, and builds links according to database settings,..."
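The idea of treating rows as virtual Web pages can be sketched with Python's built-in sqlite3 module. This assumes every table has an integer `id` primary key and that foreign keys point at that `id`. It only builds the page-and-link graph, not the trail-finding DbSurfer adds on top, and the table and column names are hypothetical.

```python
import sqlite3

def build_virtual_web(conn: sqlite3.Connection):
    """Treat each row as a virtual page and each foreign-key value as a link.

    Assumes every table has an integer `id` primary key and that foreign
    keys reference that `id` column.
    Returns (pages, links): page id -> text, page id -> list of linked page ids.
    """
    pages, links = {}, {}
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA rows: (id, seq, table, from, to, on_update, on_delete, match)
        fks = [(fk[3], fk[2]) for fk in conn.execute(f"PRAGMA foreign_key_list({table})")]
        cur = conn.execute(f"SELECT * FROM {table}")
        cols = [d[0] for d in cur.description]
        for row in cur.fetchall():
            record = dict(zip(cols, row))
            page = f"{table}/{record['id']}"
            pages[page] = " ".join(str(v) for k, v in record.items() if k != "id")
            links[page] = [f"{ref_table}/{record[col]}"
                           for col, ref_table in fks if record[col] is not None]
    return pages, links

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE papers (id INTEGER PRIMARY KEY, title TEXT,
                         author_id INTEGER REFERENCES authors(id));
    INSERT INTO authors VALUES (1, 'Alice');
    INSERT INTO papers VALUES (10, 'Example paper', 1);
""")
pages, links = build_virtual_web(conn)
print(links["papers/10"])  # ['authors/1']
```

Once rows look like pages and foreign keys look like hyperlinks, ordinary Web search and link-following techniques can run over the database.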

One of the researchers, Richard Wheeldon, said the software could be ready in less than a year.

Posted by Gwen at 02:07 PM | Comments (0)

Berners-Lee on Semantic Web

Berners-Lee Talks Up Semantic Web By Thor Olavsrud Internet News.com (Sep 23)

Tim Berners-Lee spoke to the Royal Society in the UK about his vision for the semantic web. "It's like a great big database," he said.

"For instance, he explained, consider an event listing on the Web for a lecture. It would include data like the location, start time, end time, the speaker, a phone number to call for more information and so on. But the data is fairly static. It can be read by humans, but not by machines. However, metadata could be applied to those datapoints which identify to machines what they are. Then an interested party could click to attend the event, and whatever calendaring application that person uses could immediately schedule the event in the planner, denoting where it is, what time it will start and what time it will end, and who will be speaking. It could provide a map to get the person to that event, and supply information about the speaker. "

Posted by Gwen at 01:54 PM | Comments (0)

WebFountain

IBM’s WebFountain Launched–The Next Big Thing? by Barbara Quint Information Today Newsbreaks (Sep 22)

More positive comments about IBM's WebFountain -- "a Web-scale mining and discovery platform that extracts trends, patterns, and relationships from massive amounts of unstructured and semi-structured text."

Posted by Gwen at 03:16 AM | Comments (0)

September 21, 2003

Semantic Web

Semantic Web: Out of the Theory Realm By Michael Singer Silicon Valley.com (Sept 12, 2003)

Posted by Gwen at 02:07 AM | Comments (0)

September 19, 2003

Term Weighting at Vox Populi

New Search Algorithm Hears 'People's Voice' By Mike Martin NewsFactor Network (Sept 16)

New Internet search algorithm called "Vox Populi" (Voice of the People) developed in Germany assigns relative weights to search words.

"Someone typing "free MP3 downloads" in Google might be taken to all MP3 download sites. In the Vox Populi algorithm, however, if "free" has a larger relative weight than "downloads" (based on statistics showing how many users searching for MP3 downloads are looking for free ones), the algorithm will take searchers to free download sites first. "
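Here is a toy version of the idea. It assumes (my assumption, not the researchers') that a term's weight is just the share of logged queries containing it, and that pages are scored by the summed weights of the query terms they contain. The query log and page text are made up.

```python
from collections import Counter
from typing import Dict, List

def term_weights(query_log: List[str], query_terms: List[str]) -> Dict[str, float]:
    """Weight each query term by the share of logged queries that contain it.

    A toy reading of the Vox Populi idea; the real algorithm's statistics
    aren't described in the article.
    """
    counts = Counter()
    for logged in query_log:
        words = set(logged.lower().split())
        counts.update(t for t in query_terms if t in words)
    total = len(query_log) or 1
    return {t: counts[t] / total for t in query_terms}

def rank_pages(pages: Dict[str, str], weights: Dict[str, float]) -> List[str]:
    """Order pages by the summed weights of the query terms they contain."""
    def score(name: str) -> float:
        words = set(pages[name].lower().split())
        return sum(w for t, w in weights.items() if t in words)
    return sorted(pages, key=score, reverse=True)

log = ["free mp3 downloads", "free mp3", "mp3 downloads",
       "free mp3 player", "free mp3 downloads"]
weights = term_weights(log, ["free", "mp3", "downloads"])
pages = {"store.example": "buy mp3 downloads", "freebies.example": "free mp3 songs"}
print(weights)                     # {'free': 0.8, 'mp3': 1.0, 'downloads': 0.6}
print(rank_pages(pages, weights))  # ['freebies.example', 'store.example']
```

Because "free" shows up in more logged queries than "downloads", the free-music page outranks the store, even though both match two of the three words.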

Hmmm - I'd like to set the weightings myself.

Posted by Gwen at 02:06 PM | Comments (0)

September 18, 2003

IBM and Webfountain

IBM unveils new advanced search engine MCN International - Channel News Asia (Sept 18)

Describes WebFountain as a new search engine "capable of extracting minute data from among billions of Web pages."

"The system, run by a supercomputer that absorbs 25 million Web pages a day from the Internet, learns to recognise and put into context particular phrases and groups of words on command."

More information at http://www.almaden.ibm.com/WebFountain/

Posted by Gwen at 06:12 PM | Comments (0)

September 15, 2003

Infonortics 2003

Reports on The Infonortics Search Engine conference April 2003

Information Overlook By Martin White. EContent July 2003 Issue - saw the main theme as the split between searching structured and unstructured data, an issue most relevant to enterprise search. How users search was discussed, in particular the dictum "You have 12 minutes before a user gives up". Recommended an IIR Evaluation Model in Information Research, an international electronic journal, Issue 8-3.

Meeting report from the 2003 Infonortics Search Engine Meeting, Boston in Unstruct.org - a weblog about unstructured information.

Posted by Gwen at 04:29 PM

September 10, 2003

Visual search

Idée Inc. and Wonderfile Corporation announce the release of SimSearch, the first commercial implementation of visual search for a stock photography website. Press Release (Sep 9)

Wonderfile offers professional users "royalty-free" stock images in digital format for purchase. These are searchable online and available on CDs. Online search is by keyword and visual likeness.

The visual search software is Espion from Idée, a Toronto-based company. Wonderfile is a Masterfile company, also based in Toronto.

At Wonderfile, find an image you like and use SimSearch to find others that are visually similar. For example, search on Venice and pick a canal scene; SimSearch locates mainly water or canal scenes from the collection.
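Idée hasn't said how Espion measures visual likeness, so the following is not its method. As an illustration only, here is a much simpler classic technique: compare coarse color histograms using histogram intersection. Images are represented as plain lists of (r, g, b) pixel tuples, and the tiny "canal", "lagoon" and "desert" images are made-up data.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Quantize (r, g, b) pixels into coarse color bins; return normalized bin frequencies."""
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    return {color_bin: n / len(pixels) for color_bin, n in counts.items()}


def histogram_intersection(h1, h2):
    """Similarity between 0 and 1: the overlap of two normalized histograms."""
    return sum(min(freq, h2.get(color_bin, 0.0)) for color_bin, freq in h1.items())


def most_similar(query_pixels, collection):
    """Rank named images in the collection by color similarity to the query image."""
    query_hist = color_histogram(query_pixels)
    scored = [
        (histogram_intersection(query_hist, color_histogram(pixels)), name)
        for name, pixels in collection.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]


# Made-up 10-pixel "images": mostly water blue, sandy stone, or desert tones.
collection = {
    "venice_canal": [(20, 40, 200)] * 8 + [(200, 180, 150)] * 2,
    "lagoon": [(30, 50, 210)] * 9 + [(250, 250, 250)],
    "desert": [(230, 190, 120)] * 10,
}
query = [(25, 45, 205)] * 7 + [(210, 185, 140)] * 3
print(most_similar(query, collection))
```

Color alone can't tell a canal from a blue car, which is presumably why commercial systems also look at shape and texture.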

Posted by Gwen at 04:29 PM

September 06, 2003

Clustering

A Vivisimo press release explains why clustering is useful. With results grouped into folders, searchers:

- can "discover" themes and explore more listings
- can focus on a folder and find more relevant results faster
- are drawn by folders to go past the first page.
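Vivisimo's clustering algorithm isn't described in the press release. As a rough sketch of the folder idea only, the hypothetical `cluster_results` below labels folders with terms that several result titles share (ignoring the query term and a few stopwords) and files each result under every folder whose label it contains. A real engine would also use snippets and phrases and choose labels far more carefully.

```python
from collections import Counter, defaultdict

STOPWORDS = {"a", "and", "for", "in", "of", "the", "to"}


def cluster_results(titles, query="", max_folders=3, min_size=2):
    """Group result titles into folders labeled by terms that several titles share.

    A title can land in more than one folder; titles matching no folder go to "Other".
    """
    ignore = STOPWORDS | set(query.lower().split())
    term_sets = [set(t.lower().split()) - ignore for t in titles]
    doc_freq = Counter(term for terms in term_sets for term in terms)
    # Most widely shared terms become folder labels (ties broken alphabetically).
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    labels = [term for term, n in ranked if n >= min_size][:max_folders]

    folders = defaultdict(list)
    for title, terms in zip(titles, term_sets):
        matched = [label for label in labels if label in terms]
        for label in matched:
            folders[label].append(title)
        if not matched:
            folders["Other"].append(title)
    return dict(folders)


results = [
    "jaguar cars and parts",
    "used jaguar cars",
    "jaguar habitat in the rainforest",
    "rainforest animals jaguar facts",
    "mac os jaguar release",
]
for label, titles in cluster_results(results, query="jaguar").items():
    print(label, titles)
```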

Clustering of Search Results Increases Click-Through Rates, Silicon Valley Biz Inc (Aug 19), PRNewswire

Posted by Gwen at 12:01 PM

RSS at My Yahoo

Yahoo Adds an RSS Reader to My Yahoo, Research Buzz. It's supposedly a place to put headline news from blogs. It may have worked for Research Buzz, but it doesn't for me. My Yahoo will have to do better.

Posted by Gwen at 11:51 AM

September 05, 2003

Vivisimo

Vivisimo Announces Release 4.0 of its Award-Winning Clustering Engine PR Newswire via Silicon Valley (Sept 4)

"Vivisimo's Clustering Engine automatically organizes search results into folders, without pre-processing the information. Release 4.0 enhances the functionality and features of the solution and contains fundamental breakthroughs in quality, enabling customers to increase their return-on-investment in enterprise search tools and improve end-user satisfaction by significantly reducing total cost of ownership and improving performance."


Release 4.0 supports metadata clustering (folders grouped around author, source, set topics, etc.) and Show-in-clusters (identifying which cluster a particular result belongs to).
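The two Release 4.0 features are easy to picture with a small sketch. Assuming each result carries metadata fields such as author and source (the field names and records here are made up), metadata clustering amounts to grouping on one field, and Show-in-clusters is the reverse lookup from a result to the folders that hold it. The function names are hypothetical, not Vivisimo's API.

```python
from collections import defaultdict


def cluster_by_field(results, field):
    """Metadata clustering: one folder per distinct value of the given field."""
    folders = defaultdict(list)
    for result in results:
        folders[result.get(field, "Unknown")].append(result["title"])
    return dict(folders)


def show_in_clusters(folders, title):
    """Show-in-clusters: labels of every folder that contains the given result."""
    return [label for label, titles in folders.items() if title in titles]


# Made-up result records with metadata.
results = [
    {"title": "Clustering survey", "author": "Smith", "source": "ACM"},
    {"title": "Folder interface study", "author": "Jones", "source": "ACM"},
    {"title": "Metasearch notes", "author": "Smith", "source": "Weblog"},
]
by_author = cluster_by_field(results, "author")
print(by_author)  # {'Smith': ['Clustering survey', 'Metasearch notes'], 'Jones': ['Folder interface study']}
print(show_in_clusters(by_author, "Metasearch notes"))  # ['Smith']
```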

Public web site at www.vivisimo.com

Posted by Gwen at 09:27 AM

September 01, 2003

Using link analysis

Google is most popular but others may do it better by Lee Gomes, Wall Street Journal via SFGate.com (Aug 18). Gomes searches for "God" at Google and Teoma and prefers the answer from Teoma. In doing so, he describes the fundamentals of link analysis.
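The article doesn't give either engine's formula, and this isn't Google's or Teoma's code. As one illustration of link analysis, here is a sketch of Kleinberg's HITS algorithm, which scores pages as authorities (pages that good hubs link to) and as hubs (pages that link to good authorities). The link graph is invented.

```python
import math


def hits(links, iterations=20):
    """Kleinberg's HITS on a link graph given as page -> list of pages it links to.

    Returns (hub, authority) score dicts, each scaled to unit Euclidean length.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A good authority is linked to by good hubs.
        auth = {p: 0.0 for p in pages}
        for p, targets in links.items():
            for q in targets:
                auth[q] += hub[p]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # A good hub links to good authorities.
        hub = {p: sum(auth[q] for q in links.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Invented graph: three directory-style pages pointing at candidate answers.
links = {"A": ["X", "Y"], "B": ["X", "Y"], "C": ["X", "Z"]}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # X: every hub links to it
```

Unlike a single global score, HITS is usually run on the pages matching one query, which is what makes its scores topic-specific.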

Posted by Gwen at 03:34 PM

Overture Research

Maybe Overture will do more for search than place ads. It has opened a new web site to feature the work of its research department - Overture Research.

"Through creativity, invention, and scientific contribution, Overture Research has the mission to position Overture as a pioneer in the next online revolution. Our goal is to develop novel algorithms and technology to empower users, consumers, businesses, advertisers and publishers worldwide to maximize the social and economic potential of the Internet."

Posted by Gwen at 03:30 PM

August 28, 2003

Grokker

Groxis Announces Web-Enabled Version of Award-Winning Visual Information Software PR Newswire (Aug 19) "Embedded Grokker(TM) Enables Search Engines, Enterprises and Other Organizations to Integrate Grokker Into a Web Page"

"With Embedded Grokker, the software is integrated into a Web site as a simple browser-based application. Embedded Grokker uses the core Grokker technology to turn thousands of pieces of information -- for example, search results -- into a simple, graphical map. These embedded maps are filterable, customizable and can be saved and shared. A visitor to a Web site can perform a search, reorganize the Grokker map, and then save it and mail it to a friend or colleague, who can reopen the map on the originating site."

No mention of particular public web sites that have adopted this yet.

Posted by Gwen at 08:54 AM

August 19, 2003

Nutch

Project searches for open-source niche by Stephanie Olsen, CNet News (Aug 18). Nutch is developing open-source search software that will show how it determined its rankings.

"... the project is not-for-profit and aims to advance search by supplying a technology for experimentation. Academic researchers or developers will be able to download the software and adapt it without having to reinvent the wheel, Cutting said. Foreign governments could use Nutch to develop a noncommercial search site for citizens rather than licensing a proprietary, ad-supported technology, he said. Or corporate entities could build a for-profit business around the technology. "

This is more likely to be used for "private" purposes, such as an organization or a specialized service, than for the spammer-infested web.
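The article doesn't say how Nutch will present its ranking explanations. A minimal sketch of the idea, assuming a plain tf-idf score (not necessarily Nutch's formula): the hypothetical `explain_scores` returns each document's total together with the contribution of every query term, so a searcher can see why one page outranks another.

```python
import math
from collections import Counter


def explain_scores(query, docs):
    """Score docs with a simple tf-idf and keep the per-term contributions.

    Returns a list of (doc_id, total, {term: contribution}), best first.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(docs)
    results = []
    for doc_id, words in tokenized.items():
        tf = Counter(words)
        parts = {}
        for term in query.lower().split():
            df = sum(1 for w in tokenized.values() if term in w)
            if tf[term] and df:
                idf = math.log(n_docs / df) + 1.0  # +1 so a term found everywhere still counts
                parts[term] = tf[term] * idf
        results.append((doc_id, sum(parts.values()), parts))
    results.sort(key=lambda r: r[1], reverse=True)
    return results


docs = {
    "d1": "open source search engine",
    "d2": "search engine rankings explained",
    "d3": "open source licenses",
}
for doc_id, total, parts in explain_scores("open search", docs):
    print(doc_id, round(total, 3), {t: round(c, 3) for t, c in parts.items()})
```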

Posted by Gwen at 09:00 AM

August 18, 2003

New search engines

Pandia recaps news about possible new search engines in More new search engine development (Aug 11): Kaltix, IBM's WebFountain, and Nutch.

Posted by Gwen at 10:27 AM

August 14, 2003

IBM and Search

IBM developed a search engine, WebFountain, for a record company, and it may have wider applications: "The technology reads and understands text, and uses natural language to make correlations between words. Unlike traditional search, Web Fountain searches everything on the Web, including chat rooms, when set to that parameter."

IBM's Path From Invention To Income by Lisa Di Carla Forbes.com (Aug 7)

See Gary Price's comments and analysis Web Search - IBM (Aug 10)

Also, IBM Takes Search to New Heights by Barry Taft, eWeek (Aug 11), provides a short description of the Unstructured Information Management Architecture, which is the basis for WebFountain.

Posted by Gwen at 11:14 AM

Overture - Quigo

Quigo has sponsored-search technology that Overture wants. Its system mixes semantic algorithms with human intelligence to deliver more relevant ads.

"For example, a web page featuring a travel article about Hawaii could offer advertising for hotels in Hawaii, airlines flying to Hawaii, unique tourist attractions in Hawaii and more. One advantage of AdSonar is in giving publishers the option of a human editorial setup for defining relevancy parameters and 'teaching' Quigo's machine learning algorithms which parts of each page should be targeted. The human editing process ensures that only the relevant parts of each document are targeted for ads, significantly improving the relevancy of the results." - from Press Release


Quigo offers the online publishing and ad serving industries a new contextually targeted advertising system Press release (Aug 13)

Overture picks Israeli start-up Quigo to lead search engine battle against Web giant Google by Galit Yemini, Haaretz.com (Aug 14). See also the press release.
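Quigo's semantic algorithms aren't described, so plain keyword overlap stands in for them in this sketch. What it does show is the human-editing step from the press release: only the page sections an editor marks as targeted feed into ad matching, so an off-topic sidebar can't pull in irrelevant ads. The section names, ads and the `match_ads` function are all hypothetical.

```python
def match_ads(page_sections, targeted, ads, top_n=2):
    """Rank ads by keyword overlap with the editor-selected parts of a page.

    page_sections: section name -> text
    targeted: names of the sections a human editor marked as ad-relevant
    ads: ad name -> set of keywords
    """
    words = set()
    for name in targeted:
        words |= set(page_sections[name].lower().split())
    scored = [(len(keywords & words), ad) for ad, keywords in ads.items()]
    # Drop ads with no overlap; sort by overlap, then by name for stable output.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [ad for _, ad in ranked[:top_n]]


page_sections = {
    "article": "beach hotels in hawaii and flights to honolulu",
    "sidebar": "mortgage rates today",
}
ads = {
    "hawaii hotels": {"hawaii", "hotels", "beach"},
    "honolulu flights": {"flights", "honolulu"},
    "mortgage deals": {"mortgage", "rates"},
}
print(match_ads(page_sections, ["article"], ads))  # ['hawaii hotels', 'honolulu flights']
```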

Posted by Gwen at 10:44 AM

August 12, 2003

Personalizing Search

Searching for the personal touch by Stephanie Olsen, CNet News (Aug 11). In general terms, the article reviews the aspirations of web search engines to enhance their services through personalization. More specifically, it puts the spotlight on Kaltix, a new start-up company that may have technology to speed up Google's PageRank computations and enable consideration of personal interest profiles.
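The article doesn't say what Kaltix's technique is. For background, this sketch shows plain PageRank computed by power iteration, plus the standard personalized variant in which the random surfer's jumps follow a user's interest profile instead of landing uniformly on every page. The graph and the interest profile are made up, and the damping factor of 0.85 is a conventional choice, not a Kaltix or Google setting.

```python
def pagerank(links, damping=0.85, personalization=None, iterations=50):
    """Power-iteration PageRank over links: page -> list of pages it links to.

    personalization: optional page -> weight dict; the random surfer
    "teleports" according to it instead of uniformly (personalized PageRank).
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    if personalization:
        total = sum(personalization.get(p, 0.0) for p in pages)
        teleport = {p: personalization.get(p, 0.0) / total for p in pages}
    else:
        teleport = {p: 1.0 / len(pages) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new = {p: (1 - damping) * teleport[p] for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new[q] += share
            else:
                # Dangling page: spread its rank according to the teleport vector.
                for q in pages:
                    new[q] += damping * rank[p] * teleport[q]
        rank = new
    return rank


links = {
    "news": ["sports", "tech"],
    "sports": ["news"],
    "tech": ["news", "science"],
    "science": ["tech"],
}
generic = pagerank(links)
personal = pagerank(links, personalization={"science": 1.0})
print(round(generic["science"], 3), round(personal["science"], 3))
```

Each personal profile needs its own pass over the whole graph, which is why faster PageRank computation matters for personalization.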

Posted by Gwen at 06:27 PM

July 31, 2003

National Library of Medicine

Natural language: The National Library of Medicine offers COSMO for answering frequently asked questions in a natural-language style (http://wwwns.nlm.nih.gov/). It uses NativeMinds software. See Gary Price, Natural Language Searching (July 28).

Posted by Gwen at 12:26 PM