October 09, 2009

Evolution of Search from Boolean to Entity

An Evolution of Search, by John D. Holt and David J. Miller, ASIS&T (Oct/Nov 2009)

John D. Holt and David J. Miller, senior architects in the LexisNexis Risk and Information Analytics Group, review how information search and retrieval technologies have progressed since the early days of Boolean search. The tug-of-war is still between precision (returning only relevant results) and recall (returning all of the relevant results). Search has now evolved to entity search: retrieval by the attributes of the information item.

"This paper provides a brief review of some of the earlier stages of search evolution in the context of the evolutionary pressures of the concurrent improvement of both precision and recall."

Note especially the conclusion:

Entity search is another step in the evolution of information retrieval systems. Entity search builds upon Boolean and relevance ranking techniques. Entity search provides improvements in both precision and recall over traditional Boolean and relevance ranked search techniques.

Boolean search techniques require the researcher to be knowledgeable of the words and expressions used in the document or record collection. Precise results can be obtained, but at the cost of a significant drop in recall. Recall can be achieved, but only at a significant drop in precision.
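The Boolean trade-off described above can be made concrete with a small sketch. The four-record collection, the set of relevant records, and the helper names below are all invented for illustration and do not come from the paper. Precision is taken as the share of retrieved records that are relevant, and recall as the share of relevant records that were retrieved.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for retrieved ids measured against relevant ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical toy collection: record id -> text.
DOCS = {
    1: "john smith miami florida",
    2: "j smith miami",
    3: "john smith denver colorado",
    4: "john smyth miami florida",
}
# Assumption: records 1, 2 and 4 all describe the John Smith the researcher wants.
RELEVANT = {1, 2, 4}

def boolean_and(terms):
    """Records containing every term (exact word match)."""
    return {d for d, text in DOCS.items() if all(t in text.split() for t in terms)}

def boolean_or(terms):
    """Records containing at least one term (exact word match)."""
    return {d for d, text in DOCS.items() if any(t in text.split() for t in terms)}

narrow = boolean_and(["john", "smith", "miami"])  # strict query
broad = boolean_or(["smith", "smyth", "miami"])   # loose query

print(precision_recall(narrow, RELEVANT))
print(precision_recall(broad, RELEVANT))
```

In this toy case the strict AND query returns only record 1, so everything it returns is relevant but two of the three relevant records are missed. The loose OR query finds all three, at the cost of also returning the Denver record.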

Relevance ranking via statistical techniques can be used to improve apparent precision in some cases. However, the statistical techniques do not apply well to searching structured and semi-structured data with attribute values.

The linking or clustering of the documents or records into sets of references that describe an entity can be used for much more than just reporting on an entity. The information from the set can be used in some cases to improve recall by broadening the search. Alternatively, and more powerfully, the entity can become the object of the search.

A search expression that specifies a set of attribute values can be used when the entity is the object of the search. Both precision and recall are improved. Precision is improved because the entities returned are all consistent with the attribute values supplied in the search. Recall is improved because the combination of entity values specified in the search expression need not appear in any particular underlying reference document or record.
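A minimal sketch may help show why recall improves when the entity is the object of the search. It assumes the records have already been linked to entity ids by some upstream step, which is not shown here. The records, attribute names, and function names are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical records, each already tagged with the entity it refers to.
RECORDS = [
    {"entity": "E1", "name": "john smith", "city": "miami"},
    {"entity": "E1", "name": "john smith", "phone": "555-0100"},
    {"entity": "E2", "name": "john smith", "city": "denver"},
    {"entity": "E2", "name": "john smith", "phone": "555-0199"},
]

def build_entities(records):
    """Pool the attribute values of all records that share an entity id."""
    entities = defaultdict(lambda: defaultdict(set))
    for rec in records:
        for attr, value in rec.items():
            if attr != "entity":
                entities[rec["entity"]][attr].add(value)
    return entities

def record_search(records, **query):
    """Record-level search: one record must carry every queried value."""
    return [r for r in records if all(r.get(a) == v for a, v in query.items())]

def entity_search(entities, **query):
    """Entity-level search: the pooled attributes must carry every queried value."""
    return sorted(
        eid for eid, attrs in entities.items()
        if all(v in attrs.get(a, set()) for a, v in query.items())
    )

ENTITIES = build_entities(RECORDS)

print(record_search(RECORDS, city="miami", phone="555-0100"))
print(entity_search(ENTITIES, city="miami", phone="555-0100"))
```

No single record holds both the Miami city and the 555-0100 phone number, so the record-level search comes back empty. The entity-level search still finds E1, because the two values sit in different records linked to the same entity, and it excludes E2, whose pooled attributes are inconsistent with the query.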

Posted by Gwen at October 9, 2009 03:08 PM