Researchers unleash crawlers into Deep Web data by Jennifer Foreshew, The Australian (Jan 19)
Professor Halevy, head of Google's structured data management research group in the US, discussed the difficulties of indexing structured data on the web, including the so-called deep web. This summary of his keynote speech in Australia gives us some clues about what Google is doing.
"Google has two research projects on these problems.
The first, WebTables, compiles a huge collection of databases by crawling the web and finding small relational databases that use the HTML table tag.
"By performing data mining on the resulting extracted information, we can also introduce a number of brand-new data-centric applications," the paper says.Posted by Gwen at January 19, 2010 02:35 PMThe second project attempts to extract information from the Deep Web, which refers to data on the web that is only available by filling web forms, and therefore invisible to traditional search crawlers."