Today’s Wall Street Journal gives us an insight into the makeover underway in the Google search department.
Over the next few months, Google’s search engine will begin spitting out more than a list of blue Web links. It will also present more facts and direct answers to queries at the top of the search-results page.
They are going about this by developing the search engine [that] will better match search queries with a database containing hundreds of millions of “entities”—people, places and things—which the company has quietly amassed in the past two years.
The ‘amassing’ got a kick start in 2010 with the Metaweb acquisition, which brought Freebase and its 12 million entities into the Google fold. It is now continuing with the harvesting of structured data, encoded using schema.org and embedded in HTML, that is starting to spread across the web.
The encouragement for webmasters and SEO folks to go to the trouble of inserting this information into their HTML is the prospect of a better result display for their pages – Rich Snippets. A nice trade-off from Google – you embed the information we want/need for a better search, and we will give you better-looking results.
The premise of what Google are up to is that it will deliver better search. Yes, this should be true; however, I would suggest that the major benefit to us mortal Googlers will be better results. The search engine should appear to have greater intuition as to what we are looking for, but we should also get more information about the things that it finds for us. This is the step-change. In addition to web page links, we will be getting information about things – the location, altitude, average temperature or salt content of a lake – whereas today you would only get links to the lake’s visitors centre or a Wikipedia page.
Another example quoted in the article:
…people who search for a particular novelist like Ernest Hemingway could, under the new system, find a list of the author’s books they could browse through and information pages about other related authors or books, according to people familiar with the company’s plans. Presumably Google could suggest books to buy, too.
Many in the library community may greet this with scepticism, seeing it as too simplistic an approach to something they have been striving towards for many years with only limited success. I would say that they should be helping the search engine supplier(s) to do this right, and be part of the process. There is a great danger that, for better or worse, whatever Google does will make the library search interface irrelevant.
As an advocate for linked data, I find it great to see the benefits of defining entities, and describing the relationships between them, being taken seriously. I’m not sure I buy into the term ‘Semantic Search’ as a name for what will result. I lean more towards ‘Semantic Discovery’, which better describes where the semantics kick in – in the relationships between a searched-for thing and its attributes and other entities. However, I’ve been around far too long to get hung up about labels.
Whilst we are on the topic of labels, I am in danger of stepping into the almost religious debate about the relative merits of microdata and RDFa as the encoding method for embedding schema.org data. Google recognises both; both are ugly for humans to hand-code, and webmasters should not have to care. Once the CMS suppliers get up to speed in supplying modules that automatically embed this stuff, as per this Drupal module, they won’t have to care.
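As a rough illustration of what such a CMS module might do behind the scenes, here is a minimal Python sketch. The render_entity helper is hypothetical (it is not the Drupal module’s code) and only handles a single flat item with plain-text properties, but it shows the same schema.org entity coming out as either microdata or RDFa Lite from one call.

```python
from html import escape

SCHEMA = "https://schema.org/"

def render_entity(item_type: str, props: dict, syntax: str = "microdata") -> str:
    """Render one flat schema.org entity as an HTML fragment.

    Hypothetical helper for illustration only: no nested items,
    no links, plain-text property values.
    """
    if syntax == "microdata":
        # Microdata marks the item with itemscope/itemtype and properties with itemprop.
        open_tag = f'<div itemscope itemtype="{SCHEMA}{escape(item_type)}">'
        prop_attr = "itemprop"
    elif syntax == "rdfa":
        # RDFa Lite declares the vocabulary once, then uses typeof and property.
        open_tag = f'<div vocab="{SCHEMA}" typeof="{escape(item_type)}">'
        prop_attr = "property"
    else:
        raise ValueError(f"unknown syntax: {syntax!r}")

    lines = [open_tag]
    for name, value in props.items():
        # Each property becomes a span carrying the schema.org property name.
        lines.append(f'  <span {prop_attr}="{escape(name)}">{escape(value)}</span>')
    lines.append("</div>")
    return "\n".join(lines)

print(render_entity("Person", {"name": "Ernest Hemingway", "jobTitle": "Novelist"}))
print(render_entity("Person", {"name": "Ernest Hemingway", "jobTitle": "Novelist"}, syntax="rdfa"))
```

The webmaster supplies only the type and the property values; which encoding ends up in the page is just a parameter, which is rather the point.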
I welcome this. Yet it is only a symptom of something much bigger and game-changing, as I postulated last month in ‘A Data 7th Wave is Approaching’.