Monday, June 9, 2008

Powerset vs. Cognition: A Semantic Search Shoot-out





Nitin Karandikar,
Saturday, June 7, 2008 at 9:00 AM PT





Powerset, which implements
semantic search, recently released a public beta based on the limited
data set of Wikipedia. But while there is no question that Powerset has
some interesting and valuable semantic search technology — many of
their demo queries produce meaningful summary pages and reference pages
with information extracted from Wikipedia content — there are other
semantic search engines that produce equally meaningful and relevant
results.


In this post, we compare Powerset results with those of a demo implementation from one such search engine, Cognition Technologies. We also compare both with the current gold standard in web search, Google (again, limited to the Wikipedia data set).


Example 1: Powerset


There are some classes of queries in which Powerset shines, namely
those that involve extracting concepts or aggregating data from a
given data set.


For example, check out the beautifully presented results for the
following queries, which pull out the key information the user is
looking for and present it in summary format:


“military intelligence”



“teams in the NFL”



Example 2: Cognition Technologies


On the other hand, there are other types of queries — especially
where hardcore semantic parsing is involved — where the Powerset
algorithms get confused, and Cognition gives better results:


“rare wildlife of the Amazon”



“football players who went to jail”



Example 3: Google


There are still queries (especially when semantic parsing is not
involved) for which Google's results are much better than those of
either Powerset or Cognition:


“helicopter carrier Iwo Jima class”



Here, surprisingly, Google has the best results. Powerset has
related results, Cognition gets totally confused, but Google nails it!


Disambiguation


One area where both Powerset and Cognition improve on Google is the
disambiguation of query terms. This is always a significant issue for
search engines; for example, when a user types in the keyword Java,
does she mean the island, the programming language, or the coffee?


Google has recently tried some experiments in this area, but these new search engines go one better.


When Powerset sees an ambiguous topic, it uses tabs to provide both sets of results.




Cognition handles it differently, letting the user select among the different semantic meanings of each term.
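To make the idea of sense-based disambiguation concrete, here is a minimal Python sketch that groups results for an ambiguous term by its likely meaning, much as a tabbed or pick-a-meaning interface would need to. Everything in it is hypothetical: the tiny `SENSES` table, the cue words and the function names `guess_sense` and `group_by_sense` are illustrative assumptions, and nothing here reflects how Powerset or Cognition actually work.

```python
import re
from collections import defaultdict

# Hypothetical sense inventory: each meaning of "java" with a few cue words.
SENSES = {
    "java": {
        "island": {"indonesia", "jakarta", "volcano"},
        "programming language": {"code", "compiler", "class"},
        "coffee": {"cup", "bean", "roast"},
    }
}


def guess_sense(term, text):
    """Return the sense whose cue words overlap the text most, or None."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    best, best_hits = None, 0
    for sense, cues in SENSES.get(term.lower(), {}).items():
        hits = len(cues & words)
        if hits > best_hits:  # ties keep the sense seen first
            best, best_hits = sense, hits
    return best


def group_by_sense(term, documents):
    """Bucket documents by guessed sense, one bucket per result tab."""
    groups = defaultdict(list)
    for doc in documents:
        groups[guess_sense(term, doc) or "unknown"].append(doc)
    return dict(groups)
```

Each key of the returned dictionary could back one tab of results, or one entry in a list of meanings the user picks from. A real engine would need far richer linguistic knowledge than a handful of cue words.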



User Impact


For most common searches, Google search works just fine. We’ve all
gotten used to the ubiquitous “keyword-ese,” currently the universal
language of web search. With Google’s unlimited resources,
comprehensive index and formidable prowess in finding relevant results
using the PageRank algorithm, it’s going to be difficult for any other
search engine to match those results. Users may have to work just a
little bit harder for unusual queries or specialized searches, but most
users will accept that trade-off in return for using their familiar and
beloved search engine. Indeed, the word Google has come to represent
web search in the same way that the word Xerox once came to
symbolize the process of photocopying.


Future Competition


So what can Powerset (and Cognition) do to gain traction and capture users?


In their recent book, “The Innovator’s Solution,”
Clayton Christensen and Michael Raynor discuss how upstart companies
challenging market leaders and entrenched incumbents can position new
technologies for a reasonable chance of success. One approach that they
believe is guaranteed to fail is when these smaller upstarts try to
make evolutionary improvements to get and stay ahead of the major
players.


Instead, they suggest shaping the new technology into a disruptive innovation, along either of the following two major axes:


1. New-market strategy: Leveraging the innovation to attract users
who do not currently use the product or service, and thus growing
the market as a whole.


2. Low-end strategy: If there are price-sensitive, over-served
users who would be willing to trade some of the advanced functionality
in return for a lower price point, then the smaller players have an
opportunity to enter the market — that is, if they can figure out a way
to make a profit.


In other words, the new players entering the market have to find
profitable business opportunities in segments of the market that are
not attractive to market leaders.


Using this model, it is apparent that a strategy of challenging
Google head-on for control of the mainstream web search market has
little hope of success, regardless of the new technologies or search
innovations that are applied. Google would have no choice but to fight
back with everything it’s got to catch up to or leapfrog this “better
search” alternative.


Similarly, since Google search is free for users, there is really no
viable low-end strategy, no way to outdo the existing search leader by
offering a lower price point.


What about non-participant users? Practically everyone online
already uses a web search engine (with Google being the overwhelming
favorite). However, Google search follows a specific, consistent set of
guidelines: simplicity of UI, speed of response, and relevance based on
incoming links. These design parameters take top priority over all
other considerations.


By challenging these assumptions, we can discover new use cases in
search that are underserved (or not served at all) by Google. Some
examples include:


1. UI Simplicity: Google’s minimal UI is trivially simple to use
and ideal for a one-size-fits-all model, but it may be less than
optimal for complex semantic searches. As Alex Iskold points out in his
recent article on the myth and reality of semantic search,
a richer user interface would allow power users to express
semantically rich search queries and get back better results. Notably,
Powerset and Cognition excel at these types of queries.


2. Speed: For some types of advanced searches, users might be
willing to wait, perhaps even as long as a day, in order to get back
semantically complex results. Imagine a software agent that acts as a
virtual search assistant: once the user specifies a query with
multiple levels of complexity and dependency, the agent goes off and
returns the next day with a list of possible results/options. Queries
that require the coordination of complex tasks fall into this category,
such as planning a trip that requires coordinating air travel, hotel
and car, and minimizing the cost of the whole trip while taking some
additional factors into consideration.


3. Relevance: Although all the mainstream search engines use similar
criteria to evaluate relevance (mainly, the evidence of incoming
links), other relevance algorithms are certainly feasible and may work
better for certain classes of queries. Social relevance is an obvious
example; reputable premium content is another.
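For readers curious what "relevance based on incoming links" means in practice, here is a simplified power-iteration sketch in the spirit of PageRank. It is an illustration under stated assumptions, not Google's actual implementation: the function name `link_rank`, the damping factor of 0.85 and the fixed iteration count are all choices made for this example.

```python
def link_rank(links, damping=0.85, iterations=50):
    """Score pages by incoming links.

    `links` maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform score

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's score among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

An alternative relevance algorithm of the kind mentioned above, such as social relevance, would replace or supplement the link graph with a different signal, for example how often trusted contacts have bookmarked a page.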


This post is in no way meant to discredit Powerset — they’re in
early beta and are doing a fine job of building semantic search.
Instead, the examples above clearly demonstrate that the jury is still
out on semantic search; other search engines are also contenders in
this space, and the race is far from won.
