Wednesday, May 28, 2008

Freebase: Dispelling The Skepticism

Freebase, the first product of semantic web company Metaweb, is an open, semantically marked up database of information that we called one of the "10 semantic apps to watch" last year. With $57.4 million in funding, a smart team, and a tech legend in Danny Hillis at the helm, Metaweb is considered to be one of the most serious players in the Semantic Web space. Yet the company's
efforts to date have been met with skepticism. In particular, people have asked how Freebase differs from Wikipedia. Jamie Taylor, the Minister of Information at Metaweb, spoke at the SemTech 2008 Conference in San Jose last week in an effort to dispel some of that skepticism.

What is Freebase?

Jamie has an interesting title, Minister of Information, and his
primary responsibility is to seed Freebase with information and ensure
the quality of the data. According to Jamie, Freebase is an "open shared
database of the world's knowledge."
This sounds the same as Wikipedia, but it is really quite different,
because at the heart of Freebase are the ideas of semantics and
openness via an API.

Unlike Wikipedia, which is a free-form database, Freebase is structured: concepts and relationships
are interlinked into a gigantic network, or graph. Another important difference is that Freebase is all about its API.
Any information contained inside the database is accessible and can be retrieved via queries. In addition, the data
in Freebase is under a Creative Commons license, meaning that it is readily exportable and usable by others.

When it comes to defining the meanings of things, Freebase is focused on community, with collective editing, attribution,
and collaboratively built semantics. This last point is quite crucial - the founders of Freebase believe that meaning
has to emerge from the collaboration between users. As such, Freebase is one of the first experiments in web-scale
social contracts. The site is really focused on the notion that information should not be encumbered by licenses and should be free to use.

What is in Freebase Today?

Data comes into Freebase from many sources: Wikipedia, Flickr, the
US Department of Commerce, MusicBrainz, the USGS,
SFMOMA, the US Securities and Exchange Commission, Chef Moz, and many other places.
Right now the information is mostly about people and places, but the
system is engineered to hold a wide range of data types. As an example of
"People" information, there
is a lot of information in Freebase about artists along with their
artwork and place in history.
More esoteric types of information you might find in the database
include airplanes, French cheese, tropical storms in the 90s,
oil companies, and candies.

Freebase also contains lots of other kinds of data and has:

  • 3.4 Million Subjects
  • 750K People
  • 450K Locations
  • 50K Companies
  • 40K Movies
  • ... Over 1K Data Types with over 3K Properties

Data Representation in Freebase

While Freebase certainly has a long way to go before it can claim
completeness of information,
its core idea of object representation and linking seems very solid.
Each object in Freebase is unique.
As more information comes into the system about an object, more links
are created about it in the system.
It is particularly interesting how Freebase establishes object identity
and decides that two concepts (or subjects) are the same.

The diagram above illustrates the idea. When a new source of information is added to Freebase, it is parsed into
entities and facts. The new information is then cleaned up and is merged with the existing system. But
the merge only occurs if the system determines that the two bits of information are really about the same subject (in
this case Leonardo Da Vinci). This is a powerful approach which allows Freebase to grow the knowledge around individual
subjects. What is also interesting is that Freebase allows human editing to reconcile situations when the system
is unable to automatically link the two concepts together.
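
To make the merge step a little more concrete, here is a minimal Python sketch. It is not Metaweb's reconciliation logic: the matching rule (same normalized name and no conflicting facts), the record layout, and the names `normalize` and `ingest` are all assumptions made for illustration. Records that share a name but disagree on a fact are set aside for a human editor rather than merged.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial spelling variants compare equal."""
    return " ".join(name.lower().split())


def ingest(store, incoming, review_queue):
    """Merge `incoming` into a matching subject, queue it for review, or add it as new.

    Hypothetical rule: two records describe the same subject when their
    normalized names match and none of their shared facts conflict.
    """
    for existing in store:
        if normalize(existing["name"]) != normalize(incoming["name"]):
            continue
        shared = set(existing["facts"]) & set(incoming["facts"])
        if all(existing["facts"][k] == incoming["facts"][k] for k in shared):
            existing["facts"].update(incoming["facts"])  # same subject: grow its facts
        else:
            review_queue.append((existing, incoming))  # conflict: leave it to a human
        return
    store.append(incoming)  # no candidate found: treat as a new subject


store = [{"name": "Leonardo da Vinci", "facts": {"born": 1452}}]
queue = []

# Same subject, one extra fact: merged into the existing record.
ingest(store, {"name": "leonardo  da vinci",
               "facts": {"born": 1452, "profession": "painter"}}, queue)
# Same name but a conflicting (made-up) birth year: sent to human review.
ingest(store, {"name": "Leonardo da Vinci", "facts": {"born": 1974}}, queue)
# Unknown name: stored as a new subject.
ingest(store, {"name": "Michelangelo", "facts": {"born": 1475}}, queue)
```

A real system would need far richer evidence than a name and one fact, but the three outcomes (merge, human review, new subject) mirror the flow described above.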

Each permanent object in the system has a GUID - a unique
identifier, something like this: #9202a8c040000064.....
The identifier can be used to refer to the object via URL and via
queries. In addition to the GUID, there are other
ways to refer to the object, and beyond those there
are further aliases; for example, you can refer to a public company by its stock
ticker symbol. But regardless of the reference, the key point is that
you end up with the same, unique node in the system.
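
The "many names, one node" idea can be sketched as a small lookup table. This is only an illustration: the class `NodeIndex`, the namespace labels, the GUID string, and the company and ticker are hypothetical placeholders, not Freebase identifiers or its API.

```python
class NodeIndex:
    """Toy index in which every alias resolves to the same underlying node."""

    def __init__(self):
        self._nodes = {}    # guid -> node data
        self._aliases = {}  # (namespace, key) -> guid

    def add_node(self, guid, data):
        self._nodes[guid] = data

    def add_alias(self, namespace, key, guid):
        if guid not in self._nodes:
            raise KeyError(f"unknown guid: {guid}")
        self._aliases[(namespace, key)] = guid

    def by_guid(self, guid):
        return self._nodes[guid]

    def by_alias(self, namespace, key):
        # Aliases never hold data themselves; they only point at a GUID.
        return self._nodes[self._aliases[(namespace, key)]]


index = NodeIndex()
index.add_node("#guid-0001", {"name": "Example Corp"})    # hypothetical GUID and company
index.add_alias("ticker", "EXMP", "#guid-0001")           # hypothetical stock ticker
index.add_alias("name", "Example Corp", "#guid-0001")

node_a = index.by_guid("#guid-0001")
node_b = index.by_alias("ticker", "EXMP")
node_c = index.by_alias("name", "Example Corp")
```

Whichever route you take, you get back the very same object, which is the property the article highlights.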

Freebase also has the ability to create new domains and types that describe new concepts, for example, science fiction movies.
There is a way to attach new data types to the existing domains, and then these types can be shared and used by other users.
The idea is that you can model things with the fine grained resolution that you need and then you can invite people
to help you refine and evolve your models. An example is the motorcycle community, which evolved out of an effort led by
one guy who was then joined by others, and which has since been promoted to the top level. The community process
is about merging private types to build common models.

What Can You Do With Freebase?

Freebase is not a formal system, it is not a reasoning engine, it is just a knowledge repository, a database.
To query Freebase you use the Metaweb Query Language (MQL), which is based on JSON. The language is meant to be very simple
and it is actually very interesting as well. The idea is that you fill out a tree which represents a partial
graph with pieces that you know and then the system basically fills in all the slots that you left blank
and delivers back all possible subgraphs.

For example, say you are watching a movie and you can't tell what it is.
You know that the movie stars Patrick Swayze and an actress who was
also in "Tank Girl." So you create a movie query and express all these
facts, using JSON-style syntax.
When you run the query you get back that the actress is Lori Petty
and the movie
is "Point Break," and you also get links to IMDB. The query and the
results have the
same structure, so to find matches you simply traverse the set of
results that is returned.
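
The fill-in-the-blanks idea can be sketched with a toy matcher in Python. This is not MQL and does not talk to Freebase: the data set, the property names (`starring`, `films`), the `a:`/`b:` key prefixes used here to constrain the same property twice, and the `match` and `run` functions are all assumptions of this sketch. A `None` value marks a slot for the matcher to fill in.

```python
ACTORS = {
    "Patrick Swayze": {"name": "Patrick Swayze", "films": ["Point Break", "Dirty Dancing"]},
    "Lori Petty": {"name": "Lori Petty", "films": ["Point Break", "Tank Girl"]},
    "Keanu Reeves": {"name": "Keanu Reeves", "films": ["Point Break"]},
    "Jennifer Grey": {"name": "Jennifer Grey", "films": ["Dirty Dancing"]},
}
FILMS = [
    {"name": "Point Break",
     "starring": [ACTORS["Patrick Swayze"], ACTORS["Lori Petty"], ACTORS["Keanu Reeves"]]},
    {"name": "Dirty Dancing",
     "starring": [ACTORS["Patrick Swayze"], ACTORS["Jennifer Grey"]]},
    {"name": "Tank Girl", "starring": [ACTORS["Lori Petty"]]},
]


def match(pattern, node):
    """Return `pattern` with its blanks filled in from `node`, or None if it does not fit."""
    if pattern is None:                      # blank slot: take whatever the data holds
        return node
    if isinstance(pattern, dict):
        if not isinstance(node, dict):
            return None
        filled = {}
        for key, sub in pattern.items():
            prop = key.split(":", 1)[-1]     # drop an optional "a:" style label
            if prop not in node:
                return None
            result = match(sub, node[prop])
            if result is None:
                return None
            filled[key] = result
        return filled
    if isinstance(pattern, list):            # assumed to hold exactly one sub-pattern
        if not isinstance(node, list):
            return None
        hits = [r for r in (match(pattern[0], item) for item in node) if r is not None]
        return hits or None
    return pattern if pattern == node else None


def run(query, collection):
    """Match the query against every item and keep the filled-in results."""
    return [r for r in (match(query, item) for item in collection) if r is not None]


query = {
    "name": None,                                           # which film is it?
    "a:starring": [{"name": "Patrick Swayze"}],             # stars Patrick Swayze ...
    "b:starring": [{"name": None, "films": ["Tank Girl"]}], # ... and someone from Tank Girl
}
results = run(query, FILMS)
```

Each result has the same shape as the query, with the `None` slots replaced by the film title and the actress's name, which is the property the article describes.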

Building on this example, Freebase is really meant for complex inferencing queries, the sorts
of questions that Google has no way of answering using its statistical frequency algorithms.
For example, which US senators took money from a foreign entity? It turns out that both Barack
Obama and Hillary Clinton received donations from UBS AG, based in Switzerland. That is a complex
inferencing query that needs to be expressed in a query language before it can be answered,
and so questions of this nature are outside the reach of any search engine -- and Wikipedia too, for that matter.


There is quite a lot of activity going on around Freebase today.
Many enthusiasts are building small proof of concept applications
showcasing what can be done
in the future with this powerful database. You can stay on top of the
cutting edge stuff coming from both the Freebase team and the community.

Friday, May 23, 2008

You Play a Game, Computers Get Smarter, AI Starts to Work

Last week a new site called Gwap was launched by Carnegie Mellon's School of Computer Science.
The site offers an array of multi-player games that have a benefit
beyond just that of momentary distraction or amusement. These games are
helping improve image and audio searches, teaching computers to see,
and enhancing AI. However, all that won't matter to the players
because, as it turns out, these games are actually fun.

About Gwap

Nicholas Carr blogged about Gwap
a couple of days after its launch, noting that "one thing the Internet
enables, which wasn't possible before, at least not on anywhere near
the same scale, is the transfer of human intelligence into machine
intelligence." In Gwap, which stands
for "Games With a Purpose," that transfer of intelligence is done by
getting people to do the routine chores that computers don't know how
to do - chores like tagging photos, describing songs, and outlining
objects, as well as transferring a good bit of human common sense to
the machine. The trick to getting people to do these things is to make
the work fun. Hence the games.

The creator of these games is Luis von Ahn, winner of a 2006
MacArthur Foundation "genius grant" and a pioneer in the field of human
computation. Von Ahn is best known for helping to develop CAPTCHAs
(Completely Automated Public Turing Test to Tell Computers and Humans
Apart), those somewhat annoying but rather effective distorted letter
puzzles used millions of times each day. Last year, he also introduced
the "reCAPTCHA," where CAPTCHAs were used to gain access to a web site while also helping digitize old books.


The Games

Gwap currently features five games, one of which is an old classic called the ESP Game.
In the ESP game, two players view the same image and try to guess words
that the other player would use to describe it. Google licensed this
technology and launched Google Image Labeler to help improve the quality of their image search results.
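
The agreement mechanic behind the ESP Game can be sketched in a few lines of Python. This is a simplification under stated assumptions: it ignores timing and the interleaving of guesses, and the function name `first_agreement` and the sample words are made up for the example. The idea is that a word both players independently typed is probably a good label for the image.

```python
def first_agreement(guesses_a, guesses_b):
    """Return the first word in player A's guess order that player B also guessed.

    Comparison is case-insensitive and ignores surrounding whitespace.
    Returns None when the two players never agree.
    """
    seen_b = {guess.strip().lower() for guess in guesses_b}
    for guess in guesses_a:
        word = guess.strip().lower()
        if word in seen_b:
            return word
    return None


label = first_agreement(["Dog", "puppy", "grass"], ["lawn", "Puppy", "dog"])
no_label = first_agreement(["car"], ["tree", "sky"])
```

Here `label` is the agreed word that would be attached to the image, and `no_label` is `None` because the players never matched.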

The four new games include:

  • Matchin, a game in which players judge which of two images is more appealing, designed to eventually enable image searches to rank images based on which ones look the best.
  • Tag a Tune, in which players describe songs so that computers can search for music other than by title, such as happy songs or love songs.
  • Verbosity, a test of common sense knowledge that will amass facts for use by artificial intelligence programs.
  • A fourth game in which players trace the outlines of objects in photographs to help teach computers to more readily recognize objects.

According to the Carnegie Mellon announcement, von Ahn plans to add a lot of games to the site, saying "we have three more that we'll be launching in the coming months."
He hopes that by having all the games on the same site it will
encourage players to try several different ones. Players also have a
single sign-on and password, Top Player rankings, and online chats,
said von Ahn.

The Human Processor

In his whitepaper entitled "Invisible Computing," von Ahn compared game design to algorithm creation, saying:

" must be proven correct, its efficiency can be
analyzed, a more efficient version can supersede a less efficient one,
and so on. Instead of using a silicon processor, these "algorithms" run
on a processor consisting of ordinary humans interacting with computers
over the Internet."

In other words, we're the processor. The machine is us.

This concept isn't entirely new - Amazon's Mechanical Turk,
for example, pays people to contribute their time to work on small,
simple tasks called "Human Intelligence Tasks," or HITs. However,
unlike HITs, which can sometimes be boring or tedious, the games on
Gwap are actually fun - and they don't feel like work.

Some believe that human-powered processing is the next big wave for computing. You could argue that Mahalo, the human-powered search engine, is an example of this. (Though others call it a human-powered link farm.) Perhaps a better example is ChaCha,
the mobile Q&A service that uses human guides to respond to
questions called or texted in from your cell phone. We've also covered
other human-powered services on RWW in the past, like the Galaxy Zoo
and Stardust@Home projects, among others. Many of these efforts have tried to incorporate an element of "fun" into what is actually work.

Whether Gwap will actually gain
momentum and get a large number of people involved is yet to be seen,
but it definitely has the potential to help teach computers the things
they can't do for themselves... yet.

Thursday, May 22, 2008

Goldman Internet: Google: New Sources Of Ad Growth; Mobile Complements Desktop; No Cash-Back Plans

Google’s (NSDQ: GOOG)
growth rate and all that entails—click through rates, query growth, ad
quality, etc.—continues to be the subject of a lot of disagreement and
debate. Q1 turned out solid, but there’s not going to be a letup in the
key question facing the company: are its heady growth days done for?
Speaking at the Goldman Sachs Ninth Annual Internet Conference, Nick
Fox, Director of Business Product Management at Google, tried to pull
back the thicket: “There are significant opportunities across the
board.” He ticked off four drivers of revenue: query volume, ads per
query, quality, and price per lead, and noted for example that many queries
still don’t have ads running against them: “A small portion of our queries have ads… we see that as a pretty significant opportunity.”
The challenge is to convince advertisers that they could be making
money by bidding on more keywords than they are. Other opportunities
include helping companies improve the quality of their landing pages
(so they get more value from their ads) and rethinking the ads that go
next to each query. Fox’s example: Amazon (NSDQ: AMZN) ads for the book Harry Potter don’t generate many clickthroughs because searchers often are looking for something other than the book itself.

-- Social networks: “What I would say on social networks…
the whole industry has been surprised at the difficulty of monetizing
social networks.” “We found it more challenging than we expected it
would be.” The difference: “On search you have this amazing thing: the
query… on a social network, you don’t really know (what the user is
looking for).” Other problems: users are doing a lot of non-commercial
things on social networks (he mentioned throwing sheep and playing
Scrabble). Those who are making money: “scummy things” like pyramid
schemes and things that trick users into downloading ringtones.

-- Mobile: “Mobile is much more similar to search than a
social network is… mobile actually monetizes quite well.” New devices
are key: iPhone users search at a rate of 50x normal users. Also, on
mobile, there’s high volume on weekends and days when volume is low on
desktop. So the mobile business complements the core business.

-- DoubleClick: Neal Mohan, Director of Product Management
for Ad Serving Platforms, discussed how DoubleClick (Mohan came in via
the acquisition) would complement the core business. The main goal:
more choice. Through the integration, Google can help publishers figure
out how best to monetize a page, whether it’s through text or display…
though the primary focus is on bringing more monetization choices to
publishers on the display side.

-- Microsoft’s (NSDQ: MSFT) cash back: Fox: “A lot of demand from advertisers to advertise on a CPA basis.” For consumers: “No
plans to pay users to use our products… our fundamental belief is that
we should compete by building a great user experience.... that’s the
focus, rather than paying users nickels and dimes.”

-- User targeting: Not focused on building up profiles of
users for better ad serving. But there are opportunities to improve
yield by looking at query patterns. For example: If you search for
‘Italy Vacations’ and then search ‘weather’ you can improve yield on
the latter search by taking into account the former.

-- Google’s apex: A questioner (representing investors)
wanted to know about the talent issues and whether we’d seen Google’s
corporate apex (arguably the biggest question there is): “We’ve seen
extremely low turnover and churn in employees… they tend to get a lot
of attention cause it’s Google.” Reality: level of churn is actually
low. Mohan: No key DoubleClick employees have left.

SemTech Panel: Investor Opportunities and Pitfalls

Eghosa Omoigui (Intel): Intel Capital is the investment arm of the Intel Corporation,
and it has $3 billion under management. 93 investment professionals are working with 422 companies on deals ranging
in size from seed stage investments to hundreds of millions of dollars. The most recent investment was in Endeca, which specializes in content aggregation and management for enterprises. We have a lot of interest in the
semantic space and have been following it for over five years. There are several deals that are currently pending
that I can't name. One is doing dynamic, large-scale ontologies - taking content and overlaying dynamic
. The company is based out of Asia and will be announced soon.

Question: How do you expect to make money on the Semantic Web? What are the monetization strategies that you are seeing?

Stephen Hall (Vulcan): It really depends on the
company and the business model. Twine will be partially
monetized via advertising, but there will be other components. In
general, Vulcan quite likes the advertising space, which will be
doubling to $40B, with room still for growth. Semantic technologies that
improve ad quality are another interesting area.
There are companies in that space, for example
Dapper, that are leveraging semantics to deliver
more targeted, more contextual advertising.

Amanda Reed (Palomar): There are several business
models that we are seeing out there.
Most often, it is a web service or service offering. There are licenses
in the enterprises as well, of course.
In addition, we are seeing new models where services are packaged into
boxes and bundled with software.
This is
a hybrid model that seems to be interesting to many enterprises because
of the finer control. It is important for startups to figure out how to
match the value proposition to the business model.

Wednesday, May 14, 2008

Casual Games Get Ad-Driven Widget For All

The thumbnail on the left depicts a service that, if it fully
delivers as promised, has a decent chance to transform the web as
profoundly as AdSense. Launching today, it’s the NeoEdge Game Channel,
an ad-driven game widget from NeoEdge, the Mountain View startup we wrote about last November.
Its Game Channel is a kind of videogame jukebox offering a selection of
titles from several genres; when you click to play, NeoEdge’s
advertising feed kicks in as the game loads.

Here’s the thing that excites me most: Pretty much any website
owner (including bloggers) can install this plug-and-play widget on
their site, and share advertising revenue with NeoEdge. (Hence the
comparison to AdSense, only fun and interactive.) The social network
PerfSpot is using the Channel, so you can go here to get a sense of what it’s like.

For site owners, NeoEdge Marketing VP Ty Levine told me, “This is a
way of keeping people on your site.” It also gives them a new revenue
stream; a site with 200,000 unique users, Levine estimated, could earn
$1,000 to $5,000 a month, depending on the owner’s sponsorship deal and
revenue share. With some 400 titles in the NeoEdge library, the channel
can be customized with selections that fit a site’s demographics and

As with AdSense, the Game Channel widget gives site owners far and
wide an incentive to install it — and gives casual-game developers
reason to keep creating content for it. Whether NeoEdge can capture and
hold this market depends on its ability to deliver a diverse and
compelling library of games — and to stay ahead of its competitors.
With so many players rushing into the ad-driven casual game space, I
wouldn’t be surprised to see similar services, purporting to offer
better titles and/or revenue shares, launched by NeoEdge’s rivals. Let
the casual game wars begin!

Tuesday, May 13, 2008

BrandTags - Half Hot Or Not, Half Poetry - About Brands

Marketing consultant and web connoisseur Noah Brier has launched a simple but fascinating project called BrandTags.
The idea is that visitors are shown a logo, we respond with a word or
very short phrase that we associate with the corresponding brand and
then we're given the option to view all the "tags" given a brand in a
big tag cloud.

It's a simple but elegant and interesting experiment. The tag cloud
for Walmart, for example, shows that the word "evil" is pretty big -
but "cheap" is even bigger! We've embedded the site below in an iframe
if you want to try it out yourself.

Nice Touches

One of the nicest touches here is how Brier displays the tags in
oversized font. By requiring users to scroll down the page, we get to
enjoy thinking to ourselves "surely this is the largest tag for this
brand" - only to scroll on and find that another term is even more
frequently associated with that company!

One thing that would be nice would be to have comments be enabled at
the bottom of the tag cloud screens. That way people could explain to
those who don't know why, for example, the word "racist" is so large on
Tommy Hilfiger's page.

BrandTags may not be the kind of site that consumers regularly
return to, but it's fun to try out once. Obviously it's something that
companies would have a real interest in checking out, especially if it
takes off. Brier reports that it received over 77,000 tags in the first weekend it was live.

We've got it in an iframe below, just because if iframes are good enough for Google Friend Connect
then gosh darn it, they're good enough for us too. Click through some
brands on just might find ours and get to offer a little

Metrics: Trouble in Online Adland

PubMatic, a Palo Alto, Calif.-based start-up focused on online advertising, just released its PubMatic AdPrice Index
based on data from over 3,000 publishers and billions of ad
impressions. The findings of this month’s report: the US economic
slowdown is beginning to impact online advertising in a big way, with
overall monetization dropping by 23% - 38 cents eCPM in April vs 49
cents eCPM in March. Not a big surprise since housing related
advertising was big on the web. Even electronics retailers are feeling
the pinch and cutting back.

* eCPMs for large Web sites (more than 100 million page views per
month) dropped dramatically by 52% from 38 cents in March to 18 cents
in April 2008.

* Medium Web sites (1 million to 100 million page views per month) were
nearly flat, with monetization dropping from 34 cents in March to 33
cents in April.

* Small Web sites managed to improve their monetization, increasing from $1.17 in March to $1.29 in April.

The overall trends you pick up from the report are not that surprising.
For instance, small websites improved their monetization because their
more focused content presents more targeted advertising
opportunities. Again, no surprise that Social Networking led the plunge,
with monetization dropping 47% from 37 cents in March to 19 cents in
April, below January lows of 22 cents. Too much damn inventory.
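
For readers who want to check the arithmetic, here is a small Python sketch that recomputes the month-over-month changes from the eCPM figures quoted in this post. The function name `pct_change` and the dictionary labels are my own; the dollar amounts are the rounded before and after values given above.

```python
def pct_change(before, after):
    """Percentage change from `before` to `after` (negative means a drop)."""
    return (after - before) / before * 100.0


# (earlier eCPM, later eCPM) in dollars, as quoted in the post
figures = {
    "overall": (0.49, 0.38),
    "large sites": (0.38, 0.18),
    "medium sites": (0.34, 0.33),
    "small sites": (1.17, 1.29),
    "social networking": (0.37, 0.19),
}

changes = {label: pct_change(before, after) for label, (before, after) in figures.items()}

for label, change in changes.items():
    print(f"{label}: {change:+.1f}%")
```

Because the published cent values are rounded, the recomputed numbers will not match the reported percentages exactly: the overall drop works out to about 22% rather than 23%, and the social networking drop to about 49% rather than 47%.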

Monday, May 12, 2008

Powerset Launches Showcase For User Search Experience

Today marks another milestone for San Francisco based contextual search engine Powerset.
They’ve launched a showcase for their user search experience -
effectively the search engine minus the web crawl. For now, Powerset
queries only Wikipedia and augments results with data from Freebase. The product launch comes just a day after reports that the company is being shopped to potential buyers by investment bank Allen & Co.

I have been able to test Powerset via their labs site for the last
few weeks. I wrote about it last month, and the version that just
launched is very similar.

There is no way to look at Powerset today and determine if it can be
as disruptive to search as Google was when it launched almost a decade
ago. That’s because it only queries Wikipedia, and so there is little
need for proper ranking algorithms to sort the good from the bad.

But what users can see is how effective a way it is to gather
information quickly. For someone doing research, Powerset effectively
removes a number of steps towards getting to the final information. It
is particularly effective when the information needed is on many
different web pages.

For example, a query on Powerset of “when did earthquakes hit tokyo”
yields stunning results. Try this query at Google or even Wikipedia to
compare - instead of just picking out keywords that are in your query
and on a web page, Powerset is actually making some sense of the
content included in the Wikipedia pages:

The way that Powerset returns queries means that answers are often
found in the result snips, as above. They are also structuring a lot of
the Wikipedia (and already structured Freebase) data and inserting
it into results. So a search for “Bill Clinton” shows results, but also
shows Freebase structured data along with additional query refinements
to get to more information. The important thing below isn’t the
structured data in the results, it’s the fact that you can click on the
action words and drill down into very specific queries (to find, for
example, what bills he signed, or which Supreme Court justices he
nominated, or who he slept with).

Powerset is indexing web pages much differently than normal search
engines, which generally just record content to match against keyword
queries. Instead, Powerset is trying to understand the content on the
page so that it can be matched meaningfully to queries later. Even
queries that don’t use matching words.

Indexing the web is expensive, though, and Powerset’s way of doing it requires even more time and computing power dedicated to a web page. That’s why they say they aren’t indexing the entire web yet - the company has raised just $12.5 million
(plus another $8 million or so in bridge loans from investors). To
index the web will require a new round of financing (see the first
paragraph above about their sale/financing efforts).

Powerset has taken a lot of criticism for their goal of trying to redefine how people search the web (including from us).
But their lofty goals are what makes Silicon Valley so great - succeed
or fail, Powerset is trying to do something pretty spectacular.

Saturday, May 10, 2008

Decoding the VC Poker Face

There are a lot of different words that can be used to describe the
venture capital community and its relationship with entrepreneurs. Many
of them, however, cannot be printed. For example, I once heard a VC say
to an entrepreneur: “It would be easier to build a nuclear reactor at
[UC] Berkeley than to execute on this idea.” And I once heard an
entrepreneur say of a VC: “If I ever see that guy in a parking lot, I
will speed up to hit him.” You get the idea.

The Sand Hill Road crowd does have a reputation. In an unscientific
opinion poll, the collective sentiment was probably best described by a
friend of mine this way: “Let’s just say you probably don’t want to
grab a beer with a venture guy, or want your sister to marry one.”
Yikes, I am a VC. No one wants to have a beer with me? Where did this
rap come from? I think it all starts with the clumsy poker that gets
played out in pitch meetings.

VCs are trying to get big returns for
their limited partners. That’s all. If they can save the world or cure
cancer in the process, even better — but that’s not their goal.
Entrepreneurs, on the other hand, are trying to convert their dreams
into reality. We all have deeply “vested interests” and all these
intents converge in the pitch meeting, where everyone shows their
proverbial “poker face.” (According to Joe Navarro, a former FBI counterintelligence agent and author who specializes in decoding nonverbal communication, “double-thumbs” is the “tell” for a player happy with the cards he’s seeing.)

Pitch meetings go something like this: Entrepreneurs bound into a
conference room, show their PowerPoint deck, bare their souls, ask for
a few million dollars and leave, not quite knowing where they really
stand. And so they wait. And wait. And wait. Some receive the big
checks to get their company off the ground, but more often than not,
they wait only to be rejected. Worse, they never hear anything at all.
Big checks are rare, so this scene of deafening silence is played out a
hundred times a day in the venture world.

But from what I’ve observed on my end of the table, VCs can respond
to a pitch in one of three ways — each of which is fraught with peril:

  • Enthusiasm: If the VC is excited about the idea
    and the prospects for the company, the entrepreneur believes the money
    is sure to come. If the company is funded, hallelujah, but if the money
    doesn’t come, the entrepreneur feels betrayed and led on — pins in the
    proverbial VC voodoo doll.
  • Criticism: When a VC tries to make recommendations
    or give feedback, it can be like telling the entrepreneur the baby is
    ugly. Often there is a sense that the VC, who probably doesn’t know
    much about this business, just criticized the best idea since alligator
  • Stoicism: If the VC doesn’t say much or react at
    all, the assumption is that the VC didn’t pay attention and doesn’t
    care, prompting the entrepreneur to think, “What a waste of time,
    money and stress.”

It’s a quandary that every VC has to deal with. Other than handing
over a term sheet straight away, any response risks damaging the
VC-entrepreneur relationship. Can you blame us for sitting still and
saying little?

Even Warren Buffett once famously said: “When the phone don’t ring,
you’ll know it’s me.” Of course, even if the phone don’t ring right
away, it doesn’t mean we’re going to say no. But saying as little as
possible is still the most efficient, and benign, option we have —which
is why it’s the response most entrepreneurs get, most often.

So, in your next pitch meeting, expect the VC poker face. We might
appear indifferent, or stoic, but don’t read too much into our
immediate reactions. (Except maybe those double thumbs.) Like the old
saying goes, “Patience is a virtue.” And champagne gets better with
time. Meanwhile, I will be careful in parking lots.

Powerset’s Dilemma: Go For It, Or Sell

San Francisco based search startup Powerset will be launching shortly. For now, Powerset will query only Wikipedia and Freebase. But as I said when the product was demo’d to me a few weeks ago, it is compelling nonetheless: “When
I tested the service I had something very similar to the “Aha!” feeling
that ran through me the first time I ever used Google. In short, it is
an evolutionary, and possibly revolutionary, step forward in search.”

But now the company may have to make a hard decision: sell now to
one of the big Internet players looking for a point of differentiation
in search, or take the risk of going it alone and possibly getting a
huge, multi-billion dollar payoff down the road.

According to our sources, Powerset is exploring both options. They hired Dave Wehner, a Managing Director at investment bank Allen & Co. (he’s the guy who sold Bebo for $850 million to AOL, and is working on LinkedIn’s huge financing), to represent them in a possible sale or financing.

CNET is reporting
today that Microsoft may be bidding for the company. According to our
sources, those discussions have been going on for well over a month,
and their most recent bid is “around $100 million.”

That probably won’t be enough to convince Powerset and their investors
to sell. The big question is whether Google will step in to try and
keep Powerset out of Microsoft’s hands, and start a real bidding war.
That could drive the price significantly higher. Google, however, has publicly dismissed the notion of contextual search as a revolutionary step forward.

Whether that’s true or not is yet to be seen. But Powerset may find
itself as a valuable chess piece in the emerging search war between
Google and Microsoft. And if Google bets wrong, they could find their
commanding lead in search eroded over time. A relatively small
acquisition to keep Powerset out of Microsoft’s hands, even if just a
hedging move, may suddenly be attractive to them.

Tuesday, May 6, 2008

i360 Adds Semantics to Everything

Tony Sukiennik believes the power of the
people trumps the power of the algorithm when it comes to the
development of semantic technology. His company, infoGenome,
a startup that has been in stealth mode for about four and a half years,
wants to harness that power by making semantics easy via its innovative
drag-and-drop functionality. The i360 software he's developed is
essentially the "Mahalo of semantic apps," relying on human knowledge
to add meaningful layers of metadata to the information we work with
every day. With i360, you can add semantics to everything.

People-Powered Semantics

When you're doing a web search, you instantly know what information
is relevant and which isn't. At i360, they call this flash of
understanding an "instant of information insight." In
a split second you can identify something as being useful, but the
problem in today's world is that there are too many ways to store that
information - you can tag it, bookmark it, save it to file, email it,
blog about it, share it with others, and so on. Overwhelmed by choices,
busy people often choose to "just remember it," a decision that leads
to the inevitable: forgetting. The human mind is already
overloaded with input, so it isn't the ideal repository for storing all
the complexities of our information-filled lives.

Instead, software should be doing the remembering for us. That's
where i360 comes in. The application itself, self-funded but seeded by Bill Campbell,
Google advisor and chairman of Intuit, is really just a prototype of
this conceptual idea, but one that Tony hopes Google might be
interested in. Or maybe Microsoft. (He plans on proposing his ideas to
both companies to see who bites.)

What the i360 software does is provide a way to quickly mark up and
add meaning to the data you're working with - be it a link on the web,
an email, a file, or anything else. This process is done
via a quick drag-and-drop into the app.

That isn't to say that this technology is using semantics in the
technical sense of the word - it's not about converting everything into
machine-readable formats for use on the semantic web; what it is doing,
though, is adding semantics to everything by
assigning meaning to that email, that PDF, that link, that note, that
spreadsheet, etc. Meaning that only you, and not a computer or an
algorithm, could know. In doing so, the technology is not focused on a
semantic web per se, but a semantic database of your own, made up of
not only web links, but also files, contacts, emails, keywords, and
more, and knowing how they all are associated with each other.

Although Tony believes that we shouldn't give up on the algorithm -
by all means, research should continue in that area - he feels strongly
that his technology, which taps into the power of the human brain,
gives people the ability to organize and assign value to information in
a way that a machine cannot.

How It Works

What i360 does is complex and sort of hard to understand if you're
not working with it directly. In fact, it's easier to understand if you
work backwards from the end result of using the technology.

For example, imagine you do a Google Desktop Search or a Google
Enterprise Search, and, instead of just links to items that match
keywords, you get something a little more like this:

Augmented Search Results

You can see that by using the software, you've managed to associate people, documents, notes, and more with the original file.

These associations are made via a "fire and move on"
drag-and-drop methodology. See a useful link? Drag-and-drop it into
i360. Highlight some text and drag and drop that as the item's
description. Click a button and a screenshot is added automatically.
Now associate that link with a person. That person with a Word
document. That document with a search and an email...and so forth, and
so on.

Saving a Web Page

Within a company, the i360 technology can also be used to work with
internally running applications, like Microsoft's SharePoint, for
example...or any other application to which you have the cooperation of
the vendor or access to the app's code base. With 100 lines of code,
these applications can pass their data
back to the i360 environment as just another informational nugget that
can be associated with a person, a file, or anything else.
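
To make the idea of associated "nuggets" more concrete, here is a minimal sketch in Python. It is not i360's actual design, which was not shown in any detail; the class name AssociationStore, its methods, and the sample items are all hypothetical. It only illustrates how links, people, and documents could be tied together and later retrieved as a group.

```python
from collections import defaultdict


class AssociationStore:
    """Hypothetical personal 'semantic database' of associated items.

    This is an illustration only, not i360's real data model.
    """

    def __init__(self):
        # Maps each item to the set of items it is directly associated with.
        self._links = defaultdict(set)

    def associate(self, a, b):
        """Record a two-way association between items a and b."""
        self._links[a].add(b)
        self._links[b].add(a)

    def related(self, start):
        """Return every item reachable from start through associations."""
        seen = {start}
        stack = [start]
        while stack:
            item = stack.pop()
            for neighbor in self._links.get(item, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        seen.discard(start)
        return seen


# Sample items are made up: a link is tied to a person, the person to a document.
store = AssociationStore()
store.associate(("link", "http://example.com/report"), ("person", "Alice"))
store.associate(("person", "Alice"), ("document", "plan.doc"))
everything_about_link = store.related(("link", "http://example.com/report"))
```

In this sketch a search that hits the link would also surface the person and the document, which is the effect the augmented search results are meant to produce.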

There's more this application can do, too. For example, searches
themselves could begin in a more structured format - focusing on just
what you're interested in finding (see example below). Each item you're
researching can be available with one click from a sidebar - no saving
required.

Focused Searching

The results of your searches can then be transformed into a new file
with links (see below), retaining the same structure of your own
headings and listed items, and that file can then be emailed to someone
else or published as a page available publicly on the web. If you find
something new to add to it, be it another link or a file or anything
else, you can just drag-and-drop that new item to i360 to update the
results on the fly.

Formatted Results Can Be Shared With Others

A project team in the workplace could use the application together,
associating people and emails and files and searches with each other,
creating a database of content surrounding their project. A year later,
an employee in another department could search via their company's
enterprise search and find all the information in that project and how
it all interrelates, even if all the original team members had moved on
to other jobs in other companies. No more would "everything is stored
in that one guy's head" be the norm. Employees could move on, but the
data they created or found, and the way that data relates to other
data, would remain.

Where It Needs Improvement

As a concept - simple drag-and-drop semantics - the technology is
fascinating. In practice though, it's still very rough. You couldn't
install i360 and be off and running in minutes - you would still need
training to know how to use it as it exists in its present form. In
today's world of bubbly web apps, anything that isn't immediately
intuitive isn't going to be adopted by the majority of users. The whole
Enterprise 2.0 trend is about bringing the simplicity of consumer
applications into the corporate world, and, although that is this
software's goal, unfortunately, I can't say that it achieves it.

The UI itself is confusing. They've made some interesting choices -
the address bar is at the bottom, for example; buttons are labeled with
things like "E+" - a reference to the name of a portion of the software
suite, but one that is meaningless to the new user. The graphics and
fonts used look ancient.

The UI


However, that being said, if you can look past the UI to the
underlying idea, there's something about this concept - human-powered
semantics - semantics over everything - that could be great, if someone
could just make it pretty. It could even be the future.

Tagging Goes Semantic With Zigtag

Bookmarking and tagging websites can be a messy business. Zigtag,
a new sidebar-based plugin currently in private beta, is looking to
offer clean and streamlined bookmarking and tagging. The plugin
differentiates itself from the multitude of other tagging services by
introducing a semantic dictionary of over two million tags. The basic
idea: each tag will be defined, and synonymous tags (say, New York
City and Big Apple) will be linked together automatically. That should
make finding your bookmarks easier later on.

After entering an appropriate tag for a page, the user is presented
with a list of matching keywords, each of which has been defined in
Zigtag’s database. For example, after entering “Apple” into the search
field, I was able to choose from “the computer company”, “the pomaceous
fruit”, and “the record company”, among others. The process is painless
and the integrated dictionary is fairly comprehensive. If you happen to
stumble across a term that isn’t defined, you can easily request to
have it added to the dictionary (and can place your own temporary tag).
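
The defined-tag idea can be sketched in a few lines of Python. This is only an illustration under assumptions: Zigtag's real dictionary and API are not public in this review, so the classes TagDictionary and Bookmarks, the concept ids, and the sample URL are all hypothetical. The sketch shows one label mapping to several defined meanings, and several synonymous labels mapping to one meaning.

```python
class TagDictionary:
    """Hypothetical dictionary of defined tags; not Zigtag's actual design."""

    def __init__(self):
        self._definitions = {}  # concept id -> human-readable definition
        self._labels = {}       # lowercased label -> list of concept ids

    def define(self, concept_id, definition, labels):
        """Register a concept and every label (synonym) that refers to it."""
        self._definitions[concept_id] = definition
        for label in labels:
            ids = self._labels.setdefault(label.lower(), [])
            if concept_id not in ids:
                ids.append(concept_id)

    def lookup(self, text):
        """Return the (concept id, definition) candidates for typed text."""
        ids = self._labels.get(text.lower(), [])
        return [(cid, self._definitions[cid]) for cid in ids]


class Bookmarks:
    """Bookmarks stored by concept id rather than by raw tag text."""

    def __init__(self, dictionary):
        self._dictionary = dictionary
        self._by_concept = {}  # concept id -> set of URLs

    def tag(self, url, concept_id):
        self._by_concept.setdefault(concept_id, set()).add(url)

    def find(self, text):
        """Find URLs tagged with any concept that the text can refer to."""
        urls = set()
        for concept_id, _ in self._dictionary.lookup(text):
            urls |= self._by_concept.get(concept_id, set())
        return urls


tags = TagDictionary()
tags.define("apple_inc", "the computer company", ["Apple"])
tags.define("apple_fruit", "the pomaceous fruit", ["Apple"])
tags.define("nyc", "the city", ["New York City", "Big Apple"])

choices = tags.lookup("Apple")  # the user picks one of these meanings

bookmarks = Bookmarks(tags)
bookmarks.tag("http://example.com/nyc-guide", "nyc")  # made-up URL
found = bookmarks.find("Big Apple")  # same result as searching "New York City"
```

Because bookmarks are keyed by meaning instead of by the typed text, a page tagged "New York City" turns up under "Big Apple" as well, while "Apple" alone stays a separate set of concepts.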

Besides the tagging functionality, Zigtag also offers a Digg-like
thumbs up/down system, which influences a list of popular bookmarked
sites on the Zigtag homepage. The site also has some basic social
networking features, allowing for group-specific privacy settings and
sharing with friends. There are a number of other handy features,
including “Share Page” that lets you send snippets of images and text
on a page to friends through email.

My experience with Zigtag was promising, but the plugin still needs
some work. Using the sidebar can be pretty unintuitive, especially when
you’re searching for something using multiple tags. And many of the
synonyms I tried weren’t in the database yet (no mention of Bruce
Springsteen for “The Boss”).

Zigtag’s biggest obstacle is the slew of other social bookmarking sites already available (Delicious, Diigo, and Twine,
to name a few). The semantic tagging feature is fairly unique, but its
appeal is still untested, especially against automated semantic taggers
like Twine. Frankly, a lot of people are just going to stick with the
simple but effective Delicious interface.

