Wednesday, May 28, 2008

Freebase: Dispelling The Skepticism

Freebase, the first product of semantic web company Metaweb, is an open, semantically marked up database of information that we called one of the "10 semantic apps to watch" last year. With $57.4 million in funding, a smart team, and a tech legend in Danny Hillis at the helm, Metaweb is considered to be one of the most serious players in the Semantic Web space. Yet the company's efforts to date have been met with skepticism. In particular, people have asked how Freebase differs from Wikipedia. Jamie Taylor, the Minister of Information at Metaweb, spoke at the SemTech 2008 Conference in San Jose last week in an effort to dispel some of that skepticism.




What is Freebase?



Jamie has an interesting title, Minister of Information, and his
primary responsibility is to seed Freebase with information and ensure
the quality of the data. According to Jamie, Freebase is an "open shared
database of the world's knowledge."
This sounds the same as Wikipedia, but it is really quite different,
because at the heart of Freebase are the ideas of semantics and
openness via an API.



Unlike Wikipedia, which is a free-form database, Freebase is structured: concepts and relationships
are interlinked into a gigantic network, or graph. Another important difference is that Freebase is all about its API.
Any information contained in the database is accessible and can be retrieved via queries. In addition, the data
in Freebase is under a Creative Commons license, meaning that it is readily exportable and usable by others.




When it comes to defining the meanings of things, Freebase is focused on community, with collective editing, attribution,
and collaboratively built semantics. This last point is crucial: the founders of Freebase believe that meaning
has to emerge from the collaboration between users. As such, Freebase is one of the first experiments in web-scale
social contracts. The site is really focused on the notion that its information is unencumbered by licenses and free to use.



What is in Freebase Today?



Data comes into Freebase from many sources: Wikipedia, Flickr, the
US Department of Commerce, MusicBrainz, the USGS,
SFMOMA, the US Securities and Exchange Commission, Chef Moz, and many other places.
Right now the information is mostly about people and places, but the
system
is engineered to handle a wide range of data types. As an example of
"People" information, there
is a lot of information in Freebase about artists along with their
artwork and place in history.
More esoteric types of information you might find in the database
include airplanes, French cheese, tropical storms in the 90s,
oil companies, and candies.





Freebase also contains lots of other kinds of data and has:



  • 3.4 Million Subjects
  • 750K People
  • 450K Locations
  • 50K Companies
  • 40K Movies
  • ... Over 1K Data Types with over 3K Properties


Data Representation in Freebase



While Freebase certainly has a long way to go before it can claim
completeness of information,
its core idea of object representation and linking seems very solid.
Each object in Freebase is unique.
As more information about an object comes into the system, more links
to it are created.
It is particularly interesting how Freebase establishes object identity
and decides that two concepts (or subjects) are the same.






The diagram above illustrates the idea. When a new source of information is added to Freebase, it is parsed into
entities and facts. The new information is then cleaned up and merged with the existing system. But
the merge only occurs if the system determines that the two pieces of information are really about the same subject (in
this case Leonardo da Vinci). This is a powerful approach which allows Freebase to grow the knowledge around individual
subjects. What is also interesting is that Freebase allows human editing to reconcile situations when the system
is unable to automatically link the two concepts together.
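
To make the merge-or-review flow a little more concrete, here is a minimal Python sketch. It is not Metaweb's actual reconciliation logic: the `Subject` class, the `ingest` function, and the matching rule (same name, no conflicting facts) are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    name: str
    facts: dict = field(default_factory=dict)


def facts_agree(a: Subject, b: Subject) -> bool:
    """True when no fact the two records share has conflicting values."""
    shared = a.facts.keys() & b.facts.keys()
    return all(a.facts[key] == b.facts[key] for key in shared)


def ingest(store: list, incoming: Subject, review_queue: list) -> str:
    """Merge, queue for human review, or add as a new subject (hypothetical rule)."""
    for existing in store:
        if existing.name.lower() != incoming.name.lower():
            continue
        if facts_agree(existing, incoming):
            existing.facts.update(incoming.facts)  # same subject: grow its facts
            return "merged"
        review_queue.append((existing, incoming))  # unsure: let a person decide
        return "needs review"
    store.append(incoming)
    return "added"


store = [Subject("Leonardo da Vinci", {"born": 1452})]
review_queue = []
outcomes = [
    ingest(store, Subject("Leonardo Da Vinci", {"notable_work": "Mona Lisa"}), review_queue),
    ingest(store, Subject("Leonardo da Vinci", {"born": 1500}), review_queue),
    ingest(store, Subject("Danny Hillis"), review_queue),
]
```

The first record merges into the existing subject, the second conflicts on the birth year and is left for a human editor, and the third becomes a new subject. A real system would presumably use far richer evidence than a name comparison.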



Each permanent object in the system has a GUID - a unique
identifier, something like this: #9202a8c040000064.....
The identifier can be used to refer to the object via URL and via
queries. In addition to the GUID, there are other
ways to refer to the object, for example,
http://www.freebase.com/view/en/leonardo_da_vinci. Beyond that, there
are still other
aliases; for example, you can refer to a public company by its stock
ticker symbol. But regardless of the reference, the key point is that
you end up with the same, unique node in the system.
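
One way to picture "many references, one node" is a lookup table that maps every alias back to a single GUID. The sketch below is only an illustration: the `Graph` class, the GUID strings, the `ticker:` prefix, and "Example Corp" are all invented here and are not Freebase's real identifiers or API.

```python
class Graph:
    """Toy store where every alias resolves to one GUID-keyed node."""

    def __init__(self):
        self._nodes = {}    # GUID -> node
        self._aliases = {}  # any other reference -> GUID

    def add_node(self, guid, node):
        self._nodes[guid] = node

    def add_alias(self, alias, guid):
        if guid not in self._nodes:
            raise KeyError(f"unknown GUID: {guid}")
        self._aliases[alias] = guid

    def resolve(self, reference):
        # A reference is either an alias or already a GUID.
        guid = self._aliases.get(reference, reference)
        return self._nodes[guid]


graph = Graph()
graph.add_node("#guid-0001", {"name": "Leonardo da Vinci"})  # hypothetical GUID
graph.add_alias("/en/leonardo_da_vinci", "#guid-0001")
graph.add_node("#guid-0002", {"name": "Example Corp"})       # hypothetical company
graph.add_alias("ticker:EXMP", "#guid-0002")                 # hypothetical ticker

by_guid = graph.resolve("#guid-0001")
by_path = graph.resolve("/en/leonardo_da_vinci")
by_ticker = graph.resolve("ticker:EXMP")
```

Here `by_guid` and `by_path` are the very same object, which is the property the article describes: whichever reference you start from, you land on one node.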




Freebase also lets users create new domains and types that describe new concepts, for example, science fiction movies.
There is a way to attach new data types to the existing domains, and then these types can be shared and used by other users.
The idea is that you can model things with the fine-grained resolution that you need and then invite people
to help you refine and evolve your models. An example is the motorcycle community, which grew out of an effort led by
one person who was later joined by others, and which has since been promoted to the top level. The community process
is about merging private types to build common models.



What Can You Do With Freebase?



Freebase is not a formal system or a reasoning engine; it is just a knowledge repository, a database.
To query Freebase you use the Metaweb Query Language (MQL), which is based on JSON. The language is meant to be very simple,
and it is quite interesting as well. The idea is that you fill out a tree, which represents a partial
graph, with the pieces that you know, and then the system fills in all the slots that you left blank
and delivers back all possible matching subgraphs.



For example, say you are watching a movie and you can't tell what it
is.
You know that the movie stars Patrick Swayze and an actress who was
also in "Tank Girl." So you create a movie query and express all these
facts, using JSON-style syntax.
When you run the query, you get back that the actress is Lori Petty
and the movie
is "Point Break," and you also get links to IMDb. The query and the
results have the
same structure, so to find matches you simply traverse the set of
results that is returned.
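
The fill-in-the-blanks idea can be sketched in a few lines of Python. To be clear, this is not MQL and not the Freebase API: the `fill` and `query` helpers, the `None`-means-blank convention, and the tiny, deliberately incomplete film list are all assumptions made for illustration.

```python
# A tiny, incomplete stand-in for the film data.
FILMS = [
    {"name": "Point Break", "starring": ["Patrick Swayze", "Keanu Reeves", "Lori Petty"]},
    {"name": "Tank Girl", "starring": ["Lori Petty"]},
    {"name": "Dirty Dancing", "starring": ["Patrick Swayze", "Jennifer Grey"]},
]


def fill(template, record):
    """Return the template with its blanks (None) filled in, or None on a mismatch."""
    result = {}
    for key, wanted in template.items():
        if key not in record:
            return None
        actual = record[key]
        if wanted is None:
            pass  # blank slot: accept whatever the record holds
        elif isinstance(wanted, list):
            if not all(item in actual for item in wanted):
                return None  # every listed item must be present
        elif wanted != actual:
            return None
        result[key] = actual
    return result


def query(template, records):
    """All records that match the template, shaped like the template."""
    return [match for r in records if (match := fill(template, r)) is not None]


# Step 1: who was in "Tank Girl"? Step 2: which film pairs them with Swayze?
tank_girl = query({"name": "Tank Girl", "starring": None}, FILMS)[0]
answers = [
    (actor, film["name"])
    for actor in tank_girl["starring"]
    for film in query({"name": None, "starring": ["Patrick Swayze", actor]}, FILMS)
]
```

With this data, `answers` comes out as `[("Lori Petty", "Point Break")]`. Note that each result has the same shape as the template that produced it, which is the property the article highlights.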



Building on this example, Freebase is really meant for complex inferencing queries, the sorts
of questions that Google has no way of answering using its statistical frequency algorithms.
For example, which US senators took money from a foreign entity? It turns out that both Barack
Obama and Hillary Clinton received donations from UBS AG, based in Switzerland. That is a complex
inferencing query that needs to be expressed in a query language before it can be answered,
so questions of this nature are out of reach of any search engine, and of Wikipedia too, for that matter.



Resources



There is quite a lot of activity going on around Freebase today.
Many enthusiasts are building small proof of concept applications
showcasing what can be done
in the future with this powerful database. You can stay on top of the
cutting edge stuff coming both from the Freebase team and community at:
http://download.freebase.com and
http://research.freebase.com


Article Link


Friday, May 23, 2008

You Play a Game, Computers Get Smarter, AI Starts to Work

Last week a new site called Gwap was launched by Carnegie Mellon's School of Computer Science.
The site offers an array of multi-player games that have a benefit
beyond just that of momentary distraction or amusement. These games are
helping improve image and audio searches, teaching computers to see,
and enhancing AI. However, all that won't matter to the players
because, as it turns out, these games are actually fun.




About Gwap



Nicholas Carr blogged about Gwap
a couple of days after its launch, noting that "one thing the Internet
enables, which wasn't possible before, at least not on anywhere near
the same scale, is the transfer of human intelligence into machine
intelligence." In Gwap, which stands
for "Games With a Purpose," that transfer of intelligence is done by
getting people to do the routine chores that computers don't know how
to do - chores like tagging photos, describing songs, and outlining
objects, as well as transferring a good bit of human common sense to
the machine. The trick to getting people to do these things is to make
the work fun. Hence the games.



The creator of these games is Luis von Ahn, winner of a 2006
MacArthur Foundation "genius grant" and a pioneer in the field of human
computation. Von Ahn is best known for helping to develop CAPTCHAs
(Completely Automated Public Turing Test to Tell Computers and Humans
Apart), those somewhat annoying but rather effective distorted letter
puzzles used millions of times each day. Last year, he also introduced
"reCAPTCHA," in which the CAPTCHAs used to gain access to a web site also help digitize old books.



Gwap homepage



The Games



Gwap currently features five games, one of which is an old classic called the ESP Game.
In the ESP Game, two players view the same image and try to guess words
that the other player would use to describe it. Google licensed this
technology and launched Google Image Labeler to help improve the quality of its image search results.
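
The core mechanic is easy to sketch: a word becomes a label for the image once both players have typed it. The Python below is a simplified illustration, not Gwap's implementation; the `agreed_label` function and the one-guess-per-round pacing are assumptions.

```python
from itertools import zip_longest


def agreed_label(guesses_a, guesses_b):
    """Return the first word both players have entered, or None if they never agree."""
    seen_a, seen_b = set(), set()
    for a, b in zip_longest(guesses_a, guesses_b):
        if a is not None:
            seen_a.add(a.strip().lower())
        if b is not None:
            seen_b.add(b.strip().lower())
        agreed = seen_a & seen_b
        if agreed:
            # Deterministic pick if two matches land in the same round.
            return min(agreed)
    return None


label = agreed_label(["dog", "grass", "frisbee"], ["puppy", "Frisbee", "park"])
no_label = agreed_label(["cat"], ["dog"])
```

In this example the players agree on "frisbee" in the third round, so that word would be stored as a label for the image, while the second pair never agrees and produces no label.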



The four new games include:



  • Matchin, a game in which players judge which of two images is more appealing, is designed to eventually enable image searches to rank images based on which ones look the best.
  • Tag a Tune, in which players describe songs so that computers can search for music other than by title - such as happy songs or love songs.
  • Verbosity, a test of common sense knowledge that will amass facts for use by artificial intelligence programs.
  • Squigl, a game in which players trace the outlines of objects in photographs to help teach computers to more readily recognize objects.

According to the Carnegie Mellon announcement, von Ahn plans to add a lot of games to the site, saying "we have three more that we'll be launching in the coming months."
He hopes that by having all the games on the same site it will
encourage players to try several different ones. Players also have a
single sign-on and password, Top Player rankings, and online chats,
said von Ahn.

The Human Processor



In his whitepaper entitled "Invisible Computing," von Ahn compared game design to algorithm creation, saying:

"...it must be proven correct, its efficiency can be
analyzed, a more efficient version can supersede a less efficient one,
and so on. Instead of using a silicon processor, these "algorithms" run
on a processor consisting of ordinary humans interacting with computers
over the Internet."


In other words, we're the processor. The machine is us.



This concept isn't entirely new - Amazon's Mechanical Turk,
for example, pays people to contribute their time to work on small,
simple tasks called "Human Intelligence Tasks," or HITs. However,
unlike HITs, which can sometimes be boring or tedious, the games on
Gwap are actually fun - and they don't feel like work.



Some believe that human-powered processing is the next big wave for computing. You could argue that Mahalo, the human-powered search engine, is an example of this. (Though others call it a human-powered link farm.) Perhaps a better example is ChaCha,
the mobile Q&A service that uses human guides to respond to
questions called or texted in from your cell phone. We've also covered
other human-powered services on RWW in the past, like the Galaxy Zoo
and Stardust@Home projects, among others. Many of these efforts have tried to incorporate an element of "fun" into what is actually work.



Whether Gwap will actually gain
momentum and get a large number of people involved remains to be seen,
but it definitely has the potential to help teach computers the things
they can't do for themselves... yet.


Article Link

Thursday, May 22, 2008

Goldman Internet: Google: New Sources Of Ad Growth; Mobile Complements Desktop; No Cash-Back Plans

Google’s (NSDQ: GOOG)
growth rate and all that entails (click-through rates, query growth, ad
quality, etc.) continues to be the subject of a lot of disagreement and
debate. Q1 turned out solid, but there’s not going to be a letup in the
key question facing the company: are its heady growth days done for?
Speaking at the Goldman Sachs Ninth Annual Internet Conference, Nick
Fox, Director of Business Product Management at Google, tried to pull
back the thicket: “There are significant opportunities across the
board.” He ticked off four drivers of revenue (query volume, ads per
query, quality, and price per lead) and noted, for example, that many queries
still don’t have ads running against them: “A small portion of our queries have ads… we see that as a pretty significant opportunity.”
The challenge is to convince advertisers that they could be making
money by bidding on more keywords than they are. Other opportunities
include helping companies improve the quality of their landing pages
(so they get more value from their ads) and rethinking the ads that go
next to each query. Fox’s example: Amazon.com (NSDQ: AMZN) ads for the book Harry Potter don’t generate many clickthroughs because searchers often are looking for something other than the book itself.



-- Social networks: “What I would say on social networks…
the whole industry has been surprised at the difficulty of monetizing
social networks.” “We found it more challenging than we expected it
would be.” The difference: “On search you have this amazing thing: the
query… on a social network, you don’t really know (what the user is
looking for).” Other problems: users are doing a lot of non-commercial
things on social networks (he mentioned throwing sheep and playing
Scrabble). Those who are making money: “scummy things” like pyramid
schemes and things that trick users into downloading ringtones.



-- Mobile: “Mobile is much more similar to search than a
social network is… mobile actually monetizes quite well.” New devices
are key: iPhone users search at a rate of 50x normal users. Also, on
mobile, there’s high volume on weekends and days when volume is low on
desktop. So the mobile business complements the core business.



-- DoubleClick: Neal Mohan, Director of Product Management
for Ad Serving Platforms, discussed how DoubleClick (Mohan came in via
the acquisition) would complement the core business. The main goal:
more choice. Through the integration, Google can help publishers figure
out how best to monetize a page, whether it’s through text or display…
though the primary focus is on bringing more monetization choices to
publishers on the display side.



-- Microsoft’s (NSDQ: MSFT) cash back: Fox: “A lot of demand from advertisers to advertise on a CPA basis.” For consumers: “No
plans to pay users to use our products… our fundamental belief is that
we should compete by building a great user experience.... that’s the
focus, rather than paying users nickels and dimes.”



-- User targeting: Not focused on building up profiles of
users for better ad serving. But there are opportunities to improve
yield by looking at query patterns. For example: If you search for
‘Italy Vacations’ and then search ‘weather’ you can improve yield on
the latter search by taking into account the former.



-- Google’s apex: A questioner (representing investors)
wanted to know about the talent issues and whether we’d seen Google’s
corporate apex (arguably the biggest question there is): “We’ve seen
extremely low turnover and churn in employees… they tend to get a lot
of attention cause it’s Google.” Reality: level of churn is actually
low. Mohan: No key DoubleClick employees have left.


Article Link

SemTech Panel: Investor Opportunities and Pitfalls

Eghosa Omoigui (Intel): Intel Capital is the investment arm of the Intel Corporation,
and it has $3 billion under management. Its 93 investment professionals are working with 422 companies on deals ranging
in size from seed-stage investments to hundreds of millions of dollars. The most recent investment was in Endeca, which specializes in content aggregation and management for enterprises. We have a lot of interest in the
semantic space and have been following it for over five years. There are several deals that are currently pending
that I can't name. One is doing dynamic, large-scale ontologies - taking content and overlaying dynamic
ontologies. The company is based out of Asia and will be announced soon.

Question: How do you expect to make money on the Semantic Web? What are the monetization strategies that you are seeing?



Stephen Hall (Vulcan): It really depends on the
company and the business model. Twine will be partially
monetized via advertising, but there will be other components. In
general, Vulcan quite likes the advertising space, which will be
doubling to
$40B with room still for growth. Semantic technologies that are
impacting ad quality are another interesting area.
There are companies in that space, for example
Dapper, that are leveraging semantics to deliver
more targeted, more contextual advertising.

Amanda Reed (Palomar): There are several business
models that we are seeing out there.
Most often, it is a web service or service offering. There are enterprise
licenses as well, of course.
In addition, we are seeing new models where services are packaged into
boxes and bundled with software.
This is
a hybrid model that seems to be interesting to many enterprises because
of the finer control. It is important for startups to figure out how to
match the value proposition to the business model.

Article Link

Wednesday, May 14, 2008

Casual Games Get Ad-Driven Widget For All

The thumbnail on the left depicts a service that, if it fully
delivers as promised, has a decent chance to transform the web as
profoundly as AdSense. Launching today, it’s the NeoEdge Game Channel,
an ad-driven game widget from NeoEdge, the Mountain View startup we wrote about last November.
Its Game Channel is a kind of videogame jukebox offering a selection of
titles from several genres; when you click to play, NeoEdge’s
advertising feed kicks in as the game loads.

Here’s the thing that excites me most: Pretty much any website
owner (including bloggers) can install this plug-and-play widget on
their site, and share advertising revenue with NeoEdge. (Hence the
comparison to AdSense, only fun and interactive.) The social network
PerfSpot is using the Channel, so you can go there to get a sense of what it’s like.


For site owners, NeoEdge Marketing VP Ty Levine told me, “This is a
way of keeping people on your site.” It also gives them a new revenue
stream; a site with 200,000 unique users, Levine estimated, could earn
$1,000 to $5,000 a month, depending on the owner’s sponsorship deal and
revenue share. With some 400 titles in the NeoEdge library, the channel
can be customized with selections that fit a site’s demographics and
branding.


As with AdSense, the Game Channel widget gives site owners far and
wide an incentive to install it — and gives casual-game developers
reason to keep creating content for it. Whether NeoEdge can capture and
hold this market depends on its ability to deliver a diverse and
compelling library of games — and to stay ahead of its competitors.
With so many players rushing into the ad-driven casual game space, I
wouldn’t be surprised to see similar services, purporting to offer
better titles and/or revenue shares, launched by NeoEdge’s rivals. Let
the casual game wars begin!


Article Link

Tuesday, May 13, 2008

BrandTags - Half Hot Or Not, Half Poetry - About Brands

Marketing consultant and web connoisseur Noah Brier has launched a simple but fascinating project called BrandTags.net.
The idea is that visitors are shown a logo and respond with a word or
very short phrase that they associate with the corresponding brand;
they are then given the option to view all the "tags" given to a brand in a
big tag cloud.

It's a simple but elegant and interesting experiment. The tag cloud
for Walmart, for example, shows that the word "evil" is pretty big -
but "cheap" is even bigger! We've embedded the site below in an iframe
if you want to try it out yourself.






Nice Touches



One of the nicest touches here is how Brier displays the tags in
oversized font. By requiring users to scroll down the page, we get to
enjoy thinking to ourselves "surely this is the largest tag for this
brand" - only to scroll on and find that another term is even more
frequently associated with that company!



One thing that would be nice is to have comments enabled at
the bottom of the tag cloud screens. That way people could explain to
those who don't know why, for example, the word "racist" is so large on
Tommy Hilfiger's page.



BrandTags may not be the kind of site that consumers regularly
return to, but it's fun to try out once. Obviously it's something that
companies would have a real interest in checking out, especially if it
takes off. Brier reports that it received over 77,000 tags in the first weekend it was live.



We've got it in an iframe below, just because if iframes are good enough for Google Friend Connect
then gosh darn it, they're good enough for us too. Click through some
brands on there...you just might find ours and get to offer a little
feedback!

Article Link

Metrics: Trouble in Online Adland

PubMatic, a Palo Alto, Calif.-based start-up focused on online advertising, just released its PubMatic AdPrice Index
based on data from over 3,000 publishers and billions of ad
impressions. The findings of this month’s report: the US economic
slowdown is beginning to impact online advertising in a big way, with
overall monetization dropping by 23% - 38 cents eCPM in April vs 49
cents eCPM in March. Not a big surprise, since housing-related
advertising was big on the web. Even electronics retailers are feeling
the pinch and cutting back.


* eCPMs for large Web sites (more than 100 million page views per
month) dropped dramatically by 52% from 38 cents in March to 18 cents
in April 2008.

* Medium Web sites (1 million to 100 million page views per month) were
nearly flat, with monetization dropping from 34 cents in March to 33
cents in April.

* Small Web sites managed to improve their monetization, increasing from $1.17 in March to $1.29 in April.


The overall trends you pick up from the report are not that surprising.
For instance, small websites improved their monetization because their
more focused content presents a more targeted advertising
opportunity. Again, no surprise that Social Networking led the plunge,
with monetization dropping 47% from 37 cents in March to 19 cents in
April, below January lows of 22 cents. Too much damn inventory.
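
For anyone who wants to check the arithmetic, here is a small Python sketch that recomputes the month-over-month changes from the eCPM figures quoted above. The `percent_change` helper is our own, and the overall row assumes the figures read as 49 cents in March falling to 38 cents in April. Because the quoted eCPMs are rounded to whole cents, the results can differ by a point or two from the report's percentages, which were presumably computed from unrounded data.

```python
def percent_change(old_cents, new_cents):
    """Month-over-month change in eCPM, as a percentage of the old value."""
    return (new_cents - old_cents) / old_cents * 100.0


# (March eCPM, April eCPM) in cents, as quoted in the post.
ECPM_CENTS = {
    "overall": (49, 38),  # assumes March -> April
    "large sites": (38, 18),
    "medium sites": (34, 33),
    "small sites": (117, 129),
    "social networking": (37, 19),
}

changes = {
    segment: percent_change(march, april)
    for segment, (march, april) in ECPM_CENTS.items()
}

for segment, change in changes.items():
    print(f"{segment}: {change:+.1f}%")
```

Only the small sites come out positive, which matches the post's point that focused content held up while everything else slid.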

Article Link