Last Wednesday, with relatively little fanfare, Google introduced a new technology called Google Knowledge Graph. Type in “François Hollande,” and you are offered a capsule history, with links to his children, partner, birthday, education, and so forth. In the short term, Knowledge Graph will not make a big difference in your world—you might get much the same information by visiting Hollande’s Wikipedia page, and a lot of people might still prefer to ask their friends. But what’s under the hood represents a significant change in engineering for the world’s largest search-engine company. And more than that, in a decade or two, scientists and journalists may well look back at this moment as the dividing line between machines that dredged massive amounts of data—with no clue what that data meant—and machines that started to think, just a little bit, like people.
Since its beginning, Google has used brute force as its main strategy to organize the Internet’s knowledge, and not without reason. Google has one of the largest collections of computers in the world, wired up in parallel, housing some of the largest databases in the world. Your search queries can be answered so quickly because they are outsourced to immense data farms, which then draw upon enormous amounts of precompiled data, accumulated every second by millions of virtual Google “spiders” that crawl the Web. In many ways, Google’s operation has been reminiscent of I.B.M.’s Deep Blue chess-playing machine, which conquered all human challengers not by playing smarter but by computing faster. Deep Blue won through brute force, not by thinking the way humans do. The computer was all power, no finesse. (...)
For the last decade, most work in artificial intelligence has been dominated by approaches similar to Google’s: bigger and faster machines with larger and larger databases. Alas, no matter how capacious your database is, the world is complicated, and data dredging alone is not enough. Deep Blue may have conquered the chess world, but humans can still trounce computers in the ancient game of Go, which has a larger board and more possible moves. Even in Web search, Google’s bread and butter, brute force is often, and annoyingly, defeated by the problem of homonyms. The word “Boston,” for instance, can refer to a city in Massachusetts or to a band; “Paris” can refer to the city or to an exhibitionist socialite.
To deal with the “Paris” problem, Google Knowledge Graph revives an idea first developed in the nineteen-fifties and sixties, known as semantic networks, that was a first guess at how the human mind might encode information in the brain. In place of simple associations between words, these networks encode relationships between unique entities. Paris the place and Paris the person get different unique I.D.s—sort of like bar codes or Social Security numbers—and simple associations are replaced by (or supplemented by) annotated taxonomies that encode relationships between entities. So, “Paris1” (the city) is connected to the Eiffel Tower by a “contains” relationship, while “Paris2” (the person) is connected to various reality shows by a “cancelled” relationship. As all the places, persons, and relationships get connected to each other, these networks start to resemble vast spiderwebs. In essence, Google is now attempting to reshape the Internet and provide its spiders with a smarter Web to crawl.
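To make the idea concrete, here is a minimal sketch in Python of the kind of structure described above: entities registered under unique I.D.s, with annotated (typed) relationships between them. The class, the entity I.D.s, and the relation names are invented for illustration; they are not Google’s actual Knowledge Graph schema.

```python
# A toy semantic network: unique entity I.D.s keep homonyms like the two
# "Paris"es distinct, and relations between I.D.s are explicitly labeled.
from collections import defaultdict


class SemanticNetwork:
    def __init__(self):
        self.entities = {}                  # entity_id -> human-readable label
        self.relations = defaultdict(list)  # entity_id -> [(relation, entity_id)]

    def add_entity(self, entity_id, label):
        """Register an entity under a unique I.D., so homonyms stay separate."""
        self.entities[entity_id] = label

    def add_relation(self, subject_id, relation, object_id):
        """Connect two entities with an annotated relationship."""
        self.relations[subject_id].append((relation, object_id))

    def facts_about(self, entity_id):
        """Return the labeled relationships attached to one entity."""
        return [
            f"{self.entities[entity_id]} --{relation}--> {self.entities[obj]}"
            for relation, obj in self.relations[entity_id]
        ]


net = SemanticNetwork()
net.add_entity("Paris1", "Paris (city)")
net.add_entity("Paris2", "Paris (person)")
net.add_entity("EiffelTower", "Eiffel Tower")
net.add_entity("RealityShow1", "a reality show")

net.add_relation("Paris1", "contains", "EiffelTower")
net.add_relation("Paris2", "cancelled", "RealityShow1")

print(net.facts_about("Paris1"))  # ['Paris (city) --contains--> Eiffel Tower']
print(net.facts_about("Paris2"))  # ['Paris (person) --cancelled--> a reality show']
```

Because the two “Paris” entries are distinct nodes rather than a single word, a query about one never drags in facts about the other—the disambiguation the paragraph above describes.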
by Gary Marcus, The New Yorker | Read more:
Illustration by Arnold Roth