Tuesday, November 22, 2011

The News Forecast


The 20 employees of Recorded Future aren't foreign-policy experts. They aren't traders either, but if you'd started using Recorded Future's predictions to buy US stocks on January 1, 2009, you would have made an annual return of 56.69 per cent. (The S&P 500 had an annualised return of 17.22 per cent over the same period.) Between May 13 and August 5 this year, as markets behaved with vertiginous abandon, their strategy returned 10.4 per cent; in contrast, the S&P 500 lost 9.9 per cent of its value. They're data experts: computer scientists, statisticians and experts in linguistics. And in the data, they think, lies the future.

All Recorded Future's predictions, whatever the field, are based on publicly available information -- news articles, government sites, financial reports, tweets -- fed into the company's own algorithms. The result, it claims, is a "new tool that allows you to visualise the future" -- one that is changing how government intelligence agencies gather information and how giant hedge funds place bets. On its website, Recorded Future states: "We don't grant interviews and we don't issue press releases." But behind closed doors, the company is developing technology that has been described by one tech blog as an "information weapon".

The company, cofounded by Christopher Ahlberg, an entrepreneur who sold his first business for $195 million and served in the Swedish special forces, has $8.5 million in funding. Its first two investors were Google and the CIA. Recorded Future counts US government agencies, banks and hedge funds among the clients on million-dollar contracts. But its true ambition is to organise all the data on the internet for similar predictive analysis -- to make the future calculable.  (...)

The first generation of search engines, such as Lycos and AltaVista, used traditional text search to deliver web pages; each deployed its own ranking algorithm, but all essentially looked at individual documents in isolation. Google changed this in 1998. Its PageRank algorithm analysed the links between web pages, promoting those with more links pointing to them from other sites. Recorded Future is part of a third generation: instead of explicit link analysis, it examines implicit links -- what it calls "invisible links" -- between documents that refer to the same entities or events. It does this by separating documents and their content from what they talk about, identifying canonical entities and events that exist outside any single article.
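To make the distinction concrete, here is a minimal, purely illustrative sketch in Python -- not Recorded Future's actual code; the corpus, entity lists and scoring are toy assumptions. The second-generation approach scores a page by the hyperlinks pointing at it; the third-generation approach connects two documents whenever they mention the same canonical entity, even if neither links to the other.

```python
from collections import defaultdict
from itertools import combinations

# Toy corpus: each document has outgoing hyperlinks and a set of
# entities it mentions. The entity sets stand in for the output of a
# named-entity extractor.
docs = {
    "a": {"links": {"b"}, "entities": {"Recorded Future", "CIA"}},
    "b": {"links": set(), "entities": {"Recorded Future"}},
    "c": {"links": {"b"}, "entities": {"CIA", "Google"}},
}

def pagerank(docs, iters=20, d=0.85):
    """Second generation: explicit link analysis (simplified power
    iteration with damping factor d and uniform teleport)."""
    n = len(docs)
    rank = {k: 1.0 / n for k in docs}
    for _ in range(iters):
        new = {k: (1 - d) / n for k in docs}
        for src, info in docs.items():
            targets = info["links"]
            if not targets:  # dangling page: spread its rank evenly
                for k in new:
                    new[k] += d * rank[src] / n
            else:
                for t in targets:
                    new[t] += d * rank[src] / len(targets)
        rank = new
    return rank

def implicit_links(docs):
    """Third generation: two documents are 'invisibly' linked when
    they mention the same canonical entity."""
    by_entity = defaultdict(set)
    for doc_id, info in docs.items():
        for e in info["entities"]:
            by_entity[e].add(doc_id)
    edges = defaultdict(set)
    for members in by_entity.values():
        for u, v in combinations(sorted(members), 2):
            edges[u].add(v)
            edges[v].add(u)
    return edges

print(pagerank(docs))              # "b" scores highest: two pages link to it
print(dict(implicit_links(docs)))  # a-b share "Recorded Future"; a-c share "CIA"
```

On this toy corpus, PageRank rewards the most-linked page, while the implicit graph joins documents that never reference each other directly -- the basic shift the third generation makes.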

"What matters is that it's freaking complicated," says Ahlberg. In practice, Recorded Future harvests 25,000 data sources as RSS feeds, which could include Companies House and US Securities and Exchange Commission filings, a New York Times article, Twitter and Facebook posts, obscure blogs (there's one on Norwegian salmon fishing) or transcripts from earnings calls or political speeches -- "just a flood of stuff", says Ahlberg. It does the same for Chinese and Arabic sources. "Then we look for entities -- people, places, technologies; and events -- a murder, a bomb explosion, a person moving from A to B, product launches."

by Tom Cheshire, Wired  |  Read more:
Photo: Natalie Lees