Sunday, October 19, 2014

Google Scholar: Making the World’s Problem Solvers 10% More Efficient

Anurag Acharya is the key inventor of Google Scholar, but the real origin of the project lies in his college years at the Kharagpur campus of the Indian Institute of Technology. The IIT is India’s version of MIT and Stanford combined, and has produced a long list of now-celebrated engineers and executives at Internet companies here and abroad. But even in that elite school, it was difficult for students to get hold of relevant scholarly materials. For Indian high schoolers, it was nearly impossible. “If you knew the information existed, you would write letters,” he says. “That’s what I did. Roughly half of the people would send you something, maybe a reprint. But if you didn’t know the information was there, there was nothing you could do about it.” Acharya was haunted by the realization that great minds were deprived of inspiration, and that wonderful works did not have the impact they would otherwise have had because of their limited distribution.

The eventual solution to this problem would be Google Scholar, which celebrates its tenth anniversary this November. Some people have never heard of this service, which treats publications from scholarly and professional journals as a separate corpus and makes it easy to find otherwise elusive information. Others have seen it occasionally when a result pops up in their search activity, and may even know enough to use it for a specific task, like digging into medical journals to gather information on a specific ailment. But for a significant and extremely impactful slice of the population (researchers, scientists, academics, lawyers, and students training in those fields), Scholar is a vital part of online existence, a lifeline to critical information, and an indispensable means of getting their work exposed to those who most need it. (...)

Google Scholar was revolutionary for a number of reasons. Acharya and his team worked hard to get academic publishers to allow Google to crawl their journals. Since many of the articles unearthed by Scholar were locked behind paywalls, simply locating something in a search would not mean that a user could read it. But he or she would know that it existed, and that makes a tremendous difference. (Imagine setting off on a research project and finding out months later that someone had done the same work.) Google also pushed the paywall publishers to allow users to see abstracts of the work. The world’s biggest online archive of journal articles, JSTOR, offered only scans of articles, and had no way to separate the abstract from the whole piece. (Those accessing JSTOR through subscribing institutions could see full text.) So Scholar convinced JSTOR to let its users see the first scanned page of the article for free. “Often the first page has the abstract, or in older articles you have the introduction,” says Acharya, whose job title at Google is Distinguished Engineer. “That at least allows you to get a sense of it so you can decide whether you should put in additional effort.” Google Scholar will then provide the information that will help users get the complete text, whether online for free, downloaded for a fee, or in a nearby library.

(All Google users benefited from all that newly crawled information, too, as the company included those articles and books in its general search index.)

At launch, Google Scholar won wide acclaim, even from those generally skeptical about the company. Two well-known library scientists, Shirl Kennedy and Gary Price, wrote, “When big announcements come from Google and web engines, we often get nervous…. Not this time, however. This is BIG news and something that should have been around for years.” (There was some criticism, though. One complaint was that Google Scholar had no API to allow other services to access it. Others said that since Google didn’t share information like its ranking algorithm and all its sources, it fell short of a “scholarly” standard.)

Some in the research community favorably contrasted it to Google’s more controversial Book Search, which was launched at the same time. Scholar avoided the sort of copyright controversy that Book Search generated, despite the fact that the scholarly publishing world is a war zone, with an increasing number of academics lodging protests against powerful publishers who control the major journals. This is a conflict pitting profit against public good. It was the principle of open research that led Internet activist Aaron Swartz to download a corpus of JSTOR documents legally provided to MIT; the government prosecution of that act ended only with Swartz’s suicide. Google Scholar does not officially take a stand on the issue, but its implicit philosophy seems to endorse an egalitarian spread of information. In any case, when possible, Scholar tries to help negotiate around paywalls for non-subscribers by linking to articles in multiple locations; often, authors of paywalled works have free versions on their personal websites. (...)

Only at Google, of course, would the world’s most popular scholarly search service be seen as a relative backwater. Acharya isn’t permitted to reveal how big Scholar’s index is, though he does note that it’s an order of magnitude bigger than when it started. He can also say, “It’s pretty much everything: every major to medium size publisher in the world, scholarly books, patents, judicial opinions, small, most small journals…. It would take work to find something that’s not indexed.” (One serious estimate places the index at 160 million documents as of May 2014.) But like it or not, the niche reality was reinforced after Larry Page took over as CEO in 2011, and adopted an approach of “more wood behind fewer arrows.” Scholar was not discarded (it still commands huge respect at Google, which, after all, is largely populated by former academics), but it was clearly shunted to the back end of the quiver. Not only was Scholar missing from the list of top services (Image Search, News, etc.), but it was also bumped from the menu promising “more” services like Gmail and Calendar. Its new place was a menu labeled “even more.”

Asked who informed him of what many referred to as Scholar’s “demotion,” Acharya says, “I don’t think they told me.” But he says that the lower profile isn’t a problem, because those who do use Scholar have no problem finding it. “If I had seen a drop in usage, I would worry tremendously,” he says. “There was no drop in usage. I also would have felt bad if I had been asked to give up resources, but we have always grown in both machine and people resources. I don’t feel demoted at all.”

by Steven Levy, Medium/Backchannel
Image: Talia Herman