But earlier this year an artificial intelligence program called AlphaFold, developed by the Google-owned company DeepMind, predicted the 3-D structures of almost every known protein—about 200 million in all. DeepMind CEO Demis Hassabis and senior staff research scientist John Jumper were jointly awarded this year’s $3-million Breakthrough Prize in Life Sciences for the achievement, which opens the door for applications that range from expanding our understanding of basic molecular biology to accelerating drug development.
DeepMind developed AlphaFold soon after its AlphaGo AI made headlines in 2016 by beating world Go champion Lee Sedol at the game. But the goal was always to develop AI that could tackle important problems in science, Hassabis says. DeepMind has made the predicted structures freely available in a public database, covering proteins from nearly every species for which amino acid sequences exist.
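[ed. The public database referred to here is the AlphaFold Protein Structure Database, hosted with EMBL-EBI at alphafold.ebi.ac.uk. As a rough illustration of how a researcher might pull a single predicted structure from it, here is a minimal Python sketch. The endpoint path, response field names and example UniProt accession are assumptions for illustration, not details taken from the article, and may change between database releases.]

```python
# Minimal sketch: fetching one predicted structure from the public
# AlphaFold Protein Structure Database (https://alphafold.ebi.ac.uk).
# The endpoint path and the "pdbUrl" field name are assumptions based on
# the database's public API documentation and may change between versions.
import json
import urllib.request

UNIPROT_ACCESSION = "P69905"  # example accession (human hemoglobin subunit alpha)

# Ask the database which prediction entries exist for this accession.
meta_url = f"https://alphafold.ebi.ac.uk/api/prediction/{UNIPROT_ACCESSION}"
with urllib.request.urlopen(meta_url) as response:
    entries = json.load(response)

# Download the PDB-format coordinates for the first (usually only) entry.
pdb_url = entries[0]["pdbUrl"]
with urllib.request.urlopen(pdb_url) as response:
    pdb_text = response.read().decode("utf-8")

print(pdb_text.splitlines()[0])  # first record of the predicted model file
```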
Scientific American spoke with Hassabis about developing AlphaFold, some of its most exciting potential applications and the ethical considerations of highly sophisticated AI.
[An edited transcript of the interview follows.]
Why did you decide to create AlphaFold, and how did you get to the point where it can now fold practically every known protein?
We pretty much started the project roughly the day after we came back from the AlphaGo match in Seoul, where we beat Lee Sedol, the world [Go] champion. I was talking to Dave Silver, the project lead on AlphaGo, and we were discussing “What’s the next big project that DeepMind should do?” I was feeling like it was time to tackle something really hard in science because we had just solved more or less the pinnacle of games AI. I wanted to finally apply the AI to real-world domains. That’s always been the mission of DeepMind: to develop general-purpose algorithms that could be applied really generally across many, many problems. We started off with games because it was really efficient to develop things and test things out in games for various reasons. But ultimately, that was never the end goal. The end goal was [to develop] things like AlphaFold.
It’s been a mammoth project—about five or six years’ worth of work before CASP14 [Critical Assessment of Structure Prediction, a protein-folding competition]. We had an earlier version at the CASP13 competition, and that was AlphaFold 1. That was state of the art, you know, a good deal better than anyone had done before, and I think one of the first times that machine learning had been used as the core component of a system to try and crack this problem. That gave us the confidence to push it even further. We had to reengineer things for AlphaFold 2 and put a whole bunch of new ideas in there and also bring onto the team some more specialists—biologists and chemists and biophysicists who worked in protein folding—and combine them with our engineering and machine-learning team.
I’ve been working on and thinking about general AI for my whole career, even back at university. I tend to note down scientific problems I think one day could be amenable to the types of algorithms we build, and protein folding was right up there for me always, since the 1990s. I’ve had many, many biologist friends who used to go on about this to me all the time.
Were you surprised that AlphaFold was so successful?
Yeah, it was surprising, actually. I think it’s definitely been the hardest thing we’ve done, and I would also say the most complex system we’ve ever built. The Nature paper that describes all the methods, with the supplementary information and technical details, is 60 pages long. There are 32 different component algorithms, and each of them is needed. It’s a pretty complicated architecture, and it needed a lot of innovation. That’s why it took so long.
by Tanya Lewis, Scientific American | Read more:
Image: Tobias Hase/dpa/Alamy Live News

[ed. From the referenced Nature paper:]
"Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’—has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known."
"Proteins are essential to life, and understanding their structure can facilitate a mechanistic understanding of their function. Through an enormous experimental effort, the structures of around 100,000 unique proteins have been determined, but this represents a small fraction of the billions of known protein sequences. Structural coverage is bottlenecked by the months to years of painstaking effort required to determine a single protein structure. Accurate computational approaches are needed to address this gap and to enable large-scale structural bioinformatics. Predicting the three-dimensional structure that a protein will adopt based solely on its amino acid sequence—the structure prediction component of the ‘protein folding problem’—has been an important open research problem for more than 50 years. Despite recent progress, existing methods fall far short of atomic accuracy, especially when no homologous structure is available. Here we provide the first computational method that can regularly predict protein structures with atomic accuracy even in cases in which no similar structure is known."