'They have shown that this is not some impossible thing': Academic lab copies Google’s big biological breakthrough
When Demis Hassabis, CEO of Google’s AI outfit DeepMind, announced last year that they had cracked one of the toughest puzzles in biology — successfully predicting a protein’s shape from its amino acid sequence — Minkyung Baek watched with a curious mixture of dread and excitement.
“It felt like I just lost my job,” said Baek, a postdoc at the University of Washington’s Institute for Protein Design.
Baek had spent the last three years trying to do the exact same thing, working with what for years had been the leading protein design technology at the leading protein design lab. Now DeepMind had fully eclipsed her. She felt the scientific community’s excitement, even as she pondered her professional future.
DeepMind’s breakthrough, though, had a caveat: The company hadn’t actually shown how they cracked the puzzle, nor had they made their headline-grabbing software available to any researchers outside of Google’s disparate offices. Baek wondered if she could, with the few breadcrumbs Google left, reconstruct DeepMind’s software and distribute it to the world.
On Thursday, she showed she could. Baek and her team at UW’s Baker lab detailed machine learning software in Science that was almost, albeit not quite, as powerful as DeepMind’s in predicting a protein’s structure from its sequence, and demonstrated how it could be deployed to probe questions intricately linked to understanding disease and designing new drugs. They have made the tool, known as RoseTTAFold, available on GitHub, where UW claims it has already been downloaded by over 140 research teams.
“If you wanted to be negative about it, you could say they’re playing catch-up and got results that were not quite as good,” said John Moult, a computational biologist at the University of Maryland who in 1994 launched the annual challenge where DeepMind debuted their results. “I think the more positive and proper way of looking at it is that they have done it nearly as well and they have already provided a server, which works at least in a couple of times. And they have done a full release of their code.
“They have shown that this is not some impossible thing for other people to achieve,” Moult added.
The paper came out at the same time Nature published the detailed methods behind DeepMind’s work, a coincidence the folks at the Baker lab chalk up to their decision to release the preprint and open source software last month. Hassabis’ team similarly promised to make their software public, although they have not provided a server similar to Baek’s.
The two approaches are broadly similar at a high level, said Baek, relying on large datasets of known protein sequences and structures and on similarities between co-evolved proteins. But they differ vastly in how they technically carry out their vision.
Collectively, they give researchers two solutions to a decades-old problem and offer improvements over the original results released last year, including the ability to predict structure in minutes or hours, rather than days.
A protein’s function is determined by its structure, but for decades the only way to determine that structure was through a variety of imaging techniques, such as X-ray crystallography, that could be lengthy or difficult and didn’t produce accurate depictions for every protein. Although those techniques have improved over time, the potential to predict a structure from its sequence alone, known as the folding problem, continued to be a holy grail.
Baek’s lab, run by biochemist David Baker and famous for its work in designing proteins from scratch, had been leading the race for years before DeepMind leapfrogged them. Seeing the company’s progress, they adopted more machine learning techniques to catch up.
In the paper, Baek and her colleagues showed a few of the ways widespread access to such technology could be used to build new drugs. They predicted structures for three classes of proteins key to a range of diseases, including cancer and dementia, showing how rare mutations could warp a protein’s shape and revealing possible openings to target drugs.
“It’s not as good as DeepMind,” said Jinbo Xu, a computational biologist at the University of Chicago. “But some of the structures are very accurate. It will be useful.”
Until DeepMind releases its open-source work, Xu said, RoseTTAFold will be more helpful to the field. Researchers may not have long to wait, though.
Baek’s team posted its paper as a preprint on June 15. Three days later, Hassabis tweeted out a “brief update” on DeepMind’s protein folding software, saying that a full paper outlining their methods was under review and that they would be providing the source code and “broad access” to the software to researchers. Xu said a server for researchers to access DeepMind’s tech would be the ideal tool.
Moult noted the Baker paper took their software to places that DeepMind didn’t. In addition to predicting the structure of individual proteins, they also predicted how different proteins come together to form larger structures, as they do to perform manifold biological functions. Baek predicted the differences between two immune complexes: one formed by IL-23 and one by IL-12. The results could help drug developers identify molecules that block one but not the other, creating more precise drugs for autoimmune diseases.
Baek said her next big project is improving the software’s ability to predict these interactions.