Artificial intelligence has solved one of the biggest puzzles in biology by predicting the shape of every protein expressed in the human body.
The research was conducted by London-based AI firm DeepMind, which used its AlphaFold algorithm to build the most complete and accurate database to date of the human proteome, the full set of proteins that underlies human health and disease.
Last week, DeepMind published the methods and code for its model, AlphaFold2, in Nature, showing that it could predict the structures of known proteins with near-perfect accuracy.
This was followed on Thursday by its second Nature paper in as many weeks, which showed that the model could confidently predict the structural position of nearly 60 percent of the amino acids, the building blocks of proteins, in the human proteome. The model also produced predictions for numerous other organisms, such as the fruit fly, the mouse and the bacterium E. coli.
Previously, the structural positions of only about 30 percent of these amino acids were known. Knowing where the amino acids sit allows researchers to work out the three-dimensional structure of a protein.
The set of 350,000 protein structure predictions is now available through a public database hosted by the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI).
“Predicting their structures accurately has a huge range of scientific applications, from developing new drugs and treatments for diseases, to designing future crops that can withstand climate change, or enzymes that can break down plastic,” said Edith Heard, Director General of the EMBL. “The applications are only limited by our imagination.”
Protein structures are important because they determine how proteins do their job. Knowing the shape of a protein — say a Y-shaped antibody — tells scientists more about that protein’s role.
Misfolded proteins can cause diseases such as Alzheimer’s disease, Parkinson’s disease and cystic fibrosis. If scientists can easily predict the shape of a protein, they can modify it to improve its function by altering its DNA sequence, or design drugs that attach to it.
Accurately predicting a protein’s structure from its amino acid sequence has been one of biology’s greatest challenges. Current experimental methods can take months or years in a lab to determine the shape of a single protein. As a result, only about 180,000 protein structures have been solved, out of more than 200 million known proteins across all living things.
“We believe this will be the most significant contribution AI has made to advance the state of scientific knowledge to date,” said DeepMind chief executive Demis Hassabis. “Our ambitions are to expand [the database in the] coming months to the entire protein universe of more than 200 million proteins.”
Scientists not involved in DeepMind’s research used phrases like “exciting” and “transforming” to describe the impact of the advances, comparing the dataset to the human genome.
“It was one of those moments where my hair on the back of my neck stood on end,” said John McGeehan, director of the Centre for Enzyme Innovation at the University of Portsmouth and a structural biologist who has been using the AlphaFold algorithm for several months.
“We can immediately use that information to develop faster enzymes for breaking down plastic. Those experiments will start right away, so the acceleration to that project here is several years.”
AlphaFold is not without limitations. Proteins are dynamic molecules that constantly change shape depending on what they bind to, but DeepMind’s algorithm can only predict a protein’s static structure, said Minkyung Baek, a researcher at the University of Washington’s Institute for Protein Design.
However, she said, the biggest contribution to scientists was that the code is open source. “Last year they showed [this] is all possible but didn’t provide any code, so people knew it was there but couldn’t use it.”
In the seven months following DeepMind’s announcement, Baek and her colleagues used DeepMind’s ideas to build their own open source algorithm, which they called RoseTTAFold. It was published in the journal Science last week. “I’m really glad they made it all public, that’s a huge contribution to biological research and also to commercial pharma,” she said. “Now more people can benefit from their method [and] it goes much faster.”