Predicting Antibody Structures with AI
By adapting artificial intelligence models known as large language models, researchers have made significant progress in predicting a protein’s structure from its sequence. However, this approach hasn’t been as successful for antibodies, due to the hypervariability seen in this type of protein.
Overcoming the Limitation
To overcome this limitation, MIT researchers have developed a computational technique that allows large language models to predict antibody structures more accurately. This breakthrough could enable researchers to sift through millions of possible antibodies to identify those that could be used to treat SARS-CoV-2 and other infectious diseases.
"Our method allows us to scale, whereas others do not, to the point where we can actually find a few needles in the haystack," says Bonnie Berger, the Simons Professor of Mathematics, the head of the Computation and Biology group in MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), and one of the senior authors of the new study. "If we could help to stop drug companies from going into clinical trials with the wrong thing, it would really save a lot of money."
Modeling Hypervariability
Proteins consist of long chains of amino acids, which can fold into an enormous number of possible structures. In recent years, predicting these structures has become much easier with artificial intelligence programs such as AlphaFold. Many of these programs, such as ESMFold and OmegaFold, are based on large language models, which were originally developed to analyze vast amounts of text, allowing them to learn to predict the next word in a sequence. The same approach can work for protein sequences: the model learns which protein structures are most likely to form from different patterns of amino acids.
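To make the "predict the next word" idea concrete, here is a deliberately tiny sketch in Python. It is not how ESMFold, OmegaFold, or any real protein language model works internally; it only counts which amino acid tends to follow which in a few made-up sequences. The function names `train_bigram` and `predict_next` and the toy sequences are assumptions introduced purely for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(sequences):
    """Count how often each residue is followed by each other residue."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def predict_next(counts, residue):
    """Return the residue seen most often after `residue`, or None if unseen."""
    followers = counts.get(residue)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up sequences in one-letter amino acid code (not real proteins).
toy_sequences = ["ACDACDACE", "ACDKACD", "GACDA"]
model = train_bigram(toy_sequences)

print(predict_next(model, "A"))
print(predict_next(model, "C"))
```

Real protein language models replace these simple counts with neural networks trained on very large sequence collections, but the training signal is similar in spirit: predict a residue from its context.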
The Challenge of Antibodies
However, this technique doesn’t always work on antibodies, especially on a segment of the antibody known as the hypervariable region. Antibodies usually have a Y-shaped structure, and these hypervariable regions are located at the tips of the Y, where they detect and bind to foreign proteins, also known as antigens. Because these regions differ so much from one antibody to the next, the patterns a model learns from other proteins transfer poorly to them.
The Breakthrough
To model those hypervariable regions, the researchers created two modules that build on existing protein language models. One of these modules was trained on hypervariable sequences from about 3,000 antibody structures found in the Protein Data Bank (PDB), allowing it to learn which sequences tend to generate similar structures. The other module was trained on data that correlates about 3,700 antibody sequences to how strongly they bind three different antigens.
The Results
The resulting computational model, known as AbMap, can predict antibody structures and binding strength based on their amino acid sequences. To demonstrate the usefulness of this model, the researchers used it to predict antibody structures that would strongly neutralize the spike protein of the SARS-CoV-2 virus.
Screening Millions of Variants
The researchers started with a set of antibodies that had been predicted to bind to this target, then generated millions of variants by changing the hypervariable regions. Their model was able to identify antibody structures that would be the most successful, much more accurately than traditional protein-structure models based on large language models.
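The screening loop described above (start from a candidate, vary the hypervariable region, rank the variants) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the researchers' pipeline: the parent sequence, the choice of hypervariable positions, and the helper names `make_variants`, `toy_score`, and `rank_top` are all hypothetical, and `toy_score` is an arbitrary stand-in with no biological meaning where a trained model such as AbMap would supply a predicted binding score.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def make_variants(sequence, hypervariable_positions, n_variants, rng):
    """Create unique single-substitution variants inside the given positions."""
    positions = sorted(set(hypervariable_positions))
    # Cap the request at the number of distinct single substitutions available.
    target = min(n_variants, len(positions) * (len(AMINO_ACIDS) - 1))
    variants = set()
    while len(variants) < target:
        pos = rng.choice(positions)
        choices = [a for a in AMINO_ACIDS if a != sequence[pos]]
        variants.add(sequence[:pos] + rng.choice(choices) + sequence[pos + 1:])
    return sorted(variants)


def toy_score(sequence, hypervariable_positions):
    """Arbitrary placeholder score: counts F, W, Y in the variable positions.

    A real screen would call a trained predictor here instead.
    """
    return sum(sequence[i] in "FWY" for i in hypervariable_positions)


def rank_top(variants, score_fn, k):
    """Return the k highest-scoring variants, best first."""
    return sorted(variants, key=score_fn, reverse=True)[:k]


# Hypothetical parent sequence and hypervariable positions, for illustration.
parent = "QVQLVQSGAEVKKPGA"
region = range(5, 10)
rng = random.Random(0)  # fixed seed so the run is repeatable

candidates = make_variants(parent, region, 20, rng)
best = rank_top(candidates, lambda s: toy_score(s, region), 3)
print(best)
```

The sketch changes one residue at a time to stay small; the same generate-then-rank structure applies when millions of variants are scored, which is where a fast sequence-based predictor matters most.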
Frequently Asked Questions
Q: How can the new model help in the development of new treatments for diseases?
A: It can screen large numbers of candidate antibodies computationally. In the study, for example, it was used to predict antibody structures that would strongly neutralize the spike protein of the SARS-CoV-2 virus.
Q: How can the new model be used to analyze entire antibody repertoires from individual people?
A: It can quickly generate structures for all of the antibodies found in an individual, which could help explain why some people respond to infection differently.
Q: How can the new model help to identify a variety of good candidates early in the development process?
A: It predicts both antibody structure and binding strength from amino acid sequences, so a range of promising candidates can be identified early, which could save time and money in the development of new treatments.