AI Predicts Location of Proteins in Human Cells

Introduction to Protein Localization

A protein located in the wrong part of a cell can contribute to several diseases, such as Alzheimer’s, cystic fibrosis, and cancer. But there are about 70,000 different proteins and protein variants in a single human cell, and since scientists can typically only test for a handful in one experiment, it is extremely costly and time-consuming to identify proteins’ locations manually.

The Challenge of Protein Localization

One of the largest datasets for protein localization is the Human Protein Atlas, which catalogs the subcellular behavior of over 13,000 proteins in more than 40 cell lines. Even so, the atlas has explored only about 0.25 percent of all possible pairings of proteins and cell lines within the database, leaving the vast majority of combinations unmeasured.

A New Computational Approach

Now, researchers from MIT, Harvard University, and the Broad Institute of MIT and Harvard have developed a new computational approach that can efficiently explore the remaining uncharted space. Their method can predict the location of any protein in any human cell line, even when both protein and cell have never been tested before. This technique goes one step further than many AI-based methods by localizing a protein at the single-cell level, rather than as an averaged estimate across all the cells of a specific type.

How the Technique Works

The researchers combined a protein language model with a special type of computer vision model to capture rich details about a protein and cell. The user inputs the sequence of amino acids that form the protein and three cell stain images — one for the nucleus, one for the microtubules, and one for the endoplasmic reticulum. Then, the model does the rest, outputting an image of a cell with a highlighted portion indicating the model’s prediction of where the protein is located.
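The inputs and output described above can be sketched as a small data structure. This is a minimal illustration, not the PUPS interface: the class name `LocalizationQuery`, its field names, and the validation rules are hypothetical, and the images are plain nested lists rather than real microscopy data.

```python
from dataclasses import dataclass
from typing import List, Tuple

Image = List[List[float]]  # 2D grid of pixel intensities

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def image_shape(img: Image) -> Tuple[int, int]:
    """Return (rows, columns) of a 2D image; an empty image is (0, 0)."""
    return (len(img), len(img[0]) if img else 0)


@dataclass
class LocalizationQuery:
    """Hypothetical container for one prediction request."""
    sequence: str        # amino acid sequence of the protein
    nucleus: Image       # nucleus stain
    microtubules: Image  # microtubule stain
    er: Image            # endoplasmic reticulum stain

    def validate(self) -> None:
        """Check that the sequence and the three stains are usable."""
        if not self.sequence:
            raise ValueError("sequence is empty")
        unknown = set(self.sequence.upper()) - AMINO_ACIDS
        if unknown:
            raise ValueError(f"unknown residues: {sorted(unknown)}")
        shapes = {image_shape(i) for i in (self.nucleus, self.microtubules, self.er)}
        if len(shapes) != 1:
            raise ValueError("the three stain images must share one shape")

    def output_shape(self) -> Tuple[int, int]:
        """The predicted localization image has one value per input pixel."""
        return image_shape(self.nucleus)
```

A query built from one sequence and three equally sized stain images passes `validate()`, and `output_shape()` reports the size of the image a model would be expected to return.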

Collaborating Models

The technique, called PUPS, is a two-part method for predicting the subcellular location of unseen proteins. The first part uses a protein sequence model to capture the localization-determining properties of a protein and its 3D structure based on the chain of amino acids that forms it. The second part incorporates an image inpainting model, which is designed to fill in missing parts of an image. This computer vision model looks at three stained images of a cell to gather information about the state of that cell, such as its type, individual features, and whether it is under stress.
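To make the idea of joining a sequence representation with cell images concrete, here is a toy sketch of the data flow only. It is not the PUPS architecture: amino acid composition stands in for a protein language model, a single weighted sum with a sigmoid stands in for the inpainting network, and the function names and the 23-weight layout are assumptions made for illustration.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def embed_sequence(sequence):
    """Stand-in for a protein language model: amino acid composition (20 values)."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def predict_map(sequence, stains, weights, bias=0.0):
    """Combine one sequence embedding with three stain images, pixel by pixel.

    stains:  three equally sized 2D lists (nucleus, microtubules, ER).
    weights: 23 numbers, 20 for the embedding followed by 3 for the stains.
    Returns a 2D list with one value in (0, 1) per pixel.
    """
    emb = embed_sequence(sequence)
    # The protein contributes the same term at every pixel (a broadcast).
    seq_term = sum(w * e for w, e in zip(weights[:20], emb))
    rows, cols = len(stains[0]), len(stains[0][0])
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # The cell contributes a term that varies from pixel to pixel.
            img_term = sum(w * s[r][c] for w, s in zip(weights[20:], stains))
            row.append(1.0 / (1.0 + math.exp(-(seq_term + img_term + bias))))
        out.append(row)
    return out
```

With untrained weights the output carries no biological meaning; the point is only that the protein-level vector is shared across the whole image while the stain channels supply per-pixel context.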

A Deeper Understanding

The researchers employed a few tricks during the training process to teach PUPS how to combine information from each model so that it can make an educated guess about a protein’s location, even if it hasn’t seen that protein before. For instance, they assign the model a secondary task during training: to explicitly name the compartment of localization, like the cell nucleus. This extra step was found to help the model learn more effectively.
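A secondary task of this kind is commonly implemented by adding an auxiliary term to the training loss. The sketch below shows that general pattern under stated assumptions: the article does not specify the loss functions or their weighting, so the mean squared error, the cross-entropy, and the `aux_weight` parameter are illustrative choices rather than the researchers' settings.

```python
import math


def pixel_mse(predicted, target):
    """Mean squared error between two equally sized 2D images."""
    total, count = 0.0, 0
    for p_row, t_row in zip(predicted, target):
        for p, t in zip(p_row, t_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def cross_entropy(class_probs, true_index):
    """Negative log-probability assigned to the correct compartment."""
    # Clamp to avoid log(0) when the model assigns zero probability.
    return -math.log(max(class_probs[true_index], 1e-12))


def combined_loss(predicted, target, class_probs, true_index, aux_weight=0.1):
    """Primary image loss plus a weighted auxiliary classification loss."""
    image_loss = pixel_mse(predicted, target)
    label_loss = cross_entropy(class_probs, true_index)
    return image_loss + aux_weight * label_loss
```

Here `class_probs` would be the model's predicted probabilities over compartments (for example nucleus, cytosol, and so on) and `true_index` the position of the annotated compartment. Because both terms are minimized together, the shared parts of the model are pushed toward features that are useful for both the image and the label.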

Potential Applications

PUPS also learns, without being explicitly told, how different parts of a protein’s sequence contribute separately to its overall localization. Because PUPS can generalize to unseen proteins, it can capture changes in localization driven by unique protein mutations that aren’t included in the Human Protein Atlas. This technique could help researchers and clinicians more efficiently diagnose diseases or identify drug targets, while also enabling biologists to better understand how complex biological processes are related to protein localization.

Conclusion

The computational approach developed by researchers from MIT, Harvard University, and the Broad Institute of MIT and Harvard could substantially speed up the study of protein localization. By predicting the location of any protein in any human cell line, including proteins and cell lines that have never been tested, and by doing so at the single-cell level, the technique can help researchers and clinicians better understand the underlying causes of diseases and develop more effective treatments.

FAQs

Q: What is protein localization?
A: Protein localization refers to where a protein resides within a cell, such as the nucleus or the endoplasmic reticulum. In this article, it also refers to the task of determining that location.
Q: Why is protein localization important?
A: Protein localization is important because it can help researchers and clinicians understand the underlying causes of diseases and develop more effective treatments.
Q: What is the Human Protein Atlas?
A: The Human Protein Atlas is a dataset that catalogs the subcellular behavior of over 13,000 proteins in more than 40 cell lines.
Q: How does the new computational approach work?
A: The new computational approach uses a combination of a protein language model and a computer vision model to predict the location of a protein within a cell.
Q: What are the potential applications of this technique?
A: The potential applications of this technique include helping researchers and clinicians diagnose diseases, identify drug targets, and understand complex biological processes.