Introduction to Evo
Evo is a genomic language model trained on bacterial genomes. The researchers argue that this design lets Evo “link nucleotide-level patterns to kilobase-scale genomic context.” In other words, if you prompt it with a large chunk of genomic DNA, Evo treats that DNA the way an LLM treats a query and produces output that fits the genomic context of the prompt.
How Evo Works
The researchers reasoned that, given the training on bacterial genomes, they could use a known gene as a prompt, and Evo should produce an output that includes regions that encode proteins with related functions. The key question is whether it would simply output the sequences for proteins we know about already, or whether it would come up with output that’s less predictable.
Testing Evo’s Capabilities
To start testing the system, the researchers prompted it with fragments of the genes for known proteins and determined whether Evo could complete them. In one example, if given 30 percent of the sequence of a gene for a known protein, Evo was able to output 85 percent of the rest. When prompted with 80 percent of the sequence, it could return all of the missing sequence. When a single gene was deleted from a functional cluster, Evo could also correctly identify and restore the missing gene.
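A test like this can be sketched in a few lines of Python. This is only an illustration of the idea, not the researchers' evaluation code: the helper names `split_prompt` and `fraction_recovered` are hypothetical, and scoring by position-by-position identity is an assumption, since the text does not say how the recovered percentage was measured.

```python
def split_prompt(gene: str, prompt_fraction: float) -> tuple[str, str]:
    """Split a gene into the prefix used as a prompt and the held-out remainder."""
    if not 0.0 < prompt_fraction < 1.0:
        raise ValueError("prompt_fraction must be strictly between 0 and 1")
    cut = round(len(gene) * prompt_fraction)
    return gene[:cut], gene[cut:]


def fraction_recovered(held_out: str, completion: str) -> float:
    """Fraction of held-out positions that the completion matches exactly.

    Assumption: positions are compared one-to-one with no alignment, and
    positions the completion never reaches count as mismatches.
    """
    if not held_out:
        return 1.0
    matches = sum(1 for a, b in zip(held_out, completion) if a == b)
    return matches / len(held_out)


# Toy example with a made-up 20-nucleotide sequence (not real data).
gene = "ATGGCTAAAGGTCTGACCGA"
prompt, held_out = split_prompt(gene, 0.5)
# A model's completion would go here; the held-out text itself stands in for it.
score = fraction_recovered(held_out, held_out)
```

In a real evaluation, the completion would come from the model and would likely be scored with a proper sequence alignment rather than a strict positional comparison.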
Learning Evolutionary Constraints
The large amount of training data also meant that Evo correctly identified the most important regions of the protein. If it made changes to the sequence, they typically fell in the areas of the protein where variability is tolerated. In other words, its training had enabled the system to learn the evolutionary constraints on changes to known genes.
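One way to check a claim like this is to score each position of a protein by how conserved it is across related sequences, then ask where a generated sequence differs from the original. The sketch below assumes the sequences are already aligned to the same length; the function names and the 0.9 conservation threshold are illustrative choices, not values from the research.

```python
from collections import Counter


def conservation(aligned: list[str]) -> list[float]:
    """Per-column frequency of the most common residue in an alignment."""
    if not aligned:
        raise ValueError("need at least one sequence")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    scores = []
    for column in zip(*aligned):
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(aligned))
    return scores


def changed_positions(reference: str, generated: str) -> list[int]:
    """Indices where the generated sequence differs from the reference."""
    if len(reference) != len(generated):
        raise ValueError("sequences must have the same length")
    return [i for i, (a, b) in enumerate(zip(reference, generated)) if a != b]


def share_of_changes_in_variable_sites(
    reference: str, generated: str, scores: list[float], threshold: float = 0.9
) -> float:
    """Fraction of changes that land on positions scoring below the threshold."""
    changes = changed_positions(reference, generated)
    if not changes:
        return 0.0
    variable = sum(1 for i in changes if scores[i] < threshold)
    return variable / len(changes)


# Toy alignment of four made-up five-residue sequences (not real data).
family = ["MKTAL", "MKSAL", "MKTGL", "MKQAL"]
scores = conservation(family)
share = share_of_changes_in_variable_sites("MKTAL", "MKNVL", scores)
```

A share close to 1.0 would mean that nearly all of the changes sit in positions that already vary across the family, which is the pattern the paragraph above describes.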
Creating New Proteins
So, the researchers decided to test what happened when Evo was asked to output something new. To do so, they used bacterial toxins, which are typically encoded along with an antitoxin that keeps the cell from killing itself whenever it activates the genes. There are a lot of examples of these out there, and they tend to evolve rapidly as part of an arms race between bacteria and their competitors. So, the team developed a toxin that was only mildly related to known ones, and had no known antitoxin, and fed its sequence to Evo as a prompt. This time, they filtered out any responses that looked similar to known antitoxin genes.
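The filtering step can be illustrated with a small sketch. It is not the researchers' pipeline: the text does not say how similarity was measured, so this example uses Python's standard-library `difflib.SequenceMatcher` as a rough stand-in for a proper alignment-based search, and both the function names and the 0.4 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Rough similarity between two sequences, from 0.0 to 1.0.

    Assumption: difflib's ratio stands in for an alignment-based identity
    score; a real pipeline would use a dedicated sequence search tool.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def filter_novel_candidates(
    candidates: list[str], known: list[str], max_similarity: float = 0.4
) -> list[str]:
    """Keep only candidates that stay at or below the cutoff against every known sequence."""
    return [
        candidate
        for candidate in candidates
        if all(similarity(candidate, k) <= max_similarity for k in known)
    ]


# Toy example with made-up sequences (not real antitoxins).
known_antitoxins = ["MKTAYIAKQR"]
generated = ["MKTAYIAKQR", "WWWWWWWWWW"]
novel = filter_novel_candidates(generated, known_antitoxins)
```

Here the first candidate is identical to a known sequence and is dropped, while the second shares nothing with it and is kept.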
Conclusion
The tests show that Evo, after training on bacterial genomes, can complete partial genes, restore a gene deleted from a functional cluster, and keep its sequence changes to regions of a protein where variability is tolerated. The toxin experiment pushes further by asking for output that does not resemble known antitoxin genes. Together, these capabilities make Evo a potentially valuable tool for scientists who want to design new proteins.
FAQs
Q: What is Evo?
A: Evo is a language model trained on bacterial genomes. Given a stretch of genomic DNA as a prompt, it produces DNA output that fits the genomic context of that prompt.
Q: How does Evo work?
A: Evo learns the patterns and rules of genomic DNA from a large amount of training data. When prompted with a DNA sequence, it uses that knowledge to generate new sequence, including regions that encode proteins.
Q: What are the potential applications of Evo?
A: The potential applications of Evo include creating new proteins, identifying and restoring missing genes, and understanding the evolutionary constraints on changes to known genes.
Q: Can Evo create new proteins that are not similar to known ones?
A: Yes. The researchers demonstrated this by prompting Evo with a toxin that was only mildly related to known ones and discarding any output that resembled known antitoxin genes, leaving an antitoxin unlike those already known.








