Introduction to Chemical Reaction Prediction
Many attempts have been made to harness artificial intelligence, including large language models (LLMs), to predict the outcomes of new chemical reactions. These have had limited success, in part because until now they have not been grounded in fundamental physical principles, such as the law of conservation of mass.
The Problem with Current Models
While large language models such as ChatGPT have been very successful in many areas of research, these models do not provide a way to limit their outputs to physically realistic possibilities, such as by requiring them to adhere to conservation of mass. These models use computational “tokens,” which in this case represent individual atoms, but “if you don’t conserve the tokens, the LLM model starts to make new atoms, or deletes atoms in the reaction.” Instead of being grounded in real scientific understanding, “this is kind of like alchemy,” says Joonyoung Joung, a recent postdoc at MIT.
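To make the "alchemy" problem concrete, here is a minimal sketch of the kind of check an unconstrained model does not perform. It assumes each side of a reaction is available as a flat list of element symbols, one per atom. The function names and the token format are hypothetical and are not taken from FlowER or from any LLM tokenizer.

```python
from collections import Counter


def atom_imbalance(reactant_tokens, product_tokens):
    """Return {element: products minus reactants} for each element that does not balance."""
    diff = Counter(product_tokens)
    diff.subtract(Counter(reactant_tokens))
    return {element: n for element, n in diff.items() if n != 0}


def atoms_conserved(reactant_tokens, product_tokens):
    """True when every element appears the same number of times on both sides."""
    return not atom_imbalance(reactant_tokens, product_tokens)


# Illustrative reaction: CH4 + Cl2 -> CH3Cl + HCl
reactants = ["C", "H", "H", "H", "H", "Cl", "Cl"]
balanced = ["C", "H", "H", "H", "Cl", "H", "Cl"]

# A made-up "prediction" that deletes one hydrogen and invents one chlorine.
unbalanced = ["C", "H", "H", "H", "Cl", "Cl", "Cl"]

print(atoms_conserved(reactants, balanced))
print(atom_imbalance(reactants, unbalanced))
```

A check like this only flags a bad output after the fact. The approach described in this article instead builds conservation into the representation itself.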
A New Approach to Reaction Prediction
A team of researchers at MIT has come up with a way of incorporating physical constraints into a reaction prediction model, greatly improving the accuracy and reliability of its outputs. The new work was reported in the journal Nature, in a paper by Joung and his colleagues. The team made use of a method developed in the 1970s by chemist Ivar Ugi, which uses a bond-electron matrix to represent the electrons in a reaction. They used this system as the basis for their new program, called FlowER (Flow matching for Electron Redistribution), which allows them to explicitly keep track of all the electrons in the reaction to ensure that none are spuriously added or deleted in the process.
How FlowER Works
The system represents the electrons in a reaction as a matrix, in which nonzero values stand for bonds or lone electron pairs and zeros stand for their absence. “That helps us to conserve both atoms and electrons at the same time,” says Mun Hong Fong, a former software engineer at MIT. This representation, he says, was one of the key elements in building mass conservation into their prediction system.
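A small worked example may help show what such a matrix looks like. The sketch below follows the usual textbook convention for bond-electron matrices: each off-diagonal entry is the bond order between two atoms, and each diagonal entry is the number of unshared valence electrons on an atom. The reaction (hydroxide plus a proton giving water), the atom ordering, and the helper names are illustrative assumptions, and none of this is FlowER's actual code.

```python
# Atom order for every matrix below: O, Ha, Hb
# Off-diagonal entry [i][j]: bond order between atoms i and j.
# Diagonal entry [i][i]: unshared valence electrons on atom i.

# Reactants: hydroxide (O-Ha, three lone pairs on O) and a bare proton Hb.
reactants = [
    [6, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

# Product: water (O-Ha and O-Hb bonds, two lone pairs on O).
products = [
    [4, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]


def total_electrons(be):
    """Sum of all entries; each bond is counted once per bonded atom."""
    return sum(sum(row) for row in be)


def is_symmetric(be):
    """A valid bond-electron matrix has the same bond order at [i][j] and [j][i]."""
    n = len(be)
    return all(be[i][j] == be[j][i] for i in range(n) for j in range(n))


def electron_change(reactants, products):
    """Entry-wise difference (products minus reactants) showing where electrons moved."""
    return [[p - r for p, r in zip(p_row, r_row)]
            for p_row, r_row in zip(products, reactants)]


def conserves_electrons(reactants, products):
    """True when both matrices are symmetric and hold the same number of electrons."""
    return (is_symmetric(reactants) and is_symmetric(products)
            and total_electrons(reactants) == total_electrons(products))


delta = electron_change(reactants, products)
print(conserves_electrons(reactants, products))
print(delta)
```

In this example the difference matrix sums to zero: oxygen gives up two unshared electrons (the -2 on the diagonal), and they reappear as the new O-Hb bond (the two +1 entries). Because the atoms are the fixed rows and columns of the matrix, no atom can appear or vanish, and a change that is constrained to sum to zero cannot add or delete electrons.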
Limitations and Future Directions
The system they developed is still at an early stage, says Connor Coley, the Class of 1957 Career Development Professor in the MIT departments of Chemical Engineering and Electrical Engineering and Computer Science. “The system as it stands is a demonstration — a proof of concept that this generative approach of flow matching is very well suited to the task of chemical reaction prediction.” While the team is excited about this promising approach, he says, “we’re aware that it does have specific limitations as far as the breadth of different chemistries that it’s seen.” Although the model was trained using data on more than a million chemical reactions, obtained from a U.S. Patent Office database, those data do not include certain metals and some kinds of catalytic reactions, he says.
Conclusion
The FlowER model matches or outperforms existing approaches in finding standard mechanistic pathways, the team says, and makes it possible to generalize to previously unseen reaction types. They say the model could potentially be relevant for predicting reactions for medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems. The model remains a proof of concept, but it shows that a generative approach can predict reactions while conserving both atoms and electrons.
FAQs
- What is the main problem with current chemical reaction prediction models? The main problem is that they are not constrained by fundamental physical principles, such as the law of conservation of mass.
- How does the FlowER model address this problem? The FlowER model uses a bond-electron matrix to represent the electrons in a reaction, which allows it to conserve both atoms and electrons at the same time.
- What are the limitations of the FlowER model? The model is still at an early stage and is limited in the breadth of chemistries it has seen. Its training data do not include certain metals and some kinds of catalytic reactions.
- What are the potential applications of the FlowER model? The model could potentially be relevant for predicting reactions for medicinal chemistry, materials discovery, combustion, atmospheric chemistry, and electrochemical systems.