Unlocking the Secrets of Life: BioAutoMATED and the Role of Machine Learning in Biology
AI now deciphers DNA, RNA and proteins.
Machine learning is a powerful tool for data analysis and prediction across many fields of science. However, not all researchers have the experience and resources to build and tune machine learning models for their own tasks. How can this process be made more accessible and efficient?
MIT professor Jim Collins and colleagues have proposed a solution to this problem. They developed BioAutoMATED, an automated machine learning system specifically tailored for biological data. They described their work in the journal Cell Systems.
BioAutoMATED can select and build an appropriate machine learning model for a given dataset on its own, and it also handles data preprocessing and formatting, steps that take up most of a project's time. The system supports different types of tasks, such as binary classification, multiclass classification, and regression, as well as different types of sequence data, such as DNA, RNA, proteins, and glycans.
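To give a sense of what formatting sequence data for a model involves, here is a minimal Python sketch of one common approach, one-hot encoding. It is a generic illustration and is not taken from BioAutoMATED; the names `one_hot_encode` and `pad_sequences`, the DNA alphabet, and the choice to pad with "N" are all assumptions made for this example.

```python
# Generic illustration only; not BioAutoMATED's actual code.
# All names below are hypothetical and chosen for this example.

DNA_ALPHABET = "ACGT"


def pad_sequences(sequences, length, pad="N"):
    """Truncate or right-pad each sequence so all share the same length."""
    return [seq[:length].ljust(length, pad) for seq in sequences]


def one_hot_encode(sequence, alphabet=DNA_ALPHABET):
    """Return one row per position, with a 1 in the column of the matching letter."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    rows = []
    for letter in sequence.upper():
        row = [0] * len(alphabet)
        if letter in index:  # symbols outside the alphabet (e.g. N) stay all-zero
            row[index[letter]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    raw = ["ACG", "ACGTAC"]
    for seq in pad_sequences(raw, length=4):
        print(seq, one_hot_encode(seq))
```

Each sequence becomes a fixed-size grid of zeros and ones, which is the kind of numeric input most machine learning models expect. Proteins or glycans would need a different alphabet, passed in through the `alphabet` argument.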
“The fundamental language of biology is based on sequences,” explains Luis Soenksen, a postdoctoral fellow at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and co-first author of the paper. “Biological sequences such as DNA, RNA, proteins, and glycans have the amazing informational property of being inherently standardized, like an alphabet. Many AutoML tools are designed for text, so it made sense to extend it to [biological] sequences.”
BioAutoMATED can cut a multi-month process down to hours, making it very convenient for researchers who want to use machine learning in their projects. “Our tool explores models that are better suited to small and sparse biological datasets, as well as more complex neural networks,” says Jacqueline Valeri, a doctoral student in biological engineering in the Collins lab and co-first author of the paper.
BioAutoMATED has already been tested on several real-world biology applications and has shown good results. For example, the system has helped predict the function of unknown proteins based on their sequence, determine the role of glycans in the human immune system, and identify potential drug targets for cancer treatment.
BioAutoMATED is an innovative method that can help accelerate scientific discovery in biology and medicine. It can automate part of the scientific process and suggest new directions for research. However, BioAutoMATED does not replace the human mind and the need for experimental hypothesis testing. It is only a tool that can expand the horizons of science.