MIT continues its efforts to transform the process of drug design and manufacturing with a new MIT-industry consortium, the Machine Learning for Pharmaceutical Discovery and Synthesis Consortium. The consortium already includes eight industry partners, all major players in the pharmaceutical field: Amgen, BASF, Bayer, Lilly, Novartis, Pfizer, Sunovion, and WuXi. Many of these companies have a research presence in Cambridge or the surrounding area, which allows for close cooperation and helps establish a center for artificial intelligence (AI) applications in pharmaceuticals.
The drug discovery process is often exceedingly expensive and time-consuming, but machine learning offers tremendous opportunities to access and understand vast amounts of chemical data more efficiently, with great potential to improve both processes and outcomes. The consortium aims to bridge the divide between machine learning research at MIT and drug discovery research, bringing MIT researchers and industry together to identify and address the most significant problems.
As part of the broader initiative to bring together machine learning and drug research, MIT hosted a summit in April led by Regina Barzilay, the Delta Electronics Professor of Computer Science, and Dina Katabi, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science. The summit brought together MIT researchers and leaders from technology, biotech, and regulatory agencies to explore ways that digital technologies and artificial intelligence can help address major challenges in the biomedical and health care industries.
The earliest seeds of the consortium were software and technology funded by the Defense Advanced Research Projects Agency (DARPA) “Make-It” program, which aims to integrate machine learning with automated systems for chemical synthesis. MIT researchers discussed the idea of a consortium with pharmaceutical industry contacts, first meeting with company representatives in May 2017 and again in September 2017, when both industry and MIT researchers expressed strong interest in working together. Since then, through work with the MIT Office of Sponsored Programs (OSP) and the MIT Technology Licensing Office (TLO), the consortium has been officially formed. A consortium meeting on May 3 brought together industry members and MIT researchers.
“The enthusiasm of the member companies and the potential of machine learning create a tremendous opportunity for advancing the toolbox for medical scientists in the chemical and pharmaceutical industries,” says Klavs Jensen, the Warren K. Lewis Professor of Chemical Engineering and professor of materials science and engineering.
“Machine learning can help plan chemical synthesis pathways and help identify which chemical parts within a molecule contribute to particular properties,” adds Jensen. “Also, this may ultimately lead us to explore new chemical spaces, increase chemical diversity, and give us a larger opportunity to identify suitable compounds that will have specific biological functions.”
The May 3 meeting aimed to introduce the industry members to the fundamentals of machine learning through tutorials and joint research projects. To that end, Barzilay taught the first tutorial, on the basics of supervised learning. The tutorial covered neural models and focused on representation learning, with the goal of preparing participants for the technical presentations in the afternoon.
“We are at the beginning of a relatively unexplored field with endless opportunities for new science, which has real impact on people’s lives,” says Barzilay. “Our colleagues from the pharmaceutical industry care about science the way we do at MIT — this is key to successful collaboration. I am continuously learning from them and getting new problems to think about.”
Barzilay says that one of the goals of the consortium is to establish evaluation standards and create benchmark datasets for assessing the accuracy of machine learning methods. Currently, most research groups evaluate their results on proprietary datasets, which prevents comparison across different models and slows scientific progress. To make matters worse, many publicly available datasets do not represent the real complexities that pharma researchers face.
“It is for the benefit of everybody — both researchers and users of new technology — to really understand where we stand and what is [the] true capacity of new machine learning technology,” Barzilay says.
MIT principal investigators for the consortium span different areas and departments, bringing expertise in machine learning, chemistry, and chemical engineering. In addition to professors Jensen and Barzilay, PIs include: William H. Green, the Hoyt C. Hottel Professor in Chemical Engineering; Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science and the Institute for Data, Systems, and Society; and Timothy Jamison, the Robert R. Taylor Professor and head of the Department of Chemistry.
“By marrying chemical insights with modern machine learning concepts and methods, we are opening new avenues for designing, understanding, optimizing, and synthesizing drugs,” says Jaakkola. “The consortium contributes to lifting chemistry to the realm of data science and bringing about a new interdisciplinary area akin to computational biology, with its own key questions and goals. The collaboration also offers a new training ground for students and researchers alike.”
At the recent consortium meeting, Connor Coley, a graduate student in the Department of Chemical Engineering who works in the research groups of professors Green and Jensen, presented an overview and demonstration of synthesis planning software, which could have an especially significant impact on small molecule discovery and development. (Some aspects of this synthesis planning work are summarized in the recent paper, “Machine Learning in Computer-Aided Synthesis Planning.”) Coley says that although synthesis planning software has existed for decades, no system has yet achieved widespread adoption.
“We’re in a unique position now where access to large amounts of chemical data and computing power has enabled new approaches that might finally make these tools useful and appealing to practicing chemists,” says Coley. “Synthesis planning has a clear role in early stage discovery, where rapidly identifying synthetic strategies for novel molecules can decrease the cycle time of design-synthesis-test iterations. We’re looking forward to working closely with the consortium members to help facilitate the work they do and see our methodologies and tools translated into practice.”
Likewise, industry partners see great value and potential in implementing machine learning approaches.
“The application of machine learning tools provides an opportunity to augment and accelerate drug discovery and development — and get new medicines to patients more quickly,” says Shawn Walker, director of process development of pivotal drug substance technologies at Amgen. “Machine learning tools could help to design the best molecules based on binding affinity and minimizing toxicity, design the best and most cost-effective synthetic processes to manufacture these molecules, and extract insights from disparate sources — including chemical literature and company databases. The possibilities are endless, and we hope that partnering top scientific talent with the best machine learning tools will lead to better outcomes for patients.”
“We are excited to participate in this MIT machine learning consortium, along with our other industry partners,” says José Duca, head of computer-aided drug discovery in global discovery chemistry at the Novartis Institutes for BioMedical Research. “This consortium will tackle the challenge of efficient and targeted route-planning using state-of-the-art machine learning approaches. Ultimately, we hope this accelerates our ability to make safer, more potent drugs against human disease.”