OpenAI GPT-5.4 AI Chemist Boosts Medicinal Chemistry Yields

OpenAI's GPT-5.4 AI chemist ran 10,080 lab reactions in three months and chipped away at a long-standing headache in drug synthesis, the company says in a new blog post. A chemist doing three reactions a day would need more than nine years of nonstop bench work, and well over a decade on an ordinary working calendar, to cover that much ground.
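The decade comparison comes down to simple division. The short sketch below reworks it and also estimates the implied daily throughput of the robotic lab. The 250-day working year and the 90-day reading of "three months" are assumptions for illustration, not figures from the post.

```python
# Back-of-the-envelope check of the throughput comparison.
# Assumptions (not from the source): 250 bench days per working year,
# and "three months" read as 90 days.
TOTAL_REACTIONS = 10_080
REACTIONS_PER_DAY = 3        # the human chemist in the comparison
WORKDAYS_PER_YEAR = 250      # hypothetical working calendar
CAMPAIGN_DAYS = 90           # hypothetical length of "three months"

days_needed = TOTAL_REACTIONS / REACTIONS_PER_DAY
calendar_years = days_needed / 365            # working every single day
working_years = days_needed / WORKDAYS_PER_YEAR
robot_per_day = TOTAL_REACTIONS / CAMPAIGN_DAYS

print(f"{days_needed:.0f} days for one chemist")
print(f"{calendar_years:.1f} calendar years, {working_years:.1f} working years")
print(f"about {robot_per_day:.0f} reactions per day in the automated lab")
```

Whether the comparison reads as "nearly a decade" or "more than a decade" depends entirely on how many days a year the chemist works.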

The AI chemist went further than scribbling ideas on a whiteboard. It designed the experiments, read the data that came back, and landed on a finding that human chemists could later test at the bench.

OpenAI worked with Molecule.one, the team that built Maria, an autonomous chemistry agent plugged into a high-throughput robotic lab. OpenAI set a single, loosely defined goal: pick an important class of reactions and make it work better.

GPT-5.4 drafted and scored thousands of possible approaches, then human chemists chose four to carry into the lab.

What GPT-5.4 Found in Chan-Lam Coupling

The winning idea, tagged OAI-M1-03, took aim at a hard form of Chan-Lam coupling, a copper-mediated reaction used to stitch carbon and nitrogen atoms together. GPT-5.4 singled out primary sulfonamides, a valuable yet awkward substrate class, then floated an unexpected fix: adding mild oxidants such as TEMPO.

Chemists were intrigued, the post notes, partly because nobody saw it coming. This particular coupling, which joins sulfonamides to boronic acids, has long been stuck at poor yields. That is a real problem, since sulfonamides turn up in cancer treatments, antibiotics, and diuretics.

Two rounds of testing in Maria Lab backed the hunch. Average yield rose from 16.6% to 25.2%, the company reports, and the share of reactions topping a 30% yield grew from 15.6% to 37.5%. Broken down by building block, 88% of the boronic acids gave better results, as did 83% of the sulfonamides.
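Those figures describe two different kinds of change: the average yield climbs by a number of percentage points, while the share of high-yield reactions roughly multiplies. The sketch below separates the two using only the percentages quoted in the post; it says nothing about how OpenAI or Molecule.one computed them.

```python
# Relative changes implied by the reported yield figures (all values in %).
baseline_mean, tempo_mean = 16.6, 25.2   # average yield
baseline_hi, tempo_hi = 15.6, 37.5       # share of reactions above 30% yield


def relative_change(before: float, after: float) -> float:
    """Percent change from `before` to `after`."""
    return (after - before) / before * 100


point_gain = tempo_mean - baseline_mean
print(f"average yield: +{point_gain:.1f} points "
      f"({relative_change(baseline_mean, tempo_mean):.0f}% relative)")
print(f"share above 30% yield: {tempo_hi / baseline_hi:.1f}x the baseline")
```

Read this way, the headline gain is roughly a half again on average yield, but the high-yield tail more than doubles, which is the number most relevant to a chemist who needs usable material.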

Why the Finding Matters for Drug Discovery

Making molecules is one of the tightest chokepoints in drug discovery, because a team can only study the compounds it manages to build. A steadier Chan-Lam coupling hands medicinal chemists a more practical route to the drug-like structures they want to explore.

Scale is part of why the result carries weight. A reaction can shine on a single substrate pair and then collapse across a wider set, so thousands of runs let the team confirm TEMPO stood out among ten oxidants and map where it stops working. A follow-up turned up a bonus: a far cheaper cousin, 4-hydroxy-TEMPO, delivered almost the same lift.

Human chemists then repeated a sample of the reactions by hand at a larger scale. Yields rose for 11 of the 14 pairs they retried, more than doubling in most, the post says. That hands-on step matters: reactions run in very small wells occasionally produce flukes, and a finding only earns trust once it holds up at a bigger scale.

What Comes Next

OpenAI is careful to hedge. It calls the work an early signal, not proof that a model can run a research program on its own. People stayed in charge of the key calls throughout, the setup leaned on specialized lab hardware, and there is no evidence yet that the trick carries over to other reactions or substrate families.

Four outside experts who looked over the preprint judged it novel enough to share. The bigger exam is still ahead: can other labs reproduce it, and will the method help across a wider library of molecules? Anyone following AI in the sciences should keep an eye out for that independent replication.

Source: OpenAI