An AI system to help scientists write expert-level empirical software
Background
The cycle of scientific discovery is frequently stalled by the slow, manual creation of software to support computational experiments. To address this, we present an AI system that creates expert-level scientific software whose goal is to maximize a quality metric. In nearly every field, researchers rely on "empirical software": code designed to maximize a specific quality score, such as how well a model fits observed data. Historically, building this software has required years of specialized labor and has often depended on human intuition rather than a systematic search for the best approach. This slow development process severely limits the range of complex hypotheses that can be practically explored. There is a critical need for an automated system capable of producing expert-level scientific software at scale to overcome these research bottlenecks.
Approach
We developed an AI system that combines a Large Language Model with a Tree Search algorithm to systematically navigate the massive space of potential code solutions. To refine our methodology, we used a benchmark of 16 data science competitions on the Kaggle platform, which allowed us to calibrate the AI’s performance against thousands of human participants in a fast-paced environment. The system works by rewriting code and evaluating each version against a specific quality metric in a secure "sandbox". To reach expert-level results, we provide the AI with complex research ideas from scientific papers and textbooks, which it then recombines into novel, high-performing software.
Key Findings
We demonstrated that our system achieves "superhuman" performance by identifying high-quality code solutions that often surpass those of human experts across various disciplines.
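The rewrite-and-score loop described under Approach can be illustrated with a toy sketch. This is not the system's implementation: `propose_rewrites` is a hypothetical stand-in for the Large Language Model, `score` is a hypothetical stand-in for running a candidate in the sandbox, a "program" is reduced to a (slope, intercept) pair, and the greedy best-first expansion rule is an assumption, since the summary above does not say how nodes are selected.

```python
import heapq
from typing import List, Tuple

# Toy stand-in for a program: the parameters of a line y = slope * x + intercept.
Candidate = Tuple[float, float]

# Toy "observed data", generated by y = 2x + 1.
DATA = [(x, 2.0 * x + 1.0) for x in range(5)]


def score(candidate: Candidate) -> float:
    """Quality metric: negative sum of squared errors (higher is better).

    In a real system this step would run the candidate code in a sandbox;
    here it is evaluated in-process (assumption for illustration).
    """
    slope, intercept = candidate
    return -sum((slope * x + intercept - y) ** 2 for x, y in DATA)


def propose_rewrites(candidate: Candidate) -> List[Candidate]:
    """Hypothetical stand-in for the LLM rewrite step: small local edits."""
    slope, intercept = candidate
    step = 0.5
    return [
        (slope + step, intercept),
        (slope - step, intercept),
        (slope, intercept + step),
        (slope, intercept - step),
    ]


def tree_search(root: Candidate, budget: int) -> Tuple[Candidate, float]:
    """Greedy best-first tree search: always expand the best-scoring node."""
    best, best_score = root, score(root)
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-best_score, root)]
    seen = {root}
    for _ in range(budget):
        if not frontier:
            break
        _, node = heapq.heappop(frontier)
        for child in propose_rewrites(node):
            if child in seen:
                continue  # skip candidates that were already evaluated
            seen.add(child)
            child_score = score(child)
            heapq.heappush(frontier, (-child_score, child))
            if child_score > best_score:
                best, best_score = child, child_score
    return best, best_score
```

On this toy problem, `tree_search((0.0, 0.0), budget=50)` recovers the generating parameters `(2.0, 1.0)`. The point of the sketch is only the shape of the loop: every candidate is scored against the same metric, and the search keeps a frontier of alternatives rather than following a single line of edits.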
Impact
This research represents a major step toward accelerating scientific progress by automating the most tedious aspects of research software creation. By reducing the time required to test new research ideas from months to just hours or days, we enable a significantly faster cycle of discovery. The lab believes this capability will revolutionize fields where solutions can be numerically scored, such as drug discovery, climate modeling, and environmental monitoring. Ultimately, this technology puts scientific advancement on the cusp of a revolutionary acceleration.
Resources
Published Paper: Aygün E, Belyaeva A, Comanici G, Coram M, Cui H, Garrison J, Johnston R, Kast A, McLean CY, Norgaard P, Shamsi Z, Smalling D, Thompson J, Venugopalan S, Williams BP, He C, Martinson S, Plomecka M, Wei L, Zhou Y, Zhu Q-Z, Abraham M, Brand E, Bulanova A, Cardille JA, Co C, Ellsworth S, Joseph G, Kane M, Krueger R, Kartiwa J, Liebling D, Lueckmann J-M, Raccuglia P, Wang X(J), Chou K, Manyika J, Matias Y, Platt JC, Dorfman L, Mourad S, Brenner MP. An AI system to help scientists write expert-level empirical software. arXiv:2509.06503 [cs.AI]. 2025. DOI: https://doi.org/10.48550/arXiv.2509.06503.
Source Code Repository: github.com/google-research/s
Data Repositories: