Imagine trying to find a new material that is stronger than steel, lighter than aluminum, and biodegradable. You could spend decades mixing chemicals, testing samples, and publishing results, only for a colleague across the world to unknowingly repeat your work. Now imagine a system that does all of that in days, learns from every experiment, shares results in real time, and never makes the same mistake twice.
That system is no longer imaginary. It is being built right now, piece by piece, by researchers around the world. And at its core are two powerful forces: artificial intelligence and the open source movement. A sweeping new review published in 2026 in Communications Materials lays out, in remarkable detail, what this new infrastructure looks like, how it works, and what it means for the future of science, industry, and the planet.
Why This Matters Beyond the Laboratory
Materials science might sound abstract, but everything you touch, wear, drive, eat from, or power your life with began as a materials discovery. The solar panel on your roof. The battery in your phone. The packaging around your food. The implant in someone's hip. Every one of those started with someone asking: is there a material that can do this job better?
For most of human history, finding that material required decades of painstaking trial and error. Even with computers and modern chemistry, the challenge has remained enormous. The space of possible chemical combinations is astronomically large, and testing candidates one by one is simply not feasible. The review, produced by researchers at North Carolina State University and IBM, argues that artificial intelligence has fundamentally changed this equation, and that the real breakthrough is not just the AI itself, but how it is being deployed within open, collaborative, and ethically grounded frameworks.
From Trial and Error to Generative Design
The review traces the history of materials discovery through five major eras. It begins with empirical science, the ancient approach of experimenting by hand and learning through observation. It moves through the theoretical era, when equations from thermodynamics and quantum mechanics gave researchers a way to predict material behavior on paper. Then came computational science in the 1960s and 1970s, when computers made it possible to simulate atoms and molecules virtually. Next came the era of big data and machine learning, when vast datasets allowed algorithms to find patterns no human could detect.
Now we are in the fifth era: generative AI. This is where the story gets genuinely exciting. Instead of simply analyzing existing materials, generative AI systems can propose entirely new ones. They learn what makes a good material in one context, and then create novel molecular structures that satisfy those requirements, sometimes dreaming up combinations no human chemist would have thought to try. The authors describe this as a fundamental shift: AI is no longer just accelerating discovery, it is expanding what is even conceivable.
The Quiet Revolution: Open Source Changes Everything
Perhaps the most important insight in the review is not about a specific algorithm or breakthrough tool. It is about openness. The paper argues that the transformative potential of AI in materials science can only be fully realized if the tools, data, and methods are openly shared.
This might seem obvious, but it represents a significant cultural and institutional shift. Science has historically been competitive, with labs and companies guarding their data and methods as proprietary advantages. The review makes the case that this approach is now a bottleneck rather than a benefit.
Open source databases like the Materials Project, the Open Quantum Materials Database, and NOMAD have already made millions of material property records freely available to researchers worldwide. These repositories serve as the training grounds for AI models that can predict material properties, suggest synthesis pathways, and screen thousands of candidates in the time it once took to test one. The review argues that expanding and standardizing these open resources is not just good science, it is an ethical imperative.
Autonomous Laboratories: Science Without Sleep
One of the most remarkable sections of the review describes something called an autonomous laboratory. This is a real, working system where robots, AI, and automated measurement tools work together in a continuous loop, designing experiments, running them, analyzing the results, and then designing the next round of experiments without any human intervention.
The AI sets goals and proposes experiments; the robots execute them; sensors capture the results in real time; and the AI learns from each outcome, moving closer to the target material with every cycle. These systems can operate around the clock, every day of the year, running experiments and accumulating knowledge at a speed no human team could match.
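The closed loop itself is simple to sketch. The toy below, a minimal sketch with an invented objective function standing in for real hardware, shows the propose → run → learn cycle; real autonomous laboratories use far more sophisticated experiment planners (such as Bayesian optimization) and actual robots, so treat every name and number here as a placeholder.

```python
import random

# Hypothetical stand-in for a robotic experiment: maps a deposition
# temperature (°C) to a noisy measured film quality score. In a real
# autonomous lab this call would drive hardware; here it is a toy function
# whose best value sits near 450 °C.
def run_experiment(temperature):
    return -abs(temperature - 450) + random.gauss(0, 1)

def autonomous_loop(n_rounds=50, low=200, high=700):
    """Propose → run → learn loop: mostly sample near the best result
    so far (exploit), occasionally sample at random (explore)."""
    best_t, best_score = None, float("-inf")
    for _ in range(n_rounds):
        if best_t is None or random.random() < 0.3:
            candidate = random.uniform(low, high)      # explore
        else:
            candidate = best_t + random.gauss(0, 20)   # exploit near best
        candidate = min(max(candidate, low), high)
        score = run_experiment(candidate)
        if score > best_score:
            best_t, best_score = candidate, score
    return best_t, best_score

random.seed(0)
t, s = autonomous_loop()
print(f"best temperature ≈ {t:.0f} °C")
```

Each pass through the loop plays the role of one robot-run experiment; the planner's only "intelligence" here is hill climbing around the best result so far.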
The review describes a system called Ada, which uses this approach to optimize thin-film solar cells. Instead of spending months or years manually tweaking deposition conditions, Ada explores a vast landscape of possibilities on its own, finding high-performing configurations in a fraction of the time. The authors suggest that autonomous laboratories could fundamentally change the pace of clean energy innovation, one of the most urgent challenges of our time.
The Data Problem and How AI Is Solving It
One of the persistent challenges in materials science has been data. Experiments are expensive and time-consuming, which means datasets are often small. Different labs use different methods, which means data from one group is hard to compare with data from another. And a huge amount of valuable knowledge is buried in scientific papers, locked in text rather than organized in any useful format.
The review describes a wave of AI tools that are attacking all three of these problems simultaneously.
For small datasets, techniques like generative adversarial networks and variational autoencoders can create realistic synthetic data that supplements real experimental results, effectively giving AI models more to learn from without requiring additional experiments.
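GANs and variational autoencoders are the techniques the review names; a much simpler generative model makes the underlying idea concrete. The sketch below fits a multivariate Gaussian to a small set of invented measurements and samples synthetic records from it; the dataset, property names, and numbers are all hypothetical, and a real GAN or VAE would capture far richer structure.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical small dataset: 12 real measurements of
# (density g/cm³, tensile strength MPa) for some alloy family.
real = np.array([
    [2.7, 310], [2.8, 325], [2.6, 298], [2.9, 340],
    [2.7, 315], [2.8, 330], [2.6, 305], [2.9, 345],
    [2.7, 312], [2.8, 328], [2.7, 318], [2.8, 322],
])

# Fit the simplest possible generative model: the mean and covariance
# of the real data.
mu = real.mean(axis=0)
cov = np.cov(real, rowvar=False)

# Sample 100 synthetic records from the fitted distribution to
# supplement the 12 real ones.
synthetic = rng.multivariate_normal(mu, cov, size=100)
print(synthetic.shape)  # → (100, 2)
```

The synthetic points follow the same statistical shape as the real ones, which is exactly the property that lets augmented datasets give a model "more to learn from" without new experiments.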
For inconsistent data, powerful preprocessing pipelines using tools like Python's Pandas library and specialized chemistry toolkits can standardize and clean data from diverse sources, making it possible to combine information that would otherwise be incompatible.
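A tiny Pandas sketch shows what "standardize and clean" means in practice. The lab names, column headings, and unit mismatch below are all invented for illustration; real pipelines handle many more inconsistencies than a rename and a unit conversion.

```python
import pandas as pd

# Hypothetical records from two labs reporting the same property
# under different column names and units (MPa vs GPa).
lab_a = pd.DataFrame({"material": ["PLA", "PHA"], "strength_MPa": [60.0, 35.0]})
lab_b = pd.DataFrame({"Material": ["PLA", "PBS"], "strength_GPa": [0.058, 0.040]})

# Standardize: unify column names, convert GPa → MPa, then combine.
lab_b = lab_b.rename(columns={"Material": "material"})
lab_b["strength_MPa"] = lab_b.pop("strength_GPa") * 1000
combined = pd.concat([lab_a, lab_b], ignore_index=True)
print(combined)
```

Once both sources share a schema and a unit system, they can feed one training set, which is the whole point of the preprocessing stage.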
For knowledge trapped in text, large language models are being trained to read scientific literature and extract structured information automatically. One system described in the review processed over 130,000 scientific abstracts in 60 hours, pulling out more than 300,000 material property records. Another extracted over one million records from 681,000 full-text journal articles. What once required armies of graduate students can now be accomplished by an algorithm in days.
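The review's extraction systems are built on large language models; the deliberately crude regex below is only a stand-in to show the shape of the task, turning unstructured sentences into structured records. The abstract snippets are invented.

```python
import re

# Hypothetical abstract snippets (not from the review).
abstracts = [
    "The band gap of ZnO was measured as 3.3 eV.",
    "Crystalline silicon shows a band gap of 1.1 eV.",
    "Annealing did not change the thermal conductivity.",
]

# Crude pattern: a "band gap" mention followed by a value in eV.
band_gap = re.compile(r"band gap[^.]*?([\d.]+)\s*eV")

records = []
for text in abstracts:
    m = band_gap.search(text)
    if m:
        records.append({
            "property": "band gap",
            "value_eV": float(m.group(1)),
            "source": text,
        })
print(len(records))  # → 2
```

An LLM-based pipeline does the same text-to-record conversion, but generalizes across phrasings, properties, and units that no hand-written pattern could anticipate.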
Explaining the Black Box: Why Trust Matters
A recurring theme in the review is the importance of making AI systems explainable. This matters for reasons that go beyond scientific curiosity.
When an AI model predicts that a certain material will have exceptional properties, a researcher needs to understand why. Is the prediction based on a genuine pattern in the data, or is it a statistical artifact? If the model is wrong, can we tell where it went wrong? If it is right, can we learn something from its reasoning that leads to even better materials?
The review discusses a field called explainable artificial intelligence, or XAI, which provides tools for answering these questions. Techniques like SHAP analysis (short for SHapley Additive exPlanations, a method borrowed from game theory) and attention mapping allow researchers to look inside complex models and understand which features are driving predictions. This is essential for building scientific trust, for debugging models when they fail, and for translating AI insights into actionable guidance for experimentalists.
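The game-theory idea behind SHAP can be computed exactly for a tiny model. The sketch below attributes a prediction of a hypothetical three-feature linear model to its inputs by averaging each feature's marginal contribution over all subsets of the others, which is the definition of a Shapley value; the SHAP library approximates this efficiently for large models. Weights, baselines, and the sample are invented.

```python
from itertools import combinations
from math import factorial

# Hypothetical linear property model: predicted strength from three
# descriptors (illustrative numbers, not from the review).
weights = [2.0, -1.0, 0.5]
baseline = [1.0, 2.0, 4.0]   # dataset means used to "remove" a feature
x = [3.0, 1.0, 6.0]          # the sample whose prediction we explain

def predict(features):
    return sum(w * f for w, f in zip(weights, features))

def shapley(i, n=3):
    """Exact Shapley value of feature i: its average marginal
    contribution over all subsets of the other features."""
    others = [j for j in range(n) if j != i]
    total = 0.0
    for size in range(n):
        for subset in combinations(others, size):
            with_i = [x[j] if j in subset or j == i else baseline[j]
                      for j in range(n)]
            without = [x[j] if j in subset else baseline[j]
                       for j in range(n)]
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            total += weight * (predict(with_i) - predict(without))
    return total

phi = [shapley(i) for i in range(3)]
print(phi)  # ≈ [4.0, 1.0, 1.0]: for a linear model, w_i * (x_i - baseline_i)
```

The attribution answers exactly the question posed above: how much of this prediction is driven by each input, and in which direction.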
The authors are clear that explainability is not optional in a serious research environment. It is a requirement, especially in contexts where AI is guiding decisions about which experiments to run, which materials to prioritize, or which processes to optimize.
Blockchain: An Unexpected Player in Science
One section of the review may surprise readers who associate blockchain primarily with cryptocurrency. The researchers make a compelling case for blockchain technology as a tool for scientific integrity.
The core problem is trust. When a material property is reported in a database, how do we know where that number came from? Was the experiment conducted properly? Has the data been modified? In a world where AI models are being trained on millions of data points from thousands of sources, the origin of that data matters enormously.
Blockchain offers a way to create tamper-evident records of where data came from, who collected it, under what conditions, and how it has been used; once written, these records cannot be silently altered or faked without detection. Smart contracts can enforce data sharing agreements automatically. Supply chains for critical materials can be traced transparently from source to end product. The review describes systems where blockchain is being used to coordinate global research collaborations, ensure reproducibility, and protect intellectual property while still enabling open sharing.
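The tamper-evidence property comes from a simple primitive: each record's hash incorporates the hash of the record before it. This minimal hash-chain sketch, with invented provenance entries, shows why editing any past record breaks verification of everything after it; production blockchains add consensus, signatures, and distribution on top.

```python
import hashlib
import json

def record_hash(record, prev_hash):
    """Hash a provenance record together with the previous entry's hash,
    chaining entries so any later tampering is detectable."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode()).hexdigest()

# Hypothetical chain of provenance entries for one measurement.
chain = []
prev = "0" * 64  # genesis value
for record in [
    {"step": "synthesis", "lab": "A", "batch": 17},
    {"step": "tensile test", "lab": "B", "value_MPa": 312.5},
]:
    prev = record_hash(record, prev)
    chain.append({"record": record, "hash": prev})

def verify(chain):
    """Recompute every hash from the genesis value; any edit fails."""
    prev = "0" * 64
    for entry in chain:
        if record_hash(entry["record"], prev) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True

print(verify(chain))  # → True

# Tamper with a recorded value: verification now fails.
chain[1]["record"]["value_MPa"] = 999.0
print(verify(chain))  # → False
```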
The Carbon Cost of Computation
The review does not shy away from an uncomfortable paradox. AI-driven materials discovery could help us solve climate change by finding better solar cells, batteries, and carbon capture materials. But training and running AI models consumes enormous amounts of energy, which itself contributes to the problem.
The numbers are striking. Training the BLOOM language model emitted between 24.7 and 50.5 metric tons of carbon dioxide, depending on the accounting method. GPT-3 had an estimated carbon footprint equivalent to the lifetime emissions of an average car. As AI models grow larger and more computationally demanding, these numbers will only increase.
The review calls for researchers to take this seriously. It describes tools like CodeCarbon, Carbontracker, and eco2AI, which allow scientists to monitor the energy consumption and carbon emissions of their computational work in real time. It argues for using more energy efficient hardware, running computations on servers powered by renewable energy, and designing algorithms that achieve good results with less computation rather than simply throwing more computing power at problems.
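The arithmetic those tools automate is worth seeing once. The back-of-envelope estimator below is not CodeCarbon, Carbontracker, or eco2AI; it simply multiplies energy used by a grid carbon intensity, and every number in it (runtime, power draw, intensity) is a placeholder, where the real tools measure hardware power and look up regional, time-varying intensity for you.

```python
def estimate_co2_kg(runtime_hours, avg_power_watts, grid_kg_co2_per_kwh=0.4):
    """Rough CO2 estimate: energy consumed (kWh) × grid carbon intensity.
    0.4 kg CO2/kWh is a placeholder; real intensity varies by region and
    hour, which is what tools like CodeCarbon account for."""
    energy_kwh = runtime_hours * avg_power_watts / 1000.0
    return energy_kwh * grid_kg_co2_per_kwh

# Hypothetical training job: 72 h on a GPU node drawing ~300 W on average.
print(round(estimate_co2_kg(72, 300), 2))  # → 8.64 (kg CO2)
```

Even this crude estimate makes the trade-offs in the paragraph above tangible: halving runtime, halving power draw, or moving to a cleaner grid each cuts the footprint proportionally.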
This is part of a broader argument the authors make throughout the paper: that responsible innovation means thinking about environmental impact at every stage of the discovery process, not just when evaluating the materials themselves.
Quantum Computing on the Horizon
Looking further ahead, the review discusses quantum computing as a potentially transformational tool for materials science. Classical computers struggle to simulate the quantum mechanical behavior of complex materials at the atomic level. Quantum computers are inherently suited to this task.
The review describes how researchers are already using quantum algorithms to calculate the ground state energies of molecules, simulate corrosion resistant alloys at high temperatures, and optimize the design of metal organic frameworks for carbon capture. Google's Willow quantum chip is cited as an example of recent hardware progress that is bringing these capabilities closer to practical reality.
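The quantity these quantum algorithms target, the ground state energy, can be computed exactly for toy systems on a classical machine, which is how early quantum results are checked. The sketch below diagonalizes an invented two-qubit Hamiltonian with NumPy; the exponential cost of this approach as qubit counts grow is precisely the regime where algorithms like the variational quantum eigensolver are expected to help.

```python
import numpy as np

# Pauli matrices.
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Toy two-qubit Hamiltonian H = Z⊗Z + X⊗X (illustrative, not from the review).
H = np.kron(Z, Z) + np.kron(X, X)

# Ground state energy = smallest eigenvalue. Exact diagonalization is easy
# for a handful of qubits but the matrix is 2^n × 2^n, so it becomes
# intractable classically as n grows.
ground_energy = np.linalg.eigvalsh(H).min()
print(ground_energy)  # → -2.0
```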
Quantum machine learning, which combines quantum computing with machine learning techniques, is also described. Early results suggest that quantum approaches can outperform classical methods for certain high dimensional problems, particularly in materials classification and property prediction.
The authors are cautious: quantum hardware is still maturing, and significant challenges around noise, decoherence, and error correction remain. But they position quantum computing as a coming layer of the open source AI infrastructure for materials science, one that will likely play an important role in tackling the most computationally demanding challenges.
A Case Study in Packaging Materials
To ground all of this in concrete terms, the review includes a detailed case study showing how the proposed framework would work in practice for developing sustainable fiber-based packaging materials. Think of it as designing a biodegradable alternative to plastic wrap or foam, but engineered to be as effective as the synthetic materials it replaces.
The case study walks through every stage: aggregating data from open databases and scientific literature using AI extraction tools, standardizing that data, using generative AI to propose candidate formulations, using neural networks to predict how those candidates would perform, validating the best candidates with physics-based simulations, and then deploying autonomous laboratory systems to test and refine the real materials.
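Those stages chain together naturally as a pipeline. The sketch below is purely structural; every function body is a placeholder for the serious tools the review names at that stage, and the candidate formulations and scores are invented.

```python
# Hypothetical end-to-end pipeline for the packaging case study; each stage
# is a trivial placeholder for the real tool used at that step.
def aggregate():            # AI extraction from databases and literature
    return [{"fiber": "cellulose", "coating": "PLA"}]

def standardize(records):   # unit and schema cleanup
    return records

def generate(records):      # generative AI proposes a new formulation
    return records + [{"fiber": "cellulose", "coating": "PHA"}]

def predict(candidates):    # neural network scores each candidate
    return [(c, 0.7 + 0.1 * i) for i, c in enumerate(candidates)]

def validate(scored):       # physics-based simulation filters the best
    return [c for c, score in scored if score > 0.75]

survivors = validate(predict(generate(standardize(aggregate()))))
print(survivors)  # the survivors go on to autonomous lab testing
```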
Throughout this process, blockchain records every step for reproducibility and traceability, lifecycle assessment tools evaluate the environmental impact of each candidate material, and edge computing sensors monitor manufacturing in real time. The authors describe this not as a futuristic vision but as a realistic near-term blueprint, using tools that either already exist or are in active development.
The Ethical Dimension
Running through the entire review is a concern that is easy to overlook when the conversation focuses on algorithms and datasets: who benefits from all of this, and who is left behind?
The authors argue that the democratizing potential of open source AI infrastructure is real, but not automatic. If the tools, data, and computational resources required to do cutting edge materials discovery remain concentrated in a small number of wealthy institutions and corporations, the promises of faster innovation and global sustainability will not be kept. Ensuring equitable access, building diverse and representative datasets, and developing AI systems that are transparent and accountable are not afterthoughts. They are design requirements.
The review explicitly endorses a set of principles for ethical and responsible AI development in this space, including explainability, privacy preservation, lifecycle awareness, and alignment with international sustainability goals. These are not soft suggestions. The authors argue they are prerequisites for science that is genuinely trustworthy and genuinely useful.
What Comes Next
The review closes with a vision for the future that is both ambitious and grounded. It calls for advances in human centered explainable AI, expanded federated learning systems that allow institutions to collaborate without sharing sensitive data, deeper integration between AI and physical laboratory automation, stronger data standards, and the gradual introduction of quantum computing capabilities.
Most fundamentally, it argues for a shift in how scientists think about infrastructure. Just as roads and power grids are shared public goods that enable private activity, open source AI infrastructure for materials discovery should be thought of as a shared scientific commons, built collaboratively, maintained responsibly, and accessible to anyone who wants to contribute to solving the world's hardest problems.
The materials that will power the next generation of solar cells, batteries, medicines, and sustainable products are out there waiting to be discovered. The infrastructure to find them faster, more sustainably, and more equitably than ever before is being built right now.
Publication Details: Year of Online Publication: 2026; Journal: Communications Materials; Publisher: Springer Nature; DOI: https://doi.org/10.1038/s43246-026-01105-0
Credit & Disclaimer: This article is based on the peer-reviewed research paper. All scientific facts, findings, data, and conclusions presented in this article are drawn directly from that original research. Readers are strongly encouraged to consult the full research article for complete details, methodology, raw data, and scientific context.