The NOMAD Laboratory

Novel Materials Discovery at the FHI of the Max-Planck-Gesellschaft
and IRIS-Adlershof of the Humboldt-Universität zu Berlin

Artificial Intelligence, Coupled-Cluster Theory for Solids, and GW Method

Matthias Scheffler

Open Positions

Development of AI Methods with Reliable Prediction Uncertainties and Domain of Applicability

Artificial intelligence (AI) is significantly impacting our daily lives, as it can identify correlations and complex patterns in data. Materials science is no exception [1, 2]. The risk that AI predictions fail when addressing situations that were not part of the training data, e.g., rare events, is, however, significant, and typically there is no reliable warning. Improving this unsatisfactory situation is the topic of this research project.

In materials science, AI is being used successfully to create maps of materials functions (see Ref. [1]), and it is increasingly employed, e.g., for constructing machine-learning interatomic potentials (MLIPs) from high-level electronic-structure-theory data. These MLIPs are then used for extensive molecular-dynamics and statistical-mechanics simulations.

An AI description should be understood as an interpolation of the data set that was used for training. Thus, the key question in AI concerns the reliability of an AI description when it is applied to situations that are different from (outside) the training data. Unfortunately, the terms “situations that are different” and “outside” are typically ill-defined.

Why are uncertainty estimates of AI predictions critical?

Thermodynamic or kinetic simulations are bound to explore uncommon regions of configurational space as well, and a rare event that substantially influences the dynamics may be missed by the MLIP. For instance, the rare but spontaneous formation of defects is known to be a key trigger for phase transitions. If the critical defect or the new phase is not known beforehand, it will not be part of the training data, and it is unclear whether the AI model would be able to find it at all.

Several strategies have been developed for estimating the uncertainty of AI predictions, spanning from rigorous but computationally expensive Bayesian estimates to pragmatic analyses of ensembles of models. However, such estimates become increasingly overconfident the further the prediction data lie outside the sampling distribution of the training data (see, for example, Refs. [3, 4] and references therein). Besides the urgent need to reliably quantify model uncertainties, these estimates are also a vital part of so-called active-learning algorithms, where an AI model requests new (training) data either in regions where a property of interest is optimized (exploitation regime) or in regions where the model uncertainty is large (exploration regime). The model is then retrained and improved with the newly acquired information. Active learning is often necessary in materials science, where only little initial information is available about a material or materials class.
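To make the ensemble and active-learning terminology concrete, the following minimal Python sketch shows how an ensemble of models can provide a pragmatic uncertainty estimate and how a single acquisition step (exploration vs. exploitation) might look. The data, model choice, and parameter values are illustrative placeholders and are not taken from the project or from Refs. [3, 4].

# Minimal sketch (illustrative only): ensemble-of-models uncertainty and one
# active-learning acquisition step. All names and data are placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)

# Toy descriptor/property data standing in for electronic-structure results.
X_train = rng.uniform(-1.0, 1.0, size=(200, 4))
y_train = np.sin(3.0 * X_train[:, 0]) + 0.1 * rng.normal(size=200)
X_pool = rng.uniform(-2.0, 2.0, size=(1000, 4))   # unlabeled candidate configurations

# Ensemble of models, each trained on a bootstrap resample of the training data.
ensemble = []
for seed in range(10):
    idx = rng.integers(0, len(X_train), size=len(X_train))
    model = GradientBoostingRegressor(random_state=seed)
    model.fit(X_train[idx], y_train[idx])
    ensemble.append(model)

# Ensemble mean as prediction, ensemble spread as a pragmatic uncertainty estimate.
preds = np.stack([m.predict(X_pool) for m in ensemble])   # shape (10, 1000)
mean, std = preds.mean(axis=0), preds.std(axis=0)

# Active-learning acquisition:
#   exploration  -> label the candidate with the largest model disagreement
#   exploitation -> label the candidate with the best predicted property (here: minimum)
i_explore = int(np.argmax(std))
i_exploit = int(np.argmin(mean))
print("exploration pick:", i_explore, "exploitation pick:", i_exploit)

It is precisely such ensemble spreads whose overconfidence far away from the training distribution is analyzed in Refs. [3, 4].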

Where to start with the project?

A promising route is the use of subgroup discovery for the identification of so-called domains of applicability, i.e., regions in descriptor space where an AI model yields small errors [5]. Outside the domain of applicability, the AI description will likely fail. Although it has been shown that domains of applicability can be found, the method has not been developed further to systematically identify outliers, nor has it been exploited to improve the underlying AI model, e.g., in an active-learning fashion.
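As a rough illustration of the domain-of-applicability idea, the sketch below greedily searches for an axis-aligned box in descriptor space within which the model error is markedly below its global average. This is only a toy stand-in for the subgroup-discovery approach of Ref. [5]; the function name, the quantile-based search, and the synthetic data are assumptions made for illustration.

# Minimal sketch (illustrative only): greedy search for a low-error box in
# descriptor space, a simplified stand-in for subgroup discovery (Ref. [5]).
import numpy as np

def find_low_error_box(X, errors, n_quantiles=5, min_frac=0.2):
    """Greedily tighten one descriptor interval at a time, keeping the cut that
    lowers the mean absolute error most while retaining at least `min_frac`
    of the samples. X and errors are NumPy arrays."""
    mask = np.ones(len(X), dtype=bool)
    box = [(-np.inf, np.inf)] * X.shape[1]
    improved = True
    while improved:
        improved = False
        best_err, best_cut = np.abs(errors[mask]).mean(), None
        for j in range(X.shape[1]):
            qs = np.quantile(X[mask, j], np.linspace(0.0, 1.0, n_quantiles + 1))
            # Try raising the lower bound or lowering the upper bound of dimension j.
            for lo, hi in ((qs[1], box[j][1]), (box[j][0], qs[-2])):
                cand = mask & (X[:, j] >= lo) & (X[:, j] <= hi)
                if cand.sum() >= min_frac * len(X):
                    err = np.abs(errors[cand]).mean()
                    if err < best_err:
                        best_err, best_cut = err, (j, lo, hi, cand)
        if best_cut is not None:
            j, lo, hi, cand = best_cut
            box[j], mask, improved = (lo, hi), cand, True
    return box, mask

# Illustrative usage with synthetic descriptors and per-sample model errors:
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
errors = 0.05 + 0.5 * (X[:, 0] > 0.3)        # the toy model "fails" for x0 > 0.3
box, inside = find_low_error_box(X, errors)
print("domain of applicability:", box)
print("fraction of data inside:", inside.mean())

Samples falling outside the identified box would be flagged as candidates for poor predictions; in an active-learning setting they could be selected for new reference calculations.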

In this project, we will analyze the limitations of current approaches for uncertainty estimates and outlier detection and advance the domain-of-applicability concept. Demonstrating these developments on urgent, environmentally relevant applications will be a true breakthrough for the field of materials science and even beyond.

For further information, please feel free to contact scheffler@fhi-berlin.mpg.de.

Here you can find the general job offer.

 

References

[1] C. Draxl and M. Scheffler, Big-Data-Driven Materials Science and Its FAIR Data Infrastructure. Plenary chapter in Handbook of Materials Modeling, edited by W. Andreoni and S. Yip (Springer, 2020), p. 49.
[2] R. Ramprasad, R. Batra, G. Pilania, A. Mannodi-Kanakkithodi, and C. Kim, Machine Learning in Materials Informatics: Recent Applications and Prospects. npj Comput. Mater. 3, 54 (2017).
[3] L. Kahle and F. Zipoli, Quality of Uncertainty Estimates from Neural Network Potential Ensembles. Phys. Rev. E 105, 015311 (2022).
[4] S. Lu, L. M. Ghiringhelli, C. Carbogno, J. Wang, and M. Scheffler, On the Uncertainty Estimates of Equivariant-Neural-Network-Ensembles Interatomic Potentials. arXiv preprint.
[5] C. Sutton, M. Boley, L. M. Ghiringhelli, M. Rupp, J. Vreeken, and M. Scheffler, Identifying Domains of Applicability of Machine Learning Models for Materials Science. Nat. Commun. 11, 4428 (2020).