Introduction

In materials discovery problems, it is desirable to select and test the candidates that hold the most promise for satisfying a particular functional target while maintaining as broad a diversity in the search space as possible. A data-driven approach is often used to meet these needs, whereby materials are first rapidly screened via high-throughput experimentation or computational modeling to identify potential candidates1,2,3. However, the vastness of the accessible chemical search spaces can present serious challenges for current experimental and even computational methods due to the significant evaluation time and resource demands. Machine learning provides promising solutions to this problem, either by providing a cheaper surrogate for the computational calculations or by producing new candidates with a specified target property4,5,6,7,8,9. The latter approach may involve generative models, which allow sampling within a continuous chemical or materials latent space that can map to unique and undiscovered materials10,11,12,13,14. Although more challenging to implement than discriminative models, generative modeling is highly appealing for its potential to realize the “inverse design” of materials and to efficiently “close the loop” between modeling and experiments4,15,16,17,18.

In a general inverse problem, given a forward process y = f(x), the goal is to find a suitable inverse model x = f−1(y) that maps the reverse process. In the context of materials discovery, the forward mapping from materials design parameters to target property can take the form of experimental measurements or computational calculations, such as density functional theory (DFT). The reverse process from target property to design parameters cannot be obtained directly via the same methods, but it can be inferred with machine learning. One such approach uses variational autoencoders19 (VAEs), where the encoder and decoder models learn to approximate inverse solutions upon convergence. Here, we instead propose to use invertible neural networks20 (INNs) and conditional INNs21 (cINNs), where a single model is trained on the forward process and the exact inverse mapping is then obtained for free. The intrinsic invertibility of INNs offers potential advantages in stability and performance20,21,22,23 over VAEs and the popular generative adversarial networks24 (GANs), which suffer from training difficulties such as mode collapse.

In this work, we leverage the INN architecture to solve materials design problems and develop a framework, outlined in Fig. 1, that uses these models to generate high-quality materials candidates with targeted properties. Starting with a given materials design space, we generate training data by sampling within this space using DFT. We use the data to train the invertible neural network to obtain forward and reverse mappings. The trained network is then evaluated in the reverse direction to generate samples given a target property. We note the generated samples may not directly yield sufficiently high-fidelity candidates within chemical accuracy of the target. We therefore include additional (1) down-selection based on fitness criteria and (2) optimization that localizes the generated samples to the exact solutions, using the INN-generated samples as initializations. In this work, the fitness criteria for down-selection limit samples to those which are close to the desired property and whose parameters lie within the training data distribution, but these criteria can be expanded to include additional requirements such as experimental feasibility. Selected samples are then optimized by gradient descent with automatic differentiation25. For down-selection and optimization, the forward mode of the INN conveniently serves as the property prediction surrogate and provides gradients via backpropagation. The final optimized samples may be further validated by DFT or, if the performance is sufficiently high, this step may be omitted and the generated samples analyzed directly. Here we will show that our framework, MatDesINNe (Materials Design with Invertible Neural Networks), can successfully achieve inverse materials design with high accuracy.

Fig. 1: Inverse materials design workflow.

Starting with a specified design space, training data is obtained with DFT which is then fed to train the MatDesINNe framework using invertible neural networks. Samples are generated, down-selected and localized to ensure high quality candidates. An optional validation step with DFT ensures the candidates have the intended target properties.

We apply MatDesINNe to the band gap engineering problem for monolayer MoS2, the archetypical 2D dichalcogenide, where the design parameters are applied strain and external electric field and the target property is the electronic band gap (Eg). Strain can be applied to 2D materials to monotonically tune their band gaps, which can be experimentally accomplished via various methods including bending or stretching the substrate, thermal expansion, or the application of local stress from an atomic force microscope tip26,27,28,29,30. Similarly, an external electric field can be used to further modulate the band gap due to band-bending, and provides an additional degree of freedom which is not constrained by the elastic strength of the material31,32. The ability to tune the band gap freely allows the material to be designed for a target application, including in photocatalysis, electronics, sensors, and neuromorphic devices26,33,34,35,36. Here, we represent applied strain as deviations from the equilibrium lattice parameters a, b, c, α, β, γ, with an electric field E normal to the monolayer as the seventh dimension of the design space. Our focus is to demonstrate our framework on a model system that is sufficiently complex to compare its merits over existing approaches, yet simple enough to yield useful physical insights for a technologically relevant problem.

Results

Data generation

To generate the training data, we performed approximately 11,000 DFT calculations by sampling over the entire range of the design space (Fig. 2a). In this work, we use a sampling range of 20% above and below the equilibrium values for the six lattice parameters. While this may exceed realistically achievable ranges in some cases, we focus first on achieving an accurate mapping of the forward and reverse process over the full design space; experimental limitations can be applied later at the down-selection stage. In addition to strain, an external electric field is applied in a range from −1 to 1 V/Å. The DFT-calculated band gap at equilibrium is 1.97 eV, and we find most samples lie below this value under the applied strain and electric field (Fig. 2b). Most notably, the vast majority of cases result in a metal-insulator transition (MIT) to a band gap of zero, and this data imbalance can present significant challenges for generative models, as we will see shortly. We also illustrate the full design space in two dimensions using the Uniform Manifold Approximation and Projection (UMAP) method37 (Fig. 2c), which shows no clear decision boundary between low-Eg and high-Eg states, especially in the 0–1 eV region where substantial intermixing of states can be observed. Therefore, the generative model used must also have a high degree of fidelity to avoid large errors in the target property of generated candidates.

Fig. 2: Materials design space and data distribution.

a MoS2 model and design parameter space specification, which can be categorized as tensile/compressive (green arrow), shear (orange arrow) and electric field (blue arrow). b Distribution of DFT-computed band gaps within the sampled design space for monolayer MoS2. c UMAP embedding of sampled design space with DFT-computed band gaps.

Benchmarking model performance

To determine whether our approach succeeds for this type of challenging inverse problem, we first train our models and validate them by comparing the band gaps of generated samples against the surrogate model. We also compare our models with several other inverse-mapping methods to calibrate performance: (1) mixture density networks (MDN)38 and (2) conditional variational autoencoders (cVAE)39. For our models we implemented the base (3) invertible neural networks (INN)20 and (4) conditional invertible neural networks (cINN)21, as well as the additional steps included in the MatDesINNe framework as (5) MatDesINNe-INN and (6) MatDesINNe-cINN. The results are compiled in Table 1, where 10,000 samples are obtained with each model and validated against the surrogate to reveal their relative performance. We find all models perform adequately well for the Eg = 0 case, an unsurprising result, as statistically most samples have zero band gaps. However, for non-zero cases such as Eg = 0.5 and Eg = 1.0, the performance of the baseline models drops dramatically, far too low for use in any materials design situation. With invertible neural networks, the cINN maintains an appreciable performance, with an error of ~0.2 eV for both non-zero gap cases, though this remains higher than ideal. Finally, with the MatDesINNe framework, the error is further reduced to near-zero (with respect to the surrogate model) for all target band gaps using MatDesINNe-cINN.

Table 1 Surrogate-validated performance and speed of tested models for 10000 samples given three targets: Eg = 0, 0.5 and 1 eV.

There are several potential reasons for the exceptional performance of MatDesINNe-cINN over the other models (MDN, cVAE, base INN and cINN). MDN is a classic model which relies on a Gaussian mixture model, which may not fit complex and strongly non-Gaussian problems such as the present case. cVAE is trained on a forward process using the L2 loss to achieve a good reconstruction of the original input x, while the backward process solves for x, which is decoded from random samples drawn from the latent space z conditioned on y. Here, the inference performance of the decoder is limited by its expressivity, leading to inverse solutions with poor accuracy. This brings us to the INN and cINN models. The cINN transforms x directly to a latent representation z given the observation y, which is done by providing y as an additional input to each affine coupling layer during both the forward and backward network passes, and which yields better inference performance. These base models perform better than MDN and cVAE, though they still do not provide solutions with sufficiently high accuracy. Consequently, the localization step in our framework plays an important role in pushing these posterior samples (from the INN and cINN) to the optimal solutions via gradient descent. Prior to localization, we also filter out the out-of-distribution samples, which cannot localize properly. As the cINN provides better posterior samples than the INN, MatDesINNe-cINN is also able to localize much better than MatDesINNe-INN, as seen in Table 1.

In terms of computational cost, all machine learning models required less than a minute to generate 10,000 samples on a single standard CPU. We note this does not include the data generation or training time. MatDesINNe requires slightly more time due to the additional localization step but provides substantially improved performance. The generation speeds of these models can be considered essentially on-the-fly and can be further increased by using GPUs. Regardless of the method, the computation times are negligible compared to the DFT calculations: generating 10,000 samples with a band gap of 1 ± 0.1 eV directly with DFT would require 1.03 × 10⁷ seconds, over five orders of magnitude longer than MatDesINNe, due to both the intrinsic DFT evaluation time and the low statistical probability of finding a target band gap. The speedup provided by inverse learning can be considered a lower bound for general problems, as more complex properties and higher levels of theory will necessitate much longer DFT evaluation times.

We then validate the best-performing model, MatDesINNe-cINN, with DFT calculations to determine its absolute performance relative to DFT. The DFT band gaps for 200 generated samples at each of the three targets are shown in Fig. 3, demonstrating excellent performance with an MAE of 0.1 eV or lower. This error is only slightly larger than typical experimental error bars for band gap measurements40, a common point of comparison when discussing chemical accuracy. While our model is trained on DFT band gaps, which carry their own errors, we expect a model expressive enough to fit the DFT values well to perform similarly well for another, more accurate quantum chemistry method. A low incidence of outliers is also observed: only 6 out of 200 samples, or 3%, have non-zero gaps for Eg = 0, and fewer than 2% of samples have zero band gaps for Eg = 0.5 and Eg = 1. The absolute performance here is limited by the accuracy of the surrogate model used for gradient-based optimization, hence the discrepancy between the DFT-validated performance and the earlier surrogate-validated performance in Table 1. By improving the accuracy of the surrogate model, and thereby allowing the optimized samples to move closer to the true DFT values, we anticipate the DFT-validated performance can be further improved.

Fig. 3: Validation by DFT.

DFT-validated performance of generated candidates for three targets: a Eg = 0, b 0.5 and c 1 eV (shown by the position of the red line). Example structures are shown for each target.

In addition to the high fidelity of the generated candidates, we find they adequately cover the distribution of the original training data while maintaining a high degree of diversity in the design parameter space. To demonstrate this, distributions of the training data and generated data are compared side-by-side for Eg = 0 and 1 in Fig. 4, plotted with respect to average in-plane (a, b, γ) and out-of-plane (c, α, β) strain for ease of visualization. The distributions for both Eg = 0 and Eg = 1 match the training data quite well and are not localized to a specific region of the design space. This illustrates another strength of this approach compared to methods such as conditional GANs, which often struggle to maintain a high sample diversity21. Furthermore, we investigated the uniqueness of the generated samples by comparing their similarity to the original training data. Samples whose parameters deviate from every existing data point by more than a tolerance of 0.01 (i.e., 1%), on average or at maximum, are considered unique. The results are compiled in Table 2, which shows that, in general, the majority of generated samples are unique according to this criterion.
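As an illustration, this uniqueness criterion amounts to a nearest-neighbor comparison of the 7-dimensional design vectors. The following minimal sketch is not taken from the released code; the function name, array layout and the option of using either the average or the maximum deviation are illustrative assumptions.

```python
import numpy as np

def unique_fraction(generated, training, tol=0.01, mode="mean"):
    """Fraction of generated design vectors (n_gen, 7) that differ from every
    training vector (n_train, 7) by more than `tol` (0.01, i.e. ~1%),
    measured by the mean or the maximum per-parameter deviation."""
    reduce = np.mean if mode == "mean" else np.max
    n_unique = 0
    for x in generated:
        dev = reduce(np.abs(training - x), axis=1)   # deviation to each training point
        if dev.min() > tol:                          # no training point within tolerance
            n_unique += 1
    return n_unique / len(generated)
```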

Fig. 4: Generated data distributions compared to DFT.

Distributions of average in-plane strain vs. out-of-plane strain for a DFT training data and b generated data cases. For each case, red denotes Eg = 0 eV samples, while blue denotes the Eg = 1 ± 0.1 eV band gap samples.

Table 2 Percentage of unique samples for a given target band gap.

Materials generation and applications

With a sufficiently accurate and expressive generative model, as demonstrated here, the problem of generating specific design parameters with a target band gap becomes trivial. We next expand upon these capabilities by investigating the overall parameter space as mapped out by the model. For example, we can group the strain parameters into two categories, tensile/compressive (a, b, c) and shear (α, β, γ), and plot their overall distributions in Fig. 5. Here, we find the distribution of average shear strain is approximately normal, while the distribution of tensile/compressive strain is significantly more skewed. It is readily apparent that Eg is far more sensitive to shear strain than to tensile/compressive strain: few samples exist for Eg = 1 when the average shear strain exceeds 5% in either direction. When viewed with respect to the absolute strain (Fig. 5c, d), zero-gap and non-zero-gap samples can be better distinguished, with a rough correlation whereby higher absolute strain leads to a greater probability of finding Eg = 0 materials, especially for shear strain. Meanwhile, Eg = 0.5 and 1 share much of the same strain space, with the exception of the tensile strain region (positive values), where Eg = 1 drops off quickly but Eg = 0.5 persists up to approximately 17% tensile strain. These analyses can help establish design principles and provide guidance on regions of the strain space for further sampling.

Fig. 5: Generated data distributions by tensile/compressive and shear strain.

Distributions of generated data by a average tensile/compressive strain, b average shear strain, c average absolute tensile/compressive strain and d average absolute shear strain for three targets: Eg = 0, 0.5 and 1 eV.

For a more targeted application, we use the generated samples to probe the metal-insulator transition of MoS2, a key property for neuromorphic devices. One useful insight gained from generative modeling is the identification of regions in the design parameter space where the MIT occurs under minor perturbations41,42. To illustrate this, we use UMAP to reduce the 7-dimensional space into two dimensions for two situations: the MIT between Eg = 0 and 0.5 (Fig. 6a) and the MIT between Eg = 0 and 1 (Fig. 6b). In terms of global structure, we see two relatively distinct regions in the reduced-dimensional space for both cases, though the Eg = 0 and 1 case shows a much clearer decision boundary. This behavior is consistent with the observation in Fig. 5 that a high Eg correlates with low shear strain and vice versa, with little overlap. Locally, however, there are many regions in the reduced-dimensional space where zero and nonzero band gaps coexist, even for the case of Eg = 0 and 1. As UMAP tends to locally group entities which are similar in the full input dimension, an overlap of points with zero and nonzero gaps suggests a transition between these two states requires only a minor change in the applied strain; two examples are shown in the zoomed-in insets of Fig. 6. In the example for the Eg = 0 and 0.5 case, the MIT occurs from a 7% tensile strain along the y-axis and a 0.82 V/Å change in electric field. For the Eg = 0 and 1 case, a 6% tensile strain along the y-axis and a 7% compressive strain along the z-axis are needed. This can highlight potentially useful regions in strain space where the MIT can be easily induced, allowing for fast and energy-efficient switching. Alternatively, if the goal is to prevent the occurrence of the MIT, it is desirable to select regions of the strain space with no Eg = 0 cases in the vicinity.

Fig. 6: Metal-insulator transition in embedding space.

UMAP embedding of strain parameters for a Eg = 0.5 and b Eg = 1 eV with Eg = 0 eV samples. Red denotes Eg = 0 eV samples, while blue denotes the nonzero band gap samples. Selected regions (in yellow) are zoomed in to highlight specific examples where MIT occurs. Samples were validated with DFT and the true band gaps are shown in parentheses.

Discussion

INNs exhibit the attractive property of intrinsic invertibility and serve as a promising approach for solving inverse design problems. In our materials design framework, MatDesINNe, we train INNs and use them to generate samples given a target property. To ensure the samples have target properties within chemical accuracy, we further incorporate down-selection and localization via optimization of the generated samples. We apply the framework to engineering band gaps in monolayer MoS2 via applied strain and electric field. We find our approach outperforms other baseline methods and succeeds at the conditional generative task with a target-property MAE of around 0.1 eV or lower. This approach is then used to provide high-fidelity and diverse candidates at several orders of magnitude lower computational cost than direct screening with DFT. Using this model, we generate large quantities of candidate materials to explore the design space and obtain useful design principles and insights for further sampling. Finally, we show how our model can help tackle complex problems such as the MIT in MoS2 by densely populating the design space for two target band gap states and identifying potential regions where fast switching of states can be achieved. Moreover, our approach is materials- and application-agnostic and can be applied to general materials design tasks for arbitrary structures and compositions, provided well-defined design parameters and target properties are chosen and available for training. We have made the code available in an open-source repository for this purpose.

In the current work, we focused on inverse learning models and investigated their ability to deal with imbalanced datasets and to accurately map complex underlying physics for materials discovery. We showed this is a challenging problem even for a fairly small parameter space, where current methods such as MDN, VAEs and base INNs fail. An alternative existing approach for this problem is to model only the forward process with machine learning and use high-throughput screening with random or grid-based sampling to find candidates. Using a deep neural network with two hidden layers, we obtained a forward prediction MAE of 0.084 eV on the same dataset, which is on par with the performance of our inverse learning model. However, the forward modeling approach quickly becomes intractable for high-dimensional materials design spaces, a problem which can be tackled by inverse learning models like INNs, as previously shown for image generation21. One potential concern is that the affine coupling layers used in this work may have limited expressivity for capturing complex distributions43. In the future, other invertible architectures with better expressivity, such as neural spline flows44, may be considered.

We have also not incorporated additional atomic or compositional dimensions as design parameters, which have been included in recent studies using variational autoencoders (VAEs) and generative adversarial networks (GANs)45,46,47,48. One challenge here is that generative methods involving atomic structure remain hampered by a lack of a suitable invertible crystal representation which is ideally rotationally and translationally invariant11. We leave the problem of a generative model for atomic structures for the future when such a representation is developed.

We anticipate a materials design framework such as MatDesINNe will provide both theoretical insights, as shown in this work, and a means of integrating computation with experiments. The traditional approach of linear discovery via model, make, and measure remains severely limited by the large number of variables, the complex coupled and competing underlying phenomena, and slow process times. Methods which can navigate the design space in a rational fashion to select experiments, and which can learn from the outcomes of these experiments, can therefore greatly improve efficiency and introduce autonomous guidance. By generating high-fidelity and diverse samples on-the-fly with chemical accuracy, MatDesINNe can satisfy many of these requirements and be incorporated into the autonomous experimentation process.

Methods

Density functional theory

Lattice constants and angles a, b, c, α, β, γ were sampled within a range of 20% deviation from the equilibrium crystal parameters: a = b = 3.16 Å, c = 12.30 Å, α = β = 90°, γ = 120°. An external electric field was also applied in the z-direction from −1 to 1 V/Å. A total of 10,799 structures were generated and their band gaps obtained using density functional theory (DFT). The DFT calculations were performed with the Vienna Ab Initio Simulation Package (VASP)49,50. The Perdew-Burke-Ernzerhof (PBE)51 functional within the generalized-gradient approximation (GGA) was used for electron exchange and correlation energies. The projector-augmented wave method was used to describe the electron-core interaction49,52. A kinetic energy cutoff of 500 eV was used, and all calculations were performed with spin polarization. The Brillouin zone was sampled using a Monkhorst-Pack scheme with a 9 × 9 × 1 grid53. We note that PBE, and GGA functionals more generally, often struggle to calculate accurate band gaps, especially for strongly correlated systems. In the present case, our computed PBE gap of the unperturbed monolayer MoS2 (1.97 eV) was found to be close to the experimental gap (1.90 eV)54 and should serve as an adequate method for demonstration purposes.
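For concreteness, the sampling of the 7-dimensional design space can be sketched as follows. The use of uniform random sampling is an assumption made for illustration (the sampling scheme is not specified above); only the parameter ranges are taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Equilibrium lattice parameters of monolayer MoS2: a, b, c (Å), alpha, beta, gamma (deg)
equilibrium = np.array([3.16, 3.16, 12.30, 90.0, 90.0, 120.0])

n_samples = 10799
strain = rng.uniform(-0.20, 0.20, size=(n_samples, 6))      # up to 20% deviation
lattice = equilibrium * (1.0 + strain)                       # perturbed lattice parameters
efield = rng.uniform(-1.0, 1.0, size=(n_samples, 1))         # E-field along z in V/Å

design_space = np.hstack([lattice, efield])                  # one 7-D design vector per DFT run
```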

Invertible neural network model

Inverse problem specification

Typically, a mathematical or physical model is developed to describe how the measured observations \(y \in {\Bbb R}^M\) arise from the hidden parameters \(x \in {\Bbb R}^D\), yielding a forward mapping \(y = \Omega(x)\). To completely capture all possible inverse solutions given observed measurements, a proper inverse model should enable the estimation of the full posterior distribution \(p(x|y)\) of the hidden parameters x conditioned on an observation y.

Invertible neural networks

A recent study21 showed that invertible neural networks (INNs) can first be trained in the forward pass and then used in the reverse mode to sample from \(p(x|y)\) for any specific y. This is achieved by adding a latent variable \(z \in {\Bbb R}^K\), with \(K = D - M\), which encodes the inherent information loss in the forward process. In other words, the latent variable z, drawn from a Gaussian distribution \(p(z) = N(0,I_K)\), encodes the intrinsic information about x that is not contained in y. To this end, an augmented inverse problem is formulated based on such a bijective mapping, as shown in Fig. 7:

$$x = h\left( {y_a;\phi } \right) = h\left( {y,z;\phi } \right),z\sim p(z)$$
(1)

where h is a deterministic function of y and z, parametrized by an INN with parameters Φ. Forward training optimizes the mapping \(x \to y_a = [y,z]\) and implicitly determines the inverse mapping \(x = h(y,z)\). In the context of INNs, the posterior distribution \(p(x|y)\) is represented by the deterministic function \(x = h(y,z)\), which transforms the known probability distribution \(p(z)\) to the parameter (x) space, conditioned on the measurements y. Thus, given a chosen observation y* and the learned h, we can obtain posterior samples xk that follow the posterior distribution \(p(x|y^\ast)\) via the transformation \(x_k = h(y^\ast, z_k)\), with prior samples drawn from \(z_k \sim p(z)\).
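In code, drawing posterior samples for a chosen target then amounts to pushing Gaussian latents through the reverse pass of the trained network. The sketch below is illustrative: reverse_pass stands in for whatever inverse call the trained model exposes (e.g., a rev=True flag in FrEIA) and is not a specific API.

```python
import torch

def sample_posterior(reverse_pass, y_star, n_samples=10000, z_dim=6):
    """Posterior samples x_k ~ p(x | y*): concatenate the repeated target y*
    with latents z_k ~ N(0, I_K) and evaluate the inverse mapping x = h(y*, z)."""
    y = torch.full((n_samples, 1), float(y_star))   # target property, e.g. Eg = 1.0 eV
    z = torch.randn(n_samples, z_dim)               # z_k drawn from the Gaussian prior
    with torch.no_grad():
        x = reverse_pass(torch.cat([y, z], dim=1))  # x_k = h(y*, z_k)
    return x
```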

Fig. 7: Inverse problem specification.

The original inverse problem is often ill-posed due to the one-to-many mapping. An augmented inverse problem is formulated based on a bijective mapping by introducing an additional latent random variable z.

The invertible architecture allows us to learn the model \(h(y,z;\phi)\) of the inverse process jointly with a model \(f(x;\phi)\) which approximates the true forward process \(\Omega(x)\):

$$[y,z] = f(x;\phi) = [f_y(x;\phi),\, f_z(x;\phi)] = h^{-1}(x;\phi)$$
(2)

where \(f_y(x;\phi) \approx \Omega(x)\), and the models f and h share the same parameters Φ in a single invertible neural network. Therefore, our approximated posterior model \(\hat p(x|y)\) is built into the invertible neural network representation as

$$\hat p(x = h(y,z;\phi)|y)=\frac{p(z)}{|J_x|},\quad J_x={\mathrm{det}}\left( \frac{\partial h(y,z;\phi )}{\partial [y,z]}\right)$$
(3)

where Jx is the Jacobian determinant, which can be efficiently computed using affine coupling blocks55.
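To make the role of the affine coupling blocks concrete, the following sketch shows a generic RealNVP-style coupling layer in plain PyTorch, where the forward pass returns both the transformed variables and log|Jx| and the inverse is available in closed form. This is a simplified stand-in for the FrEIA blocks used in our implementation, not the exact architecture.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """One affine coupling block: split the input, predict scale s and shift t
    for one half from the other half, so the transform is trivially invertible
    and log|det J| is just the sum of the predicted log-scales."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        self.d = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.d, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.d)),
        )

    def forward(self, u):
        u1, u2 = u[:, :self.d], u[:, self.d:]
        s, t = self.net(u1).chunk(2, dim=1)
        s = torch.tanh(s)                      # keep the scales bounded for stability
        v2 = u2 * torch.exp(s) + t             # affine transform of the second half
        log_det = s.sum(dim=1)                 # log|det J| of the block
        return torch.cat([u1, v2], dim=1), log_det

    def inverse(self, v):
        v1, v2 = v[:, :self.d], v[:, self.d:]
        s, t = self.net(v1).chunk(2, dim=1)
        s = torch.tanh(s)
        u2 = (v2 - t) * torch.exp(-s)          # exact inverse, no iterative solve
        return torch.cat([v1, u2], dim=1)
```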

Invertible neural network training

In this work we use an invertible architecture with affine coupling layers20. A forward L2 loss is defined, where yt is the true output:

$$L_y = \left\| {y - y_t} \right\|_2^2$$
(4)

Then a backward loss \(L_z = {\mathrm{MMD}}(z)\) is used to fit the probability distribution of the latent variable p(z) to a standard Gaussian distribution, where MMD refers to the Maximum Mean Discrepancy56. The total loss is then defined with weighting factors λy and λz:

$$L\left( {y,z} \right) = \lambda _yL_y + \lambda _zL_z$$
(5)

It is also possible to train the invertible neural network with a maximum likelihood loss by integrating the forward L2 loss and unsupervised backward loss20:

$$L(y,z)=\frac{1}{2}\left(\frac{1}{\sigma ^2}(y - y_t)^2+z^2\right)-\log\left|\det J_{x \mapsto [y,z]}\right|$$
(6)
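A minimal sketch of the bi-directional objective of Eqs. (4) and (5) is given below. The multi-bandwidth Gaussian kernel in the MMD term is an illustrative assumption; only the form of the forward L2 and backward MMD losses is taken from the text.

```python
import torch

def mmd(z, z_prior, bandwidths=(0.2, 0.5, 1.0)):
    """Maximum Mean Discrepancy between network latents z and samples from the
    standard Gaussian prior, using a sum of Gaussian (RBF) kernels."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)
        return sum(torch.exp(-d2 / (2.0 * h * h)) for h in bandwidths) / len(bandwidths)
    return k(z, z).mean() + k(z_prior, z_prior).mean() - 2.0 * k(z, z_prior).mean()

def inn_loss(y_pred, y_true, z_pred, lam_y=1.0, lam_z=1.0):
    """Total INN loss: forward L2 loss (Eq. 4) plus backward MMD loss,
    weighted by lambda_y and lambda_z (Eq. 5)."""
    L_y = ((y_pred - y_true) ** 2).sum(dim=1).mean()
    L_z = mmd(z_pred, torch.randn_like(z_pred))
    return lam_y * L_y + lam_z * L_z
```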

Conditional invertible neural networks (cINN)

Instead of training an invertible neural network to predict y and x with an additional latent variable z, the conditional invertible neural network21 transforms x directly to a latent representation z conditioned on the observation y. This is achieved by using y as an additional input to each affine coupling layer in both the forward and backward passes. The model is then trained with a maximum likelihood loss:

$$L(z)=\frac{1}{2}z^2-\log\left|\det J_{x \mapsto z}\right|$$
(7)
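The maximum likelihood objective of Eq. (7) is then a one-liner, assuming the conditional network returns the latent z and the log-Jacobian determinant for a batch (as FrEIA-style coupling stacks do):

```python
import torch

def cinn_nll(z, log_jac_det):
    """Negative log-likelihood of Eq. (7): drive the latent representation of x
    (conditioned on y) toward a standard normal while accounting for the
    volume change log|det J_{x->z}|."""
    return (0.5 * z.pow(2).sum(dim=1) - log_jac_det).mean()

# illustrative usage: z, log_jac_det = cinn(x, c=[y]); loss = cinn_nll(z, log_jac_det)
```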

Localization from posterior samples

After training the invertible neural network, a set of posterior samples can be drawn from the approximate posterior distribution \(\hat p(x|y)\). Compared with the prior distribution \(p(x)\), which is typically a uniform random guess, these posterior samples significantly narrow the search space from a global search to a local one. The small gap between the approximate posterior and the true posterior is due to the residual training loss, but it can be bridged by gradient descent within a few steps. Conventional global search methods via optimization are very computationally intensive when the design space is high-dimensional. The invertible model, however, narrows the search through generative posterior samples, which can be interpreted as intelligent prior information. Incorporating a local search via gradient descent, we then obtain more accurate inverse solutions. This procedure consists of three steps:

Step 1: Prior exploration

Given a specific target property \(\hat y\), we first draw samples from the latent space, \(\{z_i\}_{i=1}^m \sim p(z)\), and then transform these samples to the design space, \(\{\hat x_i\}_{i=1}^m \sim \hat p(x|\hat y)\), via the invertible bijective mapping. These posterior samples serve as good initializations that shorten the distance to the exact inverse solution.

Step 2: Gradient estimation

We save the surrogate model with the minimized L2 loss from the INN training and evaluate the learned model at varying inputs x. The gradient at the current \(\hat x_i\) can then be efficiently computed by automatic differentiation:

$$g_i = \left. \frac{\partial L\left(\hat f_y(\hat x_i;\phi^\ast), \hat y\right)}{\partial x} \right|_{x = \hat x_i}, \quad \hat x_i \sim \hat p(x|\hat y),\; i = 1, \ldots, m$$
(8)

Step 3: Localization via gradient descent

We precisely localize the posterior samples drawn from \(\hat p(x|\hat y)\) to exact inverse solutions via gradient descent, \(\hat x_{i+1} = \hat x_i - \gamma g_i\), where γ is the learning rate. We use the Adam optimizer to adaptively update the solutions. This local search with intelligent priors is much more efficient than a generic random search over the entire design space.
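The three steps can be summarized in a short optimization loop such as the sketch below. The learning rate follows the Adam setting given under Model implementation, while the number of steps and the squared-error objective against the target are illustrative choices.

```python
import torch

def localize(surrogate, x_init, y_target, lr=1e-3, steps=200):
    """Refine posterior samples x_i toward the target property y_hat by gradient
    descent on the frozen forward surrogate f_y(x; phi*)."""
    x = x_init.clone().requires_grad_(True)
    optimizer = torch.optim.Adam([x], lr=lr)          # Adam adaptively updates the samples
    for _ in range(steps):
        optimizer.zero_grad()
        y_pred = surrogate(x)                         # forward pass of the saved surrogate
        loss = ((y_pred - y_target) ** 2).mean()      # distance to the target band gap
        loss.backward()                               # gradients g_i via automatic differentiation
        optimizer.step()                              # x_{i+1} = x_i - gamma * g_i (adaptive)
    return x.detach()
```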

MatDesINNe framework

The MatDesINNe framework consists of (1) training, (2) inference, (3) down-selection, and (4) localization steps. In step 1, for the INN, the weight coefficients for each loss are initialized and the total loss is minimized via bi-directional training with stochastic gradient descent; the forward model with the minimal L2 loss is saved as a forward surrogate. For the cINN, the loss is minimized via maximum likelihood training with stochastic gradient descent. In step 2, random samples are generated from the latent space p(z), and the corresponding posterior samples, conditioned on the prior sample z and a given observation y, are computed through the invertible transformation. In step 3, down-selection is performed by removing far-outlier samples based on the surrogate predictions, as well as samples with parameters outside the training range. In the case of the cINN, the majority of samples remain after down-selection, while the removed samples are unlikely to localize to good solutions. In step 4, localization is performed on the remaining samples by computing the gradient at the current x using the saved forward surrogate with automatic differentiation and optimizing via gradient descent.
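As an illustration, the down-selection of step 3 can be written as two masks over a generated batch; the outlier tolerance below is a placeholder value, since only the removal of far outliers and out-of-range parameters is specified.

```python
import torch

def down_select(x, surrogate, y_target, x_min, x_max, tol=0.5):
    """Keep samples whose surrogate-predicted property is within `tol` of the
    target and whose design parameters lie inside the training range."""
    with torch.no_grad():
        y_pred = surrogate(x).squeeze(-1)
    near_target = (y_pred - y_target).abs() < tol            # remove far outliers
    in_domain = ((x >= x_min) & (x <= x_max)).all(dim=1)     # remove out-of-distribution samples
    return x[near_target & in_domain]
```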

Model implementation

Our implementation is based on PyTorch and FrEIA (https://github.com/VLL-HD/FrEIA), which is used to build the invertible neural network blocks. For the INN and cINN models, we use 6 invertible blocks with 2 fully connected (fc) layers per block. There are 256 neurons in each fc layer and the ReLU activation is used in each fc layer. We use the Adam optimizer for both the invertible training and the localization training, with a learning rate of 1E-3 and a weight decay of 1E-5. The weighting factors λy and λz in the INN total loss are both set to 1. To stabilize the invertible training, we add small Gaussian-noise perturbations with σy = 5E-3 and σz = 2E-3. We use a total of 11,000 DFT samples, of which 80% are used for training and 20% for testing. The batch size for invertible training is 500 and the number of training epochs is 1000. The experiments are performed on a Linux workstation (Ubuntu 18.04) with 16 Xeon CPUs and an NVIDIA Quadro RTX 6000 GPU. The average training time per model is about 10–15 minutes using a single GPU.
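A minimal sketch of this architecture in FrEIA is shown below. The specific block type (an affine-coupling-based AllInOneBlock) and keyword names are assumptions that may differ from our released code and across FrEIA versions.

```python
import torch.nn as nn
import FrEIA.framework as Ff
import FrEIA.modules as Fm

X_DIM, Y_DIM = 7, 1   # 7 design parameters, 1 target property (band gap)

def subnet_fc(dims_in, dims_out):
    # 2 fully connected layers with 256 neurons and ReLU, as described above
    return nn.Sequential(nn.Linear(dims_in, 256), nn.ReLU(),
                         nn.Linear(256, dims_out))

# Unconditional INN: 6 invertible blocks mapping x (7-D) <-> [y, z] (1 + 6 D)
inn = Ff.SequenceINN(X_DIM)
for _ in range(6):
    inn.append(Fm.AllInOneBlock, subnet_constructor=subnet_fc)

# Conditional INN: the band gap y is fed to every block as a condition
cinn = Ff.SequenceINN(X_DIM)
for _ in range(6):
    cinn.append(Fm.AllInOneBlock, cond=0, cond_shape=(Y_DIM,),
                subnet_constructor=subnet_fc)

# forward pass: z, log_jac_det = cinn(x, c=[y])
# reverse pass: x, _ = cinn(z, c=[y], rev=True)
```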

Baseline methods

Conditional variational autoencoders (cVAE)

The conditional variational autoencoder39 uses the evidence lower bound and encodes x into a Gaussian-distributed random latent variable z conditioned on y. The forward training process utilizes the L2 loss to achieve a good reconstruction of the original input x, while the backward process solves for x, which is decoded from random samples drawn from z conditioned on y. The loss function is defined as:

$$L=\alpha(x-\hat x)^2-\frac{1}{2}\beta\left(1+\log\sigma_z-\mu_z^2-\sigma_z\right)$$
(9)

Our implementation of cVAE is based on the literature57. The encoder and decoder models each consist of three layers with 128 hidden nodes per layer. The latent dimension is 3. The optimizer is Adam with a learning rate of 1E-3 and a batch size of 500. The training is run for 1000 epochs.
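A sketch of this objective, written with the conventional log-variance parameterization of the Gaussian KL term abbreviated in Eq. (9), is:

```python
import torch

def cvae_loss(x, x_recon, mu_z, logvar_z, alpha=1.0, beta=1.0):
    """Weighted reconstruction (L2) loss plus KL divergence to N(0, I), cf. Eq. (9)."""
    recon = alpha * ((x - x_recon) ** 2).sum(dim=1)
    kl = -0.5 * beta * (1.0 + logvar_z - mu_z.pow(2) - logvar_z.exp()).sum(dim=1)
    return (recon + kl).mean()
```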

Mixture density networks (MDN)

Mixture density networks38 model the inverse problem directly, taking y as the input and predicting the parameters μx, \(\Sigma_x^{-1}\) of a Gaussian mixture model p(x|y) as the output. The model is trained by maximizing the likelihood of the training data with the loss function:

$$L = \frac{1}{2}(x - \mu_x)^T\, \Sigma_x^{-1}\, (x - \mu_x) - \log\left|\Sigma_x^{-1}\right|^{1/2}$$
(10)

The mixture density network consists of a categorical network and a mixture-of-diagonal-normals network. Both are constructed as three-layer neural networks, and the hidden dimension is equal to the input dimension. We use 3 mixture components for the MDN model. As for the other models, we use the Adam optimizer with a learning rate of 1E-3, a batch size of 500 and 1000 training epochs.
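For reference, the negative log-likelihood corresponding to Eq. (10) for a diagonal Gaussian mixture can be evaluated with torch.distributions, assuming the MDN head outputs mixture logits, means and scales for each input y:

```python
import torch
from torch.distributions import Categorical, Independent, MixtureSameFamily, Normal

def mdn_nll(x, logits, mu, sigma):
    """Negative log-likelihood of x (batch, x_dim) under a diagonal Gaussian
    mixture p(x | y) with weights `logits` (batch, K) and per-component means
    `mu` and scales `sigma` (batch, K, x_dim) predicted from y."""
    mixture = Categorical(logits=logits)
    components = Independent(Normal(mu, sigma), 1)
    gmm = MixtureSameFamily(mixture, components)
    return -gmm.log_prob(x).mean()
```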