Introduction

The availability of accurate and highly efficient interatomic potentials is crucial for the atomistic simulation of materials phenomena with intrinsic length and time scales inaccessible to first principles electronic structure theory. Examples in materials science include failure processes such as crack propagation1 and chemical dynamics at reactive surfaces2. The advent of machine-learning-based interatomic potentials (MLIPs) has meant that high-fidelity interatomic potentials based on Kohn–Sham density functional theory (KS-DFT) and beyond have become much more widely available3,4,5. Yet, the effort to generate MLIPs that are both transferable and accurate is still significant and heavily depends on the configurational space spanned by the underlying training data set6. Very few MLIPs have been reported that are able to capture different materials phases, surface terminations, and the effects of complex defects on the stability and structure of the material5,7,8.

More importantly, MLIPs and conventional interatomic potentials fundamentally neglect explicit electronic degrees of freedom of molecules and materials thereby removing access to the simulation of observables beyond structure and stability, such as electric conductivity and optical response, which depend on the electronic subsystem and electron–phonon coupling. While the ability to predict optical and electronic properties is desirable, the inclusion of electronic degrees of freedom will likely also benefit the transferability of MLIPs.

For decades, semi-empirical and tight-binding (TB) models of electronic structure have sought to combine the efficiency of interatomic potentials with the explicit description of electrons. A plethora of approaches based on two-centre and three-centre integral approximations have led to established method frameworks such as the AM1 and PM3 methods9,10, the density functional tight-binding (DFTB) method11,12, the Sankey–Niklewski approach as implemented in the FIREBALL code13,14, and the xTB approach15. Unfortunately, the rigid mathematical form of the integral tabulations in most approaches means that TB parametrizations are limited in accuracy and often do not transfer beyond the materials classes for which they were originally intended.

As ML methods make inroads across a diverse range of molecular simulation workflows16, approaches beyond MLIPs are being pursued that incorporate electronic properties. For molecules, Li et al. have proposed a neural-network-based parametrization pipeline for DFTB17, while Stoehr et al. have proposed deep tensor neural networks (DTNNs) to construct beyond-pairwise repulsion potentials18. Qiao et al. have shown that the use of symmetry-adapted atomic-orbital features can significantly improve transferability and prediction accuracy of molecular stability19.

In the realm of condensed phase materials, the automated construction of tight-binding models from ab initio data has been a topic of great interest as it can benefit high-throughput materials screening studies20. Most commonly, electronic structure simulations of materials are performed in non-atom-centred basis representations such as the pseudopotential plane wave framework, which is not easily amenable to the construction of TB models. TB Hamiltonians are typically constructed via transformation into a maximally localised Wannier function representation21, which provides a compact atom-centred basis representation with local support22. It is also possible to fit Slater–Koster parameters directly to DFT calculations in a data-driven fashion23,24. Materials simulations in atom-centred orbital representations as provided by, for example, the FHI-aims code25 are becoming more common, where Wannierization is not necessary and the basis representation provided by the code is directly amenable to machine learning approaches based on local representations of atomic neighbourhoods6. Examples of such representations include Behler–Parinello symmetry functions3,26, the SOAP descriptor27 or the atomic cluster expansion28,29. First efforts of direct machine learning prediction of electronic structure have been reported in literature. For example, SchNOrb30 is a DTNN representation of molecular mean-field electronic structure Hamiltonians, which has been used to predict Hamiltonians in local atomic orbital and optimised effective minimal basis representations for organic molecules including up to 13 heavy atoms30,31. Hedge and Bowen32 employed Kernel ridge regression with a bispectrum representation33 for an analytical representation of a minimal basis DFT Hamiltonian for bulk copper and diamond. Equivariant parameterisations for molecular systems along similar lines to what we describe here have been reported, learning either from the Hamiltonian34 or from wavefunctions and electronic densities35. These works apply linear or nonlinear equivariant models, respectively, to the MD17 molecular dataset, both of which improve on the non-equivariant SchNOrb approach of ref. 30. However, to our knowledge, the present work is the first to address the specific challenges of learning Hamiltonians in solid state systems.

In this work, we present a completely data-driven approach to analytical model construction based on ab initio electronic structure theory. The model is able to faithfully represent electronic structure as a function of atomic configuration and materials composition in nonorthogonal local atomic orbital representation via the Hamiltonian and overlap matrices. This goes beyond conventional TB descriptions as it represents DFT to full order, rather than in two-centre or three-centre approximations. We achieve this by introducing an ACE descriptor to represent intraatomic onsite and interatomic offsite blocks of Hamiltonian and overlap matrices that transform equivariantly with respect to the full rotation group in three dimensions. This equivariant descriptor is integrated in an automated data-driven workflow that enables rapid parameterisation of environment dependent TB models directly from DFT data as illustrated in Fig. 1. We showcase the capabilities of this approach by predicting the band structure of bulk aluminium in different crystal systems.

Fig. 1: Schematic of the ACEhamiltonians (atomic cluster expansion for Hamiltonians) workflow.
figure 1

The upper panel shows data generation with the FHI-aims electronic structure theory code, the central panel model fitting with the ACE.jl and ACEhamiltonians.jl packages, and the lower panel prediction.

Results

In most electronic structure calculations the ground state of a system is obtained by solving an eigenvalue problem

$$\hat{H}{\psi }_{i}={\epsilon }_{i}{\psi }_{i},i=1,2,\cdots$$
(1)

where

$$\hat{H}=-\frac{1}{2}{\nabla }^{2}+{V}_{{{{\rm{eff}}}}}.$$
(2)

For example, in the widely used Kohn–Sham DFT model,

$${V}_{{{{\rm{eff}}}}}={V}_{{{{\rm{eff}}}}}[\rho ],\,{{{\rm{where}}}}$$
(3)
$$\rho =\mathop{\sum}\limits_{i}{f}_{i}| {\psi }_{i}{| }^{2},$$
(4)

and fi is the occupancy of electronic eigenstate i with wave function ψi; i.e., (1) becomes a nonlinear eigenvalue problem, which is extremely computationally demanding and is usually solved by employing a self-consistent field (SCF) algorithm36,37.

In this paper, we are concerned with finding an analytical representation of a self-consistent Hamiltonian operator \(\hat{H}=-\frac{1}{2}{\nabla }^{2}+{V}_{{{{\rm{eff}}}}}\) in discrete basis representation.

Hamiltonians for extended materials in atomic orbital basis representation

To achieve a finite basis representation, we expand the wave functions ψi in a local nonorthogonal atom-centred basis representation

$${\chi }_{a}({{{\boldsymbol{x}}}})={R}_{nl}(r){Y}_{lm}(\theta ,\phi )$$
(5)

where a = (n, l, m; I) is a composite index, and the spatial electron coordinate x and its components r, θ, and ϕ in centrosymmetric coordinates around the atom I are used. Ylm are spherical harmonics that define the angular dependence, and \(n=0,\ldots ,{n}_{\max }\), \(l=0,\ldots ,{l}_{\max }\), \(m=-{l}_{\max },\ldots ,{l}_{\max }\) characterise the radial and angular nodal structure of the atomic orbital. The choice of Rnl(r) varies between different types of atomic orbital basis representations and can involve linear combinations (contractions) of Gaussian functions or numerically tabulated functions. Here we choose the latter as defined in the numeric atom-centred orbital (NAO) basis employed in the FHI-aims code25, with the onsite and offsite block structure as illustrated in Fig. 2. With this definition, we can express the overlap between basis functions and the interactions as mediated by the Hamiltonian as follows:

Fig. 2: Block structure and atomic orbital subblocks in the Hamiltonian and overlap matrices used in our models.
figure 2

Each block within panel a is a 14 × 14 matrix with the atomic orbital structure HIJ shown in panel (b). Blocks coloured green in a are onsite blocks, while those shown in purple are offsite blocks. Note that the onsite HII are self-adjoint and hence, e.g., only one of the ps and sp blocks needs to be fitted.

$${H}_{ab}=\left\langle {\chi }_{a}| \hat{H}| {\chi }_{b}\right\rangle \,{{{\rm{and}}}}$$
(6)
$${S}_{ab}=\left\langle {\chi }_{a}| {\chi }_{b}\right\rangle .$$
(7)

Given a crystal-periodic structure \({{{\bf{R}}}}={\{{{{{\bf{L}}}}}_{\kappa },{{{{\boldsymbol{r}}}}}_{I},{Z}_{I}\}}_{I}\) specified through a set of lattice vectors Lκ=1,2,3, atom positions rI and chemical species ZI, we must consider periodic boundary conditions. As such, a Hamiltonian defined over the whole crystal volume reduces to a block diagonal Hamiltonian where each block corresponds to a vector k in reciprocal space, which can be solved via an independent generalised eigenvalue problem:

$${{{\bf{H}}}}({{{\boldsymbol{k}}}}){\psi }_{i{{{\boldsymbol{k}}}}}={\epsilon }_{i{{{\boldsymbol{k}}}}}{{{\bf{S}}}}({{{\boldsymbol{k}}}}){\psi }_{i{{{\boldsymbol{k}}}}},\quad i=1,2,\ldots ,$$
(8)

where ψik are Bloch wave functions and H(k) and S(k) are Hamiltonian and overlap matrices defined in terms of a discrete crystal-periodic basis. In the Methods section IV D, we show how H(k) and S(k) can be constructed at arbitrary points k in reciprocal space from real-space representations of Hamiltonian and overlap matrices that span the full crystal volume (typically considered within a certain radius around the central unit cell). As the k-dependent matrices and the solution of the set of generalised eigenvalues completely follow from the real-space H and S in Eqs. (6) and (7), we will go on to develop a representation for those two matrix quantities as a function of the structure R.

Recall that \(\hat{H}=-\frac{1}{2}{\nabla }^{2}+{V}_{{{{\rm{eff}}}}}\). The effective potential Veff is not only a function of the spatial electron coordinate x but also of the entire atomic structure, i.e., one should think of

$${V}_{{{{\rm{eff}}}}}={V}_{{{{\rm{eff}}}}}({{{\boldsymbol{x}}}};{{{\bf{R}}}}).$$
(9)

For example, in KS-DFT, this dependence arises due to the dependence of Veff on the self-consistent electron density. Our aim will be to construct a general regression scheme for the discretised Hamiltonian exploiting three fundamental, general properties of \(\hat{H}\) and in particular Veff: (i) near-sightedness of electronic structure; (ii) smoothness under changes in the atomic structure; and (iii) equivariance of the Hamiltonian. We will discuss in the next section how these properties are to be exploited in the parameterisation.

In preparation, we first make (iii) more precise: let Q O(3) denote an isometry (rotation and reflection) and \(Q{{{\bf{R}}}}={\{{{{{\bf{L}}}}}_{\kappa },Q{{{{\boldsymbol{r}}}}}_{I},{Z}_{I}\}}_{I}\) (where we also rotate the cell). Further, let HIJ = HIJ(R) denote the Hamiltonian block corresponding to interactions between orbitals centred at sites I and J. It is then straightforward to deduce that

$${{{{\boldsymbol{H}}}}}_{IJ}(Q{{{\bf{R}}}})=D{(Q)}^{* }{{{{\boldsymbol{H}}}}}_{IJ}({{{\bf{R}}}})D(Q),$$
(10)

where D(Q) is a block-Wigner-D matrix,

$$D(Q)={{{\rm{Diag}}}}({D}^{{l}_{1}}(Q),{D}^{{l}_{2}}(Q),\cdots \,),$$
(11)

and (l1, l2, … ) specify the types of orbitals at each site. More details can be found in the “Methods” section “Equivariance of HIJ”. Since the focus of the present work is on elemental metallic systems we ignore chemical species information entirely in the present work; this will be addressed in the future either directly as is done for ACE interatomic potentials38 or using compressed species information39,40.

Crucially, there are only two distinct functional relationships that must be “learned” in order to represent the entire Hamiltonian: one for off-site blocks that represent interactions between orbitals centred at two different atoms and one for on-site blocks representing interactions of orbitals at the same atom. More precisely, the translation invariance and permutation equivariance of the Hamiltonian imply that

$$\begin{array}{lll}{{{{\boldsymbol{H}}}}}_{II}&=&{{{{\boldsymbol{H}}}}}_{{{{\rm{on}}}}}({{{{\bf{R}}}}}_{I}),\qquad {{{\rm{and}}}}\\ {{{{\boldsymbol{H}}}}}_{IJ}&=&{{{{\boldsymbol{H}}}}}_{{{{\rm{off}}}}}({{{{\bf{R}}}}}_{IJ}),\end{array}$$
(12)

where RI denotes the atomic environment of atom I and RIJ the bond environment of (multiple) bonds between the two atoms i, j which also contains the position of bonds. These environments are defined as follows:

$$\begin{array}{lll}{{{{\bf{R}}}}}_{I}&:=&\left\{{{{{\boldsymbol{r}}}}}_{IK}\,| \,K\,\ne\, I\right\},\,{{{\rm{and}}}}\\ {{{{\bf{R}}}}}_{IJ}&:=&\left\{{{{{\boldsymbol{r}}}}}_{IJ};\{{{{{\boldsymbol{r}}}}}_{K}-\frac{1}{2}({{{{\boldsymbol{r}}}}}_{I}+{{{{\boldsymbol{r}}}}}_{J})\,| \,K\,\ne\, I,J\}\right\},\end{array}$$
(13)

where rIJ = rIrJ. In the above definitions, the index K runs over all unit cells N within the crystal volume. According to Eq. (10) the functions Hon and Hoff are equivariant in the sense that

$${{{{\boldsymbol{H}}}}}_{{{{\rm{on}}}}/{{{\rm{off}}}}}(Q{{{\bf{R}}}})=D{(Q)}^{* }{{{{\boldsymbol{H}}}}}_{{{{\rm{on}}}}/{{{\rm{off}}}}}({{{\bf{R}}}})D(Q).$$
(14)

Translation invariance is now built into the dependence of Hon/off on relative positions only, while permutation equivariance of H is built into Eq. (12).

Several simplifications apply for the treatment of the overlap matrix. For each atom we choose a set of basis functions χ that are orthogonal, which means that the on-site blocks SII are identity matrices. The off-site blocks follow the same symmetry as the Hamiltonian off-site blocks.

Parameterisation

We parameterise the real-space Hamiltonian and overlap matrix blocks Hon, Hoff and Soff using an equivariant ACE basis28,29,41. Similar techniques have previously been proposed in other contexts34,40,42. In this section, we present a general outline of the ideas, making certain choices of approximation parameters concrete in the “Methods” section “Parameter estimation”.

We denote the parameterised Hamiltonian and overlap by \(\tilde{{{{\boldsymbol{H}}}}},\tilde{{{{\boldsymbol{S}}}}}\). For the sake of simplicity we focus the presentation on \(\tilde{{{{\boldsymbol{H}}}}}\) and remark on the relevant modification for \(\tilde{{{{\boldsymbol{S}}}}}\) at the end. All procedures are straightforward to generalise for multiple species with the only effect being an increased number of \(\tilde{{{{\boldsymbol{H}}}}}\) and \(\tilde{{{{\boldsymbol{S}}}}}\) blocks that have to be considered as element combinations increase. In the present case, \({\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}\) is invariant under permutations of RI and \({\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}\) is invariant under permutations of RIJ. Both can therefore be parameterised by the ACE model. Here, we closely follow the procedures introduced in refs. 29,38,41.

1. Parameterisation of H on

We start by choosing a one-particle basis,

$${\phi }_{v}({{{\boldsymbol{x}}}}):={\phi }_{nlm}^{{{{\rm{on}}}}}({{{\boldsymbol{x}}}}):={P}_{nl}(r){Y}_{lm}(\hat{{{{\boldsymbol{x}}}}}){f}_{{{{\rm{cut}}}}}(r)$$
(15)

where \({{{\boldsymbol{x}}}}=r\hat{{{{\boldsymbol{x}}}}}\) and we have identified the composite index v ≡ (nlm). The radial cutoff or envelope function fcut(r) ensures that only interactions of nearby atoms are taken into account, exploiting the near-sightedness of electronic structure.

Given the one-particle basis we can form the density projection and projected ν-correlations (product basis),

$${A}_{v}^{I}:=\mathop{\sum}\limits_{J\ne I}{\phi }_{v}({{{{\boldsymbol{r}}}}}_{IJ}),$$
(16)
$${{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{I}:=\mathop{\prod }\limits_{t=1}^{\nu }{A}_{{v}_{t}}^{I}\qquad {{{\rm{for}}}}\,{{{\boldsymbol{v}}}}=({v}^{1},\ldots ,{v}^{\nu }),\nu =1,2,\ldots .$$
(17)

The \({{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{I}\) form a complete basis of permutation-invariant (PI) polynomials, hence we can approximate

$${{{{\boldsymbol{H}}}}}_{II}={{{{\boldsymbol{H}}}}}_{{{{\rm{on}}}}}({{{{\bf{R}}}}}_{I})\approx {\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}^{{{{\rm{PI}}}}}({{{{\bf{R}}}}}_{I})=\mathop{\sum}\limits_{{{{\boldsymbol{v}}}}}{C}_{{{{\boldsymbol{v}}}}}{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{I},$$
(18)

where \({{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{I}\) are scalar and the parameters \({C}_{{{{\boldsymbol{v}}}}}={({C}_{{{{\boldsymbol{v}}}}}^{{\alpha }_{1}{\alpha }_{2}})}_{{\alpha }_{1},{\alpha }_{2} = 1}^{{N}_{{{{\rm{orb}}}}}}\) have the same dimensionality as HII i.e., Norb × Norb (recall that HII denotes the onsite Hamiltonian block corresponding to orbitals centred at atom I). The summation over v will be restricted to a finite set, the choice of which is a crucial aspect of the model accuracy; cf. in the section “Parameter estimation”.

The expansion (18) incorporates translation and permutation invariance but not yet the O(3)-equivariance (10). Following the general ACE construction29 we can achieve this by simply averaging the representation over the group O(3), i.e.,

$${\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}({{{{\bf{R}}}}}_{I})=-\!\!\!-\!\!\!\!\!\!\!\!{\int}_{{{{\rm{O}}}}(3)}D(Q){\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}^{{{{\rm{PI}}}}}(Q{{{{\bf{R}}}}}_{I})D{(Q)}^{* }dQ,$$
(19)

In step 4. we will review how this integration is explicitly resolved.

2. Parameterisation of H off

The procedure for parameterising Hoff is similar to that of Hon, the main difference being that the presence of a bond rather than a site changes the permutation-invariance. Specifically, we now need to define one-particle basis functions for the bond variable and for the environment variables

$$\begin{array}{ll}{\phi }_{nlm}^{{{{\rm{b}}}}}({{{{\boldsymbol{r}}}}}_{IJ})&={P}_{nl}^{{{{\rm{b}}}}}({r}_{IJ}){Y}_{lm}({\hat{{{{\boldsymbol{r}}}}}}_{IJ}){f}_{{{{\rm{cut}}}}}^{{{{\rm{b}}}}}({r}_{IJ}),\\ {\phi }_{nlm}^{{{{\rm{e}}}}}({{{{\boldsymbol{r}}}}}_{IJ,K})&={P}_{nl}^{{{{\rm{e}}}}}({r}_{IJ,K}){Y}_{lm}({\hat{{{{\boldsymbol{r}}}}}}_{IJ,K}){f}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}({{{{\boldsymbol{r}}}}}_{IJ,K},{{{{\boldsymbol{r}}}}}_{IJ}).\end{array}$$
(20)

where \({{{{\boldsymbol{r}}}}}_{IJ}={r}_{IJ}{\hat{{{{\boldsymbol{r}}}}}}_{IJ}\) and \({{{{\boldsymbol{r}}}}}_{IJ,K}:={{{{\boldsymbol{r}}}}}_{K}-\frac{1}{2}({{{{\boldsymbol{r}}}}}_{I}+{{{{\boldsymbol{r}}}}}_{J})\). Note in particular that the cutoff function for the environment, \({f}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}\), no longer depends only on the radius but may be more general: we require only that \({f}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}({{{{\boldsymbol{r}}}}}_{IJ,K},{{{{\boldsymbol{r}}}}}_{IJ})\) is invariant under joint rotation of both arguments which allows, e.g., ellipsoidal or cylindrical cutoff geometries.

The density projection for the bond environment RIJ is now given by

$${A}_{v}^{IJ}:=\mathop{\sum}\limits_{K\ne I,J}{\phi }_{v}^{{{{\rm{e}}}}}({{{{\boldsymbol{r}}}}}_{IJ,K}),$$
(21)

and the product basis becomes

$${{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{IJ}:={\phi }_{{v}^{0}}^{{{{\rm{b}}}}}({{{{\boldsymbol{r}}}}}_{IJ})\cdot \mathop{\prod }\limits_{t=1}^{\nu }{A}_{{v}^{t}}^{IJ},$$
(22)

for v = (v0, v1, …, vν), with ν = 0, 1, 2, … the correlation order of the bond environment. As in the on-site case, the \({A}_{{{{\boldsymbol{v}}}}}^{IJ}\) form a complete basis of polynomials that are invariant under permutations of RIJ and we may therefore approximate

$${{{{\boldsymbol{H}}}}}_{IJ}={{{{\boldsymbol{H}}}}}_{{{{\rm{off}}}}}\approx {\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}^{{{{\rm{PI}}}}}({{{{\bf{R}}}}}_{IJ}):=\mathop{\sum}\limits_{{{{\boldsymbol{v}}}}}{C}_{{{{\boldsymbol{v}}}}}{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{IJ}.$$
(23)

which we finally symmetrise to obtain also the O(3)-equivariance,

$${\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}({{{{\bf{R}}}}}_{IJ}):=-\!\!\!-\!\!\!\!\!\!\!\!{\int}_{{{{\rm{O}}}}(3)}\,\,\,\,\,D(Q){\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}^{{{{\rm{PI}}}}}(Q{{{{\bf{R}}}}}_{IJ})D{(Q)}^{* }dQ.$$
(24)

3. Parameterisation of S off

The environment-dependence of Hoff enters only through the effective potential Veff which is not present in the overlap matrix definition. Therefore, we simply parameterise Soff by

$${\tilde{{{{\boldsymbol{S}}}}}}_{{{{\rm{off}}}}}({{{{\boldsymbol{r}}}}}_{IJ}):=-\!\!\!-\!\!\!\!\!\!\!\!{\int}_{{{{\rm{O}}}}(3)}\,\,\,\,\,D(Q)\left[\mathop{\sum}\limits_{v}{C}_{v}{\phi }_{v}^{{{{\rm{b}}}}}(Q{{{{\boldsymbol{r}}}}}_{IJ})\right]D{(Q)}^{* }\,dQ.$$
(25)

This is formally equivalent to a Slater Koster representation of two-centre integrals43, which is exact in the case of the overlap. For our ACE parameterisation, this means that we only need to use correlation order ν = 0, i.e. no environment-dependence of the bond integral needs to be considered.

Recursive symmetrisation

In all three cases \({\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}},{\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}},{\tilde{{{{\boldsymbol{S}}}}}}_{{{{\rm{off}}}}}\) we have reduced the parameterisation to an integral over the symmetry group O(3), i.e.,

$$\tilde{{{{\boldsymbol{K}}}}}({{{{\bf{R}}}}}_{\bullet })=-\!\!\!-\!\!\!\!\!\!\!\!{\int}_{{{{\rm{O}}}}(3)}\,\,\,\,\,D(Q)\left[\mathop{\sum}\limits_{{{{\boldsymbol{v}}}}}{C}_{{{{\boldsymbol{v}}}}}{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{\bullet }(Q{{{{\bf{R}}}}}_{\bullet })\right]D{(Q)}^{* },$$
(26)

where \(\tilde{{{{\boldsymbol{K}}}}}\) denotes one of the three model components \({\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}},{\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}},{\tilde{{{{\boldsymbol{S}}}}}}_{{{{\rm{off}}}}}\) and R denotes an atom environment RI or bond environment RIJ. In particular, for off-site overlap Soff,

$${{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{IJ}({{{{\bf{R}}}}}_{IJ})={\phi }_{v}^{{{{\rm{b}}}}}({{{{\boldsymbol{r}}}}}_{IJ}).$$
(27)

In order to make our description clearer, we denote v ≡ nlm with n, l, m being the lists of corresponding indices in \({{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{v}}}}}^{IJ}\). Thus, we can deduce that

$${{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{nlm}}}}}^{\bullet }(Q{{{{\bf{R}}}}}_{\bullet })=\mathop{\sum}\limits_{{{{\boldsymbol{\mu }}}}}{{{{\boldsymbol{D}}}}}_{{{{\boldsymbol{\mu m}}}}}^{{{{\boldsymbol{l}}}}}(Q){{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{nl\mu }}}}}^{\bullet }({{{{\bf{R}}}}}_{\bullet }),$$
(28)

where \({{{{\boldsymbol{D}}}}}_{{{{\boldsymbol{\mu m}}}}}^{{{{\boldsymbol{l}}}}}(Q)={\prod }_{t}{D}_{{\mu }_{t}{m}_{t}}^{{l}_{t}}(Q)\) since the angular dependence of the one-particle basis functions in all cases is in terms of spherical harmonics Ylm. Furthermore, we write

$${C}_{{{{\boldsymbol{v}}}}}=\mathop{\sum }\limits_{\alpha ,\beta =1}^{{N}_{{{{\rm{orb}}}}}}{c}_{{{{\boldsymbol{v}}}}}^{\alpha \beta }{E}^{\alpha \beta },$$
(29)

where \({E}^{\alpha \beta }\in {{\mathbb{R}}}^{{N}_{{{{\rm{orb}}}}}\times {N}_{{{{\rm{orb}}}}}}\) with \({E}_{\alpha ^{\prime} \beta ^{\prime} }^{\alpha \beta }={\delta }_{\alpha \alpha ^{\prime} }{\delta }_{\beta \beta ^{\prime} }\). Inserting these two identities into Eq. (26) yields

$$\begin{array}{ll}\tilde{{{{\boldsymbol{K}}}}}({{{{\bf{R}}}}}_{\bullet })&=\mathop{\sum}\limits_{{{{\boldsymbol{n}}}},{{{\boldsymbol{l}}}},{{{\boldsymbol{m}}}},\alpha ,\beta }{c}_{{{{\boldsymbol{v}}}}}^{\alpha \beta }\mathop{\sum}\limits_{{{{\boldsymbol{\mu }}}}}{{{{\mathcal{U}}}}}_{{{{\boldsymbol{l}}}}{{{\boldsymbol{\mu }}}}{{{\boldsymbol{m}}}}}^{\alpha \beta }{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}{{{\boldsymbol{\mu }}}}}^{\bullet }({{{{\bf{R}}}}}_{\bullet })\\ &=:\mathop{\sum}\limits_{{{{\boldsymbol{n}}}},{{{\boldsymbol{l}}}},{{{\boldsymbol{m}}}},\alpha ,\beta }{c}_{{{{\boldsymbol{nlm}}}}}^{\alpha \beta }{{{{\mathcal{B}}}}}_{{{{\boldsymbol{nlm}}}}}^{\alpha \beta }({{{{\bf{R}}}}}_{\bullet }),\end{array}$$
(30)

where the “generalised coupling coefficients” are given by

$${{{{\mathcal{U}}}}}_{{{{\boldsymbol{l}}}}{{{\boldsymbol{\mu }}}}m}^{\alpha \beta }=-\!\!\!-\!\!\!\!\!\!\!\!{\int}_{{{{\rm{O}}}}(3)}\,\,\,\,\,{{{{\boldsymbol{D}}}}}_{{{{\boldsymbol{\mu }}}}{{{\bf{m}}}}}^{{{{\bf{l}}}}}(Q)D(Q){E}^{\alpha \beta }D{(Q)}^{* }dQ.$$
(31)

Their definition involves an integral over products of Wigner-D matrices which can be precomputed explicitly (i.e., without the need for quadrature which would incur a discretisation error) using the recursion proposed by Dusson et al. 29 and independently by Nigam et al. 34.

Note that Eq. (30) parameterises \(\tilde{K}\) in terms of the scalar parameters \({c}_{{{{\boldsymbol{v}}}}}^{\alpha \beta }\), while the basis functions are now matrix-valued

$${{{{\mathcal{B}}}}}_{{{{\boldsymbol{nlm}}}}}^{\alpha \beta }({{{{\bf{R}}}}}_{\bullet })=\mathop{\sum}\limits_{{{{\boldsymbol{\mu }}}}}{{{{\mathcal{U}}}}}_{{{{\bf{l}}}}{{{\boldsymbol{\mu }}}}{{{\bf{m}}}}}^{\alpha \beta }{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}{{{\boldsymbol{\mu }}}}}^{\bullet }({{{{\bf{R}}}}}_{\bullet }).$$
(32)

Since the coupling coefficients \({{{\mathcal{U}}}}\) are extremely sparse, the operation to obtain \({{{\mathcal{B}}}}\) from A is relatively cheap.

Due to the coupling, the basis \({{{{\mathcal{B}}}}}_{{{{\boldsymbol{nlm}}}}}^{\alpha \beta }\) is normally overcomplete. This linear dependence arises exactly within fixed nl blocks. In a straightforward adaption of the general procedures outlined by Dusson et al. 29 we use elementary linear algebra techniques to reduce the basis in a block-by-block fashion by constructing reduced coupling coefficients \({{{{\mathcal{U}}}}}_{k{{{\boldsymbol{\mu }}}}}^{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}}\) and defining

$${{{{\mathcal{B}}}}}_{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}k}({{{{\bf{R}}}}}_{\bullet }):=\mathop{\sum}\limits_{{{{\boldsymbol{\mu }}}}}{{{{\mathcal{U}}}}}_{k{{{\boldsymbol{\mu }}}}}^{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}}{{{{\boldsymbol{A}}}}}_{{{{\boldsymbol{n}}}}{{{\boldsymbol{l}}}}{{{\boldsymbol{\mu }}}}}^{\bullet }({{{{\bf{R}}}}}_{\bullet }).$$
(33)

In summary, after dropping the detailed multi-index notation and replacing it with a simple enumeration of the basis, we obtain linear models for

$${\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}:={{{{\bf{c}}}}}^{{{{\rm{on}}}}}\cdot {{{{\mathcal{B}}}}}^{{{{\rm{on}}}}},$$
(34)
$${\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}:={{{{\bf{c}}}}}^{{{{\rm{off}}}}}\cdot {\tilde{{{{\mathcal{B}}}}}}^{{{{\rm{off}}}}},$$
(35)
$${\tilde{{{{\boldsymbol{S}}}}}}_{{{{\rm{off}}}}}:={{{{\bf{c}}}}}^{{{{\rm{S}}}}}\cdot {\tilde{{{{\mathcal{B}}}}}}^{{{{\rm{S}}}}},$$
(36)

all of which inherit exactly the translation and permutation invariance as well as O(3)-equivariance of Hon, Hoff, Soff. In the limit of infinite basis size and infinite cutoff radius these models can (in principle) be converged to within arbitrary accuracy. In this sense, they are universal. After imposing the symmetries outlined above we still need to ensure self-adjointness of the assembled Hamiltonian and overlap operators which we achieve by simply substituting \(\tilde{{{{\boldsymbol{H}}}}}\leftarrow \frac{1}{2}(\tilde{{{{\boldsymbol{H}}}}}+{\tilde{{{{\boldsymbol{H}}}}}}^{* })\), and analogously for the overlap.

Validation

We generated DFT data for FCC and BCC aluminium, and followed the procedure outlined above to construct ACE models for the Hamiltonian and overlap using several choices of basis sets. Full details of data generation, parameter estimation and prediction procedures are given in the “Methods” section.

The ACE basis sets need to be carefully chosen for a particular application. The larger the basis, the higher the achievable accuracy, but larger basis sets also carry a risk of loss of transferability through overfitting.

Each basis set is defined by three parameters: the correlation order ν and the maximum polynomial degrees \({n}_{\max },{l}_{\max }\) used in both the radial basis functions Pnl(r) and the angular basis function \({Y}_{lm}(\hat{{{{\boldsymbol{x}}}}})\) of Eqs. (15) and (20). In all our tests, the polynomial degrees are truncated in the manner of total degree, i.e., we let \(n+l\le {d}_{\max }\) for a given \({d}_{\max }\).

For the onsite models, the body order is one more than the correlation order, i.e. ν = 1 corresponds to two body and ν = 2 to three body, while for the offsite models the body order is two more than the correlation order (since each term in the body order expansion depends on the bond in addition to ν particles from the environment). The offsite model has further flexibility in that one can choose different \({d}_{\max }\) for bond and environment, say, \({d}_{\max }^{{{{\rm{b}}}}}\) and \({d}_{\max }^{{{{\rm{e}}}}}\). To avoid overemphasising the impact of environment, we set \({d}_{\max }^{{{{\rm{e}}}}}=\lceil {d}_{\max }^{{{{\rm{b}}}}}/2\rceil\) in our implementation.

We tested the accuracy of the fitted Hamiltonian and overlap matrices using different choices of these basis set parameters. The results are illustrated in Fig. 3. For the onsite blocks HII, we can obtain accurate and transferable results for all sub-blocks with correlation order ν = 2 (body order 3), with no significant overfitting as can be seen from the close agreement of prediction accuracies on the training and test datasets in Fig. 3a. The largest errors are on the dd subblock, which also has the largest matrix entries; the RMSE of ~10 meV on this sub-block corresponds to a ~2% relative error.

Fig. 3: Convergence of Hamiltonian and overlap blocks with respect to the order and maximum degree of the ACE basis set.
figure 3

a Onsite Hamiltonian blocks HII fitted with order 2 models of varying maximum degree. b Offsite Hamiltonian blocks with order 1 ACE models. c Offsite Hamiltonian blocks with order 2 ACE models. In all plots solid lines show errors on training data and dashed lines errors on test data. Colours match the block structure of Fig. 2. Note the distinct markers that distinguish the non-adjoint entries in the offsite Hamiltonian and overlap blocks.

For offsite blocks HIJ we considered models with correlation orders of both ν = 1 (body order 3), Fig. 3b, and ν = 2 (body order 4), Fig. 3c. Both approaches show good convergence in the accuracy of the training set as the maximum degree is increased. However, for sub-blocks that include interaction with s orbitals, we observe that overfitting occurs at lower degrees for the order 2 models than for the order 1 case. We speculate that this might result from the higher order basis sets providing too much flexibility for functions that have relatively simple functional behaviour. Since s orbitals have no intrinsic rotational dependence, all rotational equivariance behaviour in sp and sd sub-blocks comes from how the p or d orbitals are positioned with respect to the environment.

We find the correlation order 1 models provide sufficient accuracy, in fact closely comparable to that of the order 2 models on the training set, so to avoid issues of overfitting we use order 1 only for HIJ, and also limit the maximum polynomial degree for individual sub-blocks as discussed in more detail in the section “Cross-validation and model selection”.

As expected from the lack of environment dependence, the offsite overlap SIJ is very well reproduced at correlation order 0 (body order 2), with a RMSE of 10−4. We do not observe any over-fitting for the offsite overlap so we fixed the maximum polynomial degree for SIJ at 16, the highest value we tried.

Cross-validation and model selection

To eliminate overfitting we used the cross-validation results illustrated in Fig. 3 to select a customised basis set for each sub-block, as set out in Table 1. Note that the maximum polynomial degree can be chosen for each individual sub-block model shown in the schematic in Fig. 1, i.e. there are 9 ss models, 3 × 2 = 6sp models, and 2 × 2 = 4pp models. For the 3 × 3 = 9ss sub-blocks of the offsite Hamiltonian we found it necessary to reduce the degree only for the 3s−3s entry, which arises from the fact that the FHI-aims basis set features two s orbitals in the valence shell of Al.

Table 1 ACE basis set parameters for our optimised models for HII, HIJ and SIJ.

We used our optimised model to predict the Hamiltonian and overlap for the FCC and BCC equilibrium crystal geometries. These were not included in the training set, which comprises only perturbed structures from molecular dynamics, so can be viewed as a test of its transferability. The magnitudes and associated errors in the onsite and one of the nearest-neighbour offsite blocks of the Hamiltonian matrix are illustrated in Fig. 4 for the FCC case; BCC results are of comparable accuracy. These results demonstrate the correct equivariance of the predictions with matrix entries, i.e. entries which should be zero by symmetry being correctly captured. Comparing the upper and lower panels also illustrates that the errors are always orders of magnitude smaller than the corresponding magnitudes, ensuring that the relative error is well controlled (typically ~ 1% or less).

Fig. 4: Accuracy of predicted Hamiltonian blocks for the FCC crystal.
figure 4

Magnitudes (above) and errors (below) for onsite (left) and offsite (right) \(\tilde{{{{\boldsymbol{H}}}}}\) for prediction on the FCC ground state unit cell (not included in the training set).

Prediction of band structures and DoS

So far we have assessed only errors made on the quantities used in fitting the models, i.e. the Hamiltonian and overlap matrix elements. While it is reassuring that these are accurately captured, a stronger test of the predictive power of our formulation is to use it to predict electronic observables such as the band structure and DoS. Figure 5 compares predictions of these quantities for FCC and BCC aluminium with those computed from the reference FHI-aims Hamiltonian and overlap matrices. There is excellent agreement for all occupied bands, and also bands within 10 eV of the Fermi level (which is itself in close agreement between the reference and predicted systems). The DoS was integrated on a dense 9 × 9 × 9k-point mesh and also shows excellent agreement for the occupied states for both FCC and BCC, with significant errors only arising well above the Fermi level, giving confidence in the ability of our model to predict electronic observables.

Fig. 5: FCC and BCC band structures obtained with DFT (red) and predicted by an ACE model with onsite H order 2, and offsite H and S order 1 (blue).
figure 5

Confidence intervals shown with blue ribbons are from a priori analysis of the errors in band spectrum expected to result from known errors in \(\tilde{{{{\boldsymbol{H}}}}}\) and \(\tilde{{{{\boldsymbol{S}}}}}\) (see text). Energies are shown relative to the DFT Fermi level.

The figure also shows confidence intervals for the predicted band structures. These have been estimated to leading order using a simple a priori error analysis to propagate errors in the Hamiltonian \({{\Delta }}{{{\boldsymbol{H}}}}=\tilde{{{{\boldsymbol{H}}}}}-{{{\boldsymbol{H}}}}\) and overlap \({{\Delta }}{{{\boldsymbol{S}}}}=\tilde{{{{\boldsymbol{S}}}}}-{{{\boldsymbol{S}}}}\) to expected errors in the bands using the result44

$$\tilde{\epsilon }-\epsilon \sim \langle \phi | {{\Delta }}{{{\boldsymbol{H}}}}-\epsilon {{\Delta }}{{{\boldsymbol{S}}}}| \phi \rangle$$
(37)

in the limit as ΔH, ΔS → 0, where ϕ, ϵ and \(\tilde{\phi }\), \(\tilde{\epsilon }\) are eigenfunctions and eigenvalues of the reference and approximated systems, respectively. Repeating this for each k-point leads to the error bounds shown. The error estimates prove reliable: the DFT bands, shown in red, are almost always contained within the blue shaded region.

Figure 6 shows the convergence of band structures and DoS with respect to the maximum polynomial degree used in the ACE basis set, and for two choices of correlation order ν = 1 and ν = 2.

Fig. 6: Convergence of FCC and BCC band structures and DoS with respect to the correlation order ν and maximum polynomial degree \({d}_{\max }\) used in the ACE basis set.
figure 6

Dotted lines show the optimised model of the section “Cross-validation and model selection”. a Error in the full DoS. b Error in the occupied states, i.e. those below the Fermi level. c Band error computed with Eq. (38).

The error in the DoS is computed using the first Wasserstein (or ‘earthmover’) distance between the reference and predicted DoS, which is a natural metric for comparing densities of states since it is a distance between probability distributions (see, e.g., ref. 45). The error in band structures is defined as the RMSE in the k-dependent band energies

$${E}_{{{{\rm{band}}}}}({{{\bf{k}}}})=\mathop{\sum }\limits_{i=1}^{{N}_{{{{\rm{orb}}}}}}f\left(\frac{{\epsilon }_{i}-{\varepsilon }_{F}}{\sigma }\right){\epsilon }_{i}({{{\bf{k}}}})$$
(38)

along the high-symmetry k-paths shown in Fig. 5, where f(•) is the Fermi function, εF is the Fermi level of the system and the smearing width is taken to be σ = 0.086 eV, corresponding to an electronic temperature of 1000 K.

The models with untuned parameters shown with the solid lines and dashed lines in Fig. 6 are already sufficiently accurate to produce good band structures and densities of states. However, when increasing the maximum degree used for all subblocks simultaneously, some overfitting can be seen, similar to that observed in the direct validation results of Fig. 3, and once again this arises at lower degrees of 9–12 with ν = 2 than with ν = 1, where maximum degrees of up to 13–14 are possible without overfitting. Errors in the DoS and the band structure for both FCC and BCC are further reduced when using the optimised model of the section “Cross-validation and model selection”, shown with the horizontal dotted lines in the figure to produce band structures with a RMSE of <0.4 eV for both phases.

BCC to FCC transition

As a challenging test, we used our optimised model to predict the Hamiltonian and overlap matrices along the Bain transformation path from BCC to FCC. We then diagonalised the predicted matrices to obtain the eigenvalues and hence the DoS at each point along the path and compared them to reference values computed with FHI-aims for the same systems. As can be seen in Fig. 7, the predicted electronic structure agrees well at all points along the path, suggesting good extrapolative behaviour beyond the training set, which includes only environments accessible from the two minima at moderate temperatures during MD.

Fig. 7: Electronic structure along the transition from BCC \(c/a=1/\sqrt{2}\approx 0.71\) to FCC c/a = 1.
figure 7

a Error in the density of states made by our ACE model with respect to the DFT reference (measured with the Wasserstein distance) as a function of c/a along the Bain path. The solid line shows the full error in the DoS (right vertical axis), while the dashed line shows the error in the occupied states (left vertical axis; note the change of scale). b RMSE error in the electronic band structure (along high-symmetry k-path for the BCC structure) as a function of c/a along the Bain path. Insets illustrate the structure of the cubic cell at points along the path. c Comparisons of densities of state for the ACE model (blue) and DFT (red) at four points along the path, including the BCC (left) and FCC (right) structures.

Notably, nowhere along the path is the accuracy of the ACE model worse than it is for FCC or BCC, although accuracy drops off outside the BCC–FCC range \(1/\sqrt{2} < c/a < 1\). We interpret this as meaning that along the Bain path we see different global structures, but similar local environments, whereas to the left of BCC and to the right of FCC we go outside the range of local environments included in the training set.

Restricted training databases

To further test how well the model generalises across crystal systems, we carried out two further fits using the same optimal parameters as for the final model presented above, but with the training database restricted to either FCC only or BCC only configurations (using subsets of the same MD-generated structures as above). We then checked the ability of the resulting ACE Hamiltonian models to predict the DFT electronic structure of both crystals. The results, illustrated in Fig. 8 and summarised in Table 2 convincingly demonstrate the approach has excellent transferability, since the FCC DoS (and also the associated full band structure) can be accurately predicted using only BCC training data, and vice versa.

Fig. 8: Comparison of FCC and BCC DoS predicted with ACE models for full and restricted training databases.
figure 8

The reference DFT DoS is shown in red. A vertical shift has been applied to separate the DoS for each ACE model.

Table 2 Errors in the FCC and BCC DoS predicted with ACE models for full and restricted training databases.

Defected structures

As a final test of our models’ ability to predict outside of the domain of the training sets, we predicted the electronic structure of a 728 atom 9 × 9 × 9 FCC Aluminium supercell containing a single vacancy. The structure was obtained by deleting an atom from the supercell and performing a geometry optimisation with FHI-aims until the maximum force was <5 × 10−3 eV/Å. We then compared the projected DoS (PDoS) for the atomic orbitals neighbouring the vacancy as obtained with DFT with the predictions of our optimal ACE Hamiltonian model, without refitting. The DFT and ACE PDoS are shown in Fig. 9 and demonstrate convincingly that our model is able to capture the changes in the local electronic structure associated with the introduction of a defect, indicating that it correctly predicts the self-consistent field (SCF) relaxations of the Hamiltonian without a need for an explicit SCF loop in the approximate scheme.

Fig. 9: Comparison of PDoS for a 728-atom Al vacancy supercell between the reference DFT results (red) and those predicted by our ACE Hamiltonian model (blue), which was not trained on vacancy data.
figure 9

PDoS includes orbitals associated with the nearest neighbours of the vacancy. The PDoS for the perfect FCC structure is shown in green to allow the changes in the electronic structure due to the defect to be assessed.

Discussion

We have reported a data-driven scheme to construct predictive models of Hamiltonian and overlap matrices from ab initio data. Our scheme incorporates all relevant symmetry operations, giving an equivariant analytical map from first principles data to linear models for the Hamiltonian and overlap matrices as a function of the atomic and bond environments. We have shown that it is possible to apply our methodology to produce accurate predictions for the band structure in aluminium in both FCC and BCC phases from limited training data. The approach has huge potential for delivering comparable accuracy to DFT while at the same time reaching time and length scales far beyond its capabilities. For example, it opens the door to the high-throughput computation of quantities which depend on electronic properties, such as photoemission spectra, transport coefficients, and electron–phonon coupling constants, all of which can currently only be accurately computed with first principles methods46.

Our results are extremely encouraging, and there are a number of avenues open for further exploration. From a computational performance perspective, we note the evaluation of Hamiltonian and overlap blocks is trivially parallelisable with perfect scaling. Performance enhancements would also come from further optimisation of the ACE basis used to represent the Hamiltonian and overlap matrices, e.g. by sparsifying to reduce the basis set size, or by incorporating non-linearity to reduce the maximum degree required38. Moreover, Bayesian approaches to model selection could be used instead of cross-validation. This would lead to more efficient model construction, as well as the possibility of a priori error estimates on the accuracy of model predictions through uncertainty propagation.

Further comprehensive studies of the dependence of accuracy and transferability of models on quantity and type of training data, as well as an extension to materials and systems with more complex bonding environments are also necessary. In future, we will expand this approach to explore multi-component systems. A further extension will be to fit a potential \(\tilde{E}\) to allow total energy and forces to be predicted by adding a correction to the band energy. For example, \(\tilde{E}\) could be represented by an ACE potential determined from the local atomic environments.

Methods

Data generation

The datasets used in this work are constructed for face-centred cubic (FCC) and body-centred cubic (BCC) phases of Al. Our data was generated through electronic structure calculations with the all-electron numeric atomic orbital code FHI-aims (version 190530)25. We used the Perdew–Burke–Enzerhof (PBE) generalised gradient approximation47 to the exchange-correlation energy within the KS-DFT formulation, and neglected spin in our treatment. The convergence criteria for charge density, sum of eigenvalues, and total energy of the self-consistent cycles were set to 10−5 e/a\({}_{0}^{3}\), 5 × 10−5, and 10−6 eV, respectively. The default tight FHI-aims basis set and integration grid definitions were used, which uses a basis set confinement with a maximum radial basis function extent of 6 Å. We modify the set of atomic basis functions that we employ to achieve optimal computational efficiency. Systematic convergence tests showed that band energies converged up to 10 eV above the Fermi level when using a minimal basis plus a single d orbital from Tier 1. Therefore, we used a basis set comprising s and p orbitals of the minimal basis set plus one d orbital from the Tier 1 setting, yielding the 14 atomic basis functions for Al illustrated in Fig. 9b.

The optimal equilibrium lattice constants for FCC and BCC Al were determined in primitive cells with a 9 × 9 × 9 Monkhorst–Pack k-point mesh48 to be 4.05 and 3.29 Å, respectively. To sample a variety of distorted atomic configurations for Al, we carried out molecular dynamics (MD) simulations at a temperature of 500 K using 9 × 9 × 9 = 729 atom supercells of the primitive FCC and BCC unit cells. MD simuations for each phase were performed in the NPT ensemble using a 5 fs timestep and the embedded atom method (EAM) potential proposed by Zhou et al. 49. Single point DFT total energy calculations were carried out on the final configurations of each of these 500 MD simulations using FHI-aims with the parameters described above and a single k-point at Γ. We stored the resulting H and S matrices giving a dataset

$$\left\{({H}_{II},{{{{\bf{R}}}}}_{I})\right\},$$
(39)
$$\left\{({H}_{IJ},{{{{\boldsymbol{r}}}}}_{IJ},{{{{\bf{R}}}}}_{IJ})\right\},$$
(40)
$$\left\{({S}_{IJ},{{{{\boldsymbol{r}}}}}_{IJ},{{{{\bf{R}}}}}_{IJ})\right\}.$$
(41)

where II, IJ indicate on- and off-site blocks of the Hamiltonian and overlap matrices while r and R are the corresponding atomic structure data as defined in the section “Hamiltonians for extended materials in atomic orbital basis representation”. For the optimised model reported in the section “Cross-validation and model selection”, we used 1000 training and 1000 test blocks for the onsite part of the Hamiltonian and 2000 training and 2000 test blocks for the offsite Hamiltonian and overlap matrices (with more offsite than onsite data to reflect the far greater number of offsite blocks in the target matrices). Equal numbers of samples were taken from the FCC and BCC MD data.

Parameter estimation

We have defined three linear models for equivariant components of Hamiltonian and overlap matrices (up to the choice of approximation parameters). It remains to specify a parameter estimation procedure to determine the model parameters which typically number in the thousands to tens of thousands. There are essentially two choices we can make: (i) fit the models to observed properties such as band structure, energies, forces; or (ii) fit the models directly to match a reference Hamiltonian. Both approaches have advantages and disadvantages. We have chosen to follow route (ii) which is particularly attractive from both theoretical and numerical perspectives as it results in a linear least-squares problem.

Let \(\tilde{{{{\boldsymbol{K}}}}}={{{\boldsymbol{c}}}}\cdot {{{\mathcal{B}}}}\) be one of the three linear models, and \({\{\left(\right.{{{{\boldsymbol{K}}}}}_{* }^{(\tau )},{{{{\bf{R}}}}}_{\bullet }^{(\tau )}\}}_{\tau }\) the corresponding training set, then we set up the loss function

$${L}_{0}({{{\boldsymbol{c}}}})=\mathop{\sum}\limits_{\tau }\left|\right.{{{{\boldsymbol{K}}}}}_{* }^{(\tau )}-\tilde{{{{\boldsymbol{K}}}}}({{{{\bf{R}}}}}_{\bullet }^{(\tau )}){\left|\right.}^{2}.$$
(42)

Since \(\tilde{{{{\boldsymbol{K}}}}}\) is linear in c it follows that L can be rewritten as

$${L}_{0}({{{\boldsymbol{c}}}})={\left\Vert {{\Psi }}{{{\boldsymbol{c}}}}-{{{\boldsymbol{y}}}}\right\Vert }^{2},$$
(43)

where Ψ is the design matrix and y contains the reference model values. To prevent overfitting, we regularise the least-squares system with a generalised Tychonov term,

$${L}_{\lambda }({{{\boldsymbol{c}}}}):={\left\Vert {{\Psi }}{{{\boldsymbol{c}}}}-{{{\boldsymbol{y}}}}\right\Vert }^{2}+\lambda {\left\Vert {{\Gamma }}{{{\boldsymbol{c}}}}\right\Vert }^{2},$$
(44)

where Γ = diag(Γkk) with Γkk an estimate for the curvature of the kth basis function which enforces smoothness of the model29,38 and λ is a regularisation parameter. Throughout this work, we define Γkk by

$${{{\Gamma }}}_{kk}=\mathop{\sum}\limits_{\nu }({n}_{\nu }^{2}+{l}_{\nu }^{2}+{m}_{\nu }^{2}),$$
(45)

and λ is always set to be 10−7. We then solve the regularised least squares system (44) using an iterative LSQR algorithm with termination tolerance 10−6.

For the radial basis set Pnl we used

$$\xi (r)={\left(\frac{1+{r}_{0}}{1+r}\right)}^{2}$$
(46)
$${P}_{nl}(r)={Q}_{n}(\xi (r))$$
(47)

where Qn is a polynomial of degree n such that \(\int\nolimits_{{\xi }_{0}}^{{\xi }_{1}}{Q}_{n}(\xi ){Q}_{n^{\prime} }(\xi ){\mathrm {d}}\xi ={\delta }_{nn^{\prime} }\) and [ξ0, ξ1] = ξ([0, rcut]); see ref. 29 for full details.

The envelope function for both on-site term and off-site environment basis function is defined as

$${f}_{{{{\rm{cut}}}}}(r;{r}_{{{{\rm{cut}}}}})={f}_{{{{\rm{cut}}}}}^{{{{\rm{b}}}}}(r;{r}_{{{{\rm{cut}}}}})=\left\{\begin{array}{ll}{({r}^{2}/{r}_{{{{\rm{cut}}}}}^{2}-1)}^{2},&r\le {r}_{{{{\rm{cut}}}}},\\ 0,&r \,> \,{r}_{{{{\rm{cut}}}}},\end{array}\right.$$
(48)

and that for the offsite environment is given by a bond-related cylindrical cutoff function

$$\begin{array}{l}{f}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}(z,r;{z}_{{{{\rm{cut}}}}},{r}_{{{{\rm{cut}}}}})\\ =\left\{\begin{array}{ll}{\left(\frac{{r}^{2}}{{r}_{{{{\rm{cut}}}}}^{2}}-1\right)}^{2}&{\left(\frac{{z}^{2}}{{({z}_{{{{\rm{cut}}}}}+{l}_{{{{\rm{bond}}}}}/2)}^{2}}-1\right)}^{2},\\ &r\le {r}_{{{{\rm{cut}}}}},| z| \le {z}_{{{{\rm{cut}}}}}+{l}_{{{{\rm{bond}}}}}/2,\\ 0,&{{{\rm{otherwise}}}},\end{array}\right.\end{array}$$

where (z, r, θ) are the cylindrical coordinates of an environment atom (though θ is not used in this definition) and lbond is the length of the corresponding bond. Note that both fcut and \({f}_{{{{\rm{cut}}}}}^{{{{\rm{b}}}}}\) are rotation invariant, they will not influence the equivariance of the basis at all. Meanwhile, though the cylindrical curoff function \({f}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}\) is bond-dependent, it can be easily checked that it does no harm to rotation symmetry as well.

In our implementation, the on-site cutoff rcut is chosen to be 9.0 Å for Hon and the off-site bond cutoff is set to be 10.0 Å. We set \({r}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}={z}_{{{{\rm{cut}}}}}^{{{{\rm{e}}}}}=5.0\) Å for the off-site environment.

As noted above, we used correlation order ν = 0 for the offsite overlap SII since these blocks are not environment-dependent. For HII we used correlation order ν = 2 throughout, while for HIJ we tested correlation orders of both ν = 1 and ν = 2. The maximum polynomial degree was chosen on a case-by-case basis to control the balance between accuracy and transferability through a cross-validation procedure as discussed in more detail in the section “Methods” in the main text.

Prediction

The software implementation of our method follows the workflow illustrated in Fig. 1. The Julia packages ACE.jl50 and ACEhamiltonians.jl implement the general Atomic Cluster Expansion basis sets and the specialisation to fitting and predicting Hamiltonians, respectively. Given an input configuration R we use the scheme described above to predict \({\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{on}}}}}({{{\bf{R}}}}),{\tilde{{{{\boldsymbol{H}}}}}}_{{{{\rm{off}}}}}({{{\bf{R}}}})\) and \({\tilde{{{{\boldsymbol{S}}}}}}_{{{{\rm{off}}}}}({{{\bf{R}}}})\). We then assemble complete approximate Cartesian Hamiltonian and overlap matrices \(\tilde{{{{\boldsymbol{H}}}}}\) and \(\tilde{{{{\boldsymbol{S}}}}}\) from the predicted blocks. We can construct k-dependent variants and associated bandstructures via a standalone Julia implementation contained within the ACEhamiltonians.jl package.

Using either the reference or the predicted matrices we can solve the generalised eigenproblems of the form

$${{{\boldsymbol{H}}}}({{{\bf{k}}}}){\phi }_{i}={\epsilon }_{i}{{{\boldsymbol{S}}}}({{{\bf{k}}}}){\phi }_{i}$$
(49)
$$\tilde{{{{\boldsymbol{H}}}}}({{{\bf{k}}}}){\tilde{\phi }}_{i}={\tilde{\epsilon }}_{i}\tilde{{{{\boldsymbol{S}}}}}({{{\bf{k}}}}){\tilde{\phi }}_{i}$$
(50)

to obtain k-dependent band energies \({\epsilon }_{i},\tilde{{\epsilon }_{i}}\) and orbitals (eigenfunctions) \({\phi }_{i},{\tilde{\phi }}_{i}\) for the reference and predicted systems, respectively, where i = 1, …, Norb and in this work Norb = 14. Band structures, the density of states (DoS) and other derived quantities can be computed by post-processing the band energies following standard practices.

Transformation of H and S from real to reciprocal space representation

According to Bloch’s theorem, in crystal-periodic structures, the Hamiltonian and overlap matrices defined in terms of real-space atomic orbitals can be transformed into a block-diagonal form and solved via a set of Nk independent generalised eigenvalue problems where each block corresponds to a vector k within the reciprocal unit cell:

$${{{\bf{H}}}}({{{\boldsymbol{k}}}}){\psi }_{i{{{\boldsymbol{k}}}}}={\epsilon }_{i{{{\boldsymbol{k}}}}}{{{\bf{S}}}}({{{\boldsymbol{k}}}}){\psi }_{i{{{\boldsymbol{k}}}}}\quad i=1,2,\cdots$$
(51)

where ψνk are Bloch wave functions and H(k) and S(k) are Hamiltonian and overlap matrices defined in terms of a discrete crystal-periodic basis.

For this, we define crystal-periodic generalised basis functions χa,k from real-space basis functions as follows:

$${\chi }_{a{{{\boldsymbol{k}}}}}({{{\boldsymbol{x}}}})=\mathop{\sum}\limits_{N}\exp \{i{{{\boldsymbol{k}}}}\cdot {{{\bf{N}}}}{{{\bf{L}}}}\}{\chi }_{a}({{{\boldsymbol{x}}}}+{{{\bf{N}}}}{{{\bf{L}}}}).$$
(52)

In Eq. (52), L refers to the column matrix of lattice vectors and N = (N1, N2, N3) is an index vector that specifies the position of the unit cell (in multiples of the lattice vectors) in which orbital χa is located.

The matrix elements of H(k) and S(k), respectively, are constructed via

$${H}_{ab}({{{\boldsymbol{k}}}})=\left\langle {\chi }_{a{{{\boldsymbol{k}}}}}| \hat{H}| {\chi }_{b{{{\boldsymbol{k}}}}}\right\rangle =$$
(53)
$$\sum\limits_{N,N'} \exp \left\{ i {{\boldsymbol{k}}}\cdot\left({\mathbf{N}}'-{\mathbf{N}}\right)\cdot{\mathbf{L}}\right\} \mathop{\underbrace{{\left\langle \chi_{a,N'} \right|} {\hat{H}} {\left| \chi_{b,N} \right\rangle}}}\limits_{\substack{=H_{ab}({\mathbf{N}},{\mathbf{N}}')}}$$
(54)

and

$${S}_{ab}({{{\boldsymbol{k}}}})=\left\langle {\chi }_{a{{{\boldsymbol{k}}}}}| {\chi }_{b{{{\boldsymbol{k}}}}}\right\rangle =$$
(55)
$$\sum\limits_{N,N'} \exp{ \left\{ i{\boldsymbol{k}}\cdot ({\mathbf{N}}'-{\mathbf{N}})\cdot{\mathbf{L}}\right\} } \mathop{\underbrace{\left\langle \chi_{a,N'} | \chi_{b,N}\right\rangle}}\limits_{\substack{=S_{ab}\left({\mathbf{N}},{\mathbf{N}}'\right)}}$$
(56)

where \({H}_{ab}({{{\bf{N}}}},{{{\bf{N}}}}^{\prime} )\) and \({S}_{ab}({{{\bf{N}}}},{{{\bf{N}}}}^{\prime} )\) are as defined in Eqs. (6) and (7) for atomic orbitals defined in different unit cells N and \({{{\bf{N}}}}^{\prime}\).

In this work, we use this transformation to map the real-space matrices to arbitrarily dense k-grids as is common practice for localised basis sets such as atomic orbitals or maximally localised Wannier functions. We then calculate eigenvalues ϵνk at arbitrary points in reciprocal space to calculate converged electronic densities-of-state and band structures.

Equivariance of H IJ

For the real space Hamiltonian H(R), we decompose it as \({{{\bf{H}}}}({{{\bf{R}}}})={\left({{{{\boldsymbol{H}}}}}_{IJ}\right)}_{I,J = 1}^{{N}_{{{{\rm{atom}}}}}}\) (cf. Fig. 2). Denote \(a=(n,l,m;I):=(\alpha ;I),b=(n^{\prime} ,l^{\prime} ,m^{\prime} ;J):=(\beta ;J)\), we may then write

$${{{{\boldsymbol{H}}}}}_{IJ}^{\alpha \beta }({{{\bf{R}}}})=\langle {\chi }_{a}| \hat{H}| {\chi }_{b}\rangle .$$

In the definition of χa, the radial basis Rnl(r) is invariant under rotation and Ylm(Q(θ, ϕ)) can be expressed as linear combination of Ylμ(θ, ϕ), i.e.,

$${\chi }_{(n,l,m;I)}(Q{{{\boldsymbol{x}}}};Q{{{\bf{R}}}})=\mathop{\sum}\limits_{\mu }{D}_{\mu m}^{l}{\chi }_{(n,l,\mu ;I)}({{{\boldsymbol{x}}}};{{{\bf{R}}}}).$$
(57)

Here, χ is R-dependent since it is atom-centred. Besides,

$$\begin{array}{l}{{{{\boldsymbol{H}}}}}_{IJ}^{\alpha \beta }(Q{{{\bf{R}}}})\\ ={\int}_{{{\mathbb{R}}}^{3}}{\chi }_{a}{({{{\boldsymbol{x}}}};Q{{{\bf{R}}}})}^{* }{V}_{{{{\rm{eff}}}}}({{{\boldsymbol{x}}}},Q{{{\bf{R}}}}){\chi }_{b}({{{\boldsymbol{x}}}};Q{{{\bf{R}}}})d{{{\boldsymbol{x}}}}\\ ={\int}_{{{\mathbb{R}}}^{3}}{\chi }_{a}{(Q{{{\boldsymbol{x}}}};Q{{{\bf{R}}}})}^{* }{V}_{{{{\rm{eff}}}}}(Q{{{\boldsymbol{x}}}},Q{{{\bf{R}}}}){\chi }_{b}(Q{{{\boldsymbol{x}}}};Q{{{\bf{R}}}})d{{{\boldsymbol{x}}}}\\ ={\int}_{{{\mathbb{R}}}^{3}}{\chi }_{a}{(Q{{{\boldsymbol{x}}}};Q{{{\bf{R}}}})}^{* }{V}_{{{{\rm{eff}}}}}({{{\boldsymbol{x}}}},{{{\bf{R}}}}){\chi }_{b}(Q{{{\boldsymbol{x}}}};Q{{{\bf{R}}}})d{{{\boldsymbol{x}}}}.\end{array}$$
(58)

Combining Eqs. (57) and (58), we see immediately that

$${{{{\boldsymbol{H}}}}}_{IJ}(Q{{{\bf{R}}}})=D{(Q)}^{* }{{{{\boldsymbol{H}}}}}_{IJ}({{{\bf{R}}}})D(Q),$$
(59)

where

$$D(Q)={{{\rm{Diag}}}}({D}^{{l}_{1}}(Q),{D}^{{l}_{2}}(Q),\cdots \,),$$
(60)

and \({D}^{{l}_{i}}\) indicate the Wigner-D matrices.

Sometimes, the angular term in Eq. (5) is chosen to use real spherical harmonics rather than complex ones, i.e.,

$${\chi }_{a}({{{\boldsymbol{x}}}})={R}_{nl}(r){S}_{lm}(\theta ,\phi ),\,{{{\rm{and}}}}$$
(61)
$${S}_{lm}=\mathop{\sum}\limits_{m^{\prime} }{C}_{mm^{\prime} }{Y}_{lm},$$
(62)

where \(\{{C}_{mm^{\prime} }\}\) are the corresponding transforming coefficients. Equivalently, we may go through all possible indices m with respect to a fixedland obtain the following matrix form:

$${{{{\bf{S}}}}}_{l}({{{\bf{R}}}})={C}_{l}{{{{\bf{Y}}}}}_{l}({{{\bf{R}}}}).$$
(63)

In this case, the equivariance of HIJ simply follows, just with the varied equivariant matrix

$$\tilde{D}(Q)={{{\rm{Diag}}}}({\tilde{D}}^{{l}_{1}}(Q),{\tilde{D}}^{{l}_{2}}(Q),\cdots \,),$$
(64)

and \({\tilde{D}}^{{l}_{i}}(Q)={C}_{{l}_{i}}{D}^{{l}_{i}}(Q)\).