HInet: Generating Neutral Hydrogen from Dark Matter with Neural Networks


Published 2021 July 23 © 2021. The American Astronomical Society. All rights reserved.
Citation: Digvijay Wadekar et al. 2021 ApJ 916 42. DOI: 10.3847/1538-4357/ac033a


Abstract

Upcoming 21 cm surveys will map the spatial distribution of cosmic neutral hydrogen (H i) over very large cosmological volumes. In order to maximize the scientific return of these surveys, accurate theoretical predictions are needed. Hydrodynamic simulations are currently the most accurate tool to provide those predictions in the mildly to fully nonlinear regime. Unfortunately, their computational cost is very high: tens of millions of CPU hours. We use convolutional neural networks to find the mapping between the spatial distribution of matter from N-body simulations and that of H i from the state-of-the-art hydrodynamic simulation IllustrisTNG. Our model outperforms the widely used halo occupation distribution (HOD) model in all the statistical properties we consider, up to nonlinear scales k ≲ 1 h Mpc−1. Our method allows the generation of 21 cm mocks over very large cosmological volumes with properties similar to those of hydrodynamic simulations.


1. Introduction

Astronomical surveys have allowed us to quantify fundamental properties of the universe, such as its age, geometry, and expansion rate, as well as the amounts of dark matter (DM) and dark energy. Some of the largest surveys in the past have been spectroscopic galaxy surveys, which have mapped the universe at low redshifts. Future surveys aim to observe the universe at high z, both because the cosmic volume is larger and because theoretical predictions at high z are relatively easier, as the density field is more linear.

The traditional technique of taking spectra of individual galaxies becomes harder to apply at high z, as galaxies become fainter and sparser. One of the most promising alternative techniques for observing the high-z universe is intensity mapping (IM; Bharadwaj et al. 2001; Bharadwaj & Sethi 2001; Chang et al. 2008; Peterson et al. 2009; Pullen et al. 2014). The advantage of IM over the traditional techniques is that it does not rely on resolving point sources but instead measures the emission from many unresolved galaxies tracing the cosmic web in redshift space.

In this paper we focus our attention on IM of the 21 cm line from cosmic neutral hydrogen (hereafter H i). It is worth noting that 21 cm IM is not restricted to high z but is applicable over a wide range of redshifts (z = 0 to z ≃ 20). 21 cm surveys represent a different way to observe the universe, and they enable new cross-correlations with surveys at other wavelengths, which are very effective at mitigating systematic effects. Besides traditional bounds on cosmological parameters (Bull et al. 2015; Villaescusa-Navarro et al. 2017), 21 cm surveys can be used to improve our knowledge of the sum of neutrino masses (Villaescusa-Navarro et al. 2015a), warm DM (Carucci et al. 2015), modified gravity (Carucci et al. 2017), primordial non-Gaussianity (Karagiannis et al. 2020), and axion DM (Bauer et al. 2021), among many other things.

In this work, we focus our attention on the post-reionization regime (z < 6) of 21 cm IM, for which various radio surveys are planned or are already collecting data: the Canadian Hydrogen Intensity Mapping Experiment (CHIME), the Giant Metrewave Radio Telescope (GMRT), the Hydrogen Intensity and Real-time Analysis eXperiment (HIRAX), TIANLAI, the Five-hundred-meter Aperture Spherical Telescope (FAST), ASKAP, MeerKAT, PUMA, and the Square Kilometre Array (SKA).

One of the major aims of these surveys is to accurately constrain the value of the cosmological parameters. In order to achieve this, accurate theoretical predictions are needed to extract the cosmological information from the collected data. In the linear regime, these predictions are easy to obtain from analytical models and are accurate. However, there is a large amount of cosmological information that lies beyond the linear scales, particularly at low z. In this regime, one avenue to obtain such accurate predictions is from hydrodynamic simulations.

Current state-of-the-art hydrodynamic simulations have a very high computational cost, and they simulate a limited cosmological volume. For example, simulating the (75 h−1 Mpc)3 IllustrisTNG box required 18 million CPU hours (Nelson et al. 2019; Pillepich et al. 2018; Weinberger et al. 2017). In order to make robust predictions for upcoming astronomical surveys, we need to simulate much larger cosmological volumes, of the order of tens to hundreds of (Gpc h−1)3 (Modi et al. 2019a). Such large mock simulations would help us in various ways: (1) to study the effects of various observational systematics on the statistical properties of the tracers, (2) to obtain theoretical predictions for different cosmologies, (3) to determine which summary statistics are the most appropriate to constrain the value of different cosmological parameters, and (4) to quantify the cosmic variance in the surveys (i.e., to compute the covariance matrix). One way to simulate large H i volumes is to first generate DM fields using the relatively less expensive DM-only simulations. We then need quick and reliable methods to "paint" H i directly on the DM field. We now discuss some promising techniques in this regard.

1.1. Traditional Emulation Techniques

One of the most popular theoretical techniques used to make mock baryonic simulations is called the halo occupation distribution (HOD). HOD was first used to probabilistically model the number of galaxies residing in a host halo (Seljak 2000; Peacock & Smith 2000; Scoccimarro et al. 2001; Berlind & Weinberg 2002). The HOD technique assumes that the properties of various baryonic structures inside a halo are governed solely by the halo mass, and it ignores all other halo properties. The HOD technique therefore assumes a simple parametric relation between the halo mass and the baryonic properties and uses hydrodynamical simulations (and observations, if available) to calibrate the parameters in this relation. Recent applications of the HOD technique to H i have been in initial field reconstruction and testing the UV background effect on H i maps (Modi et al. 2019a, 2019b) based on the HOD model of Villaescusa-Navarro et al. (2018, hereafter VN18).

However, HOD ignores all environmental effects on the H i abundance and clustering. It is important to note that numerical simulations have shown that cosmological properties like the clustering of halos and galaxies are affected by properties other than halo mass, such as halo environment, halo concentration, spin, and velocity anisotropy (Wechsler et al. 2006; Dalal et al. 2008; Paranjape et al. 2018; Hadzhiyska et al. 2020). This phenomenon is referred to as assembly bias or secondary bias (Sheth & Tormen 2004; Gao et al. 2005).

Other techniques can be used to make mock baryonic simulations, such as subhalo abundance matching (SHAM) and semianalytic models (SAMs). SHAM assigns the highest H i masses to the most massive halos and so on down the mass ranking, but it relies on strong assumptions: that the most massive baryonic structures are hosted by the most massive halos and that the relation between baryonic and halo masses is monotonic and free of scatter. SAMs, on the other hand, use a set of simplified equations to model the key baryonic processes in the hydrodynamic simulations (see, e.g., Benson 2012).

It is worth mentioning that, apart from using simulations, there are also proposed perturbative forward-model techniques to evolve the tracer fields directly from linear initial conditions by using biasing schemes (Schmittfull et al. 2019; Modi et al. 2020), or modeling tracer fields by combining biasing schemes and N-body simulations (Sinigaglia et al. 2020).

1.2. Convolutional Neural Networks

Convolutional neural networks (CNNs) have recently been applied to numerous areas of physics research (Carleo et al. 2019) and have many potential applications in cosmology (Ravanbakhsh et al. 2017; Modi et al. 2018; Giusarma et al. 2019; He et al. 2019; Ntampaka et al. 2019; Zamudio-Fernandez et al. 2019; Zhang et al. 2019; Kodi Ramanah et al. 2020). An important property of CNNs that makes them useful for cosmological applications is that they are translationally equivariant. In this paper, we use a modified version of a neural network architecture called the U-net to generate mock 3D H i fields from a given DM field. Although we focus on H i in the post-reionization regime in this paper, it is worth mentioning some of the recent work on using neural networks to study the H i field during the epoch of reionization (Shimabukuro & Semelin 2017; Chardin et al. 2019; Gillet et al. 2019; Hassan et al. 2020; Kwon et al. 2020; List & Lewis 2020; Mangena et al. 2020; Villanueva-Domingo & Villaescusa-Navarro 2021).

Let us now briefly highlight some of the advantages of neural networks over the above traditional approaches. The HOD formalism typically assumes that the spatial H i density ρHI(x) at a particular point inside a halo only depends on the mass of the halo and the distance to its center:

$\rho_{\rm HI}({\bf x}) = f\left(M_{\rm halo},\, |{\bf x} - {\bf x}_{\rm halo}|\right).$  (1)

However, VN18 showed that including only the halo mass is not enough to precisely model the clustering of the H i field. A more general model for the H i field should also include information on the environment of the halo and can be roughly expressed as

$\rho_{\rm HI}({\bf x}) = g\left[\rho_m({\bf x}^{\prime})\right],$  (2)

where ${\rho }_{m}({\bf{x}}^{\prime} )$ is the matter density at points in the neighborhood of x. Neural networks are universal fitting functions (Hornik et al. 1989) and can be used to accurately approximate the function g in Equation (2); this is the goal of our paper.

The paper is organized as follows. In Section 2, we briefly describe the hydrodynamical simulations that we have used. We then present the specific architecture of our machine-learning model and the method used in Section 3. We discuss the parameters of our benchmark HOD model in Section 4. We present our results in Section 5. Finally, we discuss our results in Section 6 and conclude in Section 7.

2. Data

The data we use to train, validate, and test our network are obtained from the TNG100-1 simulation produced by the IllustrisTNG collaboration (Pillepich et al. 2018). That simulation is one of the current state-of-the-art hydrodynamical simulations and includes a wide range of relevant physical effects, such as radiative cooling, star formation, metal enrichment, supernova and AGN feedback, and magnetic fields. In this work, we choose to model the H i field at low redshift (z = 1), because modeling any baryonic field is more challenging at lower redshifts, where the density fluctuations are more nonlinear. We therefore expect our neural network method to perform even better at higher redshifts.

The side length of the simulated box is 75 h−1 Mpc. We first compute the H i density field by assigning H i masses of gas cells to a grid of 20483 cells using the cloud-in-cell (CIC) interpolation scheme. The spatial resolution of the DM and H i fields is therefore very high, ∼35 h−1 kpc. TNG provides both the hydrodynamical simulation output and the computationally cheaper DM-only simulation (TNG100-1-DM), evolved from the same initial conditions. Our goal is to train a neural network such that it can produce the H i field from the DM-only simulation. The network performs the mapping in 3D, at a fixed redshift.
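For concreteness, a minimal sketch of CIC mass assignment onto a periodic grid is shown below (the function and variable names are ours, not those of the TNG analysis pipeline):

```python
import numpy as np

def cic_assign(pos, mass, box_size, n_grid):
    """Deposit particle masses onto a periodic 3D grid with cloud-in-cell weights.

    pos  : (N, 3) particle positions, same units as box_size
    mass : (N,) particle (here: gas-cell H i) masses
    """
    field = np.zeros((n_grid,) * 3)
    u = pos / box_size * n_grid          # positions in grid units
    i0 = np.floor(u).astype(np.int64)    # lower grid point of the enclosing cell
    d = u - i0                           # fractional offset within the cell
    for dx in (0, 1):
        wx = d[:, 0] if dx else 1.0 - d[:, 0]
        ix = (i0[:, 0] + dx) % n_grid
        for dy in (0, 1):
            wy = d[:, 1] if dy else 1.0 - d[:, 1]
            iy = (i0[:, 1] + dy) % n_grid
            for dz in (0, 1):
                wz = d[:, 2] if dz else 1.0 - d[:, 2]
                iz = (i0[:, 2] + dz) % n_grid
                np.add.at(field, (ix, iy, iz), mass * wx * wy * wz)
    return field

# e.g., a 2048^3 grid for the 75 Mpc/h TNG box:
# rho_HI = cic_assign(gas_pos, gas_MHI, 75.0, 2048)
```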

2.1. Data Preprocessing

The overdensity in the H i field, ${\delta }_{\mathrm{HI}\,}={\rho }_{\mathrm{HI}}/{\bar{\rho }}_{\mathrm{HI}}-1$, varies in the TNG100-1 simulation across ∼9 orders of magnitude. Because the resolution of the TNG100-1 simulation is much higher than the one expected from upcoming surveys, we smooth the H i data with a top-hat filter with a smoothing radius of 300 h−1 kpc. This has a twofold advantage: First, the grid resolution for the H i field is lowered to 140 h−1 kpc, which reduces the size of the data set. Second, the dynamical range over which the H i density field varies is reduced: δHI varies over three orders of magnitude, and this reduces the sparsity problem, which we later discuss in Section 3. Note that we did not change the resolution of the input DM field, in order to use as much information in the DM field as possible. Because the training of deep learning models is facilitated when the input data are in the ${ \mathcal O }(1)$ range, we further perform the scaling:

$\tilde{\delta} = \left(1 + \delta\right)^{\gamma}, \qquad \gamma < 1,$  (3)

where δ stands for the smoothed H i field (and analogously for the DM field), and the exponent γ < 1 compresses the high-density tail. The above rescaling gets both ${\tilde{\delta }}_{\mathrm{HI}\,}$ and ${\tilde{\delta }}_{\mathrm{DM}}$ to be nearly in the range [0, 3]. We used a power law instead of a logarithm because the power-law distribution has a flatter tail for high values of δHI. We discuss why having a flatter tail is important in Section 3.2.
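The preprocessing pipeline can be summarized with the following sketch, where tophat_smooth applies the smoothing window in Fourier space (one standard way of applying a spherical top-hat), downsample lowers the H i grid resolution, and gamma stands for the power-law exponent of Equation (3). The helper names and the numerical value of gamma below are placeholders, not the published choices:

```python
import numpy as np

def tophat_smooth(field, box_size, radius):
    """Smooth a periodic 3D field with a spherical top-hat of the given radius."""
    n = field.shape[0]
    d = box_size / n
    kx = 2 * np.pi * np.fft.fftfreq(n, d=d)
    kz = 2 * np.pi * np.fft.rfftfreq(n, d=d)          # rfft layout for the last axis
    kr = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                 + kz[None, None, :]**2) * radius
    kr[0, 0, 0] = 1.0                                  # avoid 0/0 at the DC mode
    W = 3.0 * (np.sin(kr) - kr * np.cos(kr)) / kr**3   # top-hat window in k space
    W[0, 0, 0] = 1.0                                   # W(k -> 0) = 1
    return np.fft.irfftn(np.fft.rfftn(field) * W, s=field.shape)

def downsample(field, f=4):
    """Average f^3 fine cells into one coarse cell (2048^3 -> 512^3 for f = 4)."""
    n = field.shape[0] // f
    return field.reshape(n, f, n, f, n, f).mean(axis=(1, 3, 5))

def rescale(rho, gamma=1.0 / 6.0):
    """Power-law compression of Equation (3); gamma is a placeholder value."""
    delta = rho / rho.mean() - 1.0
    return (1.0 + delta) ** gamma

# rho_HI_coarse = downsample(tophat_smooth(rho_HI, 75.0, 0.3))   # radius 0.3 Mpc/h
# y = rescale(rho_HI_coarse); x = rescale(rho_DM)
```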

3. Methods

3.1. Choice of Network Architecture

The deep neural network architecture used in this paper is inspired by the Deep Density Displacement Model (D3M) of He et al. (2019). D3M is a generalization of the standard U-net, which was first proposed by Ronneberger et al. (2015) for medical imaging applications. Convolutional neural networks like the U-net naturally provide properties that are relevant for our problem, such as translational equivariance. Variations of the D3M model have been used for large-scale structure applications like learning galaxy modeling and neutrino effects in cosmology (Giusarma et al. 2019; Yip et al. 2019; Zhang et al. 2019). The network architecture we use in this work is shown in Figure 1, and further details are presented in the Appendix.


Figure 1. This scheme shows the architecture we use to find the mapping from DM to H i in 3D. Labels represent [number of channels × (number of voxels)3]. Further details can be found in the Appendix. Notice that the volume of the H i output is 1/8 of the input DM volume, which enables the inclusion of some of the nonlocal (environmental) information in predicting the H i field.


3.2. Challenge of Data Sparsity

Let us discuss an important challenge we face when working with H i data from simulations. The most relevant summary statistics for 21 cm IM, e.g., the H i power spectrum or the H i probability density function (pdf), are dominated by the voxels with the highest H i density. The reason is that 21 cm IM is sensitive to the mass-weighted H i, rather than the volume-weighted H i probed by, e.g., the Lyα forest. Unfortunately, those dense voxels are rare in the simulation. For instance, there are ∼105 halos with Mhalo ≥ 1010 h−1 M⊙ in our data set, which translates into only a very small subset (∼1 in 103) of the voxels in our training sample having a nonnegligible H i density (see the distribution of the voxel H i masses in Figure 7). Because of such sparsity in our data distribution, our model could easily achieve a high accuracy by predicting only the low-mass H i voxels and ignoring the high-mass ones; this makes our model harder to train.


Figure 2. The top row shows the projected H i density field at z = 1 from a region of (48 Mpc h−1)3 for the labeled cases. The bottom left and right panels show the residuals between the IllustrisTNG simulation and the H i fields obtained from the U-net and the HOD method, respectively. The color scale is shared within each row. The residuals for the U-net are smaller than those for the HOD in areas of high H i density.


Modeling the fields in Lagrangian space rather than Eulerian space can in principle reduce the sparsity problem. This is because in Lagrangian space we use the displacement field, which is distributed over a larger region of space, as compared to the density field, which is largely concentrated inside the halo boundaries (He et al. 2019). However, modeling the H i field in Lagrangian space is not possible because, unlike for DM, the number of gas particles is not fixed in the simulation.

A similar sparsity challenge exists when predicting galaxy positions from a 3D DM field. Some previous neural-network-based studies have tried to overcome this challenge by using a combination of two neural networks (a two-phase model; Modi et al. 2018; Yip et al. 2019; Zhang et al. 2019): the first phase predicts the halo/galaxy positions, and the second phase predicts the halo mass/number of galaxies. However, H i, unlike galaxies, is scattered over a wide volume of the universe, not just at the centers of large DM halos. Ignoring the low-mass voxels would remove the Lyα forest from our data, which is not ideal, as the Lyα forest is a powerful cosmological probe by itself, and its contribution to the 21 cm signal becomes more important at high redshift (VN18).

We therefore implement a different kind of two-phase model: the first (second) network is geared toward predicting the low-mass (high-mass) H i voxels. We have used the same U-net architecture shown in Figure 1 for each of the two phases of our model. We provide further details of the two-phase model in the Appendix.

3.3. Training the Network

As the memory of GPUs is limited, we split the 75 h−1 Mpc TNG100 volume into smaller subboxes. We train the U-net to predict H i boxes of side length ∼1.17 h−1 Mpc containing 83 voxels using input DM boxes of side length ∼2.34 h−1 Mpc and 643 voxels. As illustrated in Figure 1, we want to predict the H i in the subbox residing at the center of a larger DM input. The motivation for using the larger DM box is to provide environmental information for voxels near the boundaries of the H i volume.
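The following sketch illustrates how such input–target pairs can be cut from the full grids, assuming (as in Section 2) a fine 2048³ DM grid and a coarse 512³ H i grid, so that one coarse voxel spans four fine voxels; the helper names are ours:

```python
import numpy as np

F = 4          # one coarse H i voxel spans F fine DM voxels
HI_SUB = 8     # H i target: 8^3 voxels (~1.17 Mpc/h)
DM_SUB = 64    # DM input: 64^3 voxels (~2.34 Mpc/h)
PAD = (DM_SUB - F * HI_SUB) // 2   # extra DM context per side (16 voxels)

def get_pair(dm, hi, i, j, k):
    """Cut one (DM input, H i target) training pair; (i, j, k) is the
    coarse-grid corner of the 8^3 H i target, and the DM box adds PAD voxels
    of environment on every side, wrapping periodically at the box edge."""
    y = hi[i:i + HI_SUB, j:j + HI_SUB, k:k + HI_SUB]
    lo = np.array([i, j, k]) * F - PAD
    idx = [np.arange(l, l + DM_SUB) % dm.shape[0] for l in lo]
    x = dm[np.ix_(*idx)]
    return x, y
```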

We obtain ∼2.5 × 105 nonoverlapping H i subboxes when splitting the TNG100-1 volume. We divide these subboxes into three chunks: ∼60% of the subboxes are used for training the network, 12.5% for validation, and 27.5% for testing the network. We have constructed the test set such that it comprises all the subboxes that correspond to a larger box of side length ∼48 h−1 Mpc. The total number of trainable parameters in the U-net shown in Figure 1 is 2.1 × 107, which, although seemingly gigantic, can be optimized with gradient descent, using automatic differentiation to compute the gradients. The gradients are calculated based on the following loss function:

$L = \sum_{i\,\in\,\mathrm{voxels}} \left|\tilde{\delta}^{\,\mathrm{truth}}_{i} - \tilde{\delta}^{\,\mathrm{pred}}_{i}\right|^{\alpha} \left(1 + \tilde{\delta}^{\,\mathrm{truth}}_{i}\right)^{\beta},$  (4)

where $\tilde{\delta }$ is the scaled H i density field from Equation (3). Notice that our loss function is different from the traditional mean square error (MSE) loss; we use the additional hyperparameters α and β to add more weight to the high-mass voxels in order to alleviate the aforementioned sparsity problem. We find that α = 2 and β = 0.7 give the best results. We provide further details on the training of the network in the Appendix. We have also made use of the techniques of data augmentation and imbalanced sampling to tackle the sparsity problem, and we provide further details in Appendix A.3.
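In code, a loss of this form could look as follows; the exact placement of α and β follows our reading of Equation (4) above, so treat the weight factor as illustrative rather than as the verbatim published expression:

```python
import torch

def weighted_loss(pred, truth, alpha=2.0, beta=0.7):
    """MSE-like error |pred - truth|^alpha, up-weighted for high-density
    voxels by an assumed factor (1 + truth)^beta. The mean (rather than a
    sum) over voxels only rescales the effective learning rate."""
    err = torch.abs(pred - truth) ** alpha
    weight = (1.0 + truth) ** beta
    return torch.mean(err * weight)
```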

4. Benchmark Model: Halo Occupation Distribution

We will compare the results of our neural network against a benchmark model, which we describe in detail in this section.

There have been multiple recent attempts at developing a halo model for the abundance and spatial distribution of H i (Villaescusa-Navarro et al. 2014; Castorina & Villaescusa-Navarro 2017; Padmanabhan et al. 2016; Villaescusa-Navarro et al. 2018; Spinelli et al. 2020). The main idea behind those models is that most of the H i mass in the post-reionization era is inside halos: more than 99% at z < 0.2 (the fraction decreases to 88% at z = 0.5; VN18). We will use the halo model of VN18 as a benchmark to compare the performance of the neural network. We briefly describe their model here and refer the reader to VN18 for further details. The first step to produce 3D H i density fields through the HOD method consists of running a DM-only simulation and identifying halos, saving their positions, masses, and radii. A DM halo of friends-of-friends (FOF) mass M is then assigned an H i mass:

$M_{\rm HI}(M) = M_0 \left(\frac{M}{M_{\min}}\right)^{\alpha} \exp\left[-\left(\frac{M_{\min}}{M}\right)^{0.35}\right],$  (5)

where M0 is a normalization factor, α is the power-law slope, and Mmin is the characteristic minimum mass of halos that host H i. These three parameters were fitted to reproduce the results of the TNG100-1 simulation by VN18, and their best-fit values at z = 1 are M0 = 1.5 × 1010 h−1 M⊙, ${M}_{\min }=6\,\times {10}^{11}\,{h}^{-1}\,{M}_{\odot }$, and α = 0.53. Given the total H i mass inside a halo, the HOD will provide its spatial distribution within the halo, i.e., its H i density profile. In small halos, H i is typically localized in their inner regions. For groups and galaxy clusters, the central region of the halo is typically H i poor, due to the action of processes such as AGN feedback, ram pressure, and tidal stripping. Therefore, VN18 fitted a simple power law with an exponential cutoff on small scales given by

$\rho_{\rm HI}(r) = \frac{\rho_0}{r^{\alpha_*}} \exp\left(-r_0/r\right)$  (6)

for the H i density profile, where ρ0 is a normalization constant. We implement this density profile by assigning 200 particles to each halo following Equation (6). For simplicity, we adopt α* = 3 and r0 = 1 h−1 kpc for all halo masses (VN18). Note that if we do not include the one-halo term, the H i power spectrum becomes dominated by shot noise at k ∼ 1 h Mpc−1 (VN18). It is important to note that the HOD parameters have not been tuned to match summary statistics like the power spectrum (as is typically done in galaxy survey data analysis); rather, the HOD is fitted to the MHI versus Mhalo relation from the TNG data (see Figure 3 of VN18). Our goal here is to test the HOD model and compare its performance with our neural network approach.
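The HOD painting procedure of this section can be condensed into a short sketch (best-fit parameter values from VN18 at z = 1; the inverse-CDF sampler and all helper names are our own construction, not the VN18 code):

```python
import numpy as np

def m_hi(M, M0=1.5e10, Mmin=6e11, alpha=0.53):
    """H i mass-halo mass relation of Equation (5); masses in h^-1 Msun."""
    return M0 * (M / Mmin) ** alpha * np.exp(-((Mmin / M) ** 0.35))

def sample_profile(n_part, r_halo, alpha_star=3.0, r0=1e-3):
    """Draw n_part radii (in Mpc/h; r0 = 1 kpc/h = 1e-3 Mpc/h) from the
    profile of Equation (6), truncated at the halo radius, via a tabulated
    inverse CDF."""
    r = np.logspace(np.log10(r0 / 10), np.log10(r_halo), 512)
    pdf = r ** (2.0 - alpha_star) * np.exp(-r0 / r)   # includes the r^2 volume factor
    cdf = np.cumsum(pdf * np.gradient(r))
    cdf /= cdf[-1]
    return np.interp(np.random.rand(n_part), cdf, r)

def paint_halo(center, M_halo, r_halo, n_part=200):
    """Place n_part equal-mass H i tracer particles around a halo center."""
    r = sample_profile(n_part, r_halo)
    u = np.random.randn(n_part, 3)
    u /= np.linalg.norm(u, axis=1, keepdims=True)     # isotropic directions
    pos = center + r[:, None] * u
    mass = np.full(n_part, m_hi(M_halo) / n_part)
    return pos, mass
```

The resulting particles can then be deposited onto a grid with the same CIC routine sketched in Section 2.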


Figure 3. Comparison between the H i power spectrum (left) and the cross-correlation coefficient, defined as ${r}_{(A)}={P}_{A-\mathrm{Illustris}}/\sqrt{{P}_{A}{P}_{\mathrm{Illustris}}}$ (right), for the IllustrisTNG simulation (orange), the HOD (green), and the neural network (black). The network outperforms the HOD in both statistics on all scales.


While the HOD performs well in high-density H i regions of the universe like the damped Lyα absorbers, it is expected to perform poorly for systems with low H i density like the Lyα forest. Other drawbacks of the HOD are that it relies on simplistic parameterizations, as in Equation (5), and assumes a spherical distribution of H i within halos. More importantly, the only information used to predict MHI is the halo mass, and all other properties, such as the environment of the halo and its concentration, are ignored; we will return to this point in Section 6.

5. Results

In this section we present the results of our neural network and its comparison with the HOD model.

We have reserved the subcubes corresponding to a larger cube of side 48 h−1 Mpc in the IllustrisTNG simulation volume for testing our network. Once our network is trained, we concatenate the generated H i field corresponding to all the subcubes and use the larger cube to compare the summary statistics. A possible concern could be that the stacking of the individual H i boxes to make a larger box can lead to spurious edge effects in the summary statistics, but we have checked that such effects are negligible; see Appendix A.4 for further details.

We first show a visual comparison of the network output in Figure 2. In the bottom panels we have averaged the absolute values of the differences between the fields along the projection axis. We now discuss multiple summary statistics and find that our network outperforms the HOD up to the nonlinear scales k ≲ 1 h Mpc−1 in all the statistics.


Figure 4. Comparison of the H i−galaxy cross-power (top), its relative error (middle), and the H i−galaxy cross-correlation coefficient (bottom), defined as ${r}_{\mathrm{HI}-\mathrm{Galaxy}}={P}_{\mathrm{HI}-\mathrm{Galaxy}}/\sqrt{{P}_{\mathrm{HI}}{P}_{\mathrm{Galaxy}}}$. The line labels are the same as in Figure 3. The network outperforms the HOD up to the nonlinear scales (k ≲ 1 h Mpc−1).


5.1. HI Power Spectrum

The most widely used summary statistic in cosmology is the power spectrum, which is the Fourier transform of the two-point correlation function. In 21 cm IM, the quantity that is directly observed is the 21 cm power spectrum, which is related to the HI power spectrum via

$P_{21\,\mathrm{cm}}(k, z) = \bar{T}_b^{\,2}(z)\, P_{\rm HI}(k, z),$  (7)

where ${\bar{T}}_{b}$ is the mean brightness temperature of the 21 cm line at redshift z. Let us now compare the two terms on the right-hand side separately. The mean brightness temperature scales as ${\bar{T}}_{b}\propto {{\rm{\Omega }}}_{\mathrm{HI}}$, where ΩHI is the ratio between the density of H i at redshift z and the universe's critical density at z = 0. For our test set, the network and the HOD predict 104 × ΩHI(z = 1) = 5.77 and 6.43, respectively, while the IllustrisTNG value is 5.82.

The second term to compare in Equation (7) is PHI(k), and we show the results from the different methods in Figure 3. We find that the network is able to reproduce the H i power spectrum of the simulation to within ≲5% all the way to nonlinear scales k ≲ 1 h Mpc−1, whereas the HOD deviates by up to ∼20% at low k, although it becomes slightly more accurate at high k. The increase in the HOD accuracy at high k arises mainly as a result of the one-halo term given in Equation (6). It is important, however, to note that the one-halo term in Equation (6) only aids the accuracy of the power spectrum (the amplitude of fluctuations) at high k but does not accurately model the phases of the H i fluctuations, which are relevant for the cross-correlation coefficient rHI (compared in the right panel of Figure 3). VN18 argued that one possible cause of the discrepancy in the HOD power spectrum at low k may be the fact that they do not explicitly fit the total ΩHI in the simulation volume, which would change the H i bias. However, even if the bias is changed, their rHI should remain unchanged.

We do not show error bars arising from sample variance in the testing volume because both the DM and the H i fields from IllustrisTNG are evolved from the same initial conditions. We chose the particular range of k in Figure 3 because of the two following constraints: the low-k limit is set by the largest mode in the test set (making an even larger test set is possible, but at the expense of reduction in the training data), and the high-k limit is conservatively set such that the scales affected by the smoothing of the H i maps (∼5 × 300 h−1 kpc) are removed (see Figure 9 for PHI plotted until k = 10 h Mpc−1).
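All the power spectra and cross-correlation coefficients compared in this section can be measured with a simple FFT-based estimator of the form sketched below. This is a simplified stand-in for what packages such as Pylians implement (mode-counting subtleties of the real-FFT layout are ignored here):

```python
import numpy as np

def power_spectrum(delta_a, delta_b, box_size, n_bins=30):
    """Binned cross-power P_ab(k) of two periodic overdensity fields on the
    same grid; pass delta_b = delta_a for the auto-power spectrum."""
    n = delta_a.shape[0]
    fa, fb = np.fft.rfftn(delta_a), np.fft.rfftn(delta_b)
    pk_modes = (fa * np.conj(fb)).real * box_size**3 / n**6   # per-mode power
    kf = 2 * np.pi / box_size                                 # fundamental mode
    kx = np.fft.fftfreq(n, d=1.0 / n) * kf
    kz = np.fft.rfftfreq(n, d=1.0 / n) * kf
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                   + kz[None, None, :]**2)
    edges = np.linspace(kf, kmag.max(), n_bins + 1)
    ibin = np.digitize(kmag.ravel(), edges) - 1
    k_mean, p_mean = np.zeros(n_bins), np.zeros(n_bins)
    for b in range(n_bins):
        sel = ibin == b
        if sel.any():
            k_mean[b] = kmag.ravel()[sel].mean()
            p_mean[b] = pk_modes.ravel()[sel].mean()
    return k_mean, p_mean

# cross-correlation coefficient of Figure 3:
# r = p_ab / np.sqrt(p_aa * p_bb)
```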


Figure 5. H i−halo cross-correlation coefficient (${r}_{\mathrm{HI}-\mathrm{Halo}}={P}_{\mathrm{HI}-\mathrm{Halo}}/\sqrt{{P}_{\mathrm{HI}}{P}_{\mathrm{Halo}}}$) for multiple halo mass bins. Line labels are the same as in Figure 3. The network consistently outperforms the HOD up to nonlinear scales (k ≲ 1 h Mpc−1).


It is worth noting that the accuracy of PHI(k) depends on the modeling of H i in high-mass halos (see, e.g., Figure 8). As mentioned in Section 3.2, the training of our network is challenging at the high-mass end because of the sparsity associated with the abundance of voxels in that regime. Our results can be further improved if we train the U-net on a simulation with a larger volume, which would have a larger number of high-mass halos, or on a set of zoom-in simulations focused on galaxy clusters (Thiele et al. 2020).


Figure 6. The left panel shows the comparison of the bispectrum as a function of side length for the equilateral triangle configuration. The right panel shows the bispectrum when the angle between two particular sides of the triangle is varied, keeping the length of those sides k1, k2 fixed. Line labels are the same as in Figure 3. Note that the network is able to reproduce the bispectrum better than HOD for nearly all triangle shapes and sizes.


Let us now estimate, very crudely, the accuracy of the power spectrum model required for future 21 cm surveys using Fisher forecasts from the literature. Obuljen et al. (2018) showed that the peak signal-to-noise ratio (S/N) could be ∼10 for upcoming surveys like HIRAX and CHIME (note that the S/N will be much larger for future surveys like SKA), and therefore the accuracy of a method to produce 21 cm mocks should roughly be ≲10% for the power spectrum.

5.2. Cross-correlation with Galaxies and Halos

Large regions of future H i surveys will overlap with regions sampled by galaxy spectroscopic surveys like DESI or Euclid. One of the most important summary statistics in such overlapping regions is the H i−galaxy cross-power PHI − Galaxy. Using cross-correlations will boost the S/N and mitigate the effects of foregrounds (e.g., Villaescusa-Navarro et al. 2015b predict the S/N for the cross-correlation between future 21 cm surveys and Lyman break galaxies to be larger than that of the 21 cm autocorrelation by a factor of ∼10 (∼4) on large (small) scales; see also Padmanabhan et al. 2020; Modi et al. 2021). Furthermore, unlike the auto-power spectrum of 21 cm, which is yet to be detected, there have already been multiple detections of the PHI − Galaxy signal (Chang et al. 2010; Masui et al. 2013; Anderson et al. 2018).

In Figure 4, we compare PHI − Galaxy from the different approaches. We have included all the galaxies in the TNG100-1 sample with stellar mass M* > 1010 M⊙ (which roughly corresponds to a number density of n = 10−3 h3 Mpc−3) for this measurement. Similar to PHI(k) in Figure 3, PHI − Galaxy exhibits a bias for the HOD at low k.


Figure 7. Comparison of the 1D H i pdf (left) and the VSF (right). Line labels are the same as in Figure 3. As there are very few high-mass halos in our test set volume, the high-mass tail of the pdf is dominated by sampling noise. Both our network and the HOD reproduce the above statistics to good accuracy.


Another important statistic along the same lines is the cross-correlation coefficient of H i with halos, which is more sensitive to the way the H i mass is distributed across halos and to the one-halo term. We see in Figure 5 that the U-net outperforms the HOD up to k ≲ 1 h Mpc−1 for all the halo mass bins that we have considered. It is worthwhile to note that if we extend Figures 4 and 5 to k > 4 h Mpc−1, the green HOD curve diverges farther away from the orange IllustrisTNG curve. It is therefore likely a coincidence that the HOD outperforms the U-net for 1 h Mpc−1 ≲ k ≲ 4 h Mpc−1, as the green curve is merely crossing the orange curve to the other side. However, we do not show the scales k > 4 h Mpc−1, as they are affected by the smoothing of the H i maps, and a high-resolution test has to be performed to be definitive.


Figure 8. Comparison of the histogram (left) and the power spectrum (right) when using only a single U-net and when using a combination of two U-nets (two-phase). The combination is designed to outperform a single U-net for the high-mass H i voxels.


5.3. HI Bispectrum

If all the fluctuations in the universe were perfectly Gaussian, the field could be perfectly characterized by its two-point correlation function or its power spectrum. However, even if the primordial fluctuations were Gaussian, late-time gravitational clustering causes significant leakage of information from the power spectrum into higher-order statistics in the nonlinear regime (Scoccimarro et al. 1999; Takada & Jain 2004; Villaescusa-Navarro et al. 2020; Wadekar & Scoccimarro 2020). To recover this information, the lowest-order statistic that one needs to compute in Fourier space is the bispectrum. Unlike the power spectrum, the bispectrum is sensitive to the shape of the structures generated by gravitational instability and holds promise for breaking important degeneracies between the bias and cosmological parameters (Scoccimarro 2000; Sefusatti et al. 2006; Chudaykin & Ivanov 2019; Yankelevich & Porciani 2019; Hahn et al. 2020; Hahn & Villaescusa-Navarro 2021; Kamalinejad & Slepian 2020). The post-reionization 21 cm signal is expected to have significant information in the nonlinear regime, and there have been recent attempts at theoretical modeling of the H i bispectrum (Sarkar et al. 2019). One of the toughest parts of a bispectrum analysis is the calculation of the error due to cosmic variance, and a fast technique to generate mock H i fields is therefore essential.

In Figure 6 we show the HI bispectrum of the IllustrisTNG simulation, together with the predictions of the HOD and neural network. We show two particular cases (which are representative of all possible triangle configurations): the first case is for equilateral triangles with different side lengths, and the second case is for triangles with a varying angle between two of its sides whose lengths are kept fixed. We checked that the results are similar for other triangle configurations. The residuals of the bispectrum of equilateral triangles from the network compared to the target are ≲20% for all scales we compared (0.2 h Mpc−1 < k < 4 h Mpc−1), and the residuals for HOD, on the other hand, are ≲45%.
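For reference, the equilateral configurations can be measured with the standard FFT trick of summing shell-filtered fields in real space and normalizing by the number of closed triangles. The sketch below is our own illustration of that estimator, not the measurement code used for Figure 6:

```python
import numpy as np

def equilateral_bispectrum(delta, box_size, k_centers, dk):
    """Estimate B(k, k, k): for each k bin, inverse-transform the density
    restricted to a spherical k shell, sum the cube of that field over the
    box, and normalize by the analogous sum for pure shells, which counts
    the closed triangle configurations."""
    n = delta.shape[0]
    kf = 2 * np.pi / box_size
    kx = np.fft.fftfreq(n, d=1.0 / n) * kf
    kmag = np.sqrt(kx[:, None, None]**2 + kx[None, :, None]**2
                   + kx[None, None, :]**2)
    dk_grid = np.fft.fftn(delta)
    result = []
    for kc in k_centers:
        shell = (np.abs(kmag - kc) < 0.5 * dk).astype(float)
        d_s = np.fft.ifftn(dk_grid * shell).real   # shell-filtered density
        i_s = np.fft.ifftn(shell).real             # shell indicator field
        result.append(np.sum(d_s**3) / np.sum(i_s**3) * box_size**6 / n**9)
    return np.array(result)
```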


Figure 9. Same as the left panel of Figure 3, but extended to high-k regions.


5.4. 1D Probability Distribution Function

The upcoming H i surveys will observe systems ranging from low column densities (Lyα forest) to very high column densities (damped Lyα absorbers); the pdf of H i is a statistic that is sensitive to its distribution over this wide density range. The H i pdf can also be used for probing non-Gaussian information and for constraining luminosity functions (Breysse et al. 2017; Leicht et al. 2019). We show in Figure 7 the comparison of the U-net prediction over four orders of magnitude in H i voxel mass. The comparison is difficult at the high-mass end owing to sample variance: that regime is dominated by very massive halos, which are rare. Note that there is a slight discrepancy in the HOD approach for low H i mass voxels, as it does not take into account filaments and also misses very low mass halos that are below the simulation resolution threshold.

5.5. Abundance of HI Voids

Voids are the most underdense regions of the universe. In Figure 7 we show the void size function (VSF) of the HI field, which is defined as the number density of HI voids as a function of radius. We have used the algorithm described in Banerjee & Dalal (2016) to identify voids. The VSF is an important statistic, as it contains complementary information to the one from traditional clustering observables.

6. Discussion

Let us now briefly compare our method to some other neural network approaches for emulating hydrodynamic simulations. One difference from Tröster et al. (2019), Yip et al. (2019), Zamudio-Fernandez et al. (2019), and Zhang et al. (2019) is that our method can produce a hydrodynamic field of arbitrarily large size (it is not restricted to the size of the input simulation box used for training the network). Zamudio-Fernandez et al. (2019) also focus on modeling H i and used a generative adversarial network (GAN) to generate 3D samples of the H i field at redshift z = 5 on very small scales: between 35 h−1 kpc and 2.34 h−1 Mpc. However, their method cannot model the H i fluctuations on large scales, which are relevant for 21 cm experiments. Our method, on the other hand, can model H i on all scales larger than 0.3 h−1 Mpc.

Our network takes ∼1.8 hr to generate an H i box of side 100 h−1 Mpc from a given DM box on a single GPU (for comparison, the IllustrisTNG simulation takes tens of millions of CPU hours for an equivalent volume; Nelson et al. 2019). Our method is therefore capable of making gigaparsec volume mock H i fields.

It is worth mentioning some of the caveats of using deep neural networks to make mock cosmological simulations. We have trained our model to emulate a particular IllustrisTNG simulation that has fixed values of the parameters for cosmology and for various baryonic feedback prescriptions. The current model also uses a high-resolution input DM field. One should be very careful with extrapolating any machine-learning model beyond the range of data that it has been trained on (Pfeffer et al. 2019). It is not obvious whether, without recalibration, our model can emulate a simulation with a different cosmology or baryonic feedback prescription or can work with a lower-resolution input DM field. On the upside, however, our model takes only a couple of days of training to emulate a given simulation. Our technique is flexible and can quickly learn to emulate future hydrodynamic simulations, which will be better than the current ones because of better technology and more observational data.

Let us discuss one interesting direction to be explored in future work. Studies have shown that the distribution of baryons inside halos is affected by their assembly history (e.g., the halo formation time; Jiang & van den Bosch 2017). Because the U-net is very flexible regarding the dimensionality of the input field, we could take the halo history information into account by including multiple DM snapshots at different redshifts as input to our U-net. This way the U-net would be able to approximate a function f of an even more general form than the one presented in Equation (2):

$\rho_{\rm HI}({\bf x}, t) = f\left[\rho_m({\bf x}^{\prime}, t^{\prime})\right], \qquad t^{\prime} \leq t,$  (8)

which implies that to predict the H i field at a particular point in space and time (x, t), the information should arise not only from the spatial vicinity x' but also from its time evolution t' ≤ t.
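In practice, only the input layer of the network would need to change: the DM snapshots at several redshifts can simply be stacked as channels of the input tensor. A toy sketch (the shapes, redshifts, and channel count are illustrative):

```python
import torch
import torch.nn as nn

# Three hypothetical DM snapshots (say z = 2, 1.5, 1) stacked as input channels.
snapshots = [torch.randn(1, 64, 64, 64) for _ in range(3)]
x = torch.cat(snapshots, dim=0).unsqueeze(0)   # shape: (1, 3, 64, 64, 64)
conv_in = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=3, padding=1)
h = conv_in(x)   # the rest of the U-net would proceed unchanged
```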

7. Conclusions

Multiple upcoming radio telescopes such as CHIME, HIRAX, and SKA will be able to map the 21 cm emission from cosmic H i in the post-reionization universe. Mock H i fields spanning gigaparsec volumes are needed to provide theory predictions in the nonlinear regime, to compute covariance matrices, and to evaluate the effect of observational systematics like foregrounds, among many other things.

We use a deep convolutional neural network to find the mapping between the 3D fields of DM (from an N-body simulation) to H i (from the IllustrisTNG hydrodynamic simulation). We compared the results of our network against a state-of-the-art HOD benchmark. We show that the neural network outperforms the results of the HOD in all the summary statistics considered: power spectrum, cross-correlation coefficient, bispectrum, pdf, and VSF. While the HOD method neglects any environmental dependence on the abundance of HI inside halos, our neural network can capture any underlying pattern present. We show in our more recent paper that there is indeed a significant effect of the environment of the halo on its H i content, and we model it using more interpretable machine-learning techniques like symbolic regression (Wadekar et al. 2020).

This study focuses on modeling H i in real space, and we will address the modeling in redshift space in a future work. We have used a DM field from a high-resolution N-body simulation as an input in this analysis, and we plan to explore whether the U-net technique can work on lower-resolution N-body simulations or other approximate gravity-only simulations. Although we have focused on H i in this paper, we anticipate neural networks to be able to produce mocks for other line intensity mapping surveys by emulating expensive hydrodynamic simulations.

We thank Gabriella Contardo, David Spergel, Roman Scoccimarro, and Leander Thiele for fruitful discussions. We are especially grateful to Yin Li for many enlightening discussions. We also thank Chirag Modi, Yin Li, and especially an anonymous referee for useful comments on the manuscript. F.V.-N. acknowledges funding from the WFIRST program through NNG26PJ30C and NNN12AA01C. The work of S.H. is supported by the Center for Computational Astrophysics of the Flatiron Institute in New York City. The Flatiron Institute is supported by the Simons Foundation. This work was also supported in part through the NYU IT High Performance Computing resources. The IllustrisTNG data are publicly available at https://www.tng-project.org/data/, while the scripts used in this project are readily available upon request. We have made use of the Pylians3 libraries (https://github.com/franciscovillaescusa/Pylians3) to carry out the analysis of the simulations.

Appendix: Details and Methods

A.1. Details of Network Architecture

We presented our network architecture in Figure 1, and we discuss its details in this section. Let us first discuss the input and the output of our network. The input DM box has a physical side length of 2.34 h−1 Mpc with a grid resolution of ∼35 h−1 kpc and therefore has 643 grid cells. As discussed in Section 2.1, the output H i box has a lower resolution (physical side length of 1.17 h−1 Mpc with 83 grid cells). A U-net typically consists of a contracting path and an expansive path of nearly equal lengths. In our case, because the output has a lower dimensionality than the input, the expansive path is much shorter.

The contracting path follows the typical architecture of a convolutional neural network and consists of four primary blocks. Each block consists of two successive convolutions with stride 1 and a down-sampling convolution with stride 2, each of which is followed by batch normalization (BN) and a rectified linear unit (ReLU). For each convolutional layer, we use 3 × 3 × 3 filters. As our input DM fields are not periodic, we cannot use periodic padding as in He et al. (2019); we instead apply zero padding of size 1. At each down-sampling step we double the number of feature channels and reduce the number of grid points along each dimension by a factor of 2, and vice versa for the up-sampling step.

We only have one up-sampling step in our architecture. We concatenate the up-sampled map (dimensionality 83) with two maps from the contracting path (denoted with filled boxes): one of the same dimensionality (83) and another of a higher dimensionality (163), but after application of a max-pooling layer. These concatenations help train the network faster, as the gradients are passed through the transverse connections. Concatenation also provides the network with various levels of granularity for the final prediction. We also tried using a deeper neural network (which can be obtained by using stride 1 instead of stride 2 while moving between layers in Figure 1), but we did not find an improvement in our results.
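A minimal PyTorch stand-in for this architecture is sketched below. The channel widths and the output head are illustrative choices of ours, but the block structure, the single up-sampling step, and the two concatenations follow the description above:

```python
import torch
import torch.nn as nn

def block(c_in, c_out):
    """Two stride-1 convolutions plus one stride-2 down-sampling convolution,
    each followed by batch normalization and a ReLU (zero padding of size 1)."""
    return nn.Sequential(
        nn.Conv3d(c_in, c_out, 3, stride=1, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(),
        nn.Conv3d(c_out, c_out, 3, stride=1, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(),
        nn.Conv3d(c_out, c_out, 3, stride=2, padding=1), nn.BatchNorm3d(c_out), nn.ReLU(),
    )

class HInetSketch(nn.Module):
    """Illustrative stand-in for the architecture of Figure 1: a 64^3 DM input
    is contracted to 4^3, up-sampled once to 8^3, concatenated with two maps
    from the contracting path, and mapped to the 8^3 H i output."""
    def __init__(self):
        super().__init__()
        self.b1 = block(1, 64)       # 64^3 -> 32^3
        self.b2 = block(64, 128)     # 32^3 -> 16^3
        self.b3 = block(128, 256)    # 16^3 -> 8^3
        self.b4 = block(256, 512)    # 8^3  -> 4^3
        self.up = nn.ConvTranspose3d(512, 256, 2, stride=2)   # 4^3 -> 8^3
        self.pool = nn.MaxPool3d(2)                           # 16^3 -> 8^3
        self.head = nn.Sequential(
            nn.Conv3d(256 + 256 + 128, 64, 3, padding=1), nn.ReLU(),
            nn.Conv3d(64, 1, 1),
        )

    def forward(self, x):            # x: (batch, 1, 64, 64, 64)
        h1 = self.b1(x)
        h2 = self.b2(h1)
        h3 = self.b3(h2)
        h4 = self.b4(h3)
        u = self.up(h4)
        cat = torch.cat([u, h3, self.pool(h2)], dim=1)
        return self.head(cat)        # (batch, 1, 8, 8, 8)
```

Calling HInetSketch()(torch.randn(1, 1, 64, 64, 64)) returns a (1, 1, 8, 8, 8) tensor, matching the input/output dimensions quoted above.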

A.2. Two-phase Model

As discussed earlier in Section 3.2, we needed to employ a two-phase model to tackle the problem of data sparsity. Let us now discuss in detail how we employ the two neural networks (labeled as FP (SP) for the first (second) phase hereafter). A simple way to think about the functions of the two networks is the following: FP fills the low H i mass voxels and identifies boundaries of large H i halos. SP is then used to assign an appropriate H i mass profile to the large halos.

Both FP and SP have the same U-net architecture that was shown in Figure 1. If we only use FP in our prediction, we get the results shown by the dashed line in Figure 8. From the pdf, we see that the FP reproduces the pdf accurately for all H i voxels except the high-mass ones. We believe that this is due to a dearth of training data on the high-mass H i voxels and expect the FP to perform better with more data. To complement FP, we choose to employ an SP that is focused on predicting only the high-mass H i voxels. The training of the SP is therefore a little different from that of the FP. Equation (3) was used to generate rescaled fields to train the FP, but for the SP, we use a different rescaling transformation:

$\tilde{\delta}^{(2)} = \left(1 + \delta\right)^{\gamma^{\prime}},$  (A1)

where the exponent γ′ differs from the γ of Equation (3).

This is because the high-mass tails of the rescaled DM and H i distributions are flatter for the transforms in Equation (A1) as compared to Equation (3), which makes the SP easier to train in the high-mass regime. We had used the loss function in Equation (4) for training the FP. The loss function for training the SP is obtained by substituting $\tilde{\delta }\to {\tilde{\delta }}^{(2)}$ in Equation (4), and we find that the hyperparameter values β = 0.26 and α = 2.5 give the best results for training the SP.

Let us now discuss combining the predictions of FP and SP after they are trained separately. For a given DM input field, let us denote their unscaled prediction of a particular voxel of the output H i field by δFP and δSP. We get our final result for that particular voxel by combining the predictions using weights: wFP δFP+wSP δSP; the weights are given by

$w_{\rm SP} = \Theta\left(\delta_{\rm FP} - \delta_{\rm threshold}\right), \qquad w_{\rm FP} = 1 - w_{\rm SP},$  (A2)

where Θ is the Heaviside step function; the weights were chosen such that the output from the SP is given more importance for high-mass voxels and vice versa. We have adopted δthreshold = 1500 for the results in this paper, but we checked that our results are not sensitive to the threshold in the range δthreshold ∈ (1000, 2000).
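A minimal sketch of this blending, assuming the hard-switch weights of our Equation (A2) reconstruction (the published weighting may differ in detail):

```python
import numpy as np

def combine_phases(delta_fp, delta_sp, threshold=1500.0):
    """Hard switch between the first- and second-phase predictions:
    w_SP = 1 above the (unscaled) density threshold and 0 below it."""
    w_sp = (delta_fp >= threshold).astype(float)
    return (1.0 - w_sp) * delta_fp + w_sp * delta_sp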

A.3. Training Details and Data Augmentation

For training the network, we used the Adam optimizer (Kingma & Ba 2014) with a learning rate ranging between 10−8 and 10−4. The network takes nearly 30 epochs to train, and we choose to use a batch size of 28, as it was compatible with the memory constraints of our GPUs.

We also train the neural network to respect symmetries of the problem, such as rotational invariance (He et al. 2019). We do this by generating multiple instances of a given input DM box by applying various symmetry transformations and then training the U-net on all the generated cases. This process is often referred to as data augmentation because we are generating multiple training examples from a single data realization, as sketched below. We use all transformations corresponding to the symmetry group of a cube, which consists of 48 elements (the octahedral group of order 24 with an additional factor of 2 to account for inversion across the origin, $\vec{r}\to -\vec{r}$). As the cubes containing high H i densities are rare in our training set, data augmentation greatly helps training in the high-density regime. We also use a popular technique for tackling data sparsity called nonuniform sampling, where we input the cubes containing high H i density voxels more frequently during training. In our case, we first rank the cubes with respect to the maximum H i density inside them and then increase the frequency of occurrence of the top 1% of cubes 100-fold. However, even after using data augmentation and nonuniform sampling, we still had to use the two-phase neural network approach, as the data sparsity problem in our case is quite extreme.
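The 48 transformations can be enumerated as all axis permutations combined with all axis flips; a short sketch (helper name is ours):

```python
import itertools
import numpy as np

def cube_symmetries(field):
    """Yield the 48 images of a 3D array under the full symmetry group of the
    cube: 6 axis permutations times 8 combinations of axis flips."""
    for perm in itertools.permutations((0, 1, 2)):
        rotated = np.transpose(field, perm)
        for flips in itertools.product((False, True), repeat=3):
            axes = tuple(ax for ax, f in enumerate(flips) if f)
            yield np.flip(rotated, axis=axes) if axes else rotated
```

The same transformation must, of course, be applied jointly to the DM input and the corresponding H i target of each training pair.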

We trained all of our models using New York University's High Performance Cluster using NVIDIA Tesla P-100 and P-40 GPUs.

A.4. Testing Edge Effects While Combining HI Boxes

In Section 5 we showed results from the test set H i cube. To construct this box, we used the U-net to generate small H i subcubes of side 1.17 h−1 Mpc and then stacked them together to produce the 48 h−1 Mpc test cube. One possible concern might be that such stacking produces edge effects that affect summary statistics on scales comparable to the size of the subcubes (k ∼ 5 h Mpc−1). We check that there are indeed no artifacts on these scales and show the high-k H i power spectrum comparison in Figure 9. Furthermore, we also show a zoomed-in version of the output of the test cube in Figure 10. A reason for the absence of edge artifacts could be the larger size of the DM box used to predict a smaller H i box (see Figure 1), so information beyond the location of the edges of the H i box is included in the U-net prediction. We also tried using the technique of annealing when concatenating the small H i subcubes to smooth their edges, but our results did not change.


Figure 10. Same as the top panels of Figure 2, but zooming in on the lower left region of the box in order to verify that there are no spurious edge effects when H i subcubes of side 1.17 h−1 Mpc are concatenated to produce the large U-net output cube.

