Methods of Error Estimation for Delay Power Spectra in 21 cm Cosmology

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , and

Published 2021 August 10 © 2021. The American Astronomical Society. All rights reserved.
, , Citation Jianrong Tan et al 2021 ApJS 255 26 DOI 10.3847/1538-4365/ac0533

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

0067-0049/255/2/26

Abstract

Precise measurements of the 21 cm power spectrum are crucial for understanding the physical processes of hydrogen reionization. Currently, this probe is being pursued by low-frequency radio interferometer arrays. As these experiments come closer to making a first detection of the signal, error estimation will play an increasingly important role in setting robust measurements. Using the delay power spectrum approach, we have produced a critical examination of different ways that one can estimate error bars on the power spectrum. We do this through a synthesis of analytic work, simulations of toy models, and tests on small amounts of real data. We find that, although computed independently, the different error bar methodologies are in good agreement with each other in the noise-dominated regime of the power spectrum. For our preferred methodology, the predicted probability distribution function is consistent with the empirical noise power distributions from both simulated and real data. This diagnosis is mainly in support of the forthcoming HERA upper limit and also is expected to be more generally applicable.

Export citation and abstract BibTeX RIS

1. Introduction

The epoch of reionization (EoR)—when neutral hydrogen in the intergalactic medium (IGM) was ionized by photons from early galaxies and active galactic nuclei—remains one of the most exciting frontiers in modern astrophysics and cosmology. Precise measurements of this era will significantly enhance our understanding of the origin of the very first stars, the process of galaxy formation, and the thermal history of the IGM (Barkana & Loeb 2001; Dayal & Ferrara 2018). Some measurements, such as those of the optical depth of cosmic microwave background (CMB) photons (Planck Collaboration et al. 2020), the Gunn–Peterson trough in distant quasar spectra (Becker et al. 2001, 2015; Fan et al. 2006; Bolton et al. 2011), quasar damping wings (Davies et al. 2018), and the decrease in the number density and clustering trends of Lyα emitters at high redshifts (Ouchi et al. 2010; Stark et al. 2010; Bosman et al. 2018), have already established the basic parameters of the EoR. Collectively, they suggest that reionization is a process that probably began at z ≫ 10 and ended around z ≈ 6. However, the aforementioned probes paint an indirect and incomplete picture of the EoR. For example, CMB measurements are integral constraints over redshift, making the extraction of detailed information technically difficult (often involving a subtle kinetic Sunyaev–Zel'dovich effect or polarization measurements); Lyα photons suffer from severely saturated absorption that makes it difficult for them to probe earlier times than the end of reionization; and low-mass galaxies (i.e., those thought to be responsible for supplying a large fraction of ionizing photons) are too faint to be directly detected. A complementary probe capable of making direct observations of the EoR is therefore desirable.

A strong candidate for a direct probe of reionization is the 21 cm line. Arising from the "spin flip" transition in the hyperfine structure of atomic hydrogen, the 21 cm line is a promising way to directly trace the evolution of H i regimes on different spatial scales and eventually provide a comprehensive three-dimensional picture throughout the history of reionization (Furlanetto et al. 2006; Morales & Wyithe 2010; Pritchard & Loeb 2012; Liu & Shaw 2020). Current experimental efforts are focused on slightly more modest—but still ambitious—observables. One example is the global 21 cm signal, which is a single spectrum of 21 cm absorption or emission averaged over the entire angular area of the sky (Bowman et al. 2008; Singh et al. 2018). Recently, the Experiment to Detect the Global Epoch of reionization Step (EDGES) team reported a tentative detection of a 21 cm absorption signature at z ∼ 17 (Bowman et al. 2018a), although this result remains controversial (Bowman et al. 2018b; Hills et al. 2018; Bradley et al. 2019; Singh & Subrahmanyan 2019; Sims & Pober 2020). Global signal measurements are complemented by experimental efforts to map spatial fluctuations in the 21 cm brightness temperature field. Most such efforts currently focus on a measurement of the power spectrum, i.e., the variance in Fourier space. Power spectrum measurements have the potential to significantly improve constraints on the cosmological and astrophysical parameters of reionization models and potentially even discover new fundamental physics (e.g., McQuinn et al. 2006; Pober et al. 2014, 2015; Greig & Mesinger 2015, 2017; Hassan et al. 2017; Kern et al. 2017; Park et al. 2019; Ghara et al. 2020). Typically, these measurements are pursued by low-frequency radio interferometer arrays, such as the Murchison Widefield Array 22 (Bowman et al. 2013; Tingay et al. 2013), the Low Frequency Array 23 (LOFAR; van Haarlem et al. 2013), the Donald C. Backer Precision Array for Probing the Epoch of Reionization 24 (Parsons et al. 2010), the Hydrogen Epoch of Reionization Array 25 (HERA; DeBoer et al. 2017), and the Square Kilometre Array 26 (Mellema et al. 2013; Koopmans et al. 2015). Although no experiment has yet to claim a detection of the 21 cm power spectrum at redshifts relevant to the EoR, steady progress has been made in recent years in the form of increasingly stringent and robust upper limits (Dillon et al. 2014, 2015; Beardsley et al. 2016; Patil et al. 2017; Barry et al. 2019; Kolopanis et al. 2019; Li et al. 2019; Mertens et al. 2020; Trott et al. 2020).

In this paper, we tackle the crucial problem of error estimation in the context of 21 cm power spectrum measurements. While an extensive literature on power spectrum error estimation exists for CMB measurements and galaxy surveys, there are several challenges that are unique to 21 cm cosmology. Chief among these is the fact that any measured signals will be strongly contaminated by the foregrounds, which are generally 4–5 orders of magnitude stronger in temperature (de Oliveira-Costa et al. 2008; Jelić et al. 2008; Bernardi et al. 2009). To overcome this obstacle, some collaborations pursue a strategy of foreground subtraction, where models of foreground emission are subtracted from the data (e.g., Harker et al. 2009; Bernardi et al. 2011; Chapman et al. 2012; Cho et al. 2012; Shaw et al. 2015). Different approaches to foreground subtraction make different assumptions (see Liu & Shaw 2020 for examples), but all face the same problem of attempting to subtract a large contaminant from a large raw signal to reveal a small cosmological signature. With empirical constraints on the low-frequency radio sky being relatively scarce and generally imprecise, the chances of mis-subtraction are high. Errors in such a subtraction process, as well as the effects of subtraction residuals, must therefore be propagated through to a final power spectrum estimate.

In this paper, however, we do not tackle the problem of error propagation in the context of foreground subtraction; instead, we consider error estimation in the context of foreground avoidance, where one aims to make cosmological measurements exclusively in Fourier modes where foregrounds are expected to be subdominant. Key to this is the notion of the foreground wedge, a regime in Fourier space beyond which spectrally smooth foregrounds cannot extend if observed using an ideal interferometer (Datta et al. 2010; Morales et al. 2012; Parsons et al. 2012b; Trott et al. 2012; Vedantham et al. 2012; Hazelton et al. 2013; Thyagarajan et al. 2013; Liu et al. 2014a). The limitation of foregrounds to the wedge is a theoretically robust notion (Liu & Shaw 2020), and in principle, one can make foreground-free measurements simply by avoiding the regime. In practice, observations are never made using perfect interferometers, and instrumental systematics such as having nonidentical antenna elements, cable reflections, and cross couplings (e.g., Kern et al. 2019, 2020a) complicate one's foreground mitigation efforts. These complications can result in the appearance of contaminants outside of the foreground wedge, and in this paper, we define and tackle the problem of error estimation in two regimes: a noise-dominated regime and a signal-dominated regime (whether these signals could be foregrounds, systematics, or any other coherent signals).

Through a combination of analytic work, simulations of toy models, and tests on small amounts of real data, we critically examine different ways in which one can place error bars on 21 cm delay power spectra. Our goal is to produce a "buyer's guide" that enumerates the advantages and disadvantages of various error estimation methods. Understanding these strengths and weaknesses is crucial for setting upper limits, diagnosing systematics, interpreting the results of null tests, and the design and optimization of future telescopes (Morales 2005; McQuinn et al. 2006; Parsons et al. 2012a). Although we will focus primarily on the delay power spectrum–style analysis (Parsons et al. 2012b) in support of recent HERA upper limits (HERA Collaboration 2021, in preparation), we expect many of our results to be more generally applicable.

This paper is organized as follows. In Section 2, we review the basics of power spectrum estimation using the delay spectrum technique, establishing our notation. In Section 3, we propose several methods for estimating errors in 21 cm delay power spectra. These approaches are then compared and contrasted using simulations and real data in Section 4. We then discuss the strengths and weaknesses of each error estimation method in Section 5 before summarizing our conclusions in Section 6. For readers' convenience, we provide dictionaries for a number of quantities defined in this paper in Tables 1 and 2.

Table 1. Dictionary of Highlighted Scalars and Functions

QuantityDefinition/MeaningFirst Appearance
b ; b p Baseline vector; vector of the pth index baselineEquation (1)
θ Angular sky positionEquation (1)
ν; νi Frequency; frequency of the ith index channelEquation (1)
b λ ; b λ pi Normalized baseline vector in units of wavelength; normalized vector for baseline b p at frequency νi Equation (3)
u Fourier dual to θ Equation (2)
η; ηα Fourier dual to ν; αth index η modeEquation (2)
τ; τα Delay, i.e., Fourier dual to ν on a single baseline; αth index delay modeEquation (16)
A( θ , ν)Primary beam function at position θ and frequency ν Equation (1)
$\tilde{A}({\boldsymbol{u}},\nu )$ Spatial Fourier transform dual of primary beam functionEquation (3)
γ(ν)Spectral tapering function at frequency ν Equation (14)
Ntime; Nblp Number of time instants; number of baseline pairsEquation (18)
Nboot Number of bootstrapping sample setsEquation (24)
I( θ , ν)Sky source intensity function at position θ and frequency ν Equation (1)
$\tilde{I}({\boldsymbol{u}},\eta )$ Fourier transform of I at angular wavenumber u and line-of-sight wavenumber η Equation (2)
V( b , ν)Visibility measured by baseline b at frequency ν Equation (1)
P( u , η)Cylindrical power spectrum at angular wavenumber u and line-of-sight wavenumber η Equation (4)
Pα αth bandpower ${P}_{\alpha }\equiv P\left({\overline{{\boldsymbol{b}}}}_{\lambda },{\eta }_{\alpha }\right)$ Equation (8)
${\hat{P}}_{\alpha }$ Estimator for the αth bandpower Pα Equation (9)
Mα Normalization scalar of the estimator for the αth bandpowerEquation (11)
$\tilde{V}({{\boldsymbol{b}}}_{p},{\tau }_{\alpha }),{\tilde{x}}_{p}({\tau }_{\alpha })$ Delay spectra of baseline b p at delay mode τα Equation (15)
${\tilde{V}}_{\mathrm{signal}}({{\boldsymbol{b}}}_{p},{\tau }_{\alpha }),{\tilde{s}}_{p}({\tau }_{\alpha })$ Signal component of $\tilde{V}$ of baseline b p at delay mode τα Equation (16)
${\tilde{V}}_{\mathrm{noise}}({{\boldsymbol{b}}}_{p},{\tau }_{\alpha }),{\tilde{n}}_{p}({\tau }_{\alpha })$ Noise component of $\tilde{V}$ of baseline b p at delay mode τα Equation (16)
${P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}$ Power spectra formed from visbilities x 1 and x 2 Equation (30)

Download table as:  ASCIITypeset image

Table 2. Dictionary of Highlighted Vectors and Matrices

QuantityDefinition/MeaningSizeFirst Appearance
x p Stacked visibilities at multiple frequencies of baseline b p Nfreq Equation (6)
C pq Covariance matrices ${{\boldsymbol{C}}}^{{pq}}\equiv \langle {{\boldsymbol{x}}}_{p}{{\boldsymbol{x}}}_{q}^{\dagger }\rangle $ Nfreq × Nfreq Equation (7)
Q pq,α Response of covariance C pq to the αth bandpower Nfreq × Nfreq Equation (8)
E pq,α Matrix for QE of bandpower Pα , i.e., ${\hat{P}}_{\alpha }={{\boldsymbol{x}}}_{p}^{\dagger }{{\boldsymbol{E}}}^{{pq},\alpha }{{\boldsymbol{x}}}_{q}$ Nfreq × Nfreq Equation (9)
W Window function matrix Ndelay × Ndelay Equation (10)
R p Weighting matrix acting on x p Nfreq × Nfreq Equation (11)
Q DFT,α Matrix taking Fourier transform in the estimator Nfreq × Nfreq Equation (11)
U pq Two-point correlation matrices ${{\boldsymbol{U}}}^{{pq}}\equiv \langle {{\boldsymbol{x}}}_{p}{{\boldsymbol{x}}}_{q}^{T}\rangle $ Nfreq × Nfreq Equation (33)
G pq Two-point correlation matrices ${{\boldsymbol{G}}}^{{pq}}\equiv \langle {{\boldsymbol{x}}}_{p}^{* }{{\boldsymbol{x}}}_{q}^{\dagger }\rangle $ Nfreq × Nfreq Equation (33)

Download table as:  ASCIITypeset image

2. Power Spectrum Estimation via the Delay Spectrum

In this section, we review the delay spectrum approach to 21 cm power spectrum estimation (Parsons et al. 2012b) using the the language of the quadratic estimator (QE) formalism (Liu & Tegmark 2011) that we adopt in this paper.

The delay spectrum technique enables power spectra to be estimated using just a single baseline of a radio interferometer, with fluctuations in the 21 cm signal probed primarily in the line-of-sight direction via spectral information. The starting point is the visibility V( b , ν) measured by an interferometer's baseline b at frequency ν. Under the flat-sky limit, it is given by

Equation (1)

where c is the speed of light, θ is the angular sky position, I( θ , ν) is the source intensity function, and A( θ , ν) is the primary beam function. If we express I( θ , ν) in terms of its Fourier transform $\tilde{I}({\boldsymbol{u}},\eta )$, i.e.,

Equation (2)

then our visibility equation becomes

Equation (3)

where we have defined ${{\boldsymbol{b}}}_{\lambda }\equiv \tfrac{\nu }{c}{\boldsymbol{b}}$ as the normalized baseline vector for baseline b in units of wavelength. In the angular directions, we see that a visibility has a response to u modes centered around b λ . If the primary beam A is fairly broad, $\tilde{A}$ will be highly compact, and the majority of the integral will be sourced from u b λ . We will use this fact later. From this, one sees that a visibility V( b , ν) is a linear function of $\tilde{I}({\boldsymbol{u}},\eta )$. This quantity is directly related to the cylindrical power spectrum P( u , η), which decomposes power into Fourier wavenumbers perpendicular ( u ) and parallel (η) to the line of sight and is formally defined as

Equation (4)

Such a power spectrum can be recast into more conventional cosmological coordinates via the relations 27

Equation (5)

where Dc is the line-of-sight comoving distance, ν21 is the rest frequency of the 21 cm line, H0 is the Hubble parameter today, and $E(z)\equiv \sqrt{{{\rm{\Omega }}}_{{\rm{\Lambda }}}+{{\rm{\Omega }}}_{m}{\left(1+z\right)}^{3}}$, with ΩΛ and Ωm as the normalized dark energy and matter density, respectively.

Since the power spectrum is a quadratic function of the Fourier representation of the sky, we expect that one should be able to estimate the power spectrum by forming some quadratic function of visibilities. However, directly squaring some functions of the visibilities will incur a noise bias, because noise that is symmetrically distributed about zero will have a positive contribution that does not average down with cumulative samples. Fortunately, the noise bias can be avoided by cross-multiplying nominally identical measurements rather than squaring a single measurement. For instance, one might choose to form quadratic combinations of data from adjacent time samples of a single baseline's time stream, or perhaps to cross-multiply the time streams from two redundant baselines that satisfy b 1 = b 2 = b for some b . In this paper, we will consider power spectrum measurements that are formed from cross-multiplications in both time and different copies of an identical baseline. Utilizing both types of cross-multiplication has the advantage of avoiding skewness in the probability distributions of the measured power spectra, simplifying the interpretation of our results. This is discussed in Appendix A. In this section, however, we will—for simplicity—suppress explicit reference to the data time stream and use notation that explicitly refers to cross-correlating different baselines. Given a pair of redundant baselines b 1 and b 2, we stack their measuring visibilities at multiple frequencies ν1, ν2, ... at single time instants into two data vectors x 1 and x 2, such that

Equation (6)

To make an explicit connection between visibilities and power spectra, we must examine the statistical properties of these data vectors. For quadratic statistics, the key quantity is the covariance matrix ${{\boldsymbol{C}}}^{12}\equiv \langle {{\boldsymbol{x}}}_{1}{{\boldsymbol{x}}}_{2}^{\dagger }\rangle $, which can be written as

Equation (7)

where b λ1i and b λ2j are the normalized baseline vectors for baseline b 1 and b 2 evaluated at frequencies νi and νj , respectively, and ${\overline{{\boldsymbol{b}}}}_{\lambda }$ is the mean of the two. In deriving Equation (7), we first substituted Equation (3) for the expressions of visibilities in the angle bracket and then factored the evaluated cylindrical power spectrum out of the integral over u . Next, we replaced the continuous integral on the power spectra with discrete sums over a series of piecewise constant bandpowers $P\left({\overline{{\boldsymbol{b}}}}_{\lambda },{\eta }_{\alpha }\right)$, such that

Equation (8)

Henceforth, we will adopt the notation ${P}_{\alpha }\equiv P\left({\overline{{\boldsymbol{b}}}}_{\lambda },{\eta }_{\alpha }\right)$ to mean the value of the cylindrical power spectrum P( u , η) evaluated at ${\boldsymbol{u}}={\overline{{\boldsymbol{b}}}}_{\lambda }$ and η = ηα . The index α discretely runs over a series of bins in η, and as long as these bins are narrow compared to the scales over which the power spectrum changes, a piecewise constant treatment is appropriate.

Equation (8) shows that the cross-baseline covariance matrix of visibilities encodes information about the power spectrum bandpowers via a family of response matrices Q 12,α (with a different matrix for every value of the bandpower index α). Since the covariance is an ensemble-averaged quadratic function of the data, one might venture that estimators for the bandpowers can be constructed by forming quadratic combinations of the data, i.e.,

Equation (9)

where E 12,α is a matrix that can be chosen (within certain limitations) by the data analyst. Taking the ensemble average on both sides and inserting Equation (8) then yields

Equation (10)

where W is the window function matrix. To ensure that our estimated bandpowers are correctly normalized, we require that each row of W sum to unity.

In the HERA power spectrum pipeline, we pick a family of E 12 matrices of the form

Equation (11)

where the matrix ${{\boldsymbol{Q}}}_{{ij}}^{\mathrm{DFT},\alpha }\equiv {e}^{i2\pi {\eta }_{\alpha }({\nu }_{i}-{\nu }_{j})}$ is responsible for taking the Fourier transform of the two copies of the data vectors in the QE. The matrices R 1 and R 2 are weighting matrices that act on visibilities from b 1 and b 2, respectively. In this paper, we use R = T Y , where both T and Y are diagonal matrices. The former is used to impose a Blackman–Harris tapering function on the spectral data, and the latter propagates data flags. With a QE of this form, the normalization scalar, Mα , should take the form

Equation (12)

which ensures that the rows of W sum to unity, and therefore that the bandpowers are properly normalized. In our case, we do use this normalization, but we approximate the Q 12,β term in the denominator. Rather than evaluating the full integral in Equation (8), we make the approximation that b λ1i b λ2i . In fact, this is the motivation for the use of Q DFT,α in Equation (11) rather than Q 12; notice that if b λ1i = b λ2i , then Q 12 Q DFT. Over large bandwidths, this will fail for long baselines, since b λ ν b /c.

The approximation that we have just made is equivalent to the delay spectrum approximation (Parsons et al. 2012b; Liu et al. 2014a). To see this, we can write our estimator in the continuous limit. Our current form for E 12,α is separable into the product of two matrices that each involve only one of the two baselines. In particular, if γ(ν) is the functional form of the Blackman–Harris taper, then we have ${{\boldsymbol{E}}}_{{ij}}^{12,\alpha }={\gamma }_{1}({\nu }_{i}){e}^{i2\pi {\eta }_{\alpha }({\nu }_{i}-{\nu }_{j})}{\gamma }_{2}({\nu }_{j})$, and its action on each baseline's visibilities in Equation (9) is to compute the quantity

Equation (13)

which is just a discrete approximation to

Equation (14)

Note that Equation (14) is an equivalent expression of the delay transform in Parsons et al. (2012b). Therefore,

Equation (15)

Equation (15) just indicates that the QE is proportional to the product of delay-transformed visibilities. This is an estimator that is based on Fourier transforming the visibility spectra from individual baselines, rather than combining information from different baselines. In principle, only the latter can probe truly rectilinear Fourier modes on the sky, since k b λ (which is a frequency-dependent quantity); thus, to probe the same k at multiple frequencies—which is needed to perform the Fourier transform along the line-of-sight direction—one needs multiple baselines. The delay spectrum approach uses the fact that b λ evolves only slowly with frequency for short baselines to form an approximate power spectrum estimator. We make this approximation throughout this paper, as this is the choice that has been made for the next iteration of power spectrum upper limits from HERA observations. In recognition of this, we will henceforth use τ to index our line-of-sight Fourier modes (as is customary for delay spectra) instead of η (which is generally used to denote true rectilinear line-of-sight wavenumbers; Morales et al. 2012, 2019).

In the language of the delay spectrum, the foreground wedge becomes particularly simple to describe: smooth-spectrum foregrounds simply contaminate all modes below a particular delay, the value of which depends on the baseline length (Parsons et al. 2012b; Liu et al. 2014a; Liu & Shaw 2020). Suppose we decompose the delay-transformed visibility into the signal component ${\tilde{V}}_{\mathrm{signal}}$ (mainly foregrounds, and we are neglecting the much weaker EoR signal here) and the noise component ${\tilde{V}}_{\mathrm{noise}}$, such that

Equation (16)

Since we are working on redundant baselines, we will henceforth drop the subscript on $\tilde{s}$, as the two baselines used in Equation (15) should measure identical signals. Mathematically, then, the statement that the smooth-spectrum foregrounds contaminate only low delay modes is given by

Equation (17)

where τα is the delay corresponding to the αth bandpower, and τ0 is some critical delay value that separates parts of the power spectrum that are foreground-dominated from those that are not. In general, τ0 will depend on the properties of one's instrument, as well as the extent to which the assumption of smooth foregrounds is good. At delays less than τ0, we have assumed that the foreground signal is so large that the noise–noise cross term can be neglected.

Throughout the rest of this paper, we will appeal to Equation (17) for intuition when contemplating the behavior of our power spectrum estimates at different delays. For now, we note two of its important properties. First, while the power spectrum of a signal ${\tilde{s}}^{* }\tilde{s}$ will always be real valued, the overall estimator ${\hat{P}}_{\alpha }$ is complex. It is possible to write down symmetrized estimators that give real power spectra. However, since the imaginary part is sourced by noise, it is a useful diagnostic quantity to examine. Second, even though the noise–noise terms may be negligible in the signal-dominated regimes, there will still be a considerable uncertainty here that enters via the signal–noise cross terms.

Until now, we have focused on power spectra estimated from visibilities measured at single time instants. Given data from multiple times, we can average the power spectra estimated from individual measurements together. For a drift scan telescope, this averaging of power spectra from different time samples is tantamount to invoking statistical isotropy to justify the spherical averaging of power spectra over different wavevector k directions. In addition to averaging in time, if we have multiple pairs of baselines within the same redundant group of baselines, we may average over the power spectrum estimates from multiple baseline pairs. The simplest way to do this is to perform an unweighted average,

Equation (18)

where Ntime is the number of time integrations, Nblp is the number of baseline pairs, ${\hat{P}}_{\alpha }(\mathrm{time},\mathrm{blp})$ is the power spectrum estimate (given by previous equations in this section) at a time instant and a baseline pair ("blp"), and ${\overline{\hat{P}}}_{\alpha }$ is the average of the estimates. The type of averaging performed here may be termed an "incoherent average," to distinguish it from a "coherent average," where one averages over visibilities (or converts them into a single image) before squaring them in power spectrum estimation. The latter provides greater sensitivity if calibration errors and other systematic effects can be brought under control (Morales et al. 2019). The former retains the ability to inspect the contributions from particular baseline pairs and time until right before the final result, making some systematics easier to diagnose. However, note that by employing a suitable fringe-rate filtering of the time-stream data, it is in principle possible to recover the lost sensitivity from a "square-then-add" approach (Parsons et al. 2016). In this paper, we will focus on the error statistics of the incoherent average approach, as this is what is currently used in the HERA pipeline (HERA Collaboration 2021, in preparation).

Before we move into the discussion on error estimation methods in the next section, it is worth noting that Equation (18) is not the optimal way to obtain average power spectra with the least variance. Generally, given a set of estimates ${\hat{{\boldsymbol{P}}}}_{\alpha }$ for bandpower Pα with measurement errors σ , such that

Equation (19)

an linear estimator of Pα is written as

Equation (20)

Here D is a column vector of 1 s. We need to select K such that K D = I in order to achieve an unbiased constraint that satisfies $\langle {\overline{\hat{P}}}_{\alpha }\rangle ={P}_{\alpha }$. For an arbitrary matrix K , the error bar ${{\rm{\Sigma }}}_{\alpha }\equiv \langle | {\overline{\hat{P}}}_{\alpha }-{P}_{\alpha }{| }^{2}\rangle ={\boldsymbol{K}}\epsilon {{\boldsymbol{K}}}^{t}$, where the error covariance matrix epsilon ≡ 〈 σ σ t 〉. The superscript t used here and throughout this paper refers to the matrix transposition. Note that Equation (18) is just a special case where ${\boldsymbol{K}}={[{{\boldsymbol{D}}}^{t}{\boldsymbol{D}}]}^{-1}{{\boldsymbol{D}}}^{t}$. When Σα is minimized (optimal), ${\overline{\hat{P}}}_{\alpha }$ and the corresponding Σα should take the form of (Tegmark 1997; Dillon et al. 2014)

Equation (21)

Equation (22)

which amounts to an inverse covariance weighting of the data in averaging it down. Equation (21) brings us the ability to propagate the full covariance information over samples to obtain a least-variance average result. The diagonal elements of epsilon are easily interpreted as the variance in each individual measurement, while the off-diagonal elements, reflected by the coherency between time samples and baseline pair samples, are far more complicated. If estimating the covariance matrix epsilon of the preaveraged data is difficult, one may opt to weight the data using some other matrix Γ instead of epsilon in Equation (21). In this case, the final variance Σα ends up being

Equation (23)

In principle, one could model the off-diagonal elements of epsilon . This is particularly important in the cosmic variance–dominated regime where the sky signal—which is what sources a cosmic variance error—is slowly drifting through HERA's field of view over the course of the day, thus inducing strong correlations between different time samples. In this paper, we do not consider the modeling of off-diagonal covariances in epsilon (or between different α values in ${\overline{\hat{P}}}_{\alpha }$). We assume diagonal covariance matrices and set Γ = I ; i.e., we use Equation (18) when computing the "incoherently averaged" power spectra, and here we are acknowledging other possibilities only for completeness.

3. Error Estimation Methodology

Placing robust error bars on power spectra is crucial to our data analysis, whether it is for setting upper limits, diagnosing experimental systematics, or eventually declaring a detection of the cosmological 21 cm signal. Generally, contributions to the error bars of observed power spectra come from three sources: the EoR signal, noise, and foregrounds (Thyagarajan et al. 2013; Dillon et al. 2014, 2015; Trott 2014; Lanman & Pober 2019). Of course, this is all complicated by the response of one's instrument, and ultimately, one's ability to place reliable error bars rests on one's ability to understand the behavior of each data source in the context of the instrument.

The intrinsic variance of the EoR signal, also known as "cosmic variance," is the ensemble covariance on all possible realizations of the 21 cm temperature field. If the field is Gaussian, then its cosmic variance is proportional to the square of the power spectrum amplitude over the number of independent modes. Lanman & Pober (2019), for example, estimated that the cosmic variance could go as high as ∼35% of the EoR signal for HERA-like fields of view with eight hours of local sidereal time (LST) observations using only the shortest (14.6 m) baselines of HERA. This uncertainty due to cosmic variance is brought down to a level of a few percent for the spherically averaged power spectrum when using all types of baselines. Importantly, as reionization evolves, the 21 cm temperature field is expected to become highly non-Gaussian, and the excess contribution from the non-Gaussian component could lift the cosmic variance in the Gaussian part staggeringly, which is significant and should be considered for future high-sensitivity measurements (Mondal et al. 2016, 2017; Shaw et al. 2019). In this paper, however, we assume that at our current levels of precision, the cosmic variance is subdominant to noise and foregrounds.

For instrumental noise, we assume that the noise in the visibility from each baseline is independent and Gaussian-distributed. This is what one might expect based on the statistics of correlator outputs in a radio interferometer, but it is also an assumption that we will see borne out in our empirical data in Section 4. With these well-understood statistical properties, the noise-dominated delays (recall Equation (17)) are relatively easy to model, at least in principle.

The low-delay, foreground-dominated regimes are trickier to model. One key problem is that the statistics of foregrounds are not well understood, particularly at the low frequencies relevant to us. There are different approaches that one can take to this roadblock. The first is where one attempts to make a measurement of the cosmological 21 cm signal only, by proactively subtracting (or simultaneously fitting) a foreground model. To properly set error bars on such a power spectrum, it is necessary to propagate uncertainties (accounting for the possibility of mis-subtractions) in the foreground model to the final errors (or, in the case of a simultaneous fitting, to allow the errors on the cosmological signal to be appropriately inflated as one marginalizes over foreground uncertainties). While conceptually straightforward, these steps are difficult to implement in practice without a deep understanding of foreground statistics.

Instead, in this paper, we treat foregrounds as additive systematics on the total sky emission. Crucially, this means we only require empirical knowledge of the foregrounds themselves, not their full probability distribution. We simply quantify the error bars on a measurement of total sky emission due to instrumental noise, rather than the error bars on the cosmological signal due to foreground uncertainties and noise. Some understanding of foregrounds is still needed for setting our errors because of the signal–noise cross terms in Equation (17). Implicit in this approach is a strategy of foreground avoidance in the hunt for a cosmological signal detection, where it is hoped that the separation between foreground-dominated and foreground-negligible regimes in Equation (17) is a clean one. It is important to note, however, that we seek to compute error bars that transition smoothly between the regimes and are valid even if the conceptual separation is not a clean one in practice. 28

In addition to foregrounds, one can treat instrumental systematics in the same way. In other words, interpreting systematics as additive "signals," the signal–noise cross term in the variance of power spectra is sourced by not just foregrounds but also other systematics, such as cable reflections and cross couplings (Kern et al. 2019, 2020a). We can apply some models to remove systematics from the signal, but the residuals due to mis-subtraction will still increase the total uncertainties via the signal–noise cross term. Note, however, that in this paper, we do not develop a comprehensive model to account for all systematics, which is particularly difficult when unknown modeling errors are present in complicated effects (e.g., direction-dependent gains). We will instead argue that a procedure of using the measured visibility itself to model the foregrounds and systematics allows us to set robust upper bounds, provided certain safeguards are in place to avoid biases. We will leave more exquisite a priori characterizations of foregrounds and systematics in the signal–noise cross terms for the future.

Finally, one might worry that the averaging of power spectra from multiple measurements together like Equation (18) might complicate the statistics. Appendix B shows an example of this. There we show that when averaging over redundant baseline pairs, the variance of the average power spectra in the foreground-dominated regime goes down roughly with ${N}_{\mathrm{blp}}^{-1/2}$ and not ${N}_{\mathrm{blp}}^{-1}$ because some baselines will appear in multiple baseline pairs. In other words, in foreground-dominated (or systematics-dominated) regimes, one cannot assume that baseline pairs average together in an independent fashion. This has consequences for certain methods of error bar computation, such as the bootstrapping approach discussed in the next subsection, which will tend to underestimate error bars in these regimes. To avoid this, one might just use pairs in which each baseline only appears once in all baseline pairs or to compute a correction factor on the final results. In contrast to the foreground-/signal-dominated regime, in the noise-dominated regime, one obtains correct final error bars by assuming that the baseline pair samples are independent (even if they are not for the aforementioned reasons). In this paper, to avoid averaging power spectra over correlated samples, we will concentrate on the averaging of the power spectra of a single baseline pair over multiple time samples.

We will have a more extensive discussion of the meaning of our error bars in Section 5. For concreteness, however, we will now propose several different methods for generating error bars based on the HERA power spectrum pipeline before performing quantitative comparisons in Section 4. For the convenience of our readers, we provide Table 3 as a quick preview.

Table 3. Dictionary of Error Bars

NameDescriptionDefinition
σbs Error bar of the average power spectra by bootstrapping over the collection of samplesEquation (24)
Pdiff Power spectra from differenced visibility used as a form of error barEquation (26)
PN Analytic noise power spectrumEquation (27)
PSN Error bar based on PN but including the extra signal–noise cross termEquation (30)
σQE-N Error bar from the output covariance in QE formalism including only noise–noise termEquation (37)
σQE-SN Error bar from the output covariance in QE formalism including noise–noise and signal–noise termsEquation (38)
${\tilde{P}}_{\mathrm{SN}}$ Same as PSN but with an adjustment for noise double countingEquation (31)
${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ Same as σQE-SN but with an adjustment for noise double countingEquation (39)

Download table as:  ASCIITypeset image

3.1. Bootstrap

Bootstrapping is a natural method for computing the error bars on the final averaged power spectrum with only minimal a priori modeling assumptions. Within the 21 cm cosmology literature, it has previously been used to set error bars on power spectrum upper limits (Parsons et al. 2014; Ali et al. 2015; although see Cheng et al. 2018 for caveats on these limits). Bootstrapping is a process that goes hand in hand with the averaging step described in Equation (18). Rather than performing a single average, we repeatedly form a new set of preaveraged data by resampling the original set with replacement (i.e., allowing repeated entries). A new estimate of the final average, ${\overline{\hat{P}}}^{(k)}$, can be produced from the kth draw. The scatter in the realizations of the final averaged power spectrum is then quoted as an error bar σbs, such that

Equation (24)

where Nboot is the number of bootstrapping sample sets. In essence, one is using the data themselves as an empirical estimate of the distribution from which the data are drawn (Efron & Tibshirani 1994; Press et al. 2007).

If the input data samples are independent and identically distributed, bootstrapping will give the same error bars as the true ones from the ensemble average. However, this assumption is likely to be violated with our data. Consider the two axes that we have at our disposal. One possibility is to bootstrap over different time samples. Over short timescales, different time integrations have relatively uncorrelated noise realizations. However, as our drift scan telescope moves across different LST values, the sky brightness seen by the telescope changes, leading to slow changes in the noise level for a sky noise–dominated telescope. An alternative to bootstrapping over time is to bootstrap over different copies of an identical ("redundant") baseline group. Here the downside is that it remains an open question as to how truly redundant current interferometric arrays are (Dillon et al. 2020) and precisely what the consequences of nonredundancy are (Choudhuri et al. 2021).

With correlated data samples, bootstrapping tends to underestimate the true error bars on a final averaged power spectrum (Cheng et al. 2018). On the other hand, nonstationary effects such as nonredundancy can inflate bootstrap errors rather than revealing the fact that the data in fact come from multiple distributions. In later sections, we will compute error bars that come from bootstrapping over different LSTs, but we will interpret these results with caution given the caveats we have just outlined. Of course, these caveats by no means diminish the value of bootstrap errors as yet another consistency check, particularly when one is diagnosing systematic effects (e.g., Kolopanis et al. 2019).

3.2. Direct Noise Estimation by Visibility Differencing

The foreground and EoR signal varies relatively slowly in time (or frequency), such that after differencing the integrated visibility between very close LSTs (or frequencies), the normalized residual,

Equation (25)

is almost noise-like. We can propagate such Vdiff through power spectrum estimation pipelines to generate a noise-like power spectrum Pdiff, such that

Equation (26)

where appropriate proportionality/normalization constants allow Pdiff to have the same units as—and therefore be directly comparable to—power spectra. This quantity can be viewed as a random variable that represents random realizations of the noise in the system, which can be used to at least roughly estimate error bars in noise-dominated regimes (see Appendix C for more details). It can be computed from either time-differenced or frequency-differenced visibilities. However, by differencing neighboring points in frequency, we are in fact applying a high-pass filter in the delay space, which means that power is suppressed at low delay modes. This is illustrated in Figure 1, and for this reason, the time-differencing method is preferred for empirical noise uncertainty estimation. However, it is important to note that many correlators do not dump data to disk fast enough for this to be feasible, as the sky changes nonnegligibly on a timescale of a few seconds. The maximum time length of a single integration before reaching a decorrelation threshold depends on the baseline length; thus, one needs particular simulations for one's instrument to determine the suitable timescale (Wijnholds et al. 2018). For the upgraded HERA correlator, it will be able to produce time-differenced visibilities on a millisecond timescale for accurate, empirical noise estimates.

Figure 1.

Figure 1. We generate ∼60 realizations of time streams of white Gaussian noise visibilities and compute the time- and frequency-differenced visibilities. Left: power spectra from original visibilities. Middle: power spectra from time-differenced visibilities. Right: power spectra from frequency-differenced visibilities. In each panel, we plot the power spectra from every realization, along with the mean (solid red) and standard deviation (dashed red) of power spectra over all realizations. We see that power spectra from frequency-differenced visibilities are highly suppressed at low delays.

Standard image High-resolution image

3.3. Power Spectrum Method

With appropriate approximations (see Liu & Shaw 2020 for details), it is possible to write down an analytic expression for the noise power spectrum given a system temperature, Tsys, in units of kelvin,

Equation (27)

where XDc and $Y\equiv \tfrac{c{\left(1+z\right)}^{2}}{{\nu }_{21}{H}_{0}E(z)}$ are conversion factors from sky angles and frequencies to cosmological coordinates, Ωeff is the effective beam area, tint is the integration time, Ncoherent is the number of samples averaged at the level of visibility, and Nincoherent is the number of samples averaged at the level of the power spectrum (Zaldarriaga et al. 2004; Pober et al. 2013; Cheng et al. 2018; Kern et al. 2020a). This is an estimate of the rms of a power spectrum measurement in the limit that it is purely thermal noise–dominated. The system temperature, Tsys = Tsky + Trcvr, is the sum of the sky and receiver temperature and describes the total noise content of the visibilities formed between cross-correlating data from different antennas (Thompson et al. 2017).

There are many ways in which the key quantity Tsys can be estimated. For example, we can take advantage of the differenced visibilities discussed in the previous subsection. These differences can then be converted into an estimate of Tsys via the relation

Equation (28)

where kb is the Boltzmann constant, Ωp is the integrated beam area, B is the bandwidth, and Δt is the integration time at a single time sample. The "rms" subscript signifies taking the rms of the differenced visibilities, and p and q are indices denoting two different antennas that form a baseline {p, q}. This serves to emphasize the fact that we can have a distinct system temperature for every baseline.

Another way to estimate Tsys—which we use in this paper—is to use autocorrelation visibilities, i.e., visibilities formed by correlating a single antenna's data with itself. The system temperature on a non-autocorrelation baseline {p, q} is then related to the geometric mean of the autocorrelation visibilities of the two constituent antennas as (Jacobs et al. 2015)

Equation (29)

In Figure 2, we plot the system temperatures predicted using both methods for some HERA data. The lower scatter with the second method is why we recommend its usage.

Figure 2.

Figure 2. Comparison of two ways to estimate the system temperature based on HERA data. The system temperatures of the cross-correlation visibilities on two 14.6 m baselines (indexed by HERA antenna numbers (23, 37) and (36, 51)) are averaged across the LST range of 6.10–6.46 hr. The green regime, from frequency channel number 515 to 695, shows the HERA data band used for analysis in this paper. The labels "auto" and "rms" indicate the method (either from products of autovisibilities or the rms of differenced visibilities) by which the curves of system temperatures are calculated. And the values of the temperatures shown in the labels are the average values over the band specified by the green regime. We see that the results from two methods are consistent to 5%, though the curves from autocorrelations are far less scattered.

Standard image High-resolution image

The noise power spectrum PN correctly describes the error bars assuming that our instrument measures nothing but noise. This may be a suitable approximation for noise-dominated delays. More generally, however, when a signal (be it foreground or systematics) exists, the cross terms of Equation (17) provide an additional contribution to the noise scatter/error bars. 29 This term exists regardless of whether one's foreground mitigation strategy is based on subtraction or avoidance. In the former case, the foreground residuals after subtracting a model from the data enter into the final expression; in the latter case, the whole foreground contribution is propagated as a systematic signal in the data. We show how to take this into account in Appendix D, where we define PSN as

Equation (30)

which serves as a characterization of the error bars on the total sky emission, consistent with the form derived in Kolopanis et al. (2019). Here $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$, the real part of the power spectra formed from x 1 and x 2, serves as a stand-in for a signal-only power spectrum PS, assuming that the signal dominates the noise (whether this "signal" takes the form of foregrounds, systematics, or the cosmological signal).

Using real data helps us approximate the true PS when we do not possess good a priori models. However, by using real data, our estimate of the first term of Equation (30) can, in principle, be negative because ${\tilde{x}}_{1}$ and ${\tilde{x}}_{2}$ contain different noise realizations. This can cause problems, since the signal-only power spectrum is expected to be nonnegative. We thus enforce a hard prior on this term and set negative values of $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$ to zero. In this way, ${P}_{\mathrm{SN}}^{2}$ is always positive, and the error bar PSN is at worst a conservative estimate. When we average the power spectra with error bars, this conservatism leads to a substantial bias between PSN and PN in our final error estimates in the noise-dominated regime. This is because $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$ in the first term of Equation (30) is empirical—and therefore contains noise—which effectively yields a double counting of the noise–noise term in the variance. This double counting does not result in an average bias if one does not enforce our prior, since in a noise-dominated regime, $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$ has zero mean. Our prior ensures that PSN > PN. Despite this, we will show that Equation (30) is a reasonable approximation over broad swaths of the power spectrum. Moreover, if we understand the statistics of noise fluctuations, one can simply predict—and correct for—the double-counting bias in PSN. In the noise-dominated regime, PN characterizes the scatter in $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$. Thus, one can estimate the expectation value of the extra noise contribution from the first term of Equation (30) by computing

Equation (31)

The integral runs over only positive values, since we are imposing a nonnegative prior. Note that here we have neglected any complicated window function effects in inserting the measured power spectrum, essentially assuming that all power is locally sourced at the delay where it is measured. In principle, these effects can be taken into account in a more general derivation within the QE formalism, but we leave this for future work.

We see from Equation (31) that the excess of PSN above PN in the noise-dominated regime is proportional to PN; thus, we can just subtract it from the initially computed PSN. We then define a modified "PSN" free from the double-counting noise bias as 30

Equation (32)

The reduction of double-counting noise bias in this way also holds where signal dominates over noise. Since PN, PSN, and ${\tilde{P}}_{\mathrm{SN}}$ are all either power spectra or constructed from products of power spectra, we name this methodology of error estimation the "power spectrum method."

3.4. Covariance Method

The QE formalism leads to a natural way to write down an analytic form of error bars by propagating the input covariance matrices on visibilities into the output covariance matrices on bandpowers, which we name the "covariance method" (see Appendix E for more details). Provided three set of matrices below containing the full frequency–frequency two-point correlation information of complex visibilities

Equation (33)

the variance in the real part of ${\hat{P}}_{\alpha }$ is

Equation (34)

and the variance in the imaginary part of ${\hat{P}}_{\alpha }$ is

Equation (35)

To get the final error bar on the power spectra, we should accurately model input covariance matrices on visibilities and propagate them into an output covariance matrix on bandpowers. Generally, we assume that the input covariance matrices can be decomposed as C C signal + C noise.

Assuming the distributions of the real and imaginary parts of the noise in visibilities are independently and identically distributed (IID) at the same frequency and uncorrelated between different frequency channels, our expressions simplify considerably. With these assumptions, ${{\boldsymbol{C}}}_{\mathrm{noise}}^{11}$ and ${{\boldsymbol{C}}}_{\mathrm{noise}}^{22}$ are diagonal, and ${{\boldsymbol{C}}}_{\mathrm{noise}}^{12}$, ${{\boldsymbol{U}}}_{\mathrm{noise}}^{11}$, ${{\boldsymbol{U}}}_{\mathrm{noise}}^{22}$, ${{\boldsymbol{U}}}_{\mathrm{noise}}^{12}$, ${{\boldsymbol{G}}}_{\mathrm{noise}}^{11}$, ${{\boldsymbol{G}}}_{\mathrm{noise}}^{22}$, and ${{\boldsymbol{G}}}_{\mathrm{noise}}^{12}$ are all zero. Analogous to Equation (29), one can estimate the diagonal terms of ${{\boldsymbol{C}}}_{\mathrm{noise}}^{11}$ and ${{\boldsymbol{C}}}_{\mathrm{noise}}^{22}$ using the amplitudes of autocorrelation visibilities. For a baseline {p, q} composed of two antennas p and q, its C noise is

Equation (36)

where BΔt is the product of the channel bandwidth and integration time, and Nnights is the total number of nights of data analyzed from a drift scan telescope.

Inserting only C noise for C in Equations (34) and (35), we have another estimate of the noise power variance as

Equation (37)

By taking the trace on the products of matrices, we have in fact taken a weighted average of covariance information over frequencies. The quantity σQE-N should be equal to PN from the previous subsection, provided that in computing Tsys using Equation (27), we average over frequencies to obtain an effective Tsys in the same way. In this way, we see that the analytic noise power spectrum essentially reduces to a special case of Equation (37).

Of course, the fully covariant treatment here also implicitly includes the signal–noise cross terms discussed in previous sections. Including both C signal and C noise in C gives

Equation (38)

Since we have assumed that only ${{\boldsymbol{C}}}_{\mathrm{noise}}^{11}$ and ${{\boldsymbol{C}}}_{\mathrm{noise}}^{22}$ are nonzero, the extra signal–noise cross terms entering into the expression are just their couplings with the signal counterparts. For that last contribution, we estimate C signal as

Equation (39)

Note that this way of modeling C signal is Hermitian and noise bias–free when taking the ensemble average but not positive definite. With a similar argument to PSN in Section 3.3, we enact a hard nonnegative prior on C signal, where rows and columns containing negative diagonal elements are set to zero. This procedure can be shown to give signal–noise cross terms in Equation (38) that are always nonnegative. However, this means that σQE-SN suffers from the same double-counting noise bias with PSN, and analogously, we may construct a modified "σQE-SN" that is also free from the bias as

Equation (40)

Generally speaking, the power spectrum method of the previous subsection is a special case of the covariance method of this subsection. For example, if we estimate PN in a way that carefully accounts for the frequency dependence of Tsys, we should find that when we insert it into the expression for PSN that PSN = σQE-SN. The covariance method has the advantage of providing off-diagonal covariances between different bandpowers in addition to variances.

3.5. Summary

The methods of error bar estimation introduced in this section can be categorized into two groups.

  • 1.  
    σbs, PSN, σQE-SN. These estimate error bars on the total emission, including both contributions from signal–noise cross terms and noise–noise terms.
  • 2.  
    Pdiff, PN, σQE-N. These estimate the true error bars only in the limit where the delay power spectrum is noise-dominated (they may be called the noise levels), only including contributions from the noise–noise terms.

Before we jump into a quantitative discussion using the HERA power spectrum pipeline to compute these error bars in the next section, it is important to stress that there are other methods of error estimation that we do not cover in this paper. For example, LOFAR has used the Stokes V parameter as an estimator of noise level (Patil et al. 2017; Gehlot et al. 2019; Mertens et al. 2020), since the astrophysical sky is expected to exhibit only extremely weak circular polarization. However, reliably estimating Stokes V power requires more accurate polarization calibration solutions than are currently available for HERA (Kohn et al. 2019). Since one of our goals is to test our error estimation methods on HERA data, we will omit discussion of Stokes V techniques in this paper.

4. Tests

In this section, we quantitatively examine the error estimation methods introduced in Section 3. We apply them to 21 cm delay power spectra estimated from both simulated data and HERA Phase I data. We directly compare the relative amplitudes of the error bars predicted by each method, delay mode by delay mode. We also study how the error bars respond to systematics and foregrounds in different regimes of delay space.

4.1. Simulations from a Toy Model

We start with simulations from a toy model. This allows us to generate a large number of realizations, with which we can numerically test the validity of our error bars in the ensemble-averaged limit. Our simulated visibilities include only the foregrounds and noise. For the foreground portion of the visibilities, we draw a random visibility from a frequency–frequency covariance matrix of the form ${{\boldsymbol{C}}}_{{ij}}=A\exp \left[-{\left({\nu }_{i}-{\nu }_{j}\right)}^{2}/{l}^{2}\right]$, where A and l characterize the amplitude and correlation length of the foreground signal, respectively. The adopted covariance model creates smoothly varying functions in frequency space, which is roughly in accordance with the relatively flat spectral structure of real foregrounds. Here we simulate visibilities on two redundant baselines for 20 consecutive time stamps. We set A = 25 and l = 5 MHz, and the foreground visibilities are kept the same on each baseline and over all time stamps. The noise components of the visibilities on each baseline at each time stamp are independently drawn from the same white Gaussian distribution ${ \mathcal N }(0,{\sigma }^{2}=1)$. We produce ∼10,000 realizations of such visibilities and then use hera_pspec code 31 to estimate the delay power spectra and compute the error bars discussed previously.

In Figure 3, we plot the power spectra together with a few of the error bar types computed from one time stamp of data from the simulations. We compute Pdiff by differencing visibilities between one time stamp and the next. We use Equations (37) and (38) to calculate the error bars of the "covariance method," while we evaluate C noise using the exact covariance matrix from which noise visibilities are drawn, since we did not simulate visibilities on autocorrelation baselines. In the top panel of Figure 3, the green shaded regime (which ranges from ±50 to ±750 ns) is where the foreground power is dominant over the noise power. We see that Pdiff and σQE-N are insensitive to the foreground power in this regime, and when moving to higher delays, the noise levels characterized by Pdiff, σQE-N, and σQE-SN are very close to one another. Compared to the other two, Pdiff shows much more scatter from delay to delay, since it is a more empirical estimation of noise based on examining what amounts to noise realizations. Notice also that, as expected by construction, the σQE-SN curve always lies above σQE-N, due to the fact that we enforce a zero clipping on the signal–noise cross term.

Figure 3.

Figure 3. Error bars on single baseline pair power spectra at one time stamp from simulations described in Section 4.1. In the top panel, we plot the power spectra together with error bar types Pdiff, σQE-SN, and σQE-N. The green shaded regime ranges from ±50 to ±750 ns, where the foreground power is dominant over the noise power. In the bottom panels, we plot histograms of bandpowers from ∼10,000 realizations at τ = 320.0 (strongly foreground-dominated regime), 640.0 (transition regime), and 960.0 (noise-dominated regime) ns, along with PDF curves predicted using the σQE-SN and σQE-N values at the same delay. At τ = 320.0 and 640.0 ns, the PDF takes a Gaussian form. At τ = 960.0 ns, the PDF takes the form of a Laplacian. The P(k) values used in the histograms have been subtracted from the mean value of all realizations. We can see that the error bars are roughly comparable to each other in amplitudes in the noise-dominated regime. At τ = 320.0, the envelope of the histogram matches exactly with the PDF using σQE-SN. At τ = 960.0, the envelope of the histogram matches the PDF using σQE-N, while we see that the PDF using σQE-SN is broader. Therefore, using σQE-SN will lead to a more conservative estimate of errors in this delay regime.

Standard image High-resolution image

In the bottom panel of Figure 3, we plot histograms of power spectra at three delays (τ = 320.0, 640.0, and 960.0 ns) by accumulating data points from ∼10,000 realizations. The results here are therefore representative of ensemble-averaged expectations. At each delay, we also plot theoretical predictions for the probability distribution functions (PDFs). Precisely what form these PDFs take will depend on the delay. In the low-delay regime, Equation (17) shows that the variation comes from single powers of visibility noise, which we assume is Gaussian. (Recall that we are not modeling the signal as a random field, in the sense that it does not participate in our ensemble average.) The result is a Gaussian PDF. At high delays, Equation (17) shows that the power spectrum is the cross-multiplication of two independent realizations of noise. The resulting PDF is a Laplacian. Both of these distributions take one free parameter (the standard deviation of power), and we show predictions where this standard deviation is specified by σQE-SN and σQE-N. At τ = 320.0 and 640.0 ns, we plot Gaussian reference PDFs. At τ = 960.0 ns, we plot a Laplacian reference PDF. We see that at τ = 320.0 ns, where the foreground power is overwhelmingly dominant, the shape of the histogram is indeed Gaussian-like, and its envelope matches the PDF curves using σQE-SN. At τ = 960.0, where noise is dominant, the shape of the histogram is indeed Laplacian-like, and its envelope matches the PDF curves using σQE-N (since σQE-N does not suffer from the conservatism of σQE-SN discussed in Section 3.3). With τ = 640.0 ns, we have a transition case between the two extremes. The distribution of power spectra will be skewed, since neither the signal nor the noise dominates on this occasion (for a mathematical proof of the skewness, see Appendix F). The histogram does not match the PDF predicted by either standard deviation, but note from the widths of the PDFs that an error bar given by σQE-SN is a conservative error, as we designed it to be.

In Figure 4, we present the same types of error bars plus a bootstrapped one on power spectra that were formed by incoherently averaging over 20 time stamps. We see in the green regime that σbs agrees with σQE-SN. All of the different kinds of error bars agree well with each other in the noise-dominated regime, and with the extra time-averaging step (compared to Figure 3), Pdiff exhibits less scatter. Again, we plot histograms of the averaged power spectra from Monte Carlo simulations against Gaussian PDF curves at τ = 320.0, 640.0, and 960.0 ns. One feature to note from the histogram is that each distribution has become nearly Gaussian. This is simply due to the central limit theorem, as power spectra are averaged together incoherently. In addition to σQE-SN and σQE-N, we also plot the PDFs using ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$, which eliminates the double-counting bias in σQE-SN. It is as expected that the PDF using ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ is closer to the one using σQE-N at the noise-dominated delay mode.

Figure 4.

Figure 4. Error bars on time-averaged power spectra over 20 time stamps from simulations in Section 4.1. The figure follows similar conventions to Figure 3, except (top) σbs is added and (bottom) all PDFs take the form of Gaussians, and the ones specified by ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ are appended. We observe good agreement between σbs and σQE-SN in the foreground-dominated regime and the consistency of all types of labeled error bars in the noise-dominated regime. After the incoherent average, we see histograms at all delays become Gaussian. Additionally, ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ is clearly different from σQE-SN, where the signal is less dominant. Especially at τ = 960.0 ns, the PDF using ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ is closer to the exact noise-dominated version using σQE-N.

Standard image High-resolution image

4.2. Application to HERA Phase I Data

The HERA Phase I data used for analysis in this paper consist of 18 observing nights taken in the Karoo Desert, South Africa, from 2017 December 10 to 28. The HERA array consisted of ∼40 functional antennas during observations, which were taken across a 100–200 MHz band comprised of 1024 channels and dual polarization "X" and "Y" feeds. (See Table 1 of Kern et al. 2020b for more details on the array and correlator specifications during the observations.) The data used in this work were first preprocessed with the HERA analysis pipeline (internally called H1C IDR2.2 32 ). This includes automated metric evaluation and data flagging for faulty antennas and radio frequency interference (RFI). In addition, the data are redundantly calibrated (Dillon et al. 2020), absolutely calibrated (Kern et al. 2020b), binned, and averaged across observing nights; inpainted over RFI gaps in frequency; and then treated for known instrumental systematics (Kern et al. 2020a).

We pick a slice of HERA Phase I visibilities taken from a 14.6 m redundant baseline group during an LST range of 5.75–6.10 hr. The visibilities in each time stamp are integrated over ∼10 s. We select visibilities falling within a 150.3–167.8 MHz band to compute the power spectra. We use pseudo-Stokes I visibilities VpI, which are constructed by combining the visibilities from a cross-correlation of two X feeds ("XX") and a cross-correlation of two Y feeds ("YY") as follows:

Equation (41)

In forming the delay power spectra, we cross-correlate visibilities from different baselines (e.g., b 1 b 2, b 1 b 3, b 2 b 3, etc.) and between odd and even time stamps (e.g., t1t2, t3t4, t5t6, etc.) to form delay power spectra. In this way, we obtain power spectra on 253 baseline pairs at 30 time stamps.

We show the power spectra from one baseline pair at one time stamp in Figure 5, together with error bar types Pdiff, σQE-SN, σQE-N, PSN, and PN. The Pdiff errors are computed from time-differenced visibilities; e.g., for power spectra at the cross time stamp t1t2, we form VdiffV(t2)–V(t1), and then we cross-multiply Vdiff from two different baselines to obtain the corresponding Pdiff for that baseline pair. We calculate σQE-SN and σQE-N using Equations (37) and (38) with C signal and C noise specified by Equations (36) and (39). Equations (27) and (30) give the expressions for PSN and PN. See hera_pspec for detailed implementation.

Figure 5.

Figure 5. Error bars on single baseline pair power spectra at one time stamp from HERA Phase I data. The visibilities are selected from a band spanning 150.3–167.8 MHz. Top: power spectra with error bars. The green shaded regime ranging from ±20 to ±200 ns is expected to be foreground-dominated. Middle: absolute relative difference between selected error bars with σQE-N. Bottom: absolute relative difference between selected error bars with σQE-SN. We see numerically that PSN differs from σQE-SN by less than 1% and that the same is true for PN and σQE-N.

Standard image High-resolution image

In the top panel of Figure 5, we see that all error bars agree well with each other in the noise-dominated regime (the red curve for PN is almost exactly underneath the brown curve for σQE-N, making the former difficult to see; the same is true for the teal curve for PSN versus the bright green curve for σQE-SN). The green shaded regime ranging from ±20 to ±200 ns is where foregrounds are expected to dominate. Here we see that Pdiff also responds to the foreground power, similar to PSN and σQE-SN. This tells us that the time-differenced visibilities contain nonnegligible foreground residuals, which is not surprising, since the sky is expected to evolve nonnegligibly over the ∼10 s of difference between our time samples.

In Section 3, we argued that the "covariance method" and the "power spectrum method" should be equivalent to each other. In the middle and bottom panels of Figure 5, we compute the relative difference in magnitudes between error bars, setting σQE-SN and σQE-N as the benchmarks, respectively. We see that PSN differs from σQE-SN and PN from σQE-N by less than 1%, so they are essentially equivalent in our pipeline. On the other hand, Pdiff can differ from σQE-N at more than the 10% level due to the fact that it is highly scattered. Note that σQE-SN and PSN are also scattered at some delays, whereas they are equal to σQE-N and PN at other delays. This is due to our imposition of a nonnegative prior on the signal–noise cross term.

In Figure 6, we show the power spectra with error bars on the same baseline pair as in Figure 5 but with the further step of incoherently averaging over 30 time samples. We still see that all error bars (with bootstrap errors σbs added) agree well in the noise-dominated regime. At low delays, σbs peaks at an even higher value than σQE-SN. This is because the sky is not unchanged over different time stamps, so the bootstrapped error bars over the time samples are inflated. After incoherently averaging, we still see PSN differing from σQE-SN and PN differing from σQE-N by less than 1%. On the other hand, Pdiff and σbs differ from σQE-N at roughly the 10% level in the noise-dominated regime. We also see that in the limit of noise domination, σQE-SN has a relative bias over σQE-SN by about 30%. Therefore, using σQE-SN or PSN leads to a conservative estimate of one's errors, as we expected. For comparison, we also plot the results of ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$, which eliminates the double-counting noise bias in σQE-SN. The relative difference between ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ and σQE-N is reduced to a few percent in the noise-dominated regime, while ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ is not significantly modified from σQE-SN in the foreground-dominated regime. Thus, if we want a compromise on reflecting the properties of the signal–noise cross term while not introducing noise bias, ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ might be our choice.

Figure 6.

Figure 6. Error bars on single baseline pair power spectra incoherently averaged over 30 time samples from the same slice of HERA Phase I data as Figure 5. Our plotting conventions also follow those of Figure 5 for other conventions. We add results from ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ in each panel. In the middle panel, we see that the relative difference between ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ and σQE-N drops remarkably from ∼30% to a few percent compared to the σQE-SN, demonstrating the effectiveness of our noise-double-counting bias removal. On the other hand, in the bottom panel, we see that going from σQE-SN to ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ results in significant changes only at the noise-dominated delays; thus, there one can always elect to use ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ even in foreground-dominated regimes.

Standard image High-resolution image

What we have established so far is the relative agreement (or lack thereof) between different types of error bars in different regimes. However, we have not yet established the absolute validity of these error bars on real data (i.e., we have not ruled out the possibility that they are all incorrect in the same way). For simulated power spectra, we were able to compare the Monte Carlo histograms with the PDF curves predicted from the error bars. The good match between the two gave us confidence in applying our error estimation methods. Might we perform similar analyses for power spectra from real data? Unfortunately, in real observations, we only have one realization of the sky, so we cannot reach an ensemble average limit by accumulating data points from a large number of realizations. Also, unlike simulated data with understood statistics, real data will contain systematics that make their statistics more complicated and difficult to understand (although this may change as the field of 21 cm cosmology continues to mature).

For now, we may partially achieve our goal by checking the distributions of noise-like modes in our power spectra of real data. The noise-like modes refer to power spectra at higher delays, where noise power is thought to be dominant and systematics are negligible. As we discussed in Section 3, we expect the noise visibilities to be Gaussian-distributed. This makes it possible to analytically compute the resultant statistics of the power spectra. In Appendix G, we derive the mathematical form of the PDF of incoherently averaged noise-dominated power spectra. The final result, Equation (G6), shows that the correct PDF is a weighted sum of a series of Laplacian distributions. As a numeric test of the derivation, we produce Monte Carlo histograms of incoherently averaged power spectra from pure Gaussian noise visibilities with an increasing number of averaged samples in Figure 7. We generate ∼10,000 realizations of power spectra with multiple time samples and evaluate the power spectra at a single time stamp, as well as what it would be if incoherently averaged over five or 15 time stamps. For realizations at each time sample, we can calculate the error bar σQE-N of the power spectra and substitute it into Equation (G6). It is clear that the predicted PDF matches the envelope of the histograms and that the shape of the histograms of the averaged power spectra become increasingly Gaussian when averaging is over more time stamps. This is again a result of the central limit theorem.

Figure 7.

Figure 7. We plot the histograms of incoherently averaged power spectra over certain time stamps from pure noise simulations. The histogram in each column contains ∼10,000 data points. We compute σQE-N and refer to Equation (G6) to evaluate the "sum of Laplacians" PDF. Data points have been subtracted from the mean over all realizations. We also plot the equivalent Gaussian PDF with the same variance as the "sum of Laplacians" PDF. The green arrows point to the dotted vertical lines representing 3σ and 5σ, where σ is the square root of the variance of the predicted PDF. We see that the envelopes of the histograms match the PDFs predicted using Equation (G6) very well. As a check, the fractions of outliers beyond 3σ in each histogram are (1.27%, 0.57%, 0.25%), while the corresponding values from the predicted PDFs are (1.34%, 0.58%, 0.22%), a very close agreement. And with more time samples to be incoherently averaged, the shape of the histogram becomes increasingly Gaussian, which is a consequence of the central limit theorem. As expected, we also see the distribution get narrower with more samples averaged together.

Standard image High-resolution image

Confronting our results with real data, we use the power spectra from the same HERA Phase I data set as Figures 5 and 6 to generate the histograms. To accumulate sufficient data points for a histogram, we view all noise-like modes in power spectra over different redundant baseline pairs as independent realizations. And we carry out the incoherent average over the time axis. Because the noise level at different baseline pairs may differ, all power spectra are first normalized by being divided over their corresponding σQE-N and then subtracted from the mean of all data points. After the normalization, we have a uniform error bar σQE-N for all data points at each time sample. We then make histograms and compare their envelopes with the PDF of "sum of Laplacians" predicted using Equation (G6).

Before we jump to the results, we first take a look at the data set, which includes RFI gap inpainting but without the removal of systematics. For the histograms drawn in Figure 8, we evaluate the distributions of power spectra at delays larger than 2000 ns and between 500 and 1500 ns. In the former case, we see that the shapes of the histograms are perfectly consistent with the predicted PDF, and the distributions become more Gaussian and narrower with an increasing number of averaged samples, similar to what we saw in Figure 7. In the latter case, we observe that the histograms are flattened and much wider compared to the predicted PDF, and there exist evidently hefty wings on either end. Numerically, the fractions of outliers beyond 3σ in each histogram are (7.95%, 10.70%, 11.46%), which greatly exceed the corresponding values from the predicted PDFs (1.36%, 0.57%, 0.24%). This is a remarkable proof that significant systematics exist at lower delays in inpainted-only data, as we expect.

Figure 8.

Figure 8. Histograms of power spectra at noise-like modes from the same HERA Phase I data used in Figures 5 and 6, including RFI gap inpainting but without the removal of systematics. The data points are accumulated from power spectra at the same delays from different redundant baseline pairs. Because their noise levels may differ, they are first normalized by dividing out their corresponding σQE-N and then having the mean of all data points subtracted off. In this way, we have a uniform σQE-N for all points, and we use Equation (G6) to compute the "sum of Laplacians" PDF. Refer to Figure 7 for other plotting conventions. Top: histograms from power spectra at all delays larger than 2000 ns, where there are ∼27,000 points in each column. Bottom: histograms from power spectra at delays between 500 and 1500 ns, where there are ∼9000 points in each column. As a check, in the top panels, the fractions of outliers beyond 3σ in each histogram are (1.49%, 0.65%, 0.40%), which are close to the corresponding values from the predicted PDFs (1.36%, 0.57%, 0.24%). In the bottom panels, the fractions of outliers beyond 3σ in each histogram are (7.95%, 10.70%, 11.46%), which greatly exceed the corresponding values from the predicted PDFs (1.36%, 0.57%, 0.24%).

Standard image High-resolution image

We produce histograms for the systematics-removed data, as we used for Figures 5 and 6, in Figure 9. At delays larger than 2000 ns, we still see a good match between the Monte Carlo histograms and the predicted PDFs, while at delays between 500 and 1500 ns, we see that the deviations between the histograms and PDFs are highly suppressed compared to Figure 8. This is not surprising, since we have exerted systematics removal. Though there is still a little excess above the PDFs in the histograms on the far ends, this does not substantially affect the error bars that one might quote on a power spectrum measurement (which serve as a summary statistic for the main bulk of the PDF rather than its wings). However, such deviations are worth keeping an eye on, especially when performing rigorous jackknife or null tests in an attempt to understand the systematics in one's instrument. As noted above, the excessive wings of the histograms in the bottom panels of Figure 8 can serve as a diagnostic tool for systematics that lead to deviations from Gaussian noise-like visibilities. They may also be used to investigate the related question of how instrumental systematics (e.g., Kern et al. 2019, 2020a) might affect the validity of one's error bars. Readers should interpret Figures 8 and 9 as a quality check of HERA Phase I data that shows that the power spectra at high (<2000 ns) and middle (500–1500 ns) delays after systematics mitigation are close to the predicted behaviors of Gaussian noise visibilities. Thus, σQE-N (along with other equivalent methods) validates itself a successful tool to characterize the noise statistics in real data. However, we will still quote ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ as a more robust error bar on reporting EoR upper limits at those delays. One should be aware that not all systematics can be cleanly corrected for, which means that, in principle, the statistics can be much more complicated than the simple Gaussian distribution shown here. Along this theme, we urge readers to always perform consistency checks on the data, including but not limited to the ones we have performed here.

Figure 9.

Figure 9. Histograms of power spectra at noise-like modes from inpainted and systematics-mitigated HERA Phase I data. The power spectra used here come from exactly the same data set as in Figures 5 and 6. As a check, in the top panels, the fractions of outliers beyond 3σ in each histogram are (1.48%, 0.63%, 0.39%), which are close to the corresponding values from the predicted PDFs (1.36%, 0.57%, 0.24%). And in the bottom panels, the fractions of outliers beyond 3σ in each histogram are (2.19%, 1.32%, 0.80%), which slightly exceed the corresponding values from the predicted PDFs (1.36%, 0.57%, 0.24%), but at a much lower level than the disagreement seen in Figure 8.

Standard image High-resolution image

5. Discussion

In previous sections, we have examined a number of different methods for assigning error bars to a HERA power spectrum. Here we perform a comparison of the different types of error bars, highlighting the advantages and disadvantages of each.

We first compare the error bars using the "covariance method" (σQE-N and σQE-SN) to those computed using the "power spectrum method" (PN and PSN).

  • 1.  
    The covariance method error bars analytically take the covariance of the input visibilities and propagate them through to the output covariance of the bandpowers via general formulae given by Equations (34) and (35). There are two weaknesses to this approach. First, the output errors will only be as good as the modeling of the input covariances. This modeling is particularly difficult for foregrounds and systematics, which can have statistical properties that are not entirely understood. In this paper, we adopt a strategy where we view systematics as nonrandom and empirically estimate them from the real data. The other weakness of our covariance method is that our derivations rely on Gaussianity. (Indeed, it would be strange for this method to only require an input covariance—a two-point function—if it were capable of capturing the effects of non-Gaussianity.) This assumption will also be violated by foregrounds and systematics, as well the cosmological signal (which is an effect that was modeled in Mondal et al. 2016, 2017; Shaw et al. 2019). Sidestepping these modeling restrictions on the covariance method are the noise-dominated bandpowers at high delays. In this regime, we use a diagonal input covariance matrix Cnoise, with its diagonal elements set by the autocorrelation visibilities as Equation (36). The resulting error bars we call σQE-N (see Table 3 for the reminder of our notation). These error bars are confirmed by tests on simulations and real data in Figures 7 and 9, which verify that the error bars do properly account for the spread seen in an ensemble of Monte Carlo simulations. Further bolstering our confidence in using the covariance method are their agreement with other error metrics at our disposal. Figures 5 and 6 show that in the noise-dominated regime, the error bars using the covariance method are in excellent agreement with the bootstrap errors σbs, error bars using the power spectrum method, and the power spectrum of differenced data Pdiff.
  • 2.  
    The agreement between these different error estimation methods raises the question of why one might favor the covariance method over others. Consider first a comparison between σQE-N and PN from the power spectrum method. These two methods are in fact quite similar, because PN is also an analytically propagated measurement of error, as one can see, for instance, in the derivation of Zaldarriaga et al. (2004). The difference is one of generality, whether in the inputs, intermediate steps, or outputs. On the input side, PN assumes uncorrelated noise between visibilities whose amplitude is governed by the radiometer equation; σQE-N can accept an arbitrary input covariance (even though in our tests, we take it to be diagonal). During the actual propagation of errors, the derivation of PN assumes that fluctuations in uvν space are uncorrelated; σQE-N makes no such approximations. Finally, on the output side, the power spectrum method returns a single error bar; the covariance method provides a full bandpower covariance matrix.

Of course, in reality, not all delay modes are noise-dominated, and reliable error bars need to be placed even in signal-dominated regimes (whether this signal comes in the form of instrument systematics, foregrounds, or, ultimately, the cosmological signal). It is difficult to place rigorous error bars on bandpowers in these regimes; unless one has a physical model for all of the systematics involved (with knowledge of their probability distributions), it is an ill-defined problem to ask how errors propagate. Unfortunately, the presence of unexplained (or at least not fully explained) systematics is the current state of affairs in 21 cm cosmology, and truly rigorous error bars will need to wait for future work on the modeling of systematics.

Even with well-defined (if not perfectly characterized) systematics, the meaning of one's error bars is subtle. For instance, foregrounds such as a continuum of unresolved point sources can be appropriately treated as a random field. Given this, one's approach might be to say that the unresolved point sources contribute some effective power spectrum to the measurement. With such a formalism, there is a fundamental limit to how well these foregrounds can be characterized, since they come with their own form of cosmic variance. In other words, if one is trying to place constraints on foregrounds, one must account for the fact that the particular realization of foregrounds that we see may not be representative of foregrounds in general. This sort of error is difficult to compute in general, as the squared nature of the power spectrum means that the non-Gaussian—and therefore nontrivial—four-point function of the foregrounds needs to be known.

A goal of characterizing the general statistical properties of all possible foregrounds, however, may be unnecessarily ambitious. In particular, for a cosmological measurement, one is not particularly concerned with the behavior of a "typical" foreground; one is primarily concerned with how our particular realization of foregrounds affects our observations. As a concrete example, if our Galaxy's synchrotron emission happens to be anomalously bright compared to a typical galaxy's synchrotron emission, it is our own brighter foregrounds that we need to deal with. With such a mindset, it is more appropriate to consider all foregrounds as nonrandom components of our data. By this, we do not mean that the foregrounds need to be spatially or spectrally constant; rather, we mean that in hypothetical random draws for taking ensemble averages, the cosmological signal and the instrumental noise change with each new realization, but the foregrounds remain the same. If the foregrounds are not formally random, our error bars are the result of instrumental noise (and, in principle, cosmic variance of the cosmological signal, although this contribution is small for current upper limits).

It is important to stress, however, that even if our error bars are due to the randomness of instrumental noise, the resulting error bars are not simply what one obtains from imagining a noise-only measurement and propagating the noise fluctuations through to a power spectrum. This is because the power spectrum is a squared statistic. Thus, in the squaring of a measurement that contains both noise and a (nonrandom) signal, there are signal–noise cross terms to contend with. These terms are zero in expectation but do not have nonzero variance. This means that knowledge of the signal (whether from systematics or foregrounds) is needed to correctly account for instrumental noise errors in non-noise-dominated regimes.

  • 1.  
    In short, even if we lower our ambitions and forgo incorporating knowledge about signal statistics into our error calculations, understanding the signal itself is necessary for computing noise-sourced error bars. This requirement is where noise-only computations like PN and σQE-N fall short.
  • 2.  
    This shortcoming is remedied by generalized versions of PN and σQE-N, which we dub PSN and σQE-SN. These are given by Equations (30) and (38). The key idea is that in signal-dominated regimes, the measured data themselves can be a good approximation to the signal. Thus, we may reinsert the data in an appropriate way to capture signal terms in our general expressions. Figures 3 and 4 show that these error bars work well in both signal-dominated and noise-dominated regimes.
  • 3.  
    Although we treat foregrounds and systematics as a single signal term that is directly estimated from measured data in this paper, we note that for future high-sensitivity detections, more elaborate modeling of both is needed. Of course, there is also the possibility of unknown systematic effects, which our formalism does not account for.
  • 4.  
    Moreover, two cautionary warnings are in order when applying Equations (30) and (38). The first is that because the measured data are now part of the error bars themselves, it can be dangerous to use these error bars to inform data weightings for downstream averages in one's pipeline (e.g., in further incoherent time averaging of power spectra or incoherent averaging of power spectra from different baselines). If the data weightings are coupled to the data themselves, our so-called QEs are no longer quadratic. As shown in Cheng et al. (2018), a blind application of the usual methods for normalizing QEs leads to power spectrum estimates that are biased low ("signal loss"). For this reason, while PSN and σQE-SN are fine ways to compute error bars, we recommend that any error-motivated data weightings be based on PN and σQE-N instead.
  • 5.  
    The second warning is that there almost certainly exist regimes that are neither signal- nor noise-dominated, where signal and noise are comparable in magnitude. Here it becomes necessary to contend with the fact that a noisy measurement of the signal can be unphysically negative. Said differently, if our estimate of the signal itself contains noise, we are in effect double counting the noise in our error computations. One approach is to enact a hard prior on the positivity of the signal. This is what was done in all computations of PSN and σQE-SN in this paper. However, Figures 3 and 4 show that this has the effect of inflating the error bars. Given that this is a conservative bias on the errors, this may or may not be appropriate, depending on one's application.
  • 6.  
    A slightly more accurate approach is to assume that instrumental noise is Gaussian-distributed and quantitatively predict and correct the noise bias in the errors. Implementing this correction gives ${\tilde{P}}_{\mathrm{SN}}$ and ${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$, which are given by Equations (32) and (40), respectively. Figures 3 and 4 show that this corrects the bias and gives error bars that are no longer overly conservative. However, because this correction is designed to give unbiased errors in expectation, it will occasionally give error bars that are slightly smaller than the error predicted by noise-only estimators such as PN. In practice, however, we find that this is a reasonably rare occurrence.

With the aforementioned difficulties with error estimation in the presence of poorly characterized signals, one may be tempted to make use of more empirically based error estimates. These estimates also come with their strengths and weaknesses.

  • 1.  
    As discussed in Section 3.2, Pdiff from frequency-differenced data suffers from a bias at low delays. Figure 1 shows that even at reasonably high delays of ∼1500 ns, the bias can be significant. Thus, while Pdiff from frequency-differenced data is a useful asymptotic check at high delays, it is not a robust estimator of our errors. Implementing Pdiff using time-differenced data does not have the delay-dependent bias, as one can also see in Figure 1. However, care must be taken to ensure that the time differencing is small enough to suppress any sky signal that is coherent between adjacent time samples (Dillon et al. 2015). In addition, with a differencing scheme, one is ultimately constructing noise realizations, not noise statistics. The resulting error bars thus show considerable scatter. In that sense, the analytically propagated error bars vary in a more physically plausible—smoother—way with time and frequency.
  • 2.  
    The problem of a noisy error bar estimate persists with σbs. However, bootstrapping has several appealing features that make it a crucial check on the analytically propagated error bars. First, no assumptions are made regarding the Gaussianity of the input data. Thus, the fact that our σbs agrees with our analytically propagated errors—which assumed the input noise in the visibilities—is an essential validation of our assumptions. In a similar way, σbs may potentially capture increased variance due to systematics, since it is a measure of the uncertainties of total sky emission. However, the bootstrap method is known to suffer from some important limitations. For example, as noted in Appendix B, if systematics are correlated between samples, the bootstrap method tends to underestimate errors. Also, bootstrapped error bars will be inflated from nonstationary effects such as sky brightness changes and nonredundancies between nominally identical baselines. Precisely how these nonstationary effects should be folded into one's error estimation is reserved for future work, but the correct approach will certainly be more sophisticated than a simple inflation of errors. That said, this increase in bootstrap errors due to nonstationarity can serve as a useful diagnostic for further examination of unexpected systematics.

In Table 4, we summarize the discussion in this section with a succinct listing of the pros and cons of each error estimation method.

Table 4. A Summary of the Advantages and Disadvantages of Different Error Estimation Methods in 21 cm Power Spectrum Estimation

Error Bar TypeProsCons
Bootstrap (σbs)Easy to implement with minimal a priori assumptions; can be useful as a reference statistics in diagnosis of systematicsNot strictly applicable in the presence of nonindependent and nonstatistically stationary data samples
Power spectra from differenced visibilities (Pdiff)Data product close to raw dataProvides noise realizations rather than direct error bars, resulting in considerable scatter
Power spectrum method (PN and PSN)Accurately captures variances/error bars in noise-dominated (both PN and PSN) and signal-dominated (PSN) regimesDoes not contain covariance information between different bandpowers; PSN requires nonnegativity prior on the signal, which slightly inflates errors; downstream data weightings using PSN at risk of signal loss
Covariance method (σQE-N and σQE-SN)Same accuracy as PN and PSN for variance information and additionally provides full covariance informationDerivation assumes data is Gaussian; σQE-SN requires nonnegativity prior on the signal, which slightly inflates errors; downstream data weightings using σQE-SN at risk of signal loss
Modified covariance method (${\tilde{\sigma }}_{\mathrm{QE} \mbox{-} \mathrm{SN}}$ ) and modified power spectrum method ${\tilde{P}}_{\mathrm{SN}}$ Eliminates conservative double counting of noise in noisy estimates of the signalOccasional error predictions that are slightly smaller than instrumental noise expectations from σQE-N and PN

Download table as:  ASCIITypeset image

6. Conclusions

In this paper, we have systematically studied a variety of error bar methodologies in 21 cm power spectrum estimation. We have synthesized some of the common techniques in the literature, outlining their relative strengths and weaknesses in quantifying noise levels and accounting for residual systematics. Specifically, we considered a variety of types of error estimators, including the following.

  • 1.  
    Power spectrum methods. This includes the standard PN estimator for the noise power spectrum found in the literature (Zaldarriaga et al. 2004; Parsons et al. 2012a; Pober et al. 2013; Cheng et al. 2018; Kern et al. 2020b) and the PSN estimator that involves cross products with signal power spectra PS, as detailed in Kolopanis et al. (2019). Here we set PS to be the real values of experimentally observed power spectra, which is a good approximation when the signal dominates the noise. Our implementation of PSN leads to a double-counting bias compared to PN that is considerable in noise-dominated regimes, and we show how a modified form, ${\tilde{P}}_{\mathrm{SN}}$, can eliminate this bias.
  • 2.  
    Covariance methods. This consists of propagating a data covariance matrix between frequencies per time stamp and per baseline pair through the QE formalism to the bandpower covariance matrix (Liu & Tegmark 2011; Dillon et al. 2014; Liu et al. 2014a, 2014b), including error metrics described here: σQE-N for noise-dominated spectra and σQE-SN that includes signal–noise terms. These have the same variance predictions as PN and PSN by construction but also provide bandpower covariance information.
  • 3.  
    Other methods. Other methods studied in this work include the bootstrapping method that can lead to misreported errors when not handled carefully (Cheng et al. 2018), as well as the method of using differenced visibilities as noise realizations propagated through a power spectrum estimator. We show that differencing in frequency is ill advised for this approach. Differencing in time avoids some problems, but either differencing scheme generates error estimates that are rather scattered. However, we stress that the importance of these more empirically based methods is useful cross-checks (e.g., in the manner performed in this paper) that can also be helpful diagnostics for systematics (e.g., Kolopanis et al.2019).

Using simulations and real HERA Phase I data, we show that these methods are generally in agreement with each other, demonstrating their robustness and applicability to future delay power spectrum measurements from HERA. Importantly, we show that for bandpowers that are not completely dominated by noise, one needs to go beyond the standard thermal noise estimates and account for signal–noise cross terms in order to fully describe the uncertainty on the bandpower. In a series of Appendices, we also examine sources of skewness in probability distributions of measured power spectrum bandpowers (Appendices A and F), derive exact expressions for the probability distributions of incoherently summed delay power spectra (Appendix G), and examine whether common baselines in the cross-multiplication of multiple baseline pairs affect assumptions about error independence (Appendix B). The insights gained in this paper regarding error estimation are applicable in 21 cm cosmology beyond HERA. They provide a foundation upon which to develop rigorous error estimation methods that will prove to be key in unlocking the potential of the 21 cm line as a powerful probe of our high-redshift universe.

This material is based upon work supported by the National Science Foundation under grant Nos. 1636646 and 1836019 and institutional support from the HERA collaboration partners. This research is funded in part by the Gordon and Betty Moore Foundation. HERA is hosted by the South African Radio Astronomy Observatory, which is a facility of the National Research Foundation, an agency of the Department of Science and Innovation. Parts of this research were supported by the Australian Research Council Centre of Excellence for All Sky Astrophysics in 3 Dimensions (ASTRO 3D) through project No. CE170100013. G.B. acknowledges funding from the INAF PRIN-SKA 2017 project 1.05.01.88.04 (FORECaST) and support from the Ministero degli Affari Esteri della Cooperazione Internazionale—Direzione Generale per la Promozione del Sistema Paese Progetto di Grande Rilevanza ZA18GR02 and the National Research Foundation of South Africa (grant No. 113121) as part of the ISARP RADIOSKY2020 Joint Research Scheme, the Royal Society and the Newton Fund under grant NA150184, and the National Research Foundation of South Africa (grant No. 103424). P.B. acknowledges funding for part of this research from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 948764) and from STFC grant ST/T000341/1. J.S.D. gratefully acknowledges the support of NSF AAPF award No. 1701536. N.K. acknowledges support from the MIT Pappalardo fellowship. A.L. acknowledges support from the New Frontiers in Research Fund Exploration grant program, the Canadian Institute for Advanced Research (CIFAR) Azrieli Global Scholars program, a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant and a Discovery Launch Supplement, the Sloan Research Fellowship, and the William Dawson Scholarship at McGill.

Appendix A: Skewness in Power Spectra Estimated from Multiple Identical Baselines

In this Appendix, we consider a source of skewness in probability distributions of delay spectra. In particular, we consider the noise properties of power spectra formed from a set of identical ("redundant") baselines. We show that even if each baseline is measuring Gaussian random noise with mean zero, the resulting power spectra will exhibit some skewness. We emphasize, however, that this skewness vanishes if one additionally splits the data into two distinct sets of time samples (e.g., even and odd time stamps) and estimates power spectra that are not only cross-baselines but also cross-times.

As a concrete example, suppose that on the ith copy of a particular baseline, we measure ${\tilde{x}}_{i}\equiv {c}_{i}+{{id}}_{i}$ after taking the delay transform, where ci and di are independently Gaussian-distributed random variables with a variance σ2/2. This represents the behavior of ${\tilde{x}}_{i}$ at noise-dominated delays. If only two identical baselines were available, cross-multiplying them to obtain a power spectrum would yield

Equation (A1)

Consider the real part. Since c1 and c2 are independent random variables, c1 c2 is a symmetric distribution about zero (and, in fact, is given by K0, the zeroth-modified Bessel function of the second kind). The same reasoning holds for the d1 d2 term. Since {ci } and {di } are independent, it follows that c1 c2 and d1 d2 are also independent. The result is that the probability distribution for c1 c2 + d1 d2 is given by the convolution of the distributions for the individual terms. With the two contributing distributions both symmetric about zero, their convolution inherits this property and is in fact given by the Laplacian distribution discussed in Section 4.1.

The situation is different when we have more than two baselines. Taking all possible pairwise combinations (excluding the multiplication of a baseline with itself to eliminate noise bias), we obtain

Equation (A2)

where we have grouped our result into two terms that can be considered separately because {ci } and {di } are independent. Consider the first term. It has zero mean,

Equation (A3)

because the different {ci } are independent. However, the resulting distribution has a skewness to it, which can be seen by the fact that the third moment is nonzero:

Equation (A4)

(Of course, in principle, we should be taking the cube of Equation (A2) in its entirety, not just the first term. However, the independence of {ci } and {di } means that we reach the same conclusion.) The nonzero third moment shown here arises because the three terms that make up the sum are correlated as a triplet, even though each pair has no average covariance. For instance, the covariance between c1 c2 and c1 c3 is

Equation (A5)

This implies that even though c1 c2, c1 c3, and c2 c3 are not independent, for the purposes of computing the variance of the final result, one obtains the same result even if one pretends that these contributions are independent. This result is explored in more detail in the first half of Appendix B.

To summarize, the different moments of the distribution provide different insights into power spectrum estimation with different baseline pair combinations. The mean of the distribution is zero, indicating that there is no bias (as one might expect for cross-correlation spectra). The variance turns out to be the same expression as if we had completely independent baseline pairs, so the noise averages down with the number of baseline pairs, as one might naively have expected them to (without worrying about correlations). However, the skewness is nonzero. This complicates the interpretation of null tests that implicitly assume that the probability distributions of noise-dominated delays are symmetric.

Importantly, these considerations do not apply when we consider the imaginary part, which is given by

Equation (A6)

This has a third moment given by $\left\langle {\left({c}_{2}{d}_{1}+{c}_{3}{d}_{1}-{c}_{1}{d}_{2}+{c}_{3}{d}_{2}-{c}_{1}{d}_{3}-{c}_{2}{d}_{3}\right)}^{3}\right\rangle $. To get terms that are nonzero under the expectation value, we require terms that contain squares of the random variables when we multiply out the polynomial. For example, the first term c2 d1 must be multiplied onto c2 d3 because there is no other c2 term in the expression to pair to. This gives us ${c}_{2}^{2}$ d1 d3. However, we now need to multiply this onto d1 d3 or we end up with a stray d1 and d3. But none of the terms are the product of two {di }, so no matter what terms we pair this up with, it will average to zero. This logic applies to any of the terms, so the distribution of the imaginary part will not be skewed. Because of this, statistical tests involving the imaginary part of a power spectrum estimator can be more easily interpreted using symmetric distributions.

Our result here has implications for how one should avoid the noise bias in power spectrum measurements. Two commonly used methods for doing so are to cross-multiply either different identical baselines or different time stamps together. Here we have shown that employing only one of these will incur a skewness. (While our discussion above focused on cross-multiplying different baselines, the same conclusions hold if we consider cross-multiplying more than two groups in time; after all, the indices in our mathematical expressions can simply be considered time-stamp indices instead of baseline indices.) However, if we perform cross-multiplications across both time and baseline axes, the skewness vanishes. To see this, imagine that we split our data into odd and even time samples, labeled with superscripts "o" and "e," respectively. Equation (A2) then becomes

Equation (A7)

and, cubing this expression as before to compute the third moment, one finds no nonzero terms after taking the ensemble average.

Appendix B: Variance of Averaged Power Spectra from Dependent Baseline Pair Samples

In this Appendix, we consider the effect of having common baselines between different baseline pairs used to form power spectra. Inside a redundant baseline group consisting of Nbl different baselines, we can construct up to ${N}_{\mathrm{blp}}=\tfrac{1}{2}{N}_{\mathrm{bl}}\left({N}_{\mathrm{bl}}-1\right)$ different baseline pairs, and we can form a power spectrum using each pair. Consider the averaged power spectrum over these baseline pairs and the variance of this average. The form of the averaged power spectrum is

Equation (B1)

where the sum is over all possible (p, q) pairs of baselines. The variance of the averaged power spectrum does not simply go down with ${N}_{\mathrm{blp}}^{-1}$ because the data being averaged together are not fully independent of each other. For example, P12 and P13 both carry information from baseline number 1.

Let the signal be $\tilde{s}\equiv a+{bi}$, and let ${\tilde{n}}_{p}\equiv {c}_{p}+{d}_{p}i$ and ${\tilde{n}}_{q}\equiv {c}_{q}+{d}_{q}i$ be the noise realizations in the pth and qth baselines. The signal $\tilde{s}$ is identical in each baseline, since we are assuming that we are combining data from identical ("redundant") baselines. The random variables cp , dp , cq , dq , ... are IID normal variables with variance σ2. In the foreground-negligible regime, recall from Equation (17) that the average power spectrum is given by

Equation (B2)

We notice

Equation (B3)

which means that the variance in the real part of $\overline{P}$ is $\tfrac{4{\sigma }^{4}}{{N}_{\mathrm{bl}}\left({N}_{\mathrm{bl}}-1\right)}$. For the imaginary part, we compute

Equation (B4)

so that the variance of the imaginary part of $\overline{P}$ is also $\tfrac{4{\sigma }^{4}}{{N}_{\mathrm{bl}}\left({N}_{\mathrm{bl}}-1\right)}$. Since the number of baseline pairs is given by ${N}_{\mathrm{bl}}\left({N}_{\mathrm{bl}}-1\right)/2$, and 2σ4 is the variance we would expect to get from a single baseline pair, we can see that $\overline{P}$ averages down in a manner that is identical to the scenario where the baseline pairs are independent.

In foreground-dominant regimes, the average power spectrum goes to

Equation (B5)

The variance in the real part is $\tfrac{4({a}^{2}+{b}^{2}){\sigma }^{2}}{{N}_{\mathrm{bl}}}$, and the variance in the imaginary part is $\tfrac{4({N}_{\mathrm{bl}}+1)({a}^{2}+{b}^{2}){\sigma }^{2}}{3{N}_{\mathrm{bl}}\left({N}_{\mathrm{bl}}-1\right)}$. They now go down roughly as ${N}_{\mathrm{blp}}^{-1/2}$ and are larger than the variance from independent samples by factors of $\left({N}_{\mathrm{bl}}-1\right)$ and (Nbl + 1)/3, respectively.

Appendix C: Time-differenced Visibilities as Noise Estimators

In this Appendix, we establish the validity of using time-differenced visibilities as a way to estimate noise error bars. The key idea is that if we form residuals of data vectors xp (ν, t) by subtracting data from the pth baseline in adjacent time bins (t1 and t2) from each other, the result should be noise-dominated. The same holds true for delay-transformed visibilities, where the residual can be written as ${\tilde{n}}_{p}(\tau ,{t}_{2})-{\tilde{n}}_{p}(\tau ,{t}_{1})$. Suppressing τ and demoting the time variable to a subscript for notational brevity, we write ${\tilde{n}}_{p,t}={c}_{p,t}+{d}_{p,t}i$, where cp , dp , ... are IID normal variables with variance σ2. The power spectra constructed from such residuals are

Equation (C1)

From this, we see that

Equation (C2)

This is again the variance expected for a noise-dominated power spectrum. Therefore, what we have shown is that $| \mathrm{Re}({P}_{\mathrm{diff}})| $ can serve as an estimator that in expectation is equal to the correct noise errors for the measured power spectrum ${P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}$ in noise-dominated regimes. However, since this result only holds in expectation, we expect that in practice, it will exhibit considerable scatter as an error estimate.

Appendix D: Signal-dependent Error Bar from Power Spectrum Method

In this Appendix, we derive an expression for the variance on the power spectrum in the presence of foregrounds or systematics (or any "signal"). A similar derivation is presented in Kolopanis et al. (2019). Given two delay spectra ${\tilde{x}}_{1}=\tilde{s}+{\tilde{n}}_{1}$ and ${\tilde{x}}_{2}=\tilde{s}+{\tilde{n}}_{2}$, the power spectra formed from ${\tilde{x}}_{1}^{* }{\tilde{x}}_{2}$ are

Equation (D1)

where we have written $\tilde{s}=a+{bi}$, ${\tilde{n}}_{1}={c}_{1}+{d}_{1}i$, and ${\tilde{n}}_{2}\,={c}_{2}+{d}_{2}i$.

Consistent with the rest of the paper, we assume that a and b are not random variables, so that 〈s〉 = s. The true sky power spectrum is then given by ${P}_{\tilde{s}\tilde{s}}={a}^{2}+{b}^{2}$, and c1, d1, c2, and d2 in the noise parts are IID random normal variables. We then have

Equation (D2)

In the above, we have used the relation $\mathrm{var}({c}_{1}{c}_{2}+{d}_{1}{d}_{2})\,=2\langle {c}_{1}^{2}{\rangle }^{2}={P}_{{\rm{N}}}^{2}$, where PN is the analytic noise power spectrum. We have also used ${P}_{\tilde{s}\tilde{s}}=\langle \mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)\rangle $. This shows that PSN is a general form for error bars in the existence of foregrounds or systematics (or, again, any "signal").

Appendix E: Covariance Method

In this Appendix, we provide more explicit derivations of the expressions quoted in Section 3.4 for the covariance method of error estimation.

E.1. Variance

If ${\hat{P}}_{\alpha }$ is a complex number representing a power spectrum estimate of the αth bandpower, its real and imaginary parts are given by $\tfrac{1}{2}({\hat{P}}_{\alpha }+{\hat{P}}_{\alpha }^{* })$ and $\tfrac{1}{2i}\left({\hat{P}}_{\alpha }-{\hat{P}}_{\alpha }^{* }\right)$, respectively. The variance in the real part of ${\hat{P}}_{\alpha }$ is

Equation (E1)

while the variance in the imaginary part of ${\hat{P}}_{\alpha }$ is

Equation (E2)

Recall that ${\hat{P}}_{\alpha }$ is defined as ${\hat{P}}_{\alpha }={{\boldsymbol{x}}}_{1}^{\dagger }{{\boldsymbol{E}}}^{12,\alpha }{{\boldsymbol{x}}}_{2}={\sum }_{{ij}}{{\boldsymbol{x}}}_{1,i}^{* }{{\boldsymbol{E}}}_{{ij}}^{12,\alpha }{{\boldsymbol{x}}}_{2,j}$. We define three sets of matrices containing all of the two-point correlation information for the complex estimator C 12, U 12, and G 12, such that

Equation (E3)

Equipped with these definitions, we can generate the following equations

Equation (E4)

Equation (E5)

and

Equation (E6)

where ${{\boldsymbol{E}}}_{{ij}}^{12,\alpha * }={{\boldsymbol{E}}}_{{ji}}^{21,\alpha }$. Setting α = β in these equations then allows us to evaluate Equations (E1) and (E2).

E.2. Covariance

The covariance between the real part of ${\hat{P}}_{\alpha }$ and the real part of ${\hat{P}}_{\beta }$ is

Equation (E7)

and the covariance between the imaginary part of ${\hat{P}}_{\alpha }$ and the imaginary part of ${\hat{P}}_{\beta }$ is

Equation (E8)

These can be evaluated in the same way as the variances above.

Appendix F: Skewness in Distributions of Power Spectra at Intermediate Delays

In this Appendix, we consider the PDFs of power spectra where neither signals (e.g., foregrounds) nor noise are dominant and both must be considered. Using the same notation as Appendix D, the power spectra formed from ${\tilde{x}}_{1}=\tilde{s}+{\tilde{n}}_{1}$ and ${\tilde{x}}_{2}=\tilde{s}+{\tilde{n}}_{2}$ are

Equation (F1)

Note that a and b are constants and c1, d1, c2, and d2 are IID randomly normal variables. For the real part of ${P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}$, we have

Equation (F2)

After subtracting from the mean, its third moment is

Equation (F3)

This nonvanishing third moment implies that the probability distribution of the power spectra is skewed. This skewness disappears for either signal- or noise-dominated cases. These results are evident in the histograms shown in Figure 3.

Appendix G: Probability Distribution for an Incoherent Sum of Delay Transform–Estimated Power Spectra

In this Appendix, we derive the probability distribution for noise in a power spectrum that has been formed by the incoherent (i.e., after squaring) averaging of power spectra from individual time integrations. The resulting probability distribution is used in Figures 79 to validate our error bar methodology.

For a noise-dominated delay power spectrum estimate, the power spectrum value u measured at one instant in time is distributed as a double exponential,

Equation (G1)

where it is assumed that the power spectra are estimated by cross-correlation—thus eliminating noise bias—and where σ is the standard deviation on the resulting power spectrum.

Now suppose we average together a number of these power spectra. Let the power spectrum value at the ith time step be given by ui . The average value is then

Equation (G2)

where {wi } are a set of weights. Note that the error on each xi may be different, so we define

Equation (G3)

We now write down the probability distribution p+ (z) for z. First, we define yi wi ui , such that

Equation (G4)

With this notation, z = ∑i yi , and we can write down z by using the fact that the probability distribution of a sum of two random variables is the convolution of their individual distributions. By the convolution theorem, this is equivalent to multiplying the Fourier transforms of the individual probability distributions ${\widetilde{p}}_{i}(k)$, and thus

Equation (G5)

where we have used the fact that in our case, ${\widetilde{p}}_{i}(k)={\left(1+{w}_{i}^{2}{\sigma }_{i}^{2}{k}^{2}/2\right)}^{-1}$. This integral can be evaluated by contour integration, giving

Equation (G6)

This is a weighted sum of the double exponential distributions, and the curves in Figures 79 labeled "sum of Laplacians" are the plots of this formula.

In closing, we note one peculiarity about this derivation: our contour integration assumed that none of the wi σi values were exactly equal. In principle, this is a reasonable assumption, since for a drift scan telescope that is sky noise–dominated, the noise power is continually changing from one time integration to the next. In practice, however, if this change is happening slowly, two adjacent time integrations may have similar enough noise properties to make Equation (G6) numerically problematic. If this is indeed the regime that one is in, it is advisable to instead use an approximate expression by letting $\sqrt{2}/{w}_{i}{\sigma }_{i}\equiv \kappa +{\varepsilon }_{i}$ and then Taylor expanding to leading order in εi .

Footnotes

  • 22  
  • 23  
  • 24  
  • 25  
  • 26  
  • 27  

    In addition to mapping the arguments of P, there is also an additional multiplicative constant; see Liu et al. (2014a) for explicit expressions.

  • 28  

    We stress that our analysis does not cease to apply at a certain delay; it is simply the case that at high delays, there is less of a pressing need to construct detailed models for foreground subtraction, which to some extent mitigates the need to consider the complicated statistical properties of this subtraction. It is likely that our formalism can be generalized to encompass some foreground subtraction, but detailed work beyond the scope of this paper would be necessary. As an example, suppose one were to use information at τ = 0 and an instrument model to subtract off leakage from other low (but nonzero) delay modes. In such a scenario, one would need to account for the fact that the noise contributions between different delay modes are now coupled. This can, in principle, be accommodated with appropriate covariance matrix modeling, but we leave this to future work.

  • 29  

    We stress that this scatter/error is still due to instrumental noise and not the variance of the signal term. Even for a perfectly constant and known signal, the presence of the cross term alters the uncertainty, essentially having the signal term act as a multiplicative amplifier for noise fluctuations.

  • 30  

    Here we derived the correction factor $\sqrt{1/\sqrt{\pi }+1}-1\approx 0.251$, assuming that $\mathrm{Re}\left({P}_{{\tilde{x}}_{1}{\tilde{x}}_{2}}\right)$ follows a Gaussian distribution. This is appropriate, assuming that enough power spectra formed from data at different times have been incoherently averaged together for the central limit theorem to apply (we will examine this point further in Section 4.1). For a single snapshot in time, the measured power spectrum follows a Laplacian distribution (again, see Section 4.1), and the correction factor becomes $\sqrt{3/2}-1\approx 0.225$. Since the difference is small and, in practice, we operate in the Gaussianized regime anyway, we use $\sqrt{1/\sqrt{\pi }+1}-1$ in our definition.

  • 31  
  • 32  
Please wait… references are loading.
10.3847/1538-4365/ac0533