An Improved Statistical Point-source Foreground Model for the Epoch of Reionization

S. G. Murray, C. M. Trott, and C. H. Jordan

Published 2017 August 4 © 2017. The American Astronomical Society. All rights reserved.
Citation: S. G. Murray et al. 2017 ApJ 845 7. DOI: 10.3847/1538-4357/aa7d0a


Abstract

We present a sophisticated statistical point-source foreground model for low-frequency radio Epoch of Reionization (EoR) experiments using the 21 cm neutral hydrogen emission line. Motivated by our understanding of the low-frequency radio sky, we enhance the realism of two components of existing models: the source count distribution as a function of flux density, and the spatial distribution of sources (source clustering), extending current formalisms for the foreground covariance of 2D power-spectral modes in 21 cm EoR experiments. The former we generalize to an arbitrarily broken power law, and the latter to an arbitrary isotropically correlated field. This paper presents expressions for the modified covariance under these extensions, and shows that for a more realistic source spatial distribution, extra covariance arises in the EoR window that was previously unaccounted for. Failure to include this contribution can bias the final power spectrum and lead to underestimated uncertainties, potentially yielding a false detection of signal. The extent of this effect is uncertain, owing to ignorance of physical model parameters, but we show that it depends on the relative abundance of faint sources, such that our extension will become more important for future deep surveys. Finally, we show that under some parameter choices, ignoring source clustering can lead to false detections on large scales, due to both the induced bias and an artificial reduction in the estimated measurement uncertainty.


1. Introduction

The Epoch of Reionization (EoR) is a key era in the evolution of the universe, in which the predominant neutral hydrogen component of the intergalactic medium is reionized due to the emerging activity of luminous sources. The fundamental importance of this epoch, as a bridge between large-scale cosmology and galaxy evolution, has solidified its statistical detection as a primary science goal of several current and upcoming instruments, such as the MWA (Lonsdale et al. 2009; Tingay et al. 2013; Jacobs et al. 2016), PAPER (Parsons et al. 2009), LOFAR (van Haarlem et al. 2013; Patil et al. 2016), the LWA (Ellingson et al. 2009), HERA (DeBoer et al. 2016), and the SKA (Koopmans et al. 2015).

While optical surveys of the scant luminous sources of the EoR remain an interesting pursuit, a great deal of attention has been focused on a statistical detection of the spatial signature of 21 cm emission arising from the patchy reionization process. Statistical detection, as opposed to the direct imaging that may be achievable with the SKA (Koopmans et al. 2015), offers the benefit of increasing detection signal-to-noise, but also provides a cosmological window on structure formation and evolution. This emission pattern, redshifted into the low-frequency radio regime, offers a unique and powerful window into the heart of the interaction between large-scale-structure evolution and the emergence of luminous objects. However, even statistical detection is beset with challenges due to systematic contamination of the signal with foreground emission (both Galactic and extragalactic), instrumental effects, and other distortions (such as the effect of the ionosphere; Jordan et al. 2017). It is commonly estimated that the foreground emission is 4–5 orders of magnitude brighter than the prized EoR signal (Morales & Wyithe 2009).

It is no surprise, then, that much of the work being undertaken to detect the EoR concerns the unravelling of foreground emission from the true EoR signal. Commonly, this process takes one of three directions: the foregrounds can be removed, avoided, or suppressed. A useful diagnostic tool in this regard has been the 2D cylindrical power spectrum (PS), which also aids in understanding these three general methods.

While the primary statistic employed for characterization of the spatial EoR signature is the 1D isotropic PS of H i brightness temperature fluctuations, P(k), the 2D PS is a useful intermediate product. The cylindrical PS bins modes in the plane of the sky, ${k}_{\perp }$, and modes parallel to the line of sight, ${k}_{| | }$. This is important, because radio telescopes probe these modes in vastly different ways. Perpendicular modes are dictated by baseline lengths and beam patterns, whereas parallel modes are dictated by frequency-based processes. This extends to the foreground emission, whose spectral structure (and thus parallel-mode structure) is entirely different from its angular distribution on the sky. Thus, different systematic effects can be localized and dealt with on a 2D PS map.

Of particular note is the fact that the typical spectral structure for foreground sources is smooth and power-law-like. This confines the majority of foreground power to large parallel scales (low ${k}_{| | }$). This feature opens up the way for the three aforementioned methods. The foregrounds can be avoided by only averaging over high-${k}_{| | }$ modes (the so-called EoR "window"), in which the foreground power is low, but this may omit the incorporation of much useful information. Furthermore it presumes we understand precisely where the foregrounds stop and the EoR window starts. The foregrounds can be removed by attempting to fit each detected source with some spectral model (either with parametric models [e.g., Bowman et al. 2009; Liu et al. 2009], or with a blind component analysis [e.g., Chapman et al. 2016]), but this runs the risk of residuals imitating the EoR signature. The foregrounds could otherwise be suppressed, by assigning appropriate weights to modes in which the foregrounds are expected to dominate (e.g., Liu & Tegmark 2011). Finally, some mixture of the approaches could be utilized.

In this paper we focus on developing statistical foreground models for use in a hybrid removal-suppression scheme, namely the Cosmological H i Power Spectrum estimator (CHIPS; Trott et al. 2016, hereafter T16). While this scheme utilizes foreground removal for the very brightest sources, for which accurate models can be fitted, it relies on a statistically consistent down-weighting mechanism to extract maximum information from the interferometric visibilities. This has the advantage of statistically accounting for all sources in the field of view, even those well below the detection threshold. It achieves this through the inverse-covariance matrix of the modes, which describes not only the uncertainty on a mode, but its correlation with others.

Naturally, such a scheme requires a model for the distribution of foreground emission in order to derive its covariance. While the full model comprises Galactic emission and that of extragalactic sources, these are additive, and we focus here on the extragalactic point-source model. Our reason for doing so is purely pragmatic—realistic models of the Galactic emission (including spatially correlated structure) have already been proposed (e.g., Jelić et al. 2010, T16), and the varying components are uncorrelated, providing opportunity to focus on each individually. The model need only be statistical, and thus our aim in this paper is to present an improved statistical point-source foreground model in the context of the covariance of the point-source contribution to interferometric visibilities.

While the CHIPS scheme is quite sophisticated, several of its components are quite simplistic. In particular, it assumes both a single-power-law source count distribution (as a function of flux density) and a uniform Poisson spatial distribution of sources on the sky. The primary aims of this paper are to (i) propose generalizations of these simplistic models, (ii) derive the modified foreground covariance under these generalizations, and (iii) estimate the effect that ignoring these generalizations has on the averaged 1D PS.

While we address in some measure all four of the relevant foreground model components—the instrumental beam, the point-source spectral-energy distribution (SED), the source count distribution, and the spatial structure—the last of these receives the most attention. This is due in part to the relative insensitivity of the results to the other components, but also to the additional complexity that a more realistic spatial structure induces.

We note that previous work in the literature has broached the subject of spatially correlated foreground point sources. In particular, the simulations of Jelić et al. (2010), which include simple point-source clustering, are commonly used as test-beds for foreground mitigation schemes (e.g., Chapman et al. 2014, 2016), while more physically motivated simulations are used in Sims et al. (2016). However, the relatively small set of simulations presented is unable to yield the covariance of foregrounds on all relevant scales. In any case, an analytic model for the covariance is helpful, as it is more efficient and aids the understanding of the features in the model. An analytic treatment of point-source covariance in the presence of clustering was given in Liu & Tegmark (2011); however, their results apply to the covariance in real space. We extend the analysis to the covariance of visibilities in Fourier space, which is the natural space for interferometric observations, and in which we can include the chromaticity of the instrument and other realistic effects.

The paper is structured as follows. After some notational preliminaries in Section 2, we will review the parts of the CHIPS scheme applicable to the point-source foreground covariance in Section 3. From there we turn to each of our target simplifications in turn—the source count distribution in Section 4 and the spatial distribution in Section 5. These sections develop the necessary mathematical machinery required to determine expected effects of our extensions with respect to synthetic data, which we present in Section 6, before offering some concluding remarks in Section 7.

We note that all plots in this paper were produced with matplotlib v2.0.0 (Hunter 2007), with frequent use of the numpy (van der Walt et al. 2011), scipy, and ipython (Perez & Granger 2007) libraries.

2. Notation and Preliminaries

2.1. Notation

Throughout, we use bold lowercase characters (Latin or Greek) to denote vectors (e.g., ${\boldsymbol{u}}$ or ${\boldsymbol{\theta }}$), while bold uppercase characters denote matrices (notably the covariance matrix ${\boldsymbol{C}}$). Statistical variables will be denoted by uppercase characters with an over-tilde (e.g., $\tilde{{ \mathcal N }}$). Furthermore, sample means will be denoted by angled brackets (e.g., $\langle \tilde{V}\rangle $), whereas population means will be denoted with an over-bar (e.g., $\bar{n}$). Variance and covariance operators will be denoted by $\mathrm{Var}$ and $\mathrm{Cov}$, respectively.

2.2. Fourier Transforms

Fourier transforms (hereafter FTs) are defined with an unfortunate multiplicity of conventions in the literature, and in this paper it is convenient to use a more flexible definition. Following the parameterization of the Mathematica software, the continuous n-dimensional FT forward/inverse pair can be respectively written as

Equation (1)

$F({\boldsymbol{k}})={\left(\frac{| b| }{{(2\pi )}^{1-a}}\right)}^{n/2}\int f({\boldsymbol{x}})\,{e}^{{ib}{\boldsymbol{k}}\cdot {\boldsymbol{x}}}\,{d}^{n}{\boldsymbol{x}},$

Equation (2)

$f({\boldsymbol{x}})={\left(\frac{| b| }{{(2\pi )}^{1+a}}\right)}^{n/2}\int F({\boldsymbol{k}})\,{e}^{-{ib}{\boldsymbol{k}}\cdot {\boldsymbol{x}}}\,{d}^{n}{\boldsymbol{k}},$

where we shall denote the leading factors in each by ${{\rm{\Psi }}}_{a,b}^{+}$ and ${{\rm{\Psi }}}_{a,b}^{-}$, respectively. Here a and b can be chosen arbitrarily, and fully specify the Fourier convention. Typically, interferometry uses the convention $(a,b)=(0,2\pi )$; however, we shall see that it can be useful to keep these values arbitrary. Throughout, we use the operator ${{ \mathcal F }}_{a,b}$ to denote a general continuous FT with specified convention, and reserve hat notation (e.g., $\widehat{W}$) explicitly for an FT with the $(0,2\pi )$ convention.

In numerical calculations, only a discrete sampling of points is recorded, and the continuous FT must be approximated by a discrete FT (DFT). Such an operation can be conveniently handled using linear algebra, in which the 1D FT of a vector ${\boldsymbol{x}}$ is written $\psi {\boldsymbol{Fx}}$. Here, ψ is a normalization that we shall specify in a moment, and ${\boldsymbol{F}}$ is a unitary Vandermonde matrix:

Equation (3)

${F}_{{jk}}=\frac{1}{\sqrt{N}}{e}^{-2\pi {ijk}/N},\quad j,k=0,\ldots ,N-1,$

with N the length of the data vector ${\boldsymbol{x}}$. The matrix ${\boldsymbol{F}}$ has the property ${{\boldsymbol{F}}}^{\dagger }{\boldsymbol{F}}={{\boldsymbol{FF}}}^{\dagger }={\boldsymbol{I}}$, where ${{\boldsymbol{F}}}^{\dagger }$ encodes the inverse Fourier operation.

We typically require the DFT to approximate a continuous FT, and the two can be related by requiring that

Equation (4)

Equation (5)

where L is the physical length of the discrete sample. The physical modes, k, that are measured in such a transform are

Equation (6)

${k}_{j}=\frac{2\pi j}{L},\quad j=-N/2,\ldots ,N/2-1,$

which relates the size of the Fourier-space box to the real-space box by ${L}_{k}=2\pi N/L$.
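As a concrete check of these relations, the following minimal numpy sketch (ours, with illustrative numbers) verifies that the DFT of a sampled Gaussian, scaled by $\psi =L/N$, reproduces its continuous FT in the $(0,2\pi )$ convention (kernel ${e}^{-2\pi i\nu x}$ here); the $\sqrt{N}$ factor of the unitary ${\boldsymbol{F}}$ is absorbed into numpy's unnormalized FFT:

```python
import numpy as np

# Sample a Gaussian whose continuous FT is known analytically.
N, L, s = 1024, 40.0, 1.0
dx = L / N
x = -L / 2 + dx * np.arange(N)                # sample positions
f = np.exp(-x**2 / (2 * s**2))

# Discrete approximation to F(nu) = int f(x) exp(-2 pi i nu x) dx;
# the normalization is psi = dx = L/N (cf. Section 2.2). The phase
# factor accounts for the samples starting at x[0] rather than 0.
nu = np.fft.fftfreq(N, d=dx)                  # measured modes, spacing 1/L
F_dft = dx * np.exp(-2j * np.pi * nu * x[0]) * np.fft.fft(f)

# Analytic continuous FT of the Gaussian in the same convention.
F_true = s * np.sqrt(2 * np.pi) * np.exp(-2 * np.pi**2 * s**2 * nu**2)
print(np.allclose(F_dft.real, F_true, atol=1e-10))   # True
```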

2.3. Interferometry

Unless otherwise stated, we reserve the two-vector ${\boldsymbol{u}}$ for baseline displacement in units of wavelength,

Equation (7)

${\boldsymbol{u}}=\frac{\nu }{c}{\boldsymbol{x}},$

Likewise, we reserve the two-vector ${\boldsymbol{l}}\in (-1,1)$ for the sky coordinate

Equation (8)

${\boldsymbol{l}}=\sin {\boldsymbol{\theta }},$

with ${\boldsymbol{\theta }}$ the angle between the coordinate and zenith.

With these, in the flat-sky approximation, which we will generally assume throughout for simplicity, an interferometric visibility is defined as the Fourier transform of the beam-attenuated sky brightness,

Equation (9)

$V({\boldsymbol{u}},\nu )=\int {d}^{2}{\boldsymbol{l}}\,B({\boldsymbol{l}},\nu )\,S({\boldsymbol{l}},\nu )\,{e}^{-2\pi i{\boldsymbol{u}}\cdot {\boldsymbol{l}}},$

where S is here the total flux density in a given direction, B is the frequency-dependent beam attenuation, and the integral is over the entire valid space of ${\boldsymbol{l}}$ (i.e., $| {\boldsymbol{l}}| \leqslant 1$).
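For intuition, Equation (9) can be evaluated by brute force for a toy sky of discrete point sources. The following sketch (ours; the source count, fluxes, and Gaussian beam are all illustrative stand-ins, not instrument values) simply sums the beam-attenuated, phase-rotated fluxes:

```python
import numpy as np

# Brute-force evaluation of Equation (9) for a sky of discrete point
# sources attenuated by a toy Gaussian beam. All values illustrative.
rng = np.random.default_rng(0)
n_src = 500
l = rng.uniform(-0.5, 0.5, size=(n_src, 2))        # source positions in l
S = 1e-3 * (1 + rng.pareto(0.59, n_src))           # toy fluxes [Jy], dN/dS ~ S^-1.59
B = np.exp(-np.sum(l**2, axis=1) / (2 * 0.25**2))  # Gaussian beam, sigma = 0.25

def visibility(u):
    """V(u) = sum_i B(l_i) S_i exp(-2 pi i u.l_i): point-source form of Eq. (9)."""
    return np.sum(B * S * np.exp(-2j * np.pi * (l @ u)))

print(visibility(np.array([10.0, 0.0])))           # complex visibility at u=(10,0)
```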

2.4. Cosmology

The cylindrical PS is defined at cosmological modes perpendicular, ${k}_{\perp }$, and parallel, ${k}_{| | }$, to the line of sight. The units of k throughout are $h{\mathrm{Mpc}}^{-1}$, where h is the Hubble parameter. We detail the conversion of radio astronomy units (Jy, Hz, sr) to cosmological units (mK, Mpc, h) in the Appendix.

For all cosmological calculations in this paper, we use the parameters from Planck Collaboration (2016; i.e., a flat universe with h = 0.677 and ${{\rm{\Omega }}}_{m}=0.307$).

3. The CHIPS Framework

The basic idea behind the CHIPS algorithm is threefold:

  1. Individually model and peel all sources above some brightness threshold, ${S}_{\max }$.
  2. Using statistical models of the remaining foregrounds, estimate the covariance between 3D modes of the measured PS.
  3. Consistently down-weight modes using their inverse covariance (Liu & Tegmark 2011) in the process of final averaging to a spherically symmetric 1D PS.

In this section, we briefly review the mathematical framework presented in T16 concerning the latter two of these steps, in order to contextualize the remainder of the modeling in this paper.

3.1. Inverse-covariance Formalism

We first note that here we ignore the edge-gaps present in the band-pass of the MWA—a problem dealt with explicitly in T16 using least-squares spectral analysis. Our purposes in this paper are tangential to such instrument-dependent subtleties; thus for the sake of simplicity we assume a perfectly sampled spectral window. This allows us to replace the operator ${ \mathcal H }$ found throughout Section 4 of T16 with the corresponding raw Fourier kernel matrix ${\boldsymbol{F}}$ (cf. Section 2.2).

The CHIPS model only accounts for covariance between frequency modes, neglecting covariance between on-sky modes, and we follow suit here. In this case, we assume that the distribution of flux densities on the sky for any given frequency forms a (potentially correlated) Gaussian distribution, and the visibilities (see Equation (9)) as a function of frequency are drawn from a complex normal,

Equation (10)

$\tilde{{\boldsymbol{V}}}(\nu )\sim { \mathcal C }{ \mathcal N }(\bar{{\boldsymbol{V}}},{\boldsymbol{C}}),$

where ${\boldsymbol{C}}$ is the foreground covariance between visibilities on the same baseline at varying frequencies, which we will attempt to model in this paper, and henceforth denote as ${{\boldsymbol{C}}}_{\mathrm{FG}}$.

A final FT over ν must be performed to generate the visibility in Fourier space, from which the final PS is formed. We assume that all unit conversions from the natural units to cosmological units (cf. the Appendix) will be handled in a final step, and thus we find that the normalization required to approximate the continuous FT here is $\psi =d\nu $ (i.e., the frequency bin width).

Using the identity that if ${\boldsymbol{y}}={\boldsymbol{Fx}}$ then $\mathrm{Cov}({\boldsymbol{y}})\,={{\boldsymbol{F}}}^{\dagger }\mathrm{Cov}({\boldsymbol{x}}){\boldsymbol{F}}$, we find that the covariance of Fourier-space visibilities is $\mathrm{Cov}({V}_{u}(\eta ))=d{\nu }^{2}{{\boldsymbol{F}}}^{\dagger }{{\boldsymbol{C}}}_{\mathrm{FG}}{\boldsymbol{F}}$. Since the FT of a Gaussian random variable is also Gaussian, the distribution of ${V}_{u}(\eta )$ is entirely known. For a Gaussian variable, the covariance of its square is twice the square of its covariance, so we may write

Equation (11)

${{\boldsymbol{C}}}_{P}=2\,d{\nu }^{4}{\left({{\boldsymbol{F}}}^{\dagger }{{\boldsymbol{C}}}_{\mathrm{FG}}{\boldsymbol{F}}\right)}^{2}.$

We will use the diagonal of this quantity (i.e., the variance) to determine the expected signal-to-noise of a given mode.
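The following sketch (ours) traces this propagation for a toy smooth ${{\boldsymbol{C}}}_{\mathrm{FG}}$, with placeholder channel count and widths; the elementwise squaring implements the Gaussian identity just quoted:

```python
import numpy as np

# Propagate a toy frequency-frequency covariance to the covariance of
# squared Fourier-space visibilities (cf. Section 3.1). Values are
# placeholders, not instrument parameters.
N, dnu = 32, 80e3                                   # channels, width [Hz]
nu = np.arange(N) * dnu
C_FG = np.exp(-np.subtract.outer(nu, nu)**2 / (2 * 500e3**2))  # smooth toy model

idx = np.arange(N)
F = np.exp(-2j * np.pi * np.outer(idx, idx) / N) / np.sqrt(N)  # unitary DFT matrix
C_eta = dnu**2 * F.conj().T @ C_FG @ F              # Cov(V_u(eta))
C_P = 2 * np.abs(C_eta)**2                          # Gaussian: Cov of squares
sigma_P = np.sqrt(np.diag(C_P))                     # per-mode uncertainty
```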

Using a maximum-likelihood (ML) estimator, the estimate of the 1D PS, $P(k)$, is given by (cf. Equation (44) of T16)

Equation (12)

where D represents the operation of binning 2D modes into 1D annuli, and the subscript i denotes projection onto the ith bin, including parameter covariances. Clearly, the normalization factor $2d{\nu }^{4}$ cancels in the averaging.

In principle, the entire framework is dependent only on an accurate model for the covariance of visibilities, ${{\boldsymbol{C}}}_{\mathrm{FG}}$, and we now turn to deriving this quantity.

3.2. The CHIPS Point-source Foreground Model

In the absence of redshift information, sources are characterized by two key properties at a given frequency ν—a flux density $S(\nu )$ and a 2D sky position ${\boldsymbol{l}}$. Thus the statistical description of the sources is completely defined by three functions: (i) a source count distribution, ${dN}/{dS}(\nu )$; (ii) an SED, $S(\nu )$; and (iii) a spatial distribution, ${dn}/d{\boldsymbol{l}}(\nu )$.

Current models for each of these functions are surprisingly simple. The source count distribution is assumed to be a power-law:

Equation (13)

$\frac{{dN}}{{dS}}={\alpha }_{{\nu }_{0}}{S}^{-\beta },$

where measurements at ${\nu }_{0}\sim 150\,\mathrm{MHz}$ (Intema et al. 2011) give ${\alpha }_{{\nu }_{0}}\sim 4100\,{\mathrm{Jy}}^{-1}\,{\mathrm{sr}}^{-1}$ and $\beta \sim 1.59$. The SED of every source is assumed to be a power law over the frequencies covered in a single observation:

Equation (14)

$S(\nu )={S}_{{\nu }_{0}}{\left(\frac{\nu }{{\nu }_{0}}\right)}^{-\gamma },$

with a uniform value of $\gamma =0.8$ at ${\nu }_{0}\sim 150\,\mathrm{MHz}$. Finally, the spatial distribution is taken to be a Poisson process (i.e., a statistically uniform distribution across the sky). We now turn to reviewing the derivation of the covariance of interferometric visibilities from these point-source foregrounds alone (as performed by T16).
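Before doing so, we note that these simple ingredients are straightforward to sample. The sketch below (ours, with illustrative flux limits) draws flux densities from the single power law of Equation (13) by inverse-CDF sampling and scales them with the universal SED of Equation (14):

```python
import numpy as np

def sample_fluxes(n, beta=1.59, s_min=1e-4, s_max=1.0, rng=None):
    """Draw S from dN/dS ~ S^-beta on [s_min, s_max] via the inverse CDF."""
    rng = rng or np.random.default_rng()
    g = 1.0 - beta                                  # exponent of the integrated counts
    u = rng.uniform(size=n)
    return (s_min**g + u * (s_max**g - s_min**g)) ** (1.0 / g)

def sed(S_nu0, nu, nu0=150e6, gamma=0.8):
    """Universal power-law SED (Equation (14))."""
    return S_nu0 * (nu / nu0) ** -gamma

S0 = sample_fluxes(100_000)                         # fluxes at nu_0 [Jy]
S180 = sed(S0, 180e6)                               # the same sources at 180 MHz
```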

3.3. The CHIPS Point-source Foreground Covariance

Recall that the measured visibility is given by

Equation (15)

$\tilde{V}({\boldsymbol{u}},\nu )=\int {d}^{2}{\boldsymbol{l}}\,B({\boldsymbol{l}},\nu )\,{\tilde{S}}_{T}({\boldsymbol{l}},\nu )\,{e}^{-2\pi i{\boldsymbol{u}}\cdot {\boldsymbol{l}}},$

where we have now indicated that the visibility is a random variable, due to the inclusion of the random sky brightness ${\tilde{S}}_{T}$. According to the simple models introduced in the previous section, ${\tilde{S}}_{T}$ is given by

Equation (16)

where ${f}_{0}=\nu /{\nu }_{0}$, and ${\nu }_{0}$ is the reference frequency for the physical models. A subtle point that entered into Equation (16) is that the integration limit involves a peeling flux ${S}_{\max }$ at a given frequency. The peeling flux is some flux density above which sources in the field are individually modeled and subtracted. In practice, the sources that are peeled are determined by their mean flux density over the bandwidth of the measurement. This implies that the number of sources in the field is identical at all frequencies (i.e., no source is peeled in one channel but not another). Assuming a universal SED for all sources, this means that the peeling limit is simply modified according to the SED. Hereafter we shall denote ${S}_{\max }{f}_{0}^{-\gamma }$ simply as ${S}_{\max }^{\nu }$.

Another subtlety that requires explanation is to notice that an interferometric visibility measurement over a given frequency bandwidth ${\rm{\Delta }}\nu $ will by necessity evaluate $\tilde{V}$ at a single vector ${\boldsymbol{u}}$, though in detail at each frequency, the physically measured ${\boldsymbol{u}}$ vary slightly due to Equation (7), where ${\boldsymbol{x}}$ are the constant physical baselines. Thus we arbitrarily choose a reference frequency within our bandwidth—nominally ${\nu }_{\mathrm{low}}$, the lowest measured frequency—and derive visibilities for each frequency in terms of the reference scales ${\boldsymbol{u}}:{{\boldsymbol{u}}}_{\nu }=\nu {\boldsymbol{u}}/{\nu }_{\mathrm{low}}\equiv f{\boldsymbol{u}}$.

Including this information into the visibility yields

Equation (17)

$\tilde{V}({\boldsymbol{u}},\nu )=\int {d}^{2}{\boldsymbol{l}}\,B({\boldsymbol{l}},\nu )\,{\tilde{S}}_{T}({\boldsymbol{l}},\nu )\,{e}^{-2\pi {if}{\boldsymbol{u}}\cdot {\boldsymbol{l}}}.$

We now set out to evaluate the covariance of visibilities between frequencies, which we will denote as ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}$. We approach this rather carefully, and in a more general manner than typically necessary so as to create a framework for our derivations in Section 5. To begin, we note the statistical identity

Equation (18)

$\mathrm{Cov}\left(\sum _{i}{\tilde{X}}_{i},\sum _{j}{\tilde{Y}}_{j}\right)=\sum _{i}\sum _{j}\mathrm{Cov}({\tilde{X}}_{i},{\tilde{Y}}_{j}),$

and that our double-integral is an example of such a sum of variables. In particular, the sum is over small voxels in the space of $({\boldsymbol{l}},S)$, and thus the covariance is the sum of covariances of all pairs of voxels. We begin by imagining the voxel grid at ${\nu }_{0}$ and noting that each voxel contains a number of sources

Equation (19)

${\bar{N}}_{i}=\frac{{dN}}{{dS}}({S}_{i})\,{dS}\,{d}^{2}{\boldsymbol{l}}.$

At a different frequency $\nu ^{\prime} $, each source is scaled equivalently by the universal SED ${{f}_{0}^{\prime }}^{-\gamma }$. We imagine scaling the size of each voxel by the same factor so that the number of sources in each voxel remains constant, but the brightness of the voxel is scaled such that ${S}_{i}^{\prime }={{f}_{0}^{\prime }}^{-\gamma }{S}_{i}$. The covariance between arbitrary voxels i and j is then

We can extract the deterministic terms by noting that $\mathrm{Cov}({aX},{bY})={{ab}}^{\dagger }\mathrm{Cov}(X,Y)$, to retrieve

Equation (20)

Since the counts are independently Poisson distributed, the final covariance factor simply reduces to $\bar{N}{\delta }_{{ij}}$, with ${\delta }_{{ij}}$ the Kronecker delta, and the mean counts given in Equation (19). The double-sum over these covariance pairs reduces to a single sum due to the ${\delta }_{{ij}}$, and we abuse notation by re-using i and j for spatial and flux density coordinates to arrive at

Equation (21)

where ${f}_{\nu }=f^{\prime} -f^{\prime\prime} $. Finally, we allow the voxels to become infinitesimal, yielding

Equation (22)

${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}({\boldsymbol{u}};\nu ^{\prime} ,\nu ^{\prime\prime} )={({f}_{0}^{\prime }{f}_{0}^{\prime\prime })}^{-\gamma }{\mu }_{2}\int {d}^{2}{\boldsymbol{l}}\,{B}_{\nu ^{\prime} }({\boldsymbol{l}}){B}_{\nu ^{\prime\prime} }({\boldsymbol{l}})\,{e}^{-2\pi i{f}_{\nu }{\boldsymbol{u}}\cdot {\boldsymbol{l}}},$

where ${\mu }_{n}$ is the nth moment of the source count distribution:

Equation (23)

${\mu }_{n}={\int }_{{S}_{\min }}^{{S}_{\max }}{S}^{n}\frac{{dN}}{{dS}}\,{dS}.$

It is worth mentioning some features that this model prediction contains. First, the form of the source counts enters purely in a separate integral over flux density, and therefore affects the normalization of the covariance, but not its structure (either in ${\boldsymbol{u}}$ or ν). Since the inverse-covariance weighting method of foreground suppression is only sensitive to differences in weights across these scales, it is thus insensitive to the form of the source counts. We note that this strictly arises due to our assumption of a universal SED, which in reality is unjustified. We review this assumption in Section 5.6.1 and find that its influence is minimal.

We also note that while there is a smooth frequency-dependence in the factor ${({f}_{0}^{\prime }{f}_{0}^{\prime\prime })}^{-\gamma }$, which may give rise to power at very low ${k}_{| | }$, there is a more complex interaction between frequency and ${\boldsymbol{u}}$ in the exponent of the FT of the beam, and it is this interaction that gives rise to the so-called foreground "wedge." Finally, the variance of visibilities is simply the brightness-normalized integral over the beam squared:

Equation (24)

$\mathrm{Var}(\tilde{V})={f}_{0}^{-2\gamma }{\mu }_{2}\int {d}^{2}{\boldsymbol{l}}\,{B}_{\nu }^{2}({\boldsymbol{l}}),$

which is clearly constant in ${\boldsymbol{u}}$.

Before proceeding, we note two approximations that can be (and are) made in this formalism that might otherwise escape notice. First, we have already covertly approximated the spherical sky as a flat Euclidean space, by equating a uniform distribution of sources across the sky with a uniform distribution in ${\boldsymbol{l}}$. This is really just approximating ${\boldsymbol{l}}$ as equal to ${\boldsymbol{\theta }}$, which is a valid approximation close to the zenith. To further utilize this approximation in order to simplify the covariance integral, one may assume that ${\boldsymbol{l}}$ is an infinite Euclidean space (i.e., it has no boundary at $| {\boldsymbol{l}}| =1$). While this is clearly unphysical, the presence of the attenuating beam in the integral ensures that for realistic telescopes, the contribution of high-$| {\boldsymbol{l}}| $ patches is negligible. Thus the approximation is valid. We follow the CHIPS methodology in adopting these approximations for ease of comparison, while noting that a more rigorous derivation would involve spherical harmonics (Shaw et al. 2014; Liu et al. 2016). We leave such a derivation to future work.

Under these approximations, if we further assume that the beam is circularly symmetric, the integral becomes a Hankel transform,

Equation (25)

${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}=2\pi {({f}_{0}^{\prime }{f}_{0}^{\prime\prime })}^{-\gamma }{\mu }_{2}\int l\,{B}_{\nu ^{\prime} }(l){B}_{\nu ^{\prime\prime} }(l)\,{J}_{0}(2\pi {f}_{\nu }{ul})\,{dl},$

where J0 is the zeroth-order Bessel function of the first kind.

Throughout this paper, we shall predominantly consider toy models that employ a circularly symmetric frequency-dependent Gaussian beam, $B(r)=\exp (-{r}^{2}/2{\sigma }_{\nu }^{2})$, for which the final integral can be easily solved, to give

Equation (26)

${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}=2\pi {({f}_{0}^{\prime }{f}_{0}^{\prime\prime })}^{-\gamma }{\mu }_{2}{{\rm{\Sigma }}}^{2}\exp (-2{\pi }^{2}{{\rm{\Sigma }}}^{2}{f}_{\nu }^{2}{u}^{2}),$

where

Equation (27)

${{\rm{\Sigma }}}^{2}=\frac{{\sigma }_{\nu ^{\prime} }^{2}{\sigma }_{\nu ^{\prime\prime} }^{2}}{{\sigma }_{\nu ^{\prime} }^{2}+{\sigma }_{\nu ^{\prime\prime} }^{2}}.$

We note that while we use this simple form to illustrate features of our model in this paper, in practice the more general Equation (22) is used.
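For reference, the Gaussian-beam result is simple to implement. The sketch below (ours) assumes the forms of Equations (26) and (27) as given above, with the beam width taken as ${\sigma }_{\nu }=\epsilon c/\nu D$ (introduced later in Equation (39)), ${\mu }_{2}$ supplied by any source-count model, and an MWA-like tile diameter purely for illustration:

```python
import numpy as np

C_LIGHT = 299792458.0

def c_poiss(u, nu1, nu2, mu2, nu_low=150e6, nu0=150e6,
            gamma=0.8, eps=0.42, D=4.0):
    """Gaussian-beam Poisson covariance, cf. Equations (26)-(27)."""
    f01, f02 = nu1 / nu0, nu2 / nu0                 # SED scalings
    f_nu = (nu1 - nu2) / nu_low                     # f' - f'' (Section 3.3)
    s1 = eps * C_LIGHT / (nu1 * D)                  # beam widths (Eq. (39))
    s2 = eps * C_LIGHT / (nu2 * D)
    Sig2 = s1**2 * s2**2 / (s1**2 + s2**2)          # Equation (27)
    return (2 * np.pi * mu2 * (f01 * f02) ** -gamma * Sig2
            * np.exp(-2 * np.pi**2 * Sig2 * f_nu**2 * u**2))

# With nu1 == nu2 the exponent vanishes: the variance is constant in u,
# consistent with Equation (24).
print(c_poiss(u=20.0, nu1=150e6, nu2=150e6, mu2=1.0))
```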

4. A Generalized Source Count Model

In the Poisson covariance model derived in T16 and examined in the previous section, the form of the source counts is important in setting the overall amplitude of the foreground covariance (and thus the expected foreground power). An incorrect model will mis-estimate the global level of foreground power, and therefore provide inaccurate error bars on the final 1D PS. Thus a high-fidelity source count model is somewhat important; however, in the current framework, ${\mu }_{2}$ is dominated by bright sources, unless faint sources are much more abundant than we expect. Since we can model bright sources accurately, our current model would appear to be entirely adequate.

We find, however, that the introduction of clustering introduces a role for the source counts in modifying the structure of the covariance (cf. Section 5). In this case, an accurate source count model will be required to a much fainter limit, and thus developing a simple and well-defined model for the source counts becomes important.

The single-power-law model for source counts is simple, but gives a remarkably good description of observations over observed flux density ranges at low frequencies (Franzen et al. 2016; Williams et al. 2016). However, it is precisely this well-observed range that we expect to be peeled from the visibilities before an EoR analysis. Accordingly, it is the low flux density regime we are interested in modeling. At the lowest flux densities, observations hint at a break in the power law, situated at $S\sim 6$ mJy (Williams et al. 2016). This break is thought to be the result of a transition between the regime in which AGN dominate the counts, to that of star-forming galaxies. In addition, at some yet-fainter threshold another break must occur, in which the source counts turn over, to ensure a finite number of sources in the universe (i.e., the low-S slope must be $\gt -1$). It is likely that this turnover is quite sharp, and corresponds to the inability to form galaxies in low-mass halos.

In this section, we briefly outline a more general model for the source counts—namely an arbitrarily broken power law—flexible enough to approximately describe most possible future measurements.

4.1. The Broken Power-law Model

Our choice to use a broken power law is primarily driven by simplicity. While it has discontinuous derivatives, and is therefore a poor choice as a physical distribution, we expect that the statistical properties of the true distribution can be arbitrarily well recovered by a broken power law, which at the same time is an easily integrable function. It may be argued that a combination of various power-law populations is better modeled as a weighted sum of different power laws, and with this we in principle agree. However, the broken power-law model is better able to deal with anomalies in this structure, and furthermore has precedent in the literature (e.g., Franzen et al. 2016). In any case, any viable parameterization of the source counts can be swapped into our formalism quite effortlessly, and the broken power law provides a useful first look at some of the broad effects of the source counts.

Figure 1 shows a schematic representation of the broken power-law model. The main points of interest are (i) a "peeled" region at high flux density for which we are reasonably confident of a precise sky model that is subtracted from the visibilities, and for which the lower limit (at ${\nu }_{0}$) is Smax; (ii) there are m regions below this peeling limit; (iii) regions are labeled in order of decreasing flux density, and each is defined by two parameters, the normalization α and the slope β; and (iv) an extra "break" occurs at the physical turnover of the source counts, ${S}_{\min }$.

Figure 1. Schematic of the broken power-law source count model, with m = 3.

In practice, we approximate the lower turnover as a sharp cut, and thus the source counts with m regions can be written

Equation (28)

$\frac{{dN}}{{dS}}={\alpha }_{i}{S}^{-{\beta }_{i}},\quad {S}_{i+1}\lt S\leqslant {S}_{i},\quad i=0,\ldots ,m-1,$

where ${S}_{i}$ denotes the upper flux density limit of region i (with ${S}_{0}\equiv {S}_{\max }$ and ${S}_{m}\equiv {S}_{\min }$).

To preserve continuity of the source counts, we define ${\alpha }_{i}$ (for $i\gt 0$) as

Equation (29)

${\alpha }_{i}={\alpha }_{i-1}{S}_{i}^{{\beta }_{i}-{\beta }_{i-1}},$

with ${\alpha }_{0}\equiv \alpha $ given by the bright observations.

In this model, we find

Equation (30)

${\mu }_{n}=\sum _{i=0}^{m-1}\frac{{\alpha }_{i}}{n+1-{\beta }_{i}}\left({S}_{i}^{n+1-{\beta }_{i}}-{S}_{i+1}^{n+1-{\beta }_{i}}\right).$

It is not particularly informative to consider the effects of the generalized model on ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}$, and so we will return to their effects in the next section.
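Nonetheless, the moments themselves are simple to evaluate. The sketch below (ours) enforces the continuity condition of Equation (29) and sums the piecewise integrals of Equation (30); the numbers in the example call are the fiducial values listed later in Table 1:

```python
def broken_powerlaw_moment(n, alpha0, betas, breaks, s_min, s_max):
    """mu_n for piecewise counts alpha_i S^-beta_i (Equations (28)-(30)).

    `breaks` lists the interior break fluxes from highest to lowest, as in
    Figure 1; `betas` has one slope per region, also highest-flux first.
    """
    edges = [s_max] + list(breaks) + [s_min]       # region edges, decreasing
    alphas = [alpha0]
    for i in range(1, len(betas)):
        # Continuity at the break: alpha_{i-1} S^-beta_{i-1} = alpha_i S^-beta_i.
        alphas.append(alphas[-1] * edges[i] ** (betas[i] - betas[i - 1]))
    mu = 0.0
    for a, b, hi, lo in zip(alphas, betas, edges[:-1], edges[1:]):
        p = n + 1 - b
        mu += a * (hi**p - lo**p) / p              # analytic piecewise integral
    return mu

# Fiducial values: alpha = 6998 Jy^-1 sr^-1, betas = [1.54, 1.95],
# break at 6 mJy, S_min = 0.1 mJy, S_max = 30 mJy (fluxes in Jy).
mu2 = broken_powerlaw_moment(2, 6998.0, [1.54, 1.95], [6e-3], 1e-4, 30e-3)
```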

5. Including Source Clustering

It is well-known that extragalactic sources are clustered, as tracers of the underlying cosmological density field (e.g., Peebles 1971; Percival et al. 2001; and for radio-galaxy examples, see Blake et al. 2004; Wake et al. 2008). Though this clustering occurs in 3D, it is projected onto the sky, creating a predictable angular clustering signal.

It is expected that this clustering will modify the foreground-contributed H i power at scales ${k}_{\perp }$ at which the source over-density power is non-zero. Thus correctly modeling the effects of cosmological clustering may be an important element of an accurate EoR analysis. We note that incorporation of source angular clustering into a foreground covariance model has previously been performed in Liu & Tegmark (2011), who nevertheless follow a slightly different formalism and restrict themselves to Gaussian correlations.

We approach this problem in a very general manner, for an arbitrary point-source two-point distribution $P({\boldsymbol{u}})$. We note that our comments in Section 3.3 concerning approximation of the curved ${\boldsymbol{l}}$-space as infinite and Euclidean will also be employed here, as we compare and contrast our improved model to that presented in CHIPS. Explicitly, we approximate the PS of sources as a function of u, without recognizing its curved nature. As such, our pseudo PS is what one would measure if all sources were expressed in ${\boldsymbol{l}}$ coordinates, and these coordinates treated as ordinary Euclidean. This results in a covariance function of sources that is a function of patch separation, expressed in terms of ${\boldsymbol{l}}$. Aberrations from this approximation occur only far from beam-center, for which the contribution to the visibility is negligible. A more rigorous derivation in terms of spherical harmonics will be forthcoming in future work.

Our high-level approach is to calculate the covariance of number counts within infinitesimal sky patches due to cosmological clustering, and then apply that to the covariance of the visibilities through a similar formalism as employed in Section 3.2.

5.1. Visibilities with Clustered Sources

If a Gaussian field has an over-density power spectrum P(u), with units $\mathrm{sr}$, then a realization of its real-space density field can be evaluated by populating a u-space grid with random complex numbers drawn from a standard normal distribution with uniformly distributed phase, multiplying by the square root of the PS, and then performing an inverse FT (Coles & Jones 1991).

Let us for a moment assume that the distribution of flux density on the sky is produced in this manner. That is, the fluctuations of $\tilde{S}$ are very close to Gaussian (which seems to be cosmologically justified), and that Poisson scatter of individual sources has a negligible effect compared to the clustering itself (we shall amend this assumption soon). Then the field $S({\boldsymbol{l}})$ is given by

Equation (31)

where Ω is the area of integration, $\tilde{{ \mathcal N }}\sim \mathrm{Normal}(0,1)$, and $\tilde{X}\sim \mathrm{Uniform}[0,2\pi )$.

Instead of completely ignoring the Poisson scatter, we may assume that its effects dominate in the second term, but are negligible in the first, in which the other random variables dominate. We will test this assumption in the next section, but here just assert it:

Equation (32)

Finally, we recall that a visibility is the FT of the beam-attenuated sky brightness with kernel modulated by f0, yielding

Equation (33)
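A schematic numpy realization of this construction follows (ours; the normalizations are indicative only, and the power-law PS, mean flux density, and beam width are toy values chosen to mirror the simulations of Section 5.6):

```python
import numpy as np

# Schematic realization of Section 5.1: a Gaussian over-density field
# delta(l) with a toy power-law PS, a mean sky level, beam attenuation,
# and an FFT to visibilities. Normalizations are indicative only.
rng = np.random.default_rng(1)
N, L = 256, 1.68                                  # grid cells, box size in l
u = np.fft.fftfreq(N, d=L / N)
umag = np.hypot(*np.meshgrid(u, u))
with np.errstate(divide="ignore"):
    P = np.where(umag > 0, 0.02 * umag**-1.5, 0.0)  # toy power-law P(u)

# Gaussian amplitudes with uniform random phases, coloured by sqrt(P):
amp = rng.normal(size=(N, N)) * np.exp(2j * np.pi * rng.uniform(size=(N, N)))
delta = np.fft.ifft2(np.sqrt(P) * amp).real * N / L   # schematic normalization
sky = 1.0 * (1.0 + delta)                         # S_bar = 1 Jy/sr, say

x = (np.arange(N) - N / 2) * (L / N)
beam = np.exp(-np.add.outer(x**2, x**2) / (2 * 0.18**2))  # sigma ~ FWHM/2.355
vis = np.fft.fft2(sky * beam) * (L / N) ** 2      # psi = pixel area
```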

5.2. Statistics of Visibilities

Using Equation (33) as the basis for a stochastic visibility, we may proceed to derive the statistics of the visibility—specifically its expectation and covariance between frequencies.

While the formula for the visibility has grown considerably more complex as compared to Equation (9), the expected visibility remains the same—that is, the FT of the beam normalized by mean flux density—because the first term in Equation (33) has a mean of zero (due to the $\tilde{{ \mathcal N }}$). This is intuitively understood because an average over realizations of clustered foregrounds will change phase uniformly, and therefore average to zero.

Our primary goal is the covariance of the visibility. We first note that the second term in Equation (33) is precisely the same as the uniform-sky visibility, Equation (9), so that clustering is an additive modification. Furthermore, covariance between the first and last term is zero, since the random part of each is independent (in fact, even if we had not used the deterministic $\bar{S}$ in the first term, and had rather kept it as the more general random variable found in the second term, the covariance is still zero). Thus we have

Equation (34)

${{\boldsymbol{C}}}_{\mathrm{FG}}={{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Clust}}+{{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}},$

where

Equation (35)

Now, the outer FT can be pushed through, by using the convolution theorem, and the fact that ${{ \mathcal F }}_{\mathrm{0,2}\pi a}f(l)=(1/a)\hat{f}(l/a)$

Equation (36)

The convolution is again just a sum, so the covariance is the sum of covariance of all pairs (we also make the identification that $\bar{S}\equiv {\mu }_{1}$):

Equation (37)

Since each draw of $\tilde{{ \mathcal N }}$ is independent, the covariance term remaining here must be $\delta ({f}_{0}^{\prime }{u}_{1}-{f}_{0}^{\prime\prime }{u}_{2})/{\rm{\Omega }}$, where the solid angle factor ensures that the covariance is dimensionless as it ought to be. So, similarly to the Poisson case, the double-integral collapses to a single integral:

Equation (38)

where we have set $p={f}_{0}^{\prime }/{f}_{0}^{\prime\prime }$. This is the most general form for the clustering covariance term that can be derived.

A few remarks are appropriate. First, the overall source count-based normalization is different (${\mu }_{1}^{2}$) in this term than in ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}$. Second, there is again a complex interaction between frequencies and ${\boldsymbol{u}}$, which will bring more structure into the 2D PS than the purely Poisson covariance. The form of Equation (38) is rather complicated, but for the variance it simplifies to a pure convolution between the squared beam and the PS.

Thus in general, the full covariance, in the presence of source clustering, can be evaluated using a two-dimensional integration.

5.3. Model Simplifications

While Equation (38) is the most general form of the clustering covariance, if we assume a simple model for the beam (i.e., a frequency-dependent Gaussian beam), some simplifications can be made.

The frequency-dependence of the Gaussian beam arises through the width of the beam:

Equation (39)

${\sigma }_{\nu }=\frac{\epsilon c}{\nu D},$

where $\epsilon \simeq 0.42$ is the scaling from an Airy disk to a Gaussian width and D is the tile diameter. Because of this simple relationship, it turns out that the FT of the frequency-dependent beam where the argument is ${\boldsymbol{l}}/\nu $ collapses into the standard FT of the beam at ${\nu }_{\mathrm{low}}$, so we have

Equation (40)

The FT of the beam is $\hat{B}=2\pi {\sigma }^{2}\exp (-2{\pi }^{2}{\sigma }^{2}{u}^{2})$, so inserting this, setting $y=2\pi {\sigma }^{2}$ and $p={f}_{0}^{\prime }/{f}_{0}^{\prime\prime }$, and expanding the exponents, we have

Equation (41)

Acknowledging that the solution is circularly symmetric, we assume that v = 0, and convert to polar coordinates so that ${\boldsymbol{u}}\cdot {{\boldsymbol{u}}}_{1}={{uu}}_{1}\cos \theta $:

Equation (42)

Integrating over θ yields

Equation (43)

where I0 is the zeroth-order modified Bessel function of the first kind. Thus the visibility covariance for any isotropic point-source PS with a Gaussian beam reduces to a single integral.

Let us further assume that the point-source PS is a power-law:

Equation (44)

$P(u)={u}_{0}{u}^{-\kappa }.$

In this case, the Bessel function may be expanded using its power-series,

Equation (45)

${I}_{0}(x)=\sum _{k=0}^{\infty }\frac{{({x}^{2}/4)}^{k}}{{(k!)}^{2}},$

and each term may be analytically integrated to yield Euler gamma functions. The resulting sum may be simplified to

Equation (46)

where $y=2\pi {\sigma }^{2}$, ${}_{1}{F}_{1}$ is the Kummer (confluent) hypergeometric function, and $a=1-\kappa /2$. At large u, Kummer's function asymptotes to

Equation (47)

${}_{1}{F}_{1}(a;b;x)\to \frac{{\rm{\Gamma }}(b)}{{\rm{\Gamma }}(a)}{e}^{x}{x}^{a-b},\quad x\to \infty ,$

so that we can approximate the covariance (under the assumption of a Gaussian beam and power-law source spectrum) as

Equation (48)

where $q={(1+p)}^{2}/(1+{p}^{2})$ has a maximum of 2 when ${f}_{0}^{\prime }\equiv {f}_{0}^{\prime\prime }$, and

Equation (49)

Thus the variance on intermediate-to-small scales collapses to a single power law with slope $-\kappa $, and otherwise is a decaying exponential, modified by the same power law.

It is helpful to determine the range of scales on which this large-u approximation holds. Figure 2 shows the relative error induced by the approximation over a range of scales, and for various values of the power-spectral slope. The x-axis units are multiples of the smallest measured scale (i.e., the shortest baseline of the array at 150 MHz). To assess realistic instruments, we use the (128-tile) configuration of the MWA, and the proposed configuration of SKA1-LOW (Dewdney et al. 2016). While the SKA shows an increase in the induced error, primarily due to the reduced beam width, it is clear that over a large range of physical models of the point-source PS, the maximum expected error is less than 10%. In the numerical implementations used for this paper, the large-u approximation is employed, as it is more stable for the majority of scales considered.
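Such a check is easy to reproduce with scipy. In the sketch below (ours) we take b = 1 purely for illustration, since only $a=1-\kappa /2$ is quoted above, and compare Kummer's function against the asymptote of Equation (47):

```python
import numpy as np
from scipy.special import gamma, hyp1f1

# Relative error of the large-argument asymptote of Kummer's function,
# 1F1(a; b; x) -> Gamma(b)/Gamma(a) e^x x^(a-b)  (Equation (47)),
# in the spirit of Figure 2. We take b = 1 purely for illustration.
kappa = 1.5
a, b = 1.0 - kappa / 2.0, 1.0
x = np.linspace(1.0, 40.0, 200)
exact = hyp1f1(a, b, x)
asym = gamma(b) / gamma(a) * np.exp(x) * x ** (a - b)
print(np.abs(asym / exact - 1.0)[[0, -1]])   # error shrinks as x grows
```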

Figure 2. Error induced by the large-u approximation (cf. Equation (48)) as a function of scale for various values of the power-spectral slope κ. The x-axis units are multiples of the smallest measured scale (i.e., the shortest baseline of the array at 150 MHz). This scale changes for different instruments, and we show the MWA and proposed SKA configurations here. The SKA has larger error on its smallest baseline, predominantly owing to its smaller beam size. Regardless, the approximation holds to within 10% over all measured baselines in realistic experiments. The truncation in the MWA curves is due to numerical precision in calculating the LHS of Equation (48).

5.4. Summary of Clustered Covariance Model

Several iterations of the total analytic data covariance of the point-source foregrounds have been presented thus far, under a range of simplifying assumptions. This presentation allows for the continued development of the covariance formalism by relaxation of these assumptions. However, for the remainder of this introductory paper, we shall limit ourselves to a definite simple prescription, so as to enable an initial survey of the effects of its parameters.

The total data covariance is approximated by a sum of a Poisson and a clustering term (cf. Equation (34))—an approximation that is validated in the next subsection. Each term is analytically derived using a set of empirical models of the observed sky—namely (i) the distribution of spectral slopes of sources; (ii) the distribution of flux density of sources at a reference frequency ${\nu }_{0}$; (iii) the telescope's spatial attenuation pattern (or the beam); and (iv) the spatial distribution of sources on the sky, parameterized by its isotropic PS.

Henceforth we exclusively employ the following models for these four components: (i) a single universal spectral slope for all sources, $\gamma =0.8$; (ii) an arbitrarily broken power law with hard minimum ${S}_{\min }$ and peeling limit ${S}_{\max }$ (cf. Section 4); (iii) a frequency-dependent Gaussian beam, $\exp (-{l}^{2}/2{\sigma }_{\nu }^{2})$, with ${\sigma }_{\nu }=0.42c/\nu D$, and D the effective tile diameter; and (iv) a single-power-law PS, according to Equation (44). Under these assumptions, the total covariance is given by Equations (23), (26), (34), and (46) (bearing in mind the approximation of Equation (48)).

For ease of reference, we provide a summary of all model parameters in Table 1. In this table, values labeled "Fixed" are considered to be well-known (potentially per telescope) so that they may be kept constant throughout the rest of the analysis (in which case their values are given by the table). The rest of the parameters are varied to assess their impact on results—either in toy models (cf. next subsection) or in the context of a synthetic EoR observation (cf. Section 6). In all cases, the "fiducial model" will correspond to Table 1.

Table 1.  Summary of Analytic Covariance Model Parameters

Param. | Description | (Fiducial) Value | Notes
γ | Single universal spectral slope of sources at ${\nu }_{0}$ | 0.8 | Fixed
${S}_{\min }$ | Minimum flux density of sources at ${\nu }_{0}$ | ${10}^{-1}$ mJy | Varied
${S}_{\max }$ | Peeling limit at ${\nu }_{0}$ | $\{30,1\}$ mJy | Varied
${S}_{\mathrm{break},i+1}$ | Position of break in broken power-law source counts between ith and $(i+1)\mathrm{th}$ region | [6] mJy | Fixed
α | Normalization of source count distribution in highest flux density region | 6998 ${\mathrm{Jy}}^{-1}\,{\mathrm{sr}}^{-1}$ | Fixed
${\beta }_{i}$ | Logarithmic slope of source counts in ith region | [1.54, 1.95] | Varied
D | Effective tile diameter of telescope | $\{4,35\}$ m | Fixed
${u}_{0}$ | Normalization of point-source power spectrum | 0.05 | Varied
κ | Slope of point-source power spectrum | 1.5 | Varied

Note. A summary of all parameters in the analytic covariance model, under the simplified models outlined in Section 5.4. Curly braces in the value column indicate that the parameter is dependent on the telescope itself, and the values given are the fiducial values for an MWA-like and SKA-like array, respectively. Square brackets indicate that the parameter is (potentially) a vector of values (i.e., there is a value for the parameter in each subregion of the source count distribution). In this case, the regions are taken from highest to lowest flux density.


5.5. Model Features in the Covariance

The analytic covariance follows a straightforward form—at large scales, the clustering term dominates, while on small scales, the Poisson term dominates. The simplest way to determine the effects of the model parameters is to determine the scale ${u}_{\star }$ at which these two terms are equal—the smaller the scale, the more important the clustering term becomes.

The general solution is rather intractable; however, inspection of Figure 4 reveals three key observations: (i) the covariance between different frequencies is always subdominant to the variance at ${u}_{\star }$ (for either the covariance or variance), due to the exponential cut-off; (ii) scales within a physically reasonable observation are typically well-approximated by the large-u approximation (cf. Equation (48)); and (iii) the transition scale is similar for the covariance between frequencies and variance.

These three observations lead us to consider just the variance itself as a measure of the effects of the parameters. By equating Equation (48) with Equation (26) for $\nu ^{\prime} =\nu ^{\prime\prime} $, we find that

Equation (50)

We first consider the relationships between parameters. Clearly, the amplitude of the point-source PS has a linear effect on the importance of the clustering term. The more interesting relationship is that of the ratio of moments of the source count distribution, which involves five out of the nine parameters of the model (${S}_{\min }$, ${S}_{\max }$, α, ${\beta }_{i}$, and ${S}_{\mathrm{break}}$). In Figure 3 we show the effects of modifying a selection of these parameters on the ratio. The parameters varied here are the flux-density boundaries (one of which is entirely unknown and the other dependent on survey design) and the slope of the source counts in the star-forming-galaxy-dominated region of the source counts. The position of the break is kept at its default value, and a (small) change in this value is not expected to modify the qualitative behaviour of the plot. Furthermore, the slope and normalization of the source counts above the break are taken as relatively well-known, and therefore fixed.
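A scan of this kind is easy to perform by reusing the broken_powerlaw_moment sketch from Section 4 (the slopes and limits below are illustrative, not a reproduction of the figure):

```python
import numpy as np

# Scan the moment ratio mu_1^2 / mu_2 against S_min for several slopes in
# the faint (star-forming-galaxy) region, in the spirit of Figure 3.
# Requires broken_powerlaw_moment from the Section 4 sketch.
for beta_faint in (1.7, 1.95, 2.2):
    for s_min in np.logspace(-7, -4, 4):           # Jy
        m1 = broken_powerlaw_moment(1, 6998.0, [1.54, beta_faint],
                                    [6e-3], s_min, 30e-3)
        m2 = broken_powerlaw_moment(2, 6998.0, [1.54, beta_faint],
                                    [6e-3], s_min, 30e-3)
        print(f"{beta_faint:4.2f} {s_min:8.1e} {m1**2 / m2:10.3e}")
```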

Figure 3. Values of the ratio of moments of the source count distribution, as presented in Equation (50). The x-axes show values of ${S}_{\min }$ and ${S}_{\max }$ (left and right panels, respectively), while the y-axis shows the ratio of the moments. Different colors indicate different values of the power-law slope in the star-forming-galaxy-dominated region of the source counts. The ratio tends to favor a strong clustering signal for lower values of both bounding flux densities, as well as steeper slopes. As expected, the steepness of the slope is not as effective when either boundary is moved higher.

The left and right panels of Figure 3 respectively show the ratio as a function of ${S}_{\min }$ and ${S}_{\max }$, with different colored lines indicating various source count slopes. Since a higher ratio corresponds to a greater ${u}_{\star }$ (and thus a greater contribution from the clustering signal), we find that decreasing either boundary results in an increase in the clustering contribution. This increase is more significant for steeper slopes, ${\beta }_{1}$. When ${S}_{\max }$ is below the break (which may be the case for the SKA), there is a turnover, and lower peeling limits correspond to a reduction in the fractional contribution of clustering.

Along with the relationships between parameters, we would also like to consider the absolute value of ${u}_{\star }$ under reasonable parameter choices. This is significant, because a very low ${u}_{\star }$ might be below the limit of measurement for a given instrument, which has a physical minimum baseline length. To do this, we take two extreme cases, under the assumption of MWA-like and SKA-like parameters—(${S}_{\min }$, ${S}_{\max }$, ${\beta }_{1}$, u0, κ) = (0.1 mJy, 30 mJy, 1.54, 1, 4.5) and (0.001 mJy, 1 mJy, 2.2, 0.01, 1.5). For these two sets of parameters, we find ${x}_{\star }={u}_{\star }{\lambda }_{150}\approx (32,6\times {10}^{3})$ m. The first of these is below the minimum baseline for current SKA specifications, while the latter implies that clustering dominates over all relevant baselines for both instruments.

To move forward, we require a more robust physical model of both the source counts in the very-faint regime, and the spatial distribution of sources. We do not pursue these further in this work, but note that once these have been established, a firm estimate of the relative importance of the clustering term can be attained given the instrumental specifications.

5.6. Comparison to Toy Simulations

Before moving on to realistic simulations, we do well to gain intuition from toy models in which we can make as many simplifications as possible. To do this, we generate a distribution of sources on the "sky" (or a finite Euclidean space that we use as an approximation to ${\boldsymbol{l}}$), consistent with an underlying power-law source count distribution with Poisson scatter and a power-law PS, using the powerbox software. We then bin these sources separately at each frequency, using different cell sizes so that the corresponding DFT at each frequency contains the correct values of ${\boldsymbol{u}}$. Finally, we attenuate the resulting sky with a Gaussian beam before taking the 2D DFT to calculate visibilities. Each visibility is circularly averaged (its real and imaginary parts, as well as its square) to form a 1D visibility, and the final covariance is calculated as the mean over many realizations of the squared visibility minus the mean of the visibilities squared.

A physical PS is expected to turn over at some scale, and fall to zero at $u=0$. However, as long as this scale is within the resolution of our simulation, it will be completely undetectable (in fact, the finite resolution, d, of the simulation imposes a hard cut on the PS at roughly $u=1/d$). Thus we only trust the results well above this limit (and, conversely, below the scale limit imposed by the finite box size of the simulation).
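Given such realizations, the covariance estimate itself is compact. In the sketch below (ours), simulate() is a hypothetical stand-in for the powerbox-based pipeline just described, returning circularly averaged visibilities per frequency channel:

```python
import numpy as np

def covariance_over_realizations(simulate, n_real=200):
    """Estimate Cov(V(nu'), V(nu'')) at each u over Monte-Carlo realizations.

    `simulate` is a hypothetical callable returning an (n_freq, n_u) array
    of circularly averaged visibilities, per the pipeline described above.
    """
    V = np.array([simulate() for _ in range(n_real)])   # (n_real, n_freq, n_u)
    mean = V.mean(axis=0)
    # <V' V''*> - <V'><V''>*: mean of products minus product of means.
    cov = (np.einsum("rfu,rgu->fgu", V, V.conj()) / n_real
           - np.einsum("fu,gu->fgu", mean, mean.conj()))
    return cov
```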

5.6.1. Basic Tests

Our first task is to ensure that the model, Equation (46), provides a good description of the simulated data, and to examine its features. We show the results of a comparison of the simulated covariance to the model in Figure 4. In this section, as we are comparing to (relatively) expensive simulations and we are interested only in the fidelity of the comparison, rather than the absolute value of the result, we use modified source count limits: $({S}_{\min },{S}_{\max })=(0.1,1)$ Jy. Apart from this modification, we use the default model from Table 1, with $({u}_{0},\kappa )=(0.02,1.5)$.

Figure 4. Covariance between frequency channels of visibilities, including only point sources. Only a few frequency channels are shown for clarity, and their separations are exaggerated. The reference frequency ${\nu }_{0}=150.0$ MHz, and the displayed frequencies are ${f}_{0}=(1.0,1.25,1.5)$. The box size of the simulation is four times the FWHM of the beam, at 1.68, with 256 grid cells per side, and 200 realizations were used in order to generate the covariance. The power-law PS has $({u}_{0},\kappa )=(0.02,1.5)$. Blue, green, and brown lines indicate variances at each frequency as a function of the perpendicular scale u, while the orange, purple, and maroon lines indicate covariances between the frequencies. Solid lines show fully simulated results, while dashed lines show the result of Equation (38). The LHS panel shows the results of the full covariance, with both clustering and Poisson scatter present, while the center and RHS panels show the results with just clustering and Poisson terms, respectively. Equation (38) assumes these stochastic effects can be separated, and the LHS panel confirms the validity of this assumption, revealing no significant departure of the simulation from the model. The primary effect of including Poisson scatter is to add a constant to the variances, to which the solutions asymptote.

The immediate comparison is between solid and dashed lines, which indicate the simulation and model, respectively. These show very good agreement across the range of scales u shown here. The range shown in the plot corresponds roughly to the resolution limits of the simulations.

Of particular interest in Figure 4 is the comparison of the model to simulation in the presence of Poisson scatter. Recall that the derivation of the model incorporated the assumption that the stochasticity of clustering and Poisson scatter could be separated into two terms. The left-hand panel shows the results of the model including this assumption against the simulations, which require no such assumption. Though there is potentially some small departure at low u, the overall effect is negligible. The middle panel shows a comparison in which no Poisson scatter is used at all, which we expect to be in tighter agreement. The improvement, however, is almost visually imperceptible. Thus we conclude that the assumption of separability is indeed valid.

We finally note the features of the model. Clearly, ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Poiss}}$ (right-hand panel) has a constant variance, with exponential decay in the covariance (as made clear in Equation (25)). This chromaticity of the instrument is what results in the so-called foreground "wedge" in the 2D PS. The clustering term, ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Clust}}$, displays a power law for the variance, of the same slope as the point-source PS, and normalization dictated by a combination of factors, including the normalization of the point-source PS. The covariance is also a power law at low u, but decays exponentially at mid-u. We expect this exponential decay to be a feature of the beam, rather than the PS, while the low-u slope is clearly dominated by the latter. Their combination in the left-hand panel is very close to a pure addition of these terms. The resulting variance is dominated by the clustering term on large scales, and the Poisson term on small scales. The transition scale will be dictated by the relative normalization between the two, along with the form of the point-source PS. A similar trend emerges for the covariance, with large scales dominated by ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Clust}}$, and small scales behaving similarly between the two terms. Thus in summary we expect that ${{\boldsymbol{C}}}_{\mathrm{FG}}^{\mathrm{Clust}}$ may be important (dependent on its normalization) for describing the covariance relevant for EoR analyses.

Another important assumption to check is that of Gaussianity of the underlying density field. It is well-known that a Gaussian density field is physically inconsistent, since the support of the Gaussian distribution is over all real numbers, including negative numbers. A more physically appropriate distribution is the log-normal distribution (Coles & Jones 1991). In over-density, ${\delta }_{x}$, this distribution has support on $[-1,\infty )$, and for a mean-zero field with small variance, is almost identical to a Gaussian distribution. Our simulations, produced with the powerbox package, use the log-normal distribution for this reason. However, producing a log-normal field requires a non-linear transformation that is not captured in our model prediction. Thus, in Figure 5 we show some comparisons between the inherently Gaussian model and the log-normal simulations.

Figure 5.

Figure 5. Comparison of model covariance to simulations with varying magnitudes of bin-to-bin variance, ${\sigma }^{2}$. The point-source PS has κ = 1.5, with varying ${\omega }_{0}$, as shown by line color. Model predictions (shown by dashed lines in left-hand panel) are in good agreement for low ${\omega }_{0}$ (corresponding to low ${\sigma }^{2}$), but start to diverge at high ${\sigma }^{2}$. Simulations use a physically viable log-normal over-density distribution (shown in right-hand panel), while the model assumes a Gaussian distribution (for which the pdf with matching ${\sigma }^{2}$ is shown as a solid curve in the right-hand panel). The further the underlying distribution strays from Gaussianity, the more inaccurate the model prediction. Realistic sky distributions will tend to be close to Gaussian.


We find, overall, that the agreement is extremely good. The blue curves are consistent to an imperceptible level in the left-hand panel, even though the underlying distribution clearly departs from Gaussianity. It is not until the high-variance green curve, with ${u}_{0}=0.9$, that inaccuracies become apparent, and the underlying field in this case is dramatically non-Gaussian. In practice, the sky should be reasonably Gaussian, since the observed emission is integrated over many redshift shells that are not fully correlated, which drives the summed field toward Gaussianity. Thus our Gaussian model is an excellent approximation.

Our final test assesses the impact of a non-universal SED. Figure 6 shows the covariance as a function of u, as in Figure 4, but with the solid lines using our fiducial universal SED and the dashed lines using a non-universal slope for the underlying power-law SED. The distribution of γ is taken from Callingham et al. (2017), and is a relatively tight Gaussian centered on $\gamma =0.8$ with a width of 0.2. The non-universal SED simulation was run such that the total number of sources at each frequency was constant, mimicking actual observations. The figure shows clearly that the disparity is minimal, and certainly negligible compared with the uncertainty in the model parameters. While it may be interesting to determine limits on the variability of the SED under a number of scenarios, we do not pursue this further here, since realistic levels of variability are clearly inconsequential. A sketch of the non-universal SED draw is given below.
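As a concrete illustration, the following sketch draws per-source power-law SEDs with slopes from the Callingham et al. (2017) distribution; the source count and flux-density draw are toy assumptions, not the source-count model of this paper.

```python
import numpy as np

rng = np.random.default_rng(42)

nu = np.linspace(150e6, 161e6, 128)   # observing band [Hz]
nu0 = 150e6                           # reference frequency [Hz]

n_src = 10_000                        # toy source count (not the paper's model)
gamma = rng.normal(0.8, 0.2, n_src)   # per-source SED slopes (Callingham+ 2017)
s150 = rng.pareto(1.5, n_src) + 0.01  # toy flux densities at nu0 [Jy]

# Power-law SED per source: S(nu) = S_150 * (nu / nu0)^-gamma
flux = s150[:, None] * (nu[None, :] / nu0) ** (-gamma[:, None])

# A universal SED would instead use gamma = 0.8 for every source.
flux_universal = s150[:, None] * (nu[None, :] / nu0) ** -0.8
```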


Figure 6. Comparison of simulated covariance with and without a universal SED. Universal SED parameters are identical to Figure 4, while the non-universal case comprises power-law SEDs with slopes drawn from a normal distribution about $\gamma =0.8$ with width 0.2. No significant variation is evident.


5.6.2. Model Features in Fourier Space

The quantity that is directly used in the CHIPS algorithm is not the covariance as a function of frequency, ${{\boldsymbol{C}}}_{\mathrm{FG}}$, but its transformation to the covariance of the 2D power modes ${{\boldsymbol{C}}}_{P}$ (cf. Equation (11)). We note that although in principle the CHIPS formalism utilizes the full covariance matrix in Fourier space, the remaining plots will show only the variance, ${\sigma }_{P}^{2}({k}_{| | },{k}_{\perp })=\delta ({k}_{| | }^{\prime }-{k}_{| | }^{\prime\prime }){\boldsymbol{C}}({k}_{| | }^{\prime },{k}_{| | }^{\prime\prime },{k}_{\perp })$.

In Figure 7, we show ${\sigma }_{P}$ for each 2D mode of the PS, for both the clustering and Poisson-only solutions (left- and right-hand panels, respectively). This quantity is closely related to the foreground-contributed power, hence its likeness to standard 2D PS plots, with a clearly visible wedge feature. There is a noticeable excess of power at low ${k}_{\perp }$ in the total solution with clustering, both in the foreground-dominated region and in the window. This arises from the excess large-scale power contributed by source clustering.


Figure 7. Standard deviation of the power in 2D modes, for both the total clustering + Poisson solution (left-hand panel) and the Poisson-only solution (right-hand panel), using a fiducial (MWA-like) model (see Table 1). Color scales are identical in each panel to aid comparison.


A quantity of interest is the relative importance of the clustering term at any 2D mode. After all, if the clustering term is typically subdominant, it may not be worth the additional computational and intellectual resources required to include it in our models.

In Figure 8 we display the quantity ${\sigma }_{\mathrm{Poiss}}/({\sigma }_{\mathrm{Clust}}+{\sigma }_{\mathrm{Poiss}})$—i.e., the relative contribution of the Poisson standard deviation to the total, for several sets of parameters. The fiducial case is the same as used previously.


Figure 8. Survey of the effects of various parameters of the model on the ratio of the Poisson-only to the total 2D PS. The top-left panel is the fiducial case, and moving clockwise, the slope, the peeling flux density limit, and the power spectrum normalization are modified.


The first point to note is that the clustering fraction depends almost exclusively on ${k}_{\perp }$, with negligible dependence on ${k}_{| | }$. The dependence is such that clustering dominates at small ${k}_{\perp }$ and diminishes toward smaller scales; the point at which the ratio reaches 0.5 is closely related to ${u}_{\star }$.

We are here concerned primarily with the effects of the parameters, not the actual ratio for any given set. Thus we see that an increase in the power-spectral slope results in a compression of foreground power into very large-scale modes, and potentially out of the observation altogether. Alternatively, an increase in u0, or reduction of ${S}_{\min }$ or ${S}_{\max }$, has the opposite effect of increasing the range of clustering-dominance. The slope of the faint source counts does not seem to have a large impact for the default parameters employed here.

6. Application of Clustering Model to Synthetic EoR Data

To quantify the importance of including clustering information in the visibility covariance, we perform a mock test: we measure the PS from synthetic data with clustered foregrounds included, using both the original and improved solutions.

6.1. Simulation Details

We use the faint galaxies simulation from the Evolution of Structure (EOS) project, which was run with 21cmFAST (Mesinger et al. 2010). The simulation has a size of 1600 cMpc, with $1024^3$ grid cells, and is provided as a "best guess" at reionization physics. We specifically use the brightness-temperature lightcone that covers our redshift range of interest. The z-axis coordinates are converted to redshifts by assuming constant intervals in comoving distance under the default Planck15 cosmology, and the box is cut to contain the frequency range 150–161 MHz in 128 bins, approximating an MWA observation. A sketch of this conversion is given below.
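A minimal sketch of the distance-to-frequency conversion follows, using astropy's Planck15 cosmology; the comoving-distance values are illustrative placeholders rather than the actual EOS lightcone geometry.

```python
import numpy as np
import astropy.units as un
from astropy.cosmology import Planck15, z_at_value

F21 = 1420.405751786 * un.MHz   # rest frequency of the 21 cm line

# Lightcone cells are spaced uniformly in comoving distance; convert each
# cell's distance to a redshift and thence to an observed frequency.
# (Distance values are illustrative, not the actual EOS geometry.)
d_cells = np.linspace(9000.0, 9500.0, 256) * un.Mpc
z_cells = np.array(
    [float(z_at_value(Planck15.comoving_distance, d)) for d in d_cells]
)
nu_obs = F21 / (1.0 + z_cells)

# Keep the 150-161 MHz sub-band used for the mock MWA observation.
band = (nu_obs >= 150 * un.MHz) & (nu_obs <= 161 * un.MHz)
print(f"{band.sum()} cells fall in the band")
```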

Ideally, one would process the simulation through an instrumental pipeline, so that all instrumental biases present in the covariances would also be present in the synthetic signal. However, the angular size of the simulation at a redshift of ∼9 is too small to cover a single primary beam width, which undermines this approach. Instead, we simply measure the 2D signal by 3D Fourier-transforming the original box and averaging over the perpendicular modes of the (volume-normalized) PS. Because of this, the perpendicular scales covered span a large portion of those available to the MWA (and planned for SKA1-LOW), but do not extend to the largest scales we could otherwise probe. This should be kept in mind for the remainder of this section, as the largest influence of the clustering solution arises on these large scales.

6.2. Signal-to-noise Estimates

Figure 9 shows the signal-to-noise, defined as ${P}_{\mathrm{signal}}/\sigma ({k}_{\perp },{k}_{| | })$, for a variety of realizations of the foreground model. The basic parameters of the model retain their fiducial values (cf. Table 1), while the top panel uses the MWA-like parameters where appropriate, and the lower panel uses SKA-like parameters. Left- and right-hand panels again indicate full and Poisson-only solutions.


Figure 9. Signal-to-noise ratios for a signal derived from the simulation described in the text, and foreground models based on Table 1. Top and bottom panels use MWA-like and SKA-like parameters, respectively, while left- and right-hand panels indicate the full and Poisson-only solutions.


We remind the reader that only extragalactic point-source foregrounds are included in this plot. Thus it represents what would be achieved with an infinite amount of observing time and a perfectly understood instrument, after perfect removal of Galactic foregrounds and other systematics. Clearly, it should not be taken as a prediction of whether our model permits a detection; rather, we are interested in the behavior of our model relative to the Poisson solution, with respect to a fiducial signal.

The most striking feature is that the SKA performs significantly better than the MWA. This is not directly attributable to the instrument itself, but rather to the level of peeling performed in the experiment. Nevertheless, this clearly relates back to the instrument's capability to model individual foreground sources down to faint flux densities, providing a catalog for this peeling.

For both MWA- and SKA-like mock observations, the lowest ${k}_{\perp }$ modes have significantly reduced signal-to-noise in the full solution as compared with the Poisson-only solution. This is just as expected, owing to the extra variance from the source PS. We note, however, that these low-${k}_{\perp }$ window modes already have suppressed signal-to-noise, even in the Poisson case. This will limit the influence of the clustering term in the final averaged 1D PS.

6.3. Effects in the 1D PS

The ultimate objective is to determine the effects of ignoring the clustering term on the 1D PS. We expect two effects to arise: a bias from ignoring extra foreground power, and an artificially decreased uncertainty.

The CHIPS formalism does not remove the foreground power from the estimated signal, owing to the potential for over-subtraction of the dominant source of power. Instead, since the goal at this stage is a robust detection rather than an accurate measurement of the signal, the foregrounds are left in the estimate, and any apparent power is assessed for significance against the ascribed uncertainty. Thus we generate our mock estimated 1D PS via

$$\hat{P}_{i}=\frac{\sum_{j\in i}{P}_{j}/{\sigma }_{j}^{2}}{\sum_{j\in i}1/{\sigma }_{j}^{2}},\qquad\qquad (51)$$

where the "i" notation indicates that we are using entries only in the ith radial bin. Furthermore, the uncertainty is estimated as the weighted average across the bin:

$$\hat{\sigma }_{i}=\frac{\sum_{j\in i}{\sigma }_{j}/{\sigma }_{j}^{2}}{\sum_{j\in i}1/{\sigma }_{j}^{2}}=\frac{\sum_{j\in i}1/{\sigma }_{j}}{\sum_{j\in i}1/{\sigma }_{j}^{2}}.\qquad\qquad (52)$$
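A minimal numerical sketch of this binning follows, assuming the simplification of using only the per-mode variance (rather than the full Fourier-space covariance that CHIPS employs); all names are illustrative.

```python
import numpy as np

def bin_to_1d(k_perp, k_par, power_2d, sigma_2d, k_edges):
    """Inverse-variance weighted 1D binning of a 2D PS, as in
    Equations (51)-(52). Uses only the per-mode variance, a
    simplification of the full covariance treatment in CHIPS."""
    kper, kpar = np.meshgrid(k_perp, k_par, indexing="ij")
    k_mag = np.hypot(kper, kpar).ravel()       # |k| of each 2D mode
    which = np.digitize(k_mag, k_edges) - 1    # radial bin of each mode

    p, s = power_2d.ravel(), sigma_2d.ravel()
    w = 1.0 / s**2                             # inverse-variance weights

    n_bins = len(k_edges) - 1
    p1d = np.full(n_bins, np.nan)
    s1d = np.full(n_bins, np.nan)
    for i in range(n_bins):
        m = which == i
        if m.any():
            p1d[i] = np.sum(w[m] * p[m]) / np.sum(w[m])   # Equation (51)
            s1d[i] = np.sum(w[m] * s[m]) / np.sum(w[m])   # Equation (52)
    return p1d, s1d
```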

Figure 10 shows the results of this averaging over the 2D spectra shown in Figure 9 (top-left panel) and several other parameter choices. Again, results here are not to be interpreted as actual predictions, because of the lack of other foregrounds, but we do gain insight as to the importance of including the clustering information.


Figure 10. Inverse-covariance weighted 1D power spectra. Each panel shows MWA-like and SKA-like mock observations, for both the full solution and the Poisson-only solution, as well as the underlying signal. Different panels involve different model parameter choices. Error bars are derived in the text.


For most parameter choices, the SKA results in a very precise and accurate measurement of the input signal—as compared to the MWA model, which always includes a significant bias. Again, this is due to the level of peeling performed in this hypothetical experiment. Nevertheless a few interesting features arise in the MWA-like models. First, in most cases presented here, there is discernible extra bias if the clustering term is ignored. In general, the amplitude of this extra bias is increased for models in which faint sources are relatively more abundant, or where the source spectrum is of higher amplitude. The opposite is true when the source PS is very steep. In this case, the clustering plays no visible role.

At the same time, as expected, the error bars are artificially reduced when clustering is ignored. Importantly, in almost all cases (except the steep-source-spectrum case), there are scales at which a false detection occurs if clustering is ignored. That is, there are scales at which the increased bias and artificially decreased uncertainty would indicate significant non-zero power, when in fact the full model shows that no detection should be attainable. Indeed, if faint sources are highly abundant and the source spectrum is shallow (lower center and right-hand plots), even the fiducial SKA model yields significantly inaccurate measurements.

This is strong motivation for including our clustering extension in the general CHIPS framework in the future. It also strongly motivates future surveys to constrain the model parameters—especially the faint source count slope ${\beta }_{1}$, and the source PS parameters ${u}_{0}$ and κ. We note that surveys at higher frequencies have measured these quantities (e.g., Blake et al. 2004), but how they translate to lower frequencies and fainter sources is uncertain. Work is already underway to measure these quantities with low-frequency surveys such as GLEAM (Hurley-Walker et al. 2016) and EoR-specific deep pointings.

7. Conclusions and Future Prospects

In this paper we have derived an improved statistical point-source foreground covariance model for use in inverse-covariance suppression schemes in the context of the EoR. In particular, beginning from the basic CHIPS model proposed in Trott et al. (2016), we investigated extensions to two components. We extended (i) the source count distribution from a single power law to an arbitrarily broken power law, and (ii) the spatial distribution from a uniform Poisson process to an arbitrary isotropic distribution—especially a power-law configuration.

In doing so, we have derived a general framework for including these extensions, applicable in principle to current and future interferometers. In addition we have verified the validity of our expressions by comparison to statistical simulations, along with providing an expression for the on-sky scale above which our extensions begin to dominate the covariance.

7.1. Future Prospects

There are a number of ways in which the present analysis may be improved and extended. In terms of improvement, we have noted that the framework sits more naturally within the context of spherical harmonics, which would dispense with several of the approximations made in this paper (e.g., Liu et al. 2016). Furthermore, we have maintained the simplistic assumption of complete uniformity of $S(\nu )$ across sources, noting that a relaxation of this assumption could alter the result that all source count model effects are sequestered into a single factor. Nevertheless, we showed that the extent of this inaccuracy is likely to be minimal. A more drastic improvement would be to include covariance between baselines in the framework. At this point, we have considered only covariance between frequencies for the same baseline, which greatly simplifies the calculations.

As far as extensions are concerned, a useful idea may be to link the framework with a physically motivated prescription for the source population. An example is a radio-frequency conditional luminosity function (CLF; Cooray 2006) as a function of redshift, from which the source counts, SED, and source clustering may all be derived. This would provide a convenient way not only to model the foregrounds for suppression, but also to understand them in their own right.

We also note that the general approach used here to calculate the clustering term of the covariance may be applied to other foreground regimes. In particular, diffuse Galactic emission is often described as isotropic turbulence within a single observed field, which may be succinctly quantified under the same framework. Furthermore, the ionosphere's total electron content (TEC) is typically prescribed as near-Kolmogorov turbulence (with the potential addition of a large-amplitude single wave-mode describing travelling ionospheric disturbances). In this case, the TEC field acts as a differential screen, so the resulting foreground covariance must be derived via an approach similar to that used in weak-lensing studies. Nevertheless, even this contamination source can be connected to the formalism presented here.

7.2. Conclusions

Predictions of our model extensions can be summarized as follows:

  1. There is excess foreground power at small ${k}_{\perp }$;
  2. This excess foreground power is a function almost exclusively of ${k}_{\perp }$, rather than ${k}_{| | }$;
  3. Models with a higher relative abundance of faint sources in general increase the relative impact of ignoring our extensions;
  4. Models with steeper point-source power spectra decrease the relative impact of ignoring our extensions;
  5. Under some parameter choices, ignoring the extensions induces false detections (i.e., significant non-zero power that is in fact merely bias).

These predictions indicate that future measurements of the EoR PS would benefit greatly from including these model extensions—particularly the clustering of sources. Furthermore, while deeper surveys will render the total point-source foreground contamination less important, the remainder of that contamination will be increasingly dominated by clustering. Thus, generating accurate measurements of the EoR PS with SKA1-LOW will require accounting for the framework we have presented here.

These results also motivate the accurate determination of source count models down to faint limits, as well as point-source power spectra. Future deep surveys such as GLEAM-X (Hurley-Walker et al. 2017) and those performed with SKA1-LOW will be indispensable in providing high-fidelity measurements of these models, which will feed back into the covariance models for the EoR.

Parts of this research were conducted by the Australian Research Council Centre of Excellence for All-sky Astrophysics (CAASTRO), through project number CE110001020. S.G.M. would like to thank Jean-Pierre Macquart, Randall Wayth, and Cullan Howlett for useful discussions throughout the preparation of this paper. This research has made use of NASA's Astrophysics Data System. C.M.T. is supported under the Australian Research Council's Discovery Early Career Researcher funding scheme (project number DE140100316). This work was supported by resources provided by the Pawsey Supercomputing Centre, with funding from the Australian Government and the Government of Western Australia. We acknowledge the International Centre for Radio Astronomy Research (ICRAR), a joint venture of Curtin University and the University of Western Australia, funded by the Western Australian state government.

Appendix: Cosmological Unit Conversions

Conversion between observation parameters $({\boldsymbol{u}},\nu )$ and the cosmological parameters is given by Morales & Hewitt (2004) as

$${k}_{\perp }=\frac{2\pi |{\boldsymbol{u}}|}{{D}_{M}(z)},\qquad\qquad (53)$$

$${k}_{| | }=\frac{2\pi {H}_{0}{f}_{21}E(z)}{c{(1+z)}^{2}}\,\eta .\qquad\qquad (54)$$

Here z is the central redshift of the observation, ${f}_{21}\sim 1.42$ GHz is the rest frequency of the 21 cm line, η is the Fourier dual of the observed frequency, ${D}_{M}(z)$ is the transverse comoving distance at that redshift, and in a flat universe we adopt

$${D}_{M}(z)=\frac{c}{{H}_{0}}{\int }_{0}^{z}\frac{{dz}^{\prime} }{E(z^{\prime} )},\qquad E(z)=\sqrt{{{\rm{\Omega }}}_{m}{(1+z)}^{3}+{{\rm{\Omega }}}_{{\rm{\Lambda }}}}.\qquad\qquad (55)$$

The natural units of the PS are ${\mathrm{Jy}}^{2}\,{\mathrm{Hz}}^{2}$, but these can be converted to the standard units of ${\mathrm{mK}}^{2}\,{\mathrm{Mpc}}^{3}\,{h}^{-3}$ using the following recipe. To convert Hz to $\mathrm{Mpc}\,{h}^{-1}$, one divides by G(z); then, to convert Jy to K sr, we use the following conversion:

Equation (56)

where ${A}_{\mathrm{eff}}\sim 20\,{{\rm{m}}}^{2}$ is the effective area of the MWA telescope at $\nu \sim 150\,\mathrm{MHz}$, and kB is Boltzmann's constant. The steradians are converted to ${\mathrm{Mpc}}^{2}\,{h}^{-2}$ via a squared factor of the comoving distance, ${D}^{2}(z)$, and finally the value is normalized by the volume

Equation (57)

where ${\rm{\Delta }}\nu $ is the range of frequencies in the observation. Explicitly, the conversion from ${\mathrm{Jy}}^{2}\,{\mathrm{Hz}}^{2}$ to ${\mathrm{mK}}^{2}\,{\mathrm{Mpc}}^{3}\,{h}^{-3}$ is

Equation (58)
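The recipe can be implemented numerically as in the following sketch. It assumes the standard Rayleigh-Jeans conversion ($10^{-26}{\lambda }^{2}/2{k}_{B}$ K sr per Jy) in place of the ${A}_{\mathrm{eff}}$-based form of Equation (56), and takes the observed solid angle Ω as an input; the function names and the treatment of the volume normalization are illustrative.

```python
import numpy as np
from astropy.cosmology import Planck15

C_MS = 2.99792458e8      # speed of light [m/s]
C_KMS = 2.99792458e5     # speed of light [km/s]
K_B = 1.380649e-23       # Boltzmann constant [J/K]
F21 = 1.420405752e9      # 21 cm rest frequency [Hz]

def G(z):
    """Hz per (Mpc/h), cf. Equations (54)-(55):
    G(z) = (H0/c) * f21 * E(z) / (1+z)^2, with H0 = 100 h km/s/Mpc."""
    return (100.0 / C_KMS) * F21 * Planck15.efunc(z) / (1.0 + z) ** 2

def jy2hz2_to_mk2mpc3(p_jy2hz2, z, nu, omega, delta_nu):
    """Sketch of the Jy^2 Hz^2 -> mK^2 Mpc^3 h^-3 recipe described above.
    Assumes the standard Rayleigh-Jeans Jy -> K sr factor rather than the
    A_eff-based Equation (56); omega is the observed solid angle [sr]."""
    lam = C_MS / nu                                 # wavelength [m]
    jy_to_ksr = 1e-26 * lam**2 / (2.0 * K_B)        # K sr per Jy
    d_m = Planck15.comoving_transverse_distance(z).value * Planck15.h  # Mpc/h
    # Jy^2 Hz^2 -> K^2 sr^2 Hz^2 -> K^2 Mpc^4 h^-4 Hz^2 -> K^2 Mpc^6 h^-6
    p_k2 = p_jy2hz2 * jy_to_ksr**2 * d_m**4 / G(z) ** 2
    vol = omega * d_m**2 * delta_nu / G(z)          # observed volume [Mpc^3 h^-3]
    return 1e6 * p_k2 / vol                         # K^2 -> mK^2, per volume
```

Readers implementing the exact ${A}_{\mathrm{eff}}$-based expression of Equation (56) should replace the jy_to_ksr factor accordingly.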
