ANNz: Estimating Photometric Redshifts Using Artificial Neural Networks

Adrian A. Collister; Ofer Lahav

doi:10.1086/383254

1. INTRODUCTION

In its most general sense, the term "photometric redshift" refers to a redshift estimated using only medium‐ or broadband photometry or imaging. Most commonly, photometric redshifts are determined on the basis of galaxies' colors in three or more filters (thus giving a very coarse approximation to the spectral energy distribution [SED]), but they could also be based on other properties that can be derived from images, such as the angular size or concentration index. The method has found successful application to deep‐field and wide‐field surveys, notably the Hubble Deep Field (e.g., Fernández‐Soto, Lanzetta, & Yahil 1999) and the Sloan Digital Sky Survey (SDSS; Csabai et al. 2003).

The most commonly used approach to photometric redshift estimation is the "template matching" technique. This requires a set of "template" SEDs covering a range of galaxy types, luminosities, and redshifts appropriate to the population for which photometric redshifts are required. For a particular target galaxy, the photometric redshift is chosen to be the redshift of the most closely matching template spectrum; this is usually defined as the template that minimizes the χ² between the template and actual magnitudes.
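The selection step described above can be sketched in a few lines. This is a minimal illustration and not the procedure of any particular package: the grid of template magnitudes, the filter count, and the names `chi_squared` and `best_template_redshift` are all hypothetical, and the sketch omits the free normalization that practical template fitting usually includes.

```python
def chi_squared(observed, predicted, errors):
    """Sum of squared, error-weighted magnitude residuals."""
    return sum(((o - p) / e) ** 2 for o, p, e in zip(observed, predicted, errors))


def best_template_redshift(observed, errors, template_grid):
    """Return the (redshift, template name) key whose magnitudes minimize chi-squared.

    template_grid maps (redshift, name) -> list of predicted magnitudes.
    """
    return min(
        template_grid,
        key=lambda key: chi_squared(observed, template_grid[key], errors),
    )


# Hypothetical toy grid: two templates sampled at two redshifts, three filters.
toy_grid = {
    (0.1, "elliptical"): [19.0, 18.0, 17.5],
    (0.1, "spiral"): [18.5, 18.0, 17.8],
    (0.3, "elliptical"): [20.5, 19.0, 18.2],
    (0.3, "spiral"): [19.6, 19.0, 18.7],
}
z_best, type_best = best_template_redshift(
    [20.4, 19.1, 18.2], [0.1, 0.1, 0.1], toy_grid
)
```

Because the redshift axis is sampled discretely, the estimate can only take values present in the grid, which is one reason the sampling density of the templates matters.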

The template spectra are usually derived from a small set of SEDs representing different classes of galaxy at redshift z = 0, which are then manually redshifted to give a discrete sampling along the redshift axis (note that this method does not account for evolution with redshift). Commonly used template sets are the Coleman, Wu, & Weedman (1980; hereafter CWW) SEDs, which are derived observationally, or those of Bruzual & Charlot (1993), derived from population synthesis models. The template‐matching technique owes its popularity to the very few resources required for a basic implementation (i.e., a handful of template SEDs), but the accuracy of the technique strongly depends on the extent to which the template spectra are representative of the target populations: for example, template SEDs derived from observations of low‐redshift galaxy populations may be a poor match for populations at higher redshifts.

The chances of success can be improved by increasing the number of templates, or by more carefully matching the templates to the populations being studied. For example, the spectroscopic catalog of the SDSS (York et al. 2000) could be used to produce a set of templates that are highly representative of the SDSS photometric catalog (Csabai et al. 2003). However, in situations with such a large amount of prior redshift information about the sample, the template‐matching technique is not the best approach: so‐called "empirical" methods usually offer greater accuracy, as well as being far more efficient.

In essence, empirical photometric redshift methods aim to derive a parametrization for the redshift as a function of the photometric parameters. The form of this parametrization is deduced through use of a suitably large and representative training set of galaxies for which we have both photometry and a precisely known redshift. A simple example is to express the redshift as a polynomial in the galaxy colors (e.g., Connolly et al. 1995; Sowards‐Emmerd et al. 2000). The coefficients in the polynomial are varied to optimize the fit between the predicted and measured redshift. The photometric redshift of a galaxy for which we have no spectroscopy can then be estimated by applying the optimized function to the colors of that galaxy.
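As an illustration of the polynomial approach, the following sketch fits a second‐order polynomial in the colors by ordinary least squares and applies it to an unseen galaxy. It is a toy under stated assumptions: the function names, the use of the normal equations, and the synthetic single‐color training data are all chosen here for illustration and are not taken from the cited works.

```python
def design_row(colors):
    """Second-order polynomial terms: 1, each c_i, and each c_i * c_j (i <= j)."""
    row = [1.0] + list(colors)
    for i in range(len(colors)):
        for j in range(i, len(colors)):
            row.append(colors[i] * colors[j])
    return row


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_polynomial(colors_list, z_spec):
    """Least-squares coefficients from the normal equations (X^T X) c = X^T z."""
    rows = [design_row(c) for c in colors_list]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * z for r, z in zip(rows, z_spec)) for i in range(p)]
    return solve(xtx, xtz)


def predict_redshift(coeffs, colors):
    """Evaluate the fitted polynomial at the colors of a target galaxy."""
    return sum(c * t for c, t in zip(coeffs, design_row(colors)))


# Synthetic training set (hypothetical): one color, exact quadratic relation.
train_colors = [[0.0], [0.5], [1.0], [1.5], [2.0]]
train_z = [0.1 + 0.2 * c[0] + 0.05 * c[0] ** 2 for c in train_colors]
coeffs = fit_polynomial(train_colors, train_z)
z_target = predict_redshift(coeffs, [1.2])
```

With several colors the number of polynomial terms grows quickly, so the training set must be correspondingly larger for the fit to remain well determined.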

Ideally, the training set would be a representative subset of the actual photometric target sample (this has the attractive side effect of nullifying any systematics in the photometry). However, the training set could also be derived from a set of template spectra or from simulated catalogs (e.g., Vanzella et al. 2004). The photometry for the training set must be for the same filter set and should have the same noise characteristics as that for the target sample. The trained method can usually only be reliably applied to target galaxies within the ranges of redshift and spectral type adequately sampled by the training set.

In this paper we introduce ANNz, a software package for photometric redshift estimation using artificial neural networks (ANNs) to parametrize the redshift‐photometry relation. It can be shown (e.g., Jones 1990; Blum & Li 1991) that a sufficiently complex ANN is capable of approximating to arbitrary accuracy any continuous functional mapping. ANNs have previously found a number of applications in astronomy, including morphological classification of galaxies (e.g., Lahav et al. 1996; Ball et al. 2003), star/galaxy separation (Bertin & Arnouts 1996), and object detection (e.g., Andreon et al. 2000). Firth, Lahav, & Somerville (2003) previously demonstrated the feasibility of using ANNs for photometric redshift estimation, and more recently, Vanzella et al. (2004) have applied the method to the Hubble Deep Fields.

The layout of this paper is as follows. In § 2 artificial neural networks are introduced, and the particular methods used by ANNz are explained. In § 3 ANNz is applied to the SDSS. The results are compared with rival photometric redshift estimators and various extensions to the basic technique are explained and illustrated. Finally, less ideal conditions are simulated to assess the impact on the accuracy of photometric redshift estimation. In § 4 the results are summarized, and prospects for the application of ANNz discussed.

2. ARTIFICIAL NEURAL NETWORKS

ANNz uses a particular species of ANN known formally as a "multilayer perceptron" (MLP). An MLP consists of a number of layers of nodes (Fig. 1; see, e.g., Bishop 1995 and references therein for background). The first layer contains the inputs, which in our application to photometric redshift estimation are the magnitudes, m_i, of a galaxy in a number of filters (for ease of notation we arrange these in a vector m≡[m₁,m₂,...,m_n]). The final layer contains the outputs; we will usually use just one output, the photometric redshift z_phot, but see § 3.2.2 for an example with multiple outputs. Intervening layers are described as "hidden," and there is complete freedom over the number and size of hidden layers used. The nodes in a given layer are connected to all the nodes in adjacent layers. A particular network architecture can be denoted by N_in∶N₁∶N₂∶...∶N_out, where N_in is the number of input nodes, N₁ is the number of nodes in the first hidden layer, and so on. For example, 9∶6∶1 takes nine inputs, has six nodes in a single hidden layer, and gives a single output.
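The architecture notation maps directly onto a list of layer sizes, from which the number of connections follows. The sketch below is illustrative only: the helper names are hypothetical, the specification is written with a plain ASCII colon, and the optional extra node per layer is an assumed convention rather than a statement of how ANNz counts its weights.

```python
def parse_architecture(spec):
    """Turn a string such as "9:6:1" into a list of layer sizes."""
    sizes = [int(part) for part in spec.split(":")]
    if len(sizes) < 2 or any(size < 1 for size in sizes):
        raise ValueError("need an input and an output layer, all sizes >= 1")
    return sizes


def count_weights(sizes, bias=True):
    """Count connections between fully connected adjacent layers.

    ASSUMPTION: if bias is True, one constant node feeds every layer
    after the input layer.
    """
    extra = 1 if bias else 0
    return sum((n_in + extra) * n_out for n_in, n_out in zip(sizes, sizes[1:]))


layers = parse_architecture("9:6:1")
n_weights = count_weights(layers)
```

Under this convention the 9∶6∶1 example has (9 + 1) × 6 + (6 + 1) × 1 = 67 adjustable weights, which gives a rough sense of how many training galaxies are needed to constrain the network.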

Each connection carries a weight, w_ij; these comprise the vector of coefficients, w, that are to be optimized. An activation function, g_j(u_j), is defined at each node, taking as its argument

$$u_j = \sum_i w_{ij}\, g_i(u_i), \tag{1}$$

where the sum is over all nodes i sending connections to node j. The activation functions are typically taken (in analogy to biological neurons) to be sigmoid functions such as g_j(u_j) = 1/[1 + exp (- u_j)], and we follow this approach here. An extra input node, the bias node, is automatically included to allow for additive constants in these functions.

For a particular input vector, the output vector of the network is determined by progressing sequentially through the network layers, from inputs to outputs, calculating the activation of each node (hence, this type of neural network is often referred to as a "feed forward" network).
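A feed‐forward pass of this kind can be written compactly. The following is a sketch rather than the ANNz implementation: the nested‐list weight layout and the function names are hypothetical, and it assumes sigmoid activations at hidden nodes with a linear output node (a common choice for regression that the text does not confirm).

```python
import math


def sigmoid(u):
    """Logistic activation g(u) = 1 / (1 + exp(-u))."""
    return 1.0 / (1.0 + math.exp(-u))


def feed_forward(layers, magnitudes):
    """Propagate an input vector through the network, layer by layer.

    layers[l][j] holds the weights into node j of layer l + 1; the last
    entry of each weight list multiplies a constant bias activation of 1.
    ASSUMPTION: hidden nodes are sigmoid and output nodes are linear.
    """
    activations = list(magnitudes)
    for index, layer in enumerate(layers):
        extended = activations + [1.0]  # append the bias node
        sums = [sum(w * a for w, a in zip(node, extended)) for node in layer]
        is_output = index == len(layers) - 1
        activations = sums if is_output else [sigmoid(u) for u in sums]
    return activations


# Toy 2:2:1 network with hand-picked (not trained) weights.
toy_layers = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]],  # two hidden nodes
    [[1.0, 1.0, 0.25]],                  # one output node
]
z_phot = feed_forward(toy_layers, [18.2, 17.6])[0]
```

With all hidden weights set to zero each hidden node outputs 0.5, so the toy output is 0.5 + 0.5 + 0.25 = 1.25 regardless of the input magnitudes.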

2.1. Network Training

Given a suitable training set of galaxies for which we have both photometry, m, and a spectroscopic redshift, z_spec, the ANN is trained by minimizing the cost function

$$E = \sum_k \left[ z_{\rm phot}(\mathbf{w}, \mathbf{m}_k) - z_{{\rm spec},k} \right]^2 \tag{2}$$

with respect to the weights, w, where z_phot(w,m_k) is the network output for the given input and weight vectors, and the sum is over the galaxies in the training set. To ensure that the weights are regularized (i.e., that they do not become too large), an extra quadratic cost term

$$E_w = \beta \sum_{i,j} w_{ij}^2 \tag{3}$$

is added to equation (2), with the coefficient β setting the strength of this weight decay.

ANNz uses an iterative quasi‐Newtonian method to perform this minimization. Details of the minimization algorithm and regularization can be found in Bishop (1995) and Lahav et al. (1996; appendices).
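The quantity being minimized can be sketched as follows, assuming a sum‐of‐squares data term and a quadratic penalty on the weights scaled by a coefficient called `beta` here. The function names and the linear stand‐in for the network are hypothetical, and the minimizer itself is not shown.

```python
def regularized_cost(predict, weights, magnitudes, z_spec, beta):
    """Sum-of-squares data term plus a quadratic weight-decay penalty."""
    data_term = sum(
        (predict(weights, m) - z) ** 2 for m, z in zip(magnitudes, z_spec)
    )
    decay_term = beta * sum(w * w for w in weights)
    return data_term + decay_term


def linear_model(weights, m):
    """Hypothetical stand-in for the network: a weighted sum of the inputs."""
    return sum(w * x for w, x in zip(weights, m))


cost = regularized_cost(
    linear_model,
    weights=[0.5, -0.5],
    magnitudes=[[1.0, 0.8], [1.2, 0.6]],
    z_spec=[0.1, 0.2],
    beta=0.01,
)
```

Larger values of `beta` pull the weights toward zero and give a smoother mapping, at the price of a poorer fit to the training data.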

After each training iteration, the cost function is also evaluated on a separate validation set. After a chosen number of training iterations, training terminates and the final weights chosen for the ANN are those from the iteration at which the cost function is minimal on the validation set. This is useful to avoid overfitting to the training set if the training set is small. The trained network can then be presented with previously unseen input vectors, and the outputs computed.
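The validation‐based choice of final weights can be illustrated with a toy one‐parameter problem. This sketch uses a plain gradient step as a stand‐in for the quasi‐Newton update, and every name and number in it is hypothetical.

```python
def train_with_validation(w0, step, validation_cost, n_iterations):
    """Run a fixed number of iterations; keep the weights that score best
    on the validation set rather than the final ones."""
    w = w0
    best_w, best_cost = None, float("inf")
    for _ in range(n_iterations):
        w = step(w)                # one training iteration
        cost = validation_cost(w)  # cost on held-out galaxies
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, w


def gradient_step(w, rate=0.1):
    """Gradient descent on a toy training cost (w - 2)^2."""
    return w - rate * 2.0 * (w - 2.0)


def toy_validation_cost(w):
    """Toy validation cost whose minimum (w = 1.5) differs from training."""
    return (w - 1.5) ** 2


best_w, final_w = train_with_validation(0.0, gradient_step, toy_validation_cost, 20)
```

Here the training cost keeps falling as w approaches 2, but the validation cost is lowest after the sixth iteration, so the weights from that iteration are the ones retained.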

2.2. Photometric Noise

In real situations the inputs to the network (in this case, the galaxy magnitudes) will usually have measurement noise associated with them. We can assess the variance that this noise induces in the output using the usual chain‐rule approach:

$$\sigma_z^2 = \sum_i \left( \frac{\partial z}{\partial m_i} \right)^2 \sigma_{m_i}^2, \tag{4}$$

where the sum is over the network inputs.

Given a trained network, the output is an analytic function of the network weights and the input vector, z = z(w,m). Provided the activation functions, g_i(u_i), are differentiable, the derivatives ∂z/∂m_i can be obtained through a simple and efficient algorithm (Bishop 1995, pp. 148–150). This method is used by ANNz to estimate the variance in its photometric redshifts due to the photometric noise.
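The propagation of photometric noise amounts to summing (∂z/∂m_i)² σ_{m_i}² over the inputs. The sketch below estimates the derivatives by central finite differences, which is a simple substitute for the analytic algorithm cited above and not the method ANNz uses; the function names and the linear toy mapping are hypothetical.

```python
def propagate_noise(z_of_m, magnitudes, sigma_m, h=1e-5):
    """First-order variance of z from independent magnitude errors.

    Derivatives are estimated by central differences with step h.
    """
    variance = 0.0
    for i in range(len(magnitudes)):
        up = list(magnitudes)
        down = list(magnitudes)
        up[i] += h
        down[i] -= h
        dz_dm = (z_of_m(up) - z_of_m(down)) / (2.0 * h)
        variance += (dz_dm * sigma_m[i]) ** 2
    return variance


def toy_mapping(m):
    """Hypothetical stand-in for a trained network."""
    return 0.1 * m[0] - 0.2 * m[1]


var_z = propagate_noise(toy_mapping, [20.0, 19.0], [0.1, 0.05])
sigma_z = var_z ** 0.5
```

The sum treats the magnitude errors as uncorrelated; correlated errors would require the cross terms as well.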

2.3. Network Variance

Prior to training, ANNz randomizes the initial values of the weights. Depending on the particular initialization state used, the training process will usually converge to different local minima of the cost function. A simple possibility is to train a number of networks and select one based on the best performance on the validation set. However, this is a wasteful use of training effort; in fact, the suboptimal networks can be used to improve overall accuracy: the mean of the individual outputs of a group of networks (known as a "committee") will usually be a more accurate estimate for the true redshift than the outputs of any one committee member in isolation.

Using a committee also allows the uncertainty in the output due to the variance in the network weights to be estimated. For a particular target galaxy, the photometric redshift prediction should ideally be robust to different initializations of the weight vector. However, it may be the case that the available photometry or training set does not constrain the redshift very well (even for high signal‐to‐noise photometry, so the error estimated by the method of § 2.2 could be relatively small). These cases are more likely to show a large variance in the output for different initializations of the weight vector; hence, using a committee may assist in their identification. ANNz allows arbitrarily large committees to be used and estimates the contribution of the network variance to the error in the photometric redshift for each target galaxy.
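Combining committee members is straightforward once each network has produced an output for a given galaxy. In this sketch the committee estimate is the mean of the outputs, as described above, while the use of the sample standard deviation as the network‐variance error is an assumption made for illustration; the function name and the numbers are hypothetical.

```python
import statistics


def committee_estimate(member_outputs):
    """Mean of the committee outputs and their scatter.

    ASSUMPTION: the scatter is summarized by the sample standard deviation.
    """
    mean = statistics.mean(member_outputs)
    if len(member_outputs) > 1:
        spread = statistics.stdev(member_outputs)
    else:
        spread = 0.0
    return mean, spread


# Outputs of five hypothetical networks for one target galaxy.
z_committee, z_spread = committee_estimate([0.10, 0.12, 0.11, 0.13, 0.09])
```

A galaxy whose spread is large compared with its photometric error is a candidate for the poorly constrained cases discussed above.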

2.4. Using the ANNz Package

We have made ANNz available on the World Wide Web.³ Full instructions are provided with the package, but we provide an outline of the procedure here. ANNz comprises two main programs: annz_train and annz_test.

1. When applying ANNz to any data set for the first time, it is strongly recommended that a portion of the available training data be set aside as an evaluation set. This is used as a mock target sample to assess and tune ANNz's performance on the data. The evaluation set should therefore be chosen to match the real target sample as closely as possible in terms of its magnitude and color distributions.
2. The remaining training data should be separated into training and validation sets that are supplied to the annz_train program along with a description of the required network architecture. This program performs the network training as described in § 2.1. The trained network weights are saved to file.
3. Step 2 may be repeated several times using different network initializations to obtain a committee of trained networks.
4. The annz_test program can now be used to apply the trained networks to the target data.

Before applying ANNz to the actual photometric target sample, the whole procedure should be run several times using the evaluation set as the target data and varying the parameters of the training (e.g., weight decay, training and validation set sizes, number of networks in the committee) so as to optimize the performance.
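The data partition in steps 1 and 2 can be prepared before any ANNz program is run. The sketch below shows a random three‐way split only; the function name and the seed are hypothetical, and it does not write the input files that annz_train and annz_test expect, whose format is described in the package instructions.

```python
import random


def split_catalog(rows, n_train, n_valid, seed=0):
    """Randomly partition a spectroscopic catalog into training,
    validation, and evaluation sets (the remainder is the evaluation set)."""
    if n_train + n_valid > len(rows):
        raise ValueError("training and validation sets exceed the catalog size")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    evaluation = shuffled[n_train + n_valid:]
    return training, validation, evaluation


# Toy catalog of 30 galaxy identifiers.
training, validation, evaluation = split_catalog(range(30), n_train=15, n_valid=5)
```

A purely random split is appropriate only when the spectroscopic catalog already resembles the target sample; otherwise the evaluation set should be selected to match the target magnitude and color distributions.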

3. APPLICATION TO SDSS DATA

The SDSS⁴ (York et al. 2000) combines a large, five‐band (ugriz) imaging survey with a smaller spectroscopic follow‐up survey. This is an ideal situation for the application of ANNz, since the spectroscopic survey represents an excellent training set for the imaging survey.

The selection algorithm for the SDSS spectroscopic survey results in two subsets of the data: a main galaxy catalog and a luminous red galaxy catalog (LRG; Eisenstein et al. 2001). The main galaxy catalog is a flux‐limited sample (r<17.77) with a median redshift z = 0.104 (Strauss et al. 2002), while the LRG catalog is flux‐ and color‐selected to be a very uniform and approximately volume‐limited sample (it is volume limited to z ≈ 0.4 but probes out to z ≈ 0.6 at lower completeness).

3.1. Comparison of ANNz with Other Techniques

The SDSS consortium have themselves applied a range of photometric redshift techniques to their commissioning data (Csabai et al. 2003). Table 1 lists the estimation errors they obtained. This commissioning data was made public in the Early Data Release (EDR; Stoughton et al. 2002). In order to allow a direct comparison of the accuracy of ANNz with the methods used by Csabai et al. (2003), we selected the main galaxy and LRG samples from the EDR. From these ∼30,000 galaxies, we randomly selected training, validation, and evaluation sets with the respective sizes 15,000, 5000, and 10,000. The network inputs were the dereddened model magnitudes in each of the five filters, and the overall architecture was 5∶10∶10∶1. A committee of five such networks was trained on the training and validation sets, then applied to the evaluation set. Figure 2 shows the ANNz photometric redshift against the spectroscopic value for each galaxy in the evaluation set. The rms deviation between these is σ_rms = 〈(z_phot - z_spec)²〉^1/2 = 0.0229, which compares well with the results in Table 1. For clarity, the estimated errors on the photometric redshifts are not shown in Figure 2. The results for a randomly selected subset of 200 galaxies are shown with error bars in Figure 3. Because of the high quality of the training data in this case, network variance makes only a small contribution, and the errors are therefore dominated by the photometric noise.
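The accuracy statistic quoted here, σ_rms = 〈(z_phot - z_spec)²〉^1/2, is simple to evaluate on an evaluation set. The following sketch uses made‐up redshifts purely to illustrate the arithmetic; the function name is hypothetical.

```python
import math


def sigma_rms(z_phot, z_spec):
    """Root-mean-square deviation between photometric and spectroscopic redshifts."""
    if len(z_phot) != len(z_spec) or not z_phot:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - s) ** 2 for p, s in zip(z_phot, z_spec)]
    return math.sqrt(sum(squared) / len(squared))


# Three hypothetical galaxies.
scatter = sigma_rms([0.10, 0.22, 0.29], [0.12, 0.20, 0.30])
```

Because the deviations are squared, a small number of catastrophic outliers can dominate this statistic, so it is worth inspecting the scatter plot as well as the single number.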

**Fig. 2.—** Spectroscopic vs. photometric redshifts for ANNz applied to 10,000 galaxies randomly selected from the SDSS EDR.

**Fig. 3.—** Subset of 200 galaxies randomly selected from the results of Fig. 2, with the error bars calculated by ANNz shown. These are a combination of contributions from photometric noise (§ 2.2) and network variance (§ 2.3).

HYPERZ (Bolzonella, Miralles, & Pelló 2000) is a widely used template‐based photometric redshift package. In order to more directly compare ANNz with the template‐matching method, HYPERZ was applied to the same evaluation set using the CWW template SEDs. It is clear from the results in Figure 4 that not only is the rms dispersion in the photometric redshift considerably greater than that for ANNz, but there are also systematic deviations in the HYPERZ results. The SDSS consortium obtained similar accuracies to HYPERZ in their implementation of the basic template‐fitting technique (the results labeled "CWW" and "Bruzual‐Charlot" in Table 1 are for the respective template sets). With more sophisticated template‐based methods, they were able to improve on these errors: the result labeled "Interpolated" was obtained by first tuning the templates using the spectroscopic sample as a training set, then producing a continuous range of templates by interpolating between the tweaked SEDs. However, even "hybrid" methods such as this still do not match the accuracy achieved by the purely empirical methods (in the table, these are "Polynomial," which uses a second‐order polynomial as the fitting function, and "Kd‐tree," in which the training set is partitioned in color‐space and a separate second‐order polynomial is fitted in each cell).

**Fig. 4.—** Photometric redshift estimation using HYPERZ with the CWW template SEDs. This uses the same 10,000 galaxy sample as Fig. 2. There are obvious systematic deviations, with bands apparent above and below the z_phot = z_spec line.

3.2. Extensions to the Basic Method

In this section more advanced use of ANNz is demonstrated. These examples use the LRG and main galaxy data from the SDSS Data Release 1 (DR1; Abazajian et al. 2003), split into training, validation, and evaluation sets of respective sizes 50,000, 10,000, and 64,175. For these data the photometric redshift accuracy on the evaluation set when using the same basic method as in § 3.1 was σ_rms = 0.0238.

3.2.1. Using Additional Inputs

One of the great advantages of empirical photometric redshift methods is the ease with which we can introduce additional observables into our parametrization of the photometric redshift. This is particularly true for ANNz; we simply add an extra input to our network architecture for each new parameter we wish to consider. ANNz treats these new inputs in exactly the same way as it does the galaxy magnitudes.

If the additional inputs contain useful information, then the ANN will use this to improve the accuracy of its predictions. However, increasing the number of inputs to the ANN generally leads to a reduction in the generalization capabilities of the network (that is, its ability to make predictions for data on which it has not been trained). Thus, the inputs should be chosen carefully, as noninformative inputs may actually lead to a worsened ANN performance: due to the increased dimensionality of the input space, larger training sets may be required, and there will be an increased likelihood of converging to a local, rather than the global, minimum.

By way of example, the r‐band 50% and 90% Petrosian‐flux radii were added as two extra inputs to our ANN. These are the angular radii (concentric with the galaxy brightness distribution) containing the stated fraction of the Petrosian flux, and therefore contain information on the angular size of the galaxy (clearly a strongly distance‐dependent property) and the "concentration index" (essentially the steepness of the galaxy brightness profile, which may help break degeneracies in the redshift‐color relationship). Running this extended data set through ANNz (using a committee of five 7∶11∶11∶1 networks) produced a redshift estimation accuracy of σ_rms = 0.0230, an improvement of ∼3% compared to the results based only on the magnitudes. In this example the improvement is small (mainly because the training sample already provided excellent redshift information), but it demonstrates well how straightforwardly the extra information could be included for consideration by ANNz.

3.2.2. Predicting Spectral Type

It is equally straightforward to train ANNz to make predictions for properties other than the redshift. Template‐matching photometric redshift techniques have the useful side effect of assigning an estimated spectral type to each galaxy, in addition to estimating the redshift. Firth et al. (2003) demonstrated the use of ANNs to determine spectral types from broadband photometry.

The spectroscopic catalog of the SDSS includes a continuous parameter (eClass) indicating spectral type that ranges from approximately −0.5 (early types) to 1 (late types). A 5∶10∶10∶2 network architecture was used to attempt the simultaneous estimation of redshift and eClass from the photometry. The accuracy of the redshift estimation was very slightly poorer, σ_rms = 0.0241. The eClass was determined with an rms error of σ_rms = 0.0516 (Fig. 5).

**Fig. 5.—** Results from using ANNz to predict the spectral type (in the form of the *eClass* parameter) simultaneously with the redshift for 64,175 galaxies from the SDSS Data Release 1.

3.3. More Realistic Conditions

Our example applications to the SDSS above are somewhat idealized, since we are training and testing on samples with identical redshift, magnitude, and galaxy species distributions. Furthermore, our training samples have thus far been very large. In this section, less optimal training sets are used to investigate their impact on the photometric redshift accuracy.

3.3.1. Smaller Training Sets

The size of training sample needed will be strongly dependent on the range of redshifts and galaxy types in the target sample. The same evaluation set of 64,175 galaxies was submitted to networks trained on randomly selected samples of (1) 2000 galaxies and (2) 200 galaxies. In both cases these samples were split equally into the training and validation sets. Committees of five 5∶10∶10∶1 networks were used.

The photometric redshift accuracies were respectively (1) σ_rms = 0.0263 and (2) σ_rms = 0.0343. In the first case the loss of accuracy is small, while the second case demonstrates well the problems associated with small training sets. The rarer classes of objects in the target sample (e.g., here, those at high redshift) feature very sparsely (if at all) in the training set, and so the network is unable to sensibly deal with these objects when they appear in the testing data. This leads to an increased number of outliers and, potentially, the introduction of systematic errors.

3.3.2. Biased Training Sets

For increasingly faint targets, acquiring good spectroscopy becomes increasingly difficult and eventually prohibitively expensive; this problem is the primary motivation for photometric redshifts. In practice then, the available spectroscopic training sample is likely to be somewhat brighter on average than the photometric target sets. However, the major stumbling block for empirical photometric redshift estimation techniques is the difficulty in applying them outside of the regions of parameter space that are well sampled by the training data: while the estimator ought to be able to interpolate within the training regime, extrapolating beyond is much more problematic. Ideally, we would like to be able to train our estimator on bright galaxies and then confidently apply it to faint galaxies.

We can improve the ANN's prospects by careful preselection of the data set. The LRGs are a very uniform sample with respect to spectral types, since these early‐type galaxies show little spectral evolution with redshift; this might be expected to make extrapolation a more manageable task. To assess the effectiveness of ANNz in this situation, the LRG sample was split roughly in half by imposing a magnitude cut at r = 18.5. The brighter subsample was further divided at random into training and validation sets of size 5000 and 2000 galaxies, respectively. A committee of five 5∶10∶10∶1 networks was trained on this data and then applied to the remaining ∼6000 LRGs (for which the limiting magnitude is r ≈ 19.6).

The results are shown in Figure 6. The overall dispersion is σ_rms = 0.0327, which represents only a slight loss of accuracy when compared with results using an LRG training set selected over all magnitudes (σ_rms = 0.0294). Thus, in this particular case, ANNz is able to extrapolate with some success to around a magnitude fainter than is sampled by the training data.

**Fig. 6.—** Results from training networks on LRGs with r<18.5, but applied to LRGs with strictly r>18.5 (note the change of intercept of the axes). The limiting magnitude for the LRGs is r ≈ 19.6.

4. CONCLUSIONS

In appropriate circumstances, ANNz is a highly competitive tool for photometric redshift estimation. However, it does rely on the existence of a sufficiently large training set that is representative of the particular populations being studied. The package's utility therefore lies particularly with large photometric surveys such as the SDSS, GOODS (Dickinson et al. 2001), or the VIRMOS‐VLT Deep Survey (Le Fevre et al. 2003), some of which include spectroscopic surveys for subsets of the photometric catalogs (for example, of the eventual 100 million photometric objects that the SDSS expects to catalog, 1 million will also have spectroscopy, and hence accurate redshifts).

A major problem for empirical photometric redshift estimators is the difficulty in extrapolating to regions of the input parameter space that are not well sampled by the training data. Care should be taken to match the training data to the target sample as closely as possible in terms of the magnitude and color distributions of each. Use of an evaluation set is essential when applying ANNz to a new data set: the good performance demonstrated here on the SDSS data cannot be guaranteed on different data sets.

A potential solution to the problem of obtaining training sets when spectroscopy is difficult to obtain is to use simulated catalogs as training data (e.g., Vanzella et al. 2004). Since this requires the use of theoretical SEDs, it introduces the disadvantages of the template‐based methods, such as the need for precise calibration. However, the ANN approach has advantages over standard template‐matching: simulated catalogs can contain galaxies representing a large range of complex star formation histories, dust extinction models, metallicities, etc., giving fully Bayesian statistics, and ANNs allow much more flexible weighting to be applied to the filters than is possible with the simple χ² weighting of standard template matching.

We acknowledge help and advice from Stefano Andreon, Andrew Firth, Rachel Somerville, and Elizabeth Stanway. The ANN training program is based on code kindly provided by B. D. Ripley. A. A. C. is supported by an Isle of Man Department of Education Postgraduate Studies Grant. O. L. acknowledges a PPARC Research Senior Fellowship.⁵
