
BLIND EXTRACTION OF AN EXOPLANETARY SPECTRUM THROUGH INDEPENDENT COMPONENT ANALYSIS


Published 2013 March 1 © 2013. The American Astronomical Society. All rights reserved.
Citation: I. P. Waldmann et al. 2013 ApJ 766 7. DOI: 10.1088/0004-637X/766/1/7


ABSTRACT

Blind-source separation techniques are used to extract the transmission spectrum of the hot-Jupiter HD189733b recorded by the Hubble/NICMOS instrument. Such a "blind" analysis of the data is based on the concept of independent component analysis. The detrending of Hubble/NICMOS data using the sole assumption that nongaussian systematic noise is statistically independent from the desired light-curve signals is presented. By not assuming any prior or auxiliary information but the data themselves, it is shown that spectroscopic errors only about 10%–30% larger than parametric methods can be obtained for 11 spectral bins with bin sizes of ∼0.09 μm. This represents a reasonable trade-off between a higher degree of objectivity for the non-parametric methods and smaller standard errors for the parametric de-trending. Results are discussed in light of previous analyses published in the literature. The fact that three very different analysis techniques yield comparable spectra is a strong indication of the stability of these results.


1. INTRODUCTION

The field of exoplanetary spectroscopy is as rapidly advancing as it is new. It has come from the first detection of spectroscopic features in an exoplanetary atmosphere (Charbonneau et al. 2002) to an ever more detailed characterization of a variety of targets (e.g., Agol et al. 2010; Beaulieu et al. 2010, 2011; Berta et al. 2012; Charbonneau et al. 2005, 2008; Deming et al. 2005, 2007; Grillmair et al. 2008; Knutson et al. 2007; Snellen et al. 2008, 2010a, 2010b; Brogi et al. 2012; Bean et al. 2011; Stevenson et al. 2010; Swain et al. 2008, 2009a, 2009b; Tinetti et al. 2007, 2010; Crouzet et al. 2012). The aim to characterize smaller and smaller planets is equally a quest for higher and higher precision measurements, which are often limited by the systematic noise associated with the instrument with which the data are observed. This is particularly true for general, non-dedicated observatories. In the past, parametric models have been used extensively by several teams to remove instrument systematics (e.g., Agol et al. 2010; Beaulieu et al. 2010, 2011; Brown et al. 2001; Burke et al. 2010; Charbonneau et al. 2005, 2008; Deming et al. 2007; Désert et al. 2011; Gibson et al. 2011; Grillmair et al. 2008; Knutson et al. 2007; Pont et al. 2008; Sing et al. 2011; Swain et al. 2008, 2009a). Parametric models approximate systematic noise via the use of auxiliary information about the instrument, the so-called optical state vectors (OSVs). Such OSVs often include the X- and Y-positional drifts of the star or the spectrum on the detector, the focus and detector temperature changes, as well as positional angles of the telescope on the sky. By fitting a linear combination of OSVs to the data, the parametric approach derives its systematic noise model. We refer to this as the "linear, parametric" method. In the case of dedicated missions, such as Kepler (Borucki et al. 1996; Jenkins et al.
2010), the instrument response functions are well characterized in advance and conceived to reach the required 10−4 to 10−5 photometric precision. For general purpose instruments not calibrated to reach this required precision, poorly sampled optical state vectors or a missing parameterization of the instrument often become critical issues. Even if the parameterization is sufficient, it is often difficult to determine which combination of these OSVs may best capture the systematic effects of the instrument.

Given the intricacies of a parametric approach, several groups have worked toward alternative methods to decorrelate the data from instrumental and stellar noise. The issue of time-correlated systematics in exoplanetary time series was discussed by Pont et al. (2006). Carter & Winn (2009) developed a wavelet-based, non-parametric, detrending of 1/f noise contaminated lightcurves. Thatte et al. (2010) proposed a selective principal component filtering to reduce instrument and telluric systematics. More recently, Gibson et al. (2012, hereafter G12) presented a non-parametric detrending approach based on Gaussian Processes (GP; Rasmussen & Williams 2006). GP, as implemented by G12, belong to the class of non-parametric, supervised machine learning algorithms. They use the observed data and OSVs to derive an optimal, non-linear systematic noise model for the observed data.

In Waldmann (2012, hereafter W12), we proposed independent component analysis (ICA; Hyvärinen 1999) as an effective way to de-correlate the exoplanetary signal from the instrument and stellar noise components. ICA belongs to the category of unsupervised machine learning algorithms (i.e., the algorithm does not need to be trained prior to use). It does not require auxiliary information such as OSVs but only the observed data themselves. This approach is also known as blind decorrelation as the assumptions on the system are minimal. As described later on, ICA assumes a linear combination of independent components (ICs) to form its systematic noise model.

The issue of poorly constrained parameter spaces is in fact not new in astrophysics and has given rise to an increased interest in blind-source separation algorithms. Cosmological and extragalactic observations, in particular, are often analyzed through fully blind, non-parametric methods, and ICA has successfully been used to separate the cosmic microwave background (CMB) or the signatures of distant galaxies from their galactic foregrounds (e.g., Chapman et al. 2012; Stivoli et al. 2006; Maino et al. 2002, 2007; Wang et al. 2010). Aumont & Macías-Pérez (2007), for instance, separated the instrumental noise from the desired astrophysical signal using ICA.

In W12, the ICA approach was tested by presenting two single, decorrelated light-curves of the primary transit of HD189733b and XO1b obtained with Hubble/NICMOS in its spectroscopy setting. In this paper, we apply the ICA detrending method to the extraction of an exoplanetary spectrum; we then compare the results obtained to the previous ones published in the literature and discuss the advantages and limits of this technique for analyzing exoplanetary spectra and light-curves.

1.1. The HD189733b Hubble/NICMOS Data-set

As explained in the introduction, the main focus of our paper is the application and critical discussion of the ICA method to detrend time-resolved spectroscopic data. We chose the transit spectrum of the hot-Jupiter HD189733b recorded by Hubble/NICMOS as a testbed, the main reason being that these data have been analyzed by other teams in the past and the results of the analysis have been quite debated in the literature.

The NICMOS data were first published by Swain et al. (2008, hereafter S08) and then re-analyzed by Gibson et al. (2011) using a similar linear parametric detrending approach. Gibson et al. (2011), failing to retrieve the transmission spectrum reported by S08, attributed the discrepancy to a high degree of degeneracy between the systematic noise correction model used and the extracted spectrum. Swain et al. (2011) argued, though, that this discrepancy is due to poorly derived OSVs by Gibson et al. (2011). A more recent re-analysis of the data by G12 using non-parametric, non-linear Gaussian Processes, found a solution consistent with that of S08, but with error bars on average 2.4 times as large.

Given these debates, it becomes critical to understand how far we can push the analysis when no prior or auxiliary information for the instrument is assumed.

2. DATA ANALYSIS

In this section, we briefly introduce the ICA technique and we apply it to the primary transit data of HD189733b, as recorded by Hubble/NICMOS.

2.1. Independent Component Analysis

Let us assume that we have multiple, simultaneous observations of a mixture of signals. In time-resolved spectroscopy (spectrophotometry), we obtain a light-curve signal of the exoplanetary transit per resolution element of the spectrograph. Here, each measured time series is denoted by xk(t), where k is the index of the individual time series and N is the total number of time series. The measured signal can be assumed to be the sum of the astrophysical light-curve signal, sa(t), the instrumental and stellar noise sources, ssn(t), and the white noise, swn(t). So

Equation (1): xk(t) = sa(t) + ssn(t) + swn(t)

or as sum of vectors (the time-dependence has been dropped for clarity)

Equation (2): xk = ∑l ak,l sl

where l is the estimated source signal index. For non-overcomplete sets, we have as many individual source signals as time series, i.e., k = l. Assuming that only one source signal is astrophysical and one is Gaussian per time series, we can state the total number of source signals to be N = Nsn + 2, where Nsn is the number of systematic noise source signals.

It is worth noting that for overcomplete sets, we have more time series available than source signals contained within, i.e., k > l. In these cases, we can reduce the dimensionality of the data set (for example using principal component analysis) or select a subset of estimated source signals given some selection criteria (W12).
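The dimensionality reduction mentioned above can be sketched with a singular value decomposition; this is a minimal illustrative example (the function name and setup are our own, not the pipeline used in this analysis):

```python
import numpy as np

def pca_reduce(X, n_keep):
    """Reduce an overcomplete set of time series (k > l) to its n_keep
    strongest principal components before running ICA.
    X: (N, T) array, one time series per row."""
    Xc = X - X.mean(axis=1, keepdims=True)          # remove each mean level
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = U[:, :n_keep].T @ Xc                        # projected time series
    return Z, s                                     # reduced set, singular values
```

The spectrum of singular values, s, indicates how many underlying source signals the set actually contains; a sharp drop after n components suggests keeping n.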

We can also express Equation (1) in matrix form as

Equation (3): x = As

where x is the column vector containing the measured time series, xk, i.e., x = (x1, x2, ..., xN)T, and s is the column vector of independent source signals, s = (sa, swn, ssn1, ssn2, ..., sN)T. We may also write s = sa + swn + ssn to clearly differentiate between the astrophysical, white noise, and systematic components. A is the N × N dimensional "mixing matrix" comprised of the weights ak, l.

The motivation of ICA is to estimate A without prior knowledge of A or s (Hyvärinen 1999). This is achieved by making the stringent assumption that the source signals composing s, sk, are statistically independent from each other. Such an assumption is valid, as one expects the astrophysical light-curve signal to be independent in origin from systematic instrumental noise. Hence, ICA algorithms attempt to decompose the observed signals, x, into a set of independent source components, s. To maximize said independence, several approaches have been proposed (for a comprehensive summary, see Hyvärinen et al. 2001; Comon & Jutten 2000). Here, we follow W12 and use a variant of the FastICA algorithm, which maximizes the statistical independence of the estimated source signals sl by maximizing the nongaussianities of their respective probability distributions (Hyvärinen 1999; Hyvärinen et al. 1999; Hyvärinen & Oja 2000; Hyvärinen et al. 2001; Koldovský et al. 2006; Comon & Jutten 2000).
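As a schematic illustration of the FastICA principle (not the EFICA/WASOBI implementation used in this paper), a minimal symmetric fixed-point FastICA with a tanh nonlinearity can be written in a few lines; all function and variable names here are our own:

```python
import numpy as np

def fast_ica(X, n_iter=200, seed=0):
    """Minimal symmetric FastICA with a tanh nonlinearity.
    X: (N, T) array of mixed time series, one per row."""
    N, T = X.shape
    X = X - X.mean(axis=1, keepdims=True)
    # Whiten the mixtures so that their covariance is the identity.
    d, E = np.linalg.eigh(np.cov(X))
    K = E @ np.diag(d ** -0.5) @ E.T
    Z = K @ X
    W = np.random.default_rng(seed).standard_normal((N, N))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        # Fixed-point update: W+ = E[g(Wz) z^T] - diag(E[g'(Wz)]) W
        W = (G @ Z.T) / T - np.diag((1.0 - G ** 2).mean(axis=1)) @ W
        # Symmetric decorrelation: W <- (W W^T)^(-1/2) W
        U, _, Vt = np.linalg.svd(W)
        W = U @ Vt
    return W @ Z, W @ K   # estimated sources s and overall de-mixing matrix
```

The returned sources carry FastICA's usual sign, scale, and permutation ambiguities, which is why the scaling step described in Section 2.2 is needed.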

A linear combination of individual source signals, as in Equations (1) and (2), is hence a valid assumption as each component is assumed to be fully independent. However, a few subtleties regarding this approach are worth mentioning. Only the signals that are common to all time series, xk(t), can realistically be de-convolved. Taking Hubble/NICMOS as example, systematics introduced by the grism are easily detectable, as these are "common" to all of the dispersed wavelengths. Similarly, detector-wide flat-fielding gradients are detrendable. On the other hand, localized inter-pixel variations are not represented in all of the time series and would not be decorrelated with ICA. These properties lead to the discussion of "global" and "local" noise models later on in the text. Whilst the effect of limb-darkening is reduced in the infrared, it is true that wavelength varying limb-darkening coefficients can impair the direct extraction of the lightcurve feature. However, as described in the next section, the systematic noise model makes no direct use of the astrophysical component, sa, and hence circumvents this potential limitation.

2.2. Application to HD189733b

The primary transit of HD189733b was observed using HST/NICMOS in the G206 grism setting, spanning five consecutive orbits. The selected grism covers the spectral range of ∼1.51–2.43 μm (see S08). The HST-pipeline calibrated data were downloaded from the MAST archive and the spectrum was extracted using both standard IRAF routines and a custom-built routine for optimal spectral extraction. Both extractions are in good accord with each other, but the custom-built routine was found to yield a better signal-to-noise ratio and was subsequently adopted for all further analysis. In order to minimize the inter- and intra-pixel variability of the NICMOS detector, the instrument was slightly de-focused to a full-width-half-maximum (FWHM) of ∼5 spectral channels per resolution element. This sets a limit on the maximum resolution, R, achievable.

ICA is limited by the amount of Gaussian noise present in the data (Hyvärinen & Oja 2000). We found the minimal binning of 5 channels to be too low in signal-to-noise ratio (S/N) and used a binning of 8 spectral channels (∼0.09 μm). Several data pre-processing methods exist to decrease the Gaussian component of time series data (e.g., kernel smoothing, low pass filters, wavelet based approaches, etc.), but we decided to interfere as little as possible with the original data and opted for a slightly coarser binning instead. This resulted in 11 light curves across the G206 grism band. We found the first of the 5 orbits to be very noisy and to negatively impact the efficiency of the algorithm and excluded the first orbit from all further analysis. An example of the "raw" light-curves' quality, at ∼2.33 μm, can be found in Figure 1.

Figure 1.

Figure 1. Raw light-curve at ∼2.33 μm (black crosses), and its respective systematic noise model (red squares), mk(t), composed out of the systematic components in Figure 2. The de-trended final light-curve is shown underneath (blue circles) with a Mandel & Agol (2002) fit overlaid.


As described in W12, we used the extracted light-curves as input to the ICA algorithm to calculate the mixing matrix, A, and its (pseudo)inverse, the de-mixing matrix W = A−1. Once the de-mixing matrix had been determined, the algorithm tested the estimated components for their nongaussianity and returned four main systematic noise components that do not correlate with the expected light-curve morphology. These components, comprising ssn, were extracted over the entire spectral range of the grism (referred to as "global" below) and showed a good degree of separation (Figure 2, left side). ICA estimates the mixing matrix A up to a sign and scaling factor, meaning the source signals, s, are recovered but lack an overall scaling constant. In an analogy to principal component analysis, we can think of this as recovering the eigenvectors but not the eigenvalues. Due to this ambiguity, we need to determine the scaling of the individual systematic noise components, per time series xk, separately. This is done by fitting a systematic-noise-model (SNM), mk, to the out-of-transit data of each time series xk, where mk is the sum of all of the scaled systematic noise components in ssn, i.e., mk = ∑l ok,l ssn,l (l = 1, ..., Nsn), where ok,l is the scaling factor of the systematic noise signal ssn,l for a given time series k. A Nelder–Mead minimization algorithm (Press et al. 2007) was used to fit for ok,l. The scaling amplitude of each component for each light-curve is given in Figure 2 (right side). Once mk is determined, we subtract it from the raw data to obtain the corrected time series yk = xk − mk; see Figure 1 for an example.
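The scaling step can be sketched as follows. Since mk is linear in the ok,l, an ordinary least-squares fit to the out-of-transit points reaches the same minimum as a Nelder–Mead search; the function name and toy data below are illustrative assumptions:

```python
import numpy as np

def fit_snm(x_k, components, oot_mask):
    """Fit the scalings o_{k,l} of the ICA noise components to the
    out-of-transit part of one raw light-curve x_k.
    components: (N_sn, T) array of systematic noise components s_sn,l.
    oot_mask:   boolean (T,) array, True outside of transit."""
    A = components[:, oot_mask].T            # (T_oot, N_sn) design matrix
    A = A - A.mean(axis=0)                   # remove the arbitrary offsets
    b = x_k[oot_mask] - x_k[oot_mask].mean() # centred out-of-transit flux
    o, *_ = np.linalg.lstsq(A, b, rcond=None)
    m_k = o @ components                     # systematic noise model m_k
    return o, x_k - m_k                      # scalings and corrected y_k
```

Fitting only the out-of-transit data is what keeps the SNM independent of the astrophysical component sa, as noted in Section 2.1.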

Figure 2.

Figure 2. Left: four retrieved nongaussian systematic noise components in the order of importance. They were computed over the whole spectral range of the G206 grism and describe the systematic noise (instrumental and/or stellar) common to all spectral channels. Right: scaling factors, ok, l, of the systematic noise components on the left. The color coding is identical for both plots. We can see that the first component at 2.06 μm is sharply deviating from its own pass-band mean and the mean of all components at 2.06 μm. This can indicate that the "global" systematic noise model does not describe well the systematics in this channel and could over-correct.


2.3. Lightcurve Fitting and Error Bars

Having obtained the de-trended time series, yk, we proceed to model-fit these using the analytical model by Mandel & Agol (2002, from here MA02) with orbital and limb-darkening parameters taken from S08. The S08 quadratic limb-darkening parameters are interpolated to the coarser wavelength grid of this analysis. The transit depth, δk, is left as the only free parameter. The transit depths of all 11 light-curves constitute the exoplanetary spectrum. The model-fitting was performed using a Markov Chain Monte Carlo (MCMC) algorithm and cross checked using two variants of a Bootstrap Monte Carlo analysis (see the Appendix).

2.3.1. Markov Chain Monte Carlo

MCMC (Press et al. 2007) has become the standard fitting routine for exoplanetary time series and radial velocity data (e.g., Ford 2006; Burke et al. 2007; Bakos et al. 2007; Knutson et al. 2007; Cameron et al. 2007; Charbonneau et al. 2009; Bean et al. 2011; Kipping & Bakos 2011; Gregory 2011; Crouzet et al. 2012; Knutson et al. 2012). In this analysis, we use an adaptive version of the standard Metropolis–Hastings algorithm (Haario et al. 2001, 2006; Metropolis et al. 1953; Hastings 1970; Press et al. 2007). The only free parameter, δk, was set to have a uniform prior ranging from Rp/R* = 0–1. As we only fit for the transit depth, we are not concerned by inter-parameter correlations in this analysis.

We perform an initial MA02 model fit, mMA02, k(t), using a Nelder–Mead minimization algorithm (Press et al. 2007) and calculate the model subtracted residual, rk(t) = xk(t) − mMA02, k(t). We take the fitted transit depth as the starting value of the MCMC chain and calculate the variance of the normal sampling distribution as

Equation (4): σk2 = var(rk) + 2 ∑τ cov(rk(t), rk(t + τ))

where var is the variance of the residual and cov the auto-covariance for a given lag τ. This accounts for the remaining autocorrelated noise in the time series data. The MCMC algorithm was consequently run for 2 × 104 iterations to guarantee a good coverage of the posterior distribution, of which examples are shown in Figures 6(A) and 7(A). Here, the error bars are the standard deviation of the posterior distribution.
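A toy sketch of this scheme, replacing the MA02 model with a simple box-shaped transit; the function names, the plain (non-adaptive) Metropolis–Hastings sampler, and the exact normalization of the proposal width are our assumptions, not the paper's code:

```python
import numpy as np

def proposal_sigma(resid, n_eff, max_lag=20):
    """Proposal width in the spirit of Equation (4): residual variance
    inflated by the summed autocovariance, scaled by a sample size."""
    r = resid - resid.mean()
    acov = sum((r[:-tau] * r[tau:]).mean() for tau in range(1, max_lag + 1))
    return np.sqrt(max(r.var() + 2.0 * acov, r.var()) / n_eff)

def mcmc_depth(t, flux, model, d0, sigma, n_iter=20000, seed=1):
    """Metropolis-Hastings on the single free parameter, the transit
    depth, with a uniform prior on [0, 1]."""
    rng = np.random.default_rng(seed)
    noise = np.std(flux - model(t, d0))       # white-noise estimate
    def loglike(d):
        r = flux - model(t, d)
        return -0.5 * np.sum(r ** 2) / noise ** 2
    chain, d, ll = [], d0, loglike(d0)
    for _ in range(n_iter):
        d_new = rng.normal(d, sigma)
        ll_new = loglike(d_new) if 0.0 <= d_new <= 1.0 else -np.inf
        if np.log(rng.uniform()) < ll_new - ll:   # Metropolis acceptance
            d, ll = d_new, ll_new
        chain.append(d)
    return np.array(chain)
```

The standard deviation of the returned chain (after discarding burn-in) plays the role of the standard error quoted in Section 2.3.4.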

In addition to the MCMC algorithm described here, we also estimated the standard error using two variants of a Bootstrap Monte Carlo algorithm, see the Appendix. We find the retrieved transit-depths and errors to be in good agreement with the MCMC method.

2.3.2. Source Signal Separation Error

The two core algorithms used in this analysis, EFICA (Koldovský et al. 2006) and WASOBI (Yeredor 2000; Tichavský et al. 2006b), can be shown to be asymptotically efficient, i.e., to reach the Cramér–Rao Lower Bound (CRLB) in the ideal case where the nonlinearity G(.) equals the signal's score function. In other words, the algorithms employed here can be shown to converge to the correct solution given the original source signals and in the limit of a large number of iterations, Niter.

In reality, the number of iterations is finite, and imperfect convergence results in traces of other sources remaining in the individual signals comprising s. We can hence state that the estimated de-mixing matrix, W, is only approximately equal to the inverse of the original mixing matrix, A, i.e.,

Equation (5): W ≈ A−1

This requires us to calculate the signal separation error (SSE) of the analysis. A measure of this error is the deviation of WA from the unity matrix by inspecting the variance of its elements (Koldovský et al. 2006; Hyvärinen et al. 2001).

To assess the degree of separation, we define the gain matrix, G. For a perfectly estimated de-mixing matrix, W, the gain matrix equals the identity matrix

Equation (6): G = WA = I

In signal processing, the performance of blind-source separation algorithms is usually measured by the signal-to-interference ratio, SIR. The SIR is the standard measure in signal processing of how well a given signal has been transmitted or deconvolved from a mixture of signals. Taking the interference to be a noise source, we can equate this to the more commonly used S/N. We can now calculate the separation error of the estimated source signal, sl, in relation to the original source signal, sk, using

Equation (7): SIRl = Gl,l2 / ∑k≠l Gl,k2

However, the original mixing matrix, A, and the original source signals, sk, are not generally known for real data sets and Equation (7) is only useful in the case of simulations. Tichavský et al. (2006a), Koldovský et al. (2006), and Tichavský et al. (2008) have shown that despite the original mixing matrix being unknown, an asymptotic estimate of the SIR can be made. A derivation of this process is beyond the scope of this paper and we refer the interested reader to the relevant literature.
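In a simulation where A is known, the gain matrix and Equation (7) can be evaluated directly; a hypothetical sketch (names are our own):

```python
import numpy as np

def sir_db(W, A):
    """Per-source signal-to-interference ratio, in dB, from the gain
    matrix G = W A; the argmax handles ICA's permutation ambiguity."""
    G = W @ A
    sir = []
    for row in G:
        p = np.abs(row).argmax()                 # matched source index
        denom = np.sum(row ** 2) - row[p] ** 2   # leaked interference power
        sir.append(row[p] ** 2 / max(denom, 1e-30))
    return 10.0 * np.log10(np.array(sir))
```

For real data, where A is unknown, the asymptotic SIR estimators of Tichavský et al. and Koldovský et al. cited below replace this direct evaluation.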

We now have the retrieval error on the individual source components, σl. In order to obtain the overall error on the systematic noise model, mk, calculated in Section 2.2, we compute the weighted sum of the individual source separation errors with the previously retrieved source component weighting factors, ok, l

Equation (8): σSSE,k2 = ∑l ok,l2 σl2

2.3.3. SNM Fitting Error

In addition to the above determined errors, we also include an ICA fitting error that accounts for possible over-corrections of the global SNM to individual, more poorly constrained light-curves. This term becomes non-zero when the scaling of a systematic noise component, ok,l (Figure 2, right-hand side), shows a 3σ-significant deviation from the mean scaling of all of the other light-curves, ōk. In other words, we expect the scaling of an individual systematic noise component, ssn, to be a slowly varying function of wavelength for "globally" estimated systematic-noise components. If an individual light-curve shows a significantly larger positive or negative scaling than expected, we can assume the nongaussian noise of the affected light-curve not to be properly accounted for by this global model. In larger data-sets, it is easiest to exclude the affected light-curve from any further analysis, while in small data-sets we take the amplitude of the deviation of the scaling from its mean as the error, i.e.,

Equation (9): σSNM,k = |ok,l − ōk|

In this analysis, we find the ICA fitting error to be zero for all of the wavelengths but the 2.06 μm spectral point.

2.3.4. Final Error Bar

In summary, the final error bar per time series, k, consists of:

  • 1.  
    Standard error. Estimating the variance in retrieved transit depth when model fitting the de-trended lightcurves.
  • 2.  
    Signal separation error. Estimating the ICA component separation error.
  • 3.  
    Systematic noise model error. Estimating errors due to noise model over- or under-fitting for individual time series.

We now define the final error to be the sum of squares of the above mentioned error sources,

Equation (10): σk2 = σSE,k2 + σSSE,k2 + σSNM,k2
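As a one-line illustration, the quadrature sum of Equation (10) (assuming the three terms are independent) reads:

```python
import numpy as np

def final_error(sigma_se, sigma_sse, sigma_snm):
    """Combine the standard, signal-separation, and SNM-fitting
    errors in quadrature, as in Equation (10)."""
    return np.sqrt(np.square(sigma_se) + np.square(sigma_sse)
                   + np.square(sigma_snm))
```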

3. RESULTS

Figure 1 shows the raw light-curve at ∼2.33 μm (black crosses) with its corrected counterpart (blue circles) offset below. Here, much of the autoregressive noise in the original data could be captured by the noise model (red squares) and removed from the final result. All of the raw and corrected light-curves are shown in Figure 3 with their respective systematic noise models. The resulting spectrum is presented in Figure 4 and Table 1. We find the retrieved spectrum to be in good agreement with the "parametric" analyses of S08 and G12 in terms of spectral shape, which showcases the robustness of this methodology and the stability of the results as a whole.

Figure 3.

Figure 3. Left: raw light-curves from 1.51 μm (bottom) to 2.43 μm (top) with fitted Mandel & Agol (2002) model overlaid. The lightcurves were off-set for clarity. Middle: Systematic noise model for each raw lightcurve in the left panel. Right: final de-trended light-curves with fitted Mandel & Agol (2002) model overlaid.

Figure 4.

Figure 4. Final spectrum (red circles) obtained with the ICA algorithm described here and in Waldmann (2012), overlaid on the results of Swain et al. (2008; squares) and G12 (triangles).


Table 1. NICMOS Transmission Spectrum of HD189733b for a "Global" Systematic Noise Model Correction and Plotted in Figure 4.

λ (μm)    Rp/Rs    σ(Rp/Rs) × 10−4
2.429 0.155187 3.69383
2.336 0.155125 3.03754
2.244 0.155437 2.83050
2.152 0.155000 3.01554
2.060 0.155200 12.29019
1.968 0.155187 2.94831
1.876 0.155375 3.09120
1.784 0.155187 3.05973
1.691 0.154250 2.54556
1.599 0.154000 3.75548
1.507 0.156500 3.28963

Note. The columns are wavelength (λ), planet-star-ratio (Rp/Rs) and the respective error-bar (σ(Rp/Rs)).


The underlying noise of the spectral point at ∼2.06 μm was flagged by the algorithm to be discrepant with the global systematic noise model. This can also be seen by the higher systematic noise remaining in the corrected lightcurve (5th from the top in Figure 3). Here, the first systematic noise component is indicative of an overcorrection that is reflected in the error-bar as described in the previous section. Overall, the error-bars reported here are ∼10%–30% larger than those reported by S08, but ∼10%–50% smaller than those reported by G12. It should be noted that this analysis uses a slightly coarser bin size yielding 11 data points, while the S08 and G12 analyses yield 18 spectral points.

4. DISCUSSION

In this analysis, we have computed the global SNM and found a good agreement with previously published results. Figure 4 shows the comparison between the spectrum derived in this analysis, compared to the linear, fully parametric approach by S08 and the non-linear, non-parametric, Gaussian Processes approach by G12. Both non-parametric approaches yield slightly higher error bars than the parametric approach. We find the error on the ICA derived spectrum to be ∼10%–30% bigger than those reported by S08 using a coarser bin size of ∼0.09 μm. These differences in error bars between the "blind" and "informed" (meaning the use of auxiliary information of the instrument) approaches are not surprising. By not assuming any knowledge of the data or instrument, we are actively neglecting auxiliary information helpful to the de-trending of the data set.

With the linear, non-parametric blind ICA method presented here, the uncertainties grow by up to ∼30%–40% compared to the linear, parametric analysis by S08 (accounting for the larger bin sizes in this analysis), and by a further ∼70% when the linear assumptions are further relaxed, as for the non-linear, non-parametric Gaussian Processes (G12). In other words, we are trading smaller error-bars for a higher degree of objectivity.

In Figure 5, we show a comparison of the ICA derived nongaussian systematic noise components and parametric OSVs. S08 identified the X- and Y-positional drifts on the detector to constitute the most important OSVs in their decorrelation while other OSVs are less significant. Similar independent components are also present in this analysis, while the other independent components differ more significantly from the parametric approach. The agreement between the results of this analysis and those available in the literature showcases the stability of this exoplanetary spectrum.

Figure 5.

Figure 5. Comparison between ICA derived nongaussian components on the left with conventionally derived OSVs as found by S08.


We find the global SNM approach to be most sensitive to slowly varying systematic trends across the data-set, while local nongaussian deviations tend not to be captured. It is therefore possible to additionally generate a "local" SNM for a subset of spectral bins, or on the individual "raw" spectral channels before binning, should the S/N permit it. It is important to remember that k ≥ Nsn + 1 for the input to the algorithm. In other words, at least as many observed time series, xk, are required as input to the algorithm as the total number of nongaussian components in the data. In this case, the minimum "local" SNM would include 5 spectral bins (4 systematic noise and 1 astrophysical component). With 11 spectral bins in total, the NICMOS data-set is too small for this approach. However, for larger sets, this two-stage "global" + "local" detrending becomes a viable solution. Furthermore, multiple observations of a transit/eclipse event with the same instrumental setup can be helpful to further decorrelate the observed data. We expect, in fact, the astrophysical signal to be stable throughout consecutive transits, while the instrumental noise will be largely uncorrelated between observations (Waldmann et al. 2012).

5. CONCLUSION

Here, we present a reanalysis of the HD189733b primary transit spectrum from ∼1.51 to 2.43 μm obtained with Hubble/NICMOS. This analysis differs from previous publications in that it uses blind machine-learning to detrend the data with no prior or auxiliary information about the data or instrument. Such blind-source de-convolution algorithms can be used alone or in conjunction with other decorrelation techniques, making them very powerful tools in the analysis of exoplanetary data. This is especially true for instruments that lack an a priori calibration plan at the level of 10−4 photometric precision, as needed for this field.

We compare our results with previously published analyses of the same data set: another non-parametric, non-linear approach (G12) and the classical linear parametric method (S08). We find that the error-bars of this analysis are ∼10–30% larger than those reported by S08. We attribute this difference to the higher amount of auxiliary information injected in the parametric approach. Ultimately, it is a trade-off between a higher degree of objectivity for the non-parametric methods and smaller errors for the parametric detrending. Additional observations would have allowed much smaller error bars and a more robust determination of the signal at ∼2.06 μm.

The fact that three very different analysis techniques yield comparable spectra is a strong indication of the stability of these results. The error bars estimated in this paper through ICA are smaller than those of Gibson et al. (2012) obtained through Gaussian Processes, suggesting that more investigation is needed to identify the most effective techniques for detrending exoplanet atmosphere data and, most importantly, for understanding their limitations.

The authors thank Mark Swain and Filipe Abdalla for helpful discussions. This work is supported by ERC Advanced Investigator Project 267219, STFC, NERC, UKSA, UCL and the Royal Society. All of the data presented in this paper were obtained from the Multimission Archive at the Space Telescope Science Institute (MAST). STScI is operated by the Association of Universities for Research in Astronomy, Inc., under NASA contract NAS5-26555. Support for MAST for non-HST data is provided by the NASA Office of Space Science via grant NNX09AF08G and by other grants and contracts.

APPENDIX: BOOTSTRAP MONTE CARLO

Following from Section 2.3.1, an alternative way of estimating a parameter's sampling distribution is to use the so-called "bootstrap" or "jackknife" Monte Carlo methods (Marcy et al. 2005; Press et al. 2007; Baluev 2009, 2012). These methods estimate a parameter's distribution by replacing parts of the original data with a randomly permuted version of the data and observing the effect on a consequent model fit. For a given iteration of the algorithm, we model-fit the time series and randomly scramble the model-subtracted residual. We then add the scrambled residual back to the original model-fit and repeat the process. See Appendix A.1 for details of the iteration scheme. As for the MCMC algorithm, we ran the bootstrap Monte Carlo for 2 × 104 iterations; examples of the parameter distributions are found in Figures 6(B) and 7(B). It is worth noting that the scrambling process destroys any autocorrelation in the data as well as homogenizes nongaussian systematics in the data.

Figure 6.

Figure 6. Posterior distribution of fitted transit depth at 2.4291 μm using: (A) MCMC algorithm; (B) Bootstrap Monte Carlo, method 1; (C) Bootstrap Monte Carlo, method 2. The sample median is marked with a solid red line, the 1σ error bars are marked with dashed red lines. (D) Cumulative distribution functions of sampling distributions in plots A, B, and C.

Figure 7.

Figure 7. Posterior distribution of fitted transit depth at 1.9683 μm using: (A) MCMC algorithm; (B) Bootstrap Monte Carlo, method 1; (C) Bootstrap Monte Carlo, method 2. The sample median is marked with red, continuous line, the 1σ error bars are marked with red-discontinuous lines. (D) Cumulative distribution functions of sampling distributions in plots A, B, and C.


An approach to directly measure the effects of autocorrelation in the data, the "prayer bead" algorithm, has been suggested (Jenkins et al. 2002; Southworth 2008). Whereas bootstrap methods randomly replace parts of the original data, the "prayer bead" algorithm shifts the model subtracted residual along the time-axis for every iteration. Given the small number of data points available in this analysis and the duration of the systematics being on the same or similar time scales to the transit signal, we refrain from using this method. Instead, we re-run the above bootstrap Monte Carlo method with the modification of only replacing a randomly sized fraction, ranging from 40% to 100%, of the data. This modification preserves parts of the autocorrelation while the high iteration number of 2 × 104 ensures a sufficient sampling. The iteration scheme is described in Appendix A.2, and Figures 6(C) and 7(C) are examples of the transit-depth sampling distributions.

A.1. Method 1

  • 1.  
    Set yc = yk, where c is the bootstrap iteration index.
  • 2.  
    Using a Nelder–Mead minimization and the MA02 model, evaluate and record the best fit transit-depth, δc.
  • 3.  
    Compute the model-subtracted residual, rc = yc − mMA02(δc), where mMA02(δc) is the MA02 model with the fitted transit depth.
  • 4.  
    Randomly scramble the residual to obtain rc, permuted.
  • 5.  
    Add the scrambled residual back onto the model to obtain the new time series yc + 1 = mMA02(δc) + rc, permuted.
  • 6.  
    Steps 2–5 are repeated Nboot times.

A.2. Method 2

In the second method, we follow the procedural sequence of Method 1, but modify the residual-replacement step:

  • 1.  
    Set yc = yk, where c is the bootstrap iteration index.
  • 2.  
    Using a Nelder–Mead minimization and the MA02 model, we evaluate and record the best fit transit-depth, δc.
  • 3.  
    Compute the model-subtracted residual, rc = yc − mMA02(δc), where mMA02(δc) is the MA02 model with the fitted transit depth.
  • 4.  
    Randomly scramble the residual to obtain rc, permuted.
  • 5.  
    Randomly replace a fraction of the original residual, rc, with the permuted residual, rc, permuted. This fraction is chosen at random but held to be within 40%–100% of the original data. We call this semi-permuted residual rc, semi.
  • 6.  
    Add the above residual back onto the model to obtain the new time series yc + 1 = mMA02(δc) + rc, semi.
  • 7.  
    Steps 2–6 are repeated Nboot times.
