1 Introduction

Single top quarks are produced in proton–proton collisions via the weak interaction. At leading order (LO), three different channels are defined according to the virtuality of the \(W\) boson involved: the t-channel, the s-channel, and top-quark production in association with a \(W\) boson, called \(tW\) production. These processes, for which example Feynman diagrams are shown in Fig. 1, involve a \(Wtb\) vertex at LO in the Standard Model (SM). Calculations involving \(tW\) production beyond LO have to include quantum interference with \(t\bar{t}\) production. Measurements of single-top-quark cross-sections are used to study the properties of this vertex, as they are directly sensitive to the Cabibbo–Kobayashi–Maskawa (CKM) matrix element \(\vert V_{tb} \vert \). Deviations from the cross-sections predicted by the SM can originate from single top quarks produced with similar kinematics in the decays of unknown heavy particles predicted by physics beyond the SM. If the masses of these particles are beyond the reach of direct searches, they might be revealed through their effects on the effective \(Wtb\) coupling [1]. Using measurements in all three channels of single-top-quark production, physics beyond the SM can be probed systematically in the context of Effective Field Theory [2]. As each of the single-top-quark processes can be sensitive to different sources of new physics, it is also important to study each channel separately. In addition, SM \(tW\) production is an important background in direct searches for particles beyond the SM [3, 4].

Fig. 1
figure 1

Example LO Feynman diagrams of single top-quark production: a t-channel, b s-channel and c \(tW\) production

At the Large Hadron Collider (LHC), evidence for the \(tW\) production process was found by the ATLAS [5] and CMS [6] Collaborations at \(\sqrt{s} ={7}\,{\text {TeV}}\), and the process was observed by both experiments [7, 8] at \(\sqrt{s} ={8}\,{\text {TeV}}\). The \(tW\) cross-section has also been measured with 13 \(\text {TeV}\) collision data, inclusively by the CMS Collaboration [9] and both inclusively and differentially by the ATLAS Collaboration [10,11,12]. These measurements were performed in final states with two leptons, and the measured cross-sections agree with the theoretical expectations.

This paper presents evidence for \(tW\) production in final states with a single lepton, using proton–proton (\(pp\)) collisions at \(\sqrt{s}\) = 8 \(\text {TeV}\). This topology features a \(W\) boson produced in association with a top quark, which in turn decays almost exclusively into another \(W\) boson and a \(b\)-quark, leading to a \(W^{+}W^{-}b\) final state. In the single-lepton channel, one of the \(W\) bosons decays leptonically (\(W_L \)) while the other decays hadronically (\(W_H \)). The experimental signature of candidate events is therefore characterised by one isolated charged lepton (electron or muon), large missing transverse momentum (\(E_{\text {T}}^{\text {miss}}\)), and three jets with high transverse momentum (\(p_{\text {T}}\)), one of which contains a \(b\)-hadron and is labelled as a \(b\)-tagged jet, \(j_B \). In contrast to the dilepton analyses, the event signature contains only one neutrino, which originates from the leptonic \(W\)-boson decay. Hence, both the \(W\)-boson and the top-quark kinematics can be reconstructed and used to separate the signal from the background. The main backgrounds are \(W \!+\!\text {jets}\) and \(t\bar{t}\) events; the latter poses a major challenge in this measurement because its kinematics are similar to those of the signal and its cross-section is about ten times larger than that of \(tW\) production. An artificial neural network is trained to separate the signal from the \(t\bar{t}\) background. The cross-section is extracted using a binned profile maximum-likelihood fit to a two-dimensional discriminant. This measurement, performed with \(tW\) single-lepton events, constitutes a cross-check of the previous results published in the dilepton channel.

2 ATLAS detector

The ATLAS experiment [13] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and a near \(4\pi \) coverage in solid angle.Footnote 1 It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer (MS). The ID provides charged-particle tracking in the pseudorapidity range \(|\eta | < 2.5\). It consists of silicon pixel, silicon microstrip, and transition-radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. An iron/scintillator-tile hadron calorimeter covers the central pseudorapidity range (\(|\eta | < 1.7\)). The endcap (\(1.5<|\eta | <3.2\)) and forward (\(3.1<|\eta | <4.9\)) regions are instrumented with LAr calorimeters for measurements of both EM and hadronic energy. The MS surrounds the calorimeters and includes a system of precision tracking chambers (\(|\eta | < 2.7\)) and fast detectors for triggering (\(|\eta | < 2.4\)). The magnet system for the MS consists of three large air-core toroidal magnets with eight superconducting coils. The field integral of the toroids ranges between 2.0 and 6.0 T m across most of the detector. Collisions producing interesting events are selected for storage with the trigger system [14]. For the data taken at \(\sqrt{s} = {8}\,{\text {TeV}}\), a three-level trigger system was used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information. It reduced the accepted rate to at most 75 kHz. This was followed by two software-based trigger levels that together reduced the accepted event rate to 400 Hz on average, depending on the data-taking conditions.

3 Data and simulated event samples

The data considered in this analysis are from \(pp\) collisions at \(\sqrt{s} ={8}\,{\text {TeV}}\) and were taken with stable LHC beams and the ATLAS detector fully operational, corresponding to an integrated luminosity of 20.2 \(\hbox {fb}^{-1}\).

Monte Carlo (MC) samples were produced using the full ATLAS detector simulation [15] implemented in Geant4 [16]. In addition, alternative MC samples, used to train the neural network and to evaluate systematic uncertainties, were produced with AtlFast2 [15], which provides a faster calorimeter simulation making use of parameterised showers to compute the energy deposited by the particles. Pile-up (additional \(pp\) interactions in the same or nearby bunch crossings) was modelled by overlaying simulated minimum-bias events [17]. Weights were assigned to the simulated events such that the distribution of the number of \(pp\) interactions per bunch crossing in the simulation matches the corresponding distribution in the data, which has an average of 21 [18].

The \(tW\) signal events were simulated using the next-to-leading-order (NLO) Powheg method [19,20,21] implemented in the Powheg-Box generator (revision 2192) [22] with the CT10 parton distribution function (PDF) set [23] in the matrix-element calculation. The mass and width of the top quark were set to \(m_{t} = {172.5}\, {\text {GeV}}\) and \(\Gamma _{t} = {1.32} \,{\text {GeV}}\), respectively. The top quark was assumed to decay exclusively into \(W b\). The parton shower, hadronisation and underlying event were simulated using Pythia 6 (v6.426) [24] with the LO CTEQ6L1 PDF set [25] and a corresponding set of tuned parameters called the Perugia 2011C (P2011C) tune [26]. The factorisation scale, \(\mu _\mathrm {f}\), and renormalisation scale, \(\mu _\mathrm {r}\), were set to \(m_t\). Calculations of \(tW\) production beyond LO have to include quantum interference with \(t\bar{t}\) production; double-counting of these contributions was avoided by using either the diagram-removal (DR) or the diagram-subtraction (DS) scheme [27, 28]. In the DR scheme, diagrams with a second on-shell top-quark propagator are removed from the amplitude, while in the DS scheme a subtraction term cancels out the \(t\bar{t}\) contribution to the cross-section when the top-quark propagator becomes on-shell. The nominal MC samples were generated using the DR scheme. For the evaluation of systematic uncertainties, alternative samples were generated using the DS scheme, or using Powheg-Box or MC@NLO [29], each interfaced with Herwig [30]. For the Herwig samples, the AUET2 tune [31] with the CT10 PDF was used and the underlying event was generated with Jimmy [32]. In addition, Powheg-Box samples interfaced with Pythia 6 (v6.427), with variations of \(\mu _\mathrm {r}\) and \(\mu _\mathrm {f}\) together with the corresponding radiation tunes, were used. The SM \(tW\) cross-section prediction at NLO including next-to-next-to-leading-logarithm (NNLL) soft-gluon corrections [33, 34] is \(\sigma _{tW}^{\text {th.}}({8}\,{\text {TeV}})=22.4 \pm 0.6\,(\text {scale})\pm {1.4}\,(\text {PDF})~\text {pb}\), assuming a top-quark mass, \(m_t\), of \({172.5} \,{\text {GeV}}\). The first uncertainty accounts for renormalisation and factorisation scale variations (from \(m_t/2\) to \(2m_t\)) and the second covers the uncertainty in the parton distribution functions, evaluated using the MSTW2008 PDF set [35] at next-to-next-to-leading order (NNLO).
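Adding the scale and PDF components of this prediction in quadrature (a simple combination assumed here for illustration) reproduces the total uncertainty quoted in Sect. 10:

$$\begin{aligned} \Delta \sigma _{tW}^{\text {th.}} = \sqrt{(0.6)^2+(1.4)^2}\,\text {pb} \approx 1.5\,\text {pb}, \qquad \text {i.e.}\quad \sigma _{tW}^{\text {th.}} = 22.4 \pm 1.5\,\text {pb}. \end{aligned}$$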

The \(t\bar{t}\) sample was generated with Powheg-Box interfaced with Pythia 6 (v6.427) [36]. The CT10 PDFs were used in the matrix-element calculation, while the CTEQ6L1 PDFs were used in the parton shower. The \(h_{\text {damp}}\) parameter, which effectively regulates the high-\(p_{\text {T}}\) gluon radiation, was set to \(m_t\). The predicted \(t\bar{t}\) production cross-section, \(\sigma _{t\bar{t}} ({8}\,{\text {TeV}})=252.9^{+6.4}_{-8.6}\,\text {(scale)} \pm 11.7\,(\text {PDF}+\alpha _{\text {s}})\,\text {pb}\), was calculated with the Top++2.0 program at NNLO in perturbative QCD, including soft-gluon resummation to NNLL [37]. The first uncertainty comes from the sum in quadrature of the effects of independently varying \(\mu _\mathrm {r}\) and \(\mu _\mathrm {f}\). The uncertainty associated with variations of the PDFs and the strong coupling constant, \(\alpha _{\text {s}}\), was evaluated following the PDF4LHC NLO prescription [38, 39], which defines the central value as the midpoint of the uncertainty envelope of three PDF sets: MSTW2008 NNLO [35], CT10 NNLO [40] and NNPDF2.3 5f FFN [41]. The same procedures as for the \(tW\) samples were employed to determine the uncertainties due to the NLO matching method and the parton shower and hadronisation. Samples used to evaluate the scale uncertainties were produced in a similar way, varying \(\mu _\mathrm {r}\) and \(\mu _\mathrm {f}\) together with the Perugia tune, but also adding variations of the \(h_{\text {damp}}\) parameter (for the up-variation, \(h_{\text {damp}}\) was changed to \(2 m_t\), while for the down-variation it was kept at \(m_t\)).

The other single-top-quark production processes, in the s- and t-channels, were also generated with Powheg-Box coupled to Pythia 6 (v6.426), using the same PDF sets as described above for the other top-quark processes. The predicted cross-sections at \(\sqrt{s}\) = 8 \(\text {TeV}\), calculated at NLO plus NNLL, are \(5.6\pm {0.2}\, \text {pb}\) for the s-channel [42, 43] and \(87.8^{+3.4}_{-1.9}\,\text {pb}\) for the t-channel [44, 45].

The multi-leg LO generator Sherpa [46,47,48], together with the CT10 PDF set, was used to simulate vector-boson production in association with jets; it generated the hard process as well as the parton shower and the modelling of the underlying event. Double-counting between the inclusive \(V+n\)-parton samples (with \(V=\) \(W\) or \(Z\)) and samples with associated heavy-quark pair production was avoided consistently by using massive \(c\)- and \(b\)-quarks in the shower. The predicted NNLO \(W \!+\!\text {jets}\) cross-section with the \(W\) boson decaying leptonically was calculated as \(\sigma (p{}p\rightarrow \ell ^{\pm }\nu _{\ell }X) = 36.3\pm {1.9}\, \text {nb}\) [49].Footnote 2 For \(Z +\text {jets}\), the cross-section was calculated at NNLO in QCD for leptonic \(Z\) decays as \(\sigma (pp \rightarrow {\ell ^+}{}{\ell ^-}X) = 3.72\pm {0.19}\,\text {nb}\) [49]. The AtlFast2 simulation was used to generate these samples with sufficient statistical precision. For cross-checks of the \(W \!+\!\text {jets}\) modelling, an alternative sample generated with Alpgen [50], with up to five additional partons, interfaced with Pythia 6 (v6.426) and using the CTEQ6L1 PDFs was used. Diboson samples (\(WW/ WZ/ ZZ\,+\,\text {jets}\)) were generated at LO in QCD using the CTEQ6L1 PDF set. The theoretical NLO cross-section for events with one lepton is \(29.4\pm {1.5}\,\text {pb}\) [51].

Multijet events are selected in the analysis when they contain jets or photons misidentified as leptons or contain non-prompt leptons from hadron decays (both referred to as a ‘fake’ lepton). This background was estimated directly from data using the matrix method [52], which exploits differences in lepton identification and isolation properties between prompt and fake leptons. The data were processed with a second, ‘loose’ set of lepton selection criteria. The resulting sample was then corrected for efficiency differences between the two sets of cuts, and the contamination from events containing prompt leptons was subtracted. The efficiencies, lepton selection criteria, and uncertainties applied in this analysis are the same as in Ref. [52].
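As an illustration of the matrix method described above, the following sketch solves the two linear equations relating the 'loose' and 'tight' event counts to the prompt and fake components; the function name and the efficiencies shown are placeholders, not the values measured in Ref. [52].

```python
def fake_yield_tight(n_loose, n_tight, eff_real, eff_fake):
    """Matrix-method estimate of the fake-lepton yield in the tight selection.

    Solves
        n_tight = eff_real * n_real + eff_fake * n_fake
        n_loose = n_real + n_fake
    for n_fake and returns its contribution to the tight sample.
    """
    if eff_real <= eff_fake:
        raise ValueError("real-lepton efficiency must exceed the fake rate")
    n_fake_loose = (eff_real * n_loose - n_tight) / (eff_real - eff_fake)
    return eff_fake * n_fake_loose

# Placeholder numbers, for illustration only.
print(fake_yield_tight(n_loose=10_000, n_tight=6_000, eff_real=0.90, eff_fake=0.20))
```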

4 Object definitions

Primary vertex (PV) candidates in the interaction region are reconstructed from at least five tracks with transverse momentum \(p_{\text {T}} >{400}\, {\text {MeV}}\). The candidate with the highest sum of \(p_{\text {T}} ^{2}\) over all associated tracks is chosen as the hard-collision PV [53].

Muon candidates are reconstructed by matching segments or tracks in the MS with tracks found in the ID [54]. The candidates must have \(p_{\text {T}} > {25}\,{\text {GeV}}\) and be in the pseudorapidity range \(|\eta | < 2.5\). The longitudinal impact parameter of the track relative to the hard-collision PV, \(|z_{\mathrm {vtx}} |\), is required to be smaller than 2 mm. In order to reject non-prompt muons, an isolation criterion is applied. The isolation variable is defined as the scalar sum of the transverse momenta of all tracks with \(p_{\text {T}} > {1}\,{\text {GeV}}\) (excluding the muon track) within a cone of size \(\Delta R = {10}\,{\text {GeV}} / p_{\text {T}} (\mu )\) around the muon’s direction. It is required to be less than 5% of the muon \(p_{\text {T}}\). The selection efficiency after this requirement is measured to be about 97% in \(Z \rightarrow \mu ^+{}\mu ^-\) events.
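A minimal sketch of the variable-cone track isolation described above (objects are simplified to dictionaries and the track selection beyond the quoted \(p_{\text {T}}\) threshold is not reproduced):

```python
import math

def delta_r(a, b):
    """Angular separation between two objects given as dicts with eta/phi."""
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def muon_is_isolated(muon, other_tracks, max_rel_iso=0.05):
    """Cone size dR = 10 GeV / pT(mu); sum tracks with pT > 1 GeV
    (the muon's own track is assumed to be excluded from `other_tracks`)
    and require the sum to be below 5% of the muon pT. pT in GeV."""
    cone = 10.0 / muon["pt"]
    iso = sum(t["pt"] for t in other_tracks
              if t["pt"] > 1.0 and delta_r(t, muon) < cone)
    return iso < max_rel_iso * muon["pt"]
```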

Electron candidates are reconstructed from energy deposits (clusters) in the EM calorimeter that are matched to a well-reconstructed track in the ID [55]. Requirements on the transverse and longitudinal impact parameters of \(|d_{\mathrm {vtx}} | < {1}\,\text {mm}\) and \(|z_{\mathrm {vtx}} | < {2}\,\text {mm}\), respectively, are applied. Electron candidates must have transverse energy \(E_{\text {T}} > {25}\,{\text {GeV}}\) and \(\left| \eta _{\text {cluster}}\right| < 2.47\), where \(\eta _{\text {cluster}}\) denotes the pseudorapidity of the cluster. Clusters in the calorimeter barrel–endcap transition region, \(1.37< |\eta | < 1.52\), are excluded. An isolation requirement is applied, based on the transverse energy deposited in a cone of size \(\Delta R = 0.2\) around the direction of the electron and on the \(p_{\text {T}}\) sum of the tracks in a cone of size \(\Delta R = 0.3\) around the same direction. This requirement is chosen to give a selection efficiency of 85%, nearly uniform in \(p_{\text {T}}\) and \(\eta \), as measured in \(Z \rightarrow e^+{}e^-\) events. Electron candidates that share their ID track with a reconstructed muon candidate are vetoed.

Jets are reconstructed using the anti-\(k_{t}\) algorithm [56, 57] with a radius parameter of \(R=0.4\), using topological clusters [58], calibrated with the Local Cluster Weighting method [59], as input to the jet finding. The jet energy is further corrected by subtracting the contribution from pile-up and applying an MC-based and a data-based calibration. The jet vertex fraction (JVF) [60] is used to identify the primary vertex from which the jet originated; a requirement on the JVF is applied to jets with \(p_{\text {T}} < {50}\,{\text {GeV}}\) and \(|\eta | < {2.4}\) in order to suppress jets from pile-up. To avoid possible overlaps between jets and electrons, jets that are closer than \(\Delta R = 0.2\) to an electron are removed. Afterwards, remaining electron candidates overlapping with jets within a distance of \(\Delta R = 0.4\) are rejected. Finally, muons overlapping with jets within \(\Delta R = 0.4\) are removed.
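The sequential lepton–jet overlap removal can be summarised in a short sketch (same simplified object representation as before; this illustrates the ordering, not the analysis code):

```python
import math

def delta_r(a, b):
    dphi = math.remainder(a["phi"] - b["phi"], 2.0 * math.pi)
    return math.hypot(a["eta"] - b["eta"], dphi)

def overlap_removal(electrons, muons, jets):
    """1) drop jets within dR < 0.2 of an electron,
    2) drop electrons within dR < 0.4 of a remaining jet,
    3) drop muons within dR < 0.4 of a remaining jet."""
    jets = [j for j in jets if all(delta_r(j, e) >= 0.2 for e in electrons)]
    electrons = [e for e in electrons if all(delta_r(e, j) >= 0.4 for j in jets)]
    muons = [m for m in muons if all(delta_r(m, j) >= 0.4 for j in jets)]
    return electrons, muons, jets
```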

The identification of jets originating from the hadronisation of a \(b\)-quark (\(b\)-tagging) is based on various algorithms exploiting the long lifetime, high mass and high decay multiplicity of \(b\)-hadrons as well as the properties of the \(b\)-quark fragmentation. The outputs of these algorithms are combined in a neural network classifier to maximise the \(b\)-tagging performance [61]. The choice of \(b\text {-tagging}\) working point represents a trade-off between the efficiency for identifying \(b\)-jets and rejection of other jets. The chosen working point for this analysis corresponds to a \(b\text {-tagging}\) efficiency of 70%. The corresponding \(c\)-quark-jet rejection factor is about 5 and the light-quark-jet rejection factor is about 120. These efficiencies and rejection factors were obtained using simulated \(t\bar{t}\) events. The tagging efficiencies in the simulation are corrected to match the efficiencies measured in data [61].

The \(p_{\text {T}} ^{\text {miss}}\) of the event, defined as the momentum imbalance in the plane transverse to the beam axis, is primarily due to neutrinos that escape detection. It is calculated as the negative vector sum of the transverse momenta of the reconstructed electrons, muons, jets and the clusters that are not associated with any of the previous objects (the ‘soft term’) [62]. Its magnitude is denoted \(E_{\text {T}}^{\text {miss}}\).
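Schematically, the missing transverse momentum is built as the negative vector sum described above; in this sketch the soft term is assumed to be supplied as pre-computed (px, py) components in GeV.

```python
import math

def missing_et(objects, soft_term=(0.0, 0.0)):
    """Return (E_T^miss, phi) from the negative vector sum of the object
    transverse momenta plus the calorimeter soft term."""
    px = -sum(o["pt"] * math.cos(o["phi"]) for o in objects) - soft_term[0]
    py = -sum(o["pt"] * math.sin(o["phi"]) for o in objects) - soft_term[1]
    return math.hypot(px, py), math.atan2(py, px)
```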

5 Event selection

Events are required to have a hard-collision primary vertex. They also have to pass a single-lepton trigger requirement [14, 63] and contain at least one electron or muon candidate with \(p_{\text {T}} > {30}\,{\text {GeV}}\) matched to the lepton that fired the trigger. The electron trigger requires an electron candidate, formed by an EM calorimeter cluster matched with a track, either with \(E_{\text {T}} > {60}\,{\text {GeV}}\) or with \(E_{\text {T}} > {24}\,{\text {GeV}}\) and additional isolation requirements. The muon trigger requires a muon candidate, defined as a reconstructed track in the muon spectrometer, either with \(p_{\text {T}} > {36}\,{\text {GeV}}\) or with \(p_{\text {T}} > {24}\,{\text {GeV}}\) and isolation requirements. If there is another lepton candidate with a transverse momentum above \({25}\,{\text {GeV}}\), the event is rejected. This lepton veto guarantees orthogonality with respect to the dilepton analysis. The contribution from leptonically decaying \(\tau \)-leptons is included. In the following, the electron or muon candidate is referred to as the lepton.

Events identified as containing jets from cosmic rays or beam-induced backgrounds or due to noise hot spots in the calorimeter are removed. Only jets with \(p_{\text {T}} > {30}\,{\text {GeV}}\) and \(|\eta | < {2.4}\) are considered in the analysis. Additionally, a requirement of \(E_{\text {T}}^{\text {miss}} > {30}\,{\text {GeV}}\) is applied, and the transverse massFootnote 3 of the leptonically decaying \(W\) boson must satisfy \(m_T (W _L) >{50}\,{\text {GeV}}\).
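Footnote 3 is not reproduced here; the transverse mass of the leptonically decaying \(W\) boson is assumed to follow the conventional definition,

$$\begin{aligned} m_T (W_L) = \sqrt{2\, p_{\text {T}} (\ell )\, E_{\text {T}}^{\text {miss}} \left( 1-\cos \Delta \phi \mathopen {}\left( \ell , \vec {p}_{\text {T}}^{\,\text {miss}}\right) \mathclose {}\right) }\,. \end{aligned}$$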

In order to perform the measurement and validate the result, selected events are divided into different categories based on the jet and \(b\)-tagged-jet multiplicities. The region with three jets, of which one is \(b\)-tagged (3j1b), is called the signal region and is used to extract the \(tW\) cross-section. The region with four jets, two of them \(b\)-tagged (4j2b), contains a very pure sample of \(t\bar{t}\) events and is used as the \(t\bar{t}\) validation region to check the modelling of this background. Table 1 shows the expected and observed numbers of events in the signal region after the event selection. All backgrounds except the fake-lepton background, which is estimated using data-driven methods, are normalised to their expected cross-sections. The \(tW\) events constitute about 5% of the total number of events. The major backgrounds are \(t\bar{t}\) production, with about 58% of the total number of events, and \(W \!+\!\text {jets}\) production, with about 28%. The \(W \!+\!\text {jets}\) events are subdivided into heavy-flavour (HF) events, in which the \(W\) boson is produced in association with \(b\)- or \(c\)-jets, and light-flavour (LF) events. The total number of expected events agrees within a few percent with the observed number of events.

Table 1 Expected signal and background and observed number of events in the signal 3j1b region. The cross-section for \(tW\) production is taken to be the theory prediction. The uncertainties include statistical and systematic uncertainties

6 Separation of signal from background

Differences between signal and background event kinematics are exploited to better separate them. The \(t\bar{t}\) background is inherently difficult to distinguish from the signal, motivating the use of an artificial neural network (NN) implemented in the NeuroBayes framework [64, 65]. Detailed information about how the NN is used in single-top-quark analyses can be found in Ref. [66]. The NN input variables are selected such that they contribute significantly to the statistical separation power between signal and background, while avoiding variables that would lead to an increase of the expected uncertainty in the signal cross-section. The observable \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) (Fig. 2) provides very good separation of the signal from the background, but is strongly affected by uncertainties in the reconstructed jet energies as well as uncertainties in the b-tagging in \(t\bar{t}\) events. For this reason, \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) is not used in the NN; instead a two-dimensional discriminant is constructed from \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) and the response of the NN. The two-dimensional discriminant, explained in the following sections, allows the nuisance parameters affecting the variable \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) to be partially constrained.

6.1 Invariant mass of the hadronically decaying \(W\) boson

The variable \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) is computed from the four-momenta of the two selected untagged jets. For the signal and the \(t\bar{t}\) background, the distribution of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) exhibits a peak near the mass of the \(W\) boson, shown in Fig. 2a. The peak results from events where the two untagged jets are correctly matched to the hadronically decaying \(W\) boson. This is less likely to happen for \(t\bar{t}\) events than for \(tW\) events due to the higher \(b\)-jet multiplicity and the limited \(b\)-tagging efficiency. On the other hand, the \(W \!+\!\text {jets}\) background does not feature such a peak since the \(W\) boson must decay leptonically for the events to pass the selection. Figure 2b shows the pre-fit distribution of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\), and also demonstrates good pre-fit modelling of the data.

Fig. 2
figure 2

a Shape of the reconstructed \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) distribution for signal and most important backgrounds in the signal (3j1b) region. The distribution for each process normalised to unity is shown. b Pre-fit \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) distribution in the 3j1b region. Small backgrounds are subsumed under ‘Other’. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panels show the ratio of the observed and the predicted number of events in each bin. The last bin includes the overflow events.

6.2 Neural network

The NN is trained using simulated events in which the two reconstructed untagged jets are matched, within \(\Delta R < {0.35}\), to the generator-level jets originating from the \(W\)-boson decay in the MC simulation, and have a reconstructed mass of \({65}\,{\text {GeV}}< m\mathopen {}\left( {W_H {}}\right) \mathclose {} < {92.5}\,{\text {GeV}}\). As events are required to contain a lepton, only \(tW\), \(t\bar{t}\) and diboson events can have a pair of jets matched to a hadronic \(W\)-boson decay. Given that the contribution from diboson production is very small, the background sample used for the training consists entirely of \(t\bar{t}\) events. Following the selection procedure described above, the following four variables (ordered by significance) are used as inputs to the NN:

  • the transverse momentum of the \(tW\) system, \(p_{\text {T}} (W_H W_L j_B )\), divided by the sum of the objects’ transverse momenta,

    $$\begin{aligned} \rho _T (W_{H},W_{L},j_B ) = \frac{p_{\text {T}} (W_H W_L j_B )}{p_{\text {T}} (W_H )+p_{\text {T}} (W_L )+p_{\text {T}} (j_B )}\,, \end{aligned}$$

    where the four-momentum of \(W_L \) is the sum of the four-momenta of the electron or muon and the neutrino, and the four-momentum of the neutrino is determined using \(E_{\text {T}}^{\text {miss}}\) from the solution of a quadratic equation (a minimal reconstruction sketch is given after this list).Footnote 4 The use of \(\rho _T (W_{H},W_{L},j_B )\), instead of the transverse momentum of the \(tW\) system, decreases the background contribution in the signal-like region of the NN response and results in a gain in sensitivity;

  • the invariant mass of the reconstructed \(tW\) system, \(m\mathopen {}\left( {W_L W_H j_B }\right) \mathclose {}\);

  • the absolute value of the difference between the pseudorapidities of the lepton and the leading untagged jet in \(p_{\text {T}}\), \(|\Delta \eta (\ell ,j_{L1})|\) ;

  • the absolute value of the pseudorapidity of the lepton, \(|\eta (\ell )|\).
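A minimal sketch of the \(W_L \) reconstruction referred to in the first item: the neutrino transverse momentum is taken from \(E_{\text {T}}^{\text {miss}}\) and its longitudinal momentum is obtained from a \(W\)-mass constraint, which yields a quadratic equation. The handling of complex solutions described in Footnote 4 is not reproduced; this sketch simply sets a negative discriminant to zero. The lepton is treated as massless and all momenta are in GeV.

```python
import math

def neutrino_pz(lep, met_x, met_y, m_w=80.4):
    """Solve (p_lep + p_nu)^2 = m_W^2 for the neutrino p_z.
    `lep` is a dict with px, py, pz, e (massless-lepton approximation).
    Returns the two solutions of the quadratic equation."""
    pt_lep_sq = lep["px"] ** 2 + lep["py"] ** 2
    a = 0.5 * m_w ** 2 + lep["px"] * met_x + lep["py"] * met_y
    disc = a ** 2 - pt_lep_sq * (met_x ** 2 + met_y ** 2)
    disc = max(disc, 0.0)                      # complex case: keep the real part
    root = lep["e"] * math.sqrt(disc)
    return ((a * lep["pz"] + root) / pt_lep_sq,
            (a * lep["pz"] - root) / pt_lep_sq)
```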

Figure 3 compares the data with the prediction for the NN input variables. For all variables, the simulation provides a good description of the data.

Fig. 3
figure 3

Pre-fit distributions of the NN input variables in the \(tW\) signal (3j1b) region with \({65}\,{\text {GeV}} \le m\mathopen {}\left( {W_H {}}\right) \mathclose {} \le {92.5}\,{\text {GeV}}\). Small backgrounds are subsumed under ‘Other’. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The last bin includes the overflow events. The lower panels show the ratio of the observed and the predicted number of events in each bin.

The distribution of the NN response is subdivided into eight bins, with the edges placed approximately at the 12.5% quantiles of a 50:50 mixture of \(tW\) and \(t\bar{t}\) events. Figure 4a shows the shape of the NN response for the \(tW\) and \(t\bar{t}\) processes and Fig. 4b presents the comparison between data and Monte Carlo simulation.
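The quantile-based binning can be illustrated with a short sketch: the edges are placed at the 12.5% quantiles of the NN response for an equal-weight (50:50) mixture of \(tW\) and \(t\bar{t}\) events. The toy inputs below are purely illustrative and per-event MC weights are omitted.

```python
import numpy as np

def nn_bin_edges(nn_tw, nn_tt, n_bins=8):
    """Edges at the (100/n_bins)% quantiles of a 50:50 tW + ttbar mixture."""
    scores = np.concatenate([nn_tw, nn_tt])
    weights = np.concatenate([np.full(len(nn_tw), 0.5 / len(nn_tw)),
                              np.full(len(nn_tt), 0.5 / len(nn_tt))])
    order = np.argsort(scores)
    scores, cdf = scores[order], np.cumsum(weights[order])
    inner = np.interp(np.arange(1, n_bins) / n_bins, cdf, scores)
    return np.concatenate([[scores[0]], inner, [scores[-1]]])

rng = np.random.default_rng(1)
print(nn_bin_edges(rng.normal(0.6, 0.2, 10_000), rng.normal(0.4, 0.2, 20_000)))
```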

Fig. 4
figure 4

a Shape of the NN response in the signal (3j1b) region. The distribution contains those events with \({65}\,{\text {GeV}}\le m\mathopen {}\left( {W_H {}}\right) \mathclose {} \le {92.5}\,{\text {GeV}}\). The distributions for the \(tW\) process and the \(t\bar{t}\) process normalised to unity are shown. b Pre-fit NN output distribution in the 3j1b region. Small backgrounds are subsumed under ‘Other’. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panels show the ratio of the observed and the predicted number of events in each bin.

6.3 Two-dimensional discriminant

For the two-dimensional discriminant, \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) is used on the abscissa and the NN response on the ordinate. Outside of the aforementioned \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) range from 65 to 92.5 \(\text {GeV}\), the bins corresponding to different values of the NN response are merged, i.e. the NN response is ignored. The two-dimensional distribution is presented in Fig. 5.

Fig. 5
figure 5

Predicted distribution of the two-dimensional discriminant in the signal (3j1b) region. The proportions of the coloured areas reflect the expected composition in terms of \(tW\), \(t\bar{t}\), \(W \!+\!\text {jets}\) and other processes. The numbers correspond to the bin order when projecting the discriminant onto one axis as in Fig. 6. The last bin on the horizontal axis includes the overflow events

The bins are then rearranged on a one-dimensional axis in column-major order. The resulting one-dimensional distribution is presented in Fig. 6, together with a comparison of the shapes. The first three bins and the last ten bins correspond directly to the bins of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) below 65 \(\text {GeV}\) and above 92.5 \(\text {GeV}\) respectively. In between are four blocks of eight bins, corresponding to the NN output in slices of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\). Inside each of the blocks, the \(tW\)-to-\(t\bar{t}\) ratio increases significantly from left to right.
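The construction of the one-dimensional discriminant can be sketched as follows, assuming a 17 × 8 histogram of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) versus the NN response with four \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) bins inside the 65–92.5 \(\text {GeV}\) window (the actual \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) bin boundaries are not reproduced here):

```python
import numpy as np

def flatten_discriminant(hist2d, window_cols):
    """Rearrange a (m(W_H) x NN-response) histogram into the 1D discriminant:
    outside the m(W_H) window the NN bins of a column are merged into one bin,
    inside the window each column keeps its NN bins (column-major order)."""
    bins = []
    for i, column in enumerate(hist2d):
        if i in window_cols:
            bins.extend(column)        # keep the NN binning
        else:
            bins.append(column.sum())  # NN response ignored
    return np.asarray(bins)

# 3 + 4 + 10 = 17 m(W_H) bins and 8 NN bins give 3 + 4*8 + 10 = 45 bins.
toy = np.random.default_rng(2).poisson(50.0, size=(17, 8)).astype(float)
print(flatten_discriminant(toy, window_cols=set(range(3, 7))).shape)
```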

Fig. 6
figure 6

a Shape distribution of the reconstructed discriminant in the \(tW\) signal (3j1b) region rearranged onto a one-dimensional distribution. The distribution for each process normalised to unity is shown. b Pre-fit distributions of the discriminant in the \(tW\) signal (3j1b) region. Small backgrounds are subsumed under ‘Other’. The simulated distributions are normalised to their theoretical cross-sections. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panels show the ratio of the observed and the predicted number of events in each bin. The first three bins and the last ten bins correspond directly to (non-uniform) bins of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\). In between are four blocks of eight bins, corresponding to the NN output in slices of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\). Inside each of the blocks, the numbers of events are scaled by a factor of four for better visibility

7 Systematic uncertainties

Uncertainties in the jet reconstruction arise from the jet energy scale (JES), jet energy resolution (JER), JVF requirement and jet reconstruction efficiency. The effect of the uncertainty in the JES [59] is evaluated by varying the reconstructed energies of the jets in the simulated samples. It is split into multiple components, taking into account the uncertainty in the calorimeter response, the detector simulation, the choice of MC event generator, the subtraction of pile-up, and differences in the detector response for jets initiated by a gluon, a light-flavour quark, or a \(b\)-quark. In a similar way, the JER uncertainty is represented using several components, which account for the uncertainty in different \(p_{\text {T}}\) and \(\eta \) regions of the detector, the difference between data and MC simulation, as well as the noise contribution in the forward detector region [59]. The uncertainty in jet reconstruction efficiency is estimated by randomly removing simulated jets from the events according to the jet reconstruction inefficiency measured with dijet events [67]. The JVF uncertainty is evaluated by varying the JVF criterion [60].

The scale factors used to correct the \(b\text {-tagging}\) efficiency in simulation to the efficiency measured in data are varied separately for \(b\)-jets, \(c\)-jets and light-flavour jets. Independent sources of uncertainty affecting the \(b\)-jet tagging efficiency and the \(c\)-jet mis-tagging efficiency are considered, depending on the jet kinematics; for example, the variation for \(b\)-quark jets is subdivided into six components. Uncertainties associated with the lepton selection arise from the trigger, reconstruction, identification and isolation requirements, as well as from the lepton momentum scale and resolution [54, 68, 69].

All systematic uncertainties in the reconstruction of jets and leptons are propagated to the uncertainty in \(E_{\text {T}}^{\text {miss}}\). In addition, dedicated uncertainties are assigned to the soft term of the \(E_{\text {T}}^{\text {miss}}\), which accounts for energy deposits in the calorimeter that are not matched to high-\(p_{\text {T}}\) physics objects [62].

The uncertainty in the integrated luminosity for the data set used in this analysis is 1.9%. It is derived following the methodology detailed in Ref. [18]. This systematic uncertainty is applied to all contributions determined from the MC simulation.

Uncertainties stemming from theoretical models are evaluated using alternative MC samples for the \(tW\) and \(t\bar{t}\) processes. The renormalisation and factorisation scales are varied simultaneously in the matrix element and in the parton shower, together with the amount of QCD radiation: the variation of both \(\mu _\mathrm {r}\) and \(\mu _\mathrm {f}\) by a factor of 0.5 is combined with the Perugia 2012radHi tune, while the variation of the scale parameters by a factor of 2.0 is combined with the Perugia 2012radLo tune [26]. This (radiation) uncertainty is treated as uncorrelated between the \(tW\) and \(t\bar{t}\) processes. The uncertainty due to the NLO matrix-element generator is estimated by comparing two NLO matching methods, Powheg and MC@NLO, both interfaced with Herwig. The parton-shower, hadronisation and underlying-event systematic uncertainties are evaluated by comparing Powheg-Box samples interfaced with either Pythia 6 or Herwig. These are treated as fully correlated between the \(tW\) and \(t\bar{t}\) processes. The uncertainty due to the treatment of the interference between the \(tW\) and \(t\bar{t}\) processes is evaluated by using the \(tW\) DS scheme instead of the DR scheme, both generated using Powheg-Box with Pythia 6. The effect of the PDF uncertainties on the acceptance is taken into account for both the \(tW\) signal and the \(t\bar{t}\) background and is treated as uncorrelated between the processes, following the studies in Ref. [70].

The uncertainties in the theoretical cross-section calculations are process dependent and vary from 4% for the t-channel to 6% for \(t\bar{t}\) (see Sect. 3). In addition, there are large uncertainties in the \(Z/W \text {+jets}\) production cross-sections: an uncertainty of 24% per additional jet, added in quadrature, is assumed [71], resulting in a normalisation uncertainty of 42% for \(W\)/\(Z\)-boson production in association with three jets. The rate of \(W\)-boson events with heavy-flavour jets is allowed to vary by an additional 20%.

The modelling of the \(W \!+\!\text {jets}\) background was cross-checked using the alternative Alpgen sample interfaced with Pythia 6. The shape of the \(W \!+\!\text {jets}\) background was found to be consistent with the nominal prediction. Hence, no dedicated systematic uncertainty is assigned to the choice of generator, in order to avoid double-counting the statistical uncertainty of the prediction (model statistics).

Uncertainties related to the modelling of the fake-lepton background take into account the choice of control region for the determination of the fake- and real-lepton efficiencies, the choice of parameterisation, and the normalisation of the prompt-lepton backgrounds in the determination of the efficiencies [52].

The uncertainty due to the limited size of the simulated samples and the fake-lepton background (model statistics) is estimated through the procedure detailed in Refs. [72, 73]: for every bin of the discriminant, an independent parameter is assigned which describes the variation of the predicted event rate constrained by its statistical uncertainty.

8 Statistical analysis

A binned profile maximum-likelihood fit to the discriminant in the signal region is used to determine the \(tW\) cross-section. The likelihood function is defined as a product of Poisson probability terms over all the bins of the discriminant in the signal region and Gaussian penalty terms,

$$\begin{aligned} L(\mu ,\vec {\theta };\vec {n}) = \prod _{i}^{\text {bins}} \text {Pois}(n_i;\nu _i(\mu ,\vec {\theta })) \, \prod _{k}^{\text {NPs}} G(\theta _k;0,1) \,\text {,} \end{aligned}$$

where \(n_i\) (\(\nu _i\)) is the observed (expected) number of events in each bin i of the discriminant. The expected number of events depends on the signal-strength parameter, \(\mu \), which is a multiplicative factor applied to the predicted signal cross-section. Nuisance parameters (NPs), \(\theta _k\), are used to encode the effects of the systematic uncertainties on the expected number of events. The Gaussian penalty terms model the external constraints on these parameters. The estimated parameters, denoted by \({\hat{\mu }}\) and \(\hat{\vec {\theta }}\), are obtained by maximising \(L(\mu ,\vec {\theta };\vec {n})\).
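A minimal numerical illustration of this likelihood structure (one signal-strength parameter and a single Gaussian-constrained nuisance parameter interpolating a background variation) is sketched below with toy templates; it is not the HistFactory model used in the analysis.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import poisson, norm

# Toy templates (arbitrary numbers): signal, background, +1 sigma background.
sig = np.array([5.0, 10.0, 20.0, 30.0])
bkg = np.array([200.0, 150.0, 100.0, 60.0])
bkg_up = bkg * np.array([1.05, 1.04, 1.02, 1.01])
data = np.array([208, 166, 121, 95])

def expected(mu, theta):
    """nu_i(mu, theta): mu scales the signal, theta interpolates the
    background linearly between the nominal and the +1 sigma template."""
    return mu * sig + bkg + theta * (bkg_up - bkg)

def nll(params):
    mu, theta = params
    nu = np.clip(expected(mu, theta), 1e-9, None)
    return -(poisson.logpmf(data, nu).sum() + norm.logpdf(theta))

fit = minimize(nll, x0=[1.0, 0.0], method="BFGS")
print(f"mu_hat = {fit.x[0]:.2f}, theta_hat = {fit.x[1]:.2f}")
```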

The likelihood function is composed and evaluated with the HistFactory program [74], part of the RooStats framework [75]. The minimisation is performed with the Minuit package [76], using Minos to compute the error estimates.

The statistical significance, Z, of the result is estimated by comparing the likelihood values of two hypotheses. The background-only hypothesis is that there is no signal in the data (or equivalently, \(\mu =0\)). The signal-plus-background hypothesis is that the signal exists with the signal strength obtained from the fit to data. With the asymptotic approximation [77], the significance is calculated using a test statistic based on the profile likelihood ratio,

$$\begin{aligned} Z^2 = - 2 \log \frac{ L(\mu =0, \vec \theta =\hat{\vec {\theta }}_{\mu =0}) }{ L(\mu ={\hat{\mu }}, \vec \theta =\hat{\vec {\theta }}) }\,, \end{aligned}$$

where \(\hat{\vec {\theta }}_{\mu =0}\) denotes the estimates of the nuisance parameters that maximise the likelihood function under the background-only hypothesis. The expected significance is calculated by replacing \(\vec {n}\) in the likelihood function with the Asimov dataset for the nominal signal-plus-background hypothesis (\(\mu =1, \vec {\theta }=\hat{\vec {\theta }}\)).
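Given the conditional (\(\mu =0\)) and unconditional best-fit negative log-likelihood values, the significance follows directly from the test statistic above; the input numbers in this short example are invented for illustration.

```python
import math

def significance(nll_mu0, nll_muhat):
    """Z^2 = -2 log(L(0)/L(mu_hat)) = 2 * (NLL(0) - NLL(mu_hat))."""
    return math.sqrt(max(0.0, 2.0 * (nll_mu0 - nll_muhat)))

print(significance(nll_mu0=112.3, nll_muhat=102.2))  # about 4.5
```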

9 Cross-section measurement

The \(tW\) cross-section is extracted from the fit to data in the signal region. Given the Standard Model prediction, the extracted signal strength is expected to be \({\hat{\mu }} = 1.00 \pm 0.35\). The measured value is \({\hat{\mu }} = 1.16\pm 0.31\), corresponding to an observed cross-section of \(\sigma _{tW}^{\text {obs}} =26\pm 7\,\text {pb}\), which is consistent with the Standard Model prediction. The observed (expected) significance is \(4.5\sigma \) (\(3.9\sigma \)).
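As a simple numerical cross-check, the quoted cross-section corresponds to scaling the theoretical prediction of Sect. 3 by the measured signal strength:

$$\begin{aligned} \sigma _{tW}^{\text {obs}} = {\hat{\mu }}\, \sigma _{tW}^{\text {th.}} = 1.16 \times 22.4\,\text {pb} \approx 26\,\text {pb}, \qquad 0.31 \times 22.4\,\text {pb} \approx 7\,\text {pb}. \end{aligned}$$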

The (post-fit) impact of each systematic uncertainty on the measured signal strength is estimated by means of conditional fits, i.e. the fit is repeated while keeping the corresponding nuisance parameter fixed at the \(\pm 1\) standard deviation (sigma) value of the post-fit error interval. The resulting change in the estimate of the signal strength quantifies the impact of the uncertainty. For each nuisance parameter, the \(+1\) and \(-1\) sigma variations are found to be symmetric about the best-fit value to a very good approximation. Table 2 shows the impacts of the systematic uncertainties on the observed fit result, where the impacts of uncertainties with similar sources have been added in quadrature. The dominant uncertainties are due to the amount of QCD radiation in signal events and \(t\bar{t}\) background, the JES and \(b\text {-tagging}\), and the model statistics, including the limited size of the MC samples.
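The conditional-fit procedure can be sketched generically as below; `nll` stands for any negative log-likelihood of the form used in Sect. 8, with the signal strength as parameter 0, and the function is an illustration rather than the analysis implementation.

```python
from scipy.optimize import minimize

def np_impact(nll, best_fit, index, sigma_post):
    """Fix nuisance parameter `index` (> 0) at its post-fit value +/- its
    post-fit uncertainty, re-minimise the other parameters, and return the
    induced shifts of the fitted signal strength (parameter 0)."""
    shifts = []
    for sign in (+1.0, -1.0):
        fixed_value = best_fit[index] + sign * sigma_post

        def nll_fixed(free):
            params = list(free)
            params.insert(index, fixed_value)  # re-insert the fixed parameter
            return nll(params)

        x0 = [v for i, v in enumerate(best_fit) if i != index]
        refit = minimize(nll_fixed, x0=x0, method="BFGS")
        shifts.append(refit.x[0] - best_fit[0])
    return shifts
```

Applied to the toy likelihood of the previous sketch, `np_impact(nll, fit.x, index=1, sigma_post=1.0)` would return the up and down shifts of the fitted signal strength induced by the background nuisance parameter (the post-fit uncertainty of 1.0 is a placeholder).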

Table 2 List of systematic uncertainties considered in the analysis and their relative impact on the observed signal strength, evaluated as described in the text. The ‘model statistics’ uncertainty is dominated by the \(W \!+\!\text {jets}\) background

Some nuisance parameters are constrained by the data. For example, the normalisation uncertainty for \(W \!+\!\text {jets}\) events is reduced from 45% to 8%, because the assigned initial uncertainty is large and this background is well separated from \(tW\) and \(t\bar{t}\) events. By design of the discriminant, combinations of nuisance parameters that shift the peak in the \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) distribution are constrained, primarily the JES and the choice of renormalisation scale together with the amount of QCD radiation in the signal and the \(t\bar{t}\) background. The nuisance parameter for the NLO matching of \(tW\) and \(t\bar{t}\) is also constrained: the alternative matching method is not supported by the data, reducing the impact of this choice from 9% pre-fit to 3% post-fit.

Table 3 Post-fit signal and background and observed number of events in the signal region and the \(t\bar{t}\) validation region. The uncertainties include statistical plus all systematic uncertainties (cf. Sect. 7)
Fig. 7
figure 7

ad Post-fit distributions of the NN input variables, e NN discriminant and f \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) in the signal region. Small backgrounds are subsumed under ‘Other’. The dashed uncertainty band includes statistical and systematic uncertainties. The last bin includes the overflow events, except for e. The lower panels show the ratio of the observed and the predicted number of events in each bin.

A few nuisance parameters are pulled away from the pre-fit expectation. For the parameter associated with the choice of parton-shower generator, a blend of Pythia 6 and Herwig gives the best description of the data, while the nominal prediction is disfavoured at the two-sigma level. The \(b\)-tagging parameter with the largest effect on the overall \(b\)-tagging efficiency is pulled by about one sigma, corresponding to a decrease of about 1% to 2% in the \(b\)-tagging efficiency compared to the pre-fit expectation. Given that the \(b\)-tagging calibration partially relies on dijet events [61], which probe a different environment for the production of \(b\)-jets, such a pull is not unreasonable.

Table 3 shows the post-fit event yields of each process. The uncertainties in the yields are computed taking the correlations between nuisance parameters and processes into account. The post-fit estimates are well within the uncertainties of the pre-fit expectation (Table 1), while most of their uncertainties are reduced. The normalisation uncertainty for \(W\) + HF jets changes from almost 50% to about 10%.

Figure 7 shows the post-fit distributions for the NN input variables, the NN output response and the \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\) in the signal region. The post-fit plots use the parameter estimates obtained in the fit of the discriminant, including their uncertainties, and demonstrate a good description of the data.

Figure 8a shows that the data are well described by the model in the signal region. Figure 8b provides the strongest support for the validity of the fit result by comparing the expected and observed distributions in the \(t\bar{t}\) validation region. The uncertainty due to the extrapolation from the signal region is small, so this comparison provides a stringent test that the main background is well modelled.

Fig. 8
figure 8

Post-fit distributions of the discriminant in the a signal region and b validation region. Small backgrounds are subsumed under ‘Other’. The dashed uncertainty band includes statistical and systematic uncertainties. The lower panels show the ratio of the observed and the predicted number of events in each bin. The first three bins and the last ten bins correspond directly to (non-uniform) bins of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\). In between are four blocks of eight bins, corresponding to the NN output in slices of \(m\mathopen {}\left( {W_H {}}\right) \mathclose {}\). Inside each of the blocks, the numbers of events are scaled by a factor of four (factor of two in 4j2b) for better visibility

10 Conclusion

The inclusive cross-section for the production of a single top quark in association with a \(W\) boson in the single-lepton channel is measured using an integrated luminosity of \(20.2\,\text {fb}^{-1}\) of \(\sqrt{s}={8}\,{\text {TeV}}\) proton–proton collision data collected by the ATLAS detector at the LHC in 2012. A neural network is used to separate the signal from the \(t\bar{t}\) background. A two-dimensional discriminant, built from the neural-network response and the mass of the hadronically decaying \(W\) boson, is used to extract the cross-section. Evidence for \(tW\) production in the single-lepton channel is obtained with an observed (expected) significance of 4.5 (3.9) standard deviations. The measured cross-section is:

$$\begin{aligned} \sigma _{tW}^{\text {obs}} = 26 \pm 7\,\text {pb}, \end{aligned}$$

which is consistent with the SM expectation of \(\sigma _{tW}^{\text {th.}} = 22.4 \pm 1.5\,\text {pb}\).