1 Introduction

The experimental observation of neutrino oscillation provides conclusive evidence that neutrinos have non-zero masses which are much smaller than those of the charged leptons [1]. While in the Standard Model (SM) the charged fermions acquire masses by coupling to the Higgs boson, the origin of neutrino masses is still unknown. An elegant extension of the SM, accounting for the smallness of neutrino masses compared to those of the other fundamental particles, is the seesaw mechanism [2,3,4,5,6]. It introduces a neutrino mass matrix with Majorana mass terms and very small neutrino mass eigenvalues, emerging from the existence of a heavy neutrino partner for each of the light neutrino species. Several models with different particle content implement the seesaw mechanism (see e.g. Ref. [7] for a concise overview). Among these, the type-III model [8] introduces at least one extra triplet of heavy fermionic fields with zero hypercharge in the adjoint representation of SU(2)\(_{\text {L}}\), which couple to electroweak (EW) gauge bosons and generate neutrino masses through Yukawa couplings to the Higgs boson and neutrinos. Consequently, these new charged and neutral heavy leptons could be produced via EW processes in proton–proton (pp) collisions at the Large Hadron Collider (LHC).

Several searches for type-III seesaw heavy leptons have been performed in different decay channels. The search presented in this paper extends a similar search performed by ATLAS in Run 1 at \(\sqrt{s} = 8 \, {\text {TeV}}\) [9], using the same final state of two light leptons and two jets, which excluded heavy leptons with masses below \(335 \, {\text {GeV}}\). In Run 1 this search was complemented by another ATLAS search for heavy leptons in the three-lepton final state [10], which excluded heavy-lepton masses below \(470 \, {\text {GeV}}\). A search by the CMS Collaboration using its full Run 2 dataset at \(\sqrt{s} = 13 \, {\text {TeV}}\) [11], focusing on final states with at least three leptons, set limits on the mass of type-III seesaw heavy leptons of up to \(880 \, {\text {GeV}}\). The analysis presented in this paper, based on the full ATLAS Run 2 dataset, is the first Run 2 search by an LHC experiment for type-III seesaw heavy leptons in the dilepton signature. Substantial refinements relative to the published ATLAS Run 1 analysis have been made in the background estimation and the selection of signal candidate events, as well as in the theoretical signal modelling and the next-to-leading-order calculation of the signal cross-section, all of which significantly improve the sensitivity of this analysis.

A minimal type-III seesaw model is used to optimise the analysis strategy and interpret the search results, as in Ref. [12]. This model introduces a single fermionic triplet with unknown (heavy) masses for one neutral and two oppositely charged leptons, denoted by \((L^{+}, L^{-}, N^{0})\). Here \(L^{+}\) is the antiparticle of \(L^{-}\) and \(N^{0}\) is a Majorana particle. These heavy leptons decay into a SM lepton and a \(W\), \(Z\) or \(H\) boson. The heavy leptons are assumed to be degenerate in mass. This assumption does not limit the generality of results within the context of type-III seesaw models because the theoretical calculations predict only a small mass-splitting due to radiative corrections and the resulting transitions between heavy leptons are highly suppressed [13].

The remaining degrees of freedom in the simplified model are the unknown mixing angles \( V_\alpha \) (\( \alpha = e,\mu ,\tau \)) between the SM leptons and the new heavy-lepton states, which enter only in the expressions for the \(L^{\pm }\) and \(N^{0}\) decay widths. The production cross-section does not depend on the \( V_\alpha \) parameters because the heavy leptons are produced through their coupling to the EW bosons. Only the branching fraction \({\mathcal {B}}_\alpha \) of the \(L^{\pm }\) and \(N^{0}\) decays to lepton flavour \( \alpha \) depends on the values of the mixing angles, and is given by \({\mathcal {B}}_\alpha = |V_\alpha |^2/(|V_e|^2 + |V_\mu |^2 + |V_\tau |^2) \). In this analysis the decay branching ratios are assumed to be equal for all three lepton flavours, with \({\mathcal {B}}_e = {\mathcal {B}}_\mu = {\mathcal {B}}_\tau = 1/3 \).
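For illustration, this relation between the mixing angles and the branching fractions can be written as a minimal sketch; the numerical mixing-angle values below are hypothetical placeholders, not measured quantities.

```python
# Minimal sketch: heavy-lepton decay branching fractions to each SM
# lepton flavour, as determined by the (unknown) mixing angles V_alpha.
# The numerical values used below are hypothetical placeholders.

def branching_fractions(V_e: float, V_mu: float, V_tau: float) -> dict:
    """B_alpha = |V_alpha|^2 / (|V_e|^2 + |V_mu|^2 + |V_tau|^2)."""
    total = abs(V_e) ** 2 + abs(V_mu) ** 2 + abs(V_tau) ** 2
    return {
        "e": abs(V_e) ** 2 / total,
        "mu": abs(V_mu) ** 2 / total,
        "tau": abs(V_tau) ** 2 / total,
    }

# Equal mixing angles reproduce the flavour-democratic assumption used
# in this analysis: B_e = B_mu = B_tau = 1/3.
print(branching_fractions(1.0, 1.0, 1.0))
```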

The dominant production mechanism for type-III seesaw heavy leptons in \(pp\) collisions is pair production through \( q{\bar{q}} \rightarrow W^{*} \rightarrow N^{0} L^{\pm } \), and the largest branching fraction is the one with two \(W\) bosons in the final state: \( N^{0} \rightarrow W^{\pm } \ell ^\mp \, (\ell =e,\mu ,\tau ) \), and \( L^{\pm } \rightarrow W^{\pm } \nu \, (\nu =\nu _e,\nu _\mu ,\nu _\tau ) \). Branching fractions through intermediate \(W\) bosons range from approximately 80% down to approximately 60% over the heavy-lepton mass range. This search is optimised for the dominant processes \( pp \rightarrow N^{0} L^{\pm } \rightarrow W^{\pm } \ell ^\mp W^{\pm } \nu \), where one \(W\) boson decays leptonically and the other decays hadronically (Fig. 1). However, all \( pp \rightarrow N^{0} L^{\pm } \) and \( pp \rightarrow L^{\pm } L^{\mp } \) processes with two or more leptons in the final state are considered to form part of the signal. Only final states containing electrons and muons are considered, including those from leptonic \( \tau \) decays.

Fig. 1 Feynman diagrams of the dominant production process in the (a) opposite-charge and (b) same-charge final-state cases

The distinctive topology of the final state consists of two jets from the hadronically decaying \(W\) boson, large missing transverse momentum, and a lepton pair with either same-sign (SS) or opposite-sign (OS) charge, in either a same-flavour (ee or \( \mu \mu \)) or different-flavour (\( e\mu \)) combination.

The ATLAS detector is described in Sect. 2. The data and simulated events used are outlined in Sect. 3, and the event reconstruction procedure is detailed in Sect. 4. The analysis strategy is presented in Sect. 5, followed by a discussion of the background estimation in Sect. 6 and of systematic uncertainties in Sect. 7. Finally, the statistical analysis and results are presented in Sect. 8, followed by the conclusions in Sect. 9.

2 ATLAS detector

The ATLAS detector [14] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and nearly \( 4\pi \) coverage in solid angle around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroidal magnets.

The inner detector is immersed in a \(2 \, {\text {T}}\) axial magnetic field and provides charged-particle tracking in the range \( |\eta | < 2.5 \). The high-granularity silicon pixel detector covers the vertex region and typically provides four measurements per track, the first hit normally being in the insertable B-layer, which was installed before Run 2 [15, 16]. It is followed by the silicon microstrip tracker, which usually provides eight measurements per track. These silicon detectors are complemented by the transition radiation tracker (TRT), which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The TRT also provides electron-identification information based on the fraction of hits (typically 30 in total) above a higher energy-deposit threshold corresponding to that of transition radiation.

The calorimeter system covers the pseudorapidity range \( |\eta | < 4.9 \). Within the region \( |\eta | < 3.2 \), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) sampling calorimeters, with an additional thin LAr presampler covering \( |\eta | < 1.8 \) to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \( |\eta | < 1.7 \), and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules optimised for electromagnetic and hadronic measurements, respectively.

The muon spectrometer comprises separate trigger and high-precision tracking chambers which measure the deflection of muons in a magnetic field generated by the superconducting air-core toroids. The field integral of the toroids ranges between 2.0 and \(6.0 \, \hbox {Tm}\) across most of the muon spectrometer. A set of precision chambers covers the region \( |\eta | < 2.7 \) with three layers of monitored drift tubes, complemented by cathode-strip chambers in the forward region, where the background is highest. The muon trigger system covers the range \( |\eta | < 2.4 \) with resistive-plate chambers in the barrel and thin-gap chambers in the endcap regions.

A two-level trigger system is used to select interesting events [17]. The first-level trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to at most \(100 \, {\text {kHz}}\). This is followed by a software-based trigger which further reduces the event rate to approximately \(1 \, {\text {kHz}}\).

3 Data and simulated events

The data used in this analysis correspond to proton–proton collisions at \( \sqrt{s} = 13 \, {\text {TeV}} \) with proton bunches colliding every \(25 \, {\text {ns}}\). Data quality criteria are applied to ensure that the considered events were recorded with stable beam conditions and with all relevant sub-detector systems operational. After data quality requirements, the samples used for this analysis correspond to \(3.2 \, {\text {fb}}^{-1}\), \(33.0 \, {\text {fb}}^{-1}\), \(44.3 \, {\text {fb}}^{-1}\), and \(58.5 \, {\text {fb}}^{-1}\) of integrated luminosity recorded in 2015, 2016, 2017, and 2018, respectively, for a total of \(139 \, {\text {fb}}^{-1}\). In the four years of data taking, the average number of interactions per bunch crossing was 13, 25, 38, and 36, respectively. The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [18], obtained using the LUCID-2 detector [19] for the primary luminosity measurements.

Samples of signal and background processes were simulated using several Monte Carlo (MC) generators. The signal of the simplified type-III seesaw model is implemented in the MadGraph5_aMC@NLO [20] generator at leading order (LO) using FeynRules [21]. For the simulated signal production, MadGraph5_aMC@NLO was interfaced to Pythia 8.212 [22] for parton showering, using the A14 set of tuned parameters [23]. The NNPDF3.0lo [24] parton distribution function (PDF) set was used in the matrix element calculation, and the NNPDF2.3lo set [25] in the parton shower. The signal cross-section and its uncertainty are calculated separately at next-to-leading-order (NLO) plus next-to-leading-logarithmic (NLL) accuracy, assuming that the heavy leptons, \(L^{\pm }\) and \(N^{0}\), are \( {\text {SU}}(2) \) triplet fermions [26, 27]. Simulated SM background samples include top quark pair (\(t{\bar{t}}\)) and diboson (\( VV,~V = Z, W \)) production processes, which are the dominant ones, as well as Drell–Yan (\( q{\bar{q}}\rightarrow Z/\gamma ^{*} \rightarrow \ell ^{+} \ell ^{-} \, (\ell =e,\mu ,\tau ) \)), triboson (VVV), single top quark, and rare top quark (\( t{\bar{t}}+ V \)) production processes. The non-dominant processes are grouped into the ‘other’ background category throughout this paper.

The generators used in the MC sample production and the cross-section calculations used for MC sample normalisation are provided in Table 1. In order to minimise the theoretical uncertainties, the normalisation of the MC samples modelling dominant backgrounds is considered as a free parameter in the final likelihood fit, as described in Sect. 8.

Table 1 The event generator, parton shower generator, cross-section normalisation, and PDF set used for the matrix element (ME) calculation are shown for each simulated signal and background event sample. The generator cross-section is used where not specifically stated otherwise

The production of \(t{\bar{t}}\) events was modelled using the Powheg-Box v2 generator [28,29,30,31], which provides matrix elements at NLO in the strong coupling constant \(\alpha _{{\text {S}}}\) with the NNPDF3.0nlo PDF. The \(h_{\mathrm {damp}}\) parameter in Powheg controls the matching between the matrix element and the parton shower, and effectively regulates the high-\(p_{{\text {T}}}\) radiation against which the \(t{\bar{t}}\) system recoils. It was set to \(1.5 m_{{\mathrm {top}}} \) [32], using a top quark mass of \(m_{{\mathrm {top}}} = 172.5 \, {\text {GeV}} \). The renormalisation and factorisation scales were set to \( \sqrt{m_{\text {top}}^2 + p_{\text {T, top}}^2} \), where \(p_{\text {T, top}} \) is the transverse momentum of the simulated top quark in the event. The events were interfaced with Pythia 8.230 [22] for the parton shower and hadronisation, using the A14 set of tuned parameters and the NNPDF2.3lo set of PDFs. The decays of bottom and charm hadrons were simulated using EvtGen v1.6.0 [33]. The \(t{\bar{t}}\) sample is initially normalised to the cross-section prediction at next-to-next-to-leading order (NNLO) in QCD including the resummation of next-to-next-to-leading-logarithmic (NNLL) soft-gluon terms calculated using Top++2.0 [34,35,36,37,38,39,40], while the final yield is extracted from the data, as described in Sect. 5. For \(pp\) collisions at a centre-of-mass energy of \( \sqrt{s} = 13 \, {\text {TeV}} \), this cross-section corresponds to \( \sigma {(t{\bar{t}})}_{{\text {NNLO+NNLL}}} = 832\pm 51~{\mathrm {pb}} \).

Samples of diboson final states (VV) were simulated with the Sherpa v2.2.1 or v2.2.2 generator [41], as detailed in Table 1, including off-shell effects and Higgs boson contributions, where appropriate. Fully leptonic final states and semileptonic final states, where one boson decays leptonically and the other hadronically, were generated using matrix elements at NLO accuracy in QCD for up to one additional parton and at LO accuracy for up to three additional parton emissions. Samples for the loop-induced processes \( gg \rightarrow VV \) were generated using LO-accurate matrix elements for emission of up to one additional parton for both the fully leptonic and semileptonic final states. Electroweak production of a diboson pair in association with two jets (VVjj) was simulated with matrix elements at LO accuracy. The matrix element calculations were matched and merged with the Sherpa parton shower based on Catani–Seymour dipole factorisation [42, 43] using the MEPS@NLO prescription [44,45,46,47]. The virtual QCD corrections were provided by the OpenLoops library [48, 49]. The NNPDF3.0nnlo set of PDFs was used [24], along with the dedicated set of tuned parton-shower parameters developed by the Sherpa authors.

Other background processes, such as single-top production, \( t{\bar{t}}+ V \) processes and the Drell–Yan \( Z/\gamma ^{*}\rightarrow e^{+} e^{-}/\mu ^{+} \mu ^{-}/\tau ^{+} \tau ^{-} \) process, do not contribute significantly to the regions defined in this analysis and are outlined only in Table 1.

For all samples, a full simulation of the ATLAS detector response [50] was performed using the Geant 4 toolkit [51]. The effect of multiple interactions in the same and neighbouring bunch crossings (pile-up) was modelled by overlaying the original hard-scattering event with simulated inelastic pp events generated with Pythia 8.186 using the NNPDF2.3lo set of PDFs and the A3 set of tuned parameters [52]. The MC events are weighted to reproduce the distribution of the average number of interactions per bunch crossing (\( \left<\mu \right> \)) observed in the data. The \( \left<\mu \right> \) value in MC is rescaled by a factor of \( 1.03 \pm 0.07 \) to improve the level of agreement between data and simulation in the visible inelastic pp cross-section [53].
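The following sketch illustrates this pile-up reweighting idea under simplifying assumptions (binned \( \left<\mu \right> \) histograms and a standalone helper function); it is not the ATLAS software implementation.

```python
import numpy as np

# Schematic pile-up reweighting: MC events receive weights so that the
# simulated <mu> distribution matches the one observed in data. The MC
# <mu> values are first rescaled by 1.03, as described in the text.
# The helper and its inputs are illustrative placeholders.

def pileup_weights(mu_mc: np.ndarray, mu_data: np.ndarray,
                   bins: np.ndarray, scale: float = 1.03) -> np.ndarray:
    """Per-event weights mapping the MC <mu> distribution onto data."""
    mu_mc = scale * mu_mc  # rescale MC <mu>, as described in the text
    h_data, _ = np.histogram(mu_data, bins=bins, density=True)
    h_mc, _ = np.histogram(mu_mc, bins=bins, density=True)
    ratio = np.divide(h_data, h_mc, out=np.zeros_like(h_data),
                      where=h_mc > 0)
    idx = np.clip(np.digitize(mu_mc, bins) - 1, 0, len(ratio) - 1)
    return ratio[idx]

# Toy usage with placeholder <mu> values:
rng = np.random.default_rng(0)
weights = pileup_weights(rng.poisson(30, 1000).astype(float),
                         rng.poisson(34, 1000).astype(float),
                         bins=np.arange(0.0, 71.0, 1.0))
```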

4 Event reconstruction

After the application of beam, detector and data-quality requirements, events containing jets failing to satisfy the quality criteria described in Ref. [54] are rejected to suppress events with large calorimeter noise or non-collision backgrounds. To further reject non-collision backgrounds originating from cosmic rays and beam-halo events, at least one reconstructed collision vertex is required with at least two associated tracks with transverse momentum (\(p_{{\text {T}}}\)) greater than \(500 \, \hbox {MeV}\). The \(pp\) vertex candidate with the highest sum of the squared transverse momenta of all associated tracks is identified as the primary vertex.

Electron candidates are reconstructed from energy deposits in the electromagnetic calorimeter associated with a charged-particle track measured in the inner detector. The electron candidates are required to pass the Tight likelihood-based identification selection [55,56,57], to have transverse momentum \( p_{{\text {T}}} > 40 \, {\text {GeV}} \) and to be in the fiducial volume of the inner detector, \( |\eta | < 2.47 \). The transition region between the barrel and endcap calorimeters (\( 1.37< |\eta | < 1.52 \)) is excluded because it has inefficient regions due to the presence of service conduits. The track associated with the electron candidate must have a transverse impact parameter (\(d_{0}\)) evaluated in the transverse plane at the point of closest approach of the track to the beam axis which satisfies \( |d_{0} |/\sigma (d_{0}) < 5 \), where \( \sigma (d_{0}) \) is the uncertainty in \(d_{0}\). A longitudinal impact parameter value of \( |z_{0}\sin (\theta )| < 0.5\, \hbox {mm} \) is also required for electrons, where \(z_{0}\) is the distance between the track and the primary vertex in the z direction and \( \theta \) is the polar angle of the track. The combined identification and reconstruction efficiency for Tight electrons ranges from 58 to 88% in the transverse energy (\(E_{{\text {T}}}\)) range from \(4.5 \, {\text {GeV}}\) to \(100 \, {\text {GeV}}\). In addition, the electron candidates must also satisfy the Loose isolation criteria, which use calorimeter-based and track-based isolation requirements with an efficiency of about 99%. Both reconstruction and isolation performance are evaluated in \(Z \rightarrow ee\) decay measurements, described in Ref. [58]. Electron candidates are discarded if their angular distance from a jet satisfies \( 0.2< \Delta R < 0.4 \).

Muon candidates are reconstructed by matching an inner-detector track with a track reconstructed in the muon spectrometer. The muon candidates are required to have \( p_{{\text {T}}} > 40 \, {\text {GeV}} \), \( |\eta | < 2.5 \) and a transverse impact parameter significance of \( |d_{0} |/\sigma (d_{0}) < 3 \). A longitudinal impact parameter value of \( |z_{0}\sin (\theta )| < 0.5\, \hbox {mm} \) is required for muons. Muon candidates with \(p_{{\text {T}}}\) lower than \(300 \, {\text {GeV}}\) are required to pass the Medium muon identification requirements, while for higher values of \(p_{{\text {T}}}\) a special HighPt muon identification is applied, which is optimised for searches in the high-\(p_{{\text {T}}}\) regime [59]. The muon candidates must also fulfil the TightTrackOnly track-based isolation requirements. This results in a reconstruction and identification efficiency of over 95% in the entire phase space, as measured in \(Z \rightarrow \mu \mu \) events [59]. Muons within \( \Delta R = 0.4 \) of the axis of a jet having more than three tracks and with \( p_{{\text {T}}} (\mu ) / p_{{\text {T}}} ({\mathrm {jet}}) < 0.5 \) are discarded. If a muon candidate overlapping with an electron leaves a sufficiently large energy deposit in the calorimeter and shares a track reconstructed in the inner detector with the electron, this muon candidate is also discarded.

Jets are reconstructed by clustering energy deposits in the calorimeter using the anti-\(k_{t}\) algorithm [60] with a radius parameter of 0.4. The measured jet transverse momentum is corrected for detector effects by recalibrating to the particle energy scale before the interaction with the detector [61]. For all jets, the expected average transverse energy contribution from pile-up is corrected using a subtraction method: the level of diffuse noise added to the event per unit area in \( \eta \)–\( \phi \) by pile-up is multiplied by the area of the jet in \( \eta \)–\( \phi \) space and subtracted, followed by a residual correction derived from MC simulation, as detailed in Refs. [61, 62]. To reduce the contamination from jets from pile-up, a ‘jet vertex tagger’ (JVT) algorithm [63] is used for jets with \( p_{{\text {T}}} < 60 \, \hbox {GeV} \) and \( |\eta | < 2.4 \). It employs a multivariate technique that relies on jet energy and tracking variables to determine the likelihood that the jet originates from pile-up. The Medium JVT working point is used, which has an average efficiency of 92%. Jets within \( \Delta R = 0.2 \) of an electron are discarded. Jets within \( \Delta R = 0.2 \) of a muon and featuring fewer than three tracks or having \( p_{{\text {T}}} (\mu ) / p_{{\text {T}}} ({\mathrm {jet}}) > 0.5 \) are also removed. Jets considered in this analysis are required to have \( p_{{\text {T}}} > 20 \, \hbox {GeV} \) and \( |\eta | < 2.5 \). Jets fulfilling these kinematic criteria are identified as containing b-hadrons and categorised as b-tagged jets if tagged by a multivariate algorithm which uses information about the impact parameters of inner-detector tracks matched to the jet, the presence of displaced secondary vertices, and the reconstructed flight paths of b- and c-hadrons inside the jet [64,65,66]. The algorithm is used at the working point providing a b-tagging efficiency of 77%, as determined in a sample of simulated \(t{\bar{t}}\) events. This corresponds to rejection factors of approximately 134, 6 and 22 for light-quark and gluon jets, c-jets, and hadronically decaying \( \tau \)-leptons, respectively.
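The lepton–jet overlap-removal rules quoted in this and the two preceding paragraphs can be summarised schematically as follows; the dictionaries standing in for reconstructed objects and the ordering of the steps are illustrative assumptions, not the exact ATLAS implementation.

```python
import numpy as np

# Sketch of the overlap-removal logic described in the text, operating
# on simplified object records (dicts with "pt", "eta", "phi", "n_trk").

def delta_r(a, b):
    dphi = np.mod(a["phi"] - b["phi"] + np.pi, 2.0 * np.pi) - np.pi
    return np.hypot(a["eta"] - b["eta"], dphi)

def remove_overlaps(electrons, muons, jets):
    # Jets within dR = 0.2 of an electron are discarded.
    jets = [j for j in jets
            if all(delta_r(j, e) >= 0.2 for e in electrons)]
    # Jets within dR = 0.2 of a muon with fewer than three tracks, or
    # carrying most of the muon pT, are also removed.
    jets = [j for j in jets
            if not any(delta_r(j, m) < 0.2 and
                       (j["n_trk"] < 3 or m["pt"] / j["pt"] > 0.5)
                       for m in muons)]
    # Electrons with 0.2 < dR < 0.4 from a surviving jet are discarded.
    electrons = [e for e in electrons
                 if not any(0.2 < delta_r(e, j) < 0.4 for j in jets)]
    # Muons within dR = 0.4 of a jet with more than three tracks and
    # pT(mu)/pT(jet) < 0.5 are discarded.
    muons = [m for m in muons
             if not any(delta_r(m, j) < 0.4 and j["n_trk"] > 3 and
                        m["pt"] / j["pt"] < 0.5 for j in jets)]
    return electrons, muons, jets
```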

Missing transverse momentum (with magnitude \(E_{{\text {T}}}^{{\text {miss}}}\)) is present in events with unbalanced kinematics in the transverse plane. This originates from undetected momentum in the event, due either to neutrinos escaping detection or to other particles falling outside the detector acceptance, being badly reconstructed, or failing reconstruction altogether. The \(E_{{\text {T}}}^{{\text {miss}}}\) is calculated as the modulus of the negative vectorial sum of the \(p_{{\text {T}}}\) of the fully calibrated and reconstructed physics objects in the event. Jets in the forward direction are considered if they satisfy \( p_{{\text {T}}} > 30 \, \hbox {GeV} \). An additional ‘soft term’ is added, accounting for all tracks in the event that are not associated with any reconstructed object, but are associated with the identified primary vertex. The use of this track-based soft term is motivated by improved performance in \(E_{{\text {T}}}^{{\text {miss}}}\) reconstruction in a high pile-up environment [67].
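Schematically, this definition can be expressed as below; the object records with pt and phi fields are illustrative placeholders for the calibrated physics objects and soft-term tracks.

```python
import numpy as np

# Schematic E_T^miss: magnitude of the negative vector sum of the pT of
# all calibrated objects plus the track-based soft term.

def missing_et(objects, soft_tracks):
    px = -sum(o["pt"] * np.cos(o["phi"]) for o in objects + soft_tracks)
    py = -sum(o["pt"] * np.sin(o["phi"]) for o in objects + soft_tracks)
    return np.hypot(px, py)

# Example: a single 100 GeV object is balanced by 100 GeV of E_T^miss
# pointing in the opposite direction.
print(missing_et([{"pt": 100.0, "phi": 0.0}], []))  # 100.0
```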

Correction factors to account for differences in the identification and selection efficiency, reconstructed energy and energy resolution between data and MC simulation are applied to the selected electrons, muons and jets, and consequently also when calculating \(E_{{\text {T}}}^{{\text {miss}}}\) in MC simulation, as described in Refs. [55, 57, 59, 61].

Table 2 A summary of all analysis regions defined in the text. The region definitions are the same for all analysis channel flavour combinations

5 Analysis strategy and event selection

All selected events, reconstructed as defined in Sect. 4, are required to match the signal topology defined in Sect. 1, with a separate analysis channel for each light-lepton flavour (ee, \( e\mu \), \( \mu \mu \)) and charge combination (SS, OS), resulting in six analysis channels. Events must pass dilepton triggers, whose \(p_{{\text {T}}}\) thresholds are optimised depending on the lepton flavour combination and the data-taking year. Thresholds range from 12 to \(22 \, {\text {GeV}}\) for the leading lepton \(p_{{\text {T}}}\) and from 8 to \(17 \, {\text {GeV}}\) for the sub-leading lepton \(p_{{\text {T}}}\). Compared to single-lepton triggers, the dilepton triggers have looser identification and isolation criteria, which minimises biases in the data-driven background estimates at the cost of only a marginal reduction in signal efficiency.

Selected events are then categorised into mutually exclusive analysis regions based on different sets of requirements for reconstructed objects. These regions are grouped according to their purpose into signal regions, control regions, and validation regions. Signal regions (SRs) are defined to achieve the best sensitivity to the targeted models with an enhanced signal-to-background ratio and are used to compare data with each signal hypothesis plus the expected background using the statistical methodology detailed in Sect. 8. The purpose of control regions (CRs) and validation regions (VRs) is to evaluate and validate the background contamination in the SRs. They are therefore constructed to minimise the signal presence while maximising the contribution of a distinct type of background, and are kept kinematically close to the SRs by modifying one or more kinematic requirements, while ensuring no overlap with the SRs or among themselves. In the CRs the background prediction is fit to data by assigning a floating normalisation parameter to the main background process. In the VRs the background estimation methods are validated by comparing the background model with data.

While the SR definitions are optimised to maximise the signal significance, the data in the SRs are not examined (they remain blinded) until the background modelling has been validated. Likewise, while validating the background predictions, the data are compared with the background estimates only in the CRs and VRs, which are designed to have negligible signal contamination.

All regions used in this analysis, with the corresponding selection criteria, are summarised in Table 2 and described below.

As the signal process contains neutrinos in the final state, one of the most important selection criteria is based on the \(E_{{\text {T}}}^{{\text {miss}}}\) significance \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}})\) [67]. The value of \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}})\) is calculated using a maximum-likelihood ratio method which considers the direction of the \(E_{{\text {T}}}^{{\text {miss}}}\) and the calibrated objects as well as their respective resolutions. The SR selection criteria are optimised to maximise the analysis sensitivity. This results in different selection values for the OS and SS channels due to different background compositions and associated event topologies. Consequently, the \(E_{{\text {T}}}^{{\text {miss}}}\) significance selection value is \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}}) > 10\) for the OS channels and \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}}) > 7.5\) for the SS channels.

At least two jets with \( p_{{\text {T}}} > 20 \, {\text {GeV}} \) and \( |\eta | < 2.5 \) are required for each region. A \(b{\text {-jet}}\) veto is applied to all the jets in SR events to suppress background from SM processes involving top quarks. For each signal event, the dijet invariant mass (\(m_{jj}\)) of the two highest-\(p_{{\text {T}}}\) jets is expected to be close to the \(W\) mass. The dijet invariant mass is thus required to be in the window \( 60 \, {\text {GeV}}< m_{jj} < 100 \, {\text {GeV}} \).

In the SR definition, a lower bound on the invariant mass of the lepton pair (\(m_{\ell \ell }\)) is introduced and is found to give optimal results at \(110 \, {\text {GeV}}\) (\(100 \, {\text {GeV}}\)) in the OS (SS) regions. This requirement removes Drell–Yan events around the \(Z \rightarrow ee\) peak, including SS events in which an electron charge has been misidentified.

Furthermore, in the SR definition for the OS analysis channels, the azimuthal angle \( \Delta \phi {(E_{{\text {T}}}^{{\text {miss}}}, \ell )}_{\mathrm {min}} \) between the direction of \(E_{{\text {T}}}^{{\text {miss}}}\) and that of the closest lepton is found to have good separation power between signal and background events, especially reducing the SR background contamination from \(t{\bar{t}}\) and diboson events. This is explained by the difference between the sources of the measured \(E_{{\text {T}}}^{{\text {miss}}}\) for signal and background, because the latter tends to have a spurious component due to misreconstructed jets. After optimisation, a requirement of \( \Delta \phi {(E_{{\text {T}}}^{{\text {miss}}}, \ell )}_{\mathrm {min}} > 1 \) is used. In the SR definition for the SS analysis channels, this criterion is not applied, because other criteria already reduce the \(t{\bar{t}}\) and diboson background contamination, and it is not very effective against the fake background, whose contribution is significantly larger in the SS analysis channels.

To further increase the expected signal significance, additional selection criteria are introduced: the dijet transverse momentum must fulfil \( p_{{\text {T}}} (jj) > 100 \, {\text {GeV}}\) (\(60 \, {\text {GeV}}\)) for the OS (SS) regions, while the dilepton transverse momentum must satisfy \( p_{{\text {T}}} (\ell \ell ) > 100 \, {\text {GeV}}\) for both OS and SS events. These requirements exploit the boosted decay topology of object pairs expected in the presence of heavy leptons.

Due to the high masses of the heavy leptons in the decay chain, the signal process is expected to contain high-\(p_{{\text {T}}}\) leptons, jets and neutrinos. Since the \(E_{{\text {T}}}^{{\text {miss}}}\) is the optimal measure of the high-\(p_{{\text {T}}}\) activity of the neutrinos escaping the detector, and the scalar sum \(H_{{\text {T}}}\) of the transverse momenta of selected leptons and jets estimates the activity of the measured objects, a combined \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} > 300 \, {\text {GeV}} \) requirement is applied in all analysis regions. The \(H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}}\) variable is also the observable used in the final likelihood fit, described in Sect. 8. The final signal selection efficiency relative to events with at least two leptons is about 1.2% for masses between 500 and \(900 \, {\text {GeV}}\) and drops below 1% for higher mass points.
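The chain of SR requirements described above can be condensed into a single selection function; this is a sketch over an assumed flat event record (the field names are illustrative), not the analysis framework.

```python
# Condensed sketch of the signal-region selection described in the
# text; threshold bookkeeping only. All momenta and masses in GeV.

def passes_sr(ev: dict, same_sign: bool) -> bool:
    if ev["met_significance"] <= (7.5 if same_sign else 10.0):
        return False
    if ev["n_jets"] < 2 or ev["n_bjets"] > 0:          # >= 2 jets, b-jet veto
        return False
    if not 60.0 < ev["m_jj"] < 100.0:                  # W-boson mass window
        return False
    if ev["m_ll"] <= (100.0 if same_sign else 110.0):  # dilepton mass bound
        return False
    if not same_sign and ev["dphi_met_lep_min"] <= 1.0:  # OS channels only
        return False
    if ev["pt_jj"] <= (60.0 if same_sign else 100.0):  # boosted dijet
        return False
    if ev["pt_ll"] <= 100.0:                           # boosted dilepton
        return False
    return ev["ht_plus_met"] > 300.0                   # HT + ETmiss
```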

The dominant background contribution (almost 50% of the total) in the OS channels is \(t{\bar{t}}\) production in which the two \(W\) bosons in the final state decay leptonically. In the SS channels the \(t{\bar{t}}\) events enter through charge misidentification and contribute at the level of 25%. To estimate the contribution from \(t{\bar{t}}\) decays, an OS control region enriched in top-quark events is defined (Top CR); no dedicated SS CR is used. In this region, the SR selection is modified by requiring the number of b-tagged jets to be \( N(b{\text {-jet}}) \ge 2\). To increase the statistical precision, all requirements on the transverse momenta, \( p_{{\text {T}}} (jj) \) and \( p_{{\text {T}}} (\ell \ell ) \), as well as on the azimuthal angle, \( \Delta \phi {(E_{{\text {T}}}^{{\text {miss}}}, \ell )}_{\mathrm {min}} \), are omitted, and the \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}})\) selection is relaxed to \( {\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}}) > 5 \). The normalisation of the \(t{\bar{t}}\) MC sample is a free parameter in the simultaneous likelihood fit across all CRs, as described in Sect. 8.

To obtain validation and control regions with a large fraction of diboson events, the dijet mass \(m_{jj}\) selection is inverted relative to the SR definition. Since the \(m_{jj}\) selection in the SR requires a pair of jets compatible with the \(W\) mass window, the inverted selection defines a kinematic sideband with negligible signal contamination that still selects events kinematically similar to those in the SR. In addition, all requirements on the transverse momenta, \( p_{{\text {T}}} (jj) \) and \( p_{{\text {T}}} (\ell \ell ) \), as well as on the azimuthal angle, \( \Delta \phi {(E_{{\text {T}}}^{{\text {miss}}}, \ell )}_{\mathrm {min}} \), are omitted, in order to ensure that there are enough events in the CRs. This selection defines the \(m_{jj}\) VR for the OS analysis channels, while for the SS analysis channels the \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}})\) selection is additionally relaxed to \({\mathcal {S}}(E_{{\text {T}}}^{{\text {miss}}}) > 5 \). The SS Diboson CR is then defined by the additional requirement \( 300 \, {\text {GeV}}< H_{{\text {T}}} +E_{{\text {T}}}^{{\text {miss}}} < 500 \, {\text {GeV}} \), while the remaining kinematic region with \(H_{{\text {T}}} +E_{{\text {T}}}^{{\text {miss}}} > 500 \, {\text {GeV}} \) is used as the SS \(m_{jj}\) VR. The diboson background contributes only slightly less (45%) than the \(t{\bar{t}}\) background to the contamination in the OS SR and more than half (55%) of the total background in the SS SR. The normalisation of the diboson MC sample is estimated in the SS Diboson CR for both OS and SS events, as the OS and SS diboson cross-sections are found to be the same within the systematic uncertainties. This is achieved by using the normalisation of the diboson MC sample as a free parameter in the simultaneous likelihood fit across all CRs, as described in Sect. 8. Two representative post-fit distributions of the \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} \) variable in the VRs are shown in Fig. 2. Good agreement between measured data and predictions is observed within uncertainties.

Fig. 2 Post-fit distributions of the \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} \) variable for data and SM background predictions in two validation regions, namely (a) the OS \( \mu \mu \) \( m_{jj} \) VR and (b) the SS ee \( m_{jj} \) VR. The hatched bands include all post-fit systematic uncertainties with the correlations between various sources taken into account. Errors on data are statistical only. The lower panel shows the ratio of the observed data to the estimated SM background

6 Background composition and estimation

The final-state topologies of the six analysis channels have different background compositions. Background estimation techniques, implemented consistently among the channels, are discussed in this section. In general, the expected number of background events and the associated kinematic distributions are derived with a mixture of data-driven methods and simulation.

Two conceptually different background categories are considered according to their source: irreducible and reducible backgrounds. Irreducible background sources are SM processes producing the same prompt final-state lepton pairs as the signal, with the dominant contributions in this analysis coming from \(t{\bar{t}}\) and diboson production. Irreducible-background predictions are obtained from the simulated event samples described in Sect. 3. To avoid double-counting between the irreducible background predictions, which are derived from MC, and the data-driven reducible background predictions, a dedicated algorithm is introduced: irreducible MC events are only used if simulated leptons originating in the hard process (prompt leptons) can be associated with their reconstructed lepton counterparts.

The reducible background sources are the processes where the leptons reconstructed in the analysis originate from either non-prompt leptons or other particles and thus the selected events contain at least one fake/non-prompt electron or muon. In addition, events where a lepton charge is misidentified (allowing the \(t{\bar{t}}\)  process to contribute to the background in the SS analysis channels) are also considered as a reducible background category.

The reducible background due to charge misidentification is estimated in the SS analysis channels that contain electrons; the effect of muon charge misidentification is found to be negligible. The modelling of charge misidentification in simulation deviates from that in data due to the complexity of the processes involved and its strong reliance on a precise description of the detector material. Consequently, charge reconstruction correction factors (scale factors) are derived by comparing the charge misidentification probability measured in data with the one in simulation, and are applied to the simulated background events to compensate for the differences. The charge misidentification probability is extracted by performing a likelihood fit on a dedicated \(Z \rightarrow ee\) data sample, as described in Ref. [58].
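As an illustration of how per-electron misidentification rates enter at the event level, consider the following sketch; the rate values are placeholders, and in the analysis the rates are measured in bins of \(p_{{\text {T}}}\) and \(\eta \) from the \(Z \rightarrow ee\) likelihood fit.

```python
# Minimal sketch: probability that an opposite-sign ee pair is
# reconstructed as same-sign, given per-electron charge-flip rates
# eps1 and eps2 (placeholder values).

def os_to_ss_probability(eps1: float, eps2: float) -> float:
    # Exactly one of the two electron charges must flip.
    return eps1 * (1.0 - eps2) + eps2 * (1.0 - eps1)

def charge_sf(eps_data: float, eps_mc: float) -> float:
    # Data/MC scale factor correcting the simulated misID rate.
    return eps_data / eps_mc

print(os_to_ss_probability(0.002, 0.01))  # ~0.012
```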

The ‘fake-background’ sources, namely non-isolated, non-prompt electrons and muons, are produced by secondary decays of light- or heavy-flavour mesons into light leptons embedded within jets. Although the b-jet veto significantly reduces the number of fake leptons from heavy-flavour decays, a fraction of these events still passes the analysis selection. For electrons, another significant component of fakes arises from photon conversions, as well as from jets that are reconstructed as electrons. MC samples are not used to estimate these background sources because the simulation of jet production and hadronisation has large intrinsic uncertainties. Instead, the fake-lepton background (namely, background from misidentified or non-prompt leptons) is estimated with a data-driven approach, known as the ‘fake factor’ method, as described in Ref. [68]. This method relies on determining the probability, or fake factor (F), for a fake lepton to be identified as a tight (T) lepton, corresponding to the default lepton selection. The fake factor is measured in dedicated fake-enriched regions, where events must pass single-lepton triggers, contain no b-jets and have only one reconstructed lepton that satisfies either the tight selection criteria or a relaxed (loose, L) selection. Loose electrons pass the Loose identification requirements but fail either the Tight identification or the isolation requirements. Loose muons, on the other hand, pass the Medium identification requirements but fail the isolation requirements. Electron and muon fake factors F are then defined as the ratio of the number of tight leptons to the number of loose leptons, parameterised as functions of \(p_{{\text {T}}}\) and \(\eta \). The fake-lepton background (containing at least one fake lepton) is then estimated in the SR by applying F as a normalisation correction to the relevant distributions in a template region which has the same selection criteria as the SR except that at least one of the two leptons must pass the loose selection but fail the tight one.

The number of events in the SR that contain at least one fake lepton, \( N^{{\mathrm {fake}}} \), is estimated from the adjacent regions, which can be labelled by the lepton pair passing/failing the nominal requirements as (TL, LT, LL), where the first lepton is leading and the second is sub-leading in \(p_{{\text {T}}}\). Data events are weighted with fake factors according to the loose-lepton multiplicity of the region:

$$N^{\mathrm {fake}} = {\left[ F(N_{TL} + N_{LT}) - F^{2}N_{LL} \right] }_{\mathrm {data}} - {\left[ F(N_{TL} + N_{LT}) - F^{2}N_{LL} \right] }^{{\mathrm {prompt}}~\ell ~{\mathrm {only}}}_{{\mathrm {MC}}},$$

with \(N_{TL}\), \(N_{LT}\), \(N_{LL}\) denoting the number of events in the corresponding adjacent region. The prompt-lepton contribution is subtracted using the irreducible-background MC samples to account for the prompt-lepton contamination in the adjacent regions. The \(m_{jj}\) VR, as defined in Table 2, is used to verify the data-driven fake-lepton estimate.
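A direct transcription of this estimate for a single bin might look as follows; the event counts and the fake factor value are placeholder inputs.

```python
# Sketch of the fake-factor estimate defined by the equation above, for
# a single (pT, eta) bin. In the analysis, F is measured in dedicated
# fake-enriched regions and the prompt contribution is taken from MC.

def n_fake(F: float,
           n_tl_data: float, n_lt_data: float, n_ll_data: float,
           n_tl_mc: float, n_lt_mc: float, n_ll_mc: float) -> float:
    data_term = F * (n_tl_data + n_lt_data) - F**2 * n_ll_data
    prompt_term = F * (n_tl_mc + n_lt_mc) - F**2 * n_ll_mc
    return data_term - prompt_term

# Example with placeholder counts:
print(n_fake(0.2, 120, 95, 40, 30, 25, 5))  # estimated fake yield, 30.6
```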

The experimental systematic uncertainty of the data-driven fake-lepton background estimate is evaluated by rederiving the fake factor while varying the relevant systematic uncertainty sources by their uncertainties. These variations account for changes in the underlying composition (i.e. the relative contributions of different background processes) of the fake background across the analysed regions. Furthermore, the uncertainty in the normalisation of the simulated irreducible events is accounted for in the background estimation procedure when the prompt-lepton contribution is subtracted. The total uncertainty in the fake-lepton background varies between 10 and 20% across the analysed kinematic range.

7 Systematic uncertainties

Experimental and theoretical systematic uncertainty sources affecting both the background and signal predictions are accounted for in the analysis. The impact of systematic uncertainties on the total event yields as well as the changes in the shapes of kinematic distributions are taken into account when performing the statistical analysis in both the SRs and CRs, as described in Sect. 8.

Uncertainties in the predicted background yields and kinematic distributions of the MC samples (\(t{\bar{t}}\) and diboson production) arise from variations in parton-shower parameters, uncertainties in the QCD renormalisation and factorisation scales, the uncertainty in \( \alpha _{{\text {S}}} (m_{Z}) \), and uncertainties in the PDFs used. All listed uncertainties are calculated using the PDF4LHC prescription [69]. Since the yields of the \(t{\bar{t}}\) and diboson backgrounds are derived from the likelihood fit to the data in the CRs, these systematic variations contribute by changing the shapes of the background predictions used in the likelihood fit in the CRs and SRs.

The effect of the uncertainty in the strong coupling constant \( \alpha _{{\text {S}}} (m_{Z}) = 0.118 \) is assessed by variations of \( \pm 0.001 \). Uncertainties from missing higher orders are evaluated using seven variations of the QCD factorisation and renormalisation scales in the matrix elements, varied independently by factors of up to two [70]. Uncertainties in the nominal PDF set used in the MC generation are evaluated from 100 variations using the LHAPDF6 toolkit [71]. Additional uncertainties in the \(t{\bar{t}}\) acceptance due to the choice of PDF set are calculated using the central values of the MSTW2008 68% CL NNLO [72, 73], CT10 NNLO [74, 75] and NNPDF2.3lo 5f FFN [25] PDF sets. In the Sherpa diboson samples, the central values of the CT14 [76] and MMHT2014 [77] PDF sets are used.
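A common way to combine such scale variations is to take the envelope of the varied predictions relative to the nominal one; the sketch below assumes the standard seven-point set and placeholder yields.

```python
# Sketch of a seven-point QCD scale-variation envelope: muR and muF are
# varied independently by factors of 0.5 and 2, omitting the two
# opposite-direction extremes. `yields` maps (muR factor, muF factor)
# to a predicted yield in some bin; the numbers are placeholders.

SEVEN_POINT = [(1, 1), (2, 1), (0.5, 1), (1, 2), (1, 0.5), (2, 2), (0.5, 0.5)]

def scale_envelope(yields: dict) -> tuple:
    """Return (down, up) fractional uncertainties w.r.t. the nominal."""
    nominal = yields[(1, 1)]
    variations = [yields[v] for v in SEVEN_POINT]
    return (min(variations) / nominal - 1.0,
            max(variations) / nominal - 1.0)

yields = {(1, 1): 100.0, (2, 1): 104.0, (0.5, 1): 97.0, (1, 2): 102.0,
          (1, 0.5): 98.0, (2, 2): 106.0, (0.5, 0.5): 95.0}
print(scale_envelope(yields))  # (-0.05, 0.06)
```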

The uncertainty due to initial-state-radiation (ISR) modelling is estimated by comparing the nominal \(t{\bar{t}}\) sample with two additional samples, produced by simultaneously varying the factorisation and renormalisation scales by a factor of two [78]. Similarly, the impact of final-state-radiation (FSR) modelling is evaluated by a reweighting procedure, using a variation of the FSR parton shower in which the renormalisation scale for QCD emission is varied by factors of 0.5 and 2.0. The impact of the parton-shower and hadronisation model is evaluated by comparing the nominal generator set-up with a sample showered with Herwig 7.1.3 [79, 80], using the Herwig 7.1 default set of tuned parameters [80] and the MMHT2014 PDF set. To assess the uncertainty in the matching of NLO matrix elements and the parton shower, the Powheg sample is compared with a sample of events generated with MadGraph5_aMC@NLO v2.6.0 interfaced with Pythia 8.230.

For the signal samples, only uncertainties in the QCD scales and the PDFs used are considered. As the signal normalisation is the unknown parameter being measured, only their effect on the acceptance is taken into account in the limits.

A significant contribution to the total background uncertainty arises from the statistical uncertainty in the data and MC samples. The corresponding data statistical uncertainty in the signal regions varies from 29 to 47%, and the MC statistical uncertainty from 7 to 10%, depending on the region considered. Additionally, the 1.7% uncertainty in the combined 2015–2018 integrated luminosity, as defined in Sect. 3, is taken into account.

Fig. 3 Distributions of \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} \) in the opposite-sign signal regions, namely (a) the electron–electron, (b) the electron–muon, and (c) the muon–muon signal region, after the background-only fit described in the text. The coloured lines correspond to signal samples with the \(N^{0}\) and \(L^{\pm }\) mass stated in the legend. The hatched bands include all post-fit systematic uncertainties with the correlations between various sources taken into account. Errors on data are statistical only. The lower panel shows the ratio of the observed data to the estimated SM background

Fig. 4 Distributions of \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} \) in the same-sign signal regions, namely (a) the electron–electron, (b) the electron–muon, and (c) the muon–muon signal region, after the background-only fit described in the text. The coloured lines correspond to signal samples with the \(N^{0}\) and \(L^{\pm }\) mass stated in the legend. The hatched bands include all post-fit systematic uncertainties with the correlations between various sources taken into account. Errors on data are statistical only. The lower panel shows the ratio of the observed data to the estimated SM background

Experimental systematic uncertainties due to different lepton reconstruction, identification, charge identification, isolation, and trigger efficiencies in data compared to those in simulation are also taken into account [57, 59], as are uncertainties in absolute lepton energy calibration [55]. Likewise, experimental systematic uncertainties due to different reconstruction and \(b{\text {-tagging}}\) efficiencies for reconstructed jets in data and simulation are also taken into account [64,65,66], as well as absolute jet and \(E_{{\text {T}}}^{{\text {miss}}}\) energy scale and resolution uncertainties [61, 67]. The uncertainty in the pile-up simulation, derived from comparison of data with simulation, is also taken into account [63]. The uncertainty in the data-driven estimate of the fake-lepton background is evaluated by varying the fake factor as described in Sect. 6. All experimental systematic uncertainties discussed here affect the signal samples as well as the background.

8 Statistical analysis and results

The statistical analysis package HistFitter [81] is used to implement a binned maximum-likelihood fit in all control and signal regions, described in Sect. 5, to obtain the numbers of signal and background events. Validation regions are used to cross-check the fit results but are not included in the fit itself. For the likelihood fit, the CRs and SRs are binned in the \( H_{{\text {T}}} +E_{{\text {T}}}^{{\text {miss}}} \) variable in bins uniformly defined in \( \log (H_{{\text {T}}} +E_{{\text {T}}}^{{\text {miss}}}) \) in the range \( 300 \, {\text {GeV}}< (H_{{\text {T}}} +E_{{\text {T}}}^{{\text {miss}}}) < 2 \, {\text {TeV}} \), where the last bin also includes any overflow values. The SRs have six bins, and the Top CRs and Diboson CRs each have two bins. The VRs have four and three bins for OS and SS events, respectively.
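The SR binning can be reproduced with a one-line computation; the sketch below assumes simple geometric spacing between 300 GeV and 2 TeV.

```python
import numpy as np

# Six bins uniform in log(HT + ETmiss) between 300 GeV and 2 TeV; when
# histograms are filled, the last bin also collects overflow.
sr_edges = np.geomspace(300.0, 2000.0, num=7)  # 6 bins -> 7 edges
print(np.round(sr_edges, 1))
# -> [ 300.   411.6  564.6  774.6 1062.6 1457.8 2000. ]
```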

The binned likelihood function used in the final maximum-likelihood fit is a product of Poisson probability terms, one per bin of the fitted distributions, comparing the observed number of events with the expected background plus each signal hypothesis. Gaussian distributions are used to constrain the nuisance parameters associated with the systematic uncertainties; their widths correspond to the magnitudes of these uncertainties. The systematic uncertainties of the sources listed in Sect. 7 are propagated to the uncertainties of all bins in the fit and mapped to the nuisance parameters. Poisson distributions are used to model the MC statistical uncertainties.
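A toy version of such a likelihood, illustrating its structure only (not the HistFitter implementation), is sketched below; the templates, the number of nuisance parameters and the multiplicative treatment of the systematic shifts are simplifying assumptions.

```python
import numpy as np
from scipy.stats import norm, poisson

# Toy binned likelihood: Poisson term per bin, unit-Gaussian constraint
# per nuisance parameter. All templates below are placeholders.

def log_likelihood(mu, theta, n_obs, s, b, delta):
    """mu: signal strength; theta: nuisance parameters; s, b: signal and
    background templates; delta: relative shifts (n_sources x n_bins)."""
    expected = mu * s + b * np.prod(1.0 + theta[:, None] * delta, axis=0)
    ll = poisson.logpmf(n_obs, expected).sum()   # per-bin Poisson terms
    ll += norm.logpdf(theta).sum()               # Gaussian constraints
    return ll

# Toy usage with six bins and two systematic sources:
n_obs = np.array([12, 9, 5, 3, 1, 0])
s = np.array([0.5, 1.0, 1.5, 2.0, 1.0, 0.5])
b = np.array([11.0, 8.0, 5.0, 2.5, 1.0, 0.4])
delta = np.array([[0.05] * 6, [0.10] * 6])
print(log_likelihood(0.0, np.zeros(2), n_obs, s, b, delta))
```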

The final normalisations of the \(t{\bar{t}}\)  and diboson MC samples are not taken from MC calculations but are derived in the simultaneous likelihood fit to the data in the dedicated Top and Diboson CRs, as introduced in Sect. 5. Consequently, additional free parameters in the likelihood are introduced for the \(t{\bar{t}}\)  and diboson background contributions, to fit their yields in the Top and Diboson control regions, respectively. Fitting the yields of the largest backgrounds reduces the systematic uncertainty in the predicted yield from SM sources. The fitted normalisations are compatible with their SM predictions within the uncertainties.

Post-fit binned distributions of \( H_{{\text {T}}} + E_{{\text {T}}}^{{\text {miss}}} \) are shown in Figs. 3 and 4 for the signal regions. After the fit, the compatibility between the data and the expected background is assessed: good agreement is observed, with a p-value of 0.5 for the background-only hypothesis. In the absence of a significant deviation from expectations, 95% confidence level (CL) upper limits on the signal production cross-section are derived using the \( {\mathrm {CL}}_{\mathrm {s}} \) method [82], with pseudo-experiments used to set the limit. All the plots and tables presented in this paper are derived from a likelihood fit assuming no signal presence, in which only the background normalisations and the corresponding sources of systematic uncertainty are fitted (called a background-only fit).
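For a single counting bin, the CLs construction with pseudo-experiments can be illustrated as follows; the counts, the choice of test statistic and the one-bin simplification are assumptions of this sketch, whereas the analysis uses the full binned likelihood.

```python
import numpy as np

rng = np.random.default_rng(1)

def cls(n_obs: int, s: float, b: float, n_toys: int = 100_000) -> float:
    """Toy CLs for a one-bin counting experiment, using the observed
    count itself as the test statistic (the analysis uses a likelihood
    ratio over all bins)."""
    toys_sb = rng.poisson(s + b, n_toys)      # signal + background toys
    toys_b = rng.poisson(b, n_toys)           # background-only toys
    p_sb = np.mean(toys_sb <= n_obs)          # CL_{s+b}
    one_minus_pb = np.mean(toys_b <= n_obs)   # CL_b
    return p_sb / one_minus_pb

# A signal hypothesis is excluded at 95% CL when CLs < 0.05.
print(cls(n_obs=3, s=8.0, b=4.0))  # ~0.005 -> excluded in this toy
```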

Figure 5 shows good agreement, within uncertainties, between the expected background and observed events in all the regions and channels considered in the analysis. The level of agreement between the data and background predictions in the CRs and VRs, being well within the statistical and systematic uncertainties, demonstrates the validity of the background estimation procedures.

Fig. 5 The numbers of observed and expected events in the control, validation, and signal regions for all channels, split by lepton flavour and electric charge combination. The background expectation is the result of the background-only fit described in the text. The hatched bands include all post-fit systematic uncertainties with the correlations between various sources taken into account. Errors on data are statistical only. The lower panel shows the ratio of the observed data to the estimated SM background

Fig. 6 Relative contributions of different sources of statistical and systematic uncertainty in the total background yield estimation after the fit. Systematic uncertainties are calculated in an uncorrelated way by shifting in turn only one nuisance parameter from its post-fit value by one standard deviation, keeping all the other parameters at their post-fit values, and comparing the resulting event yield with the nominal yield. Validation regions are not included in the fit. Individual uncertainties can be correlated and do not necessarily add in quadrature to the total background uncertainty, which is indicated by ‘Total uncertainty’

The total relative systematic uncertainty in the background yields after the fit, and its breakdown into components, is presented in Fig. 6. The contributions of theoretical and experimental uncertainties in the SRs are both of the order of 10%. The dominant uncertainty is due to the limited number of simulated events. It is reduced in the final fit combination, yielding a total uncertainty of less than 20%. After the fit, none of the nuisance parameters are significantly pulled from their central values, nor are their uncertainties significantly constrained.

Table 3 The number of expected background events in signal regions after the likelihood fit, compared with the data. Uncertainties correspond to the total uncertainties in the predicted event yields and can be smaller than the value of individual contributions summed in quadrature because the latter can be anti-correlated. Due to rounding, the totals can differ from the sums of components. The \(t{\bar{t}}\) and diboson normalisations are allowed to float in the fit. As a reference, representative signal yields and their errors for signal masses \(m(N^{0}, L^{\pm })\), denoted by m in the table, are shown in the bottom part of the table

Predicted numbers of background events in signal regions are listed together with the observed number of events in data in Table 3. Expected signal yields for several mass points are given for comparison.

The upper limits on the production cross-sections of the processes \( pp \rightarrow W^{*} \rightarrow N^{0} L^{\pm } \) and \( pp \rightarrow Z^{*} \rightarrow L^{\pm } L^{\mp } \) at the 95% CL are evaluated as a function of the heavy-lepton mass. The resulting exclusion limits on the signal cross-section are shown in Fig. 7. By comparing the upper limits on the cross-section with the theoretical dependence of the cross-section on the heavy-lepton mass, a lower limit on the mass of the type-III seesaw heavy leptons \(N^{0}\) and \(L^{\pm }\) can be derived. The expected lower limit is \(820_{-60}^{+40} \, {\text {GeV}}\), where the uncertainties on the limit are extracted from the \( \pm 1 \, \sigma \) band, while the observed lower limit on the mass is \(790 \, {\text {GeV}}\), excluding mass values below this point.

Fig. 7 Expected and observed 95% \({\mathrm {CL}}_{\mathrm {s}} \) exclusion limits for the type-III seesaw process with the corresponding one- and two-standard-deviation bands, showing the 95% CL upper limit on the cross-section. The theoretical signal cross-section prediction, given by the NLO calculation [26, 27], is shown as a red line with the corresponding uncertainty band

9 Conclusion

The ATLAS detector at the Large Hadron Collider was used to search for the pair production of heavy leptons predicted by the type-III seesaw model. The analysis is performed using a final state containing two same-charge or opposite-charge leptons (electrons or muons), two jets from a hadronically decaying \(W\) boson, none of which are b-tagged, and large missing transverse momentum. The search uses \(139 \, {\text {fb}}^{-1}\) of data from proton–proton collisions at \( \sqrt{s} = 13 \, {\text {TeV}} \), recorded during the 2015, 2016, 2017 and 2018 data-taking periods.

No significant excess above the SM prediction was found. Limits are set on the type-III seesaw heavy-lepton masses, using the simplified type-III seesaw model and assuming equal branching fractions to all lepton flavours. Heavy leptons with masses below \(790 \, {\text {GeV}}\) are excluded at the 95% confidence level, with an expected lower mass limit of \(820_{-60}^{+40} \, {\text {GeV}}\).