1 Introduction

Supersymmetry (SUSY)  [1,2,3,4,5,6] is an extension of the Standard Model (SM) that can resolve the gauge hierarchy problem  [7,8,9,10] by introducing supersymmetric partners of the SM bosons and fermions. The SUSY partner to the top quark, the top squark, plays an important role in cancelling out potentially large top-quark loop corrections to the Higgs boson mass  [11, 12]. Naturalness arguments suggest that the superpartners of the third-generation quarks may be \({{\mathcal {O}}}(\text {Te}\text {V})\), and thus experimentally accessible at the Large Hadron Collider (LHC)  [13, 14]. The superpartners of the left- and right-handed top quarks, \({\tilde{t}}_{\mathrm {L}} \) and \({\tilde{t}}_{\mathrm {R}} \), mix to form two mass eigenstates, \({\tilde{t}}_1 \) and \({\tilde{t}}_2 \), where \({\tilde{t}}_1 \) is the lighter one. Throughout this paper, it is assumed that \({\tilde{t}}_2 \) has sufficiently high mass such that the analysis is sensitive to \({\tilde{t}}_1 \) only, which is labelled \({\tilde{t}}\) in the following.

R-parity-conserving SUSY models  [15] may also provide a dark-matter candidate through the lightest supersymmetric particle (LSP), which is stable  [16, 17]. In these models, the supersymmetric partners are produced in pairs. At the LHC, top squarks are produced mostly via gluon–gluon fusion as well as quark–antiquark annihilation. In a simplified scenario where the first- and second-generation squarks and gluinos are decoupled, the cross section of direct top squark pair production is largely decoupled from the specific choice of SUSY model parameters except for the top squark mass. This production cross section falls steeply with increasing top squark mass, ranging from \(10.0\pm 6.7\) pb for \(m_{{\tilde{t}}}=300\) GeV to \(0.89\pm 0.13\) fb for \(m_{{\tilde{t}}}=1300\) GeV  [18,19,20,21].

In this paper, each top squark is assumed to decay into a top quark (that may be either on-shell or off-shell) and the LSP, which is assumed to be the lightest neutral mass eigenstate of the partners of the electroweak gauge and Higgs bosons, i.e. the lightest neutralino, . The degree to which the top quark is off-shell is directly related to the mass difference between \({\tilde{t}}\) and . The top squark decay scenarios considered are shown in Fig. 1a–c: the top quark is on-shell in two-body decays (), three-body decays contain an off-shell top quark but the \(W \) boson is on-shell (), and in four-body decays both the top quark and \(W \) boson are off-shell (, where f and \(f'\) are fermions originating from the off-shell \(W \) boson decay). Only hadronic \(W \) boson decays are considered in the following.

Fig. 1
figure 1

Decay topologies of the signal models considered in the analysis: a two-body, b three-body, c four-body top squark decays, the top quarks being produced in pairs, and d up-type, third-generation scalar leptoquark pair production, with both leptoquarks decaying into a top quark and a neutralino or a bottom quark and a \(\tau \)-lepton. For simplicity, no distinction is made between particles and antiparticles. Only hadronic \(W \) boson decays are shown

This paper presents a search for top squark pair production with an experimental signature of at least two jets, large missing transverse momentum, and no electrons or muons, using \(139\,\text {fb}^{-1}\) of proton–proton (pp) collision data provided by the LHC at a centre-of-mass energy of \(\sqrt{s} = 13\,\mathrm {TeV}\ \) and collected by the ATLAS detector in 2015–2018. Previous searches have been performed by both the ATLAS  [22,23,24,25,26,27,28] and CMS  [29,30,31,32,33,34,35,36,37] collaborations. In this search, enhanced sensitivity to two-body top squark decays, where is greater than the top quark mass, \(m_t\), is achieved by the analysis of the full LHC Run 2 dataset and the exploitation of techniques designed to efficiently reconstruct top quarks that are Lorentz-boosted in the laboratory frame. Sensitivity to compressed scenarios, where , is extended compared with previous searches through the analysis of events in which high-transverse-momentum jets from initial-state radiation (ISR) boost the top squark system in the transverse plane. Finally, sensitivity to the four-body decay scenario where is less than the sum of the \(W \) boson mass, \(m_{W}\), and the b-quark mass, \(m_{b}\), is achieved by extending the identification efficiency for low-transverse-momentum b-hadron decays through the use of charged-particle tracking information, adding sensitivity to the all-hadronic channel in comparison with previous searches. All sensitivities are also increased thanks to global enhancements in detector performance achieved by the end of LHC Run 2, including more precise estimates of the statistical significance of missing transverse momentum in an event  [38] and improved identification efficiencies for jets containing b-hadrons  [39]. The interpretation of the results uses simplified models  [40,41,42].

As has been demonstrated previously  [23,24,25, 43, 44], top squark searches are sensitive to a variety of additional signal models such as top squarks originating from gluino decays  [40,41,42], top squark decays via charged electroweak SUSY partners  [40,41,42], mediator-based dark-matter models  [45,46,47,48,49,50], scalar dark-energy models  [51], and third-generation scalar leptoquarks  [52,53,54,55,56,57,58]. In this paper, the results are interpreted in models considering the pair production of up-type, third-generation scalar leptoquarks (\(\text {LQ}_{3}^{\mathrm{u}}\)), as shown in Fig. 1d, assuming that the \(\text {LQ}_{3}^{\mathrm{u}}\) only interact with leptons and quarks from the same generation  [59]. Similar \(\text {LQ}_{3}^{\mathrm{u}}\) interpretations have been performed by both the ATLAS  [44] and CMS  [60] collaborations. The third-generation leptoquark production cross section is identical to that of top squark production and the \(\text {LQ}_{3}^{\mathrm{u}}\rightarrow t\nu \) decay channel has the same experimental signature as heavy top squarks decaying into massless neutralinos, and thus additional sensitivity is achieved compared with previous \(\text {LQ}_{3}^{\mathrm{u}}\) results.

2 ATLAS detector

The ATLAS experiment  [61,62,63] at the LHC is a multipurpose particle detector with a cylindrical forward–backward- and \(\phi \)-symmetric geometry and an approximate \(4\pi \) coverage in solid angle.Footnote 1 It consists of an inner tracking detector surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic and hadron calorimeters, and a muon spectrometer. The inner tracking detector covers the pseudorapidity range \(|\eta | < 2.5\). It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide electromagnetic (EM) energy measurements with high granularity. Hadronic calorimetry is provided by a steel/scintillator-tile calorimeter covering the central pseudorapidity range (\(|\eta | < 1.7\)). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to \(|\eta | = 4.9\). The muon spectrometer surrounds the calorimeters and features three large air-core toroidal superconducting magnets with eight coils each, providing coverage up to \(|\eta | = 2.7\), as well as a system of precision tracking chambers and fast detectors for triggering. The field integral of the toroids ranges between 2 and 6 T\(\cdot \)m across most of the detector.

3 Data collection and simulated event samples

The data were collected from 2015 to 2018 at a pp centre-of-mass energy of \(13\,\mathrm {TeV}\ \) with \(25\,\mathrm {ns}\) bunch spacing, resulting in a time-integrated luminosity of \(139.0\pm 2.4~\text{ fb}^{-1}\)  [64], measured using the LUCID-2 detector  [65]. Multiple pp interactions occur per bunch crossing (pile-up) and the average number of these interactions in the data was measured to be \(\langle \mu \rangle = 34\). A two-level trigger system  [66] is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to reduce the event rate to at most 100 kHz. This is followed by a software-based trigger that reduces the accepted event rate to 1 kHz (on average) for offline storage.

Selected events are required to pass a missing transverse momentum (whose magnitude is denoted by \(E_{\text {T}}^{\text {miss}}\)) trigger  [67], which is fully efficient for events with reconstructed \(E_{\text {T}}^{\text {miss}} >250\) GeV (the \(E_{\text {T}}^{\text {miss}}\) reconstruction is described in Sect. 4). In order to estimate the background originating from SM processes, events are also selected at lower values of \(E_{\text {T}}^{\text {miss}}\) using single-electron, single-muon, and single-jet triggers. Electron and muon triggers yield an approximately constant efficiency in the presence of a single isolated electron or muon with transverse momentum (\(p_{\text {T}}\)) above 27 GeV (see Sect. 4 for details of the electron, muon, and jet reconstruction); these triggers are needed for the estimation of \(Z\rightarrow \nu \bar{\nu }\) production in association with heavy-flavour jets (\(Z+\mathrm {jets}\)) and top pair production in association with \(Z\rightarrow \nu \bar{\nu }\) (\(t\bar{t} +Z\)) backgrounds. Triggers based on the presence of a single jet were used to collect data samples for the estimation of the multijet and all-hadronic \(t\bar{t}\) backgrounds. The jet \(p_{\text {T}}\)  thresholds after energy calibration ranged from 50 to 400 GeV. In order to stay within the bandwidth limits of the trigger system, only a fraction of the events passing the jet triggers were recorded to permanent storage.

Monte Carlo (MC) simulations are used to model the SUSY and leptoquark signals, as well as to aid in the description of the background processes. SUSY signal models were all generated with MadGraph5_aMC@NLO 2.6.2   [68] at leading order (LO) in QCD, while leptoquark signals were generated with MadGraph5_aMC@NLO 2.4.3 at next-to-leading order (NLO) in QCD. All signal samples were interfaced to Pythia 8.230   [69] for the parton showering (PS) and hadronisation, and with EvtGen 1.6.0  [70] for the b- and c-hadron decays.

The parton distribution function (PDF) set used for the generation of the signal samples is NNPDF2.3 LO  [71] for SUSY signals and NNPDF3.0 NLO  [72] for leptoquark signals, with the A14  [73] set of tuned underlying-event and parton shower parameters (UE tune). Matching of the matrix element (ME) with parton showering was performed following the CKKW-L prescription  [74], with a matching scale set to one quarter of the mass of the top squark or leptoquark. All signal cross sections are calculated to approximate next-to-next-to-leading order (NNLO) in the strong coupling constant, adding the resummation of soft gluon emission at next-to-next-to-leading-logarithm accuracy (approximate NNLO+NNLL)  [18, 19, 75, 76].

The top squark mixing parameter between \({\tilde{t}}_{\mathrm {L}} \) and \({\tilde{t}}_{\mathrm {R}} \) was set to be maximal.Footnote 2 Finally, the top quark mass was set to 172.5 GeV in all simulated samples.

SM background samples were generated with different MC event generators depending on the process. Details of the generators and parton showering used for the different processes are shown in Table 1.

Table 1 Overview of the simulated background samples

The detector simulation  [94] was performed using either \(\textsc {GEANT4}\)  [95] or a fast simulation framework, where the showers in the electromagnetic and hadronic calorimeters are simulated with a parameterised description  [96] and the rest of the detector is simulated with \(\textsc {GEANT4}\). All signal samples were produced using the fast simulation, while SM background samples used the \(\textsc {GEANT4}\) set-up. All MC samples were produced with a varying number of simulated minimum-bias interactions overlaid on the hard-scattering event, to account for pile-up. These interactions were produced using Pythia 8.2 with the A3 tune  [97] and NNPDF2.3 LO PDF set. The simulated events are reweighted to match the distribution of the number of pp interactions per bunch crossing in data. Corrections are applied to the simulated events to account for differences between data and simulation for the lepton trigger, reconstruction, identification and isolation efficiencies, and for the lepton and jet momentum scale and energy resolution. Corrections are also applied to the efficiency of identifying jets containing b-hadrons (b-jets), the probability of mis-tagging jets containing only charm hadrons (c-jets) and only lighter hadrons (light-flavour jets), and the probability of mis-tagging jets originating from the hard pp scattering as pile-up jets.

4 Event reconstruction

Events are required to have a primary vertex  [98, 99] reconstructed from at least two tracks  [100] with \(p_{\text {T}} >500~\text {Me}\text {V}\). Among the vertices found, the vertex with the largest summed \(p_{\text {T}} ^2\) of the associated tracks is designated as the primary vertex.

Calorimeter jets are built from topological clusters of energy in the calorimeter  [101], calibrated to the electromagnetic scale, using the anti-\(k_t\) algorithm with radius parameter \(R=0.4\)  [102, 103]. These types of jets are referred to as ‘jets’. Jet transverse momenta are further corrected to the corresponding particle-level jet \(p_{\text {T}}\), based on the simulation  [104]. Remaining differences between data and simulated events are evaluated and corrected for using in situ techniques, which exploit the transverse momentum balance between a jet and a reference object such as a photon, Z boson, or multijet system in data. After these calibrations, all jets in the event with \(p_{\text {T}} > 20\) GeV and \(|\eta | < 4.5\) must satisfy a set of loose jet-quality requirements  [105]. In the four-body analysis, the leading jet in \(p_{\text {T}}\) must satisfy a set of tighter jet-quality requirements. These requirements are designed to reject jets originating from sporadic bursts of detector noise, large coherent noise or isolated pathological cells in the calorimeter system, hardware issues, beam-induced background or cosmic-ray muons  [105]. If these jet requirements are not met, the event is discarded. All jets are required to have \(p_{\text {T}} > 20\) GeV and \(|\eta | < 2.8\) to be considered in this analysis. In addition, the ‘medium’ working point of the track-based jet vertex tagger  [106, 107] is required for jets with \(p_{\text {T}} <120\) GeV and \(|\eta |<2.5\), to reject jets that originate from pile-up interactions.

Jets which contain b-hadrons and are within the inner-detector acceptance (\(|\eta |<2.5\)) are identified as ‘b-tagged’ using a multivariate algorithm that exploits the impact parametersFootnote 3 of the charged-particle tracks, the presence of secondary vertices, and the reconstructed flight paths of b- and c-hadrons inside the jet  [39]. The output of the multivariate algorithm is a single b-tagging output score, which signifies the likelihood of a jet to contain b-hadrons. The average identification efficiency of jets containing b-hadrons is \(77\%\) as determined in simulated \(t\bar{t}\) events. Using the same simulated sample, a rejection factor of approximately 110 (5) is reached for jets initiated by light quarks and gluons (charm quarks).

In order to identify low-\(p_{\text {T}}\) b-hadrons that are not contained in jets passing the \(p_{\text {T}} >20\) GeV requirement, ‘track-jets’ are reconstructed from inner-detector tracks using the anti-\(k_{t}\) algorithm with radius parameter \(R=0.4\). Tracks considered for inclusion in track-jets are required to have \(p_{\text {T}} >500\) \(\text {Me}\text {V}\), \(|\eta |<2.5\), at least seven hits in the silicon microstrip and pixel detectors, no more than one hit shared by multiple tracks in the pixel detector, no more than one missing hit in the pixel detector, and no more than two missing hits in the silicon microstrip detector. Additional requirements on the longitudinal impact parameter projected along the beam direction (\(\left| z_0 \sin (\theta )\right| <3\) mm) reduce the pile-up contributions and improve the efficiency in selecting tracks from the hard-scatter vertex. Track-jets are required to have \(p_{\text {T}} > 5\) GeV, more than one track within the jet radius, \(|\eta |<2.5\), and not overlap with the leading non-b-tagged jet in the event (\(\Delta R>0.4\)). The standard b-tagging algorithm is employed for track-jets  [108] and the selection requirement is tighter than for regular jets, due to the larger amount of background at low \(p_{\text {T}}\). The average identification efficiency for jets containing b-hadrons is \(70\%\) as determined in simulated \(t\bar{t}\) events. Using the same simulated sample, a rejection factor of approximately 200 (10) is reached for jets initiated by light quarks and gluons (charm quarks).

Electron candidates are reconstructed from clusters of energy deposits in the electromagnetic calorimeter that are matched to a track in the inner detector. They are required to have \(|\eta |<2.47\) and \(p_{\text {T}} >4.5\) GeV, and must pass a loose likelihood-based selection  [109, 110]. The impact parameter along the beam direction is required to be less than 0.5 mm. The electromagnetic shower of an electron can also be reconstructed as a jet such that a procedure is required to resolve this ambiguity. In the case where the separationFootnote 4 between an electron candidate and a non-b-tagged (b-tagged) jet is \(\Delta R_{y}< 0.2\), the candidate is considered to be an electron (b-tagged jet). This procedure uses a b-tagged jet definition that is looser than the one described earlier, to avoid selecting electrons from heavy-flavour hadron decays. If the separation between an electron candidate and any jet satisfies \(0.2<\Delta R_{y}< 0.4\), the candidate is considered to be a jet, and the electron candidate is removed.

Muons are reconstructed by matching tracks in the inner detector to tracks in the muon spectrometer and are required to have \(|\eta |<2.7\) and \(p_{\text {T}} >4\) GeV  [111]. The impact parameter along the beam direction is required to be less than 0.5 mm. Events containing muons identified as originating from cosmic rays, \(|d_0| > 0.2\) mm and \(|z_0| > 1\) mm, or as poorly reconstructed, \(\sigma (q/p)/|(q/p)| > 0.2\), are removed. Here, \(\sigma (q/p)/|(q/p)|\) is a measure of the momentum uncertainty for a particle with charge q. Muons are discarded if they are within \(\Delta R= 0.4\) of jets that survive the electron–jet overlap removal, except when the number of tracks associated with the jet is less than three, where the muon is kept and the jet discarded.

The requirements on electrons and muons are tightened for the selection of events in background control regions (described in Sect. 6) containing at least one electron or muon. The electrons and muons passing the tight selection are called ‘control’ electrons or muons in the following, as opposed to ‘baseline’ electrons and muons, which are only required to pass the requirements described above. Control electrons and muons are required to satisfy the ‘FCLoose’ \(p_{\text {T}} \)-dependent track-based and calorimeter-based isolation criteria  [112]. The calorimeter-based isolation is determined by taking the ratio of the sum of energy deposits in a cone of \(\Delta R=0.2\) around the electron or muon candidate to the sum of energy deposits associated with the electron or muon. The track-based isolation is estimated in a similar way but using a variable cone size with a maximum value of \(\Delta R=0.2\) for electrons and \(\Delta R=0.3\) for muons. Electron candidates are required to pass a ‘tight’ likelihood-based selection. The impact parameter of the electron in the transverse plane is required to be less than five times the transverse impact parameter uncertainty (\(\sigma _{d_0}\)). Further selection criteria are also imposed on reconstructed muons: muon candidates are required to pass a ‘medium’ quality selection and meet the \(|d_0| < 3\sigma _{d_0}\) requirement.

The \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) vector is the negative vector sum of the \(p_{\text {T}}\) of all selected and calibrated electrons, muons, and jets in the event, plus an extra term (‘soft’ term) added to account for energy depositions in the event that are not associated with any of the objects. The ‘soft’ term is calculated from inner-detector tracks (\(p_{\text {T}} > 500~\text {Me}\text {V}\) and matched to the primary vertex, to make it resilient to pile-up contamination) not associated with selected objects  [113, 114]. The missing transverse momentum calculated using only the tracking system (denoted by \({\mathbf {p}}_{\mathrm {T}}^{\mathrm {miss,\,track}}\), with magnitude \(E_{\mathrm {T}}^{{\mathrm{miss,\,track}}}\)) is computed from the vector sum of the inner-detector tracks with \(p_{\text {T}} >500~\text {Me}\text {V}\) and \(|\eta |<2.5\) that are associated with the event’s primary vertex.

Hadronically decaying \(\tau \)-lepton candidates are identified as non-b-tagged jets with \(|\eta |<2.5\) and a maximum of four inner-detector tracks matched to them. They are only used in some regions to veto events with \(\tau \)-lepton candidates most likely originating from \(W \rightarrow \tau \nu \) decays, which are identified with the additional requirement that the \(\Delta \phi \) between the \(\tau \)-lepton candidate and the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) is less than \(\pi / 5\).

5 Signal region definitions

The experimental signature of this search, for all signal topologies, consists of multiple jets, one or two of which are b-tagged, no electrons and muons (following the baseline definition described in Sect. 4), and large missing transverse momentum. The \(E_{\text {T}}^{\text {miss}}\) trigger is used to collect the data in all signal regions.

Beyond these common requirements, four sets of signal regions (SRA–D) are defined to target each decay topology and kinematic regime, as shown in Fig. 2. SRA (SRB) is sensitive to the production of high-mass \({\tilde{t}}\) pairs that each undergo a two-body decay with large (medium) , or the production of high-mass leptoquark pairs. Both SRA and SRB employ top-quark mass-reconstruction techniques to reject background, of which the dominant source is associated production of a Z boson with heavy-flavour jets, with the \(Z\) decaying into neutrinos (\(Z+\mathrm {jets}\)). SRC targets compressed two/three-body top squark decays with and has \(t\bar{t}\) production as the dominant background contribution. A common preselection is defined for SRA–C: at least four jets are required (\(N_{\mathrm {j}}\ge 4\)), at least two of which must be b-tagged (\(N_{b}\ge 2\)), and the leading four jets must satisfy \(p_{\text {T}} >80,80,40,40\) GeV. SRD targets highly compressed four-body top squark decays and uses track-jets to identify b-hadrons with low \(p_{\text {T}}\). As in SRA and SRB, the dominant source of background in SRD is \(Z+\mathrm {jets}\). In both SRC and SRD, a high-\(p_{\text {T}}\) jet originating from ISR is used to improve sensitivity to the targeted decays.

Fig. 2
figure 2

Schematic representation of the various topologies targeted by the different signal regions defined in the analysis (SRA, SRB, SRC, SRD). SRA and SRB are orthogonal and the exact requirements made in the signal regions are detailed in the text and Table 2

5.1 Signal regions A and B

SRA is optimised for exclusion at 95% confidence level (CL) of the scenario where \(m_{{\tilde{t}}}=1300\) GeV and  GeV, while SRB is optimised for \(m_{{\tilde{t}}}=700\) GeV and  GeV. SRA and SRB have the best sensitivity to up-type, third-generation scalar leptoquarks, when leptoquarks decay via \(\text {LQ}_{3}^{\mathrm{u}}\rightarrow t\nu \).

To avoid a loss of efficiency when the top quark has \(p_{\text {T}} >200\) GeV and its daughters are close to each other, the two hadronic top candidates are reconstructed by using the anti-\(k_{t}\) algorithm to cluster \(R=0.4\) jets, using radius parameters of \(R=0.8\) and \(R=1.2\), similar to the technique used in the previous ATLAS search  [23]. Each reclustered jet is assigned a mass which is computed from the four-momenta of its jet constituents. Two \(R=1.2\) reclustered jets, representing top candidates, are required, and the leading reclustered \(R=1.2\) jet must have a mass (\(m_{1}^{R=1.2}\)) greater than 120 GeV. To optimise signal efficiency regardless of the subleading top candidate reconstruction success (measured by how close the candidate mass is to the top quark mass), the events are divided into three categories based on the subleading \(R=1.2\) reclustered jet mass (\(m_{2}^{R=1.2}\)): the ‘TT’ category includes events with \(m_{2}^{R=1.2}>120\) GeV, corresponding to successfully reconstructing a subleading top candidate; the ‘TW’ category contains events with \(60<m_{2}^{R=1.2}<120\) GeV, corresponding to successfully reconstructing a subleading \(W\) candidate; and the ‘T0’ category represents events with \(m_{2}^{R=1.2}<60\) GeV, corresponding to not reconstructing a top nor a \(W\) candidate.

In SRA, in addition to using the mass of the reclustered jets, information about the flavour content of the reclustered jet is used to improve background rejection. For all SRA categories, a \(b\)-tagged jet is required to be within \(\Delta R=1.2\) of the leading reclustered \(R=1.2\) jet, \(j^{R=1.2}_1(b)\), while in the SRA-TT category, the same selection is made for the subleading \(R=1.2\) jet, \(j^{R=1.2}_2(b)\). A requirement is also made on the leading \(R=0.8\) reclustered jet mass (\(m_{1}^{R=0.8}>60\) GeV) in SRA.

In order to reject events with mismeasured \(E_{\text {T}}^{\text {miss}}\) originating from multijet and hadronic \(t\bar{t}\) decays, the minimum difference in azimuthal angle between the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) and the leading four jets (\(\left| \Delta \phi _{\min }\left( \mathbf {p}_{\mathrm {T},1-4},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \)) is required to be greater than 0.4.

The most powerful rejection of background comes from requiring that the object-based \(E_{\text {T}}^{\text {miss}}\) significance (\({\mathcal {S}}\))  [38] is greater than 25 (14) in SRA (SRB). This variable characterises the \(E_{\text {T}}^{\text {miss}}\) according to the \(p_{\text {T}}\), \(p_{\text {T}}\) resolution, and \(\phi \) resolution of all objects in the event, and is defined as:

$$\begin{aligned} {{\mathcal {S}}} = \frac{E_{\text {T}}^{\text {miss}}}{\sqrt{\sigma _{\text {L}}^2(1- \rho _{\text {LT}}^2)}}, \end{aligned}$$

where \(\sigma _{\text {L}}\) is the expected resolution of the total longitudinal momentum (relative to the direction of \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\)) of all objects in the event as a function of the \(p_{\text {T}}\) of each object. Likewise, \( \rho _{\text {LT}}\) is the correlation factor between the longitudinal and transverse momentum resolutions for all objects.

Substantial \(t\bar{t} \) background rejection is provided by additional requirements to reject events in which one \(W \) boson decays via a lepton plus neutrino. The first requirement is that the transverse mass (\(m_{\mathrm {T}}\)) calculated from the \(E_{\text {T}}^{\text {miss}}\) and the \(b\)-tagged jet closest in \(\phi \) to the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) direction and defined as:

$$\begin{aligned} m_{\mathrm {T}}^{b,\mathrm {min}}\ = \sqrt{2\,p_\mathrm {T}^b\,E_{\text {T}}^{\text {miss}} \left[ 1-\cos {\Delta \phi \left( {\mathbf {p}}_\mathrm {T}^b,{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) }\right] }, \end{aligned}$$

must be above 200 GeV. The second requirement consists of vetoing events containing hadronic \(\tau \)-lepton candidates likely to have originated from a \(W \rightarrow \tau \nu \) decay (\(\tau \)-veto).

Table 2 Selection criteria for SRA and SRB. Each signal region is separated into three categories based on reconstructed top candidate masses. A dash indicates that no selection is applied

To reject events that contain \(b\)-tagged jets from gluon splitting, requirements are made on the angular distance between the two leading \(b\)-tagged jets, \(\Delta R\left( b_1,b_2\right) \). In SRB, an additional requirement of \(m_{\mathrm {T}}^{b,\mathrm {max}}>200\) GeV is made, where \(m_{\mathrm {T}}^{b,\mathrm {max}}\) is analogous to \(m_{\mathrm {T}}^{b,\mathrm {min}}\) except that the transverse mass is computed with the \(b\)-tagged jet that has the largest \(\Delta \phi \) relative to the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) direction. This requirement is a more stringent version of \(m_{\mathrm {T}}^{b,\mathrm {min}}\), requiring that the leading two \(b\)-tagged jets are not near the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\).

Finally, to allow the statistical combination of SRA and SRB, SRA is required to have the \(m_{{\mathrm {T2},\chi ^2}}\) variable greater than 450 GeV, while SRB is required to have \(m_{{\mathrm {T2},\chi ^2}}<450\) GeV. The \(m_{{\mathrm {T2},\chi ^2}}\) variable is based on \(m_{\mathrm {T2}}\)  [115, 116] and is constructed from the direction and magnitude of \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) and the direction of each of the top candidates, reconstructed using a \(\chi ^2\)-like method with \(R=0.4\) jets as inputs. The minimisation for finding the top candidates used in \(m_{{\mathrm {T2},\chi ^2}}\) is performed in terms of a \(\chi ^2\)-like penalty function, \(\chi ^2 = (m_{\mathrm {cand}}-m_{\mathrm {true}})^2/m_{\mathrm {true}}\), where \(m_{\mathrm {cand}}\) is the top quark or \(W\) boson candidate mass and \(m_{\mathrm {true}}\) is set to 80.4 GeV for \(W\) boson candidates and 173.2 GeV for top quark candidates.Footnote 5 Initially, single or pairs of \(R=0.4\) jets, whichever configuration results in a mass closest to \(m_W\), form \(W\) boson candidates, which are then combined with additional \(b\text {-jets}\) in the event to construct top quark candidates. When calculating \(m_{{\mathrm {T2},\chi ^2}}\) the momenta of top quark candidates selected by the \(\chi ^2\) method are used, while the masses of the top quarks are set to 173.2 GeV and the invisible particles are assumed to be massless. Table 2 summarises all the selection criteria used in SRA and SRB.

In addition to SRA and SRB, which are optimised for high \(m_{{\tilde{t}}}\) via a statistical combination, a signal region is optimised for discovery. This region, SRA-TT-Disc, has the same requirements as SRA-TT, with the exception of a less stringent requirement of \({\mathcal {S}}> 11\). When setting exclusion limits on specific signal models, SRA-TT-Disc is not considered.

5.2 Signal regions C

SRC is optimised for the case where , a regime in which the signal topology is similar to SM \(t\bar{t}\) production. In the presence of high-momentum ISR jets, the di-top-squark system is boosted in the transverse plane and better discrimination can be achieved. A recursive jigsaw reconstruction technique, as described in Ref.  [117], is used to divide each event into an ISR hemisphere (denoted by ‘ISR’) and a sparticle hemisphere (denoted by ‘S’), where the latter consists of the pair of candidate top squarks.

Table 3 Selection criteria for SRC. The signal regions are separated into five categories based on ranges of \(R_{\mathrm {ISR}}\)

Objects are grouped together according to their proximity in the laboratory frame’s transverse plane by minimising the reconstructed transverse masses of the ISR system and sparticle system, simultaneously over all choices of object assignment. Kinematic variables are then defined, based on this assignment of objects to either the ISR system or the sparticle system.

The ratio of the \(E_{\text {T}}^{\text {miss}}\) to the \(p_{\text {T}}\) of the ISR system (\(p_{\text {T}} ^{\mathrm {ISR}}\)), defined as \(R_{\mathrm {ISR}}\), is proportional to the ratio of the and \({\tilde{t}} \) masses  [118, 119]:

Due to the scaling of \(R_{\mathrm {ISR}}\) with the ratio of to \(m_{{\tilde{t}}}\), signals with are expected to form a peak in the \(R_{\mathrm {ISR}}\) distribution, with the location of the peak depending on divided by \(m_{{\tilde{t}}}\). In order to maximise the sensitivity for a wide range of to \(m_{{\tilde{t}}}\) ratio values, the events are divided into five categories, defined by non-overlapping ranges of \(R_{\mathrm {ISR}}\) and targeting different top squark and masses. For instance, SRC1 is optimised for \(m_{{\tilde{t}}}= 225\) GeV and  GeV, and SRC5 is optimised for \(m_{{\tilde{t}}}= 600\) GeV and  GeV.

In addition, at least four jets must be assigned to the sparticle hemisphere of the event (\(N_{\mathrm {j}}^{\mathrm {S}}\)), and at least two of those jets must be b-tagged. Requirements on \(p_{\text {T}} ^{\mathrm {ISR}}\), the highest-\(p_{\text {T}}\) \(b\)-tagged jet in the sparticle hemisphere (\(p_{\mathrm {T},1}^{\mathrm {S},b}\)), and the fourth-highest-\(p_{\text {T}}\) jet in the sparticle hemisphere (\(p_{\mathrm {T},4}^{\mathrm {S}}\)) are applied. To reject events with poorly measured \(E_{\text {T}}^{\text {miss}}\), the difference in \(\phi \) between the \({\mathbf {p}}_{\mathrm {T}}^{\mathrm {miss,\,track}}\) and \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\), \(\left| \Delta \phi \left( {\mathbf {p}}_\mathrm {T}^\mathrm {miss},{\mathbf {p}}_{\mathrm {T}}^{\mathrm {miss,\,track}}\right) \right| \), is required to be less than \(\pi /3\) and the leading two jets are required to be separated in azimuthal angle from the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\): \(\left| \Delta \phi \left( \mathbf {p}_{\mathrm {T},1-2},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| >0.4\). The transverse mass of the sparticle system and \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\), defined as \(m_{\mathrm {S}}\), is required to be greater than 400 GeV. The ISR system is also required to be separated in azimuthal angle from \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\): \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {ISR}},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| >3.0\). The selection criteria for SRC are summarised in Table 3.

In addition to SRC1–5, a region optimised for discovery, SRC-Disc, is defined. In SRC-Disc, the same requirements as in the other SRCs are applied, with the exception of requiring \(R_{\mathrm {ISR}}>0.5\) and \({\mathcal {S}}> 11\). As with SRA-TT-Disc, when setting exclusion limits on specific signal models, this region is not considered.

5.3 Signal regions D

SRD aims to select four-body top squark decays, for which the kinematic properties depend mainly on . Four-body top squark decays result in final state particles with low \(p_{\text {T}}\), which are particularly challenging to reconstruct. For instance, low-\(p_{\text {T}}\) b-hadrons originating from such decays are usually not contained in jets passing the minimum \(p_{\text {T}} >20\) GeV requirement when , and therefore cannot be tagged the same way as in SRA–C. To circumvent this problem and identify the low-\(p_{\text {T}}\) b-hadrons produced in a larger part of the four-body decay phase space, b-tagging using track-jets with \(p_{\text {T}} >5\) GeV is employed. Three signal region categories, SRD0, SRD1, and SRD2, are defined according to the b-tagged jet multiplicity (zero, one, and two, respectively), and are optimised for  20, 50, 80 GeV, respectively. In SRD0 and SRD1 the presence of at least one b-tagged track-jet is required to recover undetected jets that contain b-hadrons.

An event including a pair of four-body top squark decays with \(E_{\text {T}}^{\text {miss}} >250\) GeV is likely to be caused by the presence of significant ISR emission. Thus the leading non-\(b\)-tagged jet, identified as the ISR jet (\(\mathrm {j}^{\mathrm {ISR}}\)), is required to have large \(p_{\text {T}}\) (\(p_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}}\)), as well as a large azimuthal separation \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \) from \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\).

In order to reject events with mismeasured \(E_{\text {T}}^{\text {miss}}\) originating from multijet and hadronic \(t\bar{t}\) decays, requirements are made on \(E_{\mathrm {T}}^{{\mathrm{miss,\,track}}}\) and \(\left| \Delta \phi \left( {\mathbf {p}}_\mathrm {T}^\mathrm {miss},{\mathbf {p}}_{\mathrm {T}}^{\mathrm {miss,\,track}}\right) \right| \). Further background reduction is required in SRD0 and attained by selecting large \(\left| \Delta \phi _{\min }\left( \mathbf {p}_{\mathrm {T},1-4},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \).

Only low-\(p_{\text {T}}\) jets and track-jets (\(p_{\mathrm {T},1}^{\text {track}}\), \(p_{\mathrm {T},1}^{b,\text {track}}\), \(p_{\mathrm {T},1}^{b}\)) are considered in all three categories. Requirements are also made on b-tagged jet and track-jet pseudorapidities (\(|\eta ^{b,\text {track}}_1|\), \(|\eta ^{b}_1|\), \(|\eta ^{b}_2|\)) to ensure they are in the central region of the detector, which make them more likely to originate from a top squark decay and maximise the b-tagging performance. Only events with high \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) are kept, where \(H_{\text {T}}\) is the scalar sum of the transverse momenta of all jets. This kinematic variable was found to provide better signal versus background discrimination than the object-based \(E_{\text {T}}^{\text {miss}}\) significance when the final state is composed of low-\(p_{\text {T}}\) objects.

Given the absence of on-shell top quarks and \(W\) bosons, no top nor W reconstruction methods are used, such that additional discrimination of the signal from the background relies on differences in angular separation between jets and track-jets. In SRD1 (SRD2), requirements are made on the angular separation between the ISR jet and the \(b\)-tagged jet(s), \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T,1}}^{b}\right) \right| \) (\(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T,1}}^{b}\right) \right| \) and \(\left| \Delta \phi \left( \mathbf {p}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T,2}}^{b}\right) \right| \)), to ensure the \(b\)-tagged jet(s) is (are) well-separated from the ISR jet. In SRD1, the minimum \(\Delta \phi \) between the leading four track-jets and the ISR jet (\(\left| \Delta \phi \left( \mathbf {p}_{\mathrm {T},1-4}^{\mathrm {{j}^{track}}},\mathbf {p}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}}\right) \right| \)) is also required to be large, to separate the low-\(p_{\text {T}}\) top squark decay products from the ISR jet. Further background rejection is required in SRD0 and attained by requiring significant azimuthal separation between the leading b-tagged track-jet and the ISR jet (\(\mathrm {max}\left| \Delta \phi \left( \mathbf {p}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T}}^{b^\mathrm {track}}\right) \right| \)), and between the leading b-tagged track-jet and the next track-jet most likely to contain a b-hadron (\(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T,1}}^{b^{\mathrm {track}}},{\mathbf {p}}_{\mathrm {T,2}}^{b^{\mathrm {track}}}\right) \right| \)). Table 4 summarises the full signal region selections for SRD0-2.

Table 4 Signal region selections for SRD. Variables involving track-jets are denoted with the label ‘track’. A dash indicates that no selection is applied

6 Background estimation

The main SM background process in SRA, SRB, and SRD is \(Z\rightarrow \nu \bar{\nu }\) production in association with heavy-flavour jets. In SRC, \(t\bar{t}\) production dominates, including mostly events where one \(W\) boson decays hadronically and the other \(W\) boson decays via a \(\tau \)-lepton and its corresponding neutrino (\(W +\mathrm {jets}\)). Other important background processes include leptonic \(W \) decays produced in association with heavy-flavour jets, a single top quark produced with a \(W\) boson, and the irreducible background from \(t\bar{t} +Z\), where the \(Z\) boson decays into two neutrinos.

Significant background contributions are estimated primarily from comparisons between data and simulation in specially designed ‘control regions’ (CRs), which have a selection orthogonal to all SRs and aim to enhance a particular background process, while probing a similar event topology. Sufficient data are needed to minimise the statistical uncertainties in the background estimates in the CRs, while the extrapolation from the CR to the SR, evaluated with simulated events, should be as small as possible to reduce the associated systematic uncertainties. Furthermore, CR selection criteria are chosen to minimise potential contamination by signal. The signal contamination is below 10% in all CRs for top squark and neutralino mass combinations that have not yet been excluded at 95% confidence level by previous ATLAS searches  [22,23,24,25, 27, 28].

Separate CRs are defined for SRA–B, SRC and SRD, with the observed number of events in each region included in one of the three dedicated binned profile likelihood fits  [120] of the analysis (SRA–B fit, SRC fit, SRD fit). The CRs are defined so that all CRs associated with a given signal region are orthogonal to the other CRs for that specified region. Partial overlaps remain possible between regions included in different fits. Each likelihood function is built as the product of Poisson probability density functions, describing the observed and expected numbers of events in the control regions. Additional terms, constrained by Gaussian probability density functions accounting for MC statistics and common systematic uncertainties (discussed in Sect. 7) between the control and signal regions and their correlations, are included and treated as nuisance parameters in the fitting procedure.

Control regions targeting the \(Z+\mathrm {jets}\), \(t\bar{t}\), \(W +\mathrm {jets}\), single-top and \(t\bar{t} +Z\) backgrounds are included in the SRA–B fit, while for the SRC fit only a \(t\bar{t}\) control region is defined. For the SRD fit, control regions are defined for \(Z+\mathrm {jets}\), \(t\bar{t}\), and \(W +\mathrm {jets}\) backgrounds. For each fit (SRA–B, SRC, and SRD), the normalisations of these backgrounds are determined simultaneously in order to best match the observed data in each control region, including contributions from all backgrounds (background-only fit). No observed or expected number of events in the signal regions is considered at this stage. In cases where there are multiple control regions for one background in one fit, the fit yields one normalisation which best fits all regions.

Contributions from all-hadronic \(t\bar{t}\) and multijet production are found to be negligible in all signal regions except for SRC, where they are subdominant. These backgrounds are estimated from data collected by single-jet triggers using a jet smearing procedure described in Ref.  [121] and are fixed in the fit, with an uncertainty assigned to them (discussed in Sect. 7). The contributions from all other background processes (diboson, tZ, \(t\bar{t} H\), \(t\bar{t} W\), tWZ) are less than 15% of the total SM background expectations and are fixed at the value expected from the simulation, using the most accurate theoretical cross sections available, while their uncertainties are included as additional nuisance parameters in the fit. In the following, the multijet, diboson, tZ, \(t\bar{t} H\), \(t\bar{t} W\), and tWZ backgrounds are grouped together and referred to as ‘other’.

Validation regions (VRs) are defined for the major sources of background in each signal region such that they are orthogonal to the control regions and the signal regions. They usually suffer from a higher signal contamination (up to 20%) than the CRs, but probe a kinematic region which is closer to that of the SRs. The background normalisation factors from the simultaneous fit are applied to their respective backgrounds and compared with data in each VR to verify good agreement and that the simultaneous fit is well-behaved.

Detailed CR definitions for the estimation of \(Z+\mathrm {jets}\) (CRZ), \(t\bar{t} +Z\) (CRTTZ), \(t\bar{t}\) (CRT), \(W +\mathrm {jets}\) (CRW), and single-top (CRST) backgrounds are described in the following subsections, while a summary of the control region strategy in the SRA–B and SRD fits is shown in Fig. 3. The strategy for SRC only involves one control region (extrapolating from an electron or muon multiplicity of zero in the SR to an electron or muon multiplicity of one in the CR) and one validation region (extrapolating over \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {ISR}},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \)) for the dominant \(t\bar{t}\) background.

Fig. 3
figure 3

A summary of the background control region strategy used in the a SRA–B and b SRD fits. The orthogonality between the \(Z+\mathrm {jets}\) (Z), \(t\bar{t} +Z\) (TTZ), \(t\bar{t}\) (T), \(W +\mathrm {jets}\) (W), and single-top (ST) backgrounds’ control regions and the signal and validation regions (SR+VR) included in the SRA–B fit rely on the number of leptons, \(N_{\ell }\), and the number of b-tagged jets, \(N_{b}\). T and ST are made orthogonal by selecting either low-\(p_{\text {T}}\) (\(<20\) GeV) or high-\(p_{\text {T}}\) (\(>27\) GeV) leptons, respectively. The orthogonality between the \(Z+\mathrm {jets}\) (Z), \(t\bar{t}\) (T), and \(W +\mathrm {jets}\) (W) backgrounds’ control regions and the signal and validation regions (SR+VR) included in the SRD fit relies on \(N_{\ell }\) and, for \(N_{\ell } =1\), the angular distance between the lepton and the closest b-tagged jet (b-tagged track-jet in CRWD0), \(\Delta R(b,\ell )\). Placeholders for the values of \(\Delta R(b,\ell )\) are shown as \(Y_1\) and \(Y_2\) and vary in different SRD regions depending on \(N_{b}\). Additional selections not appearing on the sketches ensure orthogonality between the SR and the VR. Additional extrapolations from CRs to SRs in other kinematic quantities not necessarily shown in this sketch are region-specific and detailed in the text

Table 5 Selection criteria for the \(Z+\mathrm {jets}\) control regions. The defining extrapolation for these control regions is over the number of leptons; two electrons or muons (\(\ell \)) from \(Z\) decays are required, compared with zero leptons in the signal regions. A dash indicates that no selection is applied. Variables for which the signal and control region requirements differ are highlighted by a thick border around the cell that contains the requirement. Requirements are made on the following variables in the signal regions but have no equivalent requirement in the control regions: \(\tau \)-veto, \(m_{1}^{R=0.8}\), \(j^{R=1.2}_1(b)\), \(j^{R=1.2}_2(b)\), \(\Delta R\left( b_1,b_2\right) \), \(m_{\mathrm {T}}^{b,\mathrm {max}}\), \(m_{{\mathrm {T2},\chi ^2}}\), \(E_{\mathrm {T}}^{{\mathrm{miss,\,track}}}\), and \(\left| \Delta \phi \left( {\mathbf {p}}_\mathrm {T}^\mathrm {miss},{\mathbf {p}}_{\mathrm {T}}^{\mathrm {miss,\,track}}\right) \right| \)

6.1 \(Z+\mathrm {jets}\) background estimation

The normalisation of the simulation of \(Z\rightarrow \nu \bar{\nu }\) produced in association with heavy-flavour jets is estimated from \(Z\rightarrow e^{+}e^{-}\) and \(Z\rightarrow \mu ^{+}\mu ^{-}\) events produced in association with heavy-flavour jets, which is the strategy adopted for SRA–B (CRZAB) and SRD (CRZD). Data events passing a single-electron or single-muon trigger are considered, and events with two control electrons or two control muons with opposite charge are selected. In CRZAB (CRZD), \(p_{\text {T}} ^\ell > 27, 20\) GeV (\(p_{\text {T}} ^\ell > 30, 20\) GeV) is required for the leading and subleading leptons, respectively, which must also have an invariant mass within 10 GeV of the \(Z\) boson mass, \(m_{Z}=91\) GeV. Events with \(E_{\text {T}}^{\text {miss}}\ > 50\) GeV (\(E_{\text {T}}^{\text {miss}}\ > 70\) GeV) in CRZAB (CRZD) are discarded in order to reject \(t{\bar{t}}\) events. The transverse momenta of the selected electrons or muons are vectorially added to the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) to mimic the \(Z(\rightarrow \nu \bar{\nu })\,+\,\mathrm {jets}\) decays in the SRs, forming the quantity \(E_{\mathrm {T}}^{\mathrm {miss'}}\). High-\(p_{\text {T}}\) \(Z\) bosons are then effectively selected by requiring large \(E_{\mathrm {T}}^{\mathrm {miss'}}\).

Recalculated quantities that use \(E_{\mathrm {T}}^{\mathrm {miss'}}\) instead of \(E_{\text {T}}^{\text {miss}}\) are identified by the addition of a prime (e.g. \(m_{\mathrm {T}}^{b,\mathrm {min}'}\)). Where possible, the CR selection criteria are identical to the criteria used in the signal region; however, the criteria for key variables such as \(m_{\mathrm {T}}^{b,\mathrm {min}'}\) and \({\mathcal {S}}'\) for CRZAB, or \(E_{\mathrm {T}}^{\mathrm {miss'}}/\sqrt{H_{\text {T}}}\) for CRZD, are loosened to enhance the number of data events in the CR. The \(Z+\mathrm {jets}\) CR included in the SRA–B (SRD) fit is split into two (three) categories depending on \(m_{2}^{R=1.2}\) (\(N_{b}\)), to minimise the extrapolation across the various SR categories. There are only two categories in CRZAB, CRZAB-TTTW (representing the background in the TT and TW signal categories) and CRZAB-T0, due to the limited number of data events. The detailed set of selection criteria for the \(Z+\mathrm {jets}\) CRs are presented in Table 5; representative distributions for CRZ variables that have looser requirements than in the SRs are shown in Fig. 4.

6.2 \(t\bar{t} +Z\) background estimation

The SM production of \(t\bar{t} +Z\), where \(Z\rightarrow \nu \bar{\nu }\), is a significant source of background in SRA and SRB and is largely irreducible. To estimate this background, a three-lepton (electrons and muons) region is defined, to maximise the purity of \(t\bar{t} +Z\).

Events that pass a single-electron or single-muon trigger are selected. The trigger electron or muon must pass the requirements for a control electron or muon and have offline \(p_{\text {T}} >27\) GeV. Exactly two additional control leptons (electrons or muons) with \(p_{\text {T}} > 20\) GeV are required. The sum of the charges of the three leptons is required to equal 1 or \(-1\), while two of the leptons are required to have the same flavour and opposite charge. The pair of same-flavour, opposite-sign leptons that is most consistent with the \(Z\) boson mass forms the \(Z\) boson candidate and is required to have an invariant mass satisfying \(81\text {Ge}\text {V}<m (\ell ,\ell )<101\) GeV. The \(Z\) boson candidate is required to have \(p_{\text {T}} >200\) GeV. The remaining lepton and the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\) are treated as non-b-tagged jets in the computation of all jet-related variables (such as \(p_{\text {T}}\)), to mimic hadronic \(W \) decays.

Four jets are required to be in the event, in addition to the lepton not associated with the \(Z\) boson candidate and the \({\mathbf {p}}_\mathrm {T}^\mathrm {miss}\), and two of the jets are required to be \(b\)-tagged jets. The selection criteria are summarised in Table 6. Representative distributions for CRTTZ variables that have looser requirements than in the SRs are shown in Fig. 5.

6.3 \(t\bar{t}\), \(W +\mathrm {jets}\), and single-top background estimation

The \(t\bar{t}\) background in SRB, SRC, and SRD originates from events where a \(W \) boson decays into a hadronically decaying \(\tau \)-lepton, where the \(\tau \)-lepton is either not reconstructed (due to falling below the jet \(p_{\text {T}}\) threshold of 20 GeV), or is reconstructed as a jet. In order to model this process in the CRs, events that pass the same \(E_{\text {T}}^{\text {miss}}\) trigger as the signal region, but also have a control electron or muon, are selected. The electron or muon is used as a proxy for the \(\tau \)-lepton in the SRs.

Fig. 4
figure 4

Distributions illustrating the level of agreement between data (points) and the SM expectation (stacked histograms, after simultaneously fitting to all backgrounds) in several \(Z+\mathrm {jets}\) control regions: a \({\mathcal {S}}'\) and b \(m_{{\mathrm {T2},\chi ^2}}\) for CRZAB-TTTW, c \(\Delta R\left( b_1,b_2\right) \) for CRZAB-T0, and d \(E_{\mathrm {T}}^{\mathrm {miss'}}/\sqrt{H_{\text {T}}}\) for CRZD0. The hatched uncertainty band around the SM expectation includes the combination of MC statistical, theory-related and detector-related systematic uncertainties. The rightmost bin in each plot includes all overflows

Table 6 Selection criteria for the \(t\bar{t} +Z\) control region. The defining extrapolation for these control regions is over the number of leptons; three leptons (a combination of electrons and muons) from \(W \) and \(Z\) decays is required, compared with zero leptons in the signal region. Variables for which the signal and control region requirements differ are highlighted by a thick border around the cell that contains the requirement. Requirements are made on the following variables in SRA and SRB but have no equivalent requirement in the control region: \(\left| \Delta \phi _{\min }\left( \mathbf {p}_{\mathrm {T},1-4},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \), \(m_{\mathrm {T}}^{b,\mathrm {min}}\), \(\tau \)-veto, \(m_{1}^{R=1.2}\), \(m_{2}^{R=1.2}\), \(m_{1}^{R=0.8}\), \(j^{R=1.2}_1(b)\), \(j^{R=1.2}_2(b)\), \(\Delta R\left( b_1,b_2\right) \), \(m_{\mathrm {T}}^{b,\mathrm {max}}\), \({\mathcal {S}}\), and \(m_{{\mathrm {T2},\chi ^2}}\)

In SRA and SRB, the hadronically decaying \(\tau \)-leptons are most likely to have fallen below the jet \(p_{\text {T}} >20\) GeV requirement, such that for the \(t\bar{t} \) and \(W +\mathrm {jets}\) control regions (CRTAB and CRWAB, respectively), exactly one control electron in the range \(4.5< p_{\text {T}} ^{e} < 20\) GeV or muon in the range \(4.0< p_{\text {T}} ^{\mu } < 20\) GeV is required. In SRC and SRD, the hadronically decaying \(\tau \)-leptons have higher \(p_{\text {T}}\), such that one control electron or muon with \(p_{\text {T}} > 20\) GeV is required, and is treated as a non-b-tagged jet in the computation of all jet-related variables.

In the \(t\bar{t} \) control regions (CRTC, CRTD), the angular separation between the electron or muon and the \(b\)-tagged jet closest to the electron or muon, \(\Delta R(b,\ell )\), is used to enhance the \(t\bar{t}\) purity. In CRTD, \(\Delta R(b,\ell )\) is also used to ensure orthogonality with the \(W +\mathrm {jets}\) control region (CRWD). All \(t\bar{t} \) control regions (CRTAB, CRTC, CRTD) have an upper bound on \(m_{\mathrm {T}}\left( \ell ,{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \) to preserve orthogonality between the CRs and the signal regions of other ATLAS ongoing studies in the one-lepton plus missing transverse momentum channel, as well as to reduce potential signal contamination. In addition to the variables used in SRC, CRTC has a \(m_{\mathrm {V}}/m_{\mathrm {S}}< 0.75\) requirement, where \(m_{\mathrm {S}}\) is the variable used in SRC and \(m_{\mathrm {V}}\) is the invariant mass of all visible objects, which provides additional signal rejection. The \(t\bar{t} \) CR included in the SRD fit is split into two categories (CRTD1 or CRTD2, which require exactly one or at least two b-tagged jets, respectively) to minimise the extrapolation across the various SR categories. The various \(t\bar{t}\) control regions designed for the analysis are defined in Table 7. Representative distributions are shown in Fig. 6.

The \(W +\mathrm {jets}\) background is important for SRA–B and SRD, while the single-top background is significant for SRA–B only; corresponding control regions (CRWAB, CRWD, and CRSTAB, respectively) are defined in Table 8. The \(W +\mathrm {jets}\) background in SRA–B originates from \(W \) boson decays into low-\(p_{\text {T}}\) \(\tau \)-leptons; thus, the strategy is similar to that described for CRTAB except that exactly one b-tagged jet is required, which makes CRWAB orthogonal to CRTAB. The single-top control region, CRSTAB, is defined as having exactly one control electron or muon with \(p_{\text {T}} > 20\) GeV (making CRSTAB orthogonal to both CRWAB and CRTAB) and two or more b-tagged jets. A requirement of \(p_{\text {T}} > 20\) GeV is used in CRWD because the \(W +\mathrm {jets}\) background in SRD is dominated by high-\(p_{\text {T}}\) electrons, muons, and \(\tau \)-leptons. To enhance the purity of the \(W +\mathrm {jets}\) background in CRWD and ensure orthogonality with CRTD, lower bounds are put on \(\Delta R(b,\ell )\), which is defined with respect to the b-tagged jet (b-tagged track-jet) closest to the lepton in CRWD1–2 (CRWD0). Representative distributions for the various \(W +\mathrm {jets}\) and single-top control regions defined in the analysis are shown in Fig. 7.

Fig. 5
figure 5

Distributions illustrating the level of agreement between data (points) and the SM expectation (stacked histograms, after simultaneously fitting to all backgrounds) in the \(t\bar{t} +Z\) control region: a \({\mathcal {S}}\) and b \(p_{\mathrm {T}} (\ell ,\ell )\) for CRTTZ. The hatched uncertainty band around the SM expectation includes the combination of MC statistical, theory-related and detector-related systematic uncertainties. The rightmost bin in each plot includes all overflows

6.4 Validation of background estimates

The background normalisation factors derived from the SRA–B, SRC and SRD background-only fits are summarised in Fig.8. Most normalisation factors are within 1\(\sigma \) of unity, where \(\sigma \) denotes the total uncertainty, including the data statistical uncertainty in the CRs and the theory-related and detector-related systematic uncertainties (described in Sect. 7). However, the \(t\bar{t} \) (\(t\bar{t} \) and \(Z+\mathrm {jets}\)) normalisation factors derived from the SRC (SRD) fit are lower than unity by one to two \(\sigma \). Significant amounts of ISR radiation are required in SRC, SRD, and the associated control regions, unlike SRA–B and the associated control regions. The simulated event yields in \(t\bar{t} \)-enriched regions compare differently with data in SRA–B control regions and SRC–D control regions, overestimating the number of events in the latter, while fairly good agreement is observed in the former. A similar effect is observed in CRZAB and CRZD. These observations point to a mismodelling possibly related to the ISR system in \(t\bar{t} \) and \(Z+\mathrm {jets}\) events. The fitting procedure corrects for this mismodelling and is validated in the VRs discussed below.

Table 7 Selection criteria for the \(t\bar{t} \) control regions. The defining extrapolation for these control regions is over the number of leptons; one electron or muon (\(\ell \)) from \(W \) decays is required, compared with zero leptons in the signal region. A dash indicates that no selection is applied. Variables for which the signal and control region requirements differ are highlighted by a thick border around the cell that contains the requirement. Requirements are made on the following variables in the signal regions but have no equivalent requirement in the control regions: \(R_{\mathrm {ISR}}\), \(\tau \)-veto, \(m_{1}^{R=0.8}\), \(j^{R=1.2}_1(b)\), \(j^{R=1.2}_2(b)\), \(m_{\mathrm {T}}^{b,\mathrm {max}}\), \(m_{{\mathrm {T2},\chi ^2}}\)

Validation regions are defined to check the validity of the normalisation factors in the signal regions and to check the ability of the MC to describe the shapes of the kinematic variables over which extrapolations are made in propagating background estimates from the control regions to the signal regions. The defining extrapolation from control to signal regions is in the lepton multiplicity, whereas the validation regions include only events with zero leptons, as in the signal regions. Validation regions are designed for the \(Z+\mathrm {jets}\) background in SRA (VRZA) and SRB (VRZB-TTTW, VRZB-T0) and SRD (VRZD0–2), as well as for the \(t\bar{t}\) background in SRA–B (VRTAB), SRC (VRTC), and SRD (VRTD1–2). Requirements applied in the SRs are modified in the VRs to ensure orthogonality with the SRs, to limit signal contamination, and to retain a sufficient number of events expected in data. Signal contamination in the VRs, for all signals considered in this search, is kept below 20%.

Fig. 6
figure 6

Distributions illustrating the level of agreement between data (points) and the SM expectation (stacked histograms, after simultaneously fitting to all backgrounds) in the \(t\bar{t}\) control regions: a \(m_{\mathrm {T}}^{b,\mathrm {max}}\) for CRTAB, b \(R_{\mathrm {ISR}}\) for CRTC, and  \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) for c CRTD1 and d CRTD2. The hatched uncertainty band around the SM expectation includes the combination of MC statistical, theory-related and detector-related systematic uncertainties. The rightmost bin in each plot includes all overflows

VRZA is made to be orthogonal to SRA–B by vetoing events where the leading reclustered jet contains a \(b\)-tagged jet while still requiring at least two \(b\)-tagged jets. Orthogonality between VRZB-TTTW and SRA–B is achieved by inverting the \(\Delta R\left( b_1,b_2\right) \) requirement made in SRB, \(\Delta R\left( b_1,b_2\right) <1.4\), and selecting a lower \({\mathcal {S}}\) window than in SRA, \(15<{\mathcal {S}}<17\). For VRZD0 and VRZD1–2, the orthogonality with SRD is ensured by inverting the \(\mathrm {max}\left| \Delta \phi \left( \mathbf {p}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T}}^{b^\mathrm {track}}\right) \right| \) and \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T,1}}^{b}\right) \right| \) requirements, respectively.

VRTAB is orthogonal to SRA–B due to the inversion of the \(m_{\mathrm {T}}^{b,\mathrm {min}}\) requirement, while VRTC is orthogonal to SRC by inverting the \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {ISR}},{\mathbf {p}}_\mathrm {T}^\mathrm {miss}\right) \right| \) requirement. In VRTD1–2, the \(\left| \Delta \phi \left( {\mathbf {p}}_{\mathrm {T}}^{\mathrm {j}^{\mathrm {ISR}}},\mathbf {p}_{\mathrm {T,1}}^{b}\right) \right| \) requirement is inverted, as is done in VRZD1–2. The purity of \(t\bar{t}\) events in VRTD1 is enhanced by introducing a \(\Delta R\left( b_1,b_2\right) >2.0\) requirement. Only one \(b\)-tagged jet is required in VRTD1 (as in SRD1) and therefore \(\Delta R\left( b_1,b_2\right) \) is defined as the angular distance between the one \(b\)-tagged jet and the next jet most likely to contain a b-hadron, quantified by the b-tagging discriminator output value.

Table 8 Selection criteria for the \(W +\mathrm {jets}\) and single-top control regions. The defining extrapolation for these control regions is over the number of leptons; one electron or muon (\(\ell \)) from \(W \) decays is required compared with zero leptons in the signal regions. A dash indicates that no selection is applied. Variables for which the signal and control region requirements differ are highlighted by a thick border around the cell that contains the requirement. Requirements are made on the following variables in the signal regions but have no equivalent requirement in the control regions: \(m_{2}^{R=1.2}\), \(m_{1}^{R=0.8}\), \(j^{R=1.2}_1(b)\), \(j^{R=1.2}_2(b)\), \(m_{\mathrm {T}}^{b,\mathrm {max}}\), \(m_{{\mathrm {T2},\chi ^2}}\)
Fig. 7
figure 7

Distributions illustrating the level of agreement between data (points) and the SM expectation (stacked histograms, after simultaneously fitting to all backgrounds) in several \(W +\mathrm {jets}\) and single-top control regions: a \(m_{{\mathrm {T2},\chi ^2}}\) for CRWAB, \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) for b CRWD0 and c CRWD1, and d \(m_{\mathrm {T}}^{b,\mathrm {max}}\) for CRSTAB. The hatched uncertainty band around the SM expectation includes the combination of MC statistical, theory-related and detector-related systematic uncertainties. The rightmost bin in each plot includes all overflows

Fig. 8
figure 8

A summary of the normalisation factors determined from the various background-only fits. The total number of data events (points) and the SM expectation (stacked histograms) are shown in each control region before the fit. The uncertainty associated with the SM expectation includes the combination of MC statistical uncertainties, theory-related and detector-related systematic uncertainties. The normalisation factor applied to each background source (\(\mu _{\text {bkg}}\)) after the fit and respective uncertainty, including the combination of MC statistical uncertainties, theory-related and detector-related systematic uncertainties, is shown in the lower panel. The control regions included in the SRA–B, SRC and SRD fits are separated by vertical dashed lines

Representative distributions for the validation regions defined in the analysis are shown in Fig.9. A summary of the expected and observed yields in the VRs after the SRA–B, SRC and SRD background-only fits is shown in Fig.reffig:VRSummary. All the background predictions in the VRs agree with the data within 1\(\sigma \) except the predictions in VRZD2, which agree with the data within 2\(\sigma \).

7 Systematic uncertainties

Uncertainties affecting the sensitivity of the analysis which originate from statistical sources are considered together with systematic uncertainties related to the detector calibration (detector-related uncertainties) and physics modelling of signal and background (theory-related uncertainties). The data statistical uncertainty in the number of events in the SRs dominates the total uncertainty in SRA and SRD, while uncertainties related to the physics modelling of the background play a significant role in SRB and SRC.

The impact of detector-related and theory-related systematic uncertainties in the background predictions are included in the profile likelihood fits (described in Sect. 6) as nuisance parameters constrained by Gaussian probability density functions. Their impact is reduced by scaling the dominant background components in the SRs using the data observed in the CRs via the introduction of free-floating normalisation parameters. After the SRA–B and SRD (SRC) background-only fit, none of the nuisance parameters are pulled significantly and most (all) of them are not constrained. The largest constraints are observed in the SRD fit and are on the \(t{\bar{t}}\) modelling uncertainties and reach 30%.

The dominant systematic uncertainties in the background estimates in SRA and SRB (SRC and SRD), expressed as percentages of the total background expectations, are shown in Table 9 (Table 10). By convention, the data statistical uncertainty in the numbers of events in the CRs is accounted for as a systematic uncertainty and included in table rows indicated by the normalisation factors for each background source (\(\mu _{\text {bkg}}\)).

The dominant detector-related systematic uncertainty in the background estimates originates from sources related to the jet energy scale (JES) and resolution (JER)  [104], which encompass both the modelling of the detector response and the analysis techniques used to derive the calibration, the b-tagging performance, which includes the uncertainty in the b-tagging efficiency of true b-jets  [39] and in the b-tagging rate of light-flavour jets  [123] and c-jets  [124], and the energy scale and resolution of the \(E_{\text {T}}^{\text {miss}}\) soft term  [113, 114]. The uncertainty in the modelling of pile-up events contributes significantly to the total uncertainty only in SRD2.

The JES uncertainty is derived as a function of the \(p_{\text {T}} \) and \(\eta \) of each jet, the pile-up conditions, and the jet flavour. It is determined using a combination of simulated samples and collision data, including measurements of dijet, multijet, \(Z+\)jet, and \(\gamma +\)jet events. The JER uncertainty is derived as a function of the \(p_{\text {T}} \) and \(\eta \) of each jet, and is determined from a random cone technique applied to data recorded without selection bias, and studies of asymmetries in dijet events. The uncertainty in the JER is significant in many signal regions (maximally \(18\%\) in SRD1), while the most significant impact of the JES uncertainty reaches \(6\%\) in SRC5.

The uncertainty originating from the b-tagging performance of jets is estimated by varying the \(p_{\text {T}} \)- and flavour-dependent per-jet scale factors, applied to each jet, within predefined ranges determined from efficiency and mis-tag rate measurements in data. The b-tagging uncertainty is highest in SRA and SRD and does not exceed \(7\%\) (reached in SRD2).

Uncertainties in the b-tagging performance of track-jets, which are only relevant in SRD, are estimated for track-jets with \(p_{\text {T}} >10\) GeV in the same way as for jets. The largest contribution from this systematic uncertainty is in SRD1, where it is 7%. For track-jets with \(5<p_{\text {T}} <10\) GeV, which may be selected in SRD0, no evaluation of the b-tagging performance in data is available, so the uncertainty is evaluated by comparing the b-tagging performance observed in \(Z \rightarrow \ell ^{+}\ell ^{-}\) events generated with Sherpa and MadGraph interfaced with Pythia 8.2, resulting in an uncertainty of 7% in SRD0. The difference between the predictions of these two generators was found to always be larger than the difference between the predictions from extrapolation from the neighbouring bins, hence assigning the uncertainty based on the generator comparison is assumed to be conservative.

All jet-, electron-, and muon-related uncertainties are propagated to the calculation of the \(E_{\text {T}}^{\text {miss}}\) when evaluated, and additional uncertainties in the energy scale and resolution of the \(E_{\text {T}}^{\text {miss}}\) soft term are evaluated. The uncertainty in the soft term of the \(E_{\text {T}}^{\text {miss}}\) is derived using \(Z\rightarrow \mu ^{+}\mu ^{-}\) events and is less than \(3\%\) in all SRs. The uncertainty due to the reweighting of the simulated samples to match the distribution of pile-up in data is negligible in all SRs except SRD, where it is at most \(12\%\) (reached in SRD1).

Uncertainties in electron and muon reconstruction and identification uncertainties are also considered but have a negligible impact on the final background estimates. The uncertainty in the combined 2015–2018 integrated luminosity is \(1.7\%\)  [64, 65] and has a negligible impact on the analysis.

Theoretical uncertainties in the physics modelling of the background processes are also evaluated for each background component. For the \(t\bar{t}\) background, uncertainties are estimated from the comparison of different matrix-element calculations (Powheg-Box vs aMC@NLO), the choice of parton-showering model (Pythia vs Herwig 7), and the emission of ISR and final-state radiation (FSR) within Pythia 8 while leaving all other parameters for each comparison unchanged. The effects of ISR and FSR are explored by reweighting the nominal \(t\bar{t}\) events in a manner that reduces (reduces and increases) initial-state (final-state) parton-shower radiation  [125] and by using an alternative Powheg-Box v2 + Pythia 8.2 sample with \(h_{\text {damp}}\) set to \(3m_t\) and parameter variation group Var3 (described in Ref.  [125]) increased, leading to increased ISR. SRC is most sensitive to \(t\bar{t}\) theory systematic uncertainties, ranging from 11% to 21%, followed by SRB, ranging from 4% to 11%.

Most of the single-top background events include a \(W \) boson in the final state (\(W t\)). To account for the interference between \(W t\) and \(t\bar{t}\) production, yields in the signal and control regions are compared between a \(W t\) simulated sample that uses the diagram-subtraction scheme, and the nominal sample that uses the diagram-removal scheme  [126]. The final single-top uncertainty relative to the total background estimate is maximally 7% (in SRA-TT).

For the \(t\bar{t}\) +W/Z background, largely dominated by \(t\bar{t} +Z\), the modelling uncertainty is estimated through variations of the renormalisation and factorisation scales simultaneously by factors of 2.0 and 0.5, and a comparison of parton-showering models (Pythia vs Herwig 7), resulting in a maximum uncertainty of 4% (in SRB-TT).

Fig. 9
figure 9

Distributions illustrating the level of agreement between data (points) and the SM expectation (stacked histograms, after simultaneously fitting to all backgrounds) in several validation regions: a \(m_{2}^{R=1.2}\) in VRZA, b \(m_{\mathrm {T}}^{b,\mathrm {min}}\) in VRZB-TTTW, c \(p_{\text {T}} ^{\mathrm {ISR}}\) in VRTC, and d \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) in VRZD1. The hatched uncertainty band around the SM expectation includes the combination of MC statistical, theory-related and detector-related systematic uncertainties. The rightmost bin in each plot includes all overflows

The modelling uncertainties for the \(W/Z+\mathrm {jets}\) background processes due to missing higher orders are evaluated  [127] using both coherent and independent variations of the QCD factorisation and renormalisation scales in the matrix elements by factors of 0.5 and 2, avoiding variations in opposite directions. The matrix-element matching scale between jets from the matrix element and the parton shower, and the resummation scale for soft gluon emission within Sherpa, are also varied by factors of 0.5 and 2. The resulting impact on the total background yields from the \(W/Z+\mathrm {jets}\) modelling is at most \(7\%\) (in SRD0).

Uncertainties in each background from scale variations are fully correlated across regions and categories, and uncorrelated between processes. In some cases this may result in uncertainties cancelling out, while the higher-order corrections may not cancel out. The sensitivity of the results to the correlation assumptions was tested by redoing the fit with scale variations uncorrelated across all regions and categories, which resulted in negligible changes in the excluded cross sections near the edge of the exclusion region.

Detector and theory-related systematic uncertainties in signal yields are also evaluated when setting exclusion limits on specific signals (see Sect. 8). Detector-related uncertainties consider the same sources as for the background and are usually smaller than the modelling uncertainties. Signal theory uncertainties include sources related to signal acceptance, which are included in the profile likelihood fits as a single nuisance parameter, and the uncertainty in the total cross section, which is accounted for by repeating the exclusion procedure for the central and \(\pm 1\sigma \) values of the cross section. The uncertainty in the total cross section is 7–16% for direct top squark production  [75, 128,129,130,131,132,133,134], depending on the top squark mass. The same uncertainty is used for leptoquark production, due to the similarities between the two types of signals.

The main detector-related uncertainties in the signal yields originate from the JER, ranging from 2% to 15%, the JES, ranging from 2% to 20%, and the b-tagging performance (including track-jet b-tagging in SRD), ranging from 2% to 11%. Uncertainties in the signal acceptance due to variations of the renormalisation and factorisation scales and the matching scale (each varied up and down by a factor of two), and the parton-shower tune variations, are also taken into consideration. In regions where a high-\(p_{\text {T}}\) ISR system is selected (SRC and SRD), the \(p_{\text {T}}\) scale of the ISR is large enough such that the leading ISR jet is included in the matrix elements. The uncertainty in the ISR is therefore accounted for when varying the renormalisation and factorisation scales. The total uncertainty in the signal acceptance, considering the full range of \(m_{{\tilde{t}}}\) and used in this search, is at most 12–13% in SRA–C, and 25% for SRD.

Fig. 10
figure 10

The total number of data events (points) and the SM expectation (stacked histograms) in all validation regions after the SRA–B, SRC and SRD background-only fits. The stacked histograms show the SM prediction and the hatched uncertainty band around the SM prediction shows the total uncertainty, which includes the MC statistical uncertainties, theory-related and detector-related systematic uncertainties. The lower panel shows the significance of the difference between data and the background prediction calculated with the method described in Ref.  [122]. The validation regions considered in the SRA–B, SRC and SRD fits are separated by vertical dashed lines

8 Results and interpretation

The background originating from SM processes is determined separately for each set of signal regions (SRA–B, SRC, SRD) from three profile likelihood ‘background-only’ fits (SRA–B fit, SRC fit, SRD fit) that include the relevant control regions as described in Sect. 6. The observed event yields in the various SRA–B, SRC and SRD categories are compared with the post-fit background estimates in Tables 1112,  13, and Fig.11. In the SRs optimised for discovery, SRA-TT-Disc and SRC-Disc, 14 and 28 events are observed, respectively, compared with \(15.2\pm 1.8\) and \(28.0\pm 4.9\) expected events, respectively. Figure 12 shows the distribution of \({\mathcal {S}}\) in SRA-TW, \(m_{1}^{R=1.2}\) in SRB-TT, \(R_{\mathrm {ISR}}\) in SRC, and \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) in SRD0, SRD1 and SRD2. The background predictions are scaled to the values determined from the background-only fits.

Table 9 Systematic uncertainties (in percent) greater than \(1\%\) for at least one category within SRA and SRB. Uncertainties are expressed relative to the total background estimates. The uncertainties due to the scaling of background events based on data in control regions are indicated for each background component by \(\mu _{t\bar{t}}\), \(\mu _{t\bar{t} +Z}\), \(\mu _{Z}\), \(\mu _{W}\), and \(\mu _{\mathrm {single~top}}\). The theory uncertainties quoted for each background include the different distribution shape uncertainties described in the text

Observed event yields are in good agreement with the background estimates in all the signal regions. The significance of a data excess with respect to the background predictions can be quantified by the probability (p) of a background-only hypothesis to be more signal-like than what is observed. To evaluate these probabilities in each signal region category, alternative fit configurations (discovery fits) are defined. Each discovery likelihood function is defined as the product of the Poisson probability density function describing the numbers of events of a single signal region category and the background-only likelihood function associated with that signal region. An additional parameter, the signal strength, defined for positive values and corresponding to the signal normalisation in the signal region, is included and free-floating in the fit. The smallest p-value, assuming the background-only hypothesis, is 0.03, corresponding to 1.87\(\sigma \), in SRB-TT. In this signal region, 67 events are observed compared with \(46.7\pm 6.7\) expected events. The largest deficit in the data is found in SRA-T0 where 11 events are observed compared with \(17.3\pm 1.7\) expected events.

Model-independent upper limits set at 95% CL on the number of beyond the SM (BSM) events in each signal region are derived using the \(\hbox {CL}_\mathrm {s}\) prescription  [135] and neglecting any possible signal contamination in the control regions. Normalising these by the integrated luminosity of the data sample, they are interpreted as upper limits on the visible BSM cross section, \(\sigma _{\mathrm {vis}}\), where \(\sigma _{\mathrm {vis}}\) is defined as the product of the acceptance, reconstruction efficiency and production cross section. The results from the discovery fits are shown in Table 14, with SRA-TT having the lowest upper bound on the visible cross section of new physics phenomena of 0.04 fb.

Table 10 Systematic uncertainties (in percent) greater than \(1\%\) for at least one category within SRC and SRD. Uncertainties are expressed relative to the total background estimates. The uncertainties due to the scaling of background events based on data in control regions are indicated for each background component by \(\mu _{t\bar{t}}\), \(\mu _{Z}\), and \(\mu _{W}\). The theory uncertainties quoted for each background include the different distribution shape uncertainties described in the text
Table 11 Observed event yields in SRA and SRB compared with the expected SM background yields in each signal region after the SRA–B background-only fit. The uncertainties include MC statistical uncertainties, detector-related systematic uncertainties, and theoretical uncertainties
Table 12 Observed event yields in SRC compared with expected SM background yields in each signal region after the background-only fit. The uncertainties include MC statistical uncertainties, detector-related systematic uncertainties, and theoretical uncertainties
Table 13 Observed and expected yields after the background-only fit, for SRD. The uncertainties include MC statistical uncertainties, detector-related systematic uncertainties, and theoretical uncertainties
Fig. 11
figure 11

Event yields comparing data (points) to the SM prediction (stacked histograms) in all signal regions after the SRA–B, SRC and SRD background-only fits. The hatched uncertainty band around the SM prediction shows the total uncertainty, which includes the MC statistical uncertainties, detector-related systematic uncertainties, and theoretical uncertainties. The signal regions included in the SRA–B, SRC and SRD fits are separated by vertical dashed lines

A profile-likelihood-ratio test is defined in order to set limits on direct pair production of top squarks. A new fit configuration is defined (referred to as an exclusion fit) for each signal region category (SRA–B, SRC or SRD), where the Poisson probability density functions describing the observed and expected numbers of events in all relevant signal region bins are included in the likelihood function, and the signal-strength parameter, defined for positive values, is free-floating in the fit. Signal contamination in the CRs is taken into account. Limits are derived using the \(\hbox {CL}_\mathrm {s}\) prescription and calculated from asymptotic formulae  [136]. The nominal event yield in each set of SRs is set using the background-level estimates obtained from a background-only fit to both the CRs and the SRs to determine the expected limits, while a coloured band that represents the \(\pm 1\sigma \) of the total uncertainty (\(\sigma _{\mathrm {exp}}\)) is also evaluated. The observed event yields and the same background estimates are used to determine the observed limits for each set of SRs (SRA–B, SRC and SRD); these are evaluated for the nominal signal cross sections as well as for \(\pm 1\sigma \) theory uncertainties in those cross sections, denoted by \(\sigma ^{\mathrm {SUSY}}_{\mathrm {theory}}\).

Figure 13 shows the observed and expected exclusion contours at 95% CL as a function of (a) the mass vs the \({\tilde{t}}\) mass and (b) vs the \({\tilde{t}}\) mass. The exclusion contour is obtained by choosing the exclusion fit configuration (SRA–B, SRC or SRD) with the best expected limit for each signal hypothesis. The data included in the SRA–B fit, together with previous limits from the overlay of zero-, one-, and two-lepton channels and other analyses  [22,23,24,25, 27, 28], exclude top squark masses up to 1250 GeV for masses below 200 GeV. Additional constraints are set by the SRC fit in the case where , for which top squark masses in the range \(300\mathrm {-}630\) GeV are excluded. Some structures in the expected exclusion contour are observed in this region and were traced back to the fixed \(R_{\mathrm {ISR}}\)-binning adopted in SRC. Since the binning of \(R_{\mathrm {ISR}}\) is fixed and does not depend on the signal considered, for some signals the peak is located at the boundary between two bins and therefore leads to a global lower signal over background ratio across all SRC bins. Finally, limits are set by the SRD fit in the case where (with  GeV), for which top squark masses in the range \(300\mathrm {-}660\) GeV are excluded. The sharp structure in the middle of Fig.13b is an artifact of stitching together the exclusion contours of SRC and SRD. Signals with  GeV, which is the smallest value considered, are excluded for \(m_{{\tilde{t}}}<490\) GeV. This is the first time that an ATLAS all-hadronic search reaches exclusion sensitivity in the four-body region. This is due to the newly introduced and dedicated SRD, which takes advantage of track-jet b-tagging to discriminate between signal and background. The exclusion limits shown in Fig.13 are derived for unpolarised top squarks. Footnote 6 The exclusion limit stays within \(\pm 25\) GeV of top squark mass when varying the top squark polarisation, which is within the \(\sigma ^{\mathrm {SUSY}}_{\mathrm {theory}}\) uncertainty band; hadronic final states are less sensitive to polarisation effects than final states with one or more leptons.

Fig. 12
figure 12

Distributions of a \({\mathcal {S}}\) in SRA-TW, b \(m_{1}^{R=1.2}\) in SRB-TT, c \(R_{\mathrm {ISR}}\) in SRC, and df \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}}\) in SRD0–2 after the SRA–B, SRC and SRD background-only likelihood fits. The stacked histograms show the SM prediction and the hatched uncertainty band around the SM prediction shows the total uncertainty, which includes the MC statistical uncertainties, detector-related systematic uncertainties, and theoretical uncertainties. The data (points) are overlaid. For each variable, the distribution for a representative signal hypothesis is overlaid as a dashed line. The rightmost bin includes overflow events

Table 14 Left to right: 95% CL upper limits on the visible cross section (\(\langle \epsilon \sigma \rangle _{\mathrm {obs}}^{95}\)) and on the number of signal events (\(S_{\mathrm {obs}}^{95}\) ). The third column (\(S_{\mathrm {exp}}^{95}\)) shows the 95% CL upper limit on the number of signal events, given the expected number (and \(\pm 1\sigma \) excursions on the expectation) of background events. The last two columns indicate the \(\text {CL}_{\text {B}}\) value, i.e. the confidence level observed for the background-only hypothesis, the discovery p-value (\(p(s = 0)\)), and the significance (Z). In cases where the observed data yields are less than the total expected background yields, the p-value is truncated to 0.5
Fig. 13
figure 13

Observed (red solid line) and expected (black dashed line) exclusion contours at 95% CL as a function of the a vs \({\tilde{t}} \) masses and b vs \({\tilde{t}} \) mass. Masses that are within the contours are excluded. Uncertainty bands corresponding to the \(\pm 1 \sigma \) variation of the expected limit (yellow band) and the sensitivity of the observed limit to \(\pm 1\sigma \) variations of the signal total cross section (red dotted lines) are also indicated. Observed limits from previous ATLAS searches  [22,23,24,25, 27, 28] based on 36.1 \(\hbox {fb}^{-1}\) of pp collision data are provided for comparison in grey

Fig. 14
figure 14

Observed (red solid line) and expected (black dashed line) limits on up-type, third-generation leptoquarks. a Limits as a function of the branching ratio of leptoquarks decaying into \(b\tau \) (with the only other decay allowed being into \(t\nu \)) vs leptoquark mass. b Limits on the production cross section at 95% CL as a function of leptoquark mass assuming that all leptoquarks decay into \(t\nu \). Uncertainty bands corresponding to the \(\pm 1\sigma \) variation of the expected limit (yellow band) and the sensitivity of the observed limit are also indicated. Observed limits from previous searches with the ATLAS detector at \(\sqrt{s}=13\) TeV  [44] are overlaid a in grey and b as a blue dashed line

The SRA–B exclusion fit was repeated considering the analysis sensitivity to the production of up-type, third-generation scalar leptoquarks. The results are shown in Fig.14a as a function of the leptoquark branching ratio to charged leptons, \({{{\mathcal {B}}}}\left( \text {LQ}_{3}^{\mathrm{u}}\rightarrow b\tau \right) \) in this scenario, vs the leptoquark mass. For \({{{\mathcal {B}}}}\left( \text {LQ}_{3}^{\mathrm{u}}\rightarrow b\tau \right) = 0\), where the leptoquarks decay into \(t\nu \) \(100\%\) of the time, leptoquarks are excluded up to a mass of \(1240\) GeV as shown in Fig.14b. The difference in exclusion reach at \({{{\mathcal {B}}}}\left( \text {LQ}_{3}^{\mathrm{u}}\rightarrow b\tau \right) = 0\) between the leptoquark and top squark interpretations comes from the fact that top squark samples were produced at LO in QCD while the leptoquark samples were produced at NLO, which changes the kinematics slightly. This difference, however, is covered by the signal acceptance uncertainty (12% in SRA–B).

9 Conclusions

Results from a search for a scalar partner of the top quark based on an integrated luminosity of \(139\,\text {fb}^{-1}\) of \(\sqrt{s} = 13\) \(\text {Te}\text {V}\) proton–proton collision data recorded by the ATLAS experiment at the LHC from 2015 to 2018 are presented. Final states with high-\(p_{\text {T}}\) jets, large missing transverse momentum, and no electrons or muons are selected. The sensitivity of the search is enhanced by the analysis of the full LHC Run 2 dataset, improvements achieved in the detector performance by the end of the LHC Run 2, and new analysis techniques such as the use of b-tagged track-jets, which extend sensitivity to all-hadronic four-body decays.

Direct top squark pair production is considered, assuming both top squarks decay via , and considering a large range of mass differences between the top squark and the neutralino. In particular, fully hadronic final states are used for the first time in an ATLAS analysis to set limits on the scenario where both the top quarks and \(W \) bosons originating from their decays are off-shell, , due to improvements in the identification efficiency of low-transverse-momentum b-hadrons. The results are also reinterpreted in the context of third-generation, up-type scalar leptoquark pair production, restricted to the scenario where the leptoquarks decay into \(t\nu \) or \(b\tau \) final states.

No significant excess above the expected SM background is observed. Exclusion limits at 95% confidence level are derived as a function of \(m_{{\tilde{t}}}\) and , resulting in the exclusion of top squark masses that extend up to 1.25 \(\text {Te}\text {V}\) for masses below 200 GeV. In the case where , top squark masses in the range \(300\mathrm {-}630\) GeV are excluded, while in the scenario where (with  GeV), top squark masses in the range \(300\mathrm {-}660\) GeV are excluded. Exclusion limits for up-type, third-generation scalar leptoquarks are extended to masses below \(1240\) GeV, assuming the leptoquarks can decay only via \(t\nu \). Model-independent limits and p-values for each signal region are also reported, with 0.04 fb as the lowest upper bound on the visible cross section of new physics phenomena.