1 Introduction

Supersymmetry (SUSY) [1,2,3,4,5,6] is one of the most studied frameworks to extend the Standard Model (SM) beyond the electroweak scale. It predicts new bosonic (fermionic) partners for the known fermions (bosons). Assuming R-parity conservation [7], SUSY particles are produced in pairs and the lightest supersymmetric particle (LSP) is stable, providing a possible dark-matter candidate. The SUSY partners of the Higgs bosons and electroweak gauge bosons mix to form the mass eigenstates known as charginos (\(\tilde{\chi }^{\pm }_{k}\), \(k = 1, 2\)) and neutralinos (\(\tilde{\chi }^{0}_{m}\), \(m = 1, 2, 3, 4\)), where the increasing index denotes increasing mass. The scalar partners of right-handed and left-handed quarks, \(\tilde{q}_{\mathrm {R}}\) and \(\tilde{q}_{\mathrm {L}}\) squarks, mix to form two mass eigenstates, \(\tilde{q}_{1}\) and \(\tilde{q}_{2}\), with \(\tilde{q}_{1}\) defined to be the lighter of the two. To address the SM hierarchy problem [8,9,10,11], \(\text {TeV}\)-scale masses are favoured [12, 13] for the supersymmetric partners of the gluons, and the top squarks [14, 15].

Fig. 1
figure 1

Diagrams for the top squark pair production processes considered in this analysis: a with decays (showing for illustration the case where the two decay differently, although events with the same decays are also considered in the analysis), and b \(\tilde{t}_2 \rightarrow Z \tilde{t}_1 \) with decays

Top squark production with SM Higgs (h) or Z bosons in the decay chain can appear either in production of the lighter top squark mass eigenstate (\(\tilde{t}_1\)) decaying via with , or in production of the heavier top squark mass eigenstate (\(\tilde{t}_2\)) decaying via \(\tilde{t}_2 \rightarrow Z \tilde{t}_1 \) with , as illustrated in Fig. 1. Unlike other top squark models, these signals can be efficiently distinguished from the SM top quark pair production (\(t\bar{t} \)) background by requiring either a same-flavour opposite-sign (SF-OS) lepton pair originating from the \(Z\rightarrow \ell ^+\ell ^-\) (\(\ell \equiv e, \mu \)) decay or a pair of b-tagged jets originating from the \(h\rightarrow b\bar{b} \) decay, plus the presence of an additional lepton produced in the decay of the top quarks in the event.

Simplified models [16,17,18] are used for the analysis optimisation and interpretation of the results. In these models, direct top squark pair production is considered and all SUSY particles are decoupled except for the top squarks and the neutralinos involved in their decay. In all cases the is assumed to be the LSP. Simplified models featuring direct \(\tilde{t}_1 \) production with and decays via either Higgs () or Z () bosons with different branching ratio values are considered. In these models, the is assumed to be very light and the mass difference to be large enough to allow on-shell Higgs or Z boson decays.

Additional simplified models featuring direct \(\tilde{t}_2 \) production with \(\tilde{t}_2 \rightarrow Z\tilde{t}_1 \) decays and are also considered. The mass difference between the \(\tilde{t}_1 \) and is set to be smaller than the W boson mass, and the four-body decay is assumed to occur, where f and \(f'\) are two fermions from the \(W^*\) decay, such as . Direct production of \(\tilde{t}_1\) pairs is not considered in the \(\tilde{t}_2\) production simplified models . The \(\tilde{t}_1\) pair production contribution to the selections presented in this paper has been found to be negligible.

This paper presents the results of a search for top squarks in final states with Higgs or Z bosons at \(\sqrt{s}=13\) \(\text {TeV}\) using the complete data sample collected with the ATLAS detector [19] in proton–proton (pp) collisions during Run-2 of the LHC (2015–2018), corresponding to 139 \(\hbox {fb}^{-1}\). Searches for top squark production in events involving Higgs or Z bosons have been performed previously by both ATLAS [20, 21] and CMS [22, 23].

2 ATLAS detector

The ATLAS detector [19, 24] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and a near \(4\pi \) coverage in solid angle.Footnote 1 It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic (EM) and hadron calorimeters, and a muon spectrometer (MS). The inner tracking detector covers the pseudorapidity range \(|\eta | < 2.5\). It consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide EM energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range \(|\eta | < 1.7\). The endcap and forward regions are instrumented with LAr calorimeters for both the EM and hadronic energy measurements up to \(|\eta | = 4.9\). The muon spectrometer surrounds the calorimeters and is based on three large air-core toroidal superconducting magnets with eight coils each. The field integral of the toroids ranges between 2 and 6 T m across most of the detector. The muon spectrometer includes a system of precision tracking chambers and fast detectors for triggering. A two-level trigger system is used to select events [25]. The first-level trigger is implemented in hardware and uses a subset of the detector information to reduce the accepted rate to at most 100 kHz. This is followed by a software-based trigger that reduces the accepted event rate to 1 kHz on average depending on the data-taking conditions.

Table 1 Simulated signal and background event samples: the corresponding event generator used for the hard-scatter process, the generator used to model the parton showering, the source of the cross-section used for normalisation, the PDF set and the underlying-event tune are shown

3 Data set and simulated event samples

The data were collected by the ATLAS detector during the LHC Run-2 (2015–2018) with a peak instantaneous luminosity of \(\mathcal{L}=2.1\times 10^{34}~\hbox {cm}^{-2}\,\hbox {s}^{-1}\), resulting in a mean number of pp interactions per bunch crossing of \(\langle \mu \rangle = 34\). Data quality requirements are applied to ensure that all subdetectors were operating normally, and that the LHC beams were in stable-collision mode. The integrated luminosity of the resulting data sample is \(139\hbox { fb}^{-1}\). The uncertainty in the combined 2015–2018 integrated luminosity is 1.7%. It is derived from the calibration of the luminosity scale using xy beam-separation scans, following a methodology similar to that detailed in Ref. [26], and using the LUCID-2 detector for the baseline luminosity measurements [27].

Monte Carlo (MC) simulated event samples are used to aid the estimation of the background from SM processes and to model the SUSY signal. The choices of MC event generator, parton shower and hadronisation, the cross-section normalisation, the parton distribution function (PDF) set and the set of tuned parameters (tune) for the underlying event of these samples are summarised in Table 1. More details of the event generator configurations can be found in Refs. [28,29,30,31]. Cross-sections calculated at next-to-next-to-leading order (NNLO) in quantum chromodynamics (QCD) including resummation of next-to-next-to-leading logarithmic (NNLL) soft-gluon terms were used for top quark production processes. For production of top quark pairs in association with vector or Higgs bosons, cross-sections calculated at next-to-leading order (NLO) were used, and the event generator NLO cross-sections from Sherpa  [32] were used when normalising the multi-boson backgrounds. In all MC samples, except those produced by Sherpa, the EvtGen v1.2.0 program [33] was used to model the properties of the bottom and charm hadron decays.

SUSY signal samples were generated with MG5_aMC@NLO 2.6.2 [34] interfaced to Pythia 8.212 [35] for the parton showering (PS) and hadronisation. The matrix element (ME) calculation was performed at tree level and includes the emission of up to two additional partons for all signal samples. MadSpin  [55] was used to model the decays. MadSpin emulates kinematic distributions to a good approximation without calculating the full ME. The PDF set used for the generation of the signal samples was NNPDF2.3LO [40] with the A14 [41] set of tuned underlying-event and shower parameters (UE tune). The ME–PS matching was performed with the CKKW-L prescription [56], with a matching scale set to one quarter of the top squark mass. All signal cross-sections were calculated to approximate NNLO in the strong coupling constant, adding the resummation of soft gluon emission at NNLL accuracy (approximate NNLO+NNLL) [36,37,38,39]. The nominal cross-section and its uncertainty were derived using the PDF4LHC15_mc PDF set, following the recommendations of Ref. [57].

To simulate the effects of additional pp collisions in the same and nearby bunch crossings (pile-up), additional interactions were generated using the soft QCD processes provided by Pythia 8.186 with the A3 tune [58] and the MSTW2008LO PDF set [59], and overlaid onto each simulated hard-scatter event. The MC samples were reweighted so that the pile-up distribution matches the one observed in the data. The MC samples were processed through an ATLAS detector simulation [60] based on Geant 4 [61] or, in the case of \(t\bar{t} t\) and the SUSY signal samples, a fast simulation using a parameterisation of the calorimeter response and Geant 4 for the other parts of the detector. All MC samples were reconstructed in the same manner as the data.

4 Event selection

Candidate events are required to have a reconstructed vertex [62] with at least two associated tracks with transverse momentum (\(p_{\text {T}}\)) larger than 500 MeV that are consistent with originating from the beam collision region in the xy plane. The primary vertex in the event is the vertex with the highest sum of squared transverse momenta of associated tracks.

Two categories of leptons (electrons and muons) are defined: ‘candidate’ and ‘signal’ (the latter being a subset of the ‘candidate’ leptons satisfying tighter selection criteria). Electron candidates are reconstructed from isolated electromagnetic calorimeter energy deposits matched to ID tracks and are required to have \(|\eta |<2.47\), a transverse momentum \(p_{\text {T}} > 4.5~\hbox {GeV}\), and to satisfy the ‘LooseAndBLayer’ requirement defined in Ref. [63], which is based on a likelihood using measurements of shower shapes in the calorimeter and track properties in the ID as input variables.

Muon candidates are reconstructed in the region \(|\eta |<2.4\) from MS tracks matching ID tracks. Candidate muons are required to have \(p_{\text {T}} >4~\hbox {GeV}\) and satisfy the ‘medium’ identification requirements defined in Ref. [64], based on the number of hits in the different ID and MS subsystems, and on the ratio of the charge and momentum (q/p) measured in the ID and MS divided by the sum in quadrature of their corresponding uncertainties.

The tracks associated with the lepton candidates are required to have a significance of the transverse impact parameter relative to the reconstructed primary vertex, \(d_0\), of \(\vert d_0\vert /\sigma (d_0) < 5\) for electrons and \(\vert d_0\vert /\sigma (d_0) < 3\) for muons, and a longitudinal impact parameter relative to the reconstructed primary vertex, \(z_0\), satisfying \(\vert z_0 \sin \theta \vert < 0.5\) mm.

Jets are reconstructed from three-dimensional energy clusters in the calorimeter [65] using the anti-\(k_t\) jet clustering algorithm [66] with a radius parameter \(R=0.4\). Only jet candidates with \(p_{\text {T}} >20~\hbox {GeV}\) and \(|\eta |<2.8\) are considered. Jets are calibrated using MC simulation with corrections obtained from in situ techniques [67]. To reduce the effects of pile-up, jets with \(p_{\text {T}} <120~\hbox {GeV}\) and \(|\eta |<2.5\) are required to have a significant fraction of their associated tracks compatible with originating from the primary vertex, as defined by the jet vertex tagger [68]. This requirement reduces the fraction of jets from pile-up to 1%, with an efficiency for hard-scatter jets of about 90%. Events are discarded if they contain any jet with \(p_{\text {T}} >20~\hbox {GeV}\) not satisfying basic quality selection criteria designed to reject detector noise and non-collision backgrounds [69].

Identification of jets containing b-hadrons (b-tagging) is performed with a multivariate discriminant that makes use of track impact parameters and reconstructed secondary vertices [70, 71]. Jets are considered as b-tagged if they fulfil a requirement corresponding to a 77% average efficiency obtained for jets containing b-hadrons in simulated \(t\bar{t} \) events. The rejection factors for light-quark and gluon jets, jets containing c-hadrons and jets containing hadronically decaying \(\tau \)-leptons in simulated \(t\bar{t} \) events are approximately 113, 16 and 4, respectively.

Jet candidates with \(p_{\text {T}} <200\) GeV within an angular distance \(\Delta R =\sqrt{(\Delta y)^2+(\Delta \phi )^2} = 0.2\) of an electron candidate are discarded, unless the jet has a value of the b-tagging discriminant larger than the value corresponding to approximately 85% b-tagging efficiency, in which case the lepton is discarded since it is likely to have originated from a semileptonic b-hadron decay. The same procedure is applied to jets within \(\Delta R = 0.2\) of a muon candidate irrespective of the jet \(p_{\text {T}}\). Any remaining electron candidate within \(\Delta R=0.4\) of a non-pile-up jet, and any muon candidate within \(\Delta R= {\text {min}}\{0.4, 0.04 + p_{\text {T}} (\mu )/10\hbox { GeV}\}\) of a non-pile-up jet is discarded. In the latter case, if the jet has fewer than three associated tracks, the muon is retained and the jet is discarded instead to avoid inefficiencies for high-energy muons undergoing significant energy loss in the calorimeter. Finally, any electron candidate sharing an ID track with a remaining muon candidate is also removed.

Tighter requirements on the lepton candidates are imposed, which are then referred to as ‘signal’ electrons or muons. Signal electrons must satisfy the ‘Medium’ identification requirement as defined in Ref. [63] and signal muons must have \(p_{\text {T}} >5\) GeV. Isolation requirements are applied to both the signal electrons and muons. The scalar sum of the \(p_{\text {T}}\) of tracks within a variable-size cone around the lepton, excluding its own track, must be less than 6% of the lepton \(p_{\text {T}}\); these tracks are required to be associated with the primary vertex to limit sensitivity to pile-up. The size of the track isolation cone for electrons (muons) is given by the smaller of \(\Delta R = 10\hbox { GeV}/p_{\text {T}} \) and \(\Delta R = 0.2\,(0.3)\). In addition, in the case of electrons the energy of calorimeter energy clusters in a cone of \(\Delta R_\eta = \sqrt{(\Delta \eta )^2+(\Delta \phi )^2} = 0.2\) around the electron (excluding the deposition from the electron itself) must be less than 6% of the electron \(p_{\text {T}}\).

Simulated events are corrected for differences between data and MC simulation in jet vertex tagger and b-tagging efficiencies as well as b-tagging mis-tag rates [68, 71,72,73]. Corrections are also applied to account for minor differences between data and MC simulation in the signal-lepton trigger, reconstruction, identification and isolation efficiencies.

The missing transverse momentum vector, whose magnitude is denoted by \(E_{\mathrm{T}}^{\mathrm{miss}}\), is defined as the negative vector sum of the transverse momenta of all identified electrons, muons and jets, plus an additional soft term. The soft term is constructed from all tracks that originate from the primary vertex but are not associated with any identified lepton or jet. In this way, the \(E_{\mathrm{T}}^{\mathrm{miss}}\) is adjusted for the best calibration of leptons and jets, while contributions from pile-up interactions are suppressed through the soft term [74, 75].

The events are classified into two exclusive categories: at least three leptons (referred to as 3\(\ell \) selection, aimed at top squark decays involving Z bosons), or exactly one lepton (referred to as \(1\ell \) selection, aimed at top squark decays involving Higgs bosons). The selection requirements for each of these categories are described below.

Table 2 Definition of the signal regions used in the \(3\ell \) selection (see text for further description)
Table 3 Definition of the signal regions used in the \(1\ell \) selection (see text for further description)
Table 4 Definition of the validation regions used for the FNP lepton estimation (see text for further description)
Table 5 Background fit results for the FNP validation regions. The ‘Others’ category is dominated by \(t\bar{t} W\) production and also contains the contributions from \(t\bar{t} h\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. Combined statistical and systematic uncertainties are given. Some of the uncertainties are correlated and therefore the overall sum in quadrature does not necessarily add to the total systematic uncertainty. The number of \(t\bar{t}Z\) and multi-boson background events is estimated as described in Sect. 5.2

4.1 3\(\ell \) selection

In this selection, events are accepted if they satisfy a trigger requiring either two electrons, two muons or an electron and a muon. The requirements imposed offline on the \(p_{\text {T}}\), identification and isolation of the leptons involved in the trigger decision are tighter than those applied online, so as to be on the trigger efficiency plateau [25]. The presence of at least three signal leptons (electrons or muons, referred to collectively with the symbol \(\ell \)), with at least one SF-OS lepton pair whose invariant mass is compatible with the Z boson mass (\(|m_{\ell \ell }-m_Z|<15\) GeV, with \(m_Z=91.2\) GeV) is required. In addition, the leading (highest \(p_{\text {T}} \)) lepton is required to have \(p_{\text {T}} >40\) GeV and the subleading to have \(p_{\text {T}} >20\) GeV. The SF-OS requirements are not applied for the validation of the background induced by fake and non-prompt leptons in Sect. 5.1.

Four overlapping signal regions (SRs) are optimised for the best discovery sensitivity, two for each of the simplified models described in Sect. 1. The requirements in each SR are summarised in Table 2.

Signal region \(\hbox {SR}_{1\mathrm{A}}^{Z}\) is optimised for large mass splittings in the with model. It includes requirements on \(m_{T2}^{3l}\), a variation of the stransverse mass \(m_{\text {T2}}\) which is used to bind the masses of a pair of particles that are presumed to have each decayed semi-invisibly into one visible and one invisible particle [76, 77]. In the case of \(m_{T2}^{3l}\), the two visible legs of the two semi-invisible decays are set to be the third leading lepton and the system of the SF-OS lepton pair with an invariant mass closest to \(m_Z\). Models with small mass differences between the and the are targeted with \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\), which instead imposes requirements on the transverse momentum of the SF-OS lepton pair (\(p_{\text {T}}^{\ell \ell }\)).

Table 6 Definition of the control and validation regions used for the \(t\bar{t} Z\) and multi-boson background estimation
Table 7 Background fit results for the control and validation regions for the \(t\bar{t} Z\) and multi-boson backgrounds. The pre-fit predictions from MC simulation are given for comparison for those backgrounds (\(t\bar{t}Z\), multi-boson) that are normalised to data. The ‘Others’ category contains the contributions from \(t\bar{t} h\), \(t\bar{t} W\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. Combined statistical and systematic uncertainties are given. Some of the uncertainties are correlated and therefore the overall sum in quadrature does not necessarily add to the total systematic uncertainty. The number of events with fake/non-prompt leptons is estimated with the data-driven technique described in Sect. 5.1
Fig. 2
figure 2

Jet multiplicity distributions in control and validation regions a \(\hbox {CR}_{t{\bar{t}}Z}^{Z}\), b \(\hbox {CR}_{VV}^{Z}\), c \(\hbox {VR}_{t{\bar{t}}Z}^{Z}\) and d \(\hbox {VR}_{VV}^{Z,\mathrm{n\text {-}jet}}\)after normalising the \(t\bar{t}Z\) and multi-boson background processes via the simultaneous fit described in Sect. 5. The contributions from all SM backgrounds are shown as a histogram stack; the bands represent the total uncertainty in the background prediction. The ‘Others’ category contains the contributions from \(t\bar{t} h\), \(t\bar{t} W\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. The ‘FNP’ category represents the background from fake or non-prompt leptons. The last bin in each figure contains the overflow. The lower panels show the ratio of the observed data to the total SM background prediction, with bands representing the total uncertainty in the background prediction

Two SRs are optimised for the \(\tilde{t}_2 \rightarrow Z\tilde{t}_1 \) with model, \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) and \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\), targeting small and large mass splittings between the \(\tilde{t}_2 \) and the , respectively. Due to the overall soft kinematics of the particles in compressed \(\tilde{t}_2 \) signals, \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) features upper bounds on the \(p_{\text {T}} \) of the third leading lepton and on \(p_{\text {T}}^{\ell \ell }\), as well as no requirement on the number of b-tagged jets since they are likely to be soft in this scenario. \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\) also includes an upper bound on the third leading lepton \(p_{\text {T}} \) but requires the presence of b-tagged jets and large \(E_{\text {T}}^{\text {miss}} \) and \(p_{\text {T}}^{\ell \ell }\).

The requirement on the number of signal leptons makes the \(\hbox {SR}_{1\mathrm{A}}^{Z}\) and \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\) selections insensitive to potential contributions from alternative \(\tilde{t}_1\) decays with each . The acceptance for mixed decay scenarios, with was found to depend linearly on the branching fraction of \(\tilde{t}_1\) to . The signal lepton multiplicity requirement depletes the contribution from the direct production of \(\tilde{t}_1\) pairs to \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) and \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\).

4.2 1\(\ell \) selection

Events in this selection are accepted if they satisfy a combination of single-lepton and \(E_{\text {T}}^{\text {miss}} \) trigger requirements, the latter being used only for events with \(E_{\text {T}}^{\text {miss}} >230\) GeVand lepton \(p_{\text {T}} <30\) GeV. The offline requirements on the \(E_{\text {T}}^{\text {miss}} \), \(p_{\text {T}}\), identification and isolation of the lepton are tighter than those applied online, so as to be on the trigger efficiency plateau [25]. The presence of exactly one signal lepton (electron or muon) is required.

The identification of Higgs boson candidates decaying into b-quarks is performed using a neural network that uses as input the four-momentum and b-tagging information of pairs of jets. The scaled jet \(p_{\text {T}}/m_{jj}\) observable is used to prevent the dijet invariant mass being used as the primary discriminating variable. The network is trained with the PyTorch package [78] using as signal dijet pairs originating from Higgs boson decays in the simulated \(t\bar{t} H\) process, and as background dijet pairs in \(t\bar{t} \) and \(t\bar{t} H\) not originating from Higgs boson decays. A jet pair is tagged as a Higgs boson candidate if the corresponding neural network score is above a threshold optimised to target models predicting sizeable branching fractions of into Higgs bosons. The efficiency to correctly identify jet pairs originating from Higgs boson decays in signal events is 50–54% depending on the \(\tilde{t}_1 \) mass, as evaluated on a simulated event sample selected by requiring one signal lepton and at least two b-tagged jets. Using the same selection, the probability of tagging a jet pair in \(t\bar{t} \) events is approximately 0.05%.

Two overlapping SRs are optimised for the best discovery sensitivity in the simplified model discussed in Sect. 1, and their requirements are summarised in Table 3. Both require the presence of at least four b-tagged jets and at least one Higgs boson candidate (\(n_{h\text {-cand}}\)). The SRs include requirements on the transverse mass \(m_{\text {T}}\) (computed as \(m_{\text {T}} = \sqrt{2 p_\text {T}^{\ell }E_{\text {T}}^{\text {miss}} (1 - \cos \Delta \phi )}\) where \(\Delta \phi \) is the azimuthal angle between the missing transverse momentum vector and the lepton), and on the object-based \(E_{\text {T}}^{\text {miss}} \)-significance [79] (\(\mathcal S\)), which is used to discriminate events where the \(E_{\text {T}}^{\text {miss}} \) is due to invisible particles in the final state from events where the \(E_{\text {T}}^{\text {miss}} \) is due to poorly measured particles and jets. The requirements on \(n_{b{\mathrm{-tagged \, jets}}}\) and \(n_{h\text {-cand}}\) reduce the potential contributions from alternative \(\tilde{t}_1\) decays with each to a negligible level. Analogously to the 3\(\ell \) selection, the acceptance for mixed decay scenarios with has been found to depend linearly on the branching fraction of \(\tilde{t}_1\) to .

Signal region \(\hbox {SR}_{1\mathrm{A}}^{h}\) is optimised for small mass splittings, while large mass splittings are targeted by \(\hbox {SR}_{1\mathrm{B}}^{h}\).

Table 8 Definition of the control region used for the \(t\bar{t} \) background estimation
Table 9 Background fit results for the \(t\bar{t} \) background control region in the \(1\ell \) selection. The pre-fit predictions from MC simulation are given for comparison for those backgrounds (\(t\bar{t} \)) that are normalised to data. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. Combined statistical and systematic uncertainties are given. Some of the uncertainties are correlated and therefore the overall sum in quadrature does not necessarily add to the total systematic uncertainty

5 Background estimation

The dominant SM background contribution to the \(3\ell \) SRs is expected to originate from \(t\bar{t} Z\) production, with minor contributions from multi-boson production (mainly WZ) and backgrounds containing hadrons misidentified as leptons (hereafter referred to as ‘fake’ leptons) or non-prompt leptons from decays of hadrons (mainly in \(t\bar{t}\) events). The main background affecting the \(1\ell \) SRs is expected to originate from \(t\bar{t} \) production in association with heavy-flavour quarks.

The background from fake/non-prompt (FNP) leptons is estimated in a data-driven way, while the normalisation of the main backgrounds (\(t\bar{t} Z\) and multi-boson in the \(3\ell \) selection; \(t\bar{t} \) in the \(1\ell \) selection) is obtained by fitting the yield from MC simulation to the observed data in dedicated control regions (CRs) enhanced in a particular background component, and then extrapolating this yield to the SRs. Backgrounds from other sources, which provide a subdominant contribution to the SRs, are determined from MC simulation only.

The expected SM background is determined separately in each SR from a profile likelihood fit [80] implemented in the HistFitter framework [81]. The fit uses as a constraint the observed event yield in the fitted regions to adjust the normalisation of the main backgrounds. The quality of the resulting background model is judged by performing a ‘background-only’ fit to data using exclusively the event yield in the CRs to adjust the normalisation of the backgrounds. The agreement of the resulting background model is compared with the data yields in dedicated validation regions (VRs). Systematic uncertainties related to the MC modelling affect the expected yields in the SRs and CRs, and are taken into account to determine the uncertainty in the background prediction. Each source of uncertainty is described by a single nuisance parameter, and correlations between background processes and selections are taken into account. The VRs, used to assess the quality of the background model, are not used to constrain the nuisance parameters in the fit. The fit does not significantly affect either the uncertainty or the central value of these nuisance parameters. The systematic uncertainties considered in the fit are described in Sect. 6.

Fig. 3
figure 3

a Transverse mass and b number of Higgs boson candidates distributions in \(\hbox {CR}_{t\bar{t}}^{h}\) after normalising the \(t\bar{t} \) background process via the simultaneous fit described in Sect. 5. The contributions from all SM backgrounds are shown as a histogram stack; the bands represent the total uncertainty in the background prediction. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. The last bin in each figure contains the overflow. The lower panels show the ratio of the observed data to the total SM background prediction, with bands representing the total uncertainty in the background prediction

Fig. 4
figure 4

Comparison of the observed and expected event yields in the different kinematic regions used to validate the \(t\bar{t} \) background estimation in the 1\(\ell \) selection. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. The lower panel shows the ratio of the observed data to the total SM background prediction, with bands representing the total uncertainty in the background prediction

Fig. 5
figure 5

Comparison of the relative uncertainty for the total background yield in each SR, including the contribution from the different sources of uncertainty. The ‘Detector’ category contains all detector-related systematic uncertainties and is dominated by jet energy scale and resolution, and b-tagging uncertainties

5.1 Fake/non-prompt lepton background

A method similar to that described in Refs. [82, 83] is used for the estimation of the FNP lepton background. Two types of lepton identification criteria are defined for this evaluation: ‘tight’ and ‘loose’, corresponding to the signal and candidate electrons and muons described in Sect. 4. With the number of observed events with tight or loose leptons, the method estimates the number of events containing prompt or FNP leptons using as input the probability for loose prompt or FNP leptons to satisfy the tight criteria. The probability for prompt loose leptons to satisfy the tight selection is determined from \(t\bar{t} Z\) MC simulation, applying correction factors obtained by comparing \(Z\rightarrow \ell ^+ \ell ^-\) (\(\ell = e, \mu \)) events in data and MC simulation. The equivalent probability for loose FNP leptons to satisfy the tight selection is measured in a data sample enhanced in \(t\bar{t} \) using events with one electron and one muon with the same charge plus at least one b-tagged jet.

The estimates for the FNP background are validated in dedicated regions with selection criteria similar to those defining the 3\(\ell \) SRs but with reduced contributions from processes with three prompt leptons. Two VRs are defined as detailed in Table 4, with \(\hbox {VR}_{1\mathrm{F}}^{Z}\) probing the lepton \(p_{\text {T}} \) regime in \(\hbox {SR}_{1\mathrm{A}}^{Z}\) and \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\), while \(\hbox {VR}_{2\mathrm{F}}^{Z}\) includes soft lepton requirements as in \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) and \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\). To enhance fakes and be mutually exclusive with the SR, events in these VRs are required to have no SF–OS dilepton pairs and to have at least one different-flavour opposite-sign (DF–OS) dilepton pair. The observed and expected yields in these VRs are shown in Table 5, with a purity of FNP leptons above 55% and with good agreement between data and the background estimates.

The contribution of the FNP background in the 1\(\ell \) selection is negligible.

5.2 Estimation of the \(t\bar{t} Z\) and multi-boson background in the 3\(\ell \) selection

The two dedicated control regions used for the \(t\bar{t} Z\) (\(\hbox {CR}_{t{\bar{t}}Z}^{Z}\)) and multi-boson (\(\hbox {CR}_{VV}^{Z}\)) background estimation in the 3\(\ell \) selection are defined in Table 6. To ensure mutual exclusion with the SRs, only events with \(50~\hbox {GeV}< E_{\mathrm{T}}^{\mathrm{miss}} <100\) GeVare included in \(\hbox {CR}_{t{\bar{t}}Z}^{Z}\), while a b-tagged jet veto and a \(50~\hbox {GeV}<E_{\mathrm{T}}^{\mathrm{miss}} <200\) GeVrequirement are applied in \(\hbox {CR}_{VV}^{Z}\).

To validate the background estimates and provide a statistically independent cross-check of the extrapolation to the SRs, three validation regions are defined, as shown in Table 6. The \(\hbox {VR}_{t{\bar{t}}Z}^{Z}\) region primarily validates the \(t\bar{t}Z\) background estimate. The \(\hbox {VR}_{VV}^{Z,\mathrm{n\text {-}jet}}\) and \(\hbox {VR}_{VV}^{Z,b\text {-}\mathrm{tag}}\) regions validate the multi-boson background estimate, the former focusing on the extrapolation in jet multiplicity and the latter releasing the b-tagged jet veto. The overlap between these multi-boson VRs is around 50%.

Table 10 Observed and expected numbers of events in the 3\(\ell \) signal regions. The pre-fit predictions from MC simulation are given for comparison for those backgrounds (\(t\bar{t} Z\), multi-boson) that are normalised to data in dedicated control regions. The ‘Others’ category is dominated by \(t\bar{t} W\) production and also contains the contributions from \(t\bar{t} h\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. Combined statistical and systematic uncertainties are given. The table also includes model-independent 95% CL upper limits on the visible number of BSM events (\(S^{95}_{\text {obs}}\)), the number of BSM events given the expected number of background events (\(S^{95}_{\mathrm{exp}}\)) and the visible BSM cross-section (\(\sigma _{\mathrm {vis}}\)), as well as the discovery p-value (\(p_0\)) for the background-only hypothesis, all calculated from pseudo-experiments. The value of \(p_0\) is capped at 0.5 if the observed number of events is below the expected number of events

Table 7 shows the observed and expected yields in the CRs and VRs for each background source, and Fig. 2 shows the jet multiplicity distribution after the background fit for these CRs and VRs. The normalisation factors for the \(t\bar{t} Z\) and multi-boson backgrounds do not differ from unity by more than 20% and the post-fit MC-simulated jet multiplicity distributions agree well with the data.

5.3 Estimation of the \(t\bar{t} \) background in the 1\(\ell \) selection

The \(t\bar{t} \) background represents more than 70% of the total background in the 1\(\ell \) SRs and it is estimated with the aid of a dedicated control region (\(\hbox {CR}_{t\bar{t}}^{h}\)), defined in Table 8. Compared to the SRs in Table 3, this region does not apply any selection on the number of Higgs boson candidates and features relaxed requirements on \(m_{\mathrm{T}}\), while only events with exactly four jets and \(7<\mathcal{S}<10\) are included in this CR to ensure orthogonality with the SRs. Table 9 shows the observed and expected yields in the CR for each background source, and Fig. 3 shows the distribution of the transverse mass and number of Higgs boson candidtes after the background fit. The normalisation factor for the \(t\bar{t} \) background was found to be \(1.09 \pm 0.13\).

The extrapolation across \(\mathcal S\) and \(m_{\mathrm{T}}\) between \(\hbox {CR}_{t\bar{t}}^{h}\) and the SRs is tested in several validation regions. Figure 4 shows the definition, the observed number of events and expected yields in these regions, which feature either the same \(b{\mathrm{-tagged \, jets}}\) multiplicity as the SRs with relaxed \(m_{\mathrm{T}}\) requirements, or the same \(m_{\mathrm{T}}\) selections as the SRs requiring exactly three b-tagged jets. Good agreement between the background estimation and the data is observed in all these validation regions.

Table 11 Observed and expected numbers of events in the 1\(\ell \) signal regions. The pre-fit predictions from MC simulation are given for comparison for the \(t\bar{t} \) background that is normalised to data in a dedicated control region. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. Combined statistical and systematic uncertainties are given. The table also includes model-independent 95% CL upper limits on the visible number of BSM events (\(S^{95}_{\text {obs}}\)), the number of BSM events given the expected number of background events (\(S^{95}_{\text {exp}}\)) and the visible BSM cross-section (\(\sigma _{\mathrm {vis}}\)), as well as the discovery p-value (\(p_0\)) for the background-only hypothesis, all calculated from pseudo-experiments. The value of \(p_0\) is capped at 0.5 if the observed number of events is below the expected number of events
Fig. 6
figure 6

Distributions of a \(E_{\mathrm{T}}^{\mathrm{miss}}\) in \(\hbox {SR}_{1\mathrm{A}}^{Z}\), b \(p_{\text {T}}^{\ell \ell }\) in \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\), c \(E_{\mathrm{T}}^{\mathrm{miss}}\) in \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\), and d \(p_{\text {T}}^{\ell \ell }\) in \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\) for events passing all the SR requirements except those on the variable being plotted (the requirements are indicated by the arrows). The contributions from all SM backgrounds are shown after the background fit described in Sect. 5; the hashed bands represent the total uncertainty. The ‘Others’ category contains the contributions from \(t\bar{t} h\), \(t\bar{t} W\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. The ‘FNP’ category represents the background from fake or non-prompt leptons. The expected distributions for selected signal models are also shown as dashed lines. The last bin in each figure contains the overflow

Fig. 7
figure 7

Distributions of a \(E_{\mathrm{T}}^{\mathrm{miss}}\) significance in \(\hbox {SR}_{1\mathrm{A}}^{h}\) and b jet multiplicity in \(\hbox {SR}_{1\mathrm{B}}^{h}\), for events passing all the SR requirements except those on the variable being plotted (the requirements are indicated by the arrows). The contributions from all SM backgrounds are shown after the background fit described in Sect. 5; the hashed bands represent the total uncertainty. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. The expected distributions for selected signal models are also shown as dashed lines. The last bin in each figure contains the overflow

6 Systematic uncertainties

The main sources of systematic uncertainty affecting the analysis SRs are related to the theoretical and modelling uncertainties in the background, the limited number of events in the CRs and MC simulated samples, the uncertainties in the FNP probabilities, as well as the jet energy scale and resolution. The effects of the systematic uncertainties are evaluated for all signal samples and background processes. Since the normalisation of the dominant background processes is extracted in dedicated CRs, the systematic uncertainties only affect the extrapolation to the SRs in these cases. Figure 5 summarises the contributions from the different sources of systematic uncertainty to the total SM background predictions in the signal regions.

The jet energy scale and resolution uncertainties are derived as a function of the \(p_{\text {T}}\) and \(\eta \) of the jet, as well as of the pile-up conditions and the jet flavour composition (more like a quark or a gluon) of the selected jet sample. They are determined using a combination of data and simulated event samples, through measurements of the jet response asymmetry in dijet, Z+jet and \(\gamma +\)jet events [67, 84].

Systematic uncertainties in the b-tagging efficiency are estimated by varying the \(\eta \)-, \(p_{\text {T}} \)- and flavour-dependent scale factors applied to each jet in the simulation within a range that reflects the systematic uncertainty in the measured tagging efficiency and mis-tag rates in data [71,72,73].

Other detector-related systematic uncertainties, such as those related to the \(E_{\mathrm{T}}^{\mathrm{miss}}\) modelling, as well as lepton reconstruction efficiency, energy scale and energy resolution are found to have a small impact on the results.

Table 12 Selection criteria used in the shape-fit to derive the model-dependent exclusion limits. The additional SR labelled as SR\(_{1{\text {AB}}}^{h}\) overlaps with both \(\hbox {SR}_{1\mathrm{A}}^{h}\) and \(\hbox {SR}_{1\mathrm{B}}^{h}\) defined in Table 3

Systematic uncertainties are assigned to the FNP background estimation to account for different compositions (heavy flavour, light flavour or conversions) between the signal and control regions, as well as the contamination from prompt leptons in the regions used to measure the FNP probabilities.

The diboson background MC modelling uncertainties are estimated by varying the renormalisation, factorisation and resummation scales used to generate the samples [30]. For the \(t\bar{t} Z\) background, uncertainties due to parton shower and hadronisation modelling are evaluated by comparing the predictions from \(\hbox {MG}5\_{\textsc {a}}\hbox {MC}@\hbox {NLO}\) interfaced to Pythia and Herwig 7.0.4 [85], while the uncertainties related to the choice of renormalisation and factorisation scales are assessed by varying the corresponding event generator parameters up and down by a factor of two around their nominal values [31]. For the \(t\bar{t} \) background, uncertainties due to parton shower and hadronisation modelling are evaluated by comparing the predictions from Powheg-Box interfaced to Pythia and Herwig 7.0.4 [85], while the uncertainties due the choice of generator are evaluated by comparing the predictions from Powheg-Box and \(\hbox {MG}5\_{\textsc {a}}\hbox {MC}@\hbox {NLO}\) both interfaced to Pythia. Variations of the \(t\bar{t} \) initial- and final-state radiation, renormalisation and factorisation scales are also considered [86].

The cross-sections used to normalise the MC samples are varied according to the uncertainty in the cross-section calculation, i.e. 6% for diboson, 12% for \(t\bar{t} Z\), 13% for \(t\bar{t} W\) [34] and 5% for single top production. For \(t\bar{t} WW\), tZ, tWZ, \(t\bar{t} h\), Wh, Zh, \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), and triboson production processes, which constitute a small background, a 50% uncertainty in the event yields is assumed.

Fig. 8
figure 8

Comparison of the observed and expected event yields in all the 3\(\ell \) SRs and bins used for the model-dependent exclusion limits. The ‘Others’ category contains the contributions from \(t\bar{t} h\), \(t\bar{t} W\), \(t\bar{t} WW\), \(t\bar{t} t\), \(t\bar{t} t\bar{t} \), Wh, and Zh production. The ‘FNP’ category represents the background from fake or non-prompt leptons. The lower panel shows the significance in each SR bin, computed as described in Ref. [88]

Fig. 9
figure 9

Comparison of the observed and expected event yields in all the 1\(\ell \) SRs and bins used for the model-dependent exclusion limits. The ‘Higgs’ category contains the contributions from gluon–gluon fusion, vector-boson fusion, Wh, Zh and \(t\bar{t} h\) production. The ‘Others’ category contains the contributions from \(t\bar{t} WW\), \(t\bar{t} WZ\), \(t\bar{t} t\) and \(t\bar{t} t\bar{t} \) production. The lower panel shows the significance in each SR bin, computed as described in Ref. [88]

7 Results

The observed number of events and expected yields are shown in Tables 10 and 11 for each of the inclusive 3\(\ell \) and 1\(\ell \) SRs, respectively. Figures 6 and 7 show kinematic distributions after applying all the SR selection requirements except those on \(E_{\mathrm{T}}^{\mathrm{miss}} \), \(p_{\text {T}}^{\ell \ell }\) or \(\mathcal S\) and jet multiplicity. The data agree with the SM background predictions and these results are interpreted as exclusion limits for several beyond-the-SM (BSM) scenarios.

The HistFitter framework, which utilises a profile-likelihood-ratio test statistic [80], is used to estimate 95% confidence intervals using the CL\(_\mathrm {s}\) prescription [87]. The likelihood is built from the product of probability density functions describing the observed numbers of events in the SR and the associated CRs. The statistical uncertainties in the CRs and SRs are modelled using Poisson distributions. Systematic uncertainties enter the likelihood as nuisance parameters that are constrained by Gaussian distributions whose widths correspond to the sizes of these uncertainties. Table 10 also shows upper limits (at the 95% CL) on the number of BSM events \(S^{95}\), and on the visible BSM cross-section \(\sigma _{\mathrm {vis}} = S_{\text {obs}}^{95}/\int \!\mathcal{L}\text {d}t\), defined as the product of the production cross-section, acceptance and efficiency.

Model-dependent limits are also set in specific classes of SUSY models. For each signal hypothesis, the background fit is repeated taking into account the signal contamination in the CRs, which is found to be below 12% for signal models close to the existing exclusion limits [20]. Correlations of the uncertainties between the SM backgrounds and the signals are taken into account.

To enhance the exclusion power in the SUSY models considered, a ‘shape-fit’ approach is used where several mutually exclusive bins in different kinematic variables are defined in order to take advantage of the different signal-to-background ratios in the different bins. Table 12 shows the definition of these bins, which loosen a few of the requirements of the discovery SRs to increase the acceptance for different classes of models across the phase space (Tables 2 and 3). \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\)is used in the exclusion fits with no changes. The observed number of events and expected yields in all these bins are shown in Figs. 8 and 9.

Fig. 10
figure 10

Exclusion limits at 95% CL on the masses of the \(\tilde{t}_1 \) and , for a fixed  GeV, assuming and . The dashed line and the shaded band are the expected limit and its \(\pm 1 \sigma \) uncertainty, respectively. The thick solid line is the observed limit for the central value of the signal cross-section. The expected and observed limits do not include the effect of the theoretical uncertainties in the signal cross-section. The dotted lines show the effect on the observed limit of varying the signal cross-section by \(\pm 1 \sigma \) of the theoretical uncertainty. Results are compared with the observed limits obtained by the previous ATLAS search in Ref. [20]

Fig. 11
figure 11

Exclusion limits at 95% CL on the masses of the \(\tilde{t}_1 \) and , for a fixed  GeV and different values of with . The dashed and solid lines are the expected and observed limits for the central value of the signal cross-section, respectively. The expected and observed limits do not include the effect of the theoretical uncertainties in the signal cross-section

Fig. 12
figure 12

Exclusion limits at 95% CL on the masses of the \(\tilde{t}_2 \) and , for a fixed  GeV and assuming \(\mathcal{B}(\tilde{t}_2 \rightarrow Z\tilde{t}_1) = 1\). The dashed line and the shaded band are the expected limit and its \(\pm 1 \sigma \) uncertainty, respectively. The thick solid line is the observed limit for the central value of the signal cross-section. The expected and observed limits do not include the effect of the theoretical uncertainties in the signal cross-section. The dotted lines show the effect on the observed limit of varying the signal cross-section by \(\pm 1 \sigma \) of the theoretical uncertainty

Figure 10 shows the exclusion limits in the with simplified model with 50% branching ratios to each decay mode. These results are obtained from the statistical combination of the shape-fit bins of \(\hbox {SR}_{1\mathrm{A}}^{Z}\), \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\), \(\hbox {SR}_{1\mathrm{A}}^{h}\), \(\hbox {SR}_{1\mathrm{B}}^{h}\) and SR\(_{1{\text {AB}}}^{h}\) shown in Table 12. The bins of \(\hbox {SR}_{1\mathrm{A}}^{h}\), \(\hbox {SR}_{1\mathrm{B}}^{h}\) and SR\(_{1{\text {AB}}}^{h}\) are separately combined with the bins of \(\hbox {SR}_{1\mathrm{A}}^{Z}\) and \(\hbox {SR}_{1\mathrm{B}}^{\mathrm{Z}}\). For each combination of sparticle masses, only the option among those two with best expected sensitivity is considered for the final limit setting. The change in best expected combination is responsible for the kink at \(\tilde{t}_1\) masses of about 900–1000 GeV and masses below 200 GeV. For masses above 225 GeV, \(\tilde{t}_1\) masses up to about 1100 GeV are excluded at 95% CL, while \(\tilde{t}_1\) masses below 1220 GeV are excluded for a mass of 925 GeV. These results improve upon the existing limits on the \(\tilde{t}_1\) mass in this model by approximately 300 GeV [20].

The same statistical combination strategy is also used to obtain exclusion limits in the with simplified model for different decay branching ratios, shown in Fig. 11. For masses above 300 GeV, the exclusion limits on \(\tilde{t}_1\) masses vary by at most 40 GeV depending on the branching ratios. However, for masses below 200 GeV the exclusion limits on \(\tilde{t}_1\) masses are up to 300 GeV better for models featuring only decays compared with models only considering the decays.

Figure 12 shows the exclusion limits in the \(\tilde{t}_2 \rightarrow Z \tilde{t}_1 \) with simplified model for a mass difference between the \(\tilde{t}_1 \) and of 40 GeV. Several options for the \(\tilde{t}_1 \) and mass difference are also considered and the sensitivity is found to appreciably decrease only for values below 20 GeV. These results are obtained from the statistical combination of \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) and the shape-fit bins of \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\) shown in Table 12. The shape of the contour is driven by \(\hbox {SR}_{2\mathrm{A}}^{\mathrm{Z}}\) and \(\hbox {SR}_{2\mathrm{B}}^{\mathrm{Z}}\) being most sensitive to small and large mass splittings between the \(\tilde{t}_2\) and the respectively. Masses of the \(\tilde{t}_2\) up to 875 GeV are excluded at 95% CL for a mass of about 350 GeV  and masses of approximately 520 (450) GeV are excluded for \(\tilde{t}_2\) masses of 650 (800) GeV, extending beyond the previous limits on the mass from Ref. [89] by up to 160 GeV.

8 Conclusion

A search for direct top squark pair production in events with a leptonically decaying Z boson or a SM Higgs boson decaying into a b-quark pair is presented. The analysis uses 139 \(\hbox {fb}^{-1}\) of proton–proton collision data at \(\sqrt{s} = 13\) \(\text {TeV}\) recorded by the ATLAS detector at the LHC. No excess over the SM background predictions is observed, and exclusion limits are presented for a selection of simplified models. The limits exclude, at 95% confidence level, \(\tilde{t}_1 \) masses up to 1220 GeV in models featuring \(\tilde{t}_1 \) production and with decays, and \(\tilde{t}_2 \) masses up to 875 GeV in models featuring \(\tilde{t}_2 \) production and \(\tilde{t}_2 \rightarrow Z\tilde{t}_1 \) with decays. Compared with previous limits, these results extend the mass parameter space exclusion by up to 300 GeV in \(\tilde{t}_1 \) mass and by up to 160 GeV in mass in the considered \(\tilde{t}_2 \) model.