1 Introduction

Being the heaviest known elementary particle of the Standard Model (SM), the top quark has a large coupling to the SM Higgs boson and is predicted to have large couplings to hypothetical new particles in many models beyond the SM (BSM). Rare processes involving the top quark are therefore particularly relevant to study. Among these, the production of four top quarks (\(t\bar{t}t\bar{t}\)) is predicted by the SM but has not been observed yet. The \(t\bar{t}t\bar{t}\) cross section is sensitive to the magnitude and CP properties of the Yukawa coupling of the top quark to the Higgs boson, since four top quarks can be produced via an off-shell SM Higgs boson [1, 2]. Enhancements of the \(t\bar{t}t\bar{t}\) cross section (\(\sigma _{t\bar{t}t\bar{t}}\)) are expected in many BSM scenarios, such as gluino pair production in supersymmetry theories [3, 4], pair production of scalar gluons [5, 6], or the production of a heavy pseudoscalar or scalar boson in association with a top-quark pair (\(t\bar{t}\)) in Type II two-Higgs-doublet models (2HDM) [7,8,9]. Within an effective field theory framework [10], the BSM contribution to \(t\bar{t}t\bar{t}\) production can be parameterised by non-renormalisable effective couplings and can be expressed, for instance, via a \(t\bar{t}t\bar{t}\) contact interaction.

The cross section of the SM production of four top quarks from proton–proton (pp) collisions at \(\sqrt{s}=13\) \(\text {TeV}\) is predicted to be \(\sigma _{t\bar{t}t\bar{t}} = 12.0~\text {fb}\) with a relative scale uncertainty of \(\pm 20\%\) at next-to-leading order (NLO) in QCD including electroweak corrections [11]. Examples of Feynman diagrams for \(t\bar{t}t\bar{t}\) QCD production in the SM are shown in Fig. 1.

Fig. 1 Examples of Feynman diagrams for SM \(t\bar{t}t\bar{t}\) production at leading order in QCD

In the SM, the top quark is expected to decay into a W boson and a b-quark with a branching ratio of approximately 100%. Thus, the \(t\bar{t}t\bar{t}\) process gives rise to \(W^+W^-W^+W^-b\bar{b}b\bar{b}\) events, which then produce different final states depending on the hadronic or leptonic decay modes of the W bosons. This paper considers events that contain exactly two isolated leptons with the same electric charge (2LSS) or at least three isolated leptons (3L), with branching fractions of 7% and 5%, respectively. Although this channel, referred to as 2LSS/3L, has a small branching fraction, it benefits from low levels of background. The \(t\bar{t}t\bar{t}\) topology is characterised by high jet and b-jet multiplicities and high overall energy, which can be quantified as a large value of the scalar sum of the transverse momenta of objects in the event.
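The quoted branching fractions can be checked with a quick counting exercise. The per-W probability of a decay to an electron or muon (including leptonic \(\tau \) decays) is taken to be \(p \approx 0.25\); this value is an assumption, not stated in the text above.

```python
# Quick check of the 2LSS and 3L branching fractions of the four-W system
# (assumption: per-W probability p ~ 0.25 for a decay to e or mu,
# including leptonic tau decays).
from math import comb

p = 0.25

# 2LSS: both same-charge W bosons decay leptonically and both
# opposite-charge W bosons decay hadronically; factor 2 for ++ and --.
br_2lss = 2 * p**2 * (1 - p)**2

# 3L: at least three of the four W bosons decay leptonically.
br_3l = comb(4, 3) * p**3 * (1 - p) + p**4

print(f"2LSS: {br_2lss:.1%}, 3L: {br_3l:.1%}")  # close to the quoted 7% and 5%
```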

A number of SM processes can produce events with topologies similar to those of \(t\bar{t}t\bar{t}\) events and are thus backgrounds to \(t\bar{t}t\bar{t}\) production. The dominant source is \(t\bar{t}\) production in association with other particles, such as a Higgs boson (\(t\bar{t}H\)+jets), a W boson (\(t\bar{t}W\)+jets), or a Z boson (\(t\bar{t}Z\)+jets). Smaller contributions are expected from \(t\bar{t}WW\), multi-boson, single-top-quark and \(t\bar{t}t\) production. Significant backgrounds also come from events in which one of the leptons has a misassigned charge and from events containing leptons arising from heavy-flavour decays, photon conversions or misidentified jets, the latter three being collectively referred to as ‘fake/non-prompt’ leptons. Heavy-flavour decays are the dominant source for muons, while the other sources mostly affect electrons. The charge misassignment and fake/non-prompt backgrounds come mainly from \(t\bar{t}\) events.

In the analysis described in this paper, signal events are separated from background events using a multivariate discriminant. A fit is then performed on the distribution of the multivariate discriminant in the signal-enriched region. Background-enriched regions are also added to the fit to determine the normalisations of the \(t\bar{t}W\)+jets background and of some sources of fake/non-prompt background.

ATLAS and CMS previously searched for \(t\bar{t}t\bar{t}\) production in 13 \(\text {TeV}\) \(pp\) collisions. The ATLAS search combined results in the 2LSS/3L channel with those in a channel comprising single-lepton events and dilepton events with two opposite-sign charged leptons (called the 1L/2LOS channel). This analysis used 36 fb\(^{-1}\) of data and led to an observed (expected) significance of 2.8 (1.0) standard deviations [12, 13]. The CMS combination of the 1L/2LOS and 2LSS/3L channels using 36 fb\(^{-1}\) quotes an observed (expected) significance of 1.4 (1.1) standard deviations [14]. The latest CMS search using 137 fb\(^{-1}\) in the 2LSS/3L channel leads to an observed (expected) significance for the \(t\bar{t}t\bar{t}\) signal of 2.6 (2.7) standard deviations [15].

2 The ATLAS detector

The ATLAS experiment [16,17,18] at the LHC is a multipurpose particle detector with a forward–backward symmetric cylindrical geometry and nearly \(4\pi \) coverage in solid angle. It consists of an inner tracking detector (ID) surrounded by a thin superconducting solenoid providing a 2 T axial magnetic field, electromagnetic (EM) and hadron calorimeters, and a muon spectrometer. The ID covers the pseudorapidity range \(|\eta | < 2.5\) and consists of silicon pixel, silicon microstrip, and transition radiation tracking detectors. Lead/liquid-argon (LAr) sampling calorimeters provide EM energy measurements with high granularity. A steel/scintillator-tile hadron calorimeter covers the central pseudorapidity range (\(|\eta | < 1.7\)). The endcap and forward regions are instrumented with LAr calorimeters for both EM and hadronic energy measurements up to \(|\eta | = 4.9\). The muon spectrometer (MS) surrounds the calorimeters and is based on three large air-core toroidal superconducting magnets with eight coils each. The field integral of the toroids ranges between 2.0 and 6.0 Tm across most of the detector. The MS includes a system of precision tracking chambers and fast detectors for triggering. A two-level trigger system is used to select events. The first-level trigger is implemented in hardware and uses a subset of the detector information to keep the accepted rate below 100 kHz. This is followed by a software-based trigger that reduces the accepted event rate to 1 kHz, on average, depending on the data-taking conditions [19].

3 Object and event selection

Data used in this analysis were collected by the ATLAS detector between 2015 and 2018 at \(\sqrt{s} = {13}\,{\hbox {TeV}}\). Only events for which all detector subsystems were operational are considered. The data set corresponds to an integrated luminosity of 139 fb\(^{-1}\) [20, 21].

Events were collected using single-lepton or dilepton triggers. Single-lepton triggers select events with leptons satisfying either low transverse momentum (\(p_{\text {T}}\)) thresholds and an isolation requirement, or higher thresholds with a looser identification criterion and no isolation requirement. The lowest \(p_{\text {T}}\) thresholds used in the single-lepton triggers varied from 20 to 26 \(\text {GeV}\) depending on the lepton flavour and the data-taking period [22, 23]. The \(p_{\text {T}}\) thresholds used in the dilepton triggers varied from 8 to 24 \(\text {GeV}\) depending on the lepton flavour and the data-taking period. Dilepton triggers without any isolation requirement are also used to select events; these events are used to validate the fake/non-prompt background estimation.

Events are required to have at least one vertex reconstructed from at least two ID tracks with transverse momenta of \(p_{\text {T}} > {0.4}\,{\hbox {GeV}}\). The primary vertex for each event is defined as the vertex with the highest sum of \(p_{\text {T}}^{2}\) over all associated ID tracks [24].
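As an illustration of this criterion, a minimal sketch follows; the data layout (each vertex as a list of associated track \(p_{\text {T}}\) values) is hypothetical, not ATLAS software.

```python
# Minimal sketch of the primary-vertex choice: require >= 2 tracks with
# pT > 0.4 GeV per vertex, then take the vertex with the largest sum of
# pT^2 over its associated tracks.  Data layout is illustrative only.
def select_primary_vertex(vertices):
    """vertices: list of lists of track pT values in GeV."""
    candidates = [v for v in vertices
                  if sum(1 for pt in v if pt > 0.4) >= 2]
    return max(candidates, key=lambda v: sum(pt**2 for pt in v))

# The hard-scatter vertex (large track pT) wins over soft pile-up vertices.
pv = select_primary_vertex([[0.5, 0.6], [30.0, 25.0, 1.2], [2.0, 2.0]])
```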

Electron candidates are reconstructed from energy deposits in the EM calorimeter associated with ID tracks [25] and are required to have a calorimeter energy cluster with pseudorapidity \(|\eta _{\text {cluster}}|<2.47\), excluding the transition region between the barrel and the endcap calorimeters (\({|\eta _{\text {cluster}}|\not \in [1.37,1.52]}\)). Muon candidates are reconstructed by combining tracks in the ID with tracks in the MS [26] and are required to have \(|\eta |<2.5\). Both the electron and muon candidates are required to have \(p_{\text {T}}>\) 28 \(\text {GeV}\). The transverse impact parameter divided by its estimated uncertainty, \({|d_{0} |/\sigma (d_{0})}\), is required to be lower than five (three) for electron (muon) candidates. The longitudinal impact parameter must satisfy \({|z_{0} \sin (\theta ) |<0.5}\) mm for both lepton flavours. Electrons are required to meet the ‘Tight’ likelihood-based identification criterion and to be isolated using criteria based on the properties of the topological clusters in the calorimeter and of the ID tracks around the reconstructed electron [25]. Muons are required to meet the ‘Medium’ cut-based identification criterion, which includes requirements on the number of hits in the ID and MS as well as requiring compatibility between momentum measurements in the ID and MS. Muons also have to satisfy the isolation requirement based on the properties of ID tracks around the reconstructed muon.

To reduce the impact of charge misassignment background, an additional requirement is imposed on electrons in the \(e^{\pm }e^{\pm }\) and \(e^{\pm }\mu ^{\pm }\) channels. This requirement is based on the score of a boosted decision tree (BDT) that uses the calorimeter cluster and track properties of the electron [25] and is trained on data enriched in \(Z \rightarrow ee\) events to separate events with correct and incorrect electron charge assignments. The chosen requirement on the BDT score removes approximately 90% of electrons with a wrong charge assignment while selecting 98% of electrons with correctly measured charge.

Jets are reconstructed from topological clusters [27] of energy deposits in the calorimeters using the anti-\(k_{t}\) algorithm [28, 29] with a radius parameter of \(R=0.4\) and are calibrated as described in Ref. [30]. Jets are required to have \(p_{\text {T}} >{25}\,{\hbox {GeV}}\) and \(|\eta |<2.5\). To reduce the effect from additional \(pp\) collisions in the same or a nearby bunch crossing, collectively referred to as pile-up, jets with \(p_{\text {T}} <{120}\,{\hbox {GeV}}\) and \(|\eta |<2.4\) are considered only when they satisfy a requirement based on the output of a multivariate classifier called the jet-vertex-tagger (JVT) [31]. Events that contain at least one jet arising from non-collision sources or detector noise are rejected by a set of quality criteria [32]. The MV2c10 multivariate algorithm [33] is used to identify jets containing b-hadrons. A jet is considered b-tagged if it passes the operating point corresponding to 77% average efficiency for b-quark jets in simulated \(t\bar{t}\) events with the corresponding rejection factors against light-quark/gluon jets and c-quark jets of 110 and 4, respectively.

A sequential overlap removal procedure is applied to avoid the same calorimeter energy deposit or the same track being reconstructed as two different objects. As a first step, electrons sharing their track with a muon candidate are removed. Next, the closest jet within \(\Delta R_y = \sqrt{(\Delta y)^2+(\Delta \phi )^2} = 0.2\) of an electron is removed. Then, electrons within \(\Delta R_y = 0.4\) of a remaining jet are removed, since they likely arise from b- or c-hadron decays. After that, jets with fewer than three associated tracks that are within \(\Delta R_y = 0.2\) of a muon are removed. Finally, muons are removed if their tracks are within \(\Delta R_y = 0.4 + 10\,\text{GeV}/p_{\text{T},\mu }\) of any remaining jet, as they also likely arise from b- or c-hadron decays.
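The sequential steps above can be sketched as follows; the object fields (plain dicts with rapidity, azimuth, track labels) and the toy kinematics are purely illustrative, not ATLAS code.

```python
# Hedged sketch of the sequential overlap-removal procedure described above.
# delta_R is computed in (y, phi); all field names are hypothetical.
import math

def delta_r(a, b):
    dphi = abs(a["phi"] - b["phi"])
    dphi = min(dphi, 2 * math.pi - dphi)
    return math.hypot(a["y"] - b["y"], dphi)

def overlap_removal(electrons, muons, jets):
    jets = list(jets)
    # 1) electrons sharing a track with a muon candidate are removed
    mu_tracks = {m["track"] for m in muons}
    electrons = [e for e in electrons if e["track"] not in mu_tracks]
    # 2) the closest jet within dR = 0.2 of an electron is removed
    for e in electrons:
        close = [j for j in jets if delta_r(e, j) < 0.2]
        if close:
            jets.remove(min(close, key=lambda j: delta_r(e, j)))
    # 3) electrons within dR = 0.4 of a remaining jet are removed
    electrons = [e for e in electrons
                 if all(delta_r(e, j) >= 0.4 for j in jets)]
    # 4) jets with < 3 tracks within dR = 0.2 of a muon are removed
    jets = [j for j in jets
            if not (j["ntrk"] < 3 and any(delta_r(j, m) < 0.2 for m in muons))]
    # 5) muons within dR = 0.4 + 10 GeV / pT(mu) of a jet are removed
    muons = [m for m in muons
             if all(delta_r(m, j) >= 0.4 + 10.0 / m["pt"] for j in jets)]
    return electrons, muons, jets
```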

The missing transverse momentum in the event, whose magnitude is denoted in the following by \(E_{\text {T}}^{\text {miss}}\), is defined as the negative vector sum of the \(p_{\text {T}}\) of the reconstructed and calibrated objects in the event [34]. This sum includes the momenta of the ID tracks that are matched to the primary vertex and are not associated with any other objects.

The events are required to have one same-sign lepton pair or at least three leptons without charge requirement. Each event must have at least one reconstructed lepton that matches a lepton that fired the trigger. Events with two same-sign electrons are required to have the invariant mass \(m_{ee}>15~\text {GeV}\) and \(|m_{ee}-91~\text {GeV}|>10~\text {GeV}\) to reduce the charge misassignment background coming from low-mass resonances and Z-boson decay. In events with at least three leptons, all the opposite-sign same-flavour lepton pairs are required to satisfy \(|m_{\ell \ell }-91~\text {GeV}|>10~\text {GeV}\) to reduce the contamination from Z-boson decay.

Events arising from \(t\bar{t}t\bar{t}\) production are selected by exploiting the high multiplicities of light-flavour jets and b-tagged jets as well as the large overall event activity. This last property is probed by the scalar sum of the transverse momentum of the isolated leptons and jets in the event, denoted by \(H_{\text {T}}\). The inclusive signal region (SR) is defined by requiring at least six jets, at least two b-tagged jets, and \(H_{\text {T}}\) above \(500\,\text {GeV}\).
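A minimal sketch of these inclusive SR requirements, assuming a simple flattened event record (jet \(p_{\text {T}}\) list, b-tag count, lepton \(p_{\text {T}}\) list):

```python
# Minimal sketch of the inclusive signal-region cuts: at least six jets,
# at least two b-tagged jets, and H_T (scalar sum of isolated-lepton and
# jet pT, in GeV) above 500 GeV.  The event record is an assumption.
def passes_signal_region(jet_pts, n_btag, lepton_pts):
    h_t = sum(lepton_pts) + sum(jet_pts)
    return len(jet_pts) >= 6 and n_btag >= 2 and h_t > 500.0
```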

4 Monte Carlo samples

Production of \(t\bar{t}t\bar{t}\) events is modelled according to the SM expectation. The nominal sample used to model the \(t\bar{t}t\bar{t}\) signal was generated using the MadGraph5_aMC@NLO v2.6.2 [35] generator, which provides matrix elements (ME) at NLO in the strong coupling constant \(\alpha _{\text {S}}\), with the NNPDF3.1nlo [36] PDF set. The functional form of the renormalisation and factorisation scales was set to \(\mu _\mathrm {r} = \mu _\mathrm {f} = m_{\text {T}}/4\), where \(m_{\text {T}}\) is defined as the scalar sum of the transverse masses \(\sqrt{p_{\text {T}}^{2} + m^{2}}\) of the particles generated from the matrix element calculation, following Ref. [11]. The parton shower, fragmentation, and underlying event were simulated using Pythia 8.230 [37] with the A14 set of tuned parameters (tune) [38] and the NNPDF2.3lo PDF set. The top-quark mass \(m_{\mathrm {top}}\) in this sample, and in all other simulated samples, is set to 172.5 \(\text {GeV}\). An alternative \(t\bar{t}t\bar{t}\) sample generated with the same MadGraph5_aMC@NLO set-up but interfaced to Herwig 7.04 [39, 40] with the H7UE tune [40] and the MMHT2014LO PDF set [41] is used to evaluate uncertainties due to the choice of parton shower and hadronisation model. To mitigate the effect of the large fraction of negative weights present in the nominal sample, which would be detrimental to the training of the multivariate discriminant used to separate signal from background (see Sect. 6), an additional sample with settings similar to the nominal ones was generated using leading-order (LO) matrix elements. Good agreement between the LO and NLO distributions of the kinematic variables used by the multivariate discriminant was observed.
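The scale choice can be made concrete with a short sketch; the particle list used in the example (four top quarks with zero \(p_{\text {T}}\)) is purely illustrative.

```python
# Sketch of the scale choice mu_r = mu_f = m_T / 4, where m_T is the scalar
# sum of the transverse masses sqrt(pT^2 + m^2) of the matrix-element
# particles.  The example particle content is an assumption.
import math

def scale_mT_over_4(particles):
    """particles: iterable of (pT, mass) pairs in GeV; returns mu in GeV."""
    m_T = sum(math.hypot(pt, m) for pt, m in particles)
    return m_T / 4.0

# Four top quarks at rest in the transverse plane give mu = m_top = 172.5 GeV.
mu = scale_mT_over_4([(0.0, 172.5)] * 4)
```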

The \(t\bar{t}W\) simulated events were generated using the Sherpa 2.2.1  [42] generator with the NNPDF3.0nlo PDF set and the tune provided by the Sherpa authors. The ME was calculated for up to one additional parton at NLO QCD and up to two partons using the five-flavour scheme, including c- and b-quarks, at LO QCD using the Comix [42] and OpenLoops [43, 44] libraries, and was merged with the Sherpa parton shower [45] using the MEPS@NLO prescription [46,47,48,49] and a merging scale of 30 \(\text {GeV}\). The renormalisation and factorisation scales were set to \(\mu _\mathrm {r} = \mu _\mathrm {f} = m_{\texttt {T}}/2\). The simulated \(t\bar{t}W\) sample is normalised to the cross section of 601 fb computed at NLO in QCD with the leading NLO electroweak corrections [50,51,52]. An alternative \(t\bar{t}W\) sample was generated at NLO in QCD with no additional partons using the MadGraph5_aMC@NLO v2.3.3 generator with the same PDF as the nominal sample. The events were interfaced to Pythia 8.210 using the A14 tune and the NNPDF2.3lo PDF set.

The production of \(t\bar{t}(Z/\gamma ^{*})\) events was modelled at NLO in QCD using the MadGraph5_aMC@NLO  v2.3.3 generator in the five-flavour scheme with the NNPDF3.0nlo PDF set interfaced to Pythia 8.230 using the A14 tune and the NNPDF2.3lo PDF set. It is normalised to the inclusive \(t\bar{t}\ell ^+\ell ^- \) cross section of 880 fb computed at NLO in QCD [50,51,52], including off-shell Z and \(\gamma ^*\) contributions with \(m(\ell ^+\ell ^-) > 5~\text {GeV}\). An alternative sample was generated with NLO matrix elements using Sherpa  2.2.1 and the same PDF as the nominal sample.

The production of \(t\bar{t}\) and single-top-quark events was modelled using the Powheg-Box [53,54,55,56] v2 generator at NLO in QCD with the NNPDF3.0nlo PDF set. In the \(t\bar{t}\) sample the \(h_\mathrm {damp}\) parameter was set to 1.5 \(m_{\mathrm {top}}\) [57]. The overlap between the \(t\bar{t}\) and tW final states was removed using the diagram removal technique [58]. The \(t\bar{t}\) and single-top-quark simulated samples are normalised to the cross sections calculated at next-to-next-to-leading order (NNLO) in QCD including the resummation of next-to-next-to-leading-logarithm (NNLL) soft-gluon terms [59,60,61,62].

The production of \(t\bar{t}H\) events was modelled by the Powheg-Box v2 generator at NLO in QCD using the five-flavour scheme with the NNPDF3.0nlo PDF set, with the \(h_\mathrm {damp}\) parameter set to \(1.5\times (2m_{\mathrm {top}} +m_H)/2\) where the Higgs boson mass is \(m_H=125\) \(\text {GeV}\). The simulated sample is normalised to the cross section computed at NLO in QCD with the leading NLO electroweak corrections [50,51,52]. An alternative sample generated using the MadGraph5_aMC@NLO v2.3.3 generator with the same settings is used to evaluate the uncertainty in \(t\bar{t}H\) modelling due to the generator choice.

The \(tWZ\) events were generated at NLO in QCD using the MadGraph5_aMC@NLO v2.3.3 generator with the NNPDF3.0nlo PDF set. The other rare top-quark processes, namely the production of \(tZ\), \(t\bar{t}WW\), \(t\bar{t}ZZ\), \(t\bar{t}WZ\), \(t\bar{t}HH\), \(t\bar{t}WH\) and \(t\bar{t}t\), were modelled using the MadGraph5_aMC@NLO generator at LO in QCD. The \(t\bar{t}\), single-top-quark, \(t\bar{t}H\), \(tZ\), \(tWZ\), \(t\bar{t}WW\), \(t\bar{t}ZZ\), \(t\bar{t}WZ\), \(t\bar{t}HH\), \(t\bar{t}WH\) and \(t\bar{t}t\) generated samples were all interfaced with Pythia 8.230 using the A14 tune and the NNPDF2.3lo PDF set. Rare top-quark background contributions are normalised using their NLO QCD theoretical cross sections. The \(t\bar{t}t\) production is normalised to the cross section of 1.64 fb calculated at LO in QCD with NNPDF2.3lo.

The WH and ZH processes were generated using the Pythia 8.230 generator with the A14 tune and NNPDF2.3lo PDF set and normalised to their theoretical cross sections calculated at NNLO in QCD and NLO electroweak accuracies.

Samples of diboson (\(VV \)) and triboson (\(VVV\)) production were simulated with the Sherpa 2.2.1 generator, and samples of Z+jets and W+jets production were simulated with the Sherpa 2.2.2 generator, both with the NNPDF3.0nnlo PDF set. The \(VV \) and \(VVV\) samples are normalised to the theoretical cross sections calculated at NLO QCD, and Z+jets and W+jets backgrounds are normalised to the NNLO cross sections [63].

The effects of pile-up were modelled by overlaying minimum-bias events, simulated using the soft QCD processes of Pythia 8.186 with the A3 tune [64], on events from hard processes. For all samples of simulated events, except those generated using Sherpa, the EvtGen v1.2.0 program [65] was used to describe the decays of bottom and charm hadrons.

The nominal signal and background samples were processed through the simulation [66] of the ATLAS detector geometry and response using Geant4 [67], and then reconstructed using the same software as is used for the collider data. Some of the alternative samples used to evaluate systematic uncertainties were instead processed through a fast detector simulation making use of parameterised showers in the calorimeters [68]. Corrections were applied to the simulated events so that the physics objects’ selection efficiencies, energy scales and energy resolutions match those determined from data control samples.

5 Background estimation

Backgrounds in the 2LSS/3L channel can be categorised as irreducible and reducible. Irreducible backgrounds are those for which all selected leptons are from W- or Z-boson decays or from leptonic \(\tau \)-lepton decays. The main irreducible backgrounds originate from the \(t\bar{t}W\)+jets, \(t\bar{t}Z\)+jets and \(t\bar{t}H\)+jets processes, mainly when the additional jets are b-jets. The smaller backgrounds include diboson or triboson production, VH production in association with jets, and rare processes (\(t\bar{t}WW\), \(tWZ\), \(tZq\), \(t\bar{t}t\)). The irreducible background is evaluated using MC simulation normalised to the SM cross sections, except for \(t\bar{t}W\)+jets, for which the normalisation is corrected using data in a dedicated control region. The different treatment of the \(t\bar{t}W\)+jets background is motivated by theoretical studies [69] showing that electroweak corrections not included in the simulation have a significant effect, as well as by the large \(t\bar{t}W\)+jets background normalisation factor found in recent measurements in a similar phase space [70].

The reducible backgrounds originate mainly from \(t\bar{t}\)+jets and tW+jets production and have prompt leptons with misassigned charge (Q mis-id) or fake/non-prompt leptons. This fake/non-prompt background, together with \(t\bar{t}W\)+jets, is evaluated using the template method (cf. Sect. 5.1). The charge misassignment background is defined for the 2LSS channels only. It arises mainly from \(t\bar{t}\)+jets events with an opposite-charge lepton pair in which the charge of one electron is mismeasured either due to bremsstrahlung photon emission followed by its conversion (\(e^{\pm } \rightarrow e^{\pm } \gamma \rightarrow e^{\pm } e^+ e^-\)) or due to mismeasured track curvature. This background is evaluated using a data-driven method (cf. Sect. 5.2). The charge misassignment rate is negligible for muons due to the low probability of bremsstrahlung and the large lever arm of the muon spectrometer.

The estimated yield from each source of background is given in Sect. 8.

5.1 Fake/non-prompt lepton background and \(t\bar{t}W\)+jets production

The template method used to estimate the fake/non-prompt background relies on the simulation to model the kinematic distributions of background processes arising from fake and non-prompt leptons and on control regions to determine their normalisations. These control regions are included in the fit together with the signal region, and the normalisation factors are determined simultaneously with the \(t\bar{t}t\bar{t}\) signal.

Table 1 Summary of the signal and control regions used in the template fit. The variable \(m_{ee}^{\mathrm {CV}}\) (\(m_{ee}^{\mathrm {PV}}\)) is defined as the invariant mass of the system formed by the track associated with the electron and the closest track at the conversion (primary) vertex. \(N_j\) (\(N_b\)) indicates the jet (b-tagged jet) multiplicity in the event. \(H_{\text {T}} \) is defined as the scalar sum of the transverse momenta of the isolated leptons and jets

The following main contributions of the fake/non-prompt background are distinguished:

  • events with one non-prompt electron (muon) from heavy-flavour decay, HF e (HF \(\mu \)),

  • events with one non-prompt electron originating from photon conversion taking place in the detector material (Mat. Conv.),

  • events with a virtual photon (\(\gamma ^*\)) leading to an \(e^{+} e^{-}\) pair (Low \(m_{\gamma ^*}\)).

The minor components of the fake/non-prompt background arising from events with a lepton originating from light-meson decay (LF) or with a jet misidentified as a lepton (other fakes) are determined from MC simulation.

Several control regions, non-overlapping with the signal region, are defined to determine the normalisation of various components of the fake/non-prompt background from data. Each region is required to have a dominant component or a variable with good discriminating power between different components. Since events arising from \(t\bar{t}W\)+jets production represent a large contribution in all control and signal regions, the normalisation of that process is also determined using a dedicated control region. In total, four control regions with their corresponding discriminating variables are used in the analysis. They are summarised in Table 1 and are defined below:

  • ‘CR Conv.’ is enriched in background events arising from both material photon conversions and processes with a virtual photon leading to an \(e^{+}e^{-}\) pair. For each electron in the selected \(e^\pm e^\pm \) or \(e^\pm \mu ^\pm \) events, the invariant mass of the system formed by the track associated with the electron and the closest track at the conversion (primary) vertex, \(m_{ee}^{\mathrm {CV}}\) (\(m_{ee}^{\mathrm {PV}}\)), is computed. The conversion vertex is defined as the point where the track from the electron and its closest track in \(\Delta R\) have the same \(\phi \). The control region is then obtained by selecting events with four or five jets, at least one identified b-jet and low \(m_{ee}^{\mathrm {CV}}\); the \(m_{ee}^{\mathrm {PV}}\) distribution is used in the fit to separate the material conversion and \(\gamma ^*\) components from each other. Virtual photons lead to a lepton pair originating from the primary vertex, with a low \(m_{ee}^{\mathrm {PV}} \sim m_{\gamma ^*}\) and a low conversion radius. Material conversions happen further away from the primary vertex, with a larger conversion radius, and the track extrapolation induces a larger apparent invariant mass. According to the MC simulation, the background arising from both \(\gamma ^*\) and material conversions accounts for around \(40\%\) of the total event yield in this control region.

  • ‘CR HF e’ (‘CR HF \(\mu \)’) is enriched in background events with an electron (muon) from heavy-flavour decay. This region is defined by selecting events with three leptons, namely eee and \(ee\mu \) (\(\mu \mu \mu \) and \(\mu \mu e\)) for CR HF e (CR HF \(\mu \)), and exactly one identified b-jet. This selection targets \(t\bar{t}\) dileptonic decays with an extra non-prompt lepton in events with low \(H_{\text {T}} \). The number of events in the region is used in the maximum-likelihood fit. According to the MC simulation, the background with an electron (muon) coming from heavy-flavour decay accounts for around \(40\%\) (\(50\%\)) of the total event yield in the CR HF e (CR HF \(\mu \)).

  • ‘CR ttW’ is enriched in \(t\bar{t}W\)+jets events. This region is obtained by selecting \(e\mu \) and \(\mu \mu \) events with at least four jets and two b-jets which are neither in other CRs nor in the SR. Events containing electrons with \(|\eta |>1.5\) and ee final states are not considered, in order to reduce the contamination arising from the charge misassignment background. The sum of the lepton \(p_{\text {T}}\) provides discrimination from other processes and is used in the maximum-likelihood fit. According to the MC simulation, the \(t\bar{t}W\)+jets background accounts for around \(33\%\) of the total event yield in this control region.

Fig. 2 Pre-fit comparison between data and prediction in the signal region for two of the input variables used to train the multivariate discriminant: the pseudo-continuous b-tagging discriminant score summed over all the jets in the event (left) and the minimum distance between two leptons among all possible pairs (right). The band includes the total uncertainty of the pre-fit computation. The ratio of the data to the total pre-fit expectation is shown in the lower panel. The first and last bins contain underflow and overflow events, respectively. See Sect. 5 for the definitions of the different background categories

5.2 Charge misassignment background

The probability for an electron to have its charge incorrectly assigned is measured using a data sample of \(Z \rightarrow ee\) events requiring the invariant mass of the electron pair to be within 10 \(\text {GeV}\) of the Z-boson mass and without any requirement on the charge of the two electron tracks. The background contamination is subtracted using a sideband method [12]. The charge misassignment rate is parameterised as a function of electron \(p_{\text {T}}\) and \(|\eta |\), except for the conversion control region defined in Sect. 5.1, where it is also parameterised as a function of the invariant mass of the electron track and its closest track assuming that both tracks originate from the primary vertex. The charge misassignment rate varies from 0.002 to 4% depending on the electron \(p_{\text {T}}\) and \(|\eta |\).

The expected number of events arising from charge misassignment background is determined by applying the measured charge misassignment rate to data events satisfying the requirements of the kinematic selection of the 2LSS channel, except that the two leptons are required to be of opposite charge. In this sample, each event is weighted according to the value of the charge misassignment rate of each electron in the event.
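This reweighting can be sketched as follows; one common convention for combining the two per-lepton rates is shown, since the exact formula is not spelled out in the text above.

```python
# Hedged sketch of the event weighting: each opposite-sign event enters the
# same-sign estimate with a weight built from the per-electron charge-flip
# rates r1 and r2 (zero for muons).  Convention assumed, not from the paper.
def qmisid_weight(r1, r2=0.0):
    flipped = r1 * (1 - r2) + r2 * (1 - r1)      # exactly one charge flipped
    observed_os = (1 - r1) * (1 - r2) + r1 * r2  # event reconstructed as OS
    return flipped / observed_os
```

For small rates this reduces to approximately \(r_1 + r_2\).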

6 Signal discrimination

The background composition of the SR is largely dominated by the production of top-quark pairs in association with additional jets and/or bosons. To separate signal from background events, a multivariate discriminant is built in the SR by combining several input observables into a boosted decision tree (BDT). The set of input variables and the BDT hyperparameters are optimised to maximise the area under the receiver operating characteristic (ROC) curve of the BDT. In total, 12 observables are selected, based on their discriminating power and the requirement of good modelling. Among them, the pseudo-continuous b-tagging discriminant score [33] summed over all the jets in the event is the best discriminating variable, since signal events contain four b-jets while most background events contain fewer. The pseudo-continuous b-tagging discriminant score is an integer from 1 to 5 assigned to each jet, based on the tightest operating point of the b-tagging algorithm it passes, with a value of 5 corresponding to the most b-like jet. For the signal-region selection, the minimum possible score is 10, corresponding to an event with exactly two b-tagged jets out of exactly six jets. The minimum distance \(\Delta R = \sqrt{(\Delta \eta )^2+(\Delta \phi )^2}\) between two leptons among all possible pairs is the second-best discriminating variable, since it provides good discrimination for events with at least three leptons. The distributions of these two variables are shown in Fig. 2.
The other input variables are the leading lepton \(p_{\text {T}} \), \(E_{\text {T}}^{\text {miss}}\), the \(p_{\text {T}}\) of the leading and second-leading jets, the \(p_{\text {T}}\) of the sixth jet, the \(p_{\text {T}}\) of the leading b-jet, the scalar sum of transverse momenta over all leptons and all jets excluding the leading \(p_{\text {T}} \) jet, the sum of distances \(\Delta R\) between two leptons for all possible pairs, the maximum distance \(\Delta R\) between a b-jet and a lepton among all possible pairs, and the minimum distance \(\Delta R\) between a jet and a b-jet among all possible pairs. Taking into account all uncertainties, no significant discrepancy between data and predicted background was found for these variables in the various CRs.
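The quoted minimum summed score can be checked with a short worked example. The mapping of the 77% working point to a per-jet score of at least 3 is an assumption (the bin-to-working-point correspondence is not spelled out above); an untagged jet scores at least 1.

```python
# Worked check of the minimum summed pseudo-continuous b-tagging score in
# the signal region.  Assumption: a jet b-tagged at the 77% working point
# has a score >= 3; any jet has a score >= 1.
def min_pcbt_sum(n_jets, n_btag, btag_min=3, untag_min=1):
    return n_btag * btag_min + (n_jets - n_btag) * untag_min

print(min_pcbt_sum(6, 2))  # 2*3 + 4*1 = 10, the quoted minimum
```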

The BDT training is performed inclusively in both lepton flavour and lepton multiplicity for events passing the SR requirements. The LO \(t\bar{t}t\bar{t}\) simulated signal sample is used in the training. The background sample corresponds to the total expected background, as predicted by the simulation.

7 Systematic uncertainties

Various sources of systematic uncertainty impact the estimated signal and background rates, the migration of events between regions, and the shape of the fitted discriminant distributions. They can be classified into experimental uncertainties and modelling uncertainties of the \(t\bar{t}t\bar{t}\) signal and of the backgrounds, as described below. The impact of each source of systematic uncertainty on the final result can be found in Sect. 8.

7.1 Experimental uncertainties

The uncertainty in the combined 2015–2018 integrated luminosity is 1.7% [20], obtained using the LUCID-2 detector [21] for the primary luminosity measurements. To account for the difference between the pile-up distributions in data and MC simulations, an uncertainty related to the scale factors used to adjust the MC pile-up to the data pile-up profile is applied.

For electrons and muons, the reconstruction, identification, isolation and trigger performances as well as the lepton momentum scale and resolution differ between data and MC simulation. To correct for these differences, scale factors for each are applied. These scale factors were estimated using the tag-and-probe method [25, 26], which is performed using the leptonic decays of Z and W bosons and of \(J/\psi \) mesons. The associated systematic uncertainties are then propagated to the final distributions used in this analysis.

To determine the jet energy scale (JES) and its associated uncertainty, information from test-beam data, LHC collision data, and simulation was used, as described in Ref. [71]. The JES uncertainty is decomposed into a set of 30 uncorrelated components, 29 of which are used per event depending on the type of simulation. The jet energy resolution (JER) is measured separately for data and MC simulation using in situ techniques, similar to those in Ref. [72]. Its uncertainty is represented by nine components accounting for jet-\(p_{\text {T}} \) and \(\eta \)-dependent differences between simulation and data, eight of which are used per event depending on the type of simulation. The systematic uncertainty associated with the JVT is obtained by varying the scale factor used to correct the JVT efficiency in simulation up and down within its uncertainties [31].

The b-tagging efficiencies and mistagging rate are measured in data using the same methods as are described in Refs. [33, 73, 74], with the systematic uncertainties due to b-tagging efficiency and the mistagging rates calculated separately. The impact of the uncertainties on the b-tagging calibration is evaluated separately for b-jets, c-jets and light-flavour jets in the MC samples.

The \(E_{\text {T}}^{\text {miss}}\) uncertainty due to a possible miscalibration of its soft-track component is derived from data–MC comparisons of the \(p_{\text {T}}\) balance between the hard and soft \(E_{\text {T}}^{\text {miss}}\) components [34].

7.2 Signal modelling uncertainties

Several sources of modelling uncertainty are considered for the \(t\bar{t}t\bar{t}\) signal. The uncertainty due to missing higher-order QCD corrections is determined by varying the renormalisation and the factorisation scales simultaneously by factors of 2.0 and 0.5 relative to the central value. The uncertainty related to the choice of parton shower and hadronisation model is estimated by comparing the nominal prediction with that obtained using an alternative sample generated with MadGraph5_aMC@NLO interfaced to Herwig 7. The effect of the PDF uncertainty on the signal MC prediction is calculated as the RMS of the signals from the 100 replicas of the \(\texttt {NNPDF30\_nlo\_as\_0118}\) PDF set following the PDF4LHC prescription [75]. Shape and normalisation variations due to the PDF uncertainty are found to be negligible.
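Schematically, the two prescriptions above reduce to taking a scale-variation envelope and an RMS over PDF replicas; the sketch below uses made-up yield numbers purely for illustration:

```python
import math

def scale_envelope(nominal, variations):
    """Missing-higher-order uncertainty: largest upward and downward deviations
    of the simultaneous (x2.0, x0.5) renormalisation/factorisation variations
    relative to the nominal prediction."""
    up = max(max(variations) - nominal, 0.0)
    down = max(nominal - min(variations), 0.0)
    return up, down

def pdf_rms(nominal, replicas):
    """PDF uncertainty as the RMS of the replica predictions around the nominal."""
    return math.sqrt(sum((r - nominal) ** 2 for r in replicas) / len(replicas))

# Illustrative numbers only (not the analysis yields):
up, down = scale_envelope(100.0, [115.0, 90.0])       # (15.0, 10.0)
rms = pdf_rms(100.0, [101.0, 99.0, 100.5, 99.5])      # RMS over toy replicas
```

In the analysis the replica set has 100 members (the \(\texttt{NNPDF30\_nlo\_as\_0118}\) replicas); the toy list here is shortened for readability.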

7.3 Modelling uncertainties in irreducible background

Modelling uncertainties for the \(t\bar{t}W\)+jets, \(t\bar{t}Z\)+jets and \(t\bar{t}H\)+jets processes are evaluated in a similar way and include the uncertainty due to missing higher-order QCD corrections determined by varying the renormalisation and the factorisation scales simultaneously by factors of 2.0 and 0.5 relative to the central value, and a comparison with alternative generators.

For \(t\bar{t}Z\)+jets, the nominal MC prediction is compared with an NLO Sherpa sample, while for \(t\bar{t}H\)+jets the nominal simulation is compared with an NLO MadGraph5_aMC@NLO sample, both described in Sect. 4. A 1% uncertainty from the PDF is assigned to both the \(t\bar{t}Z\) and \(t\bar{t}H\) processes following the same procedure as described in Sect. 7.2. For \(t\bar{t}W\)+jets, uncertainties associated with the modelling of additional QCD radiation, with the choice of the ME generator and parton shower, are estimated by comparing the nominal prediction with that of an alternative sample that was generated at NLO with no additional partons using the MadGraph5_aMC@NLO generator with the same scale choice and PDF set as for the nominal sample (cf. Sect. 4).

An uncertainty of \(15\%\) (\(20\%\)) is applied to the \(t\bar{t}Z\) (\(t\bar{t}H\)) total cross section [52, 76]. Since the \(t\bar{t}W\)+jets normalisation is determined from the fit to data, no cross-section uncertainty is applied to this process. An additional \(125\%\) (300%) uncertainty is added for \(t\bar{t}W\) production with seven (eight or more) jets. These values correspond to the difference between data and prediction in a \(t\bar{t}W\)+jets validation region (cf. Sect. 8) where a data excess is observed for high jet multiplicities. Since the jet multiplicity distribution in a \(t\bar{t}Z\)+jets validation region shows good agreement between data and prediction, such an uncertainty is not considered for \(t\bar{t}Z\)+jets or for \(t\bar{t}H\)+jets production, given the similarity of their simulations.

The \(t\bar{t}W\)+jets, \(t\bar{t}Z\)+jets and \(t\bar{t}H\)+jets background processes enter the \(t\bar{t}t\bar{t}\) signal region if they have additional heavy-flavour jets. Such processes are difficult to model with the MC simulation. To account for this, an uncertainty of 50% is assigned to the events with three generator-level (‘true’) b-jets and a separate 50% uncertainty to the events with four or more true b-jets. These estimates are based on the measurement of \(t\bar{t}\) production with additional heavy-flavour jets [77] and on comparisons between data and prediction in \(t\bar{t} \gamma \) events with three and four b-tagged jets. They are treated as uncorrelated between the three backgrounds due to the different MC setups used to simulate the \(t\bar{t}W\)+jets, \(t\bar{t}Z\)+jets and \(t\bar{t}H\)+jets backgrounds.

Events from \(t\bar{t}t\) production have kinematics similar to those of the \(t\bar{t}t\bar{t}\) signal, although the rate is expected to be much smaller. However, this process is currently unexplored experimentally. A large ad hoc uncertainty of \(100\%\) is therefore assigned to its cross section, and an additional \(50\%\) uncertainty is applied to \(t\bar{t}t\) events with four true b-jets.

The uncertainty in the tZ and tWZ single-top-quark cross sections is set to 30% [78, 79] and that for the \(t\bar{t}WW\), \(t\bar{t}ZZ\), \(t\bar{t}WZ\), \(t\bar{t}HH\) and \(t\bar{t}WH\) cross sections to 50% [12]. The uncertainty in diboson production is set to 40%, based on studies of the \(WZ+b\) process. For each of the other small background processes a large ad hoc cross-section uncertainty of \(50\%\) is applied. For all small backgrounds except \(t\bar{t}t\) an additional 50% uncertainty is assigned to the events with three true b-jets and separately a 50% uncertainty for events with four or more true b-jets.

Fig. 3
figure 3

Comparison between data and prediction after the fit (‘Post-Fit’) for the distribution of the BDT score in the SR. The band includes the total uncertainty of the post-fit computation. The ratio of the data to the total post-fit computation is shown in the lower panel. See Sect. 5 for the definitions of the different background categories

7.4 Modelling uncertainties in reducible background

Uncertainties in the charge misassignment background arise from the following contributions: the statistical uncertainty of the fit to data used to determine the rates; the rate variation due to variation of the dielectron invariant mass requirement; and the rate variation due to a difference between the observed and the predicted misidentification rates when the method is applied to MC simulated events. This uncertainty is determined separately for the material conversion control region, for the \(t\bar{t}W\)+jets control region, and for all other control regions, and it is treated as correlated between the regions.

Table 2 Normalisation factors for various backgrounds determined from the fit to the control regions. The uncertainties include both the statistical and systematic uncertainties

Since the overall normalisations of the material conversion and the virtual photon backgrounds are free parameters in the fit, their uncertainty comes only from the shape of the distributions used in the template fit (cf. Sect. 5.1). For each of these sources, the uncertainty is obtained by comparing data with the \(\textsc {Powheg} {}+\textsc {Pythia} 8\) simulation of \(Z(\rightarrow \mu \mu )+\gamma \) and \(Z(\rightarrow \mu \mu )+\)jets production in a region enriched in \(Z(\rightarrow \mu \mu )+\gamma \) events. An uncertainty of 25% is applied to the material conversion and to the virtual photon background events fulfilling \(m_{ee}^{\mathrm {CV}}>0.1~\text {GeV}\) in all control and signal regions to cover the extrapolation from the ‘CR Conv.’ region with \(0<m_{ee}^{\mathrm {CV}}<0.1~\text {GeV}\) to the regions with events with larger \(m_{ee}^{\mathrm {CV}}\).

The uncertainty in the shape of the distributions of the heavy-flavour non-prompt lepton background is estimated by comparing data with the background prediction, normalised to data, for a loose lepton selection with the isolation requirements dropped and the identification criteria relaxed. The shape uncertainty is derived for each region included in the fit, but these variations are treated as correlated between regions since the physics origin of the uncertainty is common to all of them. This systematic uncertainty is derived separately for electrons and muons.

A normalisation uncertainty of \(100\%\) is assigned to the background arising from light-flavour non-prompt leptons. This uncertainty was found to cover any difference between data and prediction in loose lepton regions [70]. An ad hoc uncertainty of \(30\%\) is applied to the normalisation of the background arising from the other minor sources of non-prompt leptons from \(t\bar{t}\) production. No uncertainty in the shape of the distributions of these backgrounds is considered since their contribution is very small.

Since the main source of reducible background is \(t\bar{t}\)+jets production, the systematic uncertainty in the modelling of its heavy-flavour content can affect the shape of the template distributions used in the fit. To account for this effect an uncertainty of 30%, based on the measurement of \(t\bar{t}\) production with additional heavy-flavour jets [77], is assigned to the events with three true b-jets and a separate 30% uncertainty to events with four or more true b-jets. A small contribution to the reducible background from \(V+\)jets production is determined from simulation and a normalisation uncertainty of \(30\%\) is assigned to this background.

8 Results

The \(t\bar{t}t\bar{t}\) production cross section and the normalisation factors of the backgrounds are determined via a binned likelihood fit to the BDT score distribution in the signal region and to the yields, or to the discriminating variable distributions, in the four control regions as listed in Table 1. The systematic uncertainties in both the signal and background predictions are included as nuisance parameters in the likelihood function. The maximum-likelihood fit is performed using the RooFit package [80] based on a likelihood function built on the Poisson probability that the observed data are compatible with the model prediction. The value of each nuisance parameter is constrained by a penalty factor present in the likelihood function, while all normalisation factors are unconstrained.
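The structure of such a likelihood — a Poisson term per bin, Gaussian penalty terms for the nuisance parameters, and unconstrained normalisation factors — can be illustrated with a toy negative log-likelihood (a sketch only, not the RooFit implementation used in the analysis):

```python
import math

def toy_nll(mu, thetas, data, signal, background, np_shifts):
    """Toy binned negative log-likelihood (up to a constant independent of mu).
    mu: unconstrained signal strength; thetas: nuisance-parameter pulls,
    each penalised by a unit-Gaussian term; data/signal/background: per-bin
    counts; np_shifts: per-NP list of fractional shifts of the prediction."""
    nll = 0.0
    for i, n_obs in enumerate(data):
        expected = mu * signal[i] + background[i]
        for theta, shift in zip(thetas, np_shifts):
            expected *= 1.0 + theta * shift[i]   # multiplicative NP response
        nll += expected - n_obs * math.log(expected)   # Poisson term
    nll += 0.5 * sum(theta ** 2 for theta in thetas)   # Gaussian penalties
    return nll
```

In this picture, a free normalisation factor such as \(\text {NF}_{t\bar{t}W}\) would multiply the corresponding background component and, unlike the nuisance parameters, would carry no penalty term.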

Table 3 Post-fit background and signal yields in the full signal region as well as for events in which the BDT score is also greater than zero. The total systematic uncertainty differs from the sum in quadrature of the different uncertainties due to correlations. Q mis-id refers to the charge misassignment background. Mat. Conv. and Low \(m_{\gamma ^*}\) refer respectively to events with one non-prompt electron originating from photon conversion in the detector material and to events with a virtual photon leading to an \(e^{+} e^{-}\) pair. HF e (HF \(\mu \)) refers to events with one non-prompt electron (muon) from heavy-flavour hadron decay, LF refers to events with a lepton originating from light-meson decay, and ‘Other \(t\bar{t}X\)’ includes events coming from \(t\bar{t}WZ\), \(t\bar{t}ZZ\), \(t\bar{t}WH\), \(t\bar{t}HH\)
Fig. 4
figure 4

Comparison between data and prediction after the fit (‘Post-Fit’) for the yields or distributions of the discriminating variables used in the fit in each CR (see Table 1). The band includes the total uncertainty of the post-fit computation. The ratio of the data to the total post-fit computation is shown in the lower panel. The first and last bins contain underflow and overflow events, respectively. See Sect. 5 for the definitions of the different background categories

Fig. 5
figure 5

Post-fit comparison between data and prediction in the \(t\bar{t}W\)+jets validation region for the multiplicity of jets (left) and the BDT score (right). The y-axis label \(N_{+} - N_{-}\) represents the difference between the number of events with a positive sum and the number of events with a negative sum of the charges of the selected leptons. The band includes the total uncertainty of the post-fit computation. The ratio of the data to the total post-fit computation is shown in the lower panel. The first and last bins contain underflow and overflow events, respectively

Fig. 6
figure 6

Post-fit comparison between data and prediction for signal region events for the distributions of: the sum of b-tagging pseudo-continuous scores of the jets in the event (top left), the minimum distance between two leptons among all possible pairs (top right), the multiplicity of jets (bottom left) and the multiplicity of b-tag jets (bottom right). The band includes the total uncertainty of the post-fit computation. The ratio of the data to the total post-fit computation is shown in the lower panel. The first and last bins contain underflow and overflow events, respectively. See Sect. 5 for the definitions of the different background categories

Fig. 7
figure 7

Post-fit comparison between data and prediction for signal region events with a BDT score greater than zero for the distributions of: the sum of b-tagging pseudo-continuous scores of the jets in the event (top left), the minimum distance between two leptons among all possible pairs (top right), the multiplicity of jets (bottom left) and the multiplicity of b-tag jets (bottom right). The band includes the total uncertainty of the post-fit computation. The ratio of the data to the total post-fit computation is shown in the lower panel. The first and last bins contain underflow and overflow events, respectively. See Sect. 5 for the definitions of the different background categories

The fit determines the best value of the signal strength \(\mu \), defined as a ratio of the \(t\bar{t}t\bar{t}\) cross section to the SM expectation, its uncertainty, and five normalisation factors: \(\text {NF}_{\text {HF}~e}\) (\(\text {NF}_{\text {HF}~\mu }\)) for the non-prompt electron (muon) background from heavy-flavour decays, \(\text {NF}_{\text {Mat.\ Conv.}}\) for the background from detector material conversions, \(\text {NF}_{\text {Low}~m_{\gamma ^*}}\) for the contribution of low-mass electron pairs, and \(\text {NF}_{t\bar{t}W}\) for the \(t\bar{t}W\)+jets contribution. For each free parameter, the uncertainty is derived following the asymptotic approximation [81]. An uncertainty of 20% is assigned to the \(t\bar{t}t\bar{t}\) cross section predicted by the SM. The prediction corresponding to all parameters maximising the full likelihood is referred to as the post-fit model.

The best-fit value of \(\mu \) is:

$$\begin{aligned} \mu =2.0 \pm 0.4({\mathrm {stat}})\,^{+0.7}_{-0.4} ({\mathrm {syst}})=2.0\,^{+0.8}_{-0.6}. \end{aligned}$$

The systematic uncertainty is determined by subtracting in quadrature the statistical uncertainty, obtained from a fit where all nuisance parameters are fixed to their post-fit values, from the total uncertainty. The measured \(\mu \) value is consistent within 1.7 standard deviations with the SM prediction corresponding to \(\mu = 1\). The probability for the background-only hypothesis to result in a signal-like excess at least as large as seen in data is derived using the profile-likelihood ratio following the procedure described in Ref. [81]. From this, the significance of the observed signal is found to be 4.3 standard deviations, while 2.4 standard deviations are expected. Figure 3 shows the distribution of the BDT score in the signal region after performing the fit. Good agreement is observed between data and the fitted prediction.
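The quadrature subtraction can be checked directly against the quoted result \(\mu = 2.0 \pm 0.4\,({\mathrm {stat}})\,^{+0.7}_{-0.4}\,({\mathrm {syst}})\):

```python
import math

def syst_from_total(total, stat):
    """Systematic component obtained by subtracting the statistical
    uncertainty in quadrature from the total uncertainty."""
    return math.sqrt(total ** 2 - stat ** 2)

# Using the total (+0.8/-0.6) and statistical (0.4) uncertainties on mu:
syst_up = syst_from_total(0.8, 0.4)    # ~0.69, quoted as +0.7
syst_down = syst_from_total(0.6, 0.4)  # ~0.45, quoted as -0.4
```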

The fitted signal strength is converted into an inclusive cross section using the SM \(t\bar{t}t\bar{t}\) predicted cross section of \(\sigma _{t\bar{t}t\bar{t}} = 12.0 \pm 2.4 ~\text {fb}\) computed at NLO in QCD and electroweak couplings [11] and excluding its uncertainty. The measured \(t\bar{t}t\bar{t}\) production cross section is then:

$$\begin{aligned} \sigma _{t\bar{t}t\bar{t}}=24 \pm 5 ({\mathrm {stat}})\,^{+5}_{-4}({\mathrm {syst}}) \,\text {fb}=24\,^{+7}_{-6}\,\text {fb}. \end{aligned}$$
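The central value follows from rescaling the SM prediction by the fitted signal strength, and the quoted total uncertainty is the quadrature sum of the statistical and systematic components:

```python
import math

SM_XSEC_FB = 12.0  # SM prediction at NLO QCD+EW, in fb (its uncertainty excluded)

central = 2.0 * SM_XSEC_FB          # fitted mu times SM prediction -> 24 fb
total_up = math.hypot(5.0, 5.0)     # stat (+) syst in quadrature: ~7.1 fb, quoted +7
total_down = math.hypot(5.0, 4.0)   # stat (+) syst in quadrature: ~6.4 fb, quoted -6
```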

The normalisation factors of the different background sources determined from the fit are shown in Table 2. The post-fit background and signal yields are shown in Table 3. The normalisation factor for the \(t\bar{t}W\)+jets background is compatible with the observation from the previous ATLAS \(t\bar{t}H\) search [70] where the reference theoretical \(t\bar{t}W\)+jets background cross section was scaled up by 20% to account for extra jet production and EW effects compared to the theoretical cross section used in this analysis. The post-fit value of the nuisance parameter associated with the systematic uncertainty in the \(t\bar{t}W\) background with \(N_{\mathrm {jets}} =7\) is \(0.18^{+0.73}_{-0.61}\). This corresponds to a 22% increase in the number of \(t\bar{t}W\) events with seven jets. The post-fit value of the nuisance parameter for the systematic uncertainty of the \(t\bar{t}W\)+jets background with \(N_{\mathrm {jets}} \ge 8\) is \(0.22^{+0.56}_{-0.42}\), corresponding to a 65% increase in the number of \(t\bar{t}W\) events with eight or more jets. As a result of these increases and of the change in the \(t\bar{t}W\) background normalisation factor \(\text {NF}_{t\bar{t}W}\), the overall \(t\bar{t}W\) background yield in the signal-enriched region with a BDT score above zero increased from the \(12.4 \pm 8.8\) events predicted to \(23.2 \pm 10.1\) events after the fit to data. Apart from the uncertainties discussed above, no other nuisance parameters are found to be significantly adjusted or constrained by the fit.

Figure 4 shows the yields or the discriminating variable distributions used in the fit in each CR. Good agreement is observed between data and the post-fit computation.

In order to check the \(t\bar{t}W\)+jets background normalisation and modelling, a validation region is defined, based on the fact that the \(t\bar{t}W\)+jets process is charge asymmetric. The difference between the number of events with a positive sum and the number of events with a negative sum of the charges of the selected leptons is built in the region with at least four jets with at least two being b-tagged. This procedure removes the charge-symmetric processes and allows construction of distributions where \(t\bar{t}W\)+jets events dominate. The jet multiplicity and the BDT score distributions are displayed in Fig. 5 and show good agreement between data and post-fit computations.
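The charge-asymmetric construction described above can be sketched as follows: given per-event jet multiplicities and lepton-charge sums (hypothetical event tuples, for illustration only), charge-symmetric processes contribute equally to \(N_{+}\) and \(N_{-}\) and cancel in the difference:

```python
from collections import Counter

def n_plus_minus_n_minus(events):
    """Per-bin N+ - N-: events is an iterable of (n_jets, lepton_charge_sum)
    pairs. A charge-symmetric process populates N+ and N- equally, so it
    cancels; a charge-asymmetric process such as ttW+jets survives."""
    diff = Counter()
    for n_jets, charge_sum in events:
        if charge_sum > 0:
            diff[n_jets] += 1
        elif charge_sum < 0:
            diff[n_jets] -= 1
    return dict(diff)

# Two opposite-charge-sum events at n_jets=4 cancel; the n_jets=5 bin keeps
# the excess of positive-sum events.
counts = n_plus_minus_n_minus([(4, 2), (4, -2), (5, 2), (5, 2)])
```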

The distributions for some of the key analysis variables are shown in Fig. 6 for the events in the signal region and in Fig. 7 for events in a signal-enriched region with a BDT score above zero.

The uncertainties impacting \(\mu \) are summarised in Table 4. Apart from the theoretical uncertainty of the signal cross section, the largest systematic uncertainty comes from the modelling of the \(t\bar{t}W\)+jets process. Within the uncertainties of the background modelling, the impact of the uncertainty in \(t\bar{t}t\) production is also significant. The expected cross section of this process is only of the order of 10% of \(\sigma _{t\bar{t}t\bar{t}}\). However, the shape of the BDT score distribution for \(t\bar{t}t\) production is similar to the one for signal, which leads to this uncertainty having a sizeable impact on \(\mu \). In order to test the sensitivity of the \(t\bar{t}t\bar{t}\) measurement to the value of the \(t\bar{t}t\) cross section, the fit was also performed assuming a \(t\bar{t}t\) cross section five times larger than the SM cross section. In that case the \(t\bar{t}t\bar{t}\) signal strength \(\mu \) decreases by 10% while the fitted normalisation factors are mostly unaffected.

Table 4 List of the uncertainties in the signal strength \(\mu \), grouped in categories. The quoted values are obtained by repeating the fit, fixing a set of nuisance parameters of the sources corresponding to the considered category, and subtracting in quadrature the resulting uncertainty from the total uncertainty of the nominal fit presented in the last line. The total uncertainty is different from the sum in quadrature of the components due to correlations between nuisance parameters. See Sect. 7 for the description of the uncertainty sources

The stability of the result has been checked. The fit was repeated with the data split according to year or by splitting the signal region into two regions with either same-sign dilepton events or events with at least three leptons. Different fits were also performed by using only positively charged same-sign lepton pairs or only negatively charged same-sign lepton pairs. All these tests showed compatible \(\mu \) values.

An additional test was performed by splitting the SR into five regions according to the number of leptons and b-tagged jets and by fitting the \(H_{\text {T}}\) distribution in each region. The BDT score is therefore not used in this test. The observed (expected) significance is found to be 4.3 (2.1) standard deviations and the fitted signal strength is \(2.2^{+0.9}_{-0.6}\). This result is consistent with the result from the default fit.

9 Conclusion

A search is presented for four-top-quark production using an integrated luminosity of 139 fb\(^{-1}\) of proton–proton collision data at \(\sqrt{s}=13~\text {TeV}\) collected by the ATLAS detector at the LHC. The observed (expected) significance with respect to the background-only hypothesis is 4.3 (2.4) standard deviations, providing evidence for this process assuming the Standard Model four-top-quark production properties. The four-top-quark production cross section is measured to be \(24^{+7}_{-6}\) fb, which is consistent within 1.7 standard deviations with the Standard Model expectation of \(\sigma _{t\bar{t}t\bar{t}} = 12.0 \pm 2.4 ~\text {fb}\).