1 Introduction

In the standard model (SM) the production of four top quarks (\(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) is a rare process, with representative leading-order (LO) Feynman diagrams shown in Fig. 1. Many beyond-the-SM (BSM) theories predict an enhancement of the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) cross section, \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)), such as gluino pair production in the supersymmetry framework [1,2,3,4,5,6,7,8,9,10], the pair production of scalar gluons [11, 12], and the production of a heavy pseudoscalar or scalar boson in association with a \(\mathrm {t}\overline{\mathrm {t}}\) pair in Type II two-Higgs-doublet models (2HDM) [13,14,15]. In addition, a top quark Yukawa coupling larger than expected in the SM can lead to a significant increase in \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) production via an off-shell SM Higgs boson [16]. The SM prediction for \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) at \(\sqrt{s} = 13 \,\text {TeV} \) is \(9.2^{+2.9}_{-2.4}\) \(\,\text {fb}\) at next-to-leading order (NLO) [17]. An alternative prediction of \(12.2^{+5.0}_{-4.4}\) \(\,\text {fb}\) is reported in Ref. [16], obtained from a LO calculation of \(9.6^{+3.9}_{-3.5}\) \(\,\text {fb}\) and an NLO/LO K-factor of 1.27 based on the \(14\,\text {TeV} \) calculation of Ref. [18].

Fig. 1
figure 1

Representative Feynman diagrams for \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) production at LO in the SM

After the decays of the top quarks, the final state contains several jets resulting from the hadronization of light quarks and b quarks (b jets), and may contain isolated leptons and missing transverse momentum depending on the decays of the \(\mathrm {W}\) bosons [19]. Among these final states, the same-sign dilepton and the three- (or more) lepton final states, considering \(\ell = \mathrm {e}, \mu \), correspond to branching fractions in \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) events of 8 and 1%, respectively, excluding the small contribution from \(\mathrm {W}\rightarrow \tau \nu \), which is included in selected events. However, due to the low level of backgrounds, these channels are the most sensitive to \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) production in the regime with SM-like kinematic properties. The ATLAS and CMS Collaborations at the CERN LHC have previously searched for SM \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) production in \(\sqrt{s}= 8\) and 13\(\,\text {TeV}\) \(\mathrm {p}\) \(\mathrm {p}\) collisions [20,21,22,23,24]. The most sensitive of these results is a re-interpretation of the CMS same-sign dilepton search for BSM physics at 13 \(\,\text {TeV}\) [23], with an observed (expected) \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) cross section upper limit (assuming no SM \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal) of 42 (\(27^{+13}_{-8}\))\(\,\text {fb}\) at the 95% confidence level (CL).

The previous search is inclusive, exploring the final state with two same-sign leptons and at least two jets, using an integrated luminosity of 35.9\(\,\text {fb}^{-1}\) [23]. The analysis described in this paper is based on the same data set and improves on the previous search by optimizing the signal selection for sensitivity to SM \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) production, by using an improved b jet identification algorithm, and by employing background estimation techniques that are adapted to take into account the higher jet and b jet multiplicity requirements in the signal regions.

2 Background and signal simulation

Monte Carlo (MC) simulations at NLO are used to evaluate the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal acceptance and to estimate the background from diboson (\(\mathrm {W}\) \(\mathrm {Z}\), ZZ, \(\mathrm {Z} \mathrm {\gamma }\), \(\mathrm {W}^{\pm }\mathrm {W}^{\pm }\)) and triboson (\(\mathrm {W}\) \(\mathrm {W}\) \(\mathrm {W}\), \(\mathrm {W}\) \(\mathrm {W}\) \(\mathrm {Z}\), \(\mathrm {W}\mathrm {Z} \mathrm {Z} \), \(\mathrm {Z} \mathrm {Z} \mathrm {Z} \), \(\mathrm {W}\mathrm {W}\mathrm {\gamma }\), \(\mathrm {W}\mathrm {Z} \mathrm {\gamma }\)) processes, as well as from production of single top quarks (\(\mathrm {t} \mathrm {Z} \mathrm {q} \), \(\mathrm {t} \mathrm {\gamma }\)), or \(\mathrm {t}\overline{\mathrm {t}}\) produced in association with a boson (\(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\), \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\)). These samples are generated using the NLO MadGraph 5_amc@nlo 2.2.2 [17] program with up to one additional parton in the matrix-element calculation, except for \(\mathrm {W}^{\pm }\mathrm {W}^{\pm }\) which is generated with up to two additional partons, and the \(\mathrm {W}\) \(\mathrm {Z}\), ZZ and \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) samples, which are generated with no additional partons with the powheg box  v2 [25, 26] program. The \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) sample with \(\mathrm {Z}/\gamma ^{*}\rightarrow \ell \ell \) is generated with a dilepton invariant mass greater than \(1\,\text {GeV} \). The LO MadGraph 5_amc@nlo generator, scaled to NLO cross sections, is used to estimate the \(\mathrm {W}\mathrm {\gamma }\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {\gamma }\) processes with up to three additional partons. Other rare backgrounds, such as \(\mathrm {t}\overline{\mathrm {t}}\) production in association with dibosons (\(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\mathrm {W}\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\mathrm {Z} \), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z} \mathrm {Z} \), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\mathrm {H} \), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z} \mathrm {W}\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {H} \mathrm {H} \)), triple top quark production (\(\mathrm {t}\overline{\mathrm {t}} \mathrm {t} \), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t} \mathrm {W}\)), and \(\mathrm {t} \mathrm {W}\mathrm {Z} \) are generated using LO MadGraph 5_amc@nlo without additional partons, and scaled to NLO cross sections [27]. The NNPDF3.0LO (NNPDF3.0NLO) [28] parton distribution functions (PDFs) are used to generate all LO (NLO) samples. Parton showering and hadronization, as well as \(\mathrm {W}^{\pm }\mathrm {W}^{\pm }\) from double-parton-scattering, are modeled by the pythia  8.205 [29] program, while the MLM [30] and FxFx [31] prescriptions are employed in matching additional partons in the matrix-element calculations to parton showers in the LO and NLO samples, respectively. The top quark mass in the generators is set to \(172.5\,\text {GeV} \). The Geant4 package [32] is used to model the response of the CMS detector. Additional proton–proton interactions (pileup) within the same or nearby bunch crossings are also included in the simulated events.

To improve the MC modeling of the multiplicity of additional jets from initial-state radiation (ISR), simulated \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) events are reweighted based on the number of ISR jets (\(N_{\text {jets}}^\mathrm {ISR}\)). The reweighting is based on a comparison of the light-flavor jet multiplicity in dilepton \(\mathrm {t}\overline{\mathrm {t}}\) events in data and simulation. The method requires exactly two jets identified as originating from b quarks in dilepton \(\mathrm {t}\overline{\mathrm {t}}\) events, and assumes that all other jets are from ISR. Weighting factors are obtained as a function of \(N_{\text {jets}}^\mathrm {ISR}\) to bring data and MC into agreement. These weights are then applied, keeping the total cross section constant, to \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) MC as a function of the number of jets not originating from top quark, \(\mathrm {W}\), or \(\mathrm {Z}\) decays. To improve the modeling of the flavor of additional jets, the simulation is also corrected to account for the measured ratio of \(\mathrm {t}\overline{\mathrm {t}} \mathrm {b} \overline{\mathrm {b}} / \mathrm {t}\overline{\mathrm {t}} \text {jj}\) cross sections reported in Ref. [33]. More details on these corrections and their uncertainties are provided in Sect. 6.

3 The CMS detector and event reconstruction

The central feature of the CMS detector is a superconducting solenoid of 6\(\,\text {m}\) internal diameter, providing a magnetic field of 3.8\(\,\text {T}\). Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (\(\eta \)) coverage provided by the barrel and endcap detectors. Muons are measured in gas-ionization detectors embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [34].

Events of interest are selected using a two-tiered trigger system [35]. The first level (L1), composed of custom hardware processors, uses information from the calorimeters and muon detectors to select events at a rate of around 100\(\,\text {kHz}\) within a time interval of less than 4\(\,\mu \text {s}\). The second level, known as the high-level trigger (HLT), consists of a farm of processors running a version of the full event reconstruction software optimized for fast processing, and reduces the event rate to less than 1\(\,\text {kHz}\) before data storage.

Events are processed using the particle-flow (PF) algorithm [36], which reconstructs and identifies each individual particle with an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with the electron track [37]. The momentum of muons is obtained from the curvature of the corresponding track, combining information from the silicon tracker and the muon system [38]. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for the response function of the calorimeters to hadronic showers. The energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energies.

Hadronic jets are clustered from neutral PF candidates and charged PF candidates associated with the primary vertex, using the anti-\(k_{\mathrm {T}}\) algorithm [39, 40] with a distance parameter of 0.4. The jet momentum is determined as the vectorial sum of all PF candidate momenta in the jet. An offset correction is applied to jet energies to take into account the contribution from pileup. Jet energy corrections are derived from simulation, and are improved with in situ measurements of the energy balance in dijet, multijet, \(\mathrm {\gamma }\)+jet and leptonically decaying Z+jet events [41, 42]. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated noise patterns in certain HCAL regions. Jets originating from \(\mathrm {b}\) quarks are identified as b-tagged jets using a deep neural network algorithm [43], with a working point chosen such that the efficiency to identify a b jet is 55–70% for a jet transverse momentum (\(p_{\mathrm {T}}\)) between 20 and 400\(\,\text {GeV}\). The misidentification rate for a light-flavor jet is 1–2% in the same jet \(p_{\mathrm {T}}\) range. The vector \({\vec p}_{\mathrm {T}}^{\text {miss}}\) is defined as the projection on the plane perpendicular to the beams of the negative vector sum of the momenta of all reconstructed PF candidates in an event [44]. Its magnitude, called missing transverse momentum, is referred to as \(p_{\mathrm {T}} ^\text {miss}\). The scalar \(p_{\mathrm {T}}\) sum of all jets in an event is referred to as \(H_{\mathrm {T}}\).

4 Event selection and search strategy

The definitions of objects and the baseline event selection follow closely those of Refs. [23, 45]. Electron identification is based on a multivariate discriminant using shower shape and track quality variables, while for muons it is based on the quality of the geometrical matching between the tracker and muon system measurements. Isolation and impact parameter requirements are applied to both lepton flavors, as well as specific selections designed to improve the accuracy of the charge reconstruction. The combined reconstruction and identification efficiency is in the range of 45–70% (70–90%) for electrons (muons), increasing as a function of \(p_{\mathrm {T}}\) and converging to the maximum value for \(p_{\mathrm {T}}>\) 60\(\,\text {GeV}\). The number of leptons (\(N_\ell \)), the number of jets (\(N_\text {jets}\)), and the number of b-tagged jets (\(N_\text {b}\)) are counted after the application of the basic kinematic requirements summarized in Table 1.

Table 1 Kinematic requirements for leptons and jets

Signal events are selected using triggers that require two leptons with \(p_{\mathrm {T}} > 8\,\text {GeV} \) and \(H_{\mathrm {T}} > 300\,\text {GeV} \). The trigger efficiency is greater than 95% for di-electron (\(\mathrm {e}\mathrm {e}\)) and electron-muon (\(\mathrm {e}\mathrm {\mu }\)) events and about 92% for di-muon (\(\mathrm {\mu }\mathrm {\mu }\)) events. The baseline selections require \(H_{\mathrm {T}} > 300\,\text {GeV} \) and \(p_{\mathrm {T}} ^\text {miss} >50\,\text {GeV} \), at least two jets (\(N_\text {jets} \ge 2\)), at least two b-tagged jets (\(N_\text {b} \ge 2\)), a leading lepton with \(p_{\mathrm {T}} > 25\,\text {GeV} \), and a second lepton of the same charge with \(p_{\mathrm {T}} > 20\,\text {GeV} \). To reduce the background from Drell–Yan with a charge-misidentified electron, events with same-sign electron pairs with mass below 12\(\,\text {GeV}\) are rejected. Events where a third lepton with \(p_{\mathrm {T}}\) larger than 5 (7)\(\,\text {GeV}\) for muons (electrons) forms an opposite-sign (OS) same-flavor pair with mass below 12\(\,\text {GeV}\) or between 76 and 106\(\,\text {GeV}\) are also rejected. If the third lepton has \(p_{\mathrm {T}} > 20\,\text {GeV} \) and the invariant mass of the pair is between 76 and 106\(\,\text {GeV}\), these rejected events are used to populate a \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}\) background control region (CRZ). The signal acceptance in the baseline region, including the leptonic \(\mathrm {W}\) boson branching fraction, is approximately 1.5%. After these requirements, we define eight mutually exclusive signal regions (SRs) and a control region for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) background (CRW), based on \(N_\text {jets}\), \(N_\text {b}\), and \(N_\ell \), as detailed in Table 2. The observed and predicted yields in the control and signal regions are used to measure \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)), following the procedure described in Sect. 7.

Table 2 Definitions of the eight SRs and the two control regions for \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) (CRW) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}\) (CRZ)

5 Backgrounds

The main backgrounds to the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) process in the same-sign dilepton and three- (or more) lepton final states arise from rare multilepton processes, such as \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\), and \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) (\(\mathrm {H} \rightarrow \mathrm {WW}\)), and single-lepton or OS dilepton processes with an additional “nonprompt lepton”. Nonprompt leptons consist of electrons from conversions of photons in jets and leptons from the decays of heavy- or light-flavor hadrons. In this category we include also hadrons misidentified as leptons. The minor background from OS dilepton events with a charge-misidentified lepton is also taken into account.

Rare multilepton processes are estimated using simulated events. Control regions are used to constrain the normalization of the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}\) backgrounds, as described in Sect. 7, while for other processes the normalization is based on the NLO cross sections referenced in Sect. 2. Processes such as the associated production of a \(\mathrm {t}\overline{\mathrm {t}}\) pair with a pair of bosons (\(\mathrm {W}\), \(\mathrm {Z} \), \(\mathrm {H}\)) are grouped into a “\(\mathrm {t}\overline{\mathrm {t}} \mathrm {VV}\)” category. Associated photon production processes such as \(\mathrm {W}\mathrm {\gamma }\), \(\mathrm {Z} \mathrm {\gamma }\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {\gamma }\), and \(\mathrm {t} \mathrm {\gamma }\), where an electron is produced in an unidentified photon conversion, are grouped into a “X\(\mathrm {\gamma }\)” category. All residual processes with very small contributions, including diboson (\(\mathrm {W}\) \(\mathrm {Z}\), ZZ, \(\mathrm {W}^{\pm }\mathrm {W}^{\pm }\) from single- and double-parton scattering), triboson (\(\mathrm {W}\) \(\mathrm {W}\) \(\mathrm {W}\), \(\mathrm {W}\) \(\mathrm {W}\) \(\mathrm {Z}\), \(\mathrm {W}\mathrm {Z} \mathrm {Z} \), \(\mathrm {Z} \mathrm {Z} \mathrm {Z} \), \(\mathrm {W}\mathrm {W}\mathrm {\gamma }\), \(\mathrm {W}\mathrm {Z} \mathrm {\gamma }\)), and rare single top quark (\(\mathrm {t} \mathrm {Z} \mathrm {q} \), \(\mathrm {t} \mathrm {W}\mathrm {Z} \)) and triple top quark processes (\(\mathrm {t}\overline{\mathrm {t}} \mathrm {t} \) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t} \mathrm {W}\)), are grouped into a “Rare” category.

The nonprompt lepton and charge-misidentified lepton backgrounds are estimated following the methods described in Ref. [23]. For nonprompt leptons, an estimate referred to as the “tight-to-loose” method defines two control regions by modifying the lepton identification (including isolation) and event kinematic requirements, respectively. An “application region” is defined for every SR by requiring at least one lepton to fail the standard identification (“tight”) while satisfying a more relaxed one (“loose”). To obtain the nonprompt lepton background estimate in the corresponding SR, the event yield in each application region is weighted by a factor of \(\epsilon _\mathrm {TL} / (1-\epsilon _\mathrm {TL})\) for each lepton failing the tight requirement. The \(\epsilon _\mathrm {TL}\) parameter is the probability that a nonprompt lepton that satisfies a loose lepton selection also satisfies the tight selection. It is extracted as a function of lepton flavor and kinematic properties from a “measurement region” that consists of a single-lepton events with event kinematic properties designed to suppress the \(\mathrm {W}\rightarrow \ell \nu \) contribution.

For charge-misidentified leptons, an OS dilepton control region is defined for each same-sign dilepton signal region. Its yield is then weighted by the charge misidentification probability estimated in simulation, which ranges between \(10^{-5}\) and \(10^{-3}\) for electrons and is negligible for muons.

6 Systematic uncertainties

The sources of experimental and theoretical uncertainty for the data and simulations are summarized in Table 3. The uncertainty in the integrated luminosity is 2.5% [46]. The simulation is reweighted to match the distribution in the number of pileup collisions per event in data. The uncertainty in the inelastic cross section propagated to the final yields provides an uncertainty of at most 6%.

Trigger efficiencies are measured with an uncertainty of 2% in an independent data sample selected using single-lepton triggers. Lepton-efficiency scale factors, used to account for differences in the reconstruction and identification efficiencies between data and simulation, are measured using a “tag-and-probe” method in data enriched in \(\mathrm {Z} \rightarrow \ell \ell \) events [37, 38]. The scale factors are applied to all simulated processes with an uncertainty per lepton of approximately 3% for muons and 4% for electrons.

The uncertainty in the calibration of the jet energy scale depends on the \(p_{\mathrm {T}}\) and \(\eta \) of the jet and results in a 1–15% variation in the event yield in a given SR. The uncertainty due to the jet energy resolution is estimated by broadening the resolution in simulation [42], and the resulting effect is a change of 1–5% in the SR yields. The b tagging efficiency in simulation is corrected using scale factors determined from efficiencies measured in data and simulation [47]. The uncertainty in the measured scale factors results in an overall effect between 1 and 15%, again depending on the SR.

As mentioned in Sect. 2, \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) simulated events are reweighted to match the number of additional jets observed in data. The reweighting factors vary between 0.92 for \(N_{\text {jets}}^\mathrm {ISR}=1\) and 0.77 for \(N_{\text {jets}}^\mathrm {ISR} \ge 4\). Half of the difference from unity is taken as a systematic uncertainty in these reweighting factors to cover differences observed between data and simulation when the factors are used to reweight simulation in a control sample enriched in single-lepton \(\mathrm {t}\overline{\mathrm {t}}\) events. Uncertainties in the reweighting factors are treated as correlated among regions. Simulated \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) events with two b quarks not originating from top quark decay are also weighted to account for the CMS measurement of the ratio of cross sections \(\sigma ({\mathrm {t}\overline{\mathrm {t}} \mathrm {b} \overline{\mathrm {b}} })/\sigma ({\mathrm {t}\overline{\mathrm {t}} \mathrm {jj}})\), which was found to be a factor of \(1.7\pm 0.6\) larger than the MC prediction [33]. In signal regions requiring four b-tagged jets, where the effect is dominant, this results in a systematic uncertainty of up to 15% on the total background prediction. In signal regions requiring three b-tagged jets, the dominant origin of the additional b-tagged jet is a charm quark from a \(\mathrm {W}\) decay, so the effect is negligible.

Uncertainties in the renormalization and factorization scales (varied by a factor of two) and from the choice of PDF [48, 49] affect the number of events expected (normalization) in the simulated background processes, as well as the acceptance for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal. The effects of these uncertainties on the relative distribution of events in the signal regions (shape) are also considered. For the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) backgrounds, the normalization uncertainty is 40%, while for \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) a 50% normalization uncertainty reflects the signal strength of \(1.5 \pm 0.5\) measured by CMS [50]. The processes in the Rare category along with X\(\mathrm {\gamma }\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {VV}\), many of which have never been observed, are expected to give small contributions to the event yields in the signal regions. We assign separate 50% normalization uncertainties to each of these three categories. The shape uncertainty resulting from variations of the renormalization and factorization scales is as large as 15% for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\), \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\), and \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) backgrounds, and 10% for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal, while the effect from the PDF is only 1%. For the signal, the uncertainty in the acceptance from variations of the scales (PDFs) is 2% (1%). In addition, for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal, the scales that determine ISR and final-state radiation (FSR) in the parton shower are also varied, resulting in a 6% change in the acceptance and shape variations as large as 15%.

For nonprompt and charge-misidentified lepton backgrounds, the statistical uncertainty from the application region depends on the SR considered. The background from misidentified charge is assigned a systematic uncertainty of 20%, based on comparisons of the expected number of same-sign events estimated from an OS control sample and the observed same-sign yield in a control sample enriched in \(\mathrm {Z} \rightarrow \mathrm {e}^+\mathrm {e}^-\) events with one electron or positron having a misidentified charge.

In addition to the statistical uncertainty, the nonprompt lepton background is assigned an overall normalization uncertainty of 30% to cover variations observed in closure tests performed with simulated multijet and \(\mathrm {t}\overline{\mathrm {t}}\) events. This uncertainty is increased to 60% for electrons with \(p_{\mathrm {T}} > 50\,\text {GeV} \), to account for trends observed at high \(p_{\mathrm {T}}\) in the closure tests. We also include an uncertainty related to the subtraction of events with prompt leptons (from electroweak processes with a \(\mathrm {W}\) or \(\mathrm {Z}\) boson) in the measurement region, which has an effect between 1% and 50%, depending on the SR. The prompt lepton contamination was also checked in the application region, where it was found to be below 1%.

Experimental uncertainties are treated as correlated among signal regions for all signal and background processes. Systematic uncertainties in data-driven estimates and theoretical uncertainties are treated as uncorrelated between processes, but correlated among signal regions. Statistical uncertainties from the limited number of simulated events or in the number of events in data control regions are considered uncorrelated.

Table 3 Summary of the sources of uncertainty and their effect on signal and background yields. The first group lists experimental and theoretical uncertainties in simulated signal and background processes. The second group lists normalization uncertainties in the estimated backgrounds

7 Results and interpretation

The properties of events in the signal regions (SR 1–8 as defined in Table 2) are shown in Fig. 2, where distributions of the main kinematic variables in the data (\(N_\text {jets}\), \(N_\text {b}\), \(H_{\mathrm {T}}\), and \(p_{\mathrm {T}} ^\text {miss}\)) are compared to SM background predictions. The \(N_\text {jets}\) and \(N_\text {b}\) distributions for CRW and CRZ are shown in Fig. 3. In both figures we overlay the expected SM \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal, scaled by a factor of 5. The SM predictions are generally consistent with the observations, with some possible underestimation in CRW and CRZ.

The yields from SR 1–8, CRW, and CRZ are combined in a maximum-likelihood fit, following the procedures described in Ref. [51], to estimate a best-fit cross section for \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \), the significance of the observation relative to the background-only hypothesis, and the upper limit on \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)). The experimental and theoretical uncertainties described in Sect. 6 are incorporated in the likelihood as “nuisance” parameters and are profiled in the fit. Nuisance parameters corresponding to systematic uncertainties are parameterized as log-normal distributions. The fitted values of the nuisance parameters are found to be consistent with their initial values within uncertainties. The nuisance parameters corresponding to the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}/\gamma ^{*}\) normalizations are scaled by \(1.2\pm 0.3\) and \(1.3\pm 0.3\), respectively, while other background contributions including \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) are scaled up by 1.1 or less. The signal and control region results after the maximum-likelihood fit (post-fit) are shown in Fig. 4, with the fitted \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal contribution added to the background predictions, which are given in Table 4. The \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) cross section is measured to be \(16.9^{+13.8}_{-11.4}\) \(\,\text {fb}\), where the best-fit value of the parameter and an approximate 68% CL confidence interval are extracted following the procedure described in Sec. 3.2 of Ref. [52]. The observed and expected significances relative to the background-only hypothesis are found to be 1.6 and 1.0 standard deviations, respectively, where the expectation is based on the central value of the NLO SM cross section of \(9.2^{+2.9}_{-2.4}\) \(\,\text {fb}\) [17]. The observed 95% CL upper limit on the cross section, based on an asymptotic formulation [53] of the modified frequentist \(\hbox {CL}_\mathrm {s}\) criterion [54, 55], is found to be 41.7\(\,\text {fb}\). The corresponding expected upper limit, assuming no SM \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) contribution to the data, is \(20.8^{+11.2}_{-6.9}\) \(\,\text {fb}\), showing a significant improvement relative to the value of 27\(\,\text {fb}\) of Ref. [23].

The \(\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) process has contributions from diagrams with virtual Higgs bosons, as shown in Fig. 1. Experimental information on \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) can therefore be used to constrain the Yukawa coupling, \(y_{\mathrm {t}}\), between the top quark and the Higgs boson. We constrain \(y_{\mathrm {t}}\) assuming that the signal acceptance is not affected by the relative contribution of the virtual Higgs boson diagrams. As the cross section for the \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) background also depends on the top quark Yukawa coupling, for the purpose of constraining \(y_{\mathrm {t}}\) the fit described above is repeated with the \(\mathrm {t}\overline{\mathrm {t}} \text{ H }\) contribution scaled by the square of the absolute value of the ratio of the top quark Yukawa coupling to its SM value (\(|y_{\mathrm {t}}/y_{\mathrm {t}}^{\mathrm {SM}} |^2\)), where \(y_{\mathrm {t}}^{\mathrm {SM}}= m_{\mathrm {t}}( \sqrt{2} G_\mathrm {F})^{1/2} \approx 1\). This results in a dependence of the measured \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) on \(|y_{\mathrm {t}}/y_{\mathrm {t}}^{\mathrm {SM}} |\) which is shown in Fig. 5 and is compared to its theoretical prediction. The prediction is obtained from the LO calculation of Ref. [16], with an NLO/LO K-factor of 1.27 [18]. The LO calculation is used instead of the NLO one, as Ref. [16] provides a breakdown of the contributions to the cross section according to powers of \(y_{\mathrm {t}}\). The prediction also includes the uncertainty associated with varying the renormalization and factorization scales in the LO calculation by a factor of 2. The central, upper and lower values of the theoretical cross section provide respective 95% CL limits for \(|y_{\mathrm {t}}/y_{\mathrm {t}}^{\mathrm {SM}} | < 2.1\), <1.9 and <2.4.

Fig. 2
figure 2

Distributions in \(N_\text {jets}\) (upper left), \(N_\text {b}\) (upper right), \(H_{\mathrm {T}}\) (lower left), and \(p_{\mathrm {T}} ^\text {miss}\) (lower right) in the signal regions (SR 1–8), before fitting to data, where the last bins include the overflows. The hatched areas represent the total uncertainties in the SM background predictions, while the solid lines represent the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal, scaled up by a factor of 5, assuming the SM cross section from Ref. [17]. The upper panels show the ratios of the observed event yield to the total background prediction. Bins without a data point have no observed events

Fig. 3
figure 3

Distributions in \(N_\text {jets}\) and \(N_\text {b}\) in \(\mathrm {t}\overline{\mathrm {t}} \mathrm {W}\) (upper) and \(\mathrm {t}\overline{\mathrm {t}} \mathrm {Z}\) (lower) control regions, before fitting to data. The hatched area represents the uncertainty in the SM background prediction, while the solid line represents the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) signal, scaled up by a factor of 5, assuming the SM cross section from Ref. [17]. The upper panels show the ratios of the observed event yield to the total background prediction. Bins without a data point have no observed events

Fig. 4
figure 4

Observed yields in the control and signal regions (upper, in log scale), and signal regions only (lower, in linear scale), compared to the post-fit predictions for signal and background processes. The hatched areas represent the total uncertainties in the signal and background predictions. The upper panels show the ratios of the observed event yield and the total prediction of signal and background. Bins without a data point have no observed events

Table 4 The post-fit background, signal, and total yields with their total uncertainties and the observed number of events in the control and signal regions in data
Fig. 5
figure 5

The predicted SM value of \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) [16], calculated at LO with an NLO/LO K-factor of 1.27, as a function of \(|y_{\mathrm {t}}/y_{\mathrm {t}}^{\mathrm {SM}} |\) (dashed line), compared with the observed value of \(\sigma (\mathrm {p}\mathrm {p}\rightarrow \mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \)) (solid line), and with the observed 95% CL upper limit (hatched line)

8 Summary

The results of a search for standard model (SM) production of \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) at the LHC have been presented, using data from \(\sqrt{s} = 13\,\text {TeV} \) proton–proton collisions corresponding to an integrated luminosity of 35.9\(\,\text {fb}^{-1}\), collected with the CMS detector in 2016. The analysis strategy uses same-sign dilepton as well as three- (or more) lepton events, relying on jet multiplicity and jet flavor to define search regions that are used to probe the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) process. Combining these regions yields a significance of 1.6 standard deviations relative to the background-only hypothesis, and a measured value for the \(\mathrm {t}\overline{\mathrm {t}} \mathrm {t}\overline{\mathrm {t}} \) cross section of \(16.9^{+13.8}_{-11.4}\) \(\,\text {fb}\), in agreement with the standard model predictions. The results are also re-interpreted to constrain the ratio of the top quark Yukawa coupling to its SM value, yielding \(|y_{\mathrm {t}}/y_{\mathrm {t}}^{\mathrm {SM}} | < 2.1\) at 95% confidence level.