1 Introduction

Many types of beyond-the-standard-model (BSM) theories can produce multilepton events (three or more leptons) with a wide array of unique signatures [1,2,3,4,5], including a number of supersymmetric (SUSY) models [6,7,8,9,10,11,12,13,14,15]. In these models, multilepton final states can arise from the decay of multiple vector bosons, e.g., in \(\mathrm{t}\overline{\mathrm{t}}\) production with \(\mathrm{t} \rightarrow \mathrm{c} \mathrm{H} \) followed by \(\mathrm{H} \rightarrow \mathrm{W}\mathrm{W}^*\) or \(\mathrm{H} \rightarrow \mathrm{Z}\mathrm{Z}^*\), or in strong production of pairs of squarks or gluinos, which often initiate complex decay chains that can result in multiple \(\mathrm{W}\) and/or \(\mathrm{Z}\) bosons. The standard model (SM) processes that produce this final state are also characterized by multiple bosons and are well-understood both theoretically [16,17,18,19,20,21,22,23,24,25,26,27,28,29,30] and experimentally [31,32,33,34,35].

This paper describes a search for new physics in final states with three or more leptons, electrons or muons, produced at the CERN LHC, in proton–proton (\(\mathrm{p}\mathrm{p}\)) collisions at a center-of-mass energy of 13\(\,\text {TeV}\), with the CMS detector. The data correspond to an integrated luminosity of \(2.3\,\text {fb}^{-1} \) collected in 2015. The expected irreducible backgrounds come from diboson production (\(\mathrm{W}\mathrm{Z}\) and \(\mathrm{Z}\) \(\mathrm{Z}\)) or other SM processes, including \(\mathrm{t}\overline{\mathrm{t}}\)W, \(\mathrm{t}\overline{\mathrm{t}}\) \(\mathrm{Z}\), and \(\mathrm{t}\overline{\mathrm{t}}\) \(\mathrm{H}\). These backgrounds are modeled using Monte Carlo (MC) simulations that have appropriate corrections applied to match the behavior of reconstructed objects in data. Reducible backgrounds are processes that produce one or more misidentified or nonprompt leptons, i.e. those that arise from jets or meson decays, that pass all reconstruction, identification, and isolation criteria. Estimates of the probabilities of observing misidentified or nonprompt leptons based on control samples in data are used.

As an example of the type of BSM models for which this search has sensitivity, we interpret the results of this analysis in the context of SUSY models that feature strong production of pairs of squarks (\(\widetilde{\mathrm{q}}\)) or gluinos (\(\widetilde{\mathrm{g}}\)). In addition to multiple leptons, these models predict that events can contain multiple jets, b-tagged jets, and missing transverse momentum. Searches probing similar models have been carried out by the ATLAS and CMS Collaborations using pp collisions at 8\(\,\text {TeV}\)  [36,37,38,39,40,41,42,43,44], and at 13\(\,\text {TeV}\)  [45,46,47,48,49,50,51,52]. Previous searches exclude models with gluino mass less than approximately 1500\(\,\text {GeV}\), for a neutralino mass of 50\(\,\text {GeV}\), and models with bottom squark mass less than 830 \(\,\text {GeV}\).

The result of the search, which is consistent with SM expectation, can also be used to constrain other BSM models not explicitly considered in this paper. To this end, we also provide upper limits on possible BSM contributions in the kinematic tail of the search variables in terms of the product of cross section, detector acceptance, and selection efficiency.

2 The CMS detector

The CMS detector features a superconducting solenoid of 6 m internal diameter that creates a magnetic field of 3.8 T. Inside the magnet volume are a silicon pixel and strip tracker, an electromagnetic calorimeter (ECAL) made of lead tungstate crystals, and a hadron calorimeter (HCAL) made of brass and scintillator, each composed of a barrel and two endcap sections. Forward calorimeters provide additional pseudorapidity (\(\eta \)) coverage for the HCAL. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. The first level of the CMS trigger system, composed of specialized hardware processors, uses information from the calorimeters and muon detectors to select the most interesting events in a fixed time interval of less than 4 \(\upmu \)s. The high-level trigger (HLT) processor farm further decreases the event rate from approximately 100 kHz to less than 1 kHz, before data storage. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [53].

3 Event selection and Monte Carlo simulation

Events used in this analysis are selected by the triggers that collect dilepton and multilepton events for later study, using variables constructed by the HLT. One set of triggers requires two leptons satisfying loose isolation criteria and transverse momentum \(p_{\mathrm{T}} > 17\,\text {GeV} \) for the leading lepton and \(p_{\mathrm{T}} > 12\,(8)\,\text {GeV} \) for the subleading lepton in the case of electrons (muons). The second set of triggers places no requirements on the isolation, has a lower \(p_{\mathrm{T}}\) threshold for the two leptons, \(p_{\mathrm{T}} > 8 \,\text {GeV} \), and also requires that the scalar sum of jets with \(p_{\mathrm{T}} > 40\) \(\,\text {GeV}\) reconstructed in the HLT be greater than 300\(\,\text {GeV}\).

Electron candidates are reconstructed using tracking and electromagnetic calorimeter information by combining Gaussian sum filter tracks and ECAL energy deposits [54]. The electron identification is performed using a multivariate discriminant built with shower shape, track cluster matching, and track quality variables. The working point for the selection is chosen to maintain approximately 90% efficiency for accepting electrons produced in the decays of W and Z bosons and also to efficiently reject candidates originating from jets. To reject electrons originating from photon conversions, electrons are required to have hits in all possible inner layers of the tracker and to be incompatible with any secondary vertices containing only another electron. The selected electron candidates must have \(|{\eta }| < 2.5\).

Muon candidates are reconstructed in a global fit to the combined information from both the silicon tracker and the muon spectrometer [55]. An identification is performed using the quality of the geometrical matching between measurements in the tracker and the muon system. To ensure the candidates are within the fiducial volume of the detector, we require that the candidate pseudorapidities satisfy \(|{\eta }| < 2.4\).

The reconstructed vertex with the largest value of summed physics-object \(p_{\mathrm{T}} ^2\) is taken to be the primary \(\mathrm{p}\mathrm{p}\) interaction vertex. The physics objects are the objects returned by a jet finding algorithm [56, 57] applied to all charged tracks associated with the vertex, plus the corresponding associated missing transverse momentum. Both electron and muon candidates are required to have a transverse (longitudinal) impact parameter of less than 0.5 (1.0) mm from the primary vertex. In addition, a requirement on the three-dimensional impact parameter significance is applied. This variable is the value of the impact parameter divided by its uncertainty and is required to be less than 4 for both electrons and muons. The rejection of nonprompt leptons is more efficient using the impact parameter significance than the value of impact parameter for similar prompt-lepton acceptance.

Lepton isolation is constructed using three different variables. The mini isolation, \(I_\text {mini} \), is the ratio of the amount of measured energy in a cone to the transverse momentum of the lepton. The radius is \(p_{\mathrm{T}}\)-dependent: \(R_{\text {iso}} = 10 \,\text {GeV}/\text {min}(\text {max}(p_{\mathrm{T}} (\ell ), 50\,\text {GeV}), 200\,\text {GeV})\), resulting in radii between 0.05 and 0.2. Requiring \(I_\text {mini} \) to be below a given threshold ensures that the lepton is locally isolated, even in Lorentz-boosted topologies.

The second variable is the ratio of the lepton \(p_{\mathrm{T}}\) and the \(p_{\mathrm{T}}\) of the jet matched to the lepton: \(p_{\mathrm{T}} ^\text {ratio} =p_{\mathrm{T}} (\ell ) / p_{\mathrm{T}} (\text {jet})\). This jet must be separated by no more than 0.4 in \(\Delta R\) from the lepton it is matched to, where \(\Delta R = \sqrt{\smash [b]{\Delta \phi ^2 + \Delta \eta ^2}}\). In most cases, this is the jet containing the lepton. If no jet is found within \(\Delta R < 0.4\), then \(p_{\mathrm{T}} ^\text {ratio} = 1\). The use of \(p_{\mathrm{T}} ^\text {ratio}\) is a simple way to identify nonprompt low-\(p_{\mathrm{T}}\) leptons originating from low-\(p_{\mathrm{T}}\) b-quarks that decay with larger opening angles than the one used in the mini isolation.

The last variable is \(p_{\mathrm{T}} ^\text {rel}\), which is calculated by subtracting the lepton momentum from the momentum vector of the geometrically matched jet described above and then finding the component of the lepton momentum that is transverse to this new vector. If there is no matched jet, \(p_{\mathrm{T}} ^\text {rel} = 0\). This variable allows us to recover leptons from accidental overlap with jets in events where some of the final state particles are close together in Lorentz-boosted topologies.

Using the three variables above, a lepton is considered isolated if \(I_\text {mini} < I_1\) and that either \(p_{\mathrm{T}} ^\text {ratio} > I_2\) or \(p_{\mathrm{T}} ^\text {rel} > I_3\). The \(I_i\) values depend on the flavor of the lepton. The probability to misidentify a jet is higher for electrons, so tighter isolation values are used. The logic behind this isolation is that a lepton should be locally isolated (\(I_\text {mini} \)) and should carry the major part of the energy of the corresponding jet (\(p_{\mathrm{T}} ^\text {ratio}\)) unless its overlap with the jet is accidental (\(p_{\mathrm{T}} ^\text {rel}\)). For electrons (muons), the tight selection requirements are \(I_1 = 0.12\,(0.16)\), \(I_2 = 0.76\,(0.69)\), and \(I_3 = 7.2\,(6.0)\) \(\,\text {GeV}\). The loose lepton isolation is relaxed to \(I_\text {mini} < 0.4\), and the other requirements are dropped. The loose leptons are used for background estimates. These selection requirements were optimized using MC simulations.

The offline selection requires at least three well-identified leptons in the event and any pair of opposite sign and same flavor (OSSF) leptons having an invariant mass greater than 12\(\,\text {GeV}\) to reject low mass Drell–Yan and quarkonium processes. The leptons must pass offline \(p_{\mathrm{T}}\) thresholds of 20, 15, and 10\(\,\text {GeV}\) for the first, second, and third lepton, respectively, when \(p_{\mathrm{T}}\)-ordered. For this offline selection, the trigger efficiency is above 99%.

Jets are reconstructed from particle-flow candidates [58] clustered using the anti-\(k_{\mathrm{T}}\) algorithm [56] with a distance parameter of 0.4 as implemented in the FastJet package [57]. Only jets with \(p_{\mathrm{T}} >30\,\text {GeV} \) and within the tracker acceptance \(|{\eta }|<2.4\) are considered. Additional criteria are applied to reject events containing noise and mismeasured jets [59,60,61]. To avoid double counting, the closest matching jets to leptons are not considered if they are separated from the lepton by less than 0.4 in \(\Delta R\). From those selected jets, the quantity \(H_{\mathrm{T}}\) is defined by \( H_{\mathrm{T}} = \sum _\text {jets} |\mathbf {p_{\mathrm{T}}}|\), for all jets that satisfy the above-mentioned criteria. Jet energies are corrected for a shift in the energy scale, contributions from additional, simultaneous \(\mathrm{p}\mathrm{p}\) collisions (pileup), and residual nonuniformity and nonlinearity differences between data and simulation [60].

The combined secondary vertex algorithm [62, 63] is used to assess the likelihood that a jet originates from a bottom quark (b jet). Jets in this analysis are considered to be b tagged if they pass the algorithm’s medium working point, which has a tagging efficiency of approximately 70% and a mistag rate of approximately 1% for light quarks and gluons.

The missing transverse momentum \({\mathbf {p}}_{\mathrm{T}}^{\text {miss}}\) is defined as the negative vector sum of transverse momenta of all particle-flow candidates reconstructed in the event. Its magnitude is referred to as \(p_{\mathrm{T}} ^\text {miss}\). Jet energy corrections are propagated to the \(p_{\mathrm{T}} ^\text {miss}\) following the procedure described in Ref. [64].

To estimate the contribution of SM processes to the signal regions (described in Sect. 4) and to calculate the efficiency for new physics models, MC simulations are used. All the SM samples are generated using the MadGraph 5_amc@nlo 2.2.2 [65,66,67] program at leading order (LO) or next-to-leading order (NLO) in perturbative QCD, with the exception of the diboson production samples (\(\mathrm{W}\mathrm{Z}\) and \(\mathrm{Z}\) \(\mathrm{Z}\)) that are generated using powheg v2 [68,69,70,71,72] at NLO precision. The NNPDF3.0 [73] LO (NLO) parton distribution function (PDF) set is used in MC simulations generated at LO (NLO). Parton showering and hadronization are simulated using pythia 8.205 [74] with the underlying event tune CUETP8M1 [75]. The CMS detector response is determined using a Geant4-based model [76].

Events corresponding to the production of SUSY processes are generated with MadGraph 5_amc@nlo at LO precision, allowing up to two additional partons in the matrix element calculations. The SUSY particle decays, parton showering, and hadronization are simulated with pythia. The detector response for signal events is simulated using a CMS fast-simulation package [77] that is validated against the Geant4-based model. Cross sections for SUSY signal processes, calculated at NLO with next-to-leading-log (NLL) gluon resummation, are taken from the LHC SUSY Cross Section Working Group [78,79,80,81,82,83]. All simulated events are processed with the same reconstruction procedure as data. They include the effects of additional interactions, which can occur in the same or adjacent beam crossings (pileup). The distribution of additional interactions is matched to that observed in data. The pileup interactions are simulated by overlaying the primary interaction with additional minimum bias events, which are generated with the same pythia configuration as described above.

4 Search strategy

The goal of this analysis is to search for possible excesses over the expected yields from SM processes in different categories of events with three or more leptons. With the 2.3\(\,\text {fb}^{-1}\) data sample at \(\sqrt{s} = 13\,\text {TeV} \), the search is focused on strongly produced SUSY particles, which benefit most from the increase of the production cross section with respect to 8 TeV. A few examples of diagrams of simplified models of SUSY processes [84, 85] that can give rise to multilepton final states are shown in Fig. 1. In these models, SUSY particles that are not directly included in the diagrams are assumed to be too heavy to be accessible at the LHC. Therefore, the free parameters in these models are usually the mass of the produced particles: gluinos and squarks, as well as the mass of the lightest supersymmetric particle (LSP).

Typical SUSY processes relevant for this work include T1tttt, which corresponds to gluino pair production where each gluino decays to a \(\mathrm{t}\overline{\mathrm{t}} \) pair and the LSP (Fig. 1—top). Another model, referred to as T5qqqqWZ, involves gluino pair production, where each gluino decays to a pair of light quarks (u, d, s, and c) and a neutralino (\(\widetilde{\chi }^0 _2\)) or chargino (\(\widetilde{\chi }^\pm _1\)), followed by decay of the neutralino or the chargino to a W or Z boson, respectively, and the LSP (Fig. 1—middle). The probability for the decay to proceed via \(\widetilde{\chi }^+ _1\), \(\widetilde{\chi }^- _1\), or \(\widetilde{\chi }^0 _2\) is 1/3 for each case, leading to the probabilities of having \(\mathrm{W}\) \(\mathrm{W}\), \(\mathrm{Z}\) \(\mathrm{Z}\) or \(\mathrm{W}\) \(\mathrm{Z}\) bosons in the final state to be about 44.5, 11.1, and 44.5%, respectively. Only the final state with \(\mathrm{W}\) \(\mathrm{Z}\) bosons contributes significantly to the acceptance of this search. Final states with \(\mathrm{W}\) \(\mathrm{W}\) bosons do not contribute, and the contribution from \(\mathrm{Z}\) \(\mathrm{Z}\) final states decaying to four leptons is negligibly small. In this scenario the neutralino and chargino are assumed to be mass-degenerate. A model called T6ttWW, features bottom squark pair production with their subsequent cascade decays via top quarks and W bosons (Fig. 1—bottom). The LSP is a neutralino in all of these models.

Fig. 1
figure 1

Diagrams for gluino and bottom squark pair production leading to multilepton events for simplified models of supersymmetry: (top) T1tttt, (middle) T5qqqqWZ, and (bottom) T6ttWW

For the definition of the signal regions (SRs) we use several event variables: the number of b-tagged jets (\(N_{\mathrm{b}} \)), \(H_{\mathrm{T}} \), \(p_{\mathrm{T}} ^\text {miss} \), and a classification depending on whether the event contains any OSSF dilepton pairs with an invariant mass between 76 and 106 \(\,\text {GeV}\), i.e. consistent with the Z boson (called “on-Z” if so and “off-Z” otherwise in the following). Events that do not contain any OSSF pairs are included in the off-Z sample.

The separation in b-tagged jet multiplicities maximizes signal-to-background ratios for different signal models. For example, the T1tttt model features several b jets, which would be categorized into SRs which are almost free of WZ background owing to the b-tagged jet requirement. Including the zero b-tagged SRs keeps the analysis sensitive to signatures such as the T5qqqqWZ model. Additionally, a categorization in \(H_{\mathrm{T}} \) and \(p_{\mathrm{T}} ^\text {miss} \) is useful to distinguish between compressed and noncompressed SUSY spectra, i.e. models with small or large mass differences between the SUSY particles in the decay chain.

A baseline selection is applied to the data set to select events of interest: three or more electrons or muons satisfying the requirements \(p_{\mathrm{T}} \ge 20\), 15, and 10\(\,\text {GeV}\); \(m_{\ell \ell } \ge 12\,\text {GeV} \); at least two jets; \(H_{\mathrm{T}} \ge 60\,\text {GeV} \); and \(p_{\mathrm{T}} ^\text {miss} \ge 50\,\text {GeV} \). Events containing additional leptons with \(p_{\mathrm{T}} > 10 \,\text {GeV} \) are included in the event selection. Table 1 shows the definition of the subdivision of the baseline selection into two sets of SRs for events that contain on-Z and off-Z dilepton pairs. There are 15 SRs for each of the two groups, hence in total 30 SRs. A set of four SRs with low or medium \(H_{\mathrm{T}} \) and low or medium \(p_{\mathrm{T}} ^\text {miss} \) are defined for each of the b-tagged jet multiplicities 0, 1, and 2. Motivated by the low expected yield of events with \(N_{\mathrm{b}} \ge 3\), SR 13 is defined for high b-tagged jet multiplicities and also has \(p_{\mathrm{T}} ^\text {miss} <300\,\text {GeV} \) and \(H_{\mathrm{T}} <600\,\text {GeV} \). Two additional SRs with large \(H_{\mathrm{T}} \) (SR 14) and large \(p_{\mathrm{T}} ^\text {miss} \) (SR 15), respectively, have been defined as nearly background-free SRs, since noncompressed SUSY models can yield events with very large values of \(p_{\mathrm{T}} ^\text {miss} \) or \(H_{\mathrm{T}} \). Both of these SRs are inclusive in the number of b-tagged jets, and every selected event with \(p_{\mathrm{T}} ^\text {miss} \ge 300\,\text {GeV} \) is categorized in SR 15, while SR 14 is populated with events with \(p_{\mathrm{T}} ^\text {miss} < 300\,\text {GeV} \) and \(H_{\mathrm{T}} \ge 600\,\text {GeV} \).

Table 1 Definition of multilepton signal regions. These regions are the same for the on-Z and off-Z regions

5 Background estimation

Backgrounds in the multilepton final states can be divided in three categories:

  1. 1.

    Nonprompt or misidentified leptons are those arising from heavy-flavor decays, misidentified hadrons, electrons from unidentified photon conversions, or muons from light-meson decays in flight. For this analysis, \(\mathrm{t}\overline{\mathrm{t}} \) events can enter the SRs if nonprompt leptons are present in addition to the prompt leptons from the W boson decays. These nonprompt leptons typically originate from semileptonic decays of hadrons containing a b quark, which, in this case, is not reconstructed as a jet. Therefore, \(\mathrm{t}\overline{\mathrm{t}} \) events typically have low \(H_{\mathrm{T}} \) and \(p_{\mathrm{T}} ^\text {miss} \) and predominately populate SR 1 and SR 5, with 0 and 1 b-tagged jets, respectively. In addition to \(\mathrm{t}\overline{\mathrm{t}}\), Drell–Yan events can enter the baseline selection, although they are largely suppressed by the \(p_{\mathrm{T}} ^\text {miss} > 50\) \(\,\text {GeV}\) requirement. Processes that yield only one prompt lepton, e.g. W+jets and single top quark production, are effectively suppressed by the three-lepton requirement because of the low probability that the two nonprompt leptons pass the tight identification and isolation requirements.

  1. 2.

    Diboson production could yield multilepton final states with up to three prompt leptons in WZ production and up to four prompt leptons in ZZ production. Especially in signal regions without b-tagged jets, WZ production has a sizable contribution. The normalization of this background is obtained from a dedicated control region enriched in WZ events.

  2. 3.

    Other SM processes that can yield three or more leptons are \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\), \(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\), and triboson production VVV where V stands for a W or \(\mathrm{Z}\) boson. We also include the contribution from the SM Higgs boson produced in association with a vector boson or a pair of top quarks in this category of backgrounds. Processes that produce additional leptons from internal conversions, which are events that contain a virtual photon that decays to leptons, are also included here as \(\mathrm{X}+\gamma \), where X is predominantly \(\mathrm{t}\overline{\mathrm{t}} \) or \(\mathrm{Z}\). Those backgrounds are obtained from simulation and appropriate systematic uncertainties are assigned.

The background contribution from nonprompt and misidentified leptons is estimated using the “tight-to-loose ratio” method [52]. The tight-to-loose ratio f is the probability for a nonprompt lepton that satisfies the loose requirements to also satisfy the full set of requirements. The nonprompt background contribution is obtained from the number of events in an application region containing events with at least one of the leptons failing the full set of tight identification and isolation requirements, but passing the loose requirements, weighted by \(f / (1-f)\). This ratio is measured in a control sample of QCD multijet events that is enriched in nonprompt leptons (measurement region), by requiring exactly one lepton passing the loose object selection and one recoiling jet with \(\Delta R(\text {jet},\ell )>1.0\). To suppress events with leptons from W and Z boson decays, \(p_{\mathrm{T}} ^\text {miss} < 20 \,\text {GeV} \) and \(M_\mathrm{T} <20 \,\text {GeV} \) are also required, where \(M_\mathrm{T} = \sqrt{\smash [b]{2p_{\mathrm{T}} ^\text {miss} p_{\mathrm{T}} (\ell ) (1-\cos {\Delta \phi })}}\) and \(\Delta \phi \) is the difference in azimuthal angle between the lepton and \({\mathbf {p}}_{\mathrm{T}}^{\text {miss}}\). The remaining contribution from these electroweak processes within the measurement region is subtracted using estimates from MC simulations.

The dependence of the tight-to-loose ratio on the flavor of the jet from which the nonprompt lepton originates is reduced by parameterizing the ratio as a function of a variable that is more strongly correlated with the parent parton \(p_{\mathrm{T}}\) than with lepton \(p_{\mathrm{T}}\). This variable is calculated by correcting the lepton \(p_{\mathrm{T}}\) as a function of the energy in the isolation cone around it. This definition leaves the \(p_{\mathrm{T}}\) of the leptons passing the isolation requirement unchanged and modifies the \(p_{\mathrm{T}}\) of those failing the requirement, so that it is a better proxy for the parent parton \(p_{\mathrm{T}}\) and results in a flatter tight-to-loose ratio as a function of the parent parton \(p_{\mathrm{T}}\). The cone correction significantly improves the results of the method when applying it in simulation. The flavor dependence, which is much more important for the case of electrons, is also reduced by adjusting the loose object selection to obtain similar ratios for nonprompt electrons that originate from both light- and heavy-flavor jets. To avoid experimental biases, the tight-to-loose ratio is also measured as a function of \(\eta \).

The tight-to-loose ratio method of estimating the nonprompt background is validated in a control region exclusive to our baseline selection with minimal signal contamination. This region is defined by having three tight leptons, one or two jets, \(20< p_{\mathrm{T}} ^\text {miss} < 50\) \(\,\text {GeV}\), and an off-Z dilepton pair. We find agreement of the order of 20% between the predicted and observed yields in this control region in data, which validates the predictions and uncertainties of this method.

Table 2 Summary of the sources of uncertainties and their magnitudes. The third column provides the changes in yields of signal and background induced by one s.d. changes in the magnitude of uncertainties

The WZ process is one of the main backgrounds in the regions with zero b-tagged jets. The relative contribution of this process in various SRs is estimated from the MC simulation at NLO, but the normalization is taken from a control region that is highly enriched for this process: three leptons pass nominal identification and isolation requirements, two leptons form an OSSF pair with \(|{m_{\ell \ell } - m_\mathrm{Z}}| < 15\) \(\,\text {GeV}\), the number of jets is zero or one, the number of b-tagged jets is zero, \(30< p_{\mathrm{T}} ^\text {miss} < 100 \,\text {GeV} \), and the \(M_\mathrm{T} \) of the third lepton (not in the pair forming the \(\mathrm{Z}\)) is required to be at least 50\(\,\text {GeV}\) to suppress contamination from Drell–Yan processes. The expected WZ purity in the selected sample is 84%. Using this control region, we find that the WZ background predictions from simulation are consistent with data. The ratio between the prediction and data obtained with 2.3\(\,\text {fb}^{-1}\) of data is \(1.13 \pm 0.17\). The uncertainty on the normalization of the WZ background includes the statistical uncertainty related to the event yield in the CR and a systematic component related to a small contamination of the CR due to other processes.

6 Systematic uncertainties

Systematic uncertainties are characterized as either experimental, theoretical, or arising from the limited size of simulated event samples. These sources of uncertainties and their magnitudes are described below, and are summarized in Table 2. The table also provides the effect of varying the uncertainties by \(\pm 1\) standard deviation (s.d.) on the signal and background yields. The jet energy scale uncertainty and the uncertainty in the b tagging efficiency are the only ones that can cause simulated events to migrate between signal regions.

The major experimental source of uncertainty is the knowledge of the jet energy scale (JES), which accounts for differences between kinematical variables from data and simulation and affects signal and background events that are taken from simulation samples [60, 61]. For the data set used in this analysis, the uncertainties on the JES vary from 2 to 8%, depending on the \(p_{\mathrm{T}}\) and \(\eta \) of the jet. The impact of these uncertainties is assessed by shifting the jet energy correction factors for each jet up and down by ±1 s.d. and recalculating all of the kinematic quantities. The JES uncertainties are propagated to the missing transverse momentum and all variables derived from jets (numbers of jets and b-tagged jets, and \(H_{\mathrm{T}}\)) used in this analysis; this propagation results in 1–20% variation in the MC background estimation in the regions with higher data yields.

A similar approach is used for the uncertainties associated with the corrections for the \(\mathrm{b} \) tagging efficiencies for light-, charm-, and bottom-flavour jets, which are parametrized as a function of \(p_{\mathrm{T}} \) and \(\eta \) [62, 63]. The variation of the scale factor correcting for the differences between data and simulation is at maximum 5–10%, and leads to an effect of 1–20% on the yields, depending on the SR and on the topology of the events under study. If one considers only highly populated SRs to get an overview of the main effects on the background yields, the bulk of the \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\) yield varies by \(\sim \)10% and the \(\mathrm{W}\mathrm{Z}\) yield by \(\sim \)13%.

Lepton identification scale factors have been measured by comparing efficiencies in data and simulation using the “tag-and-probe” method [54, 55] and are applied as a function of lepton \(p_{\mathrm{T}} \) and \(\eta \). The corresponding uncertainties on the scale factors have been evaluated and are approximately 2% for both electrons and muons. Trigger efficiency scale factors have been found to be very close to unity. An uncertainty of 3% in the scale factors has, however, been assigned to cover the difference between trigger efficiencies measured in simulation over a large number of samples.

All these uncertainties related to corrections of the simulation (JES corrections, b tagging efficiency scale factors, lepton identification and trigger scale factors) have been estimated also for the fast simulation used for the signal samples. We propagate them to the expected signal yields following the same procedure.

The uncertainties in the renormalization (\(\mu _{\text {R}}\)) and factorization scales (\(\mu _\mathrm {F}\)) and the PDF are considered for some of the rare processes, namely \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\), \(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\), and \(\mathrm{t}\overline{\mathrm{t}} \mathrm{H} \). Both the changes in the acceptance and cross sections due to those effects are taken into account.

For the study of the renormalization and factorization scale uncertainties, variations up and down by a factor of two with respect to the nominal values of \(\mu _{\text {R}}\) and \(\mu _\mathrm {F}\) are considered. The maximum difference in the yields with respect to the nominal case is observed when both scales are varied simultaneously up and down. The effect on the overall cross section is found to be about \(13\%\) for \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\) and about \(11\%\) for \(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\). An additional uncertainty in the acceptance corresponding to different signal regions is included. This is found to be between 3 and 18% depending on the SR and process.

Table 3 Off-Z SRs: Comparison of observed event yields in data with predicted background yields
Table 4 On-Z SRs: Comparison of observed event yields in data with predicted background yields

The uncertainty related to the PDFs is estimated from the 100 NNPDF 3.0 replicas by computing the deviation with respect to the nominal yields for each of them, and for each signal region (the cross section and acceptance effects are considered together) [86]. The root mean square of the variations is taken as the value of the systematic uncertainty. Since no significant variations among the different signal regions are seen, a flat uncertainty of \(3 (2) \%\) is applied to the \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\) (\(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\)) background. This value also includes the deviation resulting from varying the strong coupling strength \(\alpha _{S}(M_{\mathrm{Z}})\), which is added in quadrature, and whose magnitude is similar to or smaller than that of the PDF set uncertainty. For the \(\mathrm{t}\overline{\mathrm{t}} \mathrm{H} \) process, the same uncertainties as estimated for \(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\) are applied. A theoretical uncertainty of 50% is assigned to the remaining rare processes.

In signal samples, the uncertainty due to initial-state radiation is computed as a function of the \(p_{\mathrm{T}} \) of the gluino pair using the methods described in Ref. [87]. For values below 400\(\,\text {GeV}\), no uncertainty is applied. For values between 400 and 600\(\,\text {GeV}\), a 15% uncertainty is assigned, and above 600\(\,\text {GeV}\) this uncertainty is increased to 30%.

The limited size of the generated MC samples represents an additional source of uncertainty. The uncertainty in signal processes and backgrounds such as \(\mathrm{t}\overline{\mathrm{t}} \mathrm{W}\), \(\mathrm{t}\overline{\mathrm{t}} \mathrm{Z}\), and \(\mathrm{t}\overline{\mathrm{t}} \mathrm{H} \), is calculated from the number of MC events entering each of the signal regions.

For the nonprompt and misidentified lepton background, we assign several systematic uncertainties. The statistical uncertainty resulting from the limited number of events in the application region used to estimate this background contribution varies from 1 to 100%. The regions where these uncertainties are large are generally regions where the overall contribution of this background is small. When no events are observed in the application region, the upper limit of the background expectation is set to 0.35, which is found by applying the most probable tight-to-loose ratio as if the application region contained an event count equal to the variance of a Poisson distribution with a mean of zero.

The systematic uncertainties related to the extrapolation from the control regions to the SRs for the nonprompt lepton background are estimated to be 30%. This magnitude has been extracted from the level of closure achieved in a test that was performed with MC samples yielding nonprompt leptons to validate background predictions based on control samples in data, as described in Sect. 5.

The uncertainty associated with the electroweak (EW) background subtraction in the tight-to-loose ratio computation is propagated through the full analysis process by replacing the nominal tight-to-loose ratio with another value obtained when the scale factor applied to the electroweak processes in the measurement region is varied by 100% of its difference from unity. The overall effect on the nonprompt background yield lies between 1 and 5% depending on the SR considered.

The estimate of the \(\mathrm{W}\mathrm{Z}\) background is assigned a 15% normalization uncertainty using the measurement in a dedicated control region. This uncertainty is compatible with the one quoted for the experimental measurement of this process in Ref. [33]. Additional uncertainties for the extrapolation from the control region to the signal regions of 10– 30% are taken into account depending on the SR. These uncertainties are dominated by the JES and b tagging uncertainties described earlier.

Finally the uncertainty on the integrated luminosity is 2.7% [88].

7 Results and interpretations

Expected event yields are compared to the observation in Tables 3 and 4. Comparisons of distributions of \(H_{\mathrm{T}} \), \(p_{\mathrm{T}} ^\text {miss} \), \(N_\mathrm{j} \), \(N_{\mathrm{b}} \), leading lepton \(p_{\mathrm{T}} \), subleading lepton \(p_{\mathrm{T}} \), and trailing lepton \(p_{\mathrm{T}} \) measured in data with those predicted by the background estimation methods are shown in Fig. 2 (Fig. 3), using all the events satisfying the off-Z (on-Z) SR selection criteria. The nonprompt lepton background comes from the technique described in Sect. 5. The hatched band represents the total background uncertainty in each bin. A graphical summary of predicted backgrounds and observed event yields in individual SRs is also shown. In these figures, the “rare” component is the sum over several SM processes, such as triboson production, associated Higgs production, \(\mathrm{t}\overline{\mathrm{t}} \mathrm{t}\overline{\mathrm{t}} \), and other lower cross section processes.

The number of events observed in data is found to be consistent with predicted SM background yields. The results are used to calculate upper limits on the production cross section of gluinos or squarks for the various models discussed in Sect. 4, as a function of the gluino or squark, and the chargino or neutralino masses. For each mass hypothesis, the observation, background predictions, and expected signal yields from all on-Z and off-Z SRs are combined to extract an upper limit on the cross section, at 95% confidence level (CL) using the asymptotic formulation of the LHC-style CL\(_{\text {s}}\) method [89,90,91,92]. Log-normal nuisance parameters are used to describe the systematic uncertainties listed in Sect. 6.

These upper limits are used to calculate exclusion contours on the concerned sparticles mass plane, shown in Fig. 4 for the simplified models under consideration. In these figures, the thick black lines delineate the observed exclusion region, which is at the lower masses side. The uncertainty in the observed limit, represented by the thinner black lines, is the propagation of the NLO \(+\) NLL cross section uncertainties for the relevant signal process [78,79,80,81]. The red dashed lines represent the expected limits with the uncertainties reflecting those discussed in Sect. 6.

Fig. 2
figure 2

Off-Z samples: from left to right, top to bottom, distributions of \(H_{\mathrm{T}} \), \(p_{\mathrm{T}} ^\text {miss} \), \(N_\mathrm{j} \), \(N_{\mathrm{b}} \), \(p_{\mathrm{T}} \) of leptons for the predicted backgrounds and for the data in the off-Z baseline selection region, in these plots the rightmost bin contains the overflow from counts outside the range of the plot. On the bottom-right corner the total predicted background and the number of events observed in the 15 off-Z SRs is shown

Fig. 3
figure 3

On-Z samples: from left to right, top to bottom, distributions of \(H_{\mathrm{T}} \), \(p_{\mathrm{T}} ^\text {miss} \), \(N_\mathrm{j} \), \(N_{\mathrm{b}} \), \(p_{\mathrm{T}} \) of leptons for the predicted backgrounds and for the data in the on-Z baseline selection region, in these plots the rightmost bin contains the overflow from counts outside the range of the plot. On the bottom-right corner the total predicted background and the number of events observed in the 15 on-Z SRs is shown

Fig. 4
figure 4

Exclusion contours as a function of \(m_{\widetilde{\mathrm{g}}}\) or \(m_{\widetilde{\mathrm{b}}}\), and \(m_{\widetilde{\chi }^{0}}\) or \(m_{\widetilde{\chi }^\pm }\), for the simplified SUSY models (top-left) T1tttt, (top-right) T6ttWW, and (bottom) T5qqqqWZ. The color scale indicates the 95% CL observed upper limits on the cross section. The observed (expected) exclusion curves are indicated by the solid (dashed) lines using NLO+NLL production cross sections, along with the corresponding ±1 s.d. theoretical (experimental) uncertainties

The yields and background predictions can be used to test additional BSM physics scenarios. To facilitate such reinterpretations, we provide limits on the number of multilepton events as a function of the \(p_{\mathrm{T}} ^\text {miss} \) threshold in the kinematic tails of this search. These limits are obtained based on the tails of our SRs, in particular we consider events with \(H_{\mathrm{T}} > 400\,\text {GeV} \), both with and without an on-Z lepton pair, employing the LHC-style CL\(_{\text {s}}\) method carried out with pseudo-experiments [89,90,91]. They are shown in Fig. 5 in terms of the product of cross section (\(\sigma \)), detector acceptance (A), and selection efficiency (\(\epsilon \)). As we increase the \(p_{\mathrm{T}} ^\text {miss} \) threshold, the observed and expected limits converge to 1.3 fb.

Fig. 5
figure 5

Limits on the product of cross section, detector acceptance, and selection efficiency, \(\sigma A \epsilon \), for the production of multilepton events with (left) or without (right) an on-Z lepton pair as a function of the \(p_{\mathrm{T}} ^\text {miss} \) threshold

8 Summary

We have presented the search for beyond-the-standard-model physics in final states with at least 3 leptons, electrons or muons, using proton–proton data collected with the CMS detector at \(\sqrt{s} = 13\,\text {TeV} \), corresponding to an integrated luminosity of 2.3\(\,\text {fb}^{-1}\). The analysis makes use of techniques based on control samples in data to estimate reducible backgrounds and to validate the simulation for use in estimating irreducible backgrounds. To maximize sensitivity to a broad range of possible signal models, we investigate 30 exclusive signal regions. The event yields observed in data are in agreement with the standard model background predictions.

This search is designed to be sensitive to multiple BSM models. As an example, we interpret the result in the context of a gluino-pair production model that features cascade decays producing four top quarks in the final state. In this simplified model, we exclude gluinos with a mass of up to 1175\(\,\text {GeV}\) in the case of a massless lightest supersymmteric particle (LSP). For gluino masses up to approximately 1150\(\,\text {GeV}\), neutralino masses below 650\(\,\text {GeV}\) are excluded. These are the first CMS results reported in this final state at \(\sqrt{s} = 13\,\text {TeV} \).

In a bottom squark pair production model with cascade decays that contain two top quarks and two additional \(\mathrm{W}^{\pm }\) bosons, we also set limits on the masses of the bottom squark and the chargino. We exclude bottom squarks with a mass of up to 450\(\,\text {GeV}\) in the case of a chargino with a mass of 200\(\,\text {GeV}\). For bottom squark masses up to approximately 450\(\,\text {GeV}\), neutralino masses below 300\(\,\text {GeV}\) are excluded. In a similar search at \(\sqrt{s} = 8\,\text {TeV} \) [42], the bottom squark mass limit was slightly larger and the chargino mass limit was approximately the same.

An additional interpretation is presented in a gluino pair production model with four light quarks and two vector bosons in the final state. For the case of one W and one Z boson in the final state, we exclude gluino masses up to 825\(\,\text {GeV}\) when the LSP mass is 100\(\,\text {GeV}\), and LSP masses up to 500\(\,\text {GeV}\) for 700\(\,\text {GeV}\) gluinos.

Finally, limits on the number of multilepton events with \(H_{\mathrm{T}} > 400\) \(\,\text {GeV}\) as a function of \(p_{\mathrm{T}} ^\text {miss} \) threshold are also presented in terms of the product of cross section, detector acceptance, and selection efficiency. For a \(p_{\mathrm{T}} ^\text {miss} \) threshold greater than 500\(\,\text {GeV}\), the observed and expected limits are 1.3 fb.