1 Introduction

Top-quark pair production final states in proton–proton (pp) collisions at the Large Hadron Collider (LHC) often include additional jets not directly produced in the top-quark decays. The uncertainties associated with these processes are significant in precision measurements, such as the measurement of the top-quark mass [1] and the inclusive \(t\bar{t}\) production cross-section [2].

These additional jets arise mainly from hard gluon emissions from the hard-scattering interaction beyond \(t\bar{t}\) production and are described by quantum chromodynamics (QCD). The higher centre-of-mass energy of the pp scattering process in LHC Run 2 opens a large kinematic phase space for QCD radiation. Several theoretical approaches are available to model the production of these jets in \(t\bar{t}\) processes, including next-to-leading-order (NLO) QCD calculations, parton-shower models, and methods matching fixed-order QCD with the parton shower. The aim of this analysis is to test the predictions of extra jet production in these approaches and to provide data to adjust free parameters of the models to optimise their predictions.

The jet activity is measured in events with at least two b-tagged jets, i.e. jets tagged as containing b-hadrons, and exactly one electron and exactly one muon of opposite electric charge in the final state. Additional jets are defined as jets produced in addition to the two b-tagged jets required for the event selection, without requiring any matching of jets to partons. In order to probe the transverse momentum (\(p_{\text {T}}\)) dependence of the hard-gluon emission, this analysis measures the normalised differential \(t\bar{t}\) cross-sections as a function of the jet multiplicity for different \(p_{\text {T}}\) thresholds of the additional jets. The \(p_{\text {T}}\) of the leading additional jet is measured, as well as the \(p_{\text {T}}\) of the leading and sub-leading jets initiated by b-quarks (“b-jets”), which are top-quark decay products in most of the events.

Furthermore, the gap fraction, defined as the fraction of events with no jet activity in addition to the two b-tagged jets above a given \(p_{\text {T}}\) threshold in a rapidity region in the detector, is measured as a function of the additional jets’ minimum \(p_{\text {T}}\) threshold as defined in Refs. [3, 4]. The results are presented in a fiducial phase space in which all selected final-state objects are produced within the detector acceptance, following the definitions in Ref. [5].
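The gap-fraction definition can be illustrated with a minimal Python sketch. The event format (one list of `(pt, y)` tuples per event, holding only the additional jets), the function name and the default rapidity interval are hypothetical choices made for illustration and are not taken from the analysis software.

```python
def gap_fraction(events, q0, y_min=0.0, y_max=2.5):
    """Fraction of events with no additional jet above q0 (GeV)
    inside the rapidity interval y_min <= |y| < y_max.

    `events` is a list of events; each event is a list of (pt, y)
    tuples for the additional jets only (hypothetical input format).
    The default rapidity interval is an assumption of this sketch.
    """
    if not events:
        raise ValueError("need at least one event")
    n_gap = 0
    for jets in events:
        has_veto_jet = any(
            pt > q0 and y_min <= abs(y) < y_max for pt, y in jets
        )
        if not has_veto_jet:
            n_gap += 1
    return n_gap / len(events)


# Toy input, invented for illustration only.
toy_events = [
    [],                            # no additional jets
    [(30.0, 0.4)],                 # one central jet at 30 GeV
    [(80.0, -1.2), (26.0, 2.0)],   # two jets in the region
    [(45.0, 3.0)],                 # jet outside the rapidity region
]
```

Scanning `q0` over a set of thresholds with this function gives the gap fraction as a function of the minimum jet \(p_{\text {T}}\), which by construction can only increase with the threshold.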

This paper provides a measurement of additional jets in \(t\bar{t}\) events in the dilepton channel for the new centre-of-mass energy of 13 \(\text {TeV}\). Measurements similar to those presented in this paper were performed by ATLAS at 7 \(\text {TeV}\) [3, 5] and have been used to tune parameters in Monte Carlo (MC) generators for LHC Run 2 [6,7,8]. These earlier measurements were performed in the lepton+jets channel where the inclusive jet multiplicity was measured, since it is difficult to distinguish jets originating in W decays from additional jets produced by QCD radiation. Recent measurements of jet multiplicity were performed in the single-lepton channel by CMS at 13 \(\text {TeV}\) [9] and in the dilepton channel, including also the gap fractions, by ATLAS and CMS at 8 \(\text {TeV}\) [4, 10].

2 ATLAS detector

The ATLAS detector [11] at the LHC covers nearly the entire solid angle around the interaction point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer incorporating three large superconducting toroid magnets. The inner-detector system is immersed in a 2 T axial magnetic field and provides charged-particle tracking in the range \(|\eta | < 2.5\).

The high-granularity silicon pixel detector covers the interaction region and provides four measurements per track. The closest layer, known as the Insertable B-Layer (IBL) [12], was added in 2014 and provides high-resolution hits at small radius to improve the tracking performance. The pixel detector is followed by the silicon microstrip tracker, which provides four three-dimensional measurement points per track. These silicon detectors are complemented by the transition radiation tracker, which enables radially extended track reconstruction up to \(|\eta | = 2.0\). The transition radiation tracker also provides electron identification information based on the fraction of hits (typically 30 in total) passing a higher charge threshold indicative of transition radiation.

The calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\). Within the region \(|\eta |< 3.2\), electromagnetic calorimetry is provided by barrel and endcap high-granularity lead/liquid-argon (LAr) electromagnetic calorimeters, with an additional thin LAr presampler covering \(|\eta | < 1.8\) to correct for energy loss in material upstream of the calorimeters. Hadronic calorimetry is provided by the steel/scintillator-tile calorimeter, segmented into three barrel structures within \(|\eta | < 1.7\), and two copper/LAr hadronic endcap calorimeters. The solid angle coverage is completed with forward copper/LAr and tungsten/LAr calorimeter modules, which are optimised for electromagnetic and hadronic measurements, respectively.

The muon spectrometer comprises separate trigger and high-precision tracking chambers, measuring the deflection of muons in a magnetic field generated by superconducting air-core toroids. The precision chamber system surrounds the region \(|\eta | < 2.7\) with three layers of monitored drift tubes, complemented by cathode strip chambers in the forward region, where the background is highest. The muon trigger system covers the range \(|\eta | < 2.4\) with resistive plate chambers in the barrel, and thin-gap chambers in the endcap regions.

A two-level trigger system is used to select interesting events [13, 14]. The Level-1 trigger is implemented in hardware and uses a subset of detector information to reduce the event rate to a design value of at most 100 kHz. This is followed by the high-level software-based trigger (HLT), which reduces the event rate to 1 kHz.

3 Data and simulation samples

The proton–proton (pp) collision data used in this analysis were collected during 2015 by the ATLAS detector and correspond to an integrated luminosity of 3.2 fb\(^{-1}\) at \(\sqrt{s} = 13\) \(\text {TeV}\). The data considered in this analysis were collected under stable beam conditions, requiring that all detectors were operational. Each selected event includes interactions from an average of 14 inelastic pp collisions in the same proton bunch crossing, as well as residual signals from previous bunch crossings with a 25 ns bunch spacing. These two effects are collectively referred to as “pile-up”. Events are required to pass a single-lepton trigger, either electron or muon. Multiple triggers are used to select events: either triggers with low lepton \(p_{\text {T}}\) thresholds of 24 \(\text {GeV}\) which utilise isolation requirements to reduce the trigger rate, or triggers with higher \(p_{\text {T}}\) thresholds but looser isolation requirements to increase event acceptance. The higher \(p_{\text {T}}\) thresholds were 50 \(\text {GeV}\) for muons and 60 \(\text {GeV}\) or 120 \(\text {GeV}\) for electrons.

MC simulations are used to model background processes and to correct the data for detector acceptance and resolution effects. The nominal \(t\bar{t}\) sample is simulated using the NLO Powheg-Box v2 matrix-element (ME) generator [15,16,17], referred to as Powheg in the following, and Pythia6 [18] (v6.427) for the parton shower (PS), hadronisation and underlying event. Powheg is interfaced to the CT10 [19] NLO parton distribution function (PDF) set, while Pythia6 uses the CTEQ6L1 PDF set [20]. Pythia6 simulates the underlying event and parton shower using the Perugia 2012 (P2012) set of tuned parameters (tune) [21]. The “\(h_{\text {damp}}\)” parameter, which controls the \(p_{\text {T}}\) of the first additional emission beyond the Born configuration, is set to the mass of the top quark (\(m_{t}\)). The main effect of this is to regulate the high-\(p_{\text {T}}\) emission against which the \(t\bar{t}\) system recoils. The choice of this \(h_{\text {damp}}\) value has been found to improve the modelling of the \(t\bar{t}\) system kinematics with respect to data in previous analyses [6]. In order to investigate the effects of initial- and final-state radiation, alternative Powheg+Pythia6 samples are generated with the renormalisation and factorisation scales varied by a factor of 2 (0.5) and using low (high) radiation variations of the Perugia 2012 tune and an \(h_{\text {damp}}\) value of \(m_{t}\) (\(2m_{t}\)), corresponding to less (more) parton-shower radiation [6]. These samples are called RadLo and RadHi, respectively, in the following. These variations are selected to cover the uncertainties in the measurements of differential distributions in 7 \(\text {TeV}\) data [22]. Alternative samples are generated using Powheg and MadGraph5_aMC@NLO [23] (v2.2.1) with CKKW-L, referred to as MG5_aMC@NLO hereafter, both interfaced to Herwig++ [24] (v2.7.1), in order to estimate the effects of the choice of matrix-element generator. These \(t\bar{t}\) samples are described in Ref. [6].

Additional \(t\bar{t}\) samples are generated for comparisons with unfolded data as follows. The predictions of the ME generators Powheg and MG5_aMC@NLO are interfaced to Herwig7 [24, 25] and Pythia8. In all Powheg and MG5_aMC@NLO samples mentioned above, the first emission is calculated from the leading-order real emission term, and further additional jets are simulated from parton showering, which is affected by significant theoretical uncertainties. Improved precision is expected from using Sherpa v2.2 [26], which models the inclusive and the one-additional-jet process using an NLO matrix element and up to four additional jets at leading-order (LO) accuracy using the ME + PS@NLO prescription [27]. The sample used to compare to particle-level results presented here is generated with the central scale set to \(\mu ^2 = m_t ^2 + 0.5 \times (p_{\mathrm {T},t}^2 + p_{{\mathrm {T},\overline{t}}}^2)\), where \(p_{\mathrm {T},t}\) and \(p_{\mathrm {T},\overline{t}}\) refer to the \(p_{\text {T}}\) of the top and antitop quark, respectively, and with the matching scale set to 30 \(\text {GeV}\). Furthermore, the NNPDF 3.0 PDF [28] at next-to-next-to-leading order (NNLO) is used.
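The central-scale definition quoted for the Sherpa sample can be evaluated with a short Python sketch. The function name is hypothetical, and the default top-quark mass of 172.5 GeV is the value quoted for the cross-section normalisation; that the same mass enters the scale is an assumption of this sketch.

```python
import math


def central_scale(pt_top, pt_antitop, m_top=172.5):
    """Return mu in GeV, with mu^2 = m_t^2 + 0.5 * (pT_t^2 + pT_tbar^2).

    All inputs are in GeV. The default m_top is an assumption
    (the value used for the cross-section normalisation).
    """
    mu_squared = m_top**2 + 0.5 * (pt_top**2 + pt_antitop**2)
    return math.sqrt(mu_squared)
```

For top quarks produced at rest in the transverse plane the scale reduces to \(m_t\), and it grows with the average squared transverse momentum of the two top quarks.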

All \(t\bar{t}\) samples are normalised to the cross-section calculated with the Top++2.0 program to NNLO in perturbative QCD, including soft-gluon resummation to NNLL [29], assuming a top-quark mass of 172.5 \(\text {GeV}\).

Background processes are simulated using a variety of MC generators, as described below. Details of the background estimation are described in Sect. 5. Single top-quark production in association with a W boson (Wt) is simulated using Powheg-Box v1+Pythia6 with the same parameters and PDF sets as those used for the nominal \(t\bar{t}\) sample and is normalised to the approximate NNLO cross-section (\(71.7\pm 3.8\) pb) described in Ref. [30]. At NLO, part of the final state of Wt production is identical to the final state of \(t\bar{t}\) production. The “diagram removal” (DR) generation scheme [31] is used to remove this part of the phase space from the background calculation. A sample generated using an alternative “diagram subtraction” (DS) method [31] is used to evaluate systematic uncertainties. Both samples are normalised to the generator cross-section.

The majority of backgrounds with at least one misidentified lepton in the selected sample arise from \(t\bar{t}\) production in which only one of the top quarks decays semileptonically, which is simulated in the same way as the \(t\bar{t}\) production in which both top quarks decay leptonically.

Sherpa v2.1, interfaced to the CT10 PDF set, is used to model Drell–Yan production, specifically \(Z/\gamma ^*\rightarrow \tau ^+\tau ^-\). For this process, Sherpa calculates matrix elements at NLO for up to two partons and at LO for up to four partons using the OpenLoops [32] and Comix [33] matrix-element generators. The matrix elements are merged with the Sherpa PS [34] using the ME + PS@NLO prescription [35]. The total cross-section is normalised to NNLO predictions calculated using the FEWZ program [36] with the MSTW2008NNLO PDF [37]. Sherpa v2.1 with the CT10 PDF set is also used to simulate electroweak diboson production [38] (WW, WZ, ZZ), where both bosons decay leptonically. For diboson production, Sherpa v2.1 calculates matrix elements at NLO for zero additional partons, at LO for one to three additional partons (with the exception of ZZ production, for which the one additional parton is also NLO), and using PS for all parton multiplicities of four or more.

The ATLAS detector response is simulated [39] using Geant 4 [40]. A “fast simulation” [41], utilising parameterised showers in the calorimeter, is used in the samples chosen to estimate \(t\bar{t}\) modelling uncertainties. Additional pp interactions are generated using Pythia8.186 [42] with tune A2 and overlaid with signal and background processes in order to simulate the effect of pile-up. The MC simulations are reweighted to match the distribution of the average number of interactions per bunch crossing observed in data, a procedure referred to as “pile-up reweighting”. Corrections are applied to the MC simulation in order to improve agreement with data for the efficiencies of reconstructed objects. The same reconstruction algorithms and analysis procedures are then applied to both data and MC simulation.
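The idea behind pile-up reweighting can be sketched in a few lines of Python: each simulated event receives a weight equal to the ratio of the normalised data and MC distributions in its bin of the average number of interactions per bunch crossing. The function name, the list-based histograms and the handling of empty MC bins are assumptions of this sketch, not a description of the ATLAS tools.

```python
def pileup_weights(data_hist, mc_hist):
    """Per-bin weights that reshape the MC distribution to match data.

    Both inputs are lists of bin contents over the same binning of the
    average number of interactions per bunch crossing (hypothetical format).
    """
    if len(data_hist) != len(mc_hist):
        raise ValueError("histograms must share the same binning")
    data_total = sum(data_hist)
    mc_total = sum(mc_hist)
    weights = []
    for n_data, n_mc in zip(data_hist, mc_hist):
        if n_mc == 0:
            # No simulated events in this bin, so nothing to reweight.
            weights.append(0.0)
        else:
            weights.append((n_data / data_total) / (n_mc / mc_total))
    return weights
```

Because both histograms are normalised before taking the ratio, the weights change the shape of the simulated distribution without changing the total number of weighted events, provided every populated data bin is also populated in the simulation.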

4 Object reconstruction

This analysis selects reconstructed electrons, muons and jets. Electron candidates are identified by matching an inner-detector track to an isolated energy deposit in the electromagnetic calorimeter, within the fiducial region of transverse momentum \(p_{\text {T}}>25\) \(\text {GeV}\) and pseudorapidity \(|\eta |<2.47\). Electron candidates are excluded if the energy cluster is within the transition region between the barrel and the endcap of the electromagnetic calorimeter, \(1.37< |\eta | < 1.52\), and if they are also reconstructed as photons. Electrons are selected using a multivariate algorithm and are required to satisfy a likelihood-based quality criterion, in order to provide high efficiency and good rejection of fake and non-prompt electrons [43, 44]. Electron candidates must have tracks that pass the requirements of transverse impact parameter significance \(|d_0^\text {sig}|<5\) and longitudinal impact parameter \(|z_0 \sin \theta | < 0.5\) mm. Electrons must also pass isolation requirements based on inner-detector tracks and topological energy clusters varying as a function of \(\eta \) and \(p_{\text {T}}\). The track isolation cone size is given by the smaller of \(\Delta R = 10\) \(\text {GeV}\)/\(p_{\text {T}}\) and \(\Delta R = 0.2\), i.e. a cone which increases in size at lower \(p_{\text {T}}\) values, up to a maximum of 0.2. These requirements result in a 95% efficiency of the isolation cuts for electrons from \(Z\rightarrow e^+e^-\) decays with \(p_{\text {T}}\) of 25 \(\text {GeV}\) and 99% for electrons with \(p_{\text {T}}\) above 60 \(\text {GeV}\); when estimated in simulated \(t\bar{t}\) events, this efficiency is smaller by a few percent, due to the increased jet activity. Electrons that share a track with a muon are discarded. Double counting of electron energy deposits as jets is prevented by removing the closest jet with an angular distance \(\Delta R < 0.2\) from a reconstructed electron. Following this, the electron is discarded if a jet exists within \(\Delta R < 0.4\) of the electron, to ensure sufficient separation from nearby jet activity.
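The variable track-isolation cone described above can be written as a one-line function. This is a minimal Python sketch; the function and parameter names are hypothetical.

```python
def track_isolation_cone(pt_gev, max_cone=0.2, k_gev=10.0):
    """Cone size given by the smaller of k_gev / pT and max_cone."""
    if pt_gev <= 0:
        raise ValueError("pT must be positive")
    return min(k_gev / pt_gev, max_cone)
```

With these values the cone stays at its maximum of 0.2 for leptons with \(p_{\text {T}}\) up to 50 \(\text {GeV}\) and shrinks as \(10\ \text {GeV}/p_{\text {T}}\) above that.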

Muon candidates are identified from a track in the inner detector matching a track in the muon spectrometer; the combined track is required to have \(p_{\text {T}}> 25\) \(\text {GeV}\) and \(|\eta | < 2.5\) [45]. The tracks of muon candidates are required to have a transverse impact parameter significance \(|d_0^\text {sig}|<3\) and a longitudinal impact parameter below 0.5 mm. Muons are required to meet quality criteria and the same isolation requirement as applied to electrons, to obtain the same isolation efficiency performance as for electrons. These requirements reduce the contributions from fake and non-prompt muons. Muons may leave energy deposits in the calorimeter that could be misidentified as a jet, so jets with fewer than three associated tracks are removed if they are within \(\Delta R < 0.4\) of a muon. Muons are discarded if they are separated from the nearest jet by \(\Delta R < 0.4\), to reduce the background from muons originating in heavy-flavour decays inside jets.
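The sequence of lepton–jet overlap-removal steps described for electrons and muons can be summarised in a Python sketch. The dictionary-based object format, the function names and the use of pseudorapidity in \(\Delta R\) are assumptions made for illustration; the shared-track veto between electrons and muons is not modelled here.

```python
import math


def delta_r(a, b):
    """Angular distance between two objects given as dicts with 'eta' and 'phi'."""
    dphi = (a["phi"] - b["phi"] + math.pi) % (2.0 * math.pi) - math.pi
    return math.hypot(a["eta"] - b["eta"], dphi)


def remove_overlaps(electrons, muons, jets):
    """Apply the overlap-removal steps in the order given in the text.

    Jets are dicts with 'eta', 'phi' and 'n_tracks' (hypothetical format).
    Returns the surviving (electrons, muons, jets).
    """
    jets = list(jets)
    # 1. For each electron, remove the closest jet if it lies within 0.2.
    for el in electrons:
        near = [(delta_r(el, jet), idx) for idx, jet in enumerate(jets)]
        near = [pair for pair in near if pair[0] < 0.2]
        if near:
            del jets[min(near)[1]]
    # 2. Discard electrons with any remaining jet within 0.4.
    electrons = [
        el for el in electrons
        if all(delta_r(el, jet) >= 0.4 for jet in jets)
    ]
    # 3. Remove jets with fewer than three tracks within 0.4 of a muon.
    jets = [
        jet for jet in jets
        if not (jet["n_tracks"] < 3
                and any(delta_r(mu, jet) < 0.4 for mu in muons))
    ]
    # 4. Discard muons with any remaining jet within 0.4.
    muons = [
        mu for mu in muons
        if all(delta_r(mu, jet) >= 0.4 for jet in jets)
    ]
    return electrons, muons, jets
```

The order matters: jets that are really lepton energy deposits are removed first, so that only genuine nearby jet activity can cause a lepton to be discarded.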

Jets are reconstructed with the anti-\(k_t\) algorithm [46, 47], using a radius parameter of \(R = 0.4\), from topological clusters of energy deposits in the calorimeters. Jets are accepted within the range \(p_{\text {T}}> 25\) \(\text {GeV}\) and \(|\eta | < 2.5\), and are calibrated using simulation with corrections derived from data [48]. Jets likely to originate from pile-up are suppressed using a multivariate jet-vertex-tagger (JVT) [49] for candidates with \(p_{\text {T}}< 60\) \(\text {GeV}\) and \(|\eta | < 2.4\). Jets containing b-hadrons are b-tagged using a multivariate discriminant [50], which uses track impact parameters, track invariant mass, track multiplicity and secondary vertex information to discriminate those jets from light quark or gluon jets (“light jets”). The average b-tagging efficiency is 77% for b-jets in simulated dileptonic \(t\bar{t}\) events with a purity of 95%. The tagging algorithm gives a rejection factor of about 130 against light jets and about 4.5 against jets originating from charm quarks (“charm jets”).

5 Event selection and background estimates

Signal events are selected by requiring exactly one electron and one muon of opposite electric charge (“opposite sign”), and at least two b-tagged jets. With this selection, almost all of the selected events are \(t\bar{t}\) events. The other processes that pass the signal selection are events with single top quarks (Wt), \(t\bar{t}\) events in the single-lepton decay channel with a misidentified (fake) lepton, \(Z/\gamma ^{*}\rightarrow \tau ^{+}\tau ^{-}(\rightarrow e\mu )\) and diboson events. Other backgrounds, including processes with two misidentified leptons, are negligible for the event selections used in this analysis.

Additional jets are defined as those produced in addition to the two highest-\(p_{\text {T}}\) b-tagged jets. They are identified as jets above \(p_{\text {T}}\) thresholds of 25, 40, 60 and 80 \(\text {GeV}\), independent of the jet flavour. In very rare cases, b-jets may also be produced in addition to the top-quark pair, for example through splitting of a very high momentum gluon, or through the decay of a Higgs boson into a bottom–antibottom pair, leading to events with more than two b-tagged jets. In this case, the two selected b-tagged jets with the highest \(p_{\text {T}}\) are assumed to originate from \(t\bar{t}\) decay, and the others are considered as additional jets. This procedure ignores that occasionally a b-jet which is not the decay product of a top quark might have higher \(p_{\text {T}}\) than those from the top-quark decays. This is a negligible effect within the uncertainties of this measurement.
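The additional-jet definition can be made concrete with a minimal Python sketch: the two highest-\(p_{\text {T}}\) b-tagged jets are set aside, and every other jet is counted against each threshold regardless of flavour. The `(pt, is_btagged)` tuple format and the function name are hypothetical, and the input jets are assumed to have already passed the object selection.

```python
def additional_jet_multiplicities(jets, thresholds=(25.0, 40.0, 60.0, 80.0)):
    """Count additional jets above each pT threshold (GeV).

    `jets` is a list of (pt, is_btagged) tuples (hypothetical format).
    The two highest-pT b-tagged jets are assigned to the top-quark decays;
    all remaining jets, b-tagged or not, are treated as additional jets.
    """
    by_pt = sorted(range(len(jets)), key=lambda i: jets[i][0], reverse=True)
    b_indices = [i for i in by_pt if jets[i][1]]
    if len(b_indices) < 2:
        raise ValueError("event selection requires at least two b-tagged jets")
    top_candidates = set(b_indices[:2])
    additional_pts = [
        jets[i][0] for i in range(len(jets)) if i not in top_candidates
    ]
    return {thr: sum(pt > thr for pt in additional_pts) for thr in thresholds}
```

In an event with three b-tagged jets, the softest of the three is counted as an additional jet, as described in the text.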

The single-top background is estimated from simulation, as described in Sect. 3. The background from \(t\bar{t}\) events in the lepton+jets channel with a fake lepton is estimated from a combination of data and simulation, as in Ref. [2]. This method uses the observation that samples with a same-sign \(e\mu \) pair and two b-tagged jets are dominated by events with a misidentified lepton, with a rate comparable to that in the opposite-sign sample. The contribution of events with misidentified leptons is therefore estimated from the same-sign event count in data, after subtraction of the predicted prompt same-sign contribution, multiplied by the ratio of opposite-sign to same-sign fake-lepton events predicted by the nominal \(t\bar{t}\) sample.
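The fake-lepton estimate described above amounts to a simple formula, sketched here in Python. The function and argument names are hypothetical, and any numbers passed to it in an example are invented rather than taken from the analysis.

```python
def fake_lepton_estimate(n_ss_data, n_ss_prompt_mc, n_os_fake_mc, n_ss_fake_mc):
    """Estimated opposite-sign fake-lepton yield.

    n_ss_data      : same-sign event count observed in data
    n_ss_prompt_mc : predicted prompt same-sign contribution
    n_os_fake_mc   : simulated opposite-sign events with a fake lepton
    n_ss_fake_mc   : simulated same-sign events with a fake lepton
    """
    if n_ss_fake_mc <= 0:
        raise ValueError("need a positive same-sign fake-lepton prediction")
    os_to_ss_ratio = n_os_fake_mc / n_ss_fake_mc
    return (n_ss_data - n_ss_prompt_mc) * os_to_ss_ratio
```

The data supply the overall rate of misidentified leptons, while the simulation is used only for the prompt subtraction and for the opposite-sign to same-sign ratio.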

The backgrounds from \(Z/\gamma ^{*}\rightarrow \tau ^{+}\tau ^{-}\) and from diboson events are estimated from simulation and are below 1%. The normalisation for the \(Z/\gamma ^{*}\rightarrow \tau ^{+}\tau ^{-}\) contribution is estimated from events with \(Z/\gamma ^{*}\rightarrow e^+ e^- \) or \(\mu ^{+} \mu ^{-}\) and two \(b\text {-}\mathrm{jet}\)s within the acceptance of this analysis. The Monte Carlo prediction is scaled by \(1.37\pm 0.30\) to fit the observed rate.

After the event selection, only about 4.5% of the events are background, as listed in Table 1. The background is dominated by single top production (3.1%) and fake leptons (1.6%). The event yields and the relative background contributions vary with jet multiplicity and jet \(p_{\text {T}}\) as shown in Figs. 1 and 2, respectively. The single-top background dominates across all jet \(p_{\text {T}}\) values and at low additional jet multiplicities. At high jet multiplicities (\({\ge }3\) additional jets) the fake-lepton background exceeds the number of single-top events. While the number of events observed in the 0-jet bin agrees with the prediction within the uncertainties, the data exceed the predictions increasingly with jet multiplicity, reaching a 25% deviation for events with at least four additional jets above 25 \(\text {GeV}\).

The table and figures also list the contribution of \(t\bar{t}\) events with at least one additional jet identified as originating from pile-up (pile-up jets). These are signal events, but a few pile-up jets are still in the sample after object and event selection, as the background suppression of the JVT cut is very high but not 100%. Due to the presence of at least one jet that does not originate from the hard interaction, these events may appear in the wrong jet multiplicity bin. In the jet \(p_{\text {T}}\) spectra, pile-up jets contribute at low additional-jet \(p_{\text {T}}\) as the pile-up jets are generally softer than the jets in \(t\bar{t}\) events. For the same reason, pile-up jets only contribute significantly to the jet multiplicity distributions with the 25 \(\text {GeV}\) threshold. In most of the events with remaining pile-up jets, only one of the additional jets is caused by pile-up. Any remaining pile-up jets can be identified in the simulation, but not in data. Therefore the data are corrected for pile-up jets in the unfolding procedure, as described later.

Table 1 Yields of data and MC events fulfilling the selection criteria

Fig. 1 Multiplicity of additional jets with (a) \(p_{\text {T}}>25\) \(\text {GeV}\), (b) \(p_{\text {T}}>40\) \(\text {GeV}\), (c) \(p_{\text {T}}>60\) \(\text {GeV}\), and (d) \(p_{\text {T}}>80\) \(\text {GeV}\) for selected events at reconstruction level in data and simulation. Simulated signal events with at least one additional jet identified as pile-up are indicated in grey. The contribution of pile-up jets to the backgrounds is negligible. The lower panel shows the ratio of the total prediction to the data (solid line), the grey band represents the statistical uncertainty of the measurement, and the error bars on the solid line show the statistical uncertainty in the signal MC sample

Fig. 2 (a) Leading \(b\text {-}\mathrm{tagged jet}\) \(p_{\text {T}}\), (b) sub-leading \(b\text {-}\mathrm{tagged jet}\) \(p_{\text {T}}\), and (c) leading additional-jet \(p_{\text {T}}\) for selected events at reconstruction level. The last bin includes overflows. Jets identified as pile-up in the \(t\bar{t}\) signal sample are indicated in grey. The contribution of pile-up jets to the backgrounds is negligible. The lower panel shows the ratio of the total prediction to the data (solid line), the grey band represents the statistical uncertainty of the measurement, and the error bars on the solid line show the statistical uncertainty in the signal MC sample

6 Sources of systematic uncertainty

The systematic uncertainties related to the reconstructed objects, the signal modelling and the background estimates are evaluated as described in the following.

The jet energy scale (JES) uncertainty is evaluated by varying 19 uncertainty parameters derived from in situ analyses at \(\sqrt{s} = 8\) \(\text {TeV}\) and extrapolated to data at \(\sqrt{s} = 13\) \(\text {TeV}\) [48]. The JES uncertainty is 5.5% for jets with \(p_{\text {T}}\) of 25 \(\text {GeV}\) and quickly decreases with increasing jet \(p_{\text {T}}\), falling to below 2% for jets above 80 \(\text {GeV}\). The uncertainty in the jet energy resolution (JER) is calculated by extrapolating the uncertainties derived at \(\sqrt{s} = 8\) \(\text {TeV}\) to \(\sqrt{s} = 13\) \(\text {TeV}\) [48]. The uncertainty in JER is at most \(3.5\%\) at \(p_{\text {T}}\) of 25 \(\text {GeV}\), quickly decreasing with increasing jet \(p_{\text {T}}\) to below \(2\%\) for jets above 50 \(\text {GeV}\).

Uncertainties on the efficiency for tagging b-jets were determined using the methods described in Ref. [51] applied to dileptonic \(t\bar{t}\) events in \(\sqrt{s} =13\) \(\text {TeV}\) data. The uncertainties on mistagging of charm and light jets were determined using \(\sqrt{s}=8\) \(\text {TeV}\) data as described in Refs. [52, 53]. Additional uncertainties are assigned to take into account the presence of the new IBL detector and the extrapolation to \(\sqrt{s}=13\) \(\text {TeV}\) [50].

The lepton-related uncertainties are assessed mostly using \(Z\rightarrow \mu ^{+} \mu ^{-} \) and \(Z\rightarrow e^+e^-\) decays measured in \(\sqrt{s}=13\) \(\text {TeV}\) data. The differences between the topologies of Z and \(t\bar{t}\) pair production events are expected not to be significant for the estimation of uncertainties.

The uncertainty associated with the amount of QCD initial- and final-state radiation is evaluated as the difference between the baseline MC sample and the corresponding RadHi and RadLo samples described in Sect. 3. The uncertainty due to the choice of parton-shower and hadronisation algorithms in the signal modelling is assessed by comparing the baseline MC sample (Powheg+Pythia6) with Powheg+Herwig++. The uncertainty due to the use of a specific NLO MC sample with its particular matching algorithm is derived from the comparison of Powheg+Herwig++ to the MG5_aMC@NLO+Herwig++ sample.

The uncertainty due to the particular PDF used for the signal model prediction is evaluated by taking the standard deviation of the variations from the 100 eigenvectors of the recommended Run-2 PDF4LHC [54] set and adding it in quadrature with the difference between the central predictions from CT10 and CT14 [55].
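The combination described above can be sketched in Python as follows. The function name and inputs are hypothetical, and the use of the population standard deviation about the mean of the variations is an assumption of this sketch; the text does not specify the exact convention.

```python
import math
import statistics


def pdf_uncertainty(variations, central_ct10, central_ct14):
    """Combine the spread of PDF variations with the CT10-CT14 difference.

    `variations` holds the prediction for one bin from each PDF variation.
    The population standard deviation is an assumed convention.
    """
    spread = statistics.pstdev(variations)
    difference = central_ct10 - central_ct14
    return math.sqrt(spread**2 + difference**2)
```

Adding the two terms in quadrature treats the spread within the PDF4LHC set and the shift between the CT10 and CT14 central values as independent sources.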

The uncertainty in the single top-quark background is evaluated based on the 5.3% error in the approximate NNLO cross-section prediction and by comparing samples with diagram removal and diagram subtraction schemes, as described in Sect. 3. The uncertainty in the background from fake leptons is estimated to be 100% from the statistical uncertainty of the same-sign event counts in data and an interpolation error using the envelope of the differences of individual subcomponents (such as photon conversions and heavy-flavour decay leptons) of the misidentified-lepton background between the same-sign and the opposite-sign samples.
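As a quick arithmetic check, the 5.3% figure matches the relative uncertainty of the approximate NNLO Wt cross-section of \(71.7\pm 3.8\) pb quoted in Sect. 3:

$$\begin{aligned} \frac{3.8\ \text {pb}}{71.7\ \text {pb}} \approx 0.053 = 5.3\%. \end{aligned}$$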

For Z+jets backgrounds, the scale factor derived in the \(e^+e^-\) and \(\mu ^{+} \mu ^{-} \) channels and used to reweight the signal-region distribution is varied by 22%, corresponding to the difference in the scale factors derived in subsamples with and without an additional jet. This value covers the variations of the correction factor derived from subsets of events with different jet multiplicities. No theoretical uncertainty is applied to the Z+jets background normalisation as this is scaled to data.
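For orientation, the 22% variation is numerically the same as the relative uncertainty of the scale factor \(1.37\pm 0.30\) quoted in Sect. 5. Reading the two numbers as the same variation is an interpretation of the text rather than a statement made in it:

$$\begin{aligned} \frac{0.30}{1.37} \approx 0.22 = 22\%. \end{aligned}$$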

The uncertainty in the amount of pile-up is estimated by changing the nominal MC reweighting factors to vary the number of interactions per bunch crossing in data up and down by 10%. Two methods were used to estimate the number of interactions per bunch crossing. The first method calculated the number of interactions using the instantaneous luminosity and the inelastic proton–proton cross-section [56, 57]. The results of the calculation were compared to results from a data-driven method based on the number of reconstructed vertices. The difference between the correlation of the two methods in data and MC is taken as the uncertainty.

The uncertainty associated with the 2–3% loss of hard-scatter jets caused by the JVT cut is estimated using Z+jet events. The uncertainty in the efficiency of the JVT cut to reduce pile-up jets is estimated by using a sideband method. The JVT cut is inverted in simulation to estimate the number of pile-up jets and derive a scale factor to describe the number of pile-up jets in data. This factor is then used to scale the predicted number of pile-up jets in the signal region (with the JVT cut applied). Scale factors are also derived using the samples with increased and decreased pile-up mentioned above, and the larger of the two variations is taken as the systematic uncertainty.

7 Definition of the fiducial phase space

For the measurement of the jet multiplicity, the jet \(p_{\text {T}}\) spectra and the gap fractions, the data are corrected to particle level by comparing to events from MC generators in the fiducial volume described below. The fiducial volume, i.e., the object definitions and the kinematic phase space at particle level, is designed to match the reconstruction level as closely as possible and follow closely the definitions in Refs. [4, 5]. Leptons and jets are defined using particles with a mean lifetime greater than \(0.3 \times 10^{-10}\) s, directly produced in pp interactions or from subsequent decays of particles with a shorter lifetime. Leptons from W boson decays (\(e, \mu , \nu _e, \nu _{\mu }, \nu _{\tau }\)) are identified as such by requiring that they are not hadron decay products. Electron and muon four-momenta are calculated after the addition of photon four-momenta within a cone of \(\Delta R =0.1\) around their original directions.

Jets are defined using the anti-\(k_t\) algorithm with a radius parameter of 0.4. All particles are considered for jet clustering, except for leptons from W decays as defined above (i.e., neutrinos from hadron decays are included in jets) and any photons associated with the selected electrons or muons. Jets are identified as b-jets if a hadron with \(p_{\text {T}}>5\) \(\text {GeV}\) containing a b-quark is associated with the jet through a ghost-matching technique as described in Ref. [58].

The cross-section is defined using events with exactly one electron and one muon of opposite sign directly from W boson decays, i.e. excluding electrons and muons from \(\tau \)-lepton decays. In addition, at least two \(b\text {-}\mathrm{jet}\)s each with \(p_{\text {T}}> 25\) \(\text {GeV}\) and \(|\eta |<2.5\) are required. Following the reconstructed object selection, events with jet–electron pairs or jet–muon pairs with \(\Delta R < 0.4\) are excluded. Additional jets are considered within \(| \eta |<2.5\) for \(p_{\text {T}}\) thresholds of 25 \(\text {GeV}\) or higher, independently of their flavour.

8 Measurement of jet multiplicities and \(p_{\text {T}}\) spectra

The multiplicities of additional reconstructed jets with different \(p_{\text {T}}\) thresholds are corrected to particle level within the fiducial volume as defined above. Even though the kinematic range of the measurement is chosen to be the same for particle-level and reconstruction-level objects, corrections are necessary due to the efficiencies and detector resolutions that cause differences between reconstruction-level and particle-level jet distributions. Examples include events in which one or more particle-level jets do not pass the \(p_{\text {T}}\) threshold for reconstruction-level jets and when the selection efficiency for inclusive \(t\bar{t}\) events changes as a function of jet multiplicity. Furthermore, additional reconstructed jets without a corresponding particle-level jet may appear due to pile-up, or if a jet migrates into the fiducial volume due to an upward fluctuation caused by the \(p_{\text {T}}\) resolution, or if a single particle-level jet is reconstructed as two separate jets. These effects lead to migrations between bins and are taken into account within an iterative Bayesian unfolding [59].

The reconstructed jet multiplicity measurements are corrected separately for each additional-jet \(p_{\text {T}}\) threshold according to

$$\begin{aligned} N^i_{{\mathrm {unfold}}} =\frac{1}{f^i_{{\mathrm {eff}}}} \cdot \sum _j (M^{-1})^{{\mathrm {part}},i}_{{\mathrm {reco}},j} \cdot f^j_{ {\mathrm {accept}}} (N_{{\mathrm {data}}}^{j}- N_{{\mathrm {bg}}}^j), \end{aligned}$$
(1)

where \(N^i_{{\mathrm {unfold}}}\) is the total number of fully corrected particle-level events with particle-level jet multiplicity i. The term \(f^i_{{\mathrm {eff}}}\) represents the efficiency to reconstruct an event with i additional jets, defined as the ratio of events with i particle-level jets that fulfil both the fiducial volume selection at particle-level and the reconstruction-level selection, \(N^i_{\mathrm{{reco}}\wedge \mathrm{{part}}}\), to the number of events that fulfil the particle-level selection, \(N^i_{{\mathrm {part}}}\):

$$\begin{aligned} f^i_{{\mathrm {eff}}}=\frac{N^i_{{\mathrm {reco}} \wedge {\mathrm {part}}}}{N^i_{{\mathrm {part}}}}. \end{aligned}$$
(2)

The resulting ratio \(f^i_{{\mathrm {eff}}}\) is approximately 0.33 and depends only weakly on the jet multiplicity. The analysis of different \(t\bar{t}\) MC samples results in values of \(f^i_{{\mathrm {eff}}}\) which vary by up to 10%. The variations of \(f^i_{{\mathrm {eff}}}\) between different \(p_{\text {T}}\) thresholds are less than 2%. The function \( f^j_{{\mathrm {accept}}}\) is the probability that an event fulfilling the reconstruction-level selection with j reconstructed jets, \(N^{j}_{{\mathrm {reco}}}\), is also within the particle-level acceptance defined in Sect. 7:

$$\begin{aligned} f^j_{{\mathrm {accept}}}=\frac{N^{j}_{{\mathrm {reco}} \wedge {\mathrm {part}}}}{N^{j}_{{\mathrm {reco}}}}. \end{aligned}$$
(3)

The variable \(N^{j}_{{\mathrm {data}}}\) is the number of events in data with j reconstructed jets and \(N^j_{{\mathrm {bg}}}\) is the number of background events, as evaluated in Sect. 5. The resulting \(f^j_{{\mathrm {accept}}}\) decreases from around 0.85 for events without additional jets to about 0.76 for the highest jet multiplicities. The MC predictions of \(f^j_{{\mathrm {accept}}}\) agree within 1% for events without any additional jets and within 5% at high jet multiplicities. Only MG5_aMC@NLO+Herwig++ predicts a smaller change as a function of the number of jets.
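As an illustration of Eqs. (2) and (3), the following Python sketch computes the efficiency and acceptance factors from a list of simulated events. The tuple layout of `mc_events` and the function name are hypothetical, and unweighted events are assumed for simplicity; a real analysis would use MC event weights.

```python
from collections import Counter


def correction_factors(mc_events):
    """Compute f_eff (Eq. 2) and f_accept (Eq. 3) per jet multiplicity.

    `mc_events` is an iterable of tuples
    (passes_part, n_part_jets, passes_reco, n_reco_jets)  # hypothetical layout
    """
    n_part = Counter()          # N^i_part
    n_reco = Counter()          # N^j_reco
    both_by_part = Counter()    # N^i_{reco and part}, binned in particle-level i
    both_by_reco = Counter()    # N^j_{reco and part}, binned in reconstructed j
    for passes_part, i, passes_reco, j in mc_events:
        if passes_part:
            n_part[i] += 1
        if passes_reco:
            n_reco[j] += 1
        if passes_part and passes_reco:
            both_by_part[i] += 1
            both_by_reco[j] += 1
    f_eff = {i: both_by_part[i] / n for i, n in n_part.items()}
    f_accept = {j: both_by_reco[j] / n for j, n in n_reco.items()}
    return f_eff, f_accept
```

Note that the numerator is the same set of events in both factors, but it is binned in the particle-level multiplicity for \(f^i_{{\mathrm {eff}}}\) and in the reconstructed multiplicity for \(f^j_{{\mathrm {accept}}}\).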

The response matrix \(M^{{\mathrm {part}},i}_{{\mathrm {reco}},j}\) represents the probability \(P(N^j_{{\mathrm {reco}}} | N^i_{{\mathrm {part}}})\) of finding an event with true particle-level jet multiplicity i with a reconstructed jet multiplicity j. As shown in Fig. 3, at the higher jet \(p_{\text {T}}\) thresholds, at least 77% of the events have the same jet multiplicity at particle level and at reconstruction level. At the 25 \(\text {GeV}\) threshold, the agreement still exceeds 64%. The worse agreement can be explained in part by the presence of pile-up jets, which leads to events with more reconstructed than particle-level jets. There are almost no events with a difference of more than one jet between particle and reconstruction-level multiplicity.

As part of the Bayesian unfolding using Eq. (1), \(M^{{\mathrm {part}},i}_{{\mathrm {reco}},j}\) is calculated iteratively, i.e., the result of the first iteration is used as the reconstruction-level jet multiplicity for the following one. The corrected spectra are found to converge after four iterations of the Bayesian unfolding algorithm.
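The structure of Eq. (1) combined with an iterative Bayesian step can be sketched as follows. This is a minimal illustration assuming the standard D'Agostini scheme, in which the unfolded result of one iteration serves as the prior for the next; the implementation used in the analysis (Ref. [59]) may differ in detail. The function name, argument names and array layout are assumptions of this example.

```python
import numpy as np


def bayes_unfold(n_data, n_bg, response, f_accept, f_eff, prior, n_iter=4):
    """Iterative Bayesian unfolding with acceptance and efficiency corrections.

    response[j, i] is P(reco bin j | particle bin i) for events passing both
    selections, so each column sums to one. All other inputs are 1-D arrays.
    Every reconstructed bin is assumed to be reachable from some particle bin.
    """
    n_data, n_bg = np.asarray(n_data, float), np.asarray(n_bg, float)
    response = np.asarray(response, float)
    # Background-subtracted data restricted to the particle-level acceptance.
    d = np.asarray(f_accept, float) * (n_data - n_bg)
    p = np.asarray(prior, float)
    p = p / p.sum()
    for _ in range(n_iter):
        joint = response * p                                  # R[j, i] * p[i]
        posterior = joint / joint.sum(axis=1, keepdims=True)  # P(part i | reco j)
        matched = posterior.T @ d                             # sum_j P(i | j) d_j
        p = matched / matched.sum()                           # prior for next pass
    # Efficiency correction to the full fiducial particle-level yield.
    return matched / np.asarray(f_eff, float)
```

A simple closure check is to fold a known particle-level spectrum with the response matrix and verify that unfolding with that spectrum as prior returns it unchanged.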

The unfolded additional-jet multiplicity distributions are normalised after the last iteration according to

$$\begin{aligned} \frac{1}{\sigma }\frac{{\text {d}}\sigma }{{\text {d}}N^{i}} = \frac{N^{i}_{{\mathrm {unfold}}}}{ \sum _{i} N^{i}_{{\mathrm {unfold}}}}, \end{aligned}$$
(4)

where \(N^{i}_{{\mathrm {unfold}}}\), as defined in Eq. (1), corresponds to the number of events with i jets after full unfolding and \(\sigma \) is the measured \(t\bar{t}\) production cross-section in the fiducial volume.

A potential bias of the unfolded results due to data statistics and the unfolding procedure is investigated using pseudo-experiments by performing Gaussian sampling of the reconstruction-level distributions with statistical power equivalent to that present in data. The size of the bias, defined as the relative difference between the unfolded and predicted particle-level distributions, is found to be within the statistical uncertainty of the data. To check the size of a potential bias of the unfolding due to the relation between reconstructed and particle level distributions, the particle-level distributions are reweighted to alternative MC samples. Pseudo-experiments are performed based on the resulting alternative spectrum at reconstruction level. The pseudo-experiments are unfolded using the original correction procedure. The relative difference between the unfolded particle-level distribution and the predicted particle-level distribution from the alternative MC sample is found to be well within the modelling uncertainty. In addition, it is ensured that differences between the nominal and alternative particle-level distributions are at least as large as the difference between data and the predicted reconstruction-level distributions.
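The pseudo-experiment test can be sketched in Python as follows. This is an illustration only: the per-bin Gaussian width is assumed to be \(\sqrt{N}\) of the expected reconstruction-level yield, the unfolding is passed in as an arbitrary callable `unfold`, and all names are hypothetical.

```python
import numpy as np


def unfolding_bias(reco_expected, truth_expected, unfold, n_toys=1000, seed=1):
    """Estimate the relative bias of an unfolding procedure with toys.

    Each toy is a Gaussian fluctuation of the expected reconstruction-level
    spectrum (width sqrt(N) per bin, an assumption of this sketch). The toy is
    unfolded with `unfold` and compared with the expected particle-level
    spectrum.
    """
    reco_expected = np.asarray(reco_expected, float)
    truth_expected = np.asarray(truth_expected, float)
    rng = np.random.default_rng(seed)
    toys = rng.normal(loc=reco_expected, scale=np.sqrt(reco_expected),
                      size=(n_toys, reco_expected.size))
    unfolded = np.array([unfold(toy) for toy in toys])
    # Relative bias: mean unfolded result versus predicted particle level.
    bias = (unfolded.mean(axis=0) - truth_expected) / truth_expected
    # Relative statistical spread of a single pseudo-experiment.
    spread = unfolded.std(axis=0, ddof=1) / truth_expected
    return bias, spread
```

The bias is then judged against the spread, which plays the role of the statistical uncertainty of the data in the comparison described above.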

The effect of the uncertainties listed in Sect. 6 on the unfolded multiplicity and jet spectra is evaluated as follows. The uncertainties due to detector-related effects, such as JES, JER and b-tagging, and due to data statistics are propagated through the unfolding by varying the reconstructed objects for each uncertainty component by \(\pm 1\sigma \). The modified spectrum is then used as \(N_{{\mathrm {data}}}^{j}\) in Eq. (1) for the iterative unfolding and the difference in the particle-level distribution is taken as the systematic uncertainty.

The uncertainties due to the MC modelling of the QCD initial- and final-state radiation (ISR/FSR) and the parton-shower uncertainty are evaluated by replacing the data with the corresponding alternative MC sample and using the response matrix and the correction factors from the baseline \(t\bar{t}\) MC sample for unfolding. The result is compared to the particle-level distribution of the alternative MC sample and the difference is taken as a systematic uncertainty. The uncertainties due to the MC modelling of the NLO matrix element and the matching algorithm are estimated in a similar way by replacing the data with the MG5_aMC@NLO+Herwig++ sample but using the response matrix and correction factors from Powheg+Herwig++. The resulting uncertainties are symmetrised for each component.

Fig. 3

Unfolding response matrices to match distributions (jet multiplicity, jet \(p_{\text {T}}\)) at reconstruction level to particle-level distributions in the fiducial phase space. Only events that fulfil the reconstruction- (particle-) level selection are included. Matrices to unfold (a) the jet multiplicity for additional jets with \(p_{\text {T}}> 25\) \(\text {GeV}\), (b) the jet multiplicity for additional jets with \(p_{\text {T}}>40\) \(\text {GeV}\), (c) the \(p_{\text {T}}\) of the leading additional jet, and (d) the \(p_{\text {T}}\) of the leading \(b\text {-}\mathrm{jet}\)

To unfold the leading and sub-leading \(b\text {-}\mathrm{jet}\) \(p_{\text {T}}\) and the leading additional-jet \(p_{\text {T}}\), the same ansatz is used as for the jet multiplicity measurement, but with the jet \(p_{\text {T}}\) instead of the jet multiplicity in the matrix, the acceptance and the efficiency formulae. The binning is chosen to limit the migration, such that most events have reconstruction-level jet \(p_{\text {T}}\) in the same region as the particle-level jet \(p_{\text {T}}\), and to limit the uncertainty due to data statistics. The efficiency correction \(f^i_{{\mathrm {eff}}}\) for the \(b\text {-}\mathrm{jet}\)s has a significant \(p_{\text {T}}\) dependence: it is around 0.2 for the lowest \(p_{\text {T}}\) bin and reaches approximately 0.35 at \(p_{\text {T}}\) of 80 \(\text {GeV}\). The efficiency for the additional jet varies only slightly between 0.28 and 0.31. The acceptance correction is between 0.8 and 0.9 for all jets and almost independent of \(p_{\text {T}}\), except at very low \(p_{\text {T}}\), at which it decreases significantly, to 0.56 for the leading additional jet. The unfolding response matrix presented in Fig. 3 shows that more than 60% of the jets are in the same \(p_{\text {T}}\) bin at particle and reconstruction level.

The spectra are normalised after the last iteration similarly to those in the jet multiplicity measurement:

$$\begin{aligned} \frac{1}{\sigma } \frac{{\text {d}} \sigma }{{\text {d}} p_{\text {T}}^i} = \frac{N^{i}_{p_{\text {T}},{{\mathrm {unfold}}}}}{ \Delta p_{\text {T}}^i \sum _{i} N^{i}_{ p_{\text {T}},{\mathrm {unfold}}}}, \end{aligned}$$
(5)

where \(N^{i}_{ p_{\text {T}},{\mathrm {unfold}}}\), as defined in Eq. (1), corresponds to the number of events with the jet \(p_{\text {T}}\) in bin i after full unfolding.
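The normalisations of Eqs. (4) and (5) differ only by the division by the bin width. A short Python sketch, with hypothetical function names and purely illustrative bin edges, makes this explicit.

```python
def normalised_multiplicity(n_unfold):
    """Eq. (4): fraction of unfolded events in each jet-multiplicity bin."""
    total = sum(n_unfold)
    return [n / total for n in n_unfold]


def normalised_pt_spectrum(n_unfold, bin_edges):
    """Eq. (5): unfolded yield per bin divided by bin width and total yield.

    `bin_edges` has one more entry than `n_unfold` and is given in GeV.
    """
    if len(bin_edges) != len(n_unfold) + 1:
        raise ValueError("need exactly one more bin edge than bins")
    total = sum(n_unfold)
    widths = [hi - lo for lo, hi in zip(bin_edges[:-1], bin_edges[1:])]
    return [n / (w * total) for n, w in zip(n_unfold, widths)]
```

By construction the multiplicity fractions sum to one, and the \(p_{\text {T}}\) spectrum integrates to one when each value is multiplied by its bin width.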

The measurement of the jet \(p_{\text {T}}\) spectra is as stable as the jet multiplicity measurements and the biases are small.

8.1 Jet multiplicity results

The unfolded normalised cross-sections are shown in Fig. 4 and are compared to different MC predictions. Events with up to three additional jets with \(p_{\text {T}}\) above 25 \(\text {GeV}\) are measured exclusively (four jets inclusively) and up to two additional jets exclusively (three inclusively) for the higher \(p_{\text {T}}\) thresholds. Tables 2, 3, 4 and 5 list the detailed composition of the uncertainties for the \(p_{\text {T}}\) thresholds from 25 to 80 \(\text {GeV}\). The jet multiplicity distributions are measured with an uncertainty of 4–5% for one additional jet, about 10% for two additional jets, and around 20% for the highest jet multiplicity bin, except for the 80 \(\text {GeV}\) threshold where the statistical uncertainty is larger for higher jet multiplicity bins. Systematic uncertainties dominate in all the measurements. In almost all bins for all \(p_{\text {T}}\) thresholds, the JES uncertainty dominates, followed by the modelling uncertainty.

The data are compared to Powheg and MG5_aMC@NLO matched with different shower generators, namely Pythia8, Herwig++ and Herwig7, and to Sherpa, as shown in Figs. 4 and 5. Most predictions are within uncertainties and only slight deviations are visible, except for Powheg+Herwig7, which deviates significantly from the data for all \(p_{\text {T}}\) thresholds. The MG5_aMC@NLO predictions agree within 5–10% regardless of which parton shower is used (except Herwig7), and the Powheg predictions vary slightly more. The variations are larger when using different matrix elements but the same parton shower.

The unfolded data are compared with different MC predictions using \(\chi ^2\) tests. Full covariance matrices are produced from the unfolding taking into account statistical and all systematic uncertainties. The correlation of the measurement bins is similar for all jet \(p_{\text {T}}\) thresholds: strong anti-correlations exist between events with no additional jet and events with any number of additional jets. Positive correlations exist between the bins with one and two additional jets. The \(\chi ^2\) is determined using:

$$\begin{aligned} \chi ^{2} = S^{{\mathrm {T}}}_{n-1} {\text {Cov}}^{-1}_{n-1} S_{n-1} \end{aligned}$$
(6)

where \(S_{n-1}\) is a column vector representing the difference between the unfolded data and the MC generator predictions of the normalised cross-section for one less than the total number of bins in the distribution, and \({\text {Cov}}_{n-1}\) is a matrix with \(n-1\) rows and the respective \(n-1\) columns of the full covariance matrix. The full covariance matrix is singular and non-invertible, as it is evaluated using normalised distributions. The p-values are determined using the \(\chi ^2\) and \(n-1\) degrees of freedom. Table 6 shows the \(\chi ^2\) and p-values.
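The evaluation of Eq. (6) can be illustrated with a toy example in Python. The covariance matrix below is a purely statistical multinomial matrix invented for this sketch, not the covariance of the measurement, and the function name is hypothetical. The p-value, which is not computed here, follows from the \(\chi ^2\) survival function with \(n-1\) degrees of freedom.

```python
import numpy as np


def chi2_normalised(data, prediction, cov, drop=-1):
    """Eq. (6): chi2 for a normalised distribution, dropping one bin.

    The full covariance of a normalised distribution is singular, so the bin
    with index `drop` is removed from the difference vector and from the
    covariance matrix. Returns (chi2, number of degrees of freedom).
    """
    data = np.asarray(data, float)
    prediction = np.asarray(prediction, float)
    cov = np.asarray(cov, float)
    keep = np.delete(np.arange(data.size), drop)
    s = (data - prediction)[keep]          # S_{n-1}
    c = cov[np.ix_(keep, keep)]            # Cov_{n-1}
    chi2 = float(s @ np.linalg.solve(c, s))
    return chi2, keep.size


# Toy inputs (illustrative only): multinomial covariance for 1000 events.
pred = np.array([0.5, 0.3, 0.2])
meas = np.array([0.45, 0.33, 0.22])
toy_cov = (np.diag(pred) - np.outer(pred, pred)) / 1000.0
chi2_last, ndf = chi2_normalised(meas, pred, toy_cov, drop=-1)
chi2_first, _ = chi2_normalised(meas, pred, toy_cov, drop=0)
```

In this toy both calls give the same \(\chi ^2\): when the rows of the full covariance matrix sum to zero and the difference vector sums to zero, the result does not depend on which bin is removed.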

A statistical comparison taking into account the bin correlations indicates that the agreement with data is slightly better for MG5_aMC@NLO+Herwig++, as shown in Table 6. The ratio of the data to predictions of Powheg+Pythia6 with different levels of QCD radiation both in the matrix-element calculation and in the parton shower is also shown. Powheg+Pythia6 (RadLo) does not describe the data well. The central prediction of Powheg+Pythia6 yields fewer jets than in data; however, the predictions are still within the experimental uncertainties. Powheg+Pythia6 (RadHi) describes the data most consistently, which is also confirmed by high p-values for all \(p_{\text {T}}\) thresholds. The Powheg+Pythia6 (RadLo) sample has p-values around 0.5 and the central sample mostly between 0.8 and 0.9.

Table 2 Summary of relative uncertainties in [%] for the jet multiplicity measurement using a jet \(p_{\text {T}}\) threshold of 25 \(\text {GeV}\). “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 3 Summary of relative uncertainties in [%] for the jet multiplicity measurement using a jet \(p_{\text {T}}\) threshold of 40 \(\text {GeV}\). “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 4 Summary of relative uncertainties in [%] for the jet multiplicity measurement using a jet \(p_{\text {T}}\) threshold of 60 \(\text {GeV}\). “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 5 Summary of relative uncertainties in [%] for the jet multiplicity measurement using a jet \(p_{\text {T}}\) threshold of 80 \(\text {GeV}\). “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 6 Values of \(\chi ^2/\text {NDF}\) and p-values between the unfolded normalised cross-section and the predictions for additional-jet multiplicity measurements. The number of degrees of freedom is equal to the number of bins minus one
Table 7 Summary of relative measurement uncertainties in [%] for the leading b-jet \(p_{\text {T}}\) distribution. “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 8 Summary of relative measurement uncertainties in [%] for the sub-leading b-jet \(p_{\text {T}}\) distribution. “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF
Table 9 Summary of relative measurement uncertainties in [%] for the leading additional jet \(p_{\text {T}}\) distribution. “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF

8.2 Jet \(p_{\text {T}}\) spectra results

The particle-level normalised cross-sections differential in jet \(p_{\text {T}}\) are shown in Fig. 6 and are compared to different MC predictions. The total uncertainty in the \(p_{\text {T}}\) measurements is 5–11%, although higher at some edges of the phase space. The uncertainty is dominated by the statistical uncertainty in almost all bins. The systematic uncertainties are listed in Tables 7, 8 and 9. JES/JER, NLO generator modelling and PS/hadronisation are all significant and one of them is always the dominant source of systematic uncertainty. JES/JER is the main source of uncertainty in the lowest \(p_{\text {T}}\) bins of all measurements.

Table 10 Values of \(\chi ^2/\text {NDF}\) and p-values between the unfolded normalised cross-section and the predictions for the jet \(p_{\text {T}}\) measurements. The number of degrees of freedom is equal to one less than the number of bins in the distribution

The predictions agree with data for all jet \(p_{\text {T}}\) distributions as shown in Figs. 6 and 7, although the predictions of Powheg+Herwig++ and MG5_aMC@NLO+Pythia8 do not give a good description of the leading additional-jet \(p_{\text {T}}\) distribution, which is consistent with the jet multiplicity results. This is reflected by the statistical comparison as well (Table 10).

Fig. 4

Unfolded jet multiplicity distribution for different \(p_{\text {T}}\) thresholds of the additional jets, for (a) additional jet \(p_{\text {T}}>25\) \(\text {GeV}\), (b) additional jet \(p_{\text {T}}>40\) \(\text {GeV}\), (c) additional jet \(p_{\text {T}}>60\) \(\text {GeV}\), and (d) additional jet \(p_{\text {T}}>80\) \(\text {GeV}\). The comparison to different MC predictions is shown for these distributions in the first panel. The middle and bottom panels show the ratios of different MC predictions of the normalised cross-section to the measurement and the ratios of Powheg+Pythia6 predictions with variation of the QCD radiation to the measurement, respectively. The shaded regions show the statistical uncertainty (dark grey) and total uncertainty (light grey)

Fig. 5

Ratios of jet multiplicity distributions for different \(p_{\text {T}}\) thresholds of the additional jets predicted by various MC generators to the unfolded data, for (a) additional jet \(p_{\text {T}}>25\) \(\text {GeV}\) and (b) additional jet \(p_{\text {T}}>60\) \(\text {GeV}\). The shaded regions show the statistical uncertainty (dark grey) and total uncertainty (light grey)

Fig. 6

Unfolded jet \(p_{\text {T}}\) distribution for (a) the leading b-jet, (b) the sub-leading b-jet and (c) the leading additional jet. The comparison to different MC predictions is shown for these distributions in the first panel. The middle and bottom panels show the ratios of different MC predictions of the normalised cross-section to the measurement and the ratios of Powheg+Pythia6 predictions with variation of the QCD radiation to the measurement, respectively. The shaded regions show the statistical uncertainty (dark grey) and total uncertainty (light grey)

Fig. 7

Ratios of jet \(p_{\text {T}}\) distributions for (a) the leading b-jet, (b) the sub-leading b-jet and (c) the leading additional jet predicted by various MC generators to the unfolded data. The shaded regions show the statistical uncertainty (dark grey) and total uncertainty (light grey)

9 Gap fraction measurements

The jet activity is also studied by measuring the gap fraction \(f_{{\mathrm {gap}}}\), defined as the fraction of events with no jet activity in addition to the two b-tagged jets above a given \(p_{\text {T}}\) threshold in a “veto region” defined as a rapidity region in the detector. The \(p_{\text {T}}\) threshold is defined in two ways, which gives two corresponding definitions of the gap fraction. First, the gap fraction is measured as the fraction of events without any additional jet in that rapidity region above a given \(p_{\text {T}}\) threshold \(Q_0\):

$$\begin{aligned} f_{{\mathrm {gap}}}(Q_0)=\frac{n(Q_0)}{N_{t\overline{t}}}, \end{aligned}$$
(7)

where \(N_{t\overline{t}}\) is the total number of selected events, \(Q_0\) is the \(p_{\text {T}}\) threshold for any additional jet in the veto region of these events, and \(n(Q_0)\) represents the subset of events with no additional jet with \(p_{\text {T}}>Q_0\).

The second type of gap fraction is defined as the fraction of events in which the scalar \(p_{\text {T}}\) sum of all additional jets in the given veto region does not exceed a given threshold \(Q_{\mathrm {sum}}\):

$$\begin{aligned} f_{{\mathrm {gap}}}(Q_{{\mathrm {sum}}})=\frac{n(Q_{{\mathrm {sum}}})}{N_{t\overline{t}}}. \end{aligned}$$
(8)

Here, \(n(Q_{{\mathrm {sum}}})\) represents the subset of events in which the scalar \(p_{\text {T}}\) sum of all additional jets in the veto region is less than \(Q_{{\mathrm {sum}}}\). The gap fraction defined using \(Q_0\) is mainly sensitive to the leading \(p_{\text {T}}\) emission accompanying the \(t\bar{t}\) system, whereas the gap fraction defined using \(Q_{{\mathrm {sum}}}\) is sensitive to all hard emissions accompanying the \(t\bar{t}\) system. In the following descriptions of the gap fraction measurement process, the same procedure is followed for \(Q_{{\mathrm {sum}}}\) as for \(Q_0\).
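The two definitions in Eqs. (7) and (8) can be made concrete with a short Python sketch. Each selected event is represented here by a list of `(pt, y)` tuples for its additional jets, which is a hypothetical layout chosen for this example; the veto region is passed in as a range in \(|y|\). For \(Q_{{\mathrm {sum}}}\) the sketch follows the wording "does not exceed", i.e. events with a scalar sum exactly equal to the threshold are counted in the gap.

```python
def _pts_in_veto_region(jets, y_min, y_max):
    """Transverse momenta of the additional jets with y_min <= |y| < y_max."""
    return [pt for pt, y in jets if y_min <= abs(y) < y_max]


def gap_fraction_q0(events, q0, y_min, y_max):
    """Eq. (7): fraction of events with no additional jet above q0 in the region."""
    n_gap = sum(
        1 for jets in events
        if not any(pt > q0 for pt in _pts_in_veto_region(jets, y_min, y_max))
    )
    return n_gap / len(events)


def gap_fraction_qsum(events, q_sum, y_min, y_max):
    """Eq. (8): fraction of events whose scalar pT sum in the region is <= q_sum."""
    n_gap = sum(
        1 for jets in events
        if sum(_pts_in_veto_region(jets, y_min, y_max)) <= q_sum
    )
    return n_gap / len(events)
```

For equal thresholds the \(Q_{{\mathrm {sum}}}\) gap fraction can never exceed the \(Q_0\) gap fraction, since the scalar sum is at least as large as the leading jet \(p_{\text {T}}\).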

Both types of gap fraction are measured in four veto regions: \(|y|<0.8\), \(0.8<|y|<1.5\), \(1.5<|y|<2.1\) and the full central region \(|y|<2.1\), where y is calculated as

$$\begin{aligned} y=\frac{1}{2}\ln \left( \frac{E+p_z}{E-p_z}\right) . \end{aligned}$$
(9)
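Equation (9) and the assignment of a jet to the veto regions can be written in a few lines of Python. The function names and the treatment of the region boundaries (lower edge inclusive, upper edge exclusive) are assumptions of this sketch.

```python
import math

# Veto regions in |y| as (label, lower edge, upper edge).
VETO_REGIONS = [
    ("|y|<0.8", 0.0, 0.8),
    ("0.8<|y|<1.5", 0.8, 1.5),
    ("1.5<|y|<2.1", 1.5, 2.1),
    ("|y|<2.1", 0.0, 2.1),
]


def rapidity(energy, pz):
    """Eq. (9): y = 0.5 * ln((E + pz) / (E - pz)), defined for |pz| < E."""
    return 0.5 * math.log((energy + pz) / (energy - pz))


def regions_containing(y):
    """Labels of all veto regions that contain a jet of rapidity y."""
    return [label for label, lo, hi in VETO_REGIONS if lo <= abs(y) < hi]
```

A jet in one of the three exclusive regions always also lies in the full central region \(|y|<2.1\), so `regions_containing` returns either two labels or none.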

Furthermore, the gap fraction is measured considering jet activity in the full central region (\(|y|<2.1\)) for four different subsamples specified by the mass of the \(e\mu +2~b\)-tagged jets system, \(m_{e\mu bb}\). Both the rapidity region and the \(m_{e\mu bb}\) subsamples are chosen to correspond to those used in earlier publications at lower energies [3, 4].

The gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) (and analogously for \(f^{{\text {part}}}_{{\text {gap}}}(Q_{{\text {sum}}})\) in the following) is measured as defined in Eq. (10) by counting the number of selected data events \(N_{\text {data}}\) and the number \(n_{{\mathrm {data}}}(Q_0)\) of those that have no additional jets with \(p_{\text {T}}>Q_0\) within the veto region, where the sets of \(Q_0\) and \(Q_{{\text {sum}}}\) threshold values correspond approximately to one standard deviation of the jet energy resolution and are the same as in the earlier publications [3, 4]. The numbers of background events, \(N_{{\mathrm {bg}}}\) and \(n_{{\mathrm {bg}}}(Q_0)\), are then subtracted from these counts:

$$\begin{aligned} f^{{\mathrm {data}}}(Q_0)=\frac{n_{{\mathrm {data}}}(Q_0)- n_{{\mathrm {bg}}}(Q_0)}{N_{{\mathrm {data}}} -N_{{\mathrm {bg}}}} \end{aligned}$$
(10)

and similarly for \(f^{{\text {data}}}(Q_{{\text {sum}}})\). The measured gap fraction \(f^{{\text {data}}}(Q_0)\) is then corrected for detector effects to particle level by multiplying it by a correction factor \(C(Q_0)\) to obtain \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\). The correction factor \(C(Q_0)\) is determined from the baseline Powheg+Pythia6 \(t\bar{t}\) sample using the simulated gap fraction values at reconstruction level \(f^{{\text {reco}}}(Q_0)\), and at particle level \(f^{{\text {part}}}(Q_0)\):

$$\begin{aligned} C(Q_0)= \frac{f^{{\text {part}}}(Q_0)}{f^{{\text {reco}}}(Q_0)}. \end{aligned}$$
(11)

The values of the correction factors \(C(Q_0)\) and \(C(Q_{{\text {sum}}})\) deviate by less than 4% from unity at low \(Q_0\) and \(Q_{{\text {sum}}}\) values in the rapidity regions (less than 8% in the \(m_{e\mu bb}\) subsamples), and approach unity at higher threshold values. The small corrections reflect the high selection efficiency and high purity of the event samples. At each threshold \(Q_0\), the baseline simulation predicts that around 80% of the selected reconstructed events that do not have a jet with \(p_{\text {T}}>Q_0\) also have no particle-level jet with \(p_{\text {T}}>Q_0\). Therefore, a simple bin-by-bin correction method is considered adequate, rather than a full unfolding as used in Sect. 8.
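The bin-by-bin correction of Eqs. (10) and (11) amounts to simple arithmetic at each threshold, as the following Python sketch shows. The function names are hypothetical and any numbers used with them are for illustration only, not measured values.

```python
def measured_gap_fraction(n_data_gap, n_bg_gap, n_data, n_bg):
    """Eq. (10): background-subtracted gap fraction at reconstruction level."""
    return (n_data_gap - n_bg_gap) / (n_data - n_bg)


def correction_factor(f_part_mc, f_reco_mc):
    """Eq. (11): ratio of simulated particle-level to reconstruction-level gap fraction."""
    return f_part_mc / f_reco_mc


def particle_level_gap_fraction(n_data_gap, n_bg_gap, n_data, n_bg,
                                f_part_mc, f_reco_mc):
    """Gap fraction corrected to particle level: C(Q0) * f_data(Q0)."""
    f_data = measured_gap_fraction(n_data_gap, n_bg_gap, n_data, n_bg)
    return correction_factor(f_part_mc, f_reco_mc) * f_data
```

The same functions apply unchanged to \(Q_{{\text {sum}}}\), with the event counts and simulated gap fractions evaluated at the corresponding threshold.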

Systematic uncertainties arise in this procedure from the uncertainties in \(C(Q_0)\) and the subtracted backgrounds. The uncertainties, as described in Sect. 6, are used to recalculate \(f^{{\text {data}}}(Q_0)\) and \(C(Q_0)\) to obtain the gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\). The corresponding quantities for \(Q_{{\text {sum}}}\) are calculated accordingly. Figure 8 shows and Table 11 lists the resulting relative uncertainty in \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\), \(\Delta f/f\), for the different sources of uncertainty in the full central rapidity region.

Fig. 8

Envelope of fractional uncertainties \(\Delta f/f\) in the gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\), centred around unity, for (a) \(|y|<0.8\) and (b) \(|y|<2.1\). The statistical uncertainty is shown by the shaded area, and the total uncertainty by the solid black line. The systematic uncertainty is shown broken down into several groups, each of which includes various individual components

Table 11 Sources of uncertainty in the gap fraction measurement as a function of \(Q_0\) for the full central region \(|y|<2.1\), for a selection of \(Q_0\) thresholds. “Signal modelling” sources of systematic uncertainty include the hadronisation, parton shower and NLO modelling uncertainties. “Other” sources of systematic uncertainty refer to lepton and jet selection efficiencies, background (including pile-up jets) estimations, and the PDF

9.1 Gap fraction results in rapidity regions

Figure 9 shows the measured gap fractions \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) in data, corrected to the particle level. The gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) is compared to various MC generator predictions in Fig. 10, and Fig. 11 shows the measured gap fractions \(f^{{\text {part}}}_{{\text {gap}}}(Q_{{\text {sum}}})\), corrected to the particle level, compared to various MC generators. The predictions of Sherpa and MG5_aMC@NLO+Herwig++ agree well with each other and are within the uncertainties of the data, while Powheg+Pythia8 has slightly higher gap fractions, i.e., predicts too little radiation. Similarly to the jet multiplicity measurements, Powheg+Pythia6 (RadHi) agrees well with data, while the nominal and the Powheg+Pythia6 (RadLo) samples give similar but too high predictions compared to data. The results in Fig. 9d can directly be compared with the jet multiplicity results in Figs. 4 and 5 in the one additional jet bin. Here the Powheg+Pythia8 predictions are below data for all distributions, which demonstrates the consistency of the measurements. The \(p_{\text {T}}\) distribution of the first additional jet shown in Fig. 6 contains only events with at least one additional jet and differs in this respect from the gap fraction distribution, which includes events with no additional jet. However, the results are also consistent, as Powheg+Pythia8 predicts a slightly softer \(p_{\text {T}}\) spectrum for the additional jet, which leads to the observed effect that fewer jets above the 25 \(\text {GeV}\) threshold are observed.

The matrix of statistical and systematic correlations is shown in Fig. 12 for the gap fraction measurement at different values of \(Q_0\) for the full central \(|y|<2.1\) rapidity region. Nearby points in \(Q_0\) are highly correlated, while well-separated \(Q_0\) points are less correlated. The full covariance matrix, including correlations, is used to calculate a \(\chi ^2\) value for the compatibility of each of the NLO generator predictions with the data in each veto region. The results are given in Tables 12 and 13. An analysis of the p-values confirms that Powheg+Herwig++, MG5_aMC@NLO+Herwig7, MG5_aMC@NLO+Pythia8 and Powheg+Pythia6 (RadLo) are not consistent with the data. Powheg+Pythia6 (RadHi) has the best p-values among the QCD shower variations of Powheg+Pythia6.

Fig. 9

The measured gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) as a function of \(Q_0\) in different rapidity veto regions, (a) \(|y|<0.8\), (b) \(0.8<|y|<1.5\), (c) \(1.5<|y|<2.1\) and (d) \(|y|<2.1\). The data are shown by the points with error bars indicating the total uncertainty, and compared to the predictions from various \(t\bar{t}\) simulation samples shown as smooth curves. The lower plots show the ratio of predictions to data, with the data uncertainty indicated by the shaded band, and the \(Q_0\) thresholds corresponding to the left edges of the histogram bins, except for the first bin

Fig. 10

Ratios of prediction to data of the measured gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) as a function of \(Q_0\) in different rapidity veto regions, (a) \(|y|<0.8\) and (b) \(|y|<2.1\). The predictions from various \(t\bar{t}\) simulation samples are shown as ratios to data, with the data uncertainty indicated by the shaded band, and the \(Q_0\) thresholds corresponding to the left edges of the histogram bins, except for the first bin

Fig. 11

The measured gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_{{\text {sum}}})\) as a function of \(Q_{{\text {sum}}}\) in different rapidity veto regions, (a) \(|y|<0.8\) and (b) \(|y|<2.1\), followed by ratios of prediction to data of the measured gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_{{\text {sum}}})\) as a function of \(Q_{{\text {sum}}}\) in the same two rapidity regions. The data in (a) and (b) are shown by the points with error bars indicating the total uncertainty, and compared to the predictions from various \(t\bar{t}\) simulation samples shown as smooth curves. The lower plots in (a) and (b) and the set of ratio plots in (c) and (d) show the ratio of predictions to data, with the data uncertainty indicated by the shaded band, and the \(Q_{{\text {sum}}}\) thresholds corresponding to the left edges of the histogram bins, except for the first bin

Fig. 12

The correlation matrix (including statistical and systematic correlations) for the gap fraction measurement at different values of \(Q_0\) for the full central rapidity region \(|y|<2.1\)

9.2 Gap fraction results in \(m_{e\mu bb}\) subsamples

The gap fraction is also measured over the full central veto region \(|y|<2.1\) after dividing the data sample into four regions of \(m_{e\mu bb}\). The distribution of reconstructed \(m_{e\mu bb}\) in the selected \(e\mu +2~b\)-tagged jets events is reasonably well-reproduced by the nominal \(t\bar{t}\) simulation sample, as shown in Fig. 13. The distribution is divided into four regions at both reconstruction and particle level: \(m_{e\mu bb}<300\) \(\text {GeV}\), \(300~\text {GeV}< m_{e\mu bb}<425\) \(\text {GeV}\), \(425~\text {GeV}<m_{e\mu bb}<600\) \(\text {GeV}\) and \(m_{e\mu bb}>600\) \(\text {GeV}\). These boundaries are chosen to minimise migration between the regions. In the baseline simulation, around 85% of the reconstructed events in each \(m_{e\mu bb}\) region belong to the corresponding region at particle level. The corresponding correction factors \(C_{m}(Q_0)\), which translate the measured gap fraction in the reconstruction-level \(m_{e\mu bb}\) region to the corresponding particle-level gap fractions \(f_{m}(Q_0)\), are of similar size to \(C(Q_0)\), with the exception of the highest \(m_{e\mu bb}\) region, in which they reach about 1.1 at low \(Q_0\).
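The division into \(m_{e\mu bb}\) regions can be sketched in Python as follows. The four-vector layout `(E, px, py, pz)` in \(\text {GeV}\), the function names and the assignment of events lying exactly on a boundary to the upper region are assumptions of this example; the text above only specifies strict inequalities.

```python
import bisect
import math

# Region boundaries in GeV, as quoted in the text.
M_EMUBB_EDGES = [300.0, 425.0, 600.0]


def invariant_mass(four_vectors):
    """Invariant mass of the summed four-vectors, each given as (E, px, py, pz)."""
    e = sum(v[0] for v in four_vectors)
    px = sum(v[1] for v in four_vectors)
    py = sum(v[2] for v in four_vectors)
    pz = sum(v[3] for v in four_vectors)
    m2 = e * e - px * px - py * py - pz * pz
    return math.sqrt(max(m2, 0.0))  # guard against tiny negative rounding


def mass_region(m_emubb):
    """Index 0..3 of the m_emubb region containing the given mass."""
    return bisect.bisect_right(M_EMUBB_EDGES, m_emubb)
```

Applying `mass_region` to both the reconstructed and the particle-level mass of simulated events gives the fraction of events that stay in the same region, which is the quantity quoted as around 85% above.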

Table 12 Values of \(\chi ^2\) for the comparison of the measured gap fraction distributions with the predictions from various \(t\bar{t}\) generator configurations, for the four rapidity regions as a function of \(Q_0\). The \(\chi ^2\) and p-values correspond to 18 degrees of freedom
Table 13 Values of \(\chi ^2\) for the comparison of the measured gap fraction distributions with the predictions from various \(t\bar{t}\) generator configurations, for the four rapidity regions as a function of \(Q_{{\text {sum}}}\). The \(\chi ^2\) and p-values correspond to 22 degrees of freedom
Table 14 Values of \(\chi ^2\) for the comparison of the measured gap fraction distributions with the predictions from various \(t\bar{t}\) generator configurations, for the four invariant mass \(m_{e\mu bb}\) regions as a function of \(Q_0\). The \(\chi ^2\) and p-values correspond to 18 degrees of freedom
Table 15 Values of \(\chi ^2\) for the comparison of the measured gap fraction distributions with the predictions from various \(t\bar{t}\) generator configurations, for the four invariant mass \(m_{e\mu bb}\) regions as a function of \(Q_{{\text {sum}}}\). The \(\chi ^2\) and p-values correspond to 22 degrees of freedom
Fig. 13

Distribution of the reconstructed invariant mass of the \(e\mu +2~b\text {-}\mathrm{{jets}}\) system \(m_{e\mu bb}\) in data, compared to simulation. The shaded band represents the statistical uncertainty in data. The lower plot shows the ratio of the distribution of invariant mass in simulation compared to data

Figures 14 and 15 show the measured gap fractions as a function of \(Q_0\) in the four \(m_{e\mu bb}\) regions in data, compared to the same set of predictions as shown in Figs. 9 and 10. Tables 14 and 15 give the \(\chi ^2\) and p-values for the comparison of the measured gap fractions with the predictions from the different generators, taking into account bin-by-bin correlations. Figure 16 gives an alternative presentation of the gap fraction \(f_{m}(Q_0)\) as a function of \(m_{e\mu bb}\) for four different \(Q_0\) values. The level of agreement between the data and the various predictions is consistent with the results for the gap fraction in rapidity bins. Only in the lowest mass region does the Powheg+Pythia8 prediction agree very well with the data, while the MG5_aMC@NLO+Herwig++ and Sherpa predictions lie at the lower edge of the uncertainties.
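The \(\chi ^2\) and p-values in Tables 14 and 15 account for bin-by-bin correlations. A minimal sketch of such a calculation follows, assuming the standard form \(\chi ^2 = \Delta ^{T} V^{-1} \Delta \), where \(\Delta \) is the vector of differences between data and prediction and \(V\) is the total covariance matrix. This form, the function names and the toy numbers are assumptions for illustration and are not taken from the analysis code. The p-value uses the closed-form upper-tail probability of the \(\chi ^2\) distribution, which holds for an even number of degrees of freedom and therefore covers the 18 and 22 degrees of freedom quoted in the tables.

```python
import math


def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-300:
            raise ValueError("covariance matrix is singular")
        a[col], a[pivot] = a[pivot], a[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n + 1):
                a[row][k] -= factor * a[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(a[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (a[row][n] - tail) / a[row][row]
    return x


def chi2_with_correlations(data, prediction, covariance):
    """Return delta^T V^{-1} delta for correlated bins."""
    delta = [d - p for d, p in zip(data, prediction)]
    weighted = solve(covariance, delta)  # V^{-1} delta without explicit inverse
    return sum(d * w for d, w in zip(delta, weighted))


def p_value_even_ndf(chi2, ndf):
    """Upper-tail chi2 probability; closed form valid for even ndf only."""
    if ndf <= 0 or ndf % 2:
        raise ValueError("closed form requires a positive even ndf")
    half = chi2 / 2.0
    term, total = 1.0, 1.0
    for i in range(1, ndf // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Toy example with two positively correlated bins (hypothetical numbers).
data = [0.62, 0.75]
prediction = [0.60, 0.74]
covariance = [[4.0e-4, 3.0e-4],
              [3.0e-4, 4.0e-4]]
chi2 = chi2_with_correlations(data, prediction, covariance)
print("chi2 =", chi2, " p-value (2 dof) =", p_value_even_ndf(chi2, 2))
```

Because the gap fraction at each \(Q_0\) is built from overlapping sets of events, neighbouring bins are strongly correlated, so replacing \(V\) by its diagonal would generally give a different \(\chi ^2\).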

Fig. 14

The measured gap fraction \(f_{m}(Q_0)\) as a function of \(Q_0\) in the full central veto region \(|y|<2.1\) for the invariant mass regions a \(m_{e\mu bb}<300\) \(\text {GeV}\), b \(300~\text {GeV}<m_{e\mu bb}<425\) \(\text {GeV}\), c \(425~\text {GeV}<m_{e\mu bb}<600\) \(\text {GeV}\) and d \(m_{e\mu bb}>600\) \(\text {GeV}\). The data are shown by the points with error bars indicating the total uncertainty, and compared to the predictions from various \(t\bar{t}\) simulation samples shown as smooth curves. The lower plots show the ratio of predictions to data, with the data uncertainty indicated by the shaded band, and the \(Q_0\) thresholds corresponding to the left edges of the histogram bins, except for the first bin

Fig. 15

Ratios of prediction to data of the measured gap fraction \(f^{{\text {part}}}_{{\text {gap}}}(Q_0)\) as a function of \(Q_0\) in the full central veto region \(|y|<2.1\) for the invariant mass regions a \(m_{e\mu bb}<300\) \(\text {GeV}\) and b \(425~\text {GeV}<m_{e\mu bb}<600\) \(\text {GeV}\). The predictions from various \(t\bar{t}\) simulation samples are shown as ratios to data, with the data uncertainty indicated by the shaded band, and the \(Q_0\) thresholds corresponding to the left edges of the histogram bins, except for the first bin

Fig. 16

The gap fraction measurement \(f_{m}(Q_0)\) as a function of the invariant mass \(m_{e\mu bb}\), for several different values of \(Q_0\). The data are shown as points with error bars indicating the statistical uncertainties and shaded boxes the total uncertainties. The data are compared to the predictions from various \(t\bar{t}\) simulation samples

10 Conclusions

Studies of additional jet activity, using differential cross-section and gap fraction measurements, are presented for dileptonic \(t\bar{t}\) events identified by the presence of an opposite-sign \(e\mu \) pair and at least two b-tagged jets. These measurements are performed using 3.2 \(\mathrm{fb}^{-1}\) of \(\sqrt{s}=13\) \(\text {TeV}\) pp collision data collected by the ATLAS detector in 2015 at the LHC. The measurements are corrected back to the particle level using full unfolding or correction factors, for well-defined fiducial regions and various \(p_{\text {T}}\) thresholds for the additional jets.

The different measurements are compared to various Monte Carlo predictions and give consistent results. Even though many predictions are within the uncertainty band of the measurements, a proper evaluation of the compatibility of the models with the data, taking into account the bin-by-bin correlations within each measurement, reveals that Powheg+Pythia6 (RadHi), MG5_aMC@NLO+Herwig++ and Sherpa describe the data best for all observables. Powheg+Pythia6 (RadLo), MG5_aMC@NLO+Pythia8 and all predictions involving Herwig7 do not describe the data well.

The studied combinations of the matrix element generators MG5_aMC@NLO and Powheg with the shower generators Herwig++, Pythia6 and Pythia8 show no systematic trend indicating that one of the matrix element generators describes the data better for all parton shower generators. There is also no indication that one of the parton shower generators describes the data systematically better for both matrix element generators. This observation suggests that the matching between the parton shower and matrix element calculation plays an important role, and motivates further study in this area. The predictions of Sherpa, which use NLO matrix elements consistently matched with up to four additional jets at LO, show similarly good agreement with data as the best of the MG5_aMC@NLO and Powheg predictions.