1 Introduction

The top quark plays a key role in precision measurements of the standard model (SM) because of its large Yukawa coupling to the Higgs boson. Top quark loops provide the dominant contribution to radiative corrections to the Higgs boson mass, and accurate measurements of both the top quark mass (\(m_{{\text {t}}} \)) and the Higgs boson mass allow consistency tests of the SM [1]. In addition, the decision whether the SM vacuum is stable or meta-stable needs a precise measurement of \(m_{{\text {t}}} \) as the Higgs boson quartic coupling at the Planck scale depends heavily on \(m_{{\text {t}}} \) [2].

The mass of the top quark has been measured with increasing precision using the invariant mass of different combinations of its decay products [3]. The measurements by the Tevatron collaborations lead to a combined value of \(m_{{\text {t}}} =174.30 \pm 0.65\,\text {GeV} \) [4], while the ATLAS and CMS Collaborations measured \(m_{{\text {t}}} =172.84 \pm 0.70\,\text {GeV} \) [5] and \(m_{{\text {t}}} =172.44 \pm 0.49\,\text {GeV} \) [6], respectively, from the combination of their most precise results. In parallel, the theoretical interpretation of the measurements and the uncertainties in the measured top quark mass derived from the modeling of the selected variables has significantly improved  [7,8,9,10,11,12,13].

Since the publication of the CMS measurements [6] for proton-proton (\(\mathrm {p}\)\(\mathrm {p}\)) collisions at center-of-mass energies of 7 and 8\(\,\text {TeV}\) (Run 1), new theoretical models have become available and a data set has been collected at \(\sqrt{s}=13\)\(\,\text {TeV}\) that is larger than the Run 1 data set. At this higher center-of-mass energy, new data and simulated samples are available for this analysis. The method closely follows the strategy of the most precise CMS Run 1 measurement [6]. While the selected final state, the kinematic reconstruction, and mass extraction technique have not changed, the new simulations describe the data better and allow a more refined estimation of the modeling uncertainties. In contrast to the Run 1 analysis, the renormalization and factorization scales in the matrix-element (ME) calculation and the scales in the initial- and final-state parton showers (PS) are now varied separately for the evaluation of systematic effects. In addition, we evaluate the impact of different models of color reconnection that were not available for the Run 1 measurements.

The pair-produced top quarks (\({{\text {t}}\overline{{\text {t}}}}\)) are assumed to decay weakly into \(\mathrm {W}\) bosons and bottom (\({\text {b}}\)) quarks via \({\text {t}}\rightarrow {\text {b}}\mathrm {W}\), with one \(\mathrm {W}\) boson decaying into a muon or electron and its neutrino, and the other into a quark–antiquark (\({\text {q}} \overline{{\text {q}}} ^\prime \)) pair. Hence, the minimal final state consists of a muon or electron, at least four jets, and one undetected neutrino. This includes events where a muon or electron from a \(\tau \) lepton decay passes the selection criteria. The analysis employs a kinematic fit of the decay products to a \({{\text {t}}\overline{{\text {t}}}}\) hypothesis and two-dimensional likelihood functions for each event to estimate simultaneously the top quark mass and a scale factor (JSF) to be applied to the momenta of all jets. The invariant mass of the two jets associated with the \(\mathrm {W}\rightarrow {\text {q}} \overline{{\text {q}}} ^\prime \) decay serves as an observable in the likelihood functions to estimate the JSF directly, exploiting the precise knowledge of the \(\mathrm {W}\) boson mass from previous measurements [3]. The analysis is performed on the data sample collected in 2016 and includes studies of the dependence of the measured mass value on the kinematic properties of the events.

2 The CMS detector and event reconstruction

The central feature of the CMS apparatus is a superconducting solenoid of 6\(\text { m}\) internal diameter, providing a magnetic field of 3.8\(\text { T}\). Within the solenoid volume are a silicon pixel and strip tracker, a lead tungstate crystal electromagnetic calorimeter (ECAL), and a brass and scintillator hadron calorimeter (HCAL), each composed of a barrel and two endcap sections. Forward calorimeters extend the pseudorapidity (\(\eta \)) coverage provided by the barrel and endcap detectors. Muons are detected in gas-ionization chambers embedded in the steel flux-return yoke outside the solenoid. A more detailed description of the CMS detector, together with a definition of the coordinate system used and the relevant kinematic variables, can be found in Ref. [14].

The particle-flow event algorithm [15] reconstructs and identifies each individual particle with an optimized combination of information from the various elements of the CMS detector. The energy of photons is directly obtained from the ECAL measurement, corrected for zero-suppression effects. The energy of electrons is determined from a combination of the electron momentum at the primary interaction vertex as determined by the tracker, the energy of the corresponding ECAL cluster, and the energy sum of all bremsstrahlung photons spatially compatible with originating from the electron track. The energy of muons is obtained from the curvature of the corresponding track. The energy of charged hadrons is determined from a combination of their momentum measured in the tracker and the matching ECAL and HCAL energy deposits, corrected for zero-suppression effects and for the response function of the calorimeters to hadronic showers. Finally, the energy of neutral hadrons is obtained from the corresponding corrected ECAL and HCAL energy.

The missing transverse momentum \({\vec {p}}_{\mathrm {T}}^{\text {miss}}\) is calculated as the negative of the vectorial sum of transverse momenta (\(p_{\mathrm {T}}\)) of all particle-flow objects in the event. Jets are clustered from particle-flow objects using the anti-\(k_{\mathrm {T}}\) algorithm with a distance parameter of 0.4 [16,17,18]. The jet momentum is determined as the vectorial sum of all particle momenta in the jet, and is found from simulation to be within 5 to 10% of the true momentum over the whole \(p_{\mathrm {T}}\) spectrum and detector acceptance. An offset correction is applied to jet energies to take into account the contribution from additional \(\mathrm {p}\)\(\mathrm {p}\) interactions within the same or nearby bunch crossings (pileup) [19]. All jets are corrected by jet energy corrections (JECs) based on simulations. Residual JECs which are derived from the energy balance in \(\gamma \)/\({\text {Z}}\) boson + jet, dijet, and multijet events [20] are applied to the jets in data. The JECs are also propagated to improve the measurement of \({\vec {p}}_{\mathrm {T}}^{\text {miss}}\). The reconstructed vertex with the largest value of summed physics-object \(p_{\mathrm {T}} ^2\) is taken to be the primary \(\mathrm {p}\mathrm {p}\) interaction vertex. The physics objects chosen are those that have been defined using information from the tracking detector, including jets, \({\vec {p}}_{\mathrm {T}}^{\text {miss}}\), and charged leptons. Additional selection criteria are applied to each event to remove spurious jet-like features originating from isolated noise patterns in certain HCAL regions [21].

3 Data samples, event generation, and selection

The data sample collected with the CMS detector during 2016 at a center-of-mass energy \(\sqrt{s} = 13\)\(\,\text {TeV}\) has been analyzed. This corresponds to an integrated luminosity of \(35.9 \pm 0.9\)\(\,\text {fb}^{-1}\)  [22]. Events are required to pass a single-muon trigger with a minimum threshold on the \(p_{\mathrm {T}}\) of an isolated muon of 24\(\,\text {GeV}\) or a single-electron trigger with a \(p_{\mathrm {T}}\) threshold for isolated electrons of 32\(\,\text {GeV}\).

Fig. 1
figure 1

Invariant mass \(m_\mathrm {W}^\text {reco}\) of the two untagged jets (left) and invariant mass \(m_{{\text {t}}} ^\text {reco}\) of the two untagged jets and one of the \({\text {b}}\)-tagged jets (right) after the \({\text {b}}\) tagging requirement. For the simulated \({{\text {t}}\overline{{\text {t}}}}\) events, the jet-parton assignments are classified as correct, wrong, and unmatched permutations as described in the text. The vertical bars show the statistical uncertainty on the data and the hatched bands show the systematic uncertainties considered in Sect. 5. The lower portion of each panel shows the ratio of the yields between data and the simulation. The simulations are normalized to the integrated luminosity

Simulated \({{\text {t}}\overline{{\text {t}}}}\) signal events are generated at next-to-leading order (NLO) with powheg v2 [23,24,25,26] and the pythia  8.219 PS generator [27] using the CUETP8M2T4 tune [28, 29] for seven different top quark mass values of 166.5, 169.5, 171.5, 172.5, 173.5, 175.5, and 178.5\(\,\text {GeV}\). The single top quark background is also simulated using powheg  v2 [30, 31] interfaced with pythia  8. The background stemming from single vector boson production is generated at leading order (LO) or NLO with MadGraph 5_amc@nlo v2.2.2 [32] matched to the pythia  8 PS using the MLM prescription [33] for \(\mathrm {W}\)+jets and the FxFx prescription [34] for \({\text {Z}}\) +jets, respectively. Finally, diboson (\(\mathrm {W}\)\(\mathrm {W}\), \(\mathrm {W}\)\({\text {Z}}\), and \({\text {Z}}\)\({\text {Z}}\)) and multijet events from quantum chromodynamics (QCD) processes are generated with pythia  8 for ME generation, PS simulation, and hadronization. These background samples use the pythia  8 tune CUETP8M1. The parton distribution function (PDF) set NNPDF3.0 NLO derived with the strong coupling strength \(\alpha _S =0.118\) [35] and its corresponding LO version are used as the default parametrization of the PDFs in all simulations, respectively. The samples are normalized to the theoretical predictions described in Refs. [27, 36,37,38,39]. All events are further processed by a full simulation of the CMS detector based on Geant4  [40]. The simulation includes effects of pileup with the same multiplicity distribution as in data. The response and the resolution of simulated jets is corrected to match the data [20].

We select events that have exactly one isolated muon with \(p_{\mathrm {T}} >26\,\text {GeV} \) and \(|\eta | <2.4\) or exactly one isolated electron with \(p_{\mathrm {T}} >34\,\text {GeV} \) and \(|\eta |<2.1\) [41, 42]. The isolation of a lepton candidate from nearby jet activity is evaluated from the sum of the pileup-corrected \(p_{\mathrm {T}}\) of neutral hadrons, charged hadrons, and photon PF candidates within a cone of \(\varDelta R = \sqrt{\smash [b]{(\varDelta \eta )^2+(\varDelta \phi )^2}} = 0.4\) for muons and \(\varDelta R = 0.3\) for electrons. Here \(\varDelta \eta \) and \(\varDelta \phi \) are the differences in the pseudorapidity and azimuthal angles (in radians) between the particles and the lepton candidate. The sum of the \(p_{\mathrm {T}}\) of the particles is required to be less than 15% of the muon \(p_{\mathrm {T}}\) and 10% of the electron \(p_{\mathrm {T}}\), respectively.

In addition, at least four jets with \(p_{\mathrm {T}} >30\,\text {GeV} \) and \(|\eta | <2.4\) are required. Only the four leading among the jets passing these \(p_{\mathrm {T}} \)- and \(\eta \)-criteria are used in the reconstruction of the \({{\text {t}}\overline{{\text {t}}}}\) system. Jets originating from \({\text {b}}\) quarks are identified (tagged) using an algorithm that combines reconstructed secondary vertices and track-based lifetime information. This has an efficiency of approximately 70% and a mistagging probability for light-quark and gluon jets of 1% [43]. We require exactly two \({\text {b}}\)-tagged jets among the four leading ones and select 669 109 \({{\text {t}}\overline{{\text {t}}}}\) candidate events in data. Figure 1 shows the distributions of the reconstructed mass \(m_\mathrm {W}^\text {reco}\) of the \(\mathrm {W}\) boson decaying to a \({\text {q}} \overline{{\text {q}}} ^\prime \) pair and the masses \(m_{{\text {t}}} ^\text {reco}\) computed from the two untagged jets and each of the two \({\text {b}}\)-tagged jets at this selection step. For simulated \({{\text {t}}\overline{{\text {t}}}}\) events, the parton-jet assignments can be classified as correct permutations (cp), wrong permutations (wp), and unmatched permutations (un), where, in the latter, at least one quark from the \({{\text {t}}\overline{{\text {t}}}}\) decay is not unambiguously matched within a distance of \(\varDelta R <0.4\) to any of the four selected jets.

To check the compatibility of an event with the \({{\text {t}}\overline{{\text {t}}}}\) hypothesis, and to improve the resolution of the reconstructed quantities, a kinematic fit [44] is performed. For each event, the inputs to the algorithm are the four-momenta of the lepton and of the four leading jets, \({\vec {p}}_{\mathrm {T}}^{\text {miss}}\), and the resolutions of these variables. The fit constrains these quantities to the hypothesis that two heavy particles of equal mass are produced, each one decaying to a bottom quark and a \(\mathrm {W}\) boson, with the invariant mass of the latter constrained to 80.4\(\,\text {GeV}\). The kinematic fit then minimizes \(\chi ^{2} \equiv \left( \mathbf {x}-\mathbf {x}^{m}\right) ^\mathrm {T}G\left( \mathbf {x}-\mathbf {x}^{m}\right) \) where \(\mathbf {x}^{m}\) and \(\mathbf {x}\) are the vectors of the measured and fitted momenta, respectively, and G is the inverse covariance matrix which is constructed from the uncertainties in the measured momenta. The two \({\text {b}}\)-tagged jets are candidates for the \({\text {b}}\) quarks in the \({{\text {t}}\overline{{\text {t}}}}\) hypothesis, while the two untagged jets serve as candidates for the light quarks from the hadronically decaying \(\mathrm {W}\) boson. This leads to two possible parton-jet assignments with two solutions for the longitudinal component of the neutrino momentum each, resulting in four different permutations per event.

To increase the fraction of correct permutations, we require the goodness-of-fit (gof) probability for the kinematic fit with two degrees of freedom \(P_\text {gof} = \exp \left( -\chi ^{2}/2\right) \) to be at least 0.2. This requirement selects 161 496 events in data, while the non-\({{\text {t}}\overline{{\text {t}}}}\) background in the simulated data is reduced from 7.6 to 4.3%. The remaining background consists mostly of single top quark events (2.5%). Any of the four permutations in an event that passes the selection criteria is weighted by its \(P_\text {gof}\) value and is used in the measurement. These steps improve the fraction of correct permutations from 14.9 to 48.0%. Figure 2 shows the final distributions after the \(P_\text {gof}\) selection of the reconstructed mass \(m_\mathrm {W}^\text {reco}\) of the \(\mathrm {W}\) boson decaying to a \({\text {q}} \overline{{\text {q}}} ^\prime \) pair and the invariant mass of the top quark candidates from the kinematic fit \(m_{{\text {t}}} ^\text {fit}\) for all selected permutations. These two observables are used in the mass extraction.

Fig. 2
figure 2

Reconstructed \(\mathrm {W}\) boson masses \(m_\mathrm {W}^\text {reco}\) (left) and fitted top quark masses \(m_{{\text {t}}} ^\text {fit}\) (right) after the goodness-of-fit selection and the weighting by \(P_\text {gof}\). Symbols and patterns are the same as in Fig. 1. The simulations are normalized to the integrated luminosity

4 Ideogram method

An ideogram method [45] is employed as described in Ref. [46]. The details of the procedure outlined below are identical with the approach taken in the Run 1 CMS measurement [6]. The observable used to measure \(m_{{\text {t}}} \) is the mass \(m_{{\text {t}}} ^\text {fit}\) evaluated after applying the kinematic fit. We take the reconstructed \(\mathrm {W}\) boson mass \(m_\mathrm {W}^\text {reco}\), before it is constrained by the kinematic fit, as an estimator for measuring the JSF to be applied in addition to the standard CMS JECs. The top quark mass and the JSF are determined simultaneously in a likelihood fit to the selected permutations, in order to reduce the uncertainty from the JECs.

The distributions of \(m_{{\text {t}}} ^\text {fit}\) and \(m_\mathrm {W}^\text {reco}\) are obtained from simulation for seven different \(m_{{\text {t}}} \) and five different JSF values. From these distributions, probability density functions \(P_{j}\) are derived separately for the different permutation cases j: cp, wp, or un. These functions depend on \(m_{{\text {t}}} \) and the JSF and are labeled \(P_{j}(m_{{\text {t}},i}^\text {fit}|m_{{\text {t}}},\text {JSF})\) and \(P_{j}(m_{\mathrm {W}, i}^\text {reco}|m_{{\text {t}}},\text {JSF})\), respectively, for the ith permutation of an event in the final likelihood. The observables \(m_{{\text {t}}} ^\text {fit}\) and \(m_\mathrm {W}^\text {reco}\) have a correlation coefficient with a size below 5% for each permutation case and are treated as uncorrelated. The most likely \(m_{{\text {t}}} \) and JSF values are obtained by minimizing \(-2\ln \left[ \mathcal {L}\left( \text {sample} | m_{{\text {t}}},\text {JSF} \right) \right] \). With an additional prior \(P(\text {JSF})\), the likelihood \(\mathcal {L}\left( \text {sample} | m_{{\text {t}}},\text {JSF}\right) \) is defined as:

$$\begin{aligned}&\mathcal {L}\left( \text {sample} | m_{{\text {t}}},\text {JSF} \right) = P(\text {JSF}) \,\prod _\text {events}\left( \phantom {\left[ \sum _{j}f_{j} \,P_{j}(m_{{\text {t}},i}^\text {fit}|m_{{\text {t}}},\text {JSF}) \,P_{j}(m_{\mathrm {W},i}^\text {reco}|m_{{\text {t}}},\text {JSF}) \right] }\sum _{i=1}^{n} P_\text {gof}\left( i\right) \right. \\&\qquad \left. \times \left[ \sum _{j}f_{j} \,P_{j}(m_{{\text {t}},i}^\text {fit}|m_{{\text {t}}},\text {JSF}) \,P_{j}(m_{\mathrm {W},i}^\text {reco}|m_{{\text {t}}},\text {JSF}) \right] \right) ^{w_\text {evt}}, \end{aligned}$$

where n denotes the number of the at-most four permutations in each event, j labels the permutation cases, and \(f_j\) represents their relative fractions. The event weight \(w_\text {evt}=c\,\sum _{i=1}^{n}P_\text {gof}\left( i\right) \) is introduced to reduce the impact of events without correct permutations, where c normalizes the average \(w_\text {evt}\) to 1.

Different choices are made for the prior \(P(\text {JSF})\) in the likelihood fit. When the JSF is fixed to unity, the \(P_{j}(m_{\mathrm {W},i}^\text {reco}|m_{{\text {t}}},\text {JSF})\) can be approximated by a constant as they hardly depend on \(m_{{\text {t}}}\). Hence, only the \(m_{{\text {t}}} ^\text {fit}\) observable is fit, and this approach is called the 1D analysis. The approach with an unconstrained JSF is called the 2D analysis. Finally, in the hybrid analysis, the prior \(P(\text {JSF})\) is a Gaussian centered at 1.0. Its width depends on the relative weight \(w_\text {hyb}\) that is assigned to the prior knowledge on the JSF, \(\sigma _{\text {prior}} = \delta \text {JSF}^{\text {2D}}_{\text {stat}} \sqrt{\smash [b]{1/w_\text {hyb} - 1}}\), where \(\delta \text {JSF}^{\text {2D}}_{\text {stat}}\) is the statistical uncertainty in the 2D result of the JSF. The optimal value of \(w_\text {hyb}\) is determined from the uncertainties in the 2D analysis and discussed in Sect. 5.

The 2D method is separately calibrated for the muon and electron channel by conducting 10,000 pseudo-experiments for each combination of the seven top quark masses and the five \(\text {JSF}\) values, using simulated \({{\text {t}}\overline{{\text {t}}}}\) and background events. We correct for deviations between the extracted mass and JSF and their input values. This bias correction amounts for the mass to an offset of 0.5\(\,\text {GeV}\) for an expected value of 172.5\(\,\text {GeV}\), with a slope of 3%. Corrections for the statistical uncertainty of the method are derived from the widths of the corresponding pull distributions and have a size of 5% for both the mass and the \(\text {JSF}\).

5 Systematic uncertainties

The systematic uncertainties in the final measurement are determined from pseudo-experiments. Taking into account new simulations, more variations of the modeling of the \({{\text {t}}\overline{{\text {t}}}}\) events are investigated than in the Run 1 analysis [6]. The scales used for the simulation of initial-state radiation (ISR) and final-state radiation (FSR) are varied independently from the renormalization and factorization scales. Furthermore, the effects of early resonance decays and alternative color-reconnection models [47, 48] are evaluated, while in Run 1 only the effect of an underlying event tune without color reconnection was studied. The relevant systematic uncertainties and the methods used to evaluate them are described below.

Method calibration: We consider the quadratic sum of statistical uncertainty and residual biases after the calibration of the ideogram method as a systematic uncertainty.

>JECs: As we measure a global JSF, we have to take into account the influence of the \(p_{\mathrm {T}}\)- and \(\eta \)-dependent JEC uncertainties. This is done by scaling the energies of all jets up and down according to their individual uncertainties [20], split into correlation groups (called InterCalibration, MPFInSitu and Uncorrelated) similarly to the procedure adopted at 8\(\,\text {TeV}\)  [49].

Jet energy resolution: The jet energy resolution (JER) in simulation is slightly degraded to match the resolutions measured in data [20]. To account for the resolution uncertainty, the JER in the simulation is modified by \({ \pm }1\) standard deviation with respect to the degraded resolution.

\({{\text {b}} {}}\)tagging: The events are weighted to account for the \(p_{\mathrm {T}}\)-dependent uncertainty of the \({\text {b}}\) tagging efficiencies and misidentification rates of the \({\text {b}}\) tagging algorithm [43].

Pileup: To estimate the uncertainties associated with the determination of the number of pileup events and with the weighting procedure, the inelastic \(\mathrm {p}\)\(\mathrm {p}\) cross section is varied by \({ \pm }4.6\)% for all simulations.

Non-\({{{\text {t}}\overline{{\text {t}}}}}\)background: The main uncertainty in the non-\({{\text {t}}\overline{{\text {t}}}}\) background stems from the uncertainty in the measurements of the cross sections used in the normalization. The normalization of the background samples is varied by \( \pm \)10% for the single top quark samples [50, 51], \( \pm \)30% for the \(\mathrm {W}\)+jets samples [52], \( \pm \)10% for the \({\text {Z}}\) +jets [53] and for the diboson samples [54, 55], and \( \pm \)100% for the QCD multijet samples. The uncertainty in the luminosity of 2.5% [22] is negligible compared to these variations.

JEC Flavor: The Lund string fragmentation implemented in pythia  6.422 [56] is compared to the cluster fragmentation of herwig++  2.4 [57]. Each model relies on a large set of tuning parameters that allow to modify the individual fragmentation of jets initiated from gluons, light quarks, and \({\text {b}}\) quarks. Therefore, the difference in jet energy response between pythia 6 and herwig++ is determined for each jet flavor [20]. In order to evaluate possible differences between the measured JSF (from light quarks with gluon contamination) and the \({\text {b}}\) jet energy scale, the flavor uncertainties for jets from light quarks, gluons, and bottom quarks are evaluated separately and added linearly.

\({{\text {b}} {}}\)jet modeling: This term has three components: The fragmentation into \({\text {b}}\) hadrons is varied in simulation within the uncertainties of the Bowler–Lund fragmentation function tuned to ALEPH [58] and DELPHI [59] data. In addition, the difference between the Bowler–Lund [60] and the Peterson [61] fragmentation functions is included in the uncertainty. Lastly, the uncertainty from the semileptonic \({\text {b}}\) hadron branching fraction is obtained by varying it by \(-\,0.45\)% and \(+\,0.77\)%, which is the range of the measurements from \({\mathrm {B}^0}\)/\({\mathrm {B}^{+}}\) decays and their uncertainties [3].

PDFs: The NNPDF3.0 NLO (\(\alpha _S =0.118\)) PDF is used in the generation of simulated events. We calculate the results with the different PDF replicas and use the variance of these predictions for the PDF uncertainty [35]. In addition, NNPDF3.0 sets with \(\alpha _S = 0.117\) and 0.119 are evaluated and the observed difference is added in quadrature [62,63,64].

Renormalization and factorization scales: The simulated events are weighted to match the event shape distributions generated with different renormalization and factorization scales. These scales are varied independently from each other by a factor of 0.5 and 2.

ME/PS matching: The model parameter \(h_{\text {damp}}=1.58^{+0.66}_{-0.59}\) [29] used in powheg to control the matching of the MEs to the pythia  8 PS is varied within its uncertainties.

ME generator: The influence of the NLO ME generator and its matching to the PS generator is estimated by using a sample from the NLO generator MadGraph 5_amc@nlo with FxFx matching [34], instead of the powheg  v2 generator used as default.

Table 1 Observed shifts with respect to the default simulation for different models of color reconnection. The “QCD inspired” and “gluon move” models are compared to the default model with ERDs. The statistical uncertainty in the JSF shifts is 0.1%
Table 2 Observed shifts with respect to the default simulation for different generator setups. The statistical uncertainty in the JSF shifts is 0.1%

ISR PS scale: The PS scale value used for the simulation of ISR in pythia  8 is scaled up by 2 and down by 0.5 in dedicated samples.

FSR PS scale: The PS scale value used for the simulation of FSR in pythia  8 is scaled up by \(\sqrt{2}\) and down by \(1/\sqrt{2}\) [28] in dedicated samples. This affects the fragmentation and hadronization of the jets initiated by the ME calculation, as well as the emission of extra jets. In the FSR samples, the jet energy response of the light quarks is observed to differ by \( \pm 1.2\)% compared to the response of the default sample. This response difference would be absorbed in the residual JECs if the corrections were derived based on \(\gamma \)/\({\text {Z}}\) +jet simulations with the same PS scale. Hence, the momenta of all jets in the varied samples are scaled so that the energy response for jets induced by light quarks agrees with the default sample.

Top quark\(p_{\mathrm {T}} \): Recent calculations [65] suggest that next-to-next-to-leading-order effects have an important impact on the top quark \(p_{\mathrm {T}}\) spectrum, that NLO ME generators are unable to reproduce. Therefore, the top quark \(p_{\mathrm {T}}\) in simulation is varied to match the distribution measured by CMS [66, 67]. The observed difference with respect to the default sample is quoted as a systematic uncertainty.

Underlying event: The modeling of multiple-parton interactions in pythia  8 is tuned to measurements of the underlying event [28, 29]. The parameters of the tune are varied within their uncertainties in the simulation of the \({{\text {t}}\overline{{\text {t}}}}\) signal.

Early resonance decays: By enabling early resonance decays (ERDs) in pythia  8, color reconnections can happen between particles from the top quark decay and particles from the underlying event. In the default sample the ERDs are turned off and the top quark decay products do not interact with the underlying event. The influence of the ERD setting is estimated from a sample with ERDs enabled in pythia  8.

Color reconnection: The uncertainties that arise from ambiguities in modeling color-reconnection effects are estimated by comparing the default model in pythia  8 with ERDs to two alternative models of color reconnection, a model with string formation beyond leading color (“QCD inspired”) [48] and a model that allows gluons to be moved to another string (“gluon move”) [47]. All models are tuned to measurements of the underlying event [28, 68]. The observed shifts are listed in Table 1. Among the two approaches, the “gluon move” model leads to larger shifts and these are quoted as the systematic uncertainty.

The modeling uncertainties are mainly evaluated by varying the parameters within one model: powheg  v2 + pythia  8 with the CUETP8M2T4 tune (labeled as powhegp8 M2T4). This approach benefits from the calibration of the reconstructed physics objects which is derived from data with pythia  8 as a reference. Three alternative models of the \({{\text {t}}\overline{{\text {t}}}}\) signal are studied. The NLO MadGraph 5_amc@nlo generator with the FxFx matching [34] (labeled as MG5 p8 [FxFx] M2T4) and the LO MadGraph 5_amc@nlo with the MLM matching [33] (labeled as MG5 p8 [MLM] M1) are both interfaced with pythia  8 with the CUETP8M2T4 and the CUETP8M1 tune, respectively. In addition, powheg  v2 interfaced with herwig++  [57] (v2.7.1) with the tune EE5C [69] (labeled as powhegh++ EE5C) is evaluated. ME corrections to the top quark decay are not applied in the herwig++ sample. A dedicated analysis has found that MG5 p8 [MLM] M1 and powhegh++ EE5C do not describe the data well [29, 70] and only the NLO MG5 p8 [FxFx] M2T4 model is used in the evaluation of the systematic uncertainties.

Nevertheless, the analysis is also performed on pseudo-experiments where the \({{\text {t}}\overline{{\text {t}}}}\) signal stems from these different generator setups. This yields rather large shifts for the two discarded models. The results are summarized in Table 2. The shift for powhegh++ EE5C would translate into a 4\(\,\text {GeV}\) higher measurement of \(m_{{\text {t}}} \) if this setup were used as the default \({{\text {t}}\overline{{\text {t}}}}\) simulation and not as signal in the pseudo-data. The agreement of these generator setups and the color-reconnection models with data are studied in Sect. 7 for this top quark mass measurement.

The contributions from the different sources of systematic uncertainties are shown in Table 3. In general, the absolute value of the largest observed shifts in \(m_{{\text {t}}} \) and JSF, determined by changing the parameters by \( \pm \)1 standard deviation (\(\sigma \)), are assigned as systematic uncertainties. The only exception to this is if the statistical uncertainty in the observed shift is larger than the value of the calculated shift. In this case the statistical uncertainty is taken as the best estimate of the uncertainty in the parameter. The signs in the table are taken from the \(+1\sigma \) shift in the value of the uncertainty source where applicable.

Table 3 List of systematic uncertainties for the fits to the combined data set using the procedures described in Sect. 5. With the exception of the flavor-dependent JEC terms, the total systematic uncertainty is obtained from the sum in quadrature of the individual systematic uncertainties. The values in parentheses with indented labels are already included in the preceding uncertainty source. A positive sign indicates an increase in the value of \(m_{{\text {t}}}\) or the JSF in response to a \(+1\sigma \) shift and a negative sign indicates a decrease. The statistical uncertainty in the shift in \(m_{{\text {t}}} \) is given when different samples are compared. The statistical uncertainty in the JSF shifts is 0.1% for these sources

The details of the fitting procedure have several consequences on the uncertainties. The inclusion of the JSF as a nuisance parameter in the fit and its constraint by the \(m_\mathrm {W}^\text {reco}\) observable reduces not only the uncertainties stemming from the JECs, but also the modeling uncertainties. As the JSF is an overall energy scale factor derived mainly on light-quark jets and applied to all jets, this approach cannot reduce the uncertainties on the flavor-dependent JECs. The other remaining systematic uncertainties are also dominated by effects that cannot be fully compensated through the simultaneous determination of \(m_{{\text {t}}} \) and JSF, i.e., the \(m_{{\text {t}}} ^\text {fit}\) observable is affected differently from \(m_\mathrm {W}^\text {reco}\). For the hybrid analysis, a hybrid weight of \(w_\text {hyb}=0.3\) is found optimal based on the total uncertainty in the 2D result of the JSF and the jet energy scale uncertainty in the JECs. Due to the larger jet energy uncertainties at the beginning of the 13\(\,\text {TeV}\) data taking, \(w_\text {hyb}\) is lower than in the Run 1 analysis [6] where the prior JSF knowledge contributes 50% of the information. With an expected statistical uncertainty \(\delta \text {JSF}^{\text {2D}}_{\text {stat}} = 0.08\%\) on the JSF for the 2D analysis, the width of the prior is \(\sigma _{\text {prior}} = 0.12\%\). The hybrid analysis leads to further reduced uncertainties in the FSR PS scale and in ERDs compared to the 2D analysis. This stems from the opposite signs of the observed shifts in \(m_{{\text {t}}}\) for the 1D and 2D analyses, i.e., the JSF from the 2D analysis overcompensates the effects on \(m_{{\text {t}}} ^\text {fit}\).

6 Results

The 2D fit to the selected lepton+jets events yields:

$$\begin{aligned} m_{{\text {t}}} ^{\text {2D}}= & {} 172.40 \pm 0.09\,\text {(stat+JSF)} \pm 0.75\,\text {(syst)} \,\text {GeV},\\ \mathrm {JSF}^{\text {2D}}= & {} 0.994 \pm 0.001\,\text {(stat)} \pm 0.011\,\text {(syst)}. \end{aligned}$$

As the top quark mass and the JSF are measured simultaneously, the statistical uncertainty in \(m_{{\text {t}}} \) originates from both quantities of interest. The measured unconstrained JSF is compatible with the one obtained from jets recoiling against photons and \({\text {Z}}\) bosons within its uncertainties.

Separate fits to the 101 992 muon+jets events and the 59 504 electron+jets events give statistically compatible results:

$$\begin{aligned} \mu \text {+jets: } m_{{\text {t}}} ^{\text {2D}}= & {} 172.44 \pm 0.11\,\text {(stat+JSF)} \,\text {GeV},\\ \text {JSF}^{\text {2D}}= & {} 0.995 \pm 0.001\,\text {(stat)},\\ \mathrm {e}\text {+jets: } m_{{\text {t}}} ^{\text {2D}}= & {} 172.32 \pm 0.16\,\text {(stat+JSF)} \,\text {GeV},\\ \text {JSF}^{\text {2D}}= & {} 0.993 \pm 0.001\,\text {(stat)}.\\ \end{aligned}$$

The 1D fit and the hybrid fit with \(w_\text {hyb}=0.3\), as obtained in Sect. 5, yield for the lepton+jets channel:

$$\begin{aligned} m_{{\text {t}}} ^{\text {1D}}= & {} 171.93 \pm 0.06\,\text {(stat)} \pm 1.10\,\text {(syst)} \,\text {GeV},\\ m_{{\text {t}}} ^{\text {hyb}}= & {} 172.25 \pm 0.08\,\text {(stat+JSF)} \pm 0.62\,\text {(syst)} \,\text {GeV},\\ \mathrm {JSF}^{\text {hyb}}= & {} 0.996 \pm 0.001\,\text {(stat)} \pm 0.008\,\text {(syst)}. \end{aligned}$$

The hybrid fit measurement of \(m_{{\text {t}}} = 172.25 \pm 0.08\,\text {(stat+JSF)} \pm 0.62\,\text {(syst)} \,\text {GeV} \) offers the lowest overall uncertainty and, therefore, is chosen as the main result of this study. This is the first published result of the top quark mass measured with Run 2 data and the new NLO generator setups. Because of the larger integrated luminosity and the higher \({{\text {t}}\overline{{\text {t}}}}\) cross section at \(\sqrt{s}=13\,\text {TeV} \), the statistical uncertainty is halved compared to the Run 1 result of \(m_{{\text {t}}} = 172.35 \pm 0.16\,\text {(stat+JSF)} \pm 0.48\,\text {(syst)} \)\(\,\text {GeV}\)  [6]. This measurement is consistent with the Run 1 result within the uncertainties. The previous measurement was calibrated with \({{\text {t}}\overline{{\text {t}}}}\) events generated at LO with MadGraph  5.1.5.11 [71] matched to pythia  6.426 PS [56] with the Z2\(^*\) tune [72] using the MLM prescription. No shift in the measured top quark mass from the new simulation at NLO with powheg  v2 and pythia  8 and the new experimental setup is observed. The systematic uncertainties are larger than for the Run 1 result due to a more advanced treatment of the modeling uncertainties. This is mainly caused by the evaluation of a broader set of color-reconnection models that were not available in Run 1, yielding a more extensive treatment of the associated uncertainty. Without the uncertainty due to these models of \(0.31\,\text {GeV} \), the systematic uncertainties in \(m_{{\text {t}}}\) would be reduced from 0.62 to 0.54\(\,\text {GeV}\) and would be much closer to the Run 1 result. Tighter constraints on the existing color-reconnection models and the settings in the NLO simulations can occur in the near future and reduce the systematic uncertainties due to these specific models. The new treatment of the modeling uncertainties will require special care when combining this measurement with the Run 1 result.

Table 4 Compatibility of different models with the differential measurement of the top quark mass. For each variable and model, the probability of the cumulative \(\chi ^2\) is computed. The setup with powheg  v2 + herwig++ does not use ME corrections to the top quark decay and shows large deviations from the data
Fig. 3
figure 3

Measurements of \(m_{{\text {t}}}\) as a function of the invariant mass of the \({{\text {t}}\overline{{\text {t}}}}\) system \(m_{{{\text {t}}\overline{{\text {t}}}}}\) (upper left), the number of jets \(N_\text {jets}\) (upper right), the pseudorapidity of the \({\text {b}}\) jet assigned to the hadronic decay branch \(|\eta ^{{\text {b}},\text {had}} |\) (lower left) and the \(\varDelta R\) between the light-quark jets \(\varDelta R_{{\text {q}} \overline{{\text {q}}} ^\prime }\) (lower right) compared to different generator models The filled circles represent the data, and the other symbols are for the simulations. For reasons of clarity, the horizontal bars indicating the bin widths are shown only for the data points and each of the simulations is shown as a single offset point with a vertical error bar representing its statistical uncertainty. The statistical uncertainty of the data is displayed by the inner error bars. For the outer error bars, the systematic uncertainties are added in quadrature.

Fig. 4
figure 4

Measurements of \(m_{{\text {t}}}\) as a function of the \(\varDelta R\) between the \({\text {b}}\) jets \(\varDelta R_{{\text {b}} \overline{{\text {b}}} }\) (left) and the light-quark jets \(\varDelta R_{{\text {q}} \overline{{\text {q}}} ^\prime }\) (right) compared to alternative models of color reconnection. The symbols and conventions are the same as in Fig. 3

7 Measured top quark mass as a function of kinematic observables

The modeling of soft and perturbative QCD effects is the main source of systematic uncertainties on the analysis presented here. Differential measurements of \(m_{{\text {t}}}\) as a function of the kinematic properties of the \({{\text {t}}\overline{{\text {t}}}}\) system can be used to validate the different models and to identify possible biases in the measurement. Variables are selected that probe potential effects from color reconnection, ISR and FSR, and the kinematic observables of the jets coming from the top quark decays. They are the transverse momentum of the hadronically decaying top quark (\(p_{\mathrm {T}} ^\mathrm{{t,had}}\)), the invariant mass of the \({{\text {t}}\overline{{\text {t}}}}\) system (\(m_{{{\text {t}}\overline{{\text {t}}}}}\)), the transverse momentum of the \({{\text {t}}\overline{{\text {t}}}}\) system (\(p_{\mathrm {T}} ^{{{\text {t}}\overline{{\text {t}}}}}\)), the number of jets with \(p_{\mathrm {T}} > 30\,\text {GeV} \) (\(N_\text {jets}\)), the \(p_{\mathrm {T}}\) and the pseudorapidity of the \({\text {b}}\) jet assigned to the hadronic decay branch (\(p_{\mathrm {T}} ^{{\text {b}},\text {had}}\) and \(|\eta ^{{\text {b}},\text {had}} |\)), the \(\varDelta R\) between the \({\text {b}}\) jets (\(\varDelta R_{{\text {b}} \overline{{\text {b}}} }\)), and the \(\varDelta R\) between the light-quark jets (\(\varDelta R_{{\text {q}} \overline{{\text {q}}} ^\prime }\)). These are the same variables as in the Run 1 analysis [6].

For each variable, the event sample is divided into three to five bins as a function of the value of this variable, and we populate each bin using all permutations which lie within the bin boundaries. As some variables depend on the parton-jet assignment that cannot be resolved unambiguously, such as the \(p_{\mathrm {T}}\) of a reconstructed top quark, a single event is allowed to contribute to multiple bins. For each bin, \(m_{{\text {t}}}\) is measured using the hybrid likelihood fit with the same probability density functions as for the inclusive measurement. The JSF prior is chosen such that it constrains the measured JSF with the same relative strength. This procedure was also used in the Run 1 analysis [6].

For the modeling of the perturbative QCD effects, the data are compared to the MG5 p8 [FxFx] M2T4, MG5 p8 [MLM] M1, and powhegh++ EE5C setups. For the modeling of color reconnection, the default tune of pythia  8, the “QCD inspired” model [48], and the “gluon move” model [47] are considered. The three latter models are simulated with ERDs in pythia  8.

In these comparisons, the mean value of the measured top quark mass is subtracted from the measurement in each bin of the sample and the results are expressed in the form of offsets \(m_{{\text {t}}}-\left<m_{{\text {t}}} \right>\), where the mean comes from the inclusive measurement on the specific sample. The subtracted offsets with respect to powhegp8 M2T4 can be found in the Tables 1 and 2. To aid in the interpretation of a difference between the value of \(m_{{\text {t}}}-\left<m_{{\text {t}}} \right>\) and the prediction from a simulation in the same bin, a bin-by-bin calibration of the results is applied. This is derived using the powhegp8 M2T4 simulation with the same technique as for the inclusive measurement except that it is performed for each bin separately. The bin-by-bin bias correction for the mass can be much larger than for the inclusive analysis and reaches up to 10\(\,\text {GeV}\) for some bins. For each bin the statistical uncertainty and the dominant systematic uncertainties are combined in quadrature, where the latter include JEC (\(p_{\mathrm {T}}\)-, \(\eta \)-, and flavor-dependent), JER, pileup, \({\text {b}}\) fragmentation, renormalization and factorization scales, ME/PS matching, ISR/FSR PS scales, and the underlying event.

For each variable and model, the cumulative \(\chi ^2\) between the model and the data is computed taking into account the statistical uncertainty in the model prediction and the total uncertainty in the data value. The number of degrees of freedom for each variable is the number of bins minus one as the mean measured top quark mass is subtracted. The resulting \(\chi ^2\) probabilities (p-values) are listed in Table 4.

No significant deviation of the measured \(m_{{\text {t}}}\) is observed for the default generator setup of powhegp8 M2T4 and there is no evidence for a bias in the measurement. Only powhegh++ EE5C differs from data and all other setups for the dependence of the mass measurement on the invariant mass of the \({{\text {t}}\overline{{\text {t}}}}\) system, the \(p_{\mathrm {T}}\) of the \({\text {b}}\) jet assigned to the hadronic decay branch, and the \(\varDelta R\) between the light-quark jets. Figure 3 shows the results for \(m_{{{\text {t}}\overline{{\text {t}}}}}\), \(N_\text {jets}\), \(|\eta ^{ {\text {b}},\text {had}} |\) and \(\varDelta R_{{\text {q}} \overline{{\text {q}}} ^\prime }\) for the different generator setups for the modeling of perturbative QCD. The large deviations confirm that the powheg  v2 + herwig++ setup without ME corrections to the top quark decay needs improvements to describe the data. A bias in the measurement of the top quark mass can be spotted by a failure of the model to reproduce differential measurements. For the color-reconnection models, the \(\varDelta R_{{\text {b}} \overline{{\text {b}}} }\) and \(\varDelta R_{{\text {q}} \overline{{\text {q}}} ^\prime }\) variables should offer the best sensitivity to the modeling of the color flow. The comparison is shown in Fig. 4, but the uncertainties in the measurements are too large to rule out any of the different models.

8 Summary

This study measured the mass of the top quark using the 2016 data at \(\sqrt{s}=13\)\(\,\text {TeV}\) corresponding to an integrated luminosity of 35.9\(\,\text {fb}^{-1}\), and powheg  v2 interfaced with pythia  8 with the CUETP8M2T4 tune for the simulation. The top quark mass is measured to be \(172.25 \pm 0.08\,\text {(stat+JSF)} \pm 0.62\,\text {(syst)} \,\text {GeV} \) from the selected lepton+jets events. The result is consistent with the CMS measurements of Run 1 of the LHC at \(\sqrt{s}=7\) and 8\(\,\text {TeV}\), with no shift observed from the new experimental setup and the use of the next-to-leading-order matrix-element generator and the new parton-shower simulation and tune. Along with the new generator setup, a more advanced treatment of the modeling uncertainties with respect to the Run 1 analysis is employed. In particular, a broader set of color-reconnection models is considered. The top quark mass has also been studied as a function of the event-level kinematic properties, and no indications of a bias in the measurements are observed.