1 Introduction

After the discovery of a Higgs boson at the Large Hadron Collider (LHC) [1, 2], detailed measurements of its properties [3,4,5,6,7,8,9,10] have shown excellent compatibility with the Standard Model (SM) Higgs boson [11,12,13,14,15,16]. These results indicate that the scalar sector of the theory of the electroweak interaction contains at least a doublet of complex scalar fields. In addition, they constrain the possibilities for additional spin-0 field content in the theory and disfavour parts of the parameter space in models with extended Higgs sectors. These results, however, still allow several extensions of the Higgs sector, such as the two-Higgs-doublet model (2HDM) [17, 18], in which large parts of the parameter space are compatible with the existence of a Higgs boson like the one in the SM. In the 2HDM, a second complex doublet of the Higgs fields is added to the single SM Higgs doublet. The model has a weak decoupling limit [19] in which one of its predicted Higgs bosons has couplings to fermions and vector bosons that are the same as those of the SM Higgs boson at lowest order. In addition, a Higgs sector structure with two complex doublets of fields appears in several new physics scenarios, including supersymmetry [20], dark-matter models [21], axion models [22], electroweak baryogenesis [23] and neutrino mass models [24].

Fig. 1
figure 1

Example lowest-order Feynman diagrams for a gluon–gluon fusion production of A bosons decaying into \(ZH\rightarrow \ell \ell bb\), b b-associated production of A bosons decaying into \(ZH\rightarrow \ell \ell bb\), and c gluon–gluon fusion production of A boson decaying into \(ZH\rightarrow \ell \ell WW\)

The addition of a second Higgs doublet leads to five Higgs bosons after electroweak symmetry breaking. The phenomenology of such a model is very rich and depends on many parameters, such as the ratio of the vacuum expectation values of the two Higgs doublets (\(\tan \beta \)) and the Yukawa couplings of the scalar sector [18]. When CP conservation is assumed, the model contains two CP-even Higgs bosons, h and H with \(m_{H} > m_{h}\), one that is CP-odd, A, and two charged scalars, \(H^{\pm }\). There have been many searches for a CP-even Higgs boson at the LHC, in channels that include \(H\rightarrow WW/ZZ\) [25,26,27,28,29,30] and \(H\rightarrow hh\) [31, 32], as well as dedicated searches for the heavy CP-odd Higgs boson, as in the \(A\rightarrow Zh\) channel [33, 34]. Some 2HDM searches are agnostic with respect to whether the heavy Higgs bosons are CP-even or CP-odd, for example searches in the \(A/H\rightarrow \tau \tau /bb\)Footnote 1 channels [35,36,37]. In the interpretation of this last category of channels it is usually assumed that both heavy Higgs bosons are degenerate in mass, a hypothesis that is motivated in certain supersymmetric models [20]. Finally, there have been searches for signatures that explicitly assume different masses for the heavy Higgs bosons, for example searches in the \(A\rightarrow ZH\rightarrow \ell \ell bb\)/\(\ell \ell \tau \tau \) channels [38,39,40].

The case in which the heavy Higgs bosons have different masses, in addition to being in an allowed part of the parameter space, is further motivated by electroweak baryogenesis scenarios in the context of the 2HDM [41,42,43,44]. For 2HDM electroweak baryogenesis to occur, the requirement \(m_A > m_H\) is favoured [41] for a strong first-order phase transition to take place in the early universe. The A boson mass is also constrained to be less than approximately 800 \(\text {GeV}\), whereas the lighter CP-even Higgs boson, h, is required to have properties similar to those of a SM Higgs boson and is assumed to be the Higgs boson with a mass of 125 \(\text {GeV}\) that was discovered at the LHC [41]. Under such conditions and for large parts of the 2HDM parameter space, the CP-odd Higgs boson, A, decays into ZH [41, 45]. At the LHC, the production of the A boson in the relevant 2HDM parameter space proceeds mainly through gluon–gluon fusion and in association with b-quarks (b-associated production).

This search for \(A\rightarrow ZH\) decays uses proton–proton collision data at \(\sqrt{s} = 13\) \(\text {TeV}\) corresponding to an integrated luminosity of 139 fb\(^{-1}\) recorded by the ATLAS detector at the LHC. The search considers \(Z\rightarrow \ell \ell \), where \(\ell =e,\ \mu \), to take advantage of the clean leptonic final state. The H boson is studied in the \(H\rightarrow bb\) and \(H\rightarrow WW\) decay channels. The \(H\rightarrow bb\) channel takes advantage of the high branching ratio in large parts of the 2HDM parameter space, especially in the weak decoupling limit, where the H boson decays into weak vector bosons are suppressed. The \(H\rightarrow WW\) decay channel is considered in the case where both W bosons decay hadronically. This heavy Higgs boson decay is dominant in parts of the 2HDM parameter space close to, but not exactly at, the weak decoupling limit [41] and it provides a new way to look for \(\ell \ell WW\) resonances in a final state that has been less explored by other LHC searches. Both final states considered allow full reconstruction of the A boson’s decay kinematics. This search considers both the gluon–gluon fusion (see Fig. 1a) and b-associated production mechanisms (see Fig. 1b) for the \(A\rightarrow ZH\rightarrow \ell \ell bb\) channel. The b-associated production mode of the \(A\rightarrow ZH\rightarrow \ell \ell WW\) channel is theoretically allowed, but leads to more complicated jet combinatorics and would necessitate changing the event reconstruction strategy. For this reason, only the gluon–gluon fusion production mode (see Fig. 1c) is considered here.

This article is organised as follows. Section 2 introduces the ATLAS detector. A description of the collision and simulated data samples used in this article is given in Sect. 3. The algorithms used to reconstruct the objects used in this search are described in Sect. 4. The event selection and background estimates for the two channels considered and the modelling of the signal are discussed in Sects. 5 and 6, respectively. Section 7 is devoted to the description of the systematic uncertainties. The results are discussed in Sect. 8 and the conclusions are given in Sect. 9.

2 ATLAS detector

The ATLAS experiment [46] at the LHC is a general-purpose particle detector with cylindrical geometry and forward–backward symmetry. It includes an inner-detector tracker surrounded by a 2 T superconducting solenoid, electromagnetic and hadronic calorimeters, and a muon spectrometer with a toroidal magnetic field. The inner detector consists of a high-granularity silicon pixel detector, including the insertable B-layer [47, 48], a silicon microstrip tracker, and a straw-tube tracker. It provides precision tracking of charged particles with pseudorapidity \(|\eta | < 2.5\).Footnote 2 The calorimeter system covers the pseudorapidity range \(|\eta | < 4.9\). It is composed of sampling calorimeters with either lead/liquid-argon, steel/scintillator-tiles, copper/liquid-argon or tungsten/liquid-argon as the absorber/sensitive material. The muon spectrometer provides muon identification and momentum measurement for \(|\eta | < 2.7\). A two-level trigger system [49] is employed to select events to be recorded at an average rate of about 1 kHz for offline analysis.

3 Data and simulated event samples

The data used in this search were collected between 2015 and 2018 from \(\sqrt{s}=13~\text {TeV}\) proton–proton collisions and correspond to an integrated luminosity of 139 fb\(^{-1}\) [50,51,52,53], which includes only data-taking periods where all relevant detector subsystems were operational [54]. The data sample was collected using a set of single-muon [55] and single-electron triggers [56]. The single-muon triggers had \(p_{\text {T}}\) thresholds in the range of 20–26 \(\text {GeV}\) for isolated muons and 50 \(\text {GeV}\) for muons without any isolation requirement. The single-electron triggers employed a range of \(p_{\text {T}}\) thresholds in the range 24–300 \(\text {GeV}\) and a combination of quality and isolation requirements depending on the data-taking period and the \(p_{\text {T}}\) threshold.

Simulated signal events with A bosons produced by gluon–gluon fusion were generated at leading order (LO) with MadGraph5_aMC@NLO 2.3.3 [57, 58], using Pythia 8.210 [59] with a set of tuned parameters called the A14 tune [60] for parton showering. The decays of \(H\rightarrow bb\) and WW were considered. Additionally, in the \(A\rightarrow ZH\rightarrow \ell \ell bb\) channel, A bosons produced in association with b-quarks were generated at next-to-leading-order (NLO) with MadGraph5_aMC@NLO 2.1.2 [58, 61, 62] following Ref. [63] together with Pythia 8.212 and the A14 tune for parton showering. The gluon–gluon fusion production used NNPDF2.3lo  [64] as the parton distribution functions (PDFs), while the b-associated production used CT10nlo_nf4 [65]. The signal samples were generated for A bosons with masses in the range of 230–800 \(\text {GeV}\) (300–800 \(\text {GeV}\)) and widths up to 20% of the A mass, and for narrow-width H bosons with masses in the range of 130–700 \(\text {GeV}\) (200–700 \(\text {GeV}\)) for the \(\ell \ell bb\) (\(\ell \ell WW\)) channel.

Background events from the production of W and Z bosons in association with jets were simulated with Sherpa v2.2.1 [66] using NLO matrix elements (ME) for up to two partons, and LO matrix elements for up to four partons calculated with the Comix [67] and OpenLoops [68, 69] libraries. They were matched with the Sherpa parton shower [70] using the MEPS@NLO prescription [71,72,73,74] using the set of tuned parameters developed by the Sherpa authors. The NNPDF3.0nnloset of PDFs [75] was used and the samples were normalised to a next-to-next-to-leading-order (NNLO) prediction [76]. Production of WW, ZZ and WZ pairs was simulated using the same generator and parameters as for the W and Z boson samples.

The production of \(t\bar{t}\) events was modelled using the Powheg-Box v2  [77,78,79,80] generator at NLO with the NNPDF3.0nlo  [75] PDF set and the \(h_\mathrm {damp}\) parameterFootnote 3 set to 1.5 \(m_\mathrm{top}\) [81]. The events were interfaced to Pythia 8.230 to model the parton shower, hadronisation, and underlying event, with parameters set according to the A14 tune and using the NNPDF2.3lo set of PDFs. The decays of bottom and charm hadrons were performed by EvtGen  v1.6.0 [82]. The associated production of a single top quark and W boson (tW) and single top production in the s-channel were modelled using the Powheg-Box v2  [78,79,80, 83, 84] generator at NLO in QCD using the five-flavour scheme and the NNPDF3.0nlo set of PDFs. The diagram removal scheme [85] was used to remove interference and overlap with \(t\bar{t}\) production in the case of tW production. The production of \(t\bar{t}V\) events was modelled using the MadGraph5_aMC@NLO v2.3.3 generator at NLO with the NNPDF3.0nlo PDF set. The events were interfaced to Pythia 8.210 using the A14 tune and the NNPDF2.3lo PDF set. The decays of bottom and charm hadrons were simulated using the EvtGen  v1.2.0 program.

Finally, SM Higgs boson production in association with a vector boson was simulated using Powheg  [78,79,80, 86] and interfaced with Pythia 8.186 [87] for parton shower and non-perturbative effects. The Powheg prediction is accurate to NLO for the Vh boson plus one jet production. The loop-induced \(gg\rightarrow Zh\) process was generated separately at LO. The PDF4LHC15 PDF set [88] and the AZNLO tune [89] of Pythia 8.186 were used. The simulation prediction was normalised to cross sections calculated at NNLO in QCD with NLO electroweak corrections for \(q\bar{q}/qg \rightarrow Vh\) and at NLO and next-to-leading-logarithm accuracy in QCD for \(gg \rightarrow Zh\) [90,91,92,93,94,95,96].

The effect of multiple interactions in the same and neighbouring bunch crossings (pile-up) was modelled by overlaying the original hard-scattering event with simulated inelastic proton–proton events generated with Pythia 8.186 using the NNPDF2.3lo set of PDFs and the A3 tune [97]. The simulated events were weighted to reproduce the distribution of the average number of interactions per bunch crossing (\(\left<\mu \right>\)) observed in the data. The \(\left<\mu \right>\) value in the simulation was rescaled by a factor of \(1.03 \pm 0.07\) to improve agreement between data and simulation in the visible inelastic proton–proton cross section [98]. All generated background samples were passed through the Geant4-based [99] detector simulation [100] of the ATLAS detector. The ATLFAST-II simulation [100] was used for the signal samples to allow for the generation of many different A and H boson masses. The simulated events were reconstructed in the same way as the data.

4 Object reconstruction

Selected events are required to contain at least one vertex having at least two associated tracks with \(p_{\text {T}} > 500\) \(\text {MeV}\), and the primary vertex is chosen to be the vertex reconstructed with the largest \(\Sigma p_{\mathrm {T}}^2\) of its associated tracks.

Electrons are reconstructed from energy clusters in the electromagnetic calorimeter that are matched to tracks in the inner detector [101]. Electrons are required to have \(|\eta | < 2.47\) and \(p_{\text {T}} > 7~\text {GeV}\). The associated track must have \(|d_0|/\sigma _{d_0}<5\) and \(|z_0|\sin \theta <0.5\) mm, where \(d_0\) (\(z_0\)) is the transverse (longitudinal) impact parameter relative to the primary vertex and \(\sigma _{d_0}\) is the error in \(d_0\). To distinguish electrons from jets, isolation and quality requirements are applied. The quality requirements refer to both the inner detector track and the calorimeter shower shape. The isolation requirements are defined using tracking and calorimeter measurements. Electrons used in this search satisfy the ‘Loose’ quality and isolation requirements.

Muons are reconstructed by matching tracks reconstructed in the inner detector to tracks or track segments in the muon spectrometer [102]. Muons used for this search must have \(|\eta | < 2.5\), \(p_{\text {T}} > 7~\text {GeV}\), \(|d_0|/\sigma _{d_0}<3\), and \(|z_0|\sin \theta <0.5\) mm. They are also required to satisfy ‘Loose’ isolation requirements, similar to those used for electrons, as well as ‘Loose’ quality criteria for tracks in the inner detector and muon spectrometer [103].

Jets are reconstructed from topological clusters in the calorimeter system [104], using the anti-\(k_t\) algorithm [105, 106] with radius parameter \(R = 0.4\). Candidate jets are required to have \(p_{\text {T}} > 20~\text {GeV}\) (\(p_{\text {T}} > 30~\text {GeV}\)) for \(|\eta | < 2.5\) \((2.5< |\eta | < 4.5)\) [107]. Low-\(p_{\text {T}}\) jets from pile-up are rejected by a multivariate algorithm that uses properties of the reconstructed tracks in the event for jets with \(p_{\text {T}} < 60\) \(\text {GeV}\)and \(|\eta | < 2.4\) [108].

Jets containing b-hadrons are identified using a multivariate tagging algorithm (b-tagging) [109, 110], which makes use of track impact parameters and reconstructed secondary vertices. The b-tagging algorithm output is used to define a criterion to select jets originating from b-quark hadronisation for jets with \(|\eta | < 2.5\). The jets that are selected in this way are referred to as b-jets in the following. The criterion in use has an average efficiency of 70% for jets from b-quarks in simulated \(t\bar{t}\) events, with rejection factors of 8.9, 36 and 300 for jets initiated by c-quarks, hadronically decaying \(\tau \)-leptons and light-flavour quarks or gluons, respectively [110].

Electrons, muons and jets are reconstructed and identified independently. When those objects are spatially close, these algorithms can lead to ambiguous identifications. An overlap removal procedure [111] is therefore applied to remove ambiguities.

The missing transverse momentum, whose magnitude is denoted by \(E_{\text {T}}^{\text {miss}}\), is computed as the negative vectorial sum of the transverse momenta of calibrated leptons and jets, plus an additional soft term constructed from all tracks that originate from the primary vertex but are not associated with any identified lepton or jet [112, 113].

Fig. 2
figure 2

The \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) distributions shown before the requirement on this variable is applied for events with a exactly two b-jets and b three or more b-jets. Corrections from a fit to the data are applied to the simulation, as described in Sects. 5.1 and 8. The signal distribution for (\(m_A,m_H) = (600, 300)\) \(\text {GeV}\) is also shown, and is normalised such that the production cross section times the branching ratios \(B(A\rightarrow ZH)\) and \(B(H\rightarrow bb)\) corresponds to 1 pb. The signal shown includes only A bosons produced in association with b-quarks. The lower panel shows the ratio of the data to the background prediction (black filled circles) and the relative uncertainty, which includes both statistical and systematic components, in the background prediction (hatched area). The notations ttV, VV and Vh refer to top-pair production in association with a vector boson, diboson production and SM Higgs boson production in association with a vector boson, respectively. The production of a Z boson in association with jets is split based on jet flavour. The notation Z+(bb,bc,cc,bl) refers to the case where the jets originate from heavy flavour, which includes at least one jet originating from a b-quark or two jets originating from c-quarks, whereas the notation Z+(cl,l) includes all the remaining cases

5 Event selection and background estimation

The final states for the \(A\rightarrow ZH\rightarrow \ell \ell \ bb\)/WW decays feature a pair of oppositely charged, same-flavour leptons and either two b-jets or four mostly light-flavour jets from the W bosons decays. Three resonances can be formed by combining the selected objects: (i) the Z boson (\(\ell \ell \)), (ii) the H boson (bb or \(WW \rightarrow 4j\)), and (iii) the A boson (ZH system).

Events are required to contain exactly two muons or two electrons. The two muons must have opposite electric charges. This requirement is not applied to electrons, since they have a non-negligible charge misidentification rate due to conversions of bremsstrahlung photons. The highest-\(p_{\text {T}}\) lepton must satisfy \(p_{\text {T}} >27\) \(\text {GeV}\) in the \(\ell \ell bb\) final state, to ensure full efficiency of the single-lepton triggers. This requirement is raised to \(p_{\text {T}} >30\) \(\text {GeV}\) for the \(\ell \ell WW\) final state. The invariant mass of the lepton pair, \(m_{\ell \ell }\), must be in the range of 80–100 \(\text {GeV}\) to be compatible with the mass of the Z boson.

Further event selection criteria are channel-specific, and are described separately in the following sections.

5.1 The \(\ell \ell bb\) final state

The events that are used for the \(A\rightarrow ZH\rightarrow \ell \ell bb \) search are required to have at least two b-jets, with at least one of them having \(p_{\text {T}} > 45~\text {GeV}\). The two highest-\(p_{\text {T}}\) b-jets of the event form the \(H\rightarrow bb\) system candidate. The A boson candidate is formed by these two b-jets and, in addition, the two leptons that are matched to the Z boson.

Table 1 Summary of the event selection for signal and control regions in the \(A\rightarrow ZH\rightarrow \ell \ell bb\) channel

The requirement of a same-flavour lepton pair along with several b-jets implies that the signal region is contaminated by Z boson production in association with jets and backgrounds including top quarks, like \(t\bar{t}\) production. The presence of neutrinos in semileptonic top-pair production provides a handle to reduce this background by requiring \(E_{\text {T}}^{\text {miss}}/\sqrt{H_{\text {T}}} < 3.5~\text {GeV}^{1/2}\), where \(H_{\text {T}}\) is the scalar sum of the \(p_{\text {T}}\) of all jets and leptons in the event. The Z+jets background is reduced by requiring \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb} > 0.4\), where \(m_{\ell \ell bb}\) is the four-body invariant mass of the two-lepton, two-b-jet system assigned to the A boson candidate and the summation is performed over the \(p_{\text {T}} ^2\) of these objects. This discriminating variable is chosen because the signal distributions are similar across the two production mechanisms and the \(m_A\) and \(m_H\) range used in the search. The requirement is optimised separately for each signal hypothesis and, subsequently, an average value is chosen. The distribution of the \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) variable is shown in Fig. 2 separately for the cases where exactly two b-jets and three or more b-jets are present in the event. The distribution is shown before the \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb} > 0.4\) requirement is applied.

The two signal production mechanisms, gluon–gluon fusion and b-associated production, differ mainly in the number of heavy-flavour jets that are produced in association with the A boson. This motivates a categorisation based on the number of b-jets present in the event. In particular, two categories are defined: the \(n_b=2\) category, which contains events with exactly two b-jets, and the \(n_b \ge 3\) category, which contains events with three or more b-jets. For gluon–gluon fusion production, more than 95% of the events passing the above selection fall into the \(n_b=2\) category. For b-associated production, only 25–35% of the selected events fall into the \(n_b \ge 3\) category, and the others enter the \(n_b=2\) category. This is because of the relatively soft \(p_{\text {T}}\) spectrum of the associated b-jets and the geometric acceptance of the tracker.

Finally, the invariant mass \(m_{bb}\) of the b-jets that are assigned to the H boson must be compatible with the assumed H boson mass. This is ensured by requiring \(m_{bb}\) to be within optimised boundaries that depend on the assumed \(m_H\): \( 0.85\cdot m_H - 20~\text {GeV}< m_{bb} < m_H + 20~\text {GeV}\) for the \(n_b=2\) category, and \( 0.85\cdot m_H - 25~\text {GeV}< m_{bb} < m_H + 50~\text {GeV}\) for the \(n_b \ge 3\) category. The wider window for \(n_b\ge 3\) is motivated by a slightly poorer resolution due to potential b-jet misassignments. The b-jets that are matched to the H boson are the highest-\(p_{\text {T}}\) b-jets in the event and, hence, in the case of b-associated production, where more b-jets are present, may not be the ones that actually come from the \(H\rightarrow bb\) decay. In b-associated production, the fraction of A bosons for which the correct b-jets are chosen is in the range 50–90% for the \(n_b\ge 3\) category and is at least 65% for the \(n_b=2\) category.

The signal efficiency in the \(n_b=2\) category after the \(m_{bb}\) window requirement is 5.1–11% (2.5–6.6%) for gluon–gluon fusion (b-associated) production, depending on the \(m_A\) and \(m_H\) values. Similarly, the efficiency in the \(n_b \ge 3\) category after the \(m_{bb}\) window requirement is 1.3–3.2% for b-associated production. The quoted numbers refer to the efficiencies for A bosons decaying into ZH, with \(Z\rightarrow ee/\mu \mu /\tau \tau \) and \(H\rightarrow bb\), to pass the event selection for each of the categories. The inclusion of \(Z\rightarrow \tau \tau \) in this definition lowers the quoted signal efficiency because these decays have a very small efficiency to pass in this selection (which aims at \(Z\rightarrow ee/\mu \mu \)). The signal region selection is summarised in Table 1.

Fig. 3
figure 3

The \(p\mathrm{TZ}\) distributions for a the \(n_b=2\) and b the \(n_b \ge 3\) category. The events are required to satisfy all the signal region criteria with the exception of the \(m_{bb}\) window requirement. The same conventions as in Fig. 2 are used

The \(m_{\ell \ell bb}\) distribution after the \(m_{bb}\) requirement is the final discriminating variable, which is fitted to obtain the result of the search in this channel. To improve the \(m_{\ell \ell bb}\) resolution, the bb system’s four-momentum components are scaled to match the assumed H boson mass and the \(\ell \ell \) system’s four-momentum components are scaled to match the Z boson mass. This procedure, performed after the event selection, improves the \(m_{\ell \ell bb}\) resolution by a factor of two without significantly distorting the background distributions, resulting in an A boson mass resolution that is at best about 1% and up to 4% for gluon–gluon fusion, up to 10% for b-associated production in the \(n_b=2\) category and up to 16% for b-associated production in the \(n_b\ge 3\) category, depending on the \(m_A\) and \(m_H\) values.

Despite the dedicated selection criteria against Z+jets and top-quark production, these background processes dominate the signal region: the Z+jets contribution is \(\sim \)60–70% depending on the \(n_b\) category, while the top-quark contribution is \(\sim \)30–35%. In the \(n_b \ge 3\) category, other processes (\(t\bar{t}V\), dibosons, Vh) contribute up to \(\sim \)5% of the total background, while their contribution to the \(n_b = 2\) category is less than 1%. The accurate determination of Z+jets and top-quark contributions is paramount for the sensitivity of this search. Their estimation employs a combination of data-driven corrections to simulated events.

The most abundant background in this channel is from Z+jets production. The normalisation of this process is constrained by a control region defined by inverting the \(m_{bb}\) window criterion for each H boson mass hypothesis (see also Table 1). The control regions are distinct for the \(n_b = 2\) and \(n_b \ge 3\) categories, since the accuracy of the background simulation depends on the number of b-jets present in the event. The modelling of the Z+jets simulated events is examined extensively in a number of kinematic variables, including the \(p_{\text {T}} \) of the Z boson (\(p\mathrm{TZ} \)), the \(m_{bb}\) distribution and the \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) distribution. The simulated distributions of these variables are compared against data in a control region that requires two jets with exactly one of them being a b-jet, as well as an early selection stage, before the \(m_{bb}\) window and \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) requirements. For this early selection stage, it was verified that even those signals that were already excluded in Ref. [39] would be washed out by the background and would not bias the results. These regions are not used in the likelihood fit described in Sect. 8 and thus they are not included in Table 1. As a result of these studies, corrections to the distributions of \(p\mathrm{TZ} \), \(m_{bb}\) and \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) in the simulated Z+jets events are applied. The corrections are found to be uncorrelated and they are applied sequentially. The most significant effect on the sensitivity of this search (see also Sect. 7) is due to the corrections to the modelling of the \(p\mathrm{TZ} \) distribution, which range from \(+5\) to \(-10\%\) for most of the Z+jets events. As an example, Fig. 3 compares the \(p\mathrm{TZ}\) distributions in data with the background model after all corrections used in this search for events that satisfy all the requirements of the signal region with the exception of the \(m_{bb}\) window requirement, separately for \(n_b = 2\) and \(n_b \ge 3\) categories.

Top-quark production is heavily dominated by \(t\bar{t}\) production in which both top quarks decay semileptonically. Therefore, it is possible to define a pure top-quark control region by keeping the same selection as discussed previously, apart from an opposite-flavour lepton criterion, i.e. an \(e\mu \) pair is required instead of an ee or \(\mu \mu \) pair (see also Table 1). This region is used for top-pair production normalisation, and also to check that kinematic distributions such as the top-quark \(p_{\text {T}}\) spectrum are adequately modelled in simulation. Different control regions are used in the \(n_b =2\) and \(n_b \ge 3\) categories. This is because in the \(n_b \ge 3\) category the top-quark background is dominated by top-quark pair production in association with jets, which is more difficult to model than the inclusive top-quark pair production that dominates the top-quark background in the \(n_b =2\) category. Finally, the \(m_{bb}\) window requirement is also applied to the top-quark control region, resulting in a separate control region for each \(m_H\) hypothesis tested in the search. Good agreement within uncertainties is observed between data and simulation in the shape of all variables considered.

Backgrounds from diboson, single top-quark, and SM Higgs boson production, as well as \(t\bar{t}\) production in association with a vector boson are minor contributions to the total background composition. The shapes of their distributions are taken from simulation, whereas they are normalised using precise inclusive cross sections calculated from theory. The diboson samples are normalised using NNLO cross sections [114,115,116,117]. Single-top-quark production and top-quark-pair production in association with vector bosons are normalised to NLO cross sections from Refs. [118,119,120] and Ref. [58], respectively. The normalisation of SM Higgs boson production in association with a vector boson follows the recommendations of Ref. [63] using NNLO QCD and NLO electroweak corrections.

Fig. 4
figure 4

The \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{2\ell 4q}\) distribution shown before the requirement on this variable is applied. Corrections from a fit to the data are applied to the simulation, as described in Sects. 5.2 and 8. The notation VV in the legend corresponds to the production of diboson events. The signal distribution for (\(m_A,m_H) = (600, 300)\) \(\text {GeV}\) is also shown, and is normalised such that the production cross section times the branching ratios \(B(A\rightarrow ZH)\) and \(B(H\rightarrow WW)\) corresponds to 1 pb. The lower panel shows the ratio of the data to the background prediction (black filled circles) and the relative uncertainty, which includes both statistical and systematic components, in the background prediction (dashed area)

5.2 The \(\ell \ell WW\) final state

The decay \(A\rightarrow ZH\rightarrow \ell \ell WW \) features a pair of electrons or muons and four jets from the hadronic W boson decays. The selected events are required to have at least four jets with the highest- and second-highest-\(p_{\text {T}}\) jets satisfying \(p_{\text {T}} > 40~\text {GeV}\) and \(p_{\text {T}} > 30~\text {GeV}\), respectively. In addition, the lowest-\(p_{\text {T}}\) electrons or muons are required to have \(p_{\text {T}} > 15\) \(\text {GeV}\).

The selection of the correct jet pairs in the reconstruction of the two W boson candidates is important for improving the signal resolution and suppressing backgrounds. For this task, all possible jet pairs that can be formed by considering up to the five highest-\(p_{\text {T}}\) jets in the event are taken into account. A set of requirements on kinematic variables, such as the angular distances between the jets within a pair, the jet transverse momenta and the reconstructed masses of the W, H and A boson candidates, is optimised to test the various combinations for compatibility with the signal hypothesis so that the signal efficiency and background rejection are maximised. This procedure results in a signal efficiency that ranges from 50 to 70% depending on \(m_{A}\) and \(m_{H}\), whereas for background processes the efficiency is about 40%. The fraction of events in which the correct jet pairs are assigned to the W boson candidates after this procedure is in the range from 50 to 70%, depending on the \(m_A\) and \(m_H\) values.

The main background in this channel is from the production of a Z boson in association with jets. A criterion similar to that in the \(\ell \ell bb \) channel is employed to discriminate against it: \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{2\ell 4q}> 0.3\), where \(m_{2\ell 4q}\) is the six-body invariant mass of the two-lepton, four-jet system assigned to the A boson and the summation is performed over the \(p_{\text {T}} ^2\) of these objects. This discriminating variable is chosen for the same reasons as in the \(\ell \ell bb \) channel and it is optimised following the same considerations. The distribution of this variable before the requirement is applied is shown in Fig. 4.

Finally, the invariant mass of the four selected jets, \(m_{4q}\), must be compatible with the assumed H boson mass. This is ensured by requiring \(m_{4q}\) to be within optimised boundaries that depend on \(m_H\): \(m_{H} - 53~\text {GeV}< m_{4q}< 0.97 \cdot m_{H} + 54\) \(\text {GeV}\). After this requirement the signal efficiency for A bosons decaying into ZH with \(Z\rightarrow ee/\mu \mu /\tau \tau \) and \(H\rightarrow WW\rightarrow qqqq\) is 6.5–11%, depending on the \(m_A\) and \(m_H\) values. The signal region selection is summarised in Table 2.

Table 2 Summary of the event selection for signal and control regions in the \(A\rightarrow ZH\rightarrow \ell \ell WW\) channel

The \(m_{2\ell 4q}\) distribution after the \(m_{4q}\) requirement is the final discriminating variable, which is fitted to obtain the results of the search in this channel. To improve the \(m_{2\ell 4q}\) resolution, the four-jet system’s four-momentum components are scaled to match the assumed H boson mass and the \(\ell \ell \) system’s four-momentum components are scaled to match the Z boson mass. The final A boson mass resolution is in the range from 1% to 17% of \(m_A\), depending on the \(m_A\) and \(m_H\) values.

The dominant backgrounds after the event selection are from Z+jets (\(\sim \)90% of total background), top-quark (\(\sim \)5%), and diboson (\(\sim \)5%) production. Smaller backgrounds (W+jets, \(t\bar{t}h\), \(t\bar{t}V\), and Vh) contribute less than \(1\%\) to the total background and are not included in the background composition.

The shape of the Z+jets background is taken from simulation combined with data-driven corrections, and the normalisation is constrained by the control region outside the \(m_{4q}\) mass window of each signal region (see Table 2), using a procedure similar to that in the \(\ell \ell bb\) channel. To address shape differences between distributions of kinematic variables in data and simulated backgrounds, two corrections are applied to the \(p_{\text {T}}\) of the Z boson candidates and to the leading jet’s \(p_{\text {T}}\). Those corrections are derived from a control region orthogonal to the signal region, obtained by selecting \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{2\ell 4q}< 0.3\). This region is not used subsequently in the likelihood fit described in Sect. 8 and therefore it is not included in Table 2. The corrections are found to be uncorrelated and they are applied sequentially. The correction to the \(p\mathrm{TZ} \) distribution in the simulation is as large as 20% at low \(p\mathrm{TZ} \) values and it becomes smaller as \(p\mathrm{TZ} \) increases, whereas the correction to the leading jet’s \(p_{\text {T}} \) does not exceed \(\pm 10\)%. The distributions of the \(p_{\text {T}}\) of the Z boson candidates and of the leading jet’s \(p_{\text {T}}\), after the reweighting, are shown in Fig. 5 for events satisfying all requirements for the signal region with the exception of the \(m_{4q}\) window cut.

Fig. 5
figure 5

The distributions of a the \(p_{\text {T}}\) of the Z boson candidates and b the leading jet’s \(p_{\text {T}}\) in the \(\ell \ell WW\) channel. The events are required to satisfy all the signal region criteria with the exception of the \(m_{4q}\) window requirement. Data-driven corrections are applied, as described in the text. The same conventions as in Fig. 4 are used

The top-quark background shape is taken from simulated events. The normalisation is constrained using a high-purity control region defined by keeping the same selection as for the signal region, but replacing the electron or muon pairs by opposite-flavour leptons (\(e\mu \) pairs), as indicated in Table 2. The single-top-quark, Z+jets and diboson production contributions in this control region are estimated from simulation. The diboson background shape and normalisation are taken from the simulated samples, using the same cross-section calculation as in the \(\ell \ell bb\) channel.

Fig. 6
figure 6

Signal \(m_{\ell \ell bb}\) or \(m_{2\ell 4q}\) distributions assuming \(m_A=500\) \(\text {GeV}\) and \(m_H=300\) \(\text {GeV}\) for the following cases: \(\ell \ell bb\) channel: a gluon–gluon fusion in the \(n_b = 2\) category, b b-associated production in the \(n_b = 2\) category, and c b-associated production in the \(n_b \ge 3\) category; d \(\ell \ell WW\) channel. In the upper panels, the black filled circles correspond to the simulated distributions, which are compared against the interpolated parameterised signal distributions shown as solid red curves. Also in the same panels, the shape variations of the interpolated parameterised signal distributions are shown in dotted blue (\(+1\sigma \)) and black (\(-1\sigma \)) lines. In the lower panels, the black filled circles correspond to the ratio of the simulation to the interpolated parameterised curve. The dotted blue (black) line corresponds to the ratio of the \(+1\sigma \) (\(-1\sigma \)) shape variation of the interpolated curve to the interpolated curve

6 Signal modelling

This analysis searches for two new particles, with their mass hypotheses considered in the two-dimensional space \(m_A\)\(m_H\), with good mass resolution of the A and H reconstructed final states. The investigation of the relevant phase space requires a large number of signal mass hypotheses to be tested. In addition, various new physics scenarios which are of interest for this search, like the 2HDM, include A bosons with natural widths comparable to, or larger than, the experimental mass resolution for large parts of the parameter space in which this search has sensitivity. The H bosons are considered to always have negligible natural width, in accordance with the 2HDM scenarios used to interpret this search (see Sect. 8). For these reasons, the \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) distributions can be simulated only for some (\(m_A\), \(m_H\)) points and an interpolation using analytic functions is employed for the rest, following a procedure similar to that used in Ref. [39].

In the cases where the natural widths of both the A and H bosons are much smaller than the experimental mass resolution, the modelling of the mass distributions uses two types of parametric functions. First, an ExpGaussExp (EGE) function [39, 121] provides a good description of gluon–gluon fusion production of A bosons in the \(n_b = 2\) category of the \(\ell \ell bb\) channel. Second, a double-Gaussian Crystal Ball (DSCB) function [39, 122] gives a good description of gluon–gluon fusion production in the \(\ell \ell WW\) channel and b-associated production in both the \(n_b = 2\) and \(n_b \ge 3\) categories of the \(\ell \ell bb\) channel.

Both the EGE and DSCB functions have a Gaussian core but they differ in the way the tails are treated. The tails of the EGE function are exponential, described by two parameters, whereas DSCB has power-law tails described by four extra parameters. The values of the function parameters are extracted from unbinned maximum-likelihood fits to the simulated \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) distributions. Polynomial functions are used to interpolate the parameters to mass points that were not simulated. These interpolated parametric functions are used to model the signal mass shapes for all the signal assumptions considered in this search. The fit uncertainties of the DSCB and EGE function parameters, as well as the parameters of the polynomial functions used for the interpolation, are used to derive a shape uncertainty for each of the interpolated distributions.

A typical example of the result of the signal parameterisation is shown in Fig. 6 for the \((m_A, m_H)=(500, 300)\) \(\text {GeV}\) mass point. The figure shows a comparison of the simulated mass distribution and the interpolated parametric function, as well as the shape variation that is taken as an estimate of the systematic uncertainty from the procedure. In general, the cores of the \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) distributions are well-parameterised by the chosen functional forms. There are some small differences between the function description and the simulated distribution in the tails of the distributions, but those have negligible effects on the final results and they are covered by interpolation uncertainties.

The parameterisation procedure described in the previous paragraph is modified to allow for cases where the width of the A boson is comparable to, or larger than, the experimental mass resolution. This can be modelled by convolving a modified Breit–Wigner distributionFootnote 4 with the EGE or DSCB function. This procedure is valid as long as the width of the H boson remains narrow relative to the experimental resolution, which is the case for the 2HDM scenarios considered in Sect. 8. Widths of up to approximately 20% of the A boson mass are considered, which is the range relevant to the sensitive parameter space of the 2HDM scenarios that are of interest for this search.

Finally, the signal efficiencies for the interpolated mass points are obtained through separate two-dimensional interpolations on the \((m_A, m_H)\) plane using thin-plate splines [123].

7 Systematic uncertainties

Several sources of systematic uncertainty in the signal and background estimates are considered, including experimental and theoretical sources. Experimental uncertainties comprise those in the luminosity measurement [124] (obtained using the LUCID-2 detector [53]), trigger, object identification, energy/momentum scale and resolution as well as underlying-event and pile-up modelling [98, 102, 103, 107]. These uncertainties impact the simulations of signal and background processes.

The signal and background modelling have associated theoretical uncertainties. For the signal modelling, the uncertainties due to the factorisation and renormalisation scale choice, the initial- and final-state radiation treatment and the PDF choice are considered. No additional signal modelling uncertainties related to model-specific cross-section predictions, such as the 2HDM predictions used in Sect. 8, are considered. The renormalisation and factorisation scales are varied up and down separately by a factor of two, and the largest deviation from the nominal signal is taken as the estimated uncertainty. The uncertainties due to initial- and final-state radiation as well as the multiple parton interaction modelling are estimated using a subset of A14 tuning variations [60]. PDF uncertainties are computed using the prescription from PDF4LHC15 [88], which include the envelope of three PDF sets, namely CT14, MMHT2014 and NNPDF3.0.

Additional systematic uncertainties are assigned to cover the differences in signal efficiencies and \(m_{\ell \ell bb}\) and \(m_{\ell \ell WW}\) resolution differences between the interpolations and the simulations, as shown by the dotted blue and black lines in the lower panels of Fig. 6.

For the background modelling, the most important sources of systematic uncertainty are the modelling of shapes of several kinematic distributions of Z+jets events. In the \(\ell \ell bb\) channel, they arise from the shape corrections for the \(p\mathrm{TZ}\), \(\sqrt{\Sigma p_{\text {T}} ^2}/m_{\ell \ell bb}\) and \(m_{bb}\) variables described in Sect. 5.1. An uncertainty is estimated by comparing the corrections and the agreement between the background prediction and the data for various variables and among various control regions. For each of the corrections, the applied uncertainty is half the size of the correction in the \(n_b = 2\) category, and the full size of the correction in the \(n_b \ge 3\) category. In the \(\ell \ell WW\) channel, the uncertainties are due to the shapes of the \(p\mathrm{TZ}\) and leading-jet \(p_{\text {T}} \) distributions (Sect. 5.2). The uncertainty is estimated similarly to that in the \(\ell \ell bb\) channel and is half the size of the correction. For other background processes, modelling uncertainties are obtained by varying the factorisation and renormalisation scales, and the amount of initial- and final-state radiation.

The effect of these systematic uncertainties on the search is studied using a signal-strength parameter \(\mu \) for hypothesised signal production (see also Sect. 8). The uncertainties found to have the largest impact depend on the choice of \((m_A, m_H)\) signal point. Table 3 shows the relative uncertainties in the \(\mu \) value from the leading sources of systematic uncertainty for two example mass points of gluon–gluon fusion and b-associated production for the \(\ell \ell bb \) channel. The uncertainties are evaluated using an Asimov dataset [125] generated with the signal cross section set to the expected limits for the particular \((m_A, m_H)\) signal point, considering a narrow-width A boson. Table 4 shows the same information for the \(\ell \ell WW\) channel. The leading sources of systematic uncertainty are similar for other mass points studied and for larger A boson widths.

For the \(\ell \ell bb\) channel, the most relevant sources of systematic uncertainty are the background modelling, the signal interpolation, and the jet energy scale and resolution. The limited size of the simulated samples has a higher impact at low masses, since at higher masses other sources are more dominant. Other systematic uncertainties with non-negligible impact include those associated with b-tagging and theoretical errors. In the \(\ell \ell WW\) channel, the most relevant systematic uncertainties are those related to the jet energy scale and resolution, as expected in a channel with four jets in the final state. The limited size of the simulated samples, the background modelling and the signal interpolation also have a non-negligible impact on the signal-strength parameter. In both channels, the data statistical uncertainties have lower impact at low masses compared to the systematic uncertainties. In addition, the search sensitivity is affected at high masses by the limited size of the data sample, an effect which is more pronounced in the \(\ell \ell bb\) channel.

Table 3 The effect of the most important sources of uncertainty on the signal-strength parameter at two example mass points of \((m_A, m_H)=(230, 130)\) \(\text {GeV}\) and \((m_A, m_H)=(700, 200)\) \(\text {GeV}\) in the \(\ell \ell bb\) channel, for both gluon–gluon fusion and b-associated production of a narrow-width A boson. The signal cross sections are taken to be the expected median upper limits (see Sect. 8) and they correspond to values that are shown next to the indicated mass points. JES and JER stand for jet energy scale and jet energy resolution, ‘Sim. stat.’ for simulation statistics, ‘Sig. interp.’ for signal interpolation, and ‘Bkg. model.’ for the background modelling. ‘Theory’ refers to theoretical uncertainties in the signal samples due to the PDF choice, factorisation and renormalisation scales, and initial- and final-state radiation
Table 4 The effect of the most important sources of uncertainty on the signal-strength parameter at two example mass points of \((m_A, m_H)=(500, 300)\) \(\text {GeV}\) and \((m_A, m_H)=(700, 200)\) \(\text {GeV}\) in the \(\ell \ell WW\) channel for gluon–gluon fusion production of a narrow-width A boson. The same notation as in Table 3 is used

8 Results

The \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) distributions are expected to exhibit a resonant structure if signal events are present, while background events result in a smoothly falling spectrum. Therefore, those are chosen as the final variables to discriminate between signal and background. The shape differences between the signal and background contributions in the \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) distributions are exploited through binned maximum-likelihood fits of the signal-plus-background hypotheses to extract potential signal contributions. The fits are based on the statistical framework described in Refs. [125,126,127]. For a given mass hypothesis of \((m_A, m_H)\), the likelihood is constructed as the product of Poisson probabilities for event yields in the \(m_{\ell \ell bb}\) or \(m_{2\ell 4q}\) bins:

$$\begin{aligned}&L(\mu ,\vec {\alpha },\vec {\theta } | m_A,m_H) \\&\quad = \prod _{i={\text {bins}}} {\text {Poisson}}\left( N_i \bigg |\left( \mu \times S_i(m_A,m_H,\vec {\theta })+B_i(\vec {\alpha },\vec {\theta })\right) \right) \cdot G(\vec {\theta }), \end{aligned}$$
Fig. 7
figure 7

The \(m_{bb}\) distribution before any \(m_{bb}\) window cuts for the a \(n_b=2\) and b \(n_b \ge 3\) categories. The signal distribution for (\(m_A\), \(m_H\)) = (600, 300) \(\text {GeV}\) is also shown, and is normalised such that the production cross section times the branching ratios \(B(A\rightarrow ZH)\) and \(B(H\rightarrow bb)\) corresponds to 1 pb. The same conventions as in Fig. 2 are used

where \(N_i\) is the number of observed events, and \(S_i(m_A, m_H, \vec {\theta })\) and \(B_i(\vec {\alpha },\vec {\theta })\) are the expected number of signal events and estimated number of background events in bin i. The vector \(\vec {\alpha }\) represents free background normalisation scale factors (described later) and the vector \(\vec {\theta }\) denotes all non-explicitly listed parameters of the likelihood function such as nuisance parameters associated with systematic uncertainties. Systematic uncertainties are incorporated in the likelihood as nuisance parameters with either Gaussian or log-normal constraint terms, denoted by \(G(\vec {\theta })\) in the formula above. The parameter of interest, \(\mu \), is a multiplicative factor applied to the expected signal rate. The \(m_{\ell \ell bb}\) and \(m_{2\ell 4q}\) bin widths are chosen according to the expected detector resolution and taking into account the statistical uncertainty in the number of simulated background events. The bin centres are adjusted such that at least 65% of the test signal is contained in one bin.

Fig. 8
figure 8

The \(m_{\ell \ell bb}\) mass distribution for the \(m_{bb}\) windows defined for \(m_H=300~\text {GeV}\) and \(m_H=500~\text {GeV}\) for a, c the \(n_b=2\) and b, d the \(n_b\ge 3\) category, respectively. Signal distributions with \((m_A, m_H) = (600, 300)\) \(\text {GeV}\) and \((m_A, m_H) = (670, 500)\) \(\text {GeV}\) are also shown for gluon–gluon fusion production in (a, c) and b-associated production in (b, d). The number of entries shown in each bin is the number of events in that bin divided by the width of the bin. The same conventions as in Fig. 2 are used

For each bin, \(S_i\) is calculated from the total integrated luminosity, the assumed cross section times branching ratio for the signal and its selection efficiency. The sum of all background contributions in the bin, \(B_i\), is estimated from simulation, which includes the modelling corrections discussed in Sects. 5.1 and 5.2. The number of events in the \(t\bar{t}\) and Z+jets control regions is included in the likelihood calculation to constrain their normalisation in the signal regions. This is achieved by introducing two free normalisation scale factors per channel, represented by \(\vec {\alpha }\) in the likelihood description earlier in this section. In the \(\ell \ell bb\) channel these scale factors apply to the \(t\bar{t}\) contribution and the heavy-flavour component of the Z+jets contribution, whereas the rest of the contributions in the control region are estimated from simulation. In the \(\ell \ell WW\) channel the scale factors apply to the \(t\bar{t}\) contribution and the flavour-inclusive Z+jets contribution. Typical values of the scale factors are close to unity with the exception of Z+jets in the \(\ell \ell bb\) channel, which is scaled by a factor of 1.2, and \(t\bar{t}\) in the \(\ell \ell bb\) \(n_{b} \ge 3\) category, which is typically scaled by a factor of 1.4.

Fig. 9
figure 9

Upper bounds at 95% CL on the production cross section times the branching ratio \(B(A\rightarrow ZH)\times B(H\rightarrow bb)\) in pb for a, b gluon–gluon fusion and c, d b-associated production. The expected upper limits are shown in (a) and (c) and the observed upper limits are shown in (b) and (d)

Fig. 10
figure 10

Observed and expected 95% CL exclusion regions for the \(\ell \ell bb\) channel in the \((m_H, m_A)\) plane for various \(\tan \beta \) values for the a type-I, b type-II, c lepton-specific and d flipped 2HDM, with \(\cos (\beta -\alpha )=0\)

The signals that are fitted in each category are motivated by signal efficiency considerations and the interpretation of the search in the context of the 2HDM. In the \(\ell \ell bb\) channel the following fits are performed. First, A bosons produced by gluon–gluon fusion are considered in the \(n_b=2\) category. Second, a combined fit for the b-associated production mechanism in both the \(n_b=2\) and \(n_b\ge 3\) categories is performed. Finally, there is a combination of the b-associated production fit with the gluon–gluon fusion fit, which is interpreted in the context of the 2HDM. In the \(\ell \ell WW\) channel, only A bosons produced by gluon–gluon fusion are considered and, hence, it is the only fit that is considered.

8.1 \(A \rightarrow ZH \rightarrow \ell \ell bb\) results

The \(m_{\ell \ell bb}\) distributions from different \(m_{bb}\) mass windows are scanned for potential excesses beyond the background expectations through signal-plus-background fits. The scan is performed in steps of 10 \(\text {GeV}\) for both the \(m_A\) range 230–800 \(\text {GeV}\) and the \(m_H\) range 130–700 \(\text {GeV}\), such that \(m_A-m_H \ge 100~\text {GeV}\). The step sizes are chosen to be compatible with the detector resolution for \(m_{\ell \ell bb}\) and \(m_{bb}\). In total, there are 58 \(m_{bb}\) windows that are probed for the \(n_b=2\) and \(n_b\ge 3\) categories. The overall number of (\(m_A\), \(m_H\)) signal hypotheses that are tested is 1711 per category.

Figure 7 shows the distribution of the H boson candidate mass \(m_{bb}\) before the \(m_{bb}\) window requirement in each of the two categories. Typical examples of \(m_{\ell \ell bb}\) distributions after the application of the \(m_{bb}\) window requirement are shown in Fig. 8a–d. In particular, the \(m_{bb}\) window defined for \(m_H = 300\) \(\text {GeV}\) is shown in Fig. 8a, b for the \(n_b=2\) and \(n_b\ge 3\) categories, respectively. On the same figures, a signal distribution is shown as well, which corresponds to gluon–gluon fusion production in Fig. 8a and b-associated production in Fig. 8b, for the \((m_A, m_H) = (600, 300)\) \(\text {GeV}\) signal point. Similarly, an \(m_{bb}\) window defined for \(m_H = 500\) \(\text {GeV}\) is shown Fig. 8c, d for the \(n_b=2\) and \(n_b\ge 3\) categories, respectively. The signal distribution for the \((m_A, m_H) = (670, 500)\) \(\text {GeV}\) signal point is also shown for gluon–gluon fusion production in Fig. 8c and b-associated production in Fig. 8d.

In all cases, the data are found to be well described by the background model. The most significant excess for the gluon–gluon fusion production signal assumption is at the \((m_A, m_H) = (610, 290)\) \(\text {GeV}\) signal point, for which the local (global) significance [128] is 3.1 (1.3) standard deviations. For b-associated production, the most significant excess is at the \((m_A, m_H) = (440, 220)\) \(\text {GeV}\) signal point, for which the local (global) significance is 3.1 (1.3) standard deviations. The significances are calculated for each production process separately, ignoring the contribution from the other.

In the absence of any statistically significant excess, the results of the search in this channel are interpreted as upper limits on the production cross section of an A boson decaying into ZH followed by the \(H\rightarrow bb\) decay, \(\sigma \times B(A\rightarrow ZH)\times B(H\rightarrow bb)\). The cross-section upper limits consider A bosons that are produced only by a single mechanism, i.e. either gluon–gluon fusion or b-associated production. Modified frequentist [129] 95% confidence level (CL) upper limits on the production cross section of this process are obtained using the asymptotic approximation [125] for the various signal hypotheses that are tested. In particular, expected and observed upper limits for gluon–gluon fusion production of narrow-width A bosons in the \(n_b=2\) category are shown in Fig. 9a, b, respectively. For b-associated production of narrow-width A bosons, the expected and observed limits for the combination of the \(n_b=2\) and \(n_b\ge 3\) categories are shown in Fig. 9c, d, respectively. The upper limits for gluon–gluon fusion vary from 6.2 fb for \((m_A, m_H) = (780, 129)\) \(\text {GeV}\) to 380 fb for \((m_A, m_H) = (250, 150)\) \(\text {GeV}\). This is to be compared with the corresponding expected limits of 15 fb and 240 fb for these two signal hypotheses. For b-associated production the upper limit varies from 6.8 fb for \((m_A, m_H) = (760, 220)\) \(\text {GeV}\) to 210 fb for \((m_A, m_H) = (230, 130)\) \(\text {GeV}\), whereas the corresponding expected limits are 15 fb and 160 fb.

Upper limits are also calculated for signal assumptions where the natural width of the A boson is large in comparison with the experimental mass resolution, which is needed for the interpretation of the search in the context of the 2HDM. The cross-section upper limit decreases as the natural width of the A boson increases. In particular, a gluon–gluon produced A boson with a natural width of 10% of its mass has a cross-section upper limit that is reduced on average by a factor of approximately 3 from the narrow-width case. This factor becomes approximately 4 when the natural width increases to 20%. The A bosons from b-associated production have worse experimental mass resolution and the deterioration of the limit is on average smaller: the upper limits are reduced by a factor of about 1.9 (2.3) for a natural width of 10% (20%).

The results for A boson natural widths that are comparable to, or larger than, the experimental mass resolution are used for the interpretation of the search in the context of the CP-conserving 2HDM. The 2HDM benchmark against which the search results are compared has three free parameters: \(m_A\), \(m_H\) and \(\tan \beta \). In addition, there are four ways to assign the Yukawa couplings to fermions, defining type-I, type-II, lepton-specific and flipped 2HDMs. The remaining parameters are fixed. The mass of the lightest Higgs boson in the model is fixed to \(125~\text {GeV}\) and its couplings are set to be the same as those of the SM Higgs boson by choosing \(\cos (\beta -\alpha )=0\) [19], which is known as the 2HDM weak decoupling limit. The charged Higgs boson is assumed to have the same mass as the A boson and the potential parameter \(m^2_{12}\) [18] is fixed to \(m^2_{A}\tan \beta /(1+\tan ^2\beta )\).

Fig. 11
figure 11

The \(m_{4q}\) distribution before any \(m_{4q}\) window cuts. The same conventions as in Fig. 4 are used

The cross sections for A boson production in the 2HDM are calculated using corrections at up to NNLO in QCD for gluon–gluon fusion and b-associated production in the five-flavour scheme as implemented in SusHi [130,131,132,133]. For b-associated production a cross section in the four-flavour scheme is also calculated as described in Refs. [134, 135] and the results are combined with the five-flavour scheme calculation following Ref. [136]. The Higgs boson widths and branching ratios are calculated using 2HDMC [137]. The procedure for the calculation of the cross sections and branching ratios, as well as for the choice of 2HDM parameters, follows Ref. [63].

Fig. 12
figure 12

The \(m_{2\ell 4q}\) mass distribution for the \(m_{4q}\) windows defined for a \(m_H=300~\text {GeV}\) and b \(m_H=500~\text {GeV}\). Signal distributions are also shown with \((m_A, m_H) = (600, 300)\) \(\text {GeV}\) and \((m_A, m_H) = (670, 500)\) \(\text {GeV}\). The number of entries shown in each bin is the number of events in that bin divided by the width of the bin. The same conventions as in Fig. 4 are used

The interpretation of the search in the 2HDM is performed in the \((m_H, m_A)\) plane, as shown in Fig. 10. In this plot, colour-shaded areas indicate expected and observed exclusions for various \(\tan \beta \) values. There is one plot for each of the four 2HDM types. For the type-I and lepton-specific 2HDMs, only gluon–gluon fusion production is relevant. The exclusion region reaches \(m_H \lesssim \) 350 \(\text {GeV}\) for \(\tan \beta =1\) and the sensitivity decreases for larger \(\tan \beta \) values. In type-I 2HDM for instance, for \(\tan \beta =10\) the exclusion reaches \(m_H \lesssim \) 320 \(\text {GeV}\) and \(m_A \lesssim \) 500 \(\text {GeV}\). The limiting value at \(m_H \simeq 350\) \(\text {GeV}\) is due to the drop of the \(H\rightarrow bb\) branching ratio, which competes with \(H\rightarrow t\bar{t}\) at larger \(m_H\) values. The type-II and flipped 2HDMs are dominated by A bosons from b-associated production as \(\tan \beta \) increases, although gluon–gluon fusion is still important for \(\tan \beta \approx 1\). Like the type-I and lepton-specific 2HDMs, the type-II and flipped 2HDMs provide similar constraints because they only differ in the lepton Yukawa couplings. The contribution from b-associated signal production increases the sensitivity at large \(\tan \beta \) values, excluding \(m_H \lesssim \) 650 \(\text {GeV}\) for \(\tan \beta =20\). The search sensitivity deteriorates at lower \(\tan \beta \) values, excluding \(m_H \lesssim \) 350 \(\text {GeV}\) for \(\tan \beta =1\).

8.2 \(A \rightarrow ZH \rightarrow \ell \ell WW\) results

The \(m_{2\ell 4q}\) distributions from different \(m_{4q}\) mass windows are scanned for possible excesses using a procedure similar to the one in the \(\ell \ell bb\) channel. The scan is performed in steps of 10 \(\text {GeV}\) for both the \(m_A\) range 300–800 \(\text {GeV}\) and the \(m_H\) range 200–700 \(\text {GeV}\), such that \(m_A - m_H \ge 100~\text {GeV}\). This gives in total 51 \(m_{4q}\) mass windows and the overall number of (\(m_A\), \(m_H\)) signal hypotheses that are tested is 1326.

Figure 11 shows the distribution of the H boson candidate mass \(m_{4q}\) before the \(m_{4q}\) window requirement. Typical examples of \(m_{2\ell 4q}\) distributions after the application of the \(m_{4q}\) window requirement are shown in Fig. 12a, b, referring to \(m_{4q}\) windows defined for \(m_H=300~\text {GeV}\) and \(m_H=500~\text {GeV}\), respectively. Signal distributions corresponding to the \((m_A, m_H) = (600, 300)\) \(\text {GeV}\) signal point for Fig. 12a and the \((m_A, m_H) = (670, 500)\) \(\text {GeV}\) signal point for Fig. 12b are also shown.

In all cases, the data are found to be well described by the background model. The most significant excess is at the \((m_A, m_H) = (440, 310)\) \(\text {GeV}\) signal point, for which the local (global) significance is 2.9 (0.82) standard deviations.

Using the same method as for the \(\ell \ell bb\) channel, constraints on the production of \(A\rightarrow ZH\) followed by \(H\rightarrow WW\) decay are derived. The 95% CL upper limits are shown in Fig. 13 for a narrow-width A boson produced via gluon–gluon fusion. The upper limit varies from 0.023 pb for the \((m_A, m_H) = (770, 660)\) \(\text {GeV}\) signal point to 8.9 pb for the \((m_A, m_H) = (340, 220)\) \(\text {GeV}\) signal point. This is to be compared with the corresponding expected limits of 0.041 pb and 3.6 pb for these two signal points. The upper limits deteriorate when the natural width of the A boson is comparable to, or larger than, the experimental mass resolution. In particular, for a natural width that is 10% of \(m_A\) the upper limits decrease on average by a factor of 3. This factor becomes approximately 5 when the natural width increases to 20%.

Fig. 13
figure 13

Expected (a) and observed (b) upper bounds at 95% CL on the production cross section times the branching ratio \(B(A\rightarrow ZH)\times B(H\rightarrow WW)\) in pb

Fig. 14
figure 14

Observed and expected 95% CL exclusion regions in the (\(\cos (\beta -\alpha )\), \(m_A\)) plane for various \(\tan \beta \) values for a, b \(m_H=200~\text {GeV}\) and c, d \(m_H = 240~\text {GeV}\) in the context of type-I 2HDM for the \(\ell \ell WW\) channel

The sensitivity of the \(\ell \ell WW\) channel in the context of the CP-conserving 2HDM was examined. The same 2HDM calculations as in the \(\ell \ell bb\) channel are used and the only differences are related to the parameter space of the model that is probed. In particular, because only A bosons produced by gluon–gluon fusion are studied in this search, only type-I and lepton-specific 2HDMs are considered. In addition, the partial width \(\Gamma (H\rightarrow WW)\) vanishes when \(\cos (\beta -\alpha )=0\) and is maximal at \(|\cos (\beta -\alpha )|=1\), whereas for the partial width \(\Gamma (A\rightarrow ZH)\) the opposite is true, i.e. it vanishes when \(|\cos (\beta -\alpha )|=1\) and it is maximal when \(\cos (\beta -\alpha )=0\). These observations imply that this channel should be most sensitive between these two extreme values of \(|\cos (\beta -\alpha )|\).

The interpretation of the observed and expected upper limits on the cross section times branching ratio in the context of the type-I and lepton-specific 2HDM scenarios show that the \(\ell \ell WW\) channel has little sensitivity in regions that are not already excluded by the 125 \(\text {GeV}\) Higgs boson coupling measurements [3], an analysis that also provides similar limits in this parameter space. In particular, for the \(m_A\) range considered in this channel, there is sensitivity up to \(m_H < 250~\text {GeV}\) and for \(\tan \beta < 4\). Some examples of 95% CL excluded regions in the plane defined by \(m_A\) and \(\cos (\beta -\alpha )\) for \(m_{H}=200~\text {GeV}\) and \(m_{H}=240~\text {GeV}\) are shown in Fig. 14 for the type-I 2HDM. The results are very similar for the lepton-specific 2HDM, since the only difference between the two 2HDM types is the lepton Yukawa couplings, which only affect the total width.

9 Conclusion

Data recorded by the ATLAS experiment at the LHC, corresponding to an integrated luminosity of 139 \(\text {fb}^{-1}\) from proton–proton collisions at a centre-of-mass energy 13 \(\text {TeV}\), are used to search for a heavy Higgs boson, A, decaying into ZH, where H denotes another heavy Higgs boson with mass \(m_H>125~\text {GeV}\). Two final states were considered, where the H boson decays into a pair of b-quarks or W bosons, and in both cases the Z boson decays into a pair of electrons or muons. In the \(\ell \ell bb\) channel, the A boson is assumed to be produced via either gluon–gluon fusion or b-associated production. In the \(\ell \ell WW\) channel, only gluon–gluon fusion production is considered. No significant deviation from the SM background predictions is observed in the \(ZH\rightarrow \ell \ell bb\) and \(ZH\rightarrow \ell \ell WW \rightarrow \ell \ell qqqq\) final states that are considered in this search. Considering each channel and each production process separately, upper limits are set at the 95% confidence level for \(\sigma \times B(A\rightarrow ZH)\times B(H\rightarrow bb \; \text {or} \; H \rightarrow WW)\). For \(\ell \ell bb\), upper limits are set in the range 6.2–380 fb for gluon–gluon fusion and 6.8–210 fb for b-associated production of a narrow A boson in the mass range 230–800 \(\text {GeV}\), assuming the H boson is in the mass range 130–700 \(\text {GeV}\). For \(\ell \ell WW\), the observed upper limits are in the range 0.023–8.9 pb for gluon–gluon fusion production of a narrow A boson in the mass range 300–800 \(\text {GeV}\), assuming the H boson is in the mass range 200–700 \(\text {GeV}\). Taking into account both production processes, the \(\ell \ell bb\) search tightens the constraints on the 2HDM scenario in the case of large mass splittings between its heavier neutral Higgs bosons. The \(\ell \ell WW\) channel has not been explored previously at the LHC, and this search explicitly demonstrates its potential to constrain 2HDM parameters away from the weak decoupling limit.