1 Introduction

Following the discovery of the Higgs boson by the ATLAS and CMS Collaborations [1, 2] at the Large Hadron Collider (LHC), a comprehensive programme of measurements of the properties of this particle is underway. These measurements could uncover deviations from expected Standard Model (SM) branching ratios or allow for the possibility of decays into non-SM particles. Existing measurements constrain the non-SM or “exotic” branching ratio of the Higgs boson decays to less than approximately \(30\,\%\) at \(95\,\%\) confidence level (CL) [35]. Exotic decays are predicted by many theories of physics beyond the SM [6], including those with an extended Higgs sector such as the Next-to-Minimal Supersymmetric Standard Model (NMSSM) [711], several models of dark matter [1216], models with a first-order electroweak phase transition [17, 18], and theories with neutral naturalness [1921].

One of the simplest possibilities is that the Higgs boson decays to a pair of new spin-zero particles, a, which in turn decay to a pair of SM particles, mainly fermions [6].Footnote 1 These kinds of models have been used to explain the recent observations of a gamma-ray excess from the galactic centre by the Fermi Large Area Telescope (FermiLAT) [22, 23]. Several searches have been performed for \(H\rightarrow aa\). The D0 and ATLAS Collaborations have searched for a signal of \(H\rightarrow aa \rightarrow 2\mu 2\tau \) in the a-boson mass ranges \(3.7\,\mathrm{{GeV}} \le m_a \le 19\,\mathrm{{GeV}}\) and \(3.7\,\mathrm{{GeV}} \le m_a \le 50\,\mathrm{{GeV}}\), respectively [24, 25]. The D0 and CMS Collaborations have searched for the signature \(H \rightarrow aa \rightarrow 4\mu \) in the range \(2 m_{\mu } \le m_a \le 2 m_{\tau }\) [24, 26]. In this analysis, the a-boson is assumed to have a negligibly small lifetime. Several other searches have been performed by the ATLAS, CMS and LHCb Collaborations for signatures that may correspond to a long-lived a-boson: displaced decays of jets or displaced decays of collimated leptons [2732].

The result presented in this paper covers an unexplored decay mode in searches for \(H \rightarrow aa\) by considering \(a\rightarrow bb\). The a-boson can be either a scalar or a pseudoscalar under parity transformations, since the decay mode considered in this search is not sensitive to the difference in coupling. An example of a model with predominant \(a\rightarrow bb\) decays is one where the new scalar mixes with the SM Higgs boson and inherits its Yukawa couplings [6]. This search focuses on the \(pp \rightarrow WH\) process, with \(W\rightarrow \ell \nu \) (\(\ell =e,\mu \)) and \(H \rightarrow 2a \rightarrow 4b\) in the range \(20\,\mathrm{{GeV}}<m_a<60\,\mathrm{{GeV}}\). The resulting signature has a single lepton accompanied by a high multiplicity of jets originating from a bottom quark (b-jets). Since the b-jets are produced from the decay of the Higgs boson, they tend to have low transverse momentum (\(p_{\text {T}} \)) compared to \(m_H\) and can be overlapping, especially for light a-bosons. Events with an electron or muon, including those produced via leptonically decaying \(\tau \)-leptons, are considered. The WH process is chosen for this search because the charged lepton in the final state provides a powerful handle to efficiently trigger and identify these events against the more ubiquitous background process of strong production of four b-jets. Several kinematic variables, including the reconstructed masses in the decay \(H\rightarrow 2a \rightarrow 4b\), are used to identify signal events. The background estimation techniques, systematic uncertainties and statistical treatment closely follow those used in other ATLAS searches with similar signatures [3336].

2 ATLAS detector

The ATLAS detector [37] covers nearly the entire solid angleFootnote 2 around the collision point. It consists of an inner tracking detector surrounded by a thin superconducting solenoid magnet producing a 2 T axial magnetic field, electromagnetic and hadronic calorimeters, and an external muon spectrometer incorporating three large toroid magnet assemblies. The inner detector consists of a high-granularity silicon pixel detector, including the newly installed insertable B-layer [38], and a silicon microstrip tracker, together providing precision tracking in the pseudorapidity range \(|\eta |<2.5\), complemented by a transition radiation tracker providing tracking and electron identification information for \(|\eta |<2.0\). The electromagnetic (EM) sampling calorimeter uses lead as the absorber material and liquid argon (LAr) as the active medium, and is divided into barrel (\(|\eta |<1.475\)) and end-cap (\(1.375<|\eta |<3.2\)) regions. Hadron calorimetry is also based on the sampling technique, with either scintillator tiles or LAr as the active medium, and with steel, copper, or tungsten as the absorber material. The scintillator tile calorimeter is divided into barrel (\(|\eta |<1.0\)) and end-cap (\(0.8<|\eta |<1.7\)) regions, and the LAr hadronic calorimeter includes an end-cap (\(1.5<|\eta |<3.2\)) and a forward (\(3.1<|\eta |<4.9\)) region. The muon spectrometer measures the deflection of muons with \(|\eta |<2.7\) using multiple layers of high-precision tracking chambers in a toroidal field of approximately 0.5 T and 1 T in the central and end-cap regions of ATLAS, respectively. The muon spectrometer is also instrumented with separate trigger chambers covering \(|\eta |<2.4\). A two-level trigger system, consisting of a custom-hardware level followed by a software-based level, is used to reduce the event rate to a maximum of around 1 kHz for offline storage [39].

3 Event samples and object selection

The search presented in this paper is based on the proton–proton (pp) collision dataset collected by the ATLAS detector at the LHC at \(\sqrt{s}=13 \,\mathrm{{TeV}}\) with 25 ns bunch spacing during 2015. The full dataset corresponds to an integrated luminosity of \(3.2 \,{\text{ fb }^{-1}}\). The data for this search were collected using the single-electron or single-muon triggers with the lowest transverse momentum thresholds available [39].

Electron candidates are reconstructed by associating an inner-detector track with an isolated energy deposit in the EM calorimeter [40, 41]. Candidates are identified using the tight quality criteria and are required to have \(p_{\text {T}} >25\,\mathrm{{GeV}}\) and \(|\eta |<2.47\), excluding the transition region between the barrel and end-cap EM calorimeters, \(1.37<|\eta |<1.52\). Muon candidates are reconstructed by combining matching tracks in the inner detector and the muon spectrometer [42], and are required to satisfy the medium quality criteria and to have \(p_{\text {T}} >25\,\mathrm{{GeV}}\) and \(|\eta |<2.4\). Events are required to have exactly one reconstructed electron or muon that is matched within a cone of size \(\Delta R \equiv \sqrt{(\Delta \eta )^2 + (\Delta \phi )^2} = 0.15\) to the lepton candidate reconstructed by the trigger algorithms.

In order to distinguish leptons produced in the decays of \(W\) bosons from those produced in the decays of heavy-flavour hadrons, all lepton candidates are required to be consistent with originating from the primary interaction vertex, chosen as the vertex with the highest sum of the \(p_{\text {T}} ^2\) of its associated tracks. Furthermore, since lepton candidates arising from background sources, such as decays of hadrons, are typically embedded in jets, all lepton candidates are required to be isolated from other particles in the event. This is achieved by imposing a maximal allowed value on the energy deposited in the calorimeter and/or the momentum of inner-detector tracks within a cone of \(\Delta R = 0.2\) around the direction of the lepton candidate’s momentum. The isolation criteria, dependent on \(p_{\text {T}}\) and \(\eta \), are applied to produce a nominal efficiency of at least \(90\,\%\) for electrons and muons from \(Z\rightarrow \ell ^+\ell ^-\) decays after all other identification criteria are applied [42].

Jets are reconstructed from clusters [43] of energy in the calorimeters using the anti-\(k_{t}\) clustering algorithm [44, 45] with radius parameter \(R=0.4\). Jets are required to have \(p_{\text {T}} > 20\,\mathrm{{GeV}}\) and \(|\eta |<2.5\), and they are calibrated using energy- and \(\eta \)-dependent corrections. A track-based veto is used to suppress contributions from jets arising from additional pp interactions (pile-up) [46]. Jets consistent with the hadronisation of a b-quark are identified using information from track impact parameters and secondary vertices, which are combined in a multivariate discriminant [47]. The efficiency to identify b-quark jets (b-tagging) is approximately \(77\,\%\) for a factor of 126 in rejection against light-quark and gluon jets, about 5 against jets originating from c-quarks, and about 10 against hadronically decaying \(\tau \)-leptons, as determined in a simulated sample of top-quark pair (\(t\bar{t} \)) events [4749]. Jets tagged by this multivariate discriminant, independently of the flavour of the quark that initiated it, are called b-tagged jets throughout the text, while the term b-jet is reserved for those jets originating from b-quark decays, as determined from simulation.

Jets are required to be separated from the lepton candidates by \(\Delta R\) larger than 0.2 or 0.4 for electrons or muons, respectively. Electrons separated from the nearest jet by \(0.2<\Delta R < 0.4\) are considered part of the jet and not a lepton candidate. The transverse momentum imbalance \(\vec {E}_{\text {T}}^{\text {miss}}\), the magnitude of which (\(E_{\text {T}}^{\text {miss}} \)) is commonly referred to as missing transverse momentum, is defined as the negative vector sum of the transverse momenta of calibrated selected objects, such as electrons, muons and jets. The transverse momenta of charged-particle tracks compatible with the primary vertex and not matched to any of those objects are also included in the negative vector sum [50, 51].

4 Signal and background modelling

Simulated event samples are used to study the characteristics of the signal and to calculate its acceptance, and are also used for most of the SM background estimation. Signal samples of associated Higgs boson production with a \(W\) boson, \(pp \rightarrow W H\), are generated with Powheg-Box v2–r3033 [5255] using the CT10 parton distribution functions (PDFs) [56] at next-to-leading order (NLO). A Higgs boson mass of \(m_H = 125\,\mathrm{{GeV}}\) is assumed and the sample is normalised to the next-to-next-to-leading-order (NNLO) cross section recommended by the Higgs cross-section working group \(\sigma _\mathrm{SM}(WH) = 1.37\) pb [57]. The Higgs boson decay to two spin-zero a-bosons and the subsequent decay of each a-boson to a pair of b-quarks are simulated with Pythia  v8.186 [58]. The a-boson decay is done in the narrow-width approximation and the coupling to the b-quarks is assumed to be that of a pseudoscalar. However, since the polarisation of the quarks is not observable, this search is insensitive to the specific parity hypothesis. Pythia  v8.186 is used for the showering, hadronisation, and underlying-event (UE) simulation with the A14 set of tuned parameters (tune) [59]. The mass of the a-boson is varied for different signal hypotheses in the range \(20\,\mathrm{{GeV}} \le m_a \le 60\,\mathrm{{GeV}}\), in \(10\,\mathrm{{GeV}}\) mass steps. Different branching-ratio hypotheses are obtained by scaling the signal sample normalisation.

Samples of \(t\bar{t} \) are also produced using the NLO Powheg-Box v2–r3026 generator with the CT10 PDFs. A top-quark mass (\(m_t\)) of \(172.5\,\mathrm {GeV}\) is assumed. The Powheg-Box model parameter \(h_\mathrm{damp}\), which controls matrix element to parton shower (PS) matching and effectively regulates the high-\(p_{\text {T}} \) radiation, is set to \(h_\mathrm{damp} = m_t \). This setting was found to best describe the \(t\bar{t} \)-system \(p_{\text {T}} \) at \(\sqrt{s} = 7\mathrm {TeV}\) [60]. The baseline \(t\bar{t} \) sample is interfaced to Pythia  v6.428 [61] with the Perugia 2012 tune [62]. Alternative \(t\bar{t} \) samples are generated using Powheg-Box v2–r3026 interfaced to Herwig++ v2.7 [63] or MadGraph5_aMC@NLO [64] interfaced to Herwig++. The effects of initial- and final-state radiation (ISR/FSR) are explored using two alternative Powheg-Box  v2–r3026+Pythia  v6.428 samples. The first has \(h_\mathrm{damp}\) set to \(2 m_t\), the renormalisation and factorisation scales set to half the nominal value and uses the Perugia 2012 radHi UE tune, giving more radiation. The second sample uses the Perugia 2012 radLo UE tune, has \(h_\mathrm{damp}=m_t\) and has the renormalisation and factorisation scales set to twice the nominal value, giving less radiation [65]. The \(t\bar{t} \) samples are normalised to the NNLO theoretical cross section of \(832^{+46}_{-51}\) pb, obtained with Top++ v2.0 [6672].

The simulated \(t\bar{t} \) events are categorised depending on the parton-level flavour content of additional particle jetsFootnote 3 not originating from the decay of the \(t\bar{t} \) system. Events containing at least one additional particle jet matched to a b-hadron are labelled as \(t\bar{t}\). Events containing at least one additional particle jet matched to a c-hadron and no b-hadron are labelled as \(b\bar{b}\). The \(t\bar{t}\) and \(b\bar{b}\) categories are generically referred to as \(t\bar{t} \)+HF events (with HF standing for “heavy flavour”). Remaining events are labelled \(t\bar{t} \)+light-jets (referred to as \(t\bar{t} \)+light) and also include events with no additional particle jets.

The associated heavy-flavour jets in \(t\bar{t} \)+HF are modelled in Powheg-Box+Pythia via the PS evolution and are simulated with a five-flavour scheme. The \(t\bar{t}\) modelling is improved by reweighting the top-quark \(p_{\text {T}}\), \(t\bar{t}\)-system \(p_{\text {T}}\), and kinematic properties of the associated particle jets not originating from the top-quark decay [33] to agree with a \(t\bar{t}\) sample generated at NLO with Sherpa+OpenLoops [73, 74]. This Sherpa+OpenLoops sample is simulated with the four-flavour scheme (4FS) using Sherpa v2.1.1 [73] and the CT10 PDF set.

Samples of single-top-quark backgrounds corresponding to the Wt and s-channel production mechanisms are generated with Powheg-Box v2–r2819 [75, 76] using the CT10 PDF set. Overlaps between the \(t\bar{t}\) and Wt final states are handled using the “diagram removal” scheme [77]. Samples of t-channel single-top-quark events are generated using the Powheg-Box [78] NLO generator that uses the 4FS. The single-top-quark samples are normalised to the approximate NNLO theoretical cross sections [7981]. The parton shower, hadronisation and underlying event are modelled using either Pythia v6.428 with the Perugia 2012 tune or Herwig ++ v2.7 with the UE-EE-5 [82] tune.

Samples of W / Z+jets events are generated with the Sherpa v2.1.1 generator. The matrix-element calculation is performed up to two partons at NLO and up to four partons at leading order (LO) using Comix [83] and OpenLoops [74] and uses the CT10 PDFs. Both the W+jets and Z+jets samples are normalised to their respective inclusive NNLO theoretical cross section calculated with FEWZ [84].

Samples of diboson production WW / WZ / ZZ+jets events are generated with the NLO generator Sherpa v2.1.1. Samples of \(t\bar{t} + \gamma /W/Z \) events, including \(t\bar{t} + WW\), are generated with up to two additional partons using MadGraph5_aMC@NLO and interfaced to Pythia  v8.186. Samples of \(t\bar{t} + H\) events are generated using MadGraph5_aMC@NLO and interfaced to Herwig ++ v2.7.

The main signal and background samples use the EvtGen v1.2.0 [85] program to simulate the decay of heavy-flavour hadrons, except for those generated with Sherpa. All are then processed with the full simulation of the ATLAS detector [86] based on GEANT4 [87]. The alternative \(t\bar{t} \) samples used to estimate systematic uncertainties are based on a fast simulation of the calorimeter response [88]. Events are generated with pile-up that is simulated with Pythia  v8.186 [58] and are reweighted so that the distribution of the multiplicity of pile-up interactions matches the distribution observed in the data. Simulated event samples are processed using the same reconstruction algorithms and analysis chain as the data.

As described in Sect. 5, backgrounds are estimated by fitting predictions derived from simulation to data in several background-enriched samples. The only background prediction not derived from simulation is the multijet background, which contributes to the selected data sample when a jet is mis-reconstructed as a lepton and satisfies the identification criteria. In the electron channel, it consists of non-prompt electrons from heavy-flavour decays, from unidentified photon conversions or from jets with a high fraction of energy deposited in the EM calorimeter. In the muon channel, it consists of heavy-flavour decays and in-flight decays of light mesons.

The multijet background contribution is evaluated from data using the “matrix method” [34, 89, 90], which uses differences between the isolation properties of background (fake/non-prompt) leptons and signal (prompt) leptons from W boson decays. The estimate uses a sample enriched in multijet background events obtained by applying the full event selection except for loosening the lepton isolation requirement. Each event with a lepton candidate that satisfies at least the loosened isolation requirement is scaled by a weight that depends on whether this lepton candidate also satisfies the tighter isolation requirement. The weights are determined from the efficiencies for fake/non-prompt and prompt leptons satisfying the loosened isolation requirement to also satisfy the tighter one [90]. These efficiencies are measured in data control samples enriched in either fake/non-prompt leptons, mostly multijet events, or prompt leptons, mostly \(Z\rightarrow \ell ^+\ell ^-\) events. The shape of each multijet background distribution is derived by applying the same method to the sample obtained with an identical selection as described in Sect. 5, but lowering the b-tagged-jet multiplicity requirement to two. This strategy reduces the statistical uncertainty of the multijet background estimate, improving the stability of the fitting method described in Sect. 5.2.

Table 1 List of variables used in the three signal regions as inputs to the BDT multivariate discriminant and used in the five control regions. The variables are described in the text

5 Analysis strategy

The \(H \rightarrow 2a \rightarrow 4b\) decay chain is expected to have multiple b-tagged jets, often three or four, satisfying the object selection. The dominant background arises from \(t\bar{t}\) events. Preselected events are required to have exactly one electron or muon and at least three jets, of which at least two must be b-tagged. Events are required to satisfy \(E_{\text {T}}^{\text {miss}} >25\,\mathrm{{GeV}}\) and the transverse massFootnote 4 must fulfil \(m_{\mathrm T}^W>50\,\mathrm{{GeV}}\), in order to be consistent with \(W\) boson decays. Events are categorised into eight channels depending on the number of jets (3, 4 and \(\ge \)5) and the number of b-tagged jets (2, 3 and \(\ge \)4). These analysis channels are referred to as (nj, mb) indicating n selected jets including m b-tagged jets.

The categories most sensitive to the \(H \rightarrow 2a \rightarrow 4b\) decay chain are (3j, 3b), (4j, 3b) and (4j, 4b). In these channels, background \(t\bar{t} \) events can only satisfy the selection criteria if accompanied by additional b-tagged jets. In the case of (3j, 3b) or (4j, 3b), the main sources of \(t\bar{t} \) background are events with jets mis-identified as b-jets, particularly from \(W \rightarrow cs\) decays, where the c-jet is mis-identified, and from \(W \rightarrow \tau \nu \), where the \(\tau \)-lepton decays hadronically and is likewise mis-identified. In the case of (4j, 4b), the \(t\bar{t} \) background includes more events with genuine b-quarks from gluon splitting to \(b\bar{b}\) pairs. The main purpose of the five other jet and b-tagged-jet multiplicity channels is to constrain the \(t\bar{t} \)+jets background prediction and the related systematic uncertainties (see Sect. 6) through a profile likelihood fit to data (see Sect. 5.2).

The \(t\bar{t} \)+light background is dominant in the sample of events with exactly two or three b-tagged jets. The background processes \(b\bar{b}\) and \(t\bar{t}\) become more important as the jet and b-tagged-jet multiplicities increase. In particular, the \(t\bar{t}\) background dominates for events with \(\ge \)5 jets and \(\ge \)4 b-tagged jets.

5.1 Signal and background discrimination

In order to improve the sensitivity of the search, several kinematic variables are identified to distinguish between signal and background, and are combined into a boosted decision tree (BDT) multivariate discriminant [91] that uses the AdaBoost algorithm [92]. The BDT is trained to discriminate between signal events with an a-boson mass of 60 GeV and \(t\bar{t} \) events. As described below, the variables chosen as input for the BDT do not depend strongly on the value of \(m_a\) and provide excellent separation between signal and background, so training each mass hypothesis separately with these variables would only slightly improve the sensitivity of the search. The training is performed separately for each of the channels (3j, 3b), (4j, 3b) and (4j, 4b) since the signal and background kinematics differ between them.

Signal events are characterised by the presence of a resonance resulting from the Higgs boson decay \(H \rightarrow 2a \rightarrow 4b\). Two variables are used to reconstruct particles from the signal decay chain. The first is the reconstructed invariant mass of the b-tagged jets, \(m_{bbb}\) or \(m_{bbbb}\), defined for events with three or four b-tagged jets respectively, which peaks around the Higgs boson mass for signal events. In the case of three b-tagged jets, the peak is due to events where two b-quarks are merged in a single jet or one of the b-quarks is very soft in an asymmetric decay and has a small impact on the kinematics. The second discriminating variable for events with four b-tagged jets is the minimum difference between the invariant masses of bb pairs (\(\Delta m_\mathrm{min}^{bb}\)). For signal events, two pairs of b-quarks originate from a pair of a-bosons, so for the correct jet pairing, \(m_{bb} \approx m_a\), and the difference between the invariant masses of the bb pairs is smaller for signal than for \(t\bar{t} \) background events.

Fig. 1
figure 1

Comparison of data with the SM background predictions for the distributions of a \(m_{bbb}\), b \(m_{bbbb}\) and c \(\Delta m_\mathrm{min}^{bb}\) in the sample that is inclusive in number of jets and b-tagged jets. Distributions for the signal model (WH, \(H\rightarrow 2a \rightarrow 4b\)), with \(m_a=\) 60 GeV, normalised to the SM \(pp \rightarrow WH\) cross section, assuming BR\((H\rightarrow aa)\) \(\times \) BR\((a \rightarrow bb)^2 = 1\) and scaled by a factor of 1000, are overlaid. The hashed area represents the total uncertainty in the background. Comparisons use events with \(\ge \)3 jets, except when at least four jets are necessary to define the variable, in which case events with \(\ge \)4 jets are used. The last bin contains the overflow. Markers are not drawn if they are outside the y-axis range

Fig. 2
figure 2

Comparison of data with the SM background predictions for the distributions of a \(H_\mathrm{T}\), b \(p_{\text {T}} ^W\), c \(\Delta R_\mathrm{av}^{bb}\) and d \(\Delta R_\mathrm{min}^{\ell b}\) in the sample that is inclusive in number of jets and b-tagged jets. Distributions for the signal model (WH, \(H\rightarrow 2a \rightarrow 4b\)), with \(m_a=60\,\mathrm{{GeV}}\), normalised to the SM \(pp \rightarrow WH\) cross section, assuming \(\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2 = 1\) and scaled by a factor of 1000, are overlaid. The hashed area represents the total uncertainty in the background. The last bin contains the overflow

Fig. 3
figure 3

Comparison of data with the SM background predictions for the distributions of a \(m_{bbj}\) and b \(m_\mathrm{T2}\) in the sample that is inclusive in number of jets and b-tagged jets. Distributions for the signal model (WH, \(H\rightarrow 2a \rightarrow 4b\)), with \(m_a=60\,\mathrm{{GeV}}\), normalised to the SM \(pp \rightarrow WH\) cross section, assuming \(\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2 = 1\) and scaled by a factor of 1000, are overlaid. The hashed area represents the total uncertainty in the background. The last bin contains the overflow

Additional kinematic variables exhibit differences between signal and background. The \(H_\mathrm{T}\) variable, defined as the scalar sum of \(p_{\text {T}} \) for all jets in the event, is a measure of the total hadronic energy in the event, which is typically larger for \(t\bar{t} \) than for WH events. The transverse momentum of the W boson, \(p_{\text {T}} ^W\), constructed from the vector sum of the \(\vec {E}_{\text {T}}^{\text {miss}}\) and the lepton \(\vec {p}_{\text {T}} \), is slightly higher for signal WH events, where the W boson recoils against the Higgs boson, than for background \(t\bar{t}\) events. Another variable used is the average angular separation between all pairs of b-tagged jets, referred to as \(\Delta R_\mathrm{av}^{bb}\). For background \(t\bar{t} \) events, the b-tagged jets originate from the decays of the two top quarks and tend to be spatially more separated than for the signal. A related variable is the minimum \(\Delta R\) separation between any b-tagged jet and the lepton, \(\Delta R_\mathrm{min}^{\ell b}\). In \(t\bar{t} \) background events, the lepton is typically closer to a b-tagged jet than in signal events, since the lepton and the nearest b-tagged jet often originate from the same top-quark decay. In the case of the signal, the Higgs boson and hence the b-jets recoil against the W boson, which the lepton comes from.

Finally, two variables are used to identify particles from the dominant \(t\bar{t} \) background decay chain. The first variable is used in the (4j, 3b) channel to distinguish between \(t\bar{t} \) events with two b-tagged jets from the top-quark decays and \(t\bar{t} \) events with a third b-tagged jet from a mis-identified charm or light jet from the hadronically decaying \(W\) boson. The invariant mass of two b-tagged jets, selected as the pair with the smallest \(\Delta R\) separation, and the non-b-tagged jet, \(m_{bbj}\), reconstructs the hadronically decaying top quark, peaking around the top-quark mass for these background events. The second variable, used in the (4j, 4b) channel, is a variant of the \(m_\mathrm{T2}\) observable, defined as the minimum “mother” particle mass compatible with all the transverse momenta and mass-shell constraints [93], that identifies events with several invisible particles. In the case of the \(t\bar{t}\) background events, in addition to the \(E_{\text {T}}^{\text {miss}}\) from the neutrino from a leptonic W boson decay, invisible particles may arise from a \(\tau \)-lepton decay or from a lost jet from a W boson. In these cases, the \(m_\mathrm{T2}\) has an endpoint at the top-quark mass, which is not the case for the signal.

Table 1 indicates which variables are used to train each of the three BDT discriminants for the (3j, 3b), (4j, 3b), and (4j, 4b) categories. Figures 1, 2 and 3 show the expected distributions of the kinematical variables obtained after using the statistical procedure and the systematic uncertainties described in Sects. 5.2 and 6, respectively. These variables are used in the BDT discriminants for signal and background for all events that satisfy the event selection criteria, and are shown in Figs. 1, 2 and 3 inclusively in number of jets and b-tagged jets. The distributions are dominated by events with the minimum number of b-tagged jets. In this comparison, the jets in each event are ordered by value of the b-tagging discriminant and those with the highest score are used to calculate the input variables of the BDT, even if they do not satisfy the b-tagging criteria used in this analysis. The distributions are similar to those obtained in each analysis channel and indicate that each variable individually has some signal and background discrimination power. The tail in the \(m_{bbbb}\) distribution for signal events, shown in Fig. 1, is mainly formed by events with jets mis-associated to the a-boson decay. The tail is greatly reduced in the signal regions with the tighter requirement on the number of b-tagged jets. Figure 4 shows the BDT discriminant for signal and background events that satisfy the event selection criteria inclusively in number of jets and b-tagged jets. These distributions are used to validate the BDT modelling in background-enriched samples with kinematic properties that are similar to those in the signal regions.

Fig. 4
figure 4

Comparison of data with the SM background predictions for the distributions of a BDT (3j, 3b), b BDT (4j, 3b), and c BDT (4j, 4b) in the sample that is inclusive in number of jets and b-tagged jets. Distributions for the signal model (WH, \(H\rightarrow 2a \rightarrow 4b\)), with \(m_a=\) 60 GeV, normalised to the SM \(pp \rightarrow WH\) cross section, assuming BR\((H\rightarrow aa)\) \(\times \) BR\((a \rightarrow bb)^2 = 1\) and scaled by a factor of 1000, are overlaid. The hashed area represents the total uncertainty in the background. Comparisons use events with \(\ge \)3 jets, except when at least four jets are necessary to define the BDT discriminant, in which case events with \(\ge \)4 jets are used. The BDT output is determined in the range \([-1,1]\). The first and last bin contain the underflow and overflow, respectively

5.2 Fitting procedure

The distributions of the final discriminants in the eight analysis channels considered are combined to test the presence of a signal. The BDT discriminant, described in Sect. 5.1, is used for the channels enriched with signal, (3j, 3b), (4j, 3b) and (4j, 4b), while the \(H_{\text {T}} \) distribution is used in the five control channels. The statistical analysis is based on a binned likelihood function constructed as a product of Poisson probability terms over all bins considered in the search.

The likelihood function, L, depends on the parameter of interest, the signal-strength \(\mu \), defined as:

$$\begin{aligned} \mu = \sigma (WH)\times \mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2, \end{aligned}$$
(1)

where \(\sigma (WH)\) is the production cross section for \(pp \rightarrow WH\).

Systematic uncertainties in the signal and background predictions (see Sect. 6) are accounted for in the likelihood function as a set of nuisance parameters, \(\varvec{\theta }\). These parameters are implemented as Gaussian priors in the case of shape uncertainties and log-normal priors for uncertainties affecting the normalisation, with width parameters corresponding to the size of the respective uncertainties. Statistical uncertainties in the background estimates in each bin of the discriminant distributions are also taken into account via dedicated nuisance parameters in the fit.

Table 2 Summary of the impact of the considered systematic uncertainties (in %) on the normalisation of the signal (\(m_a\) = 60 GeV) and the main backgrounds for the (4j, 4b) channel after the fit. The total uncertainty can differ from the sum in quadrature of individual sources due to correlations between them

The background-only hypothesis is tested by fitting the background predictions to the observed data, setting \(\mu =0\) and maximising the likelihood over \(\varvec{\theta }\). The best-fit \(\mu \) is obtained by performing a binned likelihood fit to the data under the signal-plus-background hypothesis, i.e. maximising the likelihood function \(L(\mu ,\varvec{\theta })\) over \(\mu \) and \(\varvec{\theta }\). The nuisance parameters \(\varvec{\theta }\) allow variations of the predicted signal and background according to the corresponding systematic uncertainties, and their fitted values correspond to the deviations from the nominal predictions that globally provide the best fit to the data. This procedure allows a reduction of the impact of systematic uncertainties on the search sensitivity by taking advantage of the highly populated background-dominated channels included in the likelihood fit.

6 Systematic uncertainties

Several sources of systematic uncertainty are considered that affect the normalisation or the shape of the signal and background contributions to the final discriminant distributions. Each source of systematic uncertainty is considered to be uncorrelated with other sources, but correlated across processes and channels where appropriate. This section describes the sources of systematic uncertainty considered in this search.

Luminosity and pile-up The uncertainty in the integrated luminosity is \(5\,\%\), affecting the overall normalisation of all processes estimated from the simulation. It is derived, following a methodology similar to that detailed in Ref. [94], from a calibration of the luminosity scale using xy beam-separation scans performed in August 2015. The uncertainty associated with the modelling of pile-up arises mainly from differences between the expected and observed fraction of the visible pp cross section.

Reconstructed objects Uncertainties associated with leptons arise from the reconstruction, identification and trigger efficiencies, as well as lepton momentum scales and resolutions. These efficiencies are measured using tag-and-probe techniques on \(Z\rightarrow \ell ^+\ell ^-\) data and simulated events. The small differences found are corrected in the simulation. Negligible uncertainties arise from the corrections applied to adjust the lepton momentum scales and resolutions in simulation to match those in data. The combined effect of all these uncertainties results in an overall normalisation uncertainty in the signal and background of less than \(1\,\%\).

Fig. 5
figure 5

Comparison between the data and prediction for the distribution of the \(H_\mathrm{T}\) variable used in the control regions with two b-tagged jets. These distributions are after the fit is performed on data under the background-only hypothesis. The hashed area represents the total uncertainty in the background. The last bin contains the overflow

Fig. 6
figure 6

Comparison between the data and prediction for the distribution of the \(H_\mathrm{T}\) variable used in the control regions with three and four b-tagged jets. These distributions are after the fit is performed on data under the background-only hypothesis. The last bin contains the overflow

Fig. 7
figure 7

Comparison between the data and prediction for the distribution of the BDT discriminant used in the signal regions. These distributions are after the fit is performed on data under the background-only hypothesis. The hashed area represents the total uncertainty in the background. The distributions for the signal model (WH, \(H\rightarrow 2a \rightarrow 4b\)), with \(m_a=60\,\mathrm{{GeV}}\), are normalised to the SM \(pp \rightarrow WH\) cross section, assuming \(\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2 = 1\). The BDT output is determined in the range \([-1,1]\). The first and last bin contain the underflow and overflow, respectively. Markers are not drawn if they are outside the y-axis range

Uncertainties associated with jets arise from the efficiency of jet reconstruction and identification, as well as the jet energy scale and resolution. The largest contribution comes from the jet energy scale uncertainty, which depends on jet \(p_{\text {T}} \) and \(\eta \). It affects the normalisation of signal and backgrounds by approximately \(5\,\%\) in the most sensitive search channels. Uncertainties associated with energy scales and resolutions of leptons and jets are propagated to \(E_{\text {T}}^{\text {miss}} \). An uncertainty in the contribution from charged-particle tracks is also included in the \(E_{\text {T}}^{\text {miss}} \) uncertainty [51]. Additional uncertainties originating from the modelling of the underlying event are negligibly small.

Several uncertainties are associated with the identification of the jet flavour, in particular the modelling of the b-, c-, and light-jet-tagging efficiencies in the simulation, which are corrected to match the efficiencies measured in data [4749]. These uncertainties are derived from studies performed with data at \(\sqrt{s} = 8 \mathrm{{TeV}}\) and are extrapolated to \(13 \mathrm{{TeV}}\). They depend on the jet \(p_{\text {T}} \) and the light-jet-tagging additionally depends on the jet \(\eta \). The sources of systematic uncertainty in the tagging efficiencies are taken as uncorrelated between b-jets, c-jets, and light-jets. They have their largest impact in the (4j, 4b) channel, resulting in \(4\,\%\) uncertainty in the \(t\bar{t}\) background normalisation associated with the uncertainty in the b-jet-tagging scale factors, \(8\,\%\) uncertainty in the \(b\bar{b}\) background normalisation associated with the uncertainty in the c-jet-tagging scale factors, and \(45\,\%\) uncertainty in the normalisation of the \(t\bar{t} \)+light background normalisation associated with the uncertainty in the light-jet-tagging scale factors.

Background modelling: Several sources of systematic uncertainty affecting the modelling of \(t\bar{t} \)+jets are considered. An uncertainty of approximately 6 % is assumed for the \(t\bar{t} \) production cross section [72], including contributions from variations of the factorisation and renormalisation scales, and uncertainties arising from the PDFs, \(\alpha _\mathrm{S}\), and the top-quark mass.

A 50 % uncertainty is assigned to the normalisation of the \(t\bar{t}\) background. This uncertainty is derived from a comparison of the \(t\bar{t}\) production cross sections predicted by Powheg-Box+Pythia and by Sherpa+OpenLoops at NLO (see Sect. 4) [33]. An additional \(50\,\%\) uncertainty is assigned to the component of the \(t\bar{t}\) background that contains exactly one b-hadron not originating from a top-quark decay matched to a particle jet. The same systematic uncertainty of \(50\,\%\) is applied to the normalisation of the \(b\bar{b}\) background in the absence of an NLO prediction for this process. The uncertainties in the \(t\bar{t}\) components and \(b\bar{b}\) are treated as uncorrelated.

Systematic uncertainties affecting the shape of the \(t\bar{t} \) background account for the choice of generator, the choice of parton shower and hadronisation models, and the effects of initial- and final-state radiation. The uncertainties are derived from comparisons between the nominal simulation and alternative samples produced with Powheg-Box or MadGraph5_aMC@NLO interfaced to Pythia or Herwig ++ (see Sect. 4) and are treated as uncorrelated across \(t\bar{t} \)+jets backgrounds. Additional uncertainties are evaluated to account for the use of Sherpa+OpenLoops NLO to model the \(t\bar{t}\) background. In particular, uncertainties are assessed for the PDFs, as well as the choice of shower recoil model and scale. An additional uncertainty accounts for limited knowledge of the component of the \(t\bar{t}\) background originating from multiple parton interactions, which is not included in the NLO prediction. These systematic uncertainties are estimated following the methods described in Ref. [33].

Table 3 Expected event yields of the SM background processes in the three signal regions after performing the fit with the background-only hypothesis. The observed data and the number of expected signal events are also indicated. The signal yields are quoted for some representative values of \(m_{a}\) and assume the SM \(pp \rightarrow WH\) cross section, \(\sigma _\mathrm{SM}(WH) = 1.37\) pb [57], and \(\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2 = 1\). The uncertainties include statistical and systematic components (systematic uncertainties are discussed in Sect. 6). The total uncertainty can differ from the sum in quadrature of individual sources due to correlations between them

The uncertainties in the predictions for the total cross sections for the other background processes are applied as normalisation uncertainties and are: \(5\,\%\) for each of the W / Z+jets and diboson processes, \(+5\,\%\)/\(-4\,\%\) for single-top-quark production, \(15\,\%\) for \(t\bar{t} +\gamma /W/Z \) and \(+9\,\%\)/\(-12\,\%\) for \(t\bar{t} H\) [7981, 9599]. An additional uncertainty of \(24\,\%\) is added in quadrature for each additional jet to account for the extrapolation to higher jet multiplicities, based on a comparison among different algorithms for merging LO matrix-element and parton shower simulations [100]. An uncertainty is applied to the modelling of the single-top-quark background to account for the choice of scheme to handle the overlaps between the \(t\bar{t}\) and Wt final states. Small uncertainties arising from scale variations, which change the amount of initial-state radiation and thus the event kinematics, are also considered.

Uncertainties in the estimate of the multijet background come from the limited number of events in the data sample without the isolation requirement and from uncertainties in the measured non-prompt and prompt lepton efficiencies. The normalisation uncertainty assigned to this background is 60 %, as derived by comparing the multijet background prediction to data in control regions obtained by inverting the requirements on the \(E_{\text {T}}^{\text {miss}} \) and on \(m_{\mathrm T}^W\). An uncertainty in the shape of the predicted background distribution covers the difference between the prediction obtained by reducing the required number of b-tagged jets and the prediction at high b-tagged-jet multiplicity (see Sect. 4).

Signal modelling Several sources of systematic uncertainty affect the theoretical modelling of the signal acceptance. Uncertainties originate from the choice of PDFs, the factorization and renormalization scales, and the parton shower, hadronisation and underlying event models.

Fig. 8
figure 8

Upper limit at 95 % CL on \(\sigma (WH)\times \mathrm {BR}\), where \(BR=\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2\), versus \(m_a\). The observed \(({\mathrm {CL}}_{\mathrm {s}})\) values (solid black line) are compared to the expected (median) \(({\mathrm {CL}}_{\mathrm {s}})\) values under the background-only hypothesis (dotted black line). The surrounding shaded bands correspond to the 68 and 95 % CL intervals around the expected \(({\mathrm {CL}}_{\mathrm {s}})\) values, denoted by \(\pm 1\sigma \) and \(\pm 2\sigma \), respectively. The solid red line indicates the SM \(pp \rightarrow WH\) cross section, assuming \(\mathrm {BR}(H\rightarrow aa) \times \mathrm {BR}(a \rightarrow bb)^2 = 1\). Markers are not drawn if they are outside the y-axis range

As described in Sect. 5.2, a binned maximum-likelihood fit is performed on the distributions of the final discriminant in the eight channels considered. The fit constrains systematic uncertainties from several sources thanks to the large number of events in the analysis channels considered and the variations in the background composition across channels. The channels with two b-tagged jets constrain the main uncertainties affecting the \(t\bar{t} \)+light background prediction, while the channels with \(\ge \)5 jets and \(\ge \)3 b-tagged jets are sensitive to the dominant uncertainties affecting the \(t\bar{t} \)+HF background prediction.

After performing the fit, the leading sources of systematic uncertainty are the modelling of the \(t\bar{t} \)+jets background and b-, c- and light-jet-tagging efficiencies. Table 2 summarises the systematic uncertainties by indicating their impact on the normalisation of the signal and the main backgrounds in the (4j, 4b) channel. The uncertainties for the other signal channels (3j, 3b) and (4j, 3b) are reduced to about \(7\,\%\) for the \(t\bar{t} +\)light contribution, mainly due to the reduced dependence on the light-jet-tagging efficiency, and to about \(12\,\%\) for the signal, primarily because of the reduced b-tagging efficiency uncertainty due to the lower b-tagged-jet multiplicity requirement.

7 Results

The best fit of the background predictions to data in the binned maximum-likelihood fit is shown in Figures 5, 6 and 7. Table 3 shows the resulting yields and uncertainties for the signal regions after the fit. The SM background yields obtained after performing the fit are in agreement with the results from a fit using only the \(H_{\text {T}} \) distributions in the control regions.

In the absence of a significant excess of data above the background prediction, upper limits are calculated for \(\mu \), defined in Eq. (1). The modified frequentist method \(({\mathrm {CL}}_{\mathrm {s}})\) [101] and asymptotic formulae [102] are used. Figure 8 shows the upper limits obtained at 95 % CL. The mass hypothesis \(m_a\) is tested in steps of \(10\,\mathrm{{GeV}}\) between 20 and \(60\,\mathrm{{GeV}}\). The observed (expected) 95 % CL upper limits on \(\mu \) range from 6.2 (8.6) pb, assuming \(m_a = 20\,\mathrm{{GeV}}\), to 1.5 (2.0) pb, assuming \(m_a = 60\,\mathrm{{GeV}}\). Assuming the SM \(pp \rightarrow WH\) cross section, it is not possible to set limits on the branching fraction with the amount of data used. The reduced sensitivity for the light a-boson hypothesis is due to a lower acceptance caused by overlapping b-jets. The event yields indicated in Table 3 correspond to the sum of all BDT bins, while the fit is most sensitive in the highest BDT bins, where the data are slightly below the prediction, and hence the observed limit is slightly lower than the expected one.

8 Conclusion

This paper presents a dedicated search for exotic decays of the Higgs boson to a pair of new spin-zero particles, \(H \rightarrow aa\), where the new a-boson decays to b-quarks. The search focuses on the process \(pp \rightarrow WH\) where the Higgs boson is produced in association with a \(W\) boson. The analysis uses the pp collision dataset at \(\sqrt{s} = 13 \mathrm{{TeV}}\) recorded by the ATLAS detector at the LHC in 2015, corresponding to an integrated luminosity of \(3.2 \pm 0.2 {\text{ fb }^{-1}}\). The search for \(H \rightarrow 2a \rightarrow 4b\) is performed in the mass range \(20\,\mathrm{{GeV}} \le m_a \le 60\,\mathrm{{GeV}}\). The analysis uses several kinematic variables combined in a multivariate discriminant in signal regions and uses control regions to reduce the uncertainties in the backgrounds. No significant excess of data is observed relative to the SM predictions. Upper limits are derived for the product of the production cross section for \(pp \rightarrow WH\) times the branching ratio for the decay \(H \rightarrow 2a \rightarrow 4b\). The upper limit ranges from 6.2 pb for an a-boson mass \(m_a = 20\,\mathrm{{GeV}}\) to 1.5 pb for \(m_a = 60\,\mathrm{{GeV}}\).