1 Introduction

The discovery of a Higgs boson by the ATLAS and CMS experiments [1, 2] at the LHC [3] offers a novel opportunity to search for new sources of CP violation in the interaction of the Higgs boson with other Standard Model (SM) particles. C and CP violation is one of the three Sakharov conditions [4,5,6] needed to explain the observed baryon asymmetry of the universe. In the SM with massless neutrinos the only source of CP violation is the complex phase in the quark mixing (CKM) matrix [7, 8]. The measured size of the complex phase and the derived magnitude of CP violation in the early universe is insufficient to explain the observed value of the baryon asymmetry [9] within the SM [10, 11] and, most probably, new sources of CP violation beyond the SM need to be introduced. No observable effect of CP violation is expected in the production or decay of the SM Higgs boson. Hence any observation of CP violation involving the observed Higgs boson would be an unequivocal sign of physics beyond the SM.

The measured Higgs boson production cross sections, branching ratios and derived constraints on coupling-strength modifiers, assuming the tensor structure of the SM, agree with the SM predictions [12, 13]. Investigations of spin and CP quantum numbers in bosonic decay modes and measurements of anomalous couplings including CP-violating ones in the decay into a pair of massive electroweak gauge bosons show no hints of deviations from the tensor structure of the SM Higgs boson [14, 15]. Differential cross-section measurements in the decay \(H \rightarrow \gamma \gamma \) have been used to set limits on couplings including CP-violating ones in vector-boson fusion production in an effective field theory [16]. However, the observables, including absolute event rates, used in that analysis were CP-even and hence not sensitive to the possible interference between the SM and CP-odd couplings and did not directly test CP invariance. The observables used in this analysis are CP-odd and therefore sensitive to this interference and the measurement is designed as a direct test of CP invariance.

In this paper, a first direct test of CP invariance in Higgs boson production via vector-boson fusion (VBF) is presented, based on proton–proton collision data corresponding to an integrated luminosity of 20.3 fb\(^{-1}\)  collected with the ATLAS detector at \(\sqrt{s}\) = 8 \(\,\mathrm{TeV}\) in 2012. A CP-odd Optimal Observable [17,18,19] is employed. The Optimal Observable combines the information from the multi-dimensional phase space in a single quantity calculated from leading-order matrix elements for VBF production. Hence it does not depend on the decay mode of the Higgs boson. A direct test of CP invariance is possible measuring the mean value of the CP-odd Optimal Observable. Moreover, as described in Sect. 2, an ansatz in the framework of an effective field theory is utilised, in which all CP-violating effects corresponding to operators with dimensions up to six in the couplings between a Higgs boson and an electroweak gauge boson can be described in terms of a single parameter \(\tilde{d}\). Limits on \(\tilde{d}\) are derived by analysing the shape of spectra of the Optimal Observable measured in \(H\rightarrow \tau \tau \) candidate events that also have two jets tagging VBF production. The event selection, estimation of background contributions and of systematic uncertainties follows the analysis used to establish \(4.5\sigma \) evidence for the \(H\rightarrow \tau \tau \) decay [20]. Only events selected in the VBF category are analysed, and only fully leptonic \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) or semileptonic \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) decays of the \(\tau \)-lepton pair are considered.

The theoretical framework in the context of effective field theories is discussed in Sect. 2 and the methodology of testing CP invariance and the concept of the Optimal Observable are introduced in Sect. 3. After a brief description of the ATLAS detector in Sect. 4, the simulated samples used are summarised in Sect. 5. The experimental analysis is presented in Sect. 6, followed by a description of the statistical method used to determine confidence intervals for \(\tilde{d}\) in Sect. 7. The results are discussed in Sect. 8, following which conclusions are given.

2 Effective Lagrangian framework

The effective Lagrangian considered is the SM Lagrangian augmented by CP-violating operators of mass dimension six, which can be constructed from the Higgs doublet \(\Phi \) and the U(1)\(_Y\) and SU(2)\(_{I_W,\mathrm{L}}\) electroweak gauge fields \(B^\mu \) and \(W^{a,\mu }\) \((a = 1,2,3)\), respectively. No CP-conserving dimension-six operators built from these fields are taken into account. All interactions between the Higgs boson and other SM particles (fermions and gluons) are assumed to be as predicted in the SM; i.e. the coupling structure in gluon fusion production and in the decay into a pair of \(\tau \)-leptons is considered to be the same as in the SM.

The effective U(1)\(_Y\)- and SU(2)\(_{I_W,\mathrm{L}}\)-invariant Lagrangian is then given by (following Ref. [21, 22]):

$$\begin{aligned} \mathcal{L}_{\mathrm {eff}} = \mathcal{L}_{\mathrm {SM}} + \frac{f_{\tilde{B}B}}{\Lambda ^2}\mathcal{O}_{\tilde{B}B} + \frac{f_{\tilde{W}W}}{\Lambda ^2}\mathcal{O}_{\tilde{W}W} + \frac{f_{\tilde{B}}}{\Lambda ^2}\mathcal{O}_{\tilde{B}} \end{aligned}$$
(1)

with the three dimension-six operators

$$\begin{aligned}&\mathcal{O}_{\tilde{B}B} = \Phi^+ \hat{\tilde{B}}_{\mu\nu}\hat{B}^{\mu\nu} \Phi \nonumber\\&\mathcal{O}_{\tilde{W}W} = \Phi^+\hat{\tilde{W}}_{\mu\nu} \hat{W}^{\mu\nu} \Phi\nonumber\\&\mathcal{O}_{\tilde{B}} = (D_\mu \Phi)^+ \hat{\tilde{B}}^{\mu\nu}D_\nu \Phi.\end{aligned}$$
(2)

and three dimensionless Wilson coefficients \(f_{\tilde{B}B}\), \(f_{\tilde{W}W}\) and \(f_{\tilde{B}}\); \(\Lambda \) is the scale of new physics.

Here \(D_\mu \) denotes the covariant derivative \(D_\mu = \partial _\mu +\frac{\mathrm {i}}{2}g^\prime B_\mu + \mathrm {i} g \frac{\sigma _a}{2} W_\mu ^a\), \(\hat{V}_{\mu \nu }\) (\(V = B, W^a\)) the field-strength tensors and \(\tilde{V}_{\mu \nu } = \frac{1}{2} \epsilon _{\mu \nu \rho \sigma }{V}^{\rho \sigma }\) the dual field-strength tensors, with \(\hat{B}_{\mu \nu } + \hat{W}_{\mu \nu } = \mathrm {i} \frac{g^\prime }{2} B_{\mu \nu } + \mathrm {i} \frac{g}{2} \sigma ^a W^a_{\mu \nu }\).

The last operator \(\mathcal{O}_{\tilde{B}}\) contributes to the CP-violating charged triple gauge-boson couplings \(\tilde{\kappa }_{\gamma }\) and \(\tilde{\kappa }_{Z}\) via the relation \(\tilde{\kappa }_{\gamma } = - \cot ^2 \theta _W \tilde{\kappa }_{Z} = \frac{m_W^2}{2\Lambda ^2} f_{\tilde{B}}\). These CP-violating charged triple gauge boson couplings are constrained by the LEP experiments [23,24,25] and the contribution from \(\mathcal{O}_{\tilde{B}}\) is neglected in the following; i.e. only contributions from \(\mathcal{O}_{\tilde{B}B}\) and \(\mathcal{O}_{\tilde{W}W}\) are taken into account.

After electroweak symmetry breaking in the unitary gauge the effective Lagrangian in the mass basis of Higgs boson H, photon A and weak gauge bosons Z and \(W^\pm \) can be written, e.g. as in Ref. [26]:

$$\begin{aligned} \mathcal{L}_{\mathrm {eff}}= & {} \mathcal{L}_{\mathrm {SM}} + \tilde{g}_{HAA} H \tilde{A}_{\mu \nu }{A}^{\mu \nu } + \tilde{g}_{HAZ} H \tilde{A}_{\mu \nu }{Z}^{\mu \nu }\nonumber \\&\,+ \tilde{g}_{HZZ} H \tilde{Z}_{\mu \nu }{Z}^{\mu \nu } + \tilde{g}_{HWW} H \tilde{W}^+_{\mu \nu }{W}^{-\mu \nu }. \end{aligned}$$
(3)

Only two of the four couplings \(\tilde{g}_{HVV}\) (\(V=W^\pm ,Z,\gamma \)) are independent due to constraints imposed by U(1)\(_Y\) and SU(2)\(_{I_W,\mathrm{L}}\) invariance. They can be expressed in terms of two dimensionless couplings \(\tilde{d}\) and \(\tilde{d}_B\) as:

$$\begin{aligned} \tilde{g}_{HAA} = \frac{g}{2 m_W} (\tilde{d} \sin ^2 \theta _W + \tilde{d}_B \cos ^2 \theta _W)&\tilde{g}_{HAZ} = \frac{g}{2 m_W} \sin 2\theta _W (\tilde{d} -\tilde{d}_B)\end{aligned}$$
(4)
$$\begin{aligned} \tilde{g}_{HZZ} = \frac{g}{2 m_W} (\tilde{d} \cos ^2 \theta _W + \tilde{d}_B \sin ^2 \theta _W)&\tilde{g}_{HWW} = \frac{g}{m_W} \tilde{d}. \end{aligned}$$
(5)

Hence in general WW, ZZ, \(Z\gamma \) and \(\gamma \gamma \) fusion contribute to VBF production. The relations between \(\tilde{d}\) and \(f_{\tilde{W}W}\), and \(\tilde{d}_B\) and \(f_{\tilde{B}B}\) are given by:

$$\begin{aligned} \tilde{d} = - \frac{m_W^2}{\Lambda ^2} f_{\tilde{W}W}\quad \tilde{d}_B = - \frac{m_W^2}{\Lambda ^2} \tan ^2 \theta _W f_{\tilde{B}B}. \end{aligned}$$
(6)

As the different contributions from the various electroweak gauge-boson fusion processes cannot be distinguished experimentally with the current available dataset, the arbitrary choice \(\tilde{d} = \tilde{d}_B\) is adopted. This yields the following relation for the \(\tilde{g}_{HVV}\):

$$\begin{aligned} \tilde{g}_{HAA} = \tilde{g}_{HZZ} = \frac{1}{2} \tilde{g}_{HWW} = \frac{g}{2 m_W} \tilde{d}\quad {\mathrm {and}} \quad \tilde{g}_{HAZ} = 0 \, . \end{aligned}$$
(7)

The parameter \(\tilde{d}\) is related to the parameter \(\hat{\kappa }_W = \tilde{\kappa }_W/ \kappa _{\mathrm {SM}} \tan \alpha \) used in the investigation of CP properties in the decay \(H\rightarrow WW\) [15] via \(\tilde{d} = - \hat{\kappa }_W\). The choice \(\tilde{d} = \tilde{d}_B\) yields \(\hat{\kappa }_W = \hat{\kappa }_Z\) as assumed in the combination of the \(H\rightarrow WW\) and \(H\rightarrow ZZ\) decay analyses [15].

The effective Lagrangian yields the following Lorentz structure for each vertex in the Higgs bosons coupling to two identical or charge-conjugated electroweak gauge bosons \(HV(p_1)V(p_2)\) (\(V=W^{\pm},Z,\gamma \)), with \(p_{1,2}\) denoting the momenta of the gauge bosons:

$$\begin{aligned}&T^{\mu \nu } (p_1,p_2) = \sum _{V=W,Z} \frac{2m^{2}_{V}}{v} g^{\mu \nu } + \sum _{V=W,Z,\gamma }\frac{2g}{m_W} \tilde{d} ~ \varepsilon ^{\mu \nu \rho \sigma } p_{1\rho } p_{2\sigma }. \end{aligned}$$
(8)

The first terms (\(\propto g^{\mu \nu }\)) are CP-even and describe the SM coupling structure, while the second terms (\(\propto \varepsilon ^{\mu \nu \rho \sigma } p_{1\rho } p_{2\sigma }\)) are CP-odd and arise from the CP-odd dimension-six operators. The choice \(\tilde{d} = \tilde{d}_B\) gives the same coefficients multiplying the CP-odd structure for \(HW^+W^-\), HZZ and \(H\gamma \gamma \) vertices and a vanishing coupling for the \(HZ\gamma \) vertex.

The matrix element \(\mathcal {M}\) for VBF production is the sum of a CP-even contribution \(\mathcal {M}_{\text {SM}}\) from the SM and a CP-odd contribution \(\mathcal {M}_{\text {CP-odd}}\) from the dimension-six operators considered:

$$\begin{aligned} \mathcal {M} =\mathcal {M}_{\text {SM}}+\tilde{d}\cdot \mathcal {M}_{\text {CP-odd}}. \end{aligned}$$
(9)

The differential cross section or squared matrix element has three contributions:

$$\begin{aligned} |\mathcal {M}|^{2}=|\mathcal {M}_{\text {SM}}|^{2}+ \tilde{d}\cdot 2 \mathrm{Re}(\mathcal {M}_{\text {SM}}^{*}\mathcal {M}_{\text {CP-odd}}) + \tilde{d}^{2}\cdot |\mathcal {M}_{\text {CP-odd}}|^{2} \, . \end{aligned}$$
(10)

The first term \(|\mathcal {M}_{\text {SM}}|^{2}\) and third term \(\tilde{d}^{2}\cdot |\mathcal {M}_{\text {CP-odd}}|^{2}\) are both CP-even and hence do not yield a source of CP violation. The second term \(\tilde{d}\cdot 2 \mathrm{Re}(\mathcal {M}_{\text {SM}}^{*}\mathcal {M}_{\text {CP-odd}})\), stemming from the interference of the two contributions to the matrix element, is CP-odd and is a possible new source of CP violation in the Higgs sector. The interference term integrated over a CP-symmetric part of phase space vanishes and therefore does not contribute to the total cross section and observed event yield after applying CP-symmetric selection criteria. The third term increases the total cross section by an amount quadratic in \(\tilde{d}\), but this is not exploited in the analysis presented here.

3 Test of CP invariance and Optimal Observable

Tests of CP invariance can be performed in a completely model-independent way by measuring the mean value of a CP-odd observable \(\langle \mathcal {O}_{\text {CP}} \rangle \). If CP invariance holds, the mean value has to vanish \(\langle \mathcal {O}_{\text {CP}} \rangle = 0\). An observation of a non-vanishing mean value would be a clear sign of CP violation. A simple CP-odd observable for Higgs boson production in VBF, the “signed” difference in the azimuthal angle between the two tagging jets \(\Delta \phi _{jj}\), was suggested in Ref. [22] and is formally defined as:

$$\begin{aligned} \epsilon _{\mu \nu \rho \sigma } b_+^\mu p_+^\nu b_-^\rho p_-^\sigma = 2 p_\mathrm{T+} p_\mathrm{T-} \sin (\phi _+ - \phi _-) = 2 p_\mathrm{T+} p_\mathrm{T-} \sin \Delta \phi _{jj} \, . \end{aligned}$$
(11)

Here \(b_+^\mu \) and \(b_-^\mu \) denote the normalised four-momenta of the two proton beams, circulating clockwise and anti-clockwise, and \(p_+^\mu \) (\(\phi _+\)) and \(p_-^\mu \) (\(\phi _-\)) denote the four-momenta (azimuthal angles) of the two tagging jets, where \(p_+\) (\(p_-\)) points into the same detector hemisphere as \(b_+^\mu \) (\(b_-^\mu \)). This ordering of the tagging jets by hemispheres removes the sign ambiguity in the standard definition of \(\Delta \phi _{jj}\).

The final state consisting of the Higgs boson and the two tagging jets can be characterised by seven phase-space variables while assuming the mass of the Higgs boson, neglecting jet masses and exploiting momentum conservation in the plane transverse to the beam line. The concept of the Optimal Observable combines the information of the high-dimensional phase space in a single observable, which can be shown to have the highest sensitivity for small values of the parameter of interest and neglects contributions proportional to \(\tilde{d}^2\) in the matrix element. The method was first suggested for the estimation of a single parameter using the mean value only [17] and via a maximum-likelihood fit to the full distribution [18] using the so-called Optimal Observable of first order. The extension to several parameters and also exploiting the matrix-element contributions quadratic in the parameters by adding an Optimal Observable of second order was introduced in Refs. [19, 27, 28]. The technique has been applied in various experimental analyses, e.g. Refs. [15, 29,30,31,32,33,34,35,36,37,38,39].

The analysis presented here uses only the first-order Optimal Observable \(\mathcal {\,OO\,}\) (called Optimal Observable below) for the measurement of \(\tilde{d}\) via a maximum-likelihood fit to the full distribution. It is defined as the ratio of the interference term in the matrix element to the SM contribution:

$$\begin{aligned} \mathcal {OO} =\frac{2 \mathrm{Re}(\mathcal {M}_{\text {SM}}^{*}\mathcal {M}_{\text {CP-odd}})}{|\mathcal {M}_{\text {SM}}|^{2}} \, . \end{aligned}$$
(12)

Figure 1 shows the distribution of the Optimal Observable, at parton level both for the SM case and for two non-zero \(\tilde{d}\) values, which introduce an asymmetry into the distribution and yield a non-vanishing mean value.

Fig. 1
figure 1

Distribution of the Optimal Observable at parton-level for two arbitrary \(\tilde{d}\) values. The SM sample was generated using MadGraph5_aMC@NLO [40] (see Sect. 5) at leading order, and then reweighted to different \(\tilde{d}\) values. Events are chosen such that there are at least two outgoing partons with \(p_{\mathrm {T}}> 25 \,\mathrm{GeV} \), \(|\eta | < 4.5 \), large invariant mass (\(m(p_1,p_2) > 500 \,\mathrm{GeV} \)) and large pseudorapidity gap (\(\Delta \eta (p_1,p_2) > 2.8\) )

The values of the leading-order matrix elements needed for the calculation of the Optimal Observable are extracted from HAWK [41,42,43]. The evaluation requires the four-momenta of the Higgs boson and the two tagging jets. The momentum fraction \(x_{1}\) (\(x_{2}\)) of the initial-state parton from the proton moving in the positive (negative) z-direction can be derived by exploiting energy–momentum conservation from the Higgs boson and tagging jet four-momenta as:

$$\begin{aligned} x_{1/2}^{\text {reco}} = \frac{m_{Hjj}}{\sqrt{s}}\mathrm {e}^{\pm y_{Hjj}} ~~~~~~ \end{aligned}$$
(13)

where \(m_{\text {Hjj}}\) (\(y_{\text {Hjj}}\)) is the invariant mass (rapidity) obtained from the vectorially summed four-momenta of the tagging jets and the Higgs boson. Since the flavour of the initial- and final-state partons cannot be determined experimentally, the sum over all possible flavour configurations \(ij \rightarrow kl H\) weighted by the CT10 leading-order parton distribution functions (PDFs) [44] is calculated separately for the matrix elements in the numerator and denominator:

$$\begin{aligned} 2 \mathrm{Re}(\mathcal {M}_{\text {SM}}^{*}\mathcal {M}_{\text {CP-odd}})= & {} \sum _{i,j,k,l} f_i(x_1) f_j(x_2) 2 \mathrm{Re}((\mathcal {M}_{\text {SM}}^{ij \rightarrow kl H})^*\mathcal {M}_{\text {CP-odd}}^{ij \rightarrow kl H})\end{aligned}$$
(14)
$$\begin{aligned} |\mathcal {M}_{\mathrm {SM}}|^2= & {} \sum _{i,j,k,l} f_i(x_1) f_j(x_2) |\mathcal {M}_{\mathrm {SM}}^{ij \rightarrow kl H}|^2\, . \end{aligned}$$
(15)

4 The ATLAS detector

The ATLAS detector [45] is a multi-purpose detector with a cylindrical geometry.Footnote 1 It comprises an inner detector (ID) surrounded by a thin superconducting solenoid, a calorimeter system and an extensive muon spectrometer in a toroidal magnetic field. The ID tracking system consists of a silicon pixel detector, a silicon microstrip detector, and a transition radiation tracker. It provides precise position and momentum measurements for charged particles and allows efficient identification of jets containing b-hadrons (b-jets) in the pseudorapidity range \(|\eta |<2.5\). The ID is immersed in a 2 T axial magnetic field and is surrounded by high-granularity lead/liquid-argon sampling electromagnetic calorimeters which cover the pseudorapidity range \( |\eta |< 3.2\). A steel/scintillator tile calorimeter provides hadronic energy measurements in the central pseudorapidity range (\( |\eta |< 1.7\)). In the forward regions (\(1.5< |\eta | < 4.9\)), the system is complemented by two end-cap calorimeters using liquid argon as active material and copper or tungsten as absorbers. The muon spectrometer surrounds the calorimeters and consists of three large superconducting eight-coil toroids, a system of tracking chambers, and detectors for triggering. The deflection of muons is measured in the region \(|\eta |< 2.7\) by three layers of precision drift tubes, and cathode strip chambers in the innermost layer for \(|\eta | > 2.0\). The trigger chambers consist of resistive plate chambers in the barrel (\(|\eta | < 1.05\)) and thin-gap chambers in the end-cap regions (\(1.05<|\eta |<2.4\)).

A three-level trigger system [46] is used to select events. A hardware-based Level-1 trigger uses a subset of detector information to reduce the event rate to 75 kHz or less. The rate of accepted events is then reduced to about 400 Hz by two software-based trigger levels, named Level-2 and the Event Filter.

5 Simulated samples

Background and signal events are simulated using various Monte Carlo (MC) event generators, as summarised in Table 1. The generators used for the simulation of the hard-scattering process and the model used for the simulation of the parton shower, hadronisation and underlying-event activity are listed. In addition, the cross-section values to which the simulation is normalised and the perturbative order in QCD of the respective calculations are provided.

Table 1 MC event generators used to model the signal and the background processes at \(\sqrt{s}=8\,\mathrm{TeV} \)

All the background samples used in this analysis are the same as those employed in Ref. [20], except the ones used to simulate events with the Higgs boson produced via gluon fusion and decaying into the \(\tau \tau \) final state. The Higgs-plus-one-jet process is simulated at NLO accuracy in QCD with Powheg-Box [47,48,49, 73], with the MINLO feature  [74] applied to include Higgs-plus-zero-jet events at NLO accuracy. This sample is referred to as HJ MINLO. The Powheg-Box event generator is interfaced to Pythia8 [51], and the CT10 [44] parameterisation of the PDFs is used. Higgs boson events produced via gluon fusion and decaying into the \(W^{+}W^{-}\) final state, which are a small component of the background, are simulated, as in Ref. [20], with Powheg [47,48,49, 81] interfaced to Pythia8 [51]. For these simulated events, the shape of the generated \(p_{\mathrm {T}}\) distribution is matched to a NNLO + NNLL calculation HRes2.1 [82, 83] in the inclusive phase space. Simultaneously, for events with two or more jets, the Higgs boson \(p_{\mathrm {T}}\) spectrum is reweighted to match the MINLO HJJ predictions [84]. The overall normalisation of the gluon fusion process (ggF) is taken from a calculation at next-to-next-to-leading order (NNLO) [75,76,77,78,79,80] in QCD, including soft-gluon resummation up to next-to-next-to-leading logarithm terms (NNLL) [85]. Next-to-leading-order (NLO) electroweak (EW) corrections are also included [86, 87]. Higgs boson events produced via VBF, with SM couplings, are also simulated with Powheg interfaced with Pythia8 (see Table 1 and Ref. [20]).

Production by VBF is normalised to a cross section calculated with full NLO QCD and EW corrections [41, 42, 52] with an approximate NNLO QCD correction applied [53]. The NLO EW corrections for VBF production depend on the \(p_{\mathrm {T}} \) of the Higgs boson, and vary from a few percent at low \(p_{\mathrm {T}} \) to \(\sim 20\%\) at \(p_{\mathrm {T}} \) = 300 \(\,\mathrm{GeV}\)  [88]. The \(p_{\mathrm {T}} \) spectrum of the VBF-produced Higgs boson is therefore reweighted, based on the difference between the Powheg-Box+Pythia calculation and the Hawk [41,42,43] calculation which includes these corrections.

In the case of VBF-produced Higgs boson events in the presence of anomalous couplings in the HVV vertex, the simulated samples are obtained by applying a matrix element (ME) reweighting method to the VBF SM signal sample. The weight is defined as the ratio of the squared ME value for the VBF process associated with a specific amount of CP mixing (measured in terms of \(\tilde{d}\)) to the SM one. The inputs needed for the ME evaluation are the flavour of the incoming partons, the four-momenta and the flavour of the two or three final-state partons and the four-momentum of the Higgs boson. The Bjorken x values of the initial-state partons can be calculated from energy–momentum conservation. The leading-order ME from HAWK [41,42,43] is used for the \(2\rightarrow 2+H\) or \(2\rightarrow 3+H\) process separately. This reweighting procedure is validated against samples generated with MadGraph5_aMC@NLO [40]. As described in Ref. [89], MadGraph5_aMC@NLO can simulate VBF production with anomalous couplings at next-to-leading order. The reweighting procedure proves to be a good approximation to a full next-to-Leading description of the BSM process.

In the case of the \(H\rightarrow WW\) sample, if CP violation exists in the HVV coupling, it would affect both the VBF production and the HWW decay vertex. It was verified that the shape of the Optimal Observable distribution is independent of any possible CP violation in the \(H\rightarrow WW\) decay vertex and that it is identical for \(H\rightarrow WW\) and \(H\rightarrow \tau \tau \) decays. Hence the same reweighting is applied for VBF-produced events with \(H\rightarrow WW\) and \(H\rightarrow \tau \tau \) decays.

For all samples, a full simulation of the ATLAS detector response [90] using the Geant4 program [91] was performed. In addition, multiple simultaneous minimum-bias interactions are simulated using the AU2 [92] parameter tuning of Pythia8. They are overlaid on the simulated signal and background events according to the luminosity profile of the recorded data. The contributions from these pile-up interactions are simulated both within the same bunch crossing as the hard-scattering process and in neighbouring bunch crossings. Finally, the resulting simulated events are processed through the same reconstruction programs as the data.

6 Analysis

After data quality requirements, the integrated luminosity of the \(\sqrt{s}=8\,\mathrm{TeV} \) dataset used is 20.3 fb\(^{-1}\). The triggers, event selection, estimation of background contributions and systematic uncertainties closely follow the analysis in Ref. [20]. In the following a short description of the analysis strategy is given; more details are given in that reference.

Depending on the reconstructed decay modes of the two \(\tau \) leptons (leptonic or hadronic), events are separated into the dileptonic (\(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\)) and semileptonic (\(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\)) channels. Following a channel-specific preselection, a VBF region is selected by requiring at least two jets with \(p_{\mathrm {T}}^{j_1}>\) 40 \(\,\mathrm{GeV}\) (50 \(\,\mathrm{GeV}\)) and \(p_{\mathrm {T}}^{j_2}>30\) \(\,\mathrm{GeV}\) and a pseudorapidity separation \(\Delta \eta (j_1,j_2) > 2.2\) (3.0) in the \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) (\(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\)) channel. Events with b-tagged jets are removed to suppress top-quark backgrounds.

Inside the VBF region, boosted decision trees (BDT)Footnote 2 are utilised for separating Higgs boson events produced via VBF from the background (including other Higgs boson production modes). The final signal region in each channel is defined by the events with a BDT\(_{\text {score}}\) value above a threshold of 0.68 for \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and 0.3 for \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\). The efficiency of this selection, with respect to the full VBF region, is 49% (51%) for the signal and 3.6% (2.1%) for the sum of background processes for the \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) (\(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\)) channel. A non-negligible number of events from VBF-produced \(H\rightarrow WW\) events survive the \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) selection: they amount to 17% of the overall VBF signal in the signal region. Their contribution is entirely negligible in the \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) selection. Inside each signal region, the Optimal Observable is then used as the variable with which to probe for CP violation. The BDT\(_{\text {score}}\) does not affect the mean of the Optimal Observable, as can be seen in Fig. 2.

Fig. 2
figure 2

Mean of the Optimal Observable as a function of the BDT\(_{\text {score}}\) for the SM signal (black dots with error bars) and for the sum of all background processes (filled red area), for the a \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and b \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) channel. The signal and background model is in agreement with the hypothesis of no bias from the BDT score

The modelling of the Optimal Observable distribution for various background processes is validated in dedicated control regions. The top-quark control regions are defined by the same cuts as the corresponding signal region, but inverting the veto on b-tagged jets and not applying the selection on the BDT\(_{\text {score}}\) (in the \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) channel a requirement of the transverse massFootnote 3 \(m_\mathrm {T}>40\) GeV is also applied). In the \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) channel a \(Z\rightarrow \ell \ell \) control region is obtained by requiring two same-flavour opposite-charge leptons, the invariant mass of the two leptons to be \(80< m_{\ell \ell } < 100~\,\mathrm{GeV} \), and no BDT\(_{\text {score}}\) requirement, but otherwise applying the same requirements as for the signal region. These regions are also used to normalise the respective background estimates using a global fit described in the next section. Finally, an additional region is defined for each channel, called the low-BDT\(_{\text {score}}\) control region, where a background-dominated region orthogonal to the signal region is selected by requiring the BDT\(_{\text {score}}\) to be less than 0.05 for \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and less than 0.3 for \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\). The distribution of the Optimal Observable in these regions is shown in Figs. 3 and  4, demonstrating the good description of the data by the background estimates.

Fig. 3
figure 3

Distributions of the Optimal Observable for the \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) channel in the a top-quark control region (CR), b \(Z\rightarrow \ell \ell \) CR, and c low-BDT\(_{\text {score}}\) CR. The CR definitions are given in the text. These figures use background predictions before the global fit defined in Sect. 7. The “Other” backgrounds include diboson and \(Z\rightarrow \ell \ell \). Only statistical uncertainties are shown

Fig. 4
figure 4

Distributions of the Optimal Observable for the \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) channel in the a top-quark control region (CR) and b low-BDT\(_{\text {score}}\) CR. The CR definitions are given in the text. These figures use background predictions before the global fit defined in Sect. 7. The “Other” backgrounds include diboson and \(Z\rightarrow \ell \ell \). Only statistical uncertainties are shown

The effect of systematic uncertainties on the yields in signal region and on the shape of the Optimal Observable is evaluated following the procedures and prescriptions described in Ref. [20]. An additional theoretical uncertainty in the shape of the Optimal Observable is included to account for the signal reweighting procedure described in Sect. 5. This is obtained from the small difference between the Optimal Observable distribution in reweighted samples, compared to samples with anomalous couplings directly generated with MadGraph5_aMC@NLO. While the analysis is statistically limited, the most important systematic uncertainties are found to arise from effects on the jet, hadronically decaying \(\tau \) and electron energy scales; the most important theoretical uncertainty is due to the description of the underlying event and parton shower in the VBF signal sample.

7 Fitting procedure

The best estimate of \(\tilde{d}\) is obtained using a maximum-likelihood fit performed on the Optimal Observable distribution in the signal region for each decay channel simultaneously, with information from different control regions included to constrain background normalisations and nuisance parameters. The normalisation of the VBF \(H\rightarrow \tau \tau \) and \(H\rightarrow WW\) signal sample is left free in the fit, i.e. this analysis only exploits the shape of the Optimal Observable and does not depend on any possibly model-dependent information about the cross section of CP-mixing scenarios. The relative proportion of the two Higgs boson decay modes is assumed to be as in the SM. All other Higgs boson production modes are treated as background in this study and normalised to their SM expectation, accounting for the corresponding theoretical uncertainties.

A binned likelihood function \(\mathcal {L}(\mathbf {x}; \mu , \varvec{\theta })\) is employed, which is a function of the data \(\mathbf {x}\), the free-floating signal strength \(\mu \), defined as the ratio of the measured cross section times branching ratio to the Standard Model prediction, and further nuisance parameters \(\varvec{\theta }\). It relies on an underlying model of signal plus background, and it is defined as the product of Poisson probability terms for each bin in the distribution of the Optimal Observable. A set of signal templates corresponding to different values of the CP-mixing parameter \(\tilde{d}\) is created by reweighting the SM VBF \(H\rightarrow \tau \tau \) and \(H\rightarrow WW\) signal samples, as described in Sect. 5. The likelihood function is then evaluated for each \(\tilde{d}\) hypothesis using the corresponding signal template, while keeping the same background model. The calculation profiles the nuisance parameters to the best-fit values \(\hat{\varvec{\theta }}\), including information about systematic uncertainties and normalisation factors, both of which affect the expected numbers of signal and background events.

After constructing the negative log-likelihood (NLL) curve by calculating the NLL value for each \(\tilde{d}\) hypothesis, the approximate central confidence interval at 68% confidence level (CL) is determined from the best estimator \(\hat{\tilde{d}}\), at which the NLL curve has its minimum value, by reading off the points at which \(\Delta \)NLL=NLL−NLL\(_{\text {min}} = 0.5\). The expected sensitivity is determined using an Asimov dataset, i.e. a pseudo-data distribution equal to the signal-plus-background expectation for given values of \(\tilde{d}\) and the parameters of the fit, in particular the signal strength \(\mu \), and not including statistical fluctuations [93].

In both channels, a region of low BDT\(_{\text {score}}\) is obtained as described in the preceding section. The distribution of the BDT\(_{\text {score}}\) itself is fitted in this region, which has a much larger number of background events than the signal region, allowing the nuisance parameters to be constrained by the data. This region provides the main constraint on the \(Z\rightarrow \tau \tau \) normalisation, which is free to float in the fit. The event yields from the top-quark (in \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\)) and \(Z\rightarrow \ell \ell \) (in \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) only) control regions defined in the previous section are also included in the fit, to constrain the respective background normalisations, which are also left free in the fit.

The distributions of the Optimal Observable in each channel are shown in Fig. 5, with the nuisance parameters, background and signal normalisation adjusted by the global fit performed for the \(\tilde{d}=0\) hypothesis. Table 2 provides the fitted yields of signal and background events, split into the various contributions, in each channel. The number of events observed in data is also provided.

Table 2 Event yields in the signal region, after the global fit performed for the \(\tilde{d}=0\) hypothesis. The errors include systematic uncertainties
Fig. 5
figure 5

Distributions of the Optimal Observable in the signal region for the a \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and b \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) channel, after the global fit performed for the \(\tilde{d}=0\) hypothesis. The best-fit signal strength is \(\mu =1.55^{+0.87}_{-0.76}\). The “Other” backgrounds include diboson and \(Z\rightarrow \ell \ell \). The error bands include all uncertainties

8 Results

The mean value of the Optimal Observable for the signal is expected to be zero for a CP-even case, while there may be deviations in case of CP-violating effects. A mean value of zero is also expected for the background, as has been demonstrated. Hence, the mean value in data should also be consistent with zero if there are no CP-violating effects within the precision of this measurement. The observed values for the mean value in data inside the signal regions are \(0.3\pm 0.5\) for \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and \(-0.3\pm 0.4\) for \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\), fully consistent with zero within statistical uncertainties and thus showing no hint of CP violation.

Fig. 6
figure 6

Observed and expected \(\Delta \)NLL as a function of the \(\tilde{d}\) values defining the underlying signal hypothesis, for \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) (green), \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) (red) and their combination (black). The best-fit values of all nuisance parameters from the combined fit at each \(\tilde{d}\) point were used in all cases. An Asimov dataset with SM backgrounds plus pure CP-even VBF signal (\(\tilde{d}=0\)), scaled to the best-fit signal-strength value, was used to calculate the expected values, shown in blue. The markers indicate the points where an evaluation was made – the lines are only meant to guide the eye

Fig. 7
figure 7

Expected \(\Delta \)NLL for the combination of both channels as a function of the \(\tilde{d}\) values defining the underlying signal hypothesis when using the Optimal Observable (black) or the \(\Delta \phi ^{\mathrm {sign}}_{jj}\) parameter (blue) as the final discriminating variable. An Asimov dataset with SM backgrounds plus pure CP-even VBF signal (\(\tilde{d}=0\)) scaled to the SM expectation was used to calculate the expected values in both cases. The markers indicate the points where an evaluation was made – the lines are only meant to guide the eye

Fig. 8
figure 8

Observed (black) and expected (red) \(\Delta \)NLL for the combination of both channels as a function of the \(\tilde{d}\) values defining the underlying signal hypothesis when using the \(\Delta \phi ^{\mathrm {sign}}_{jj}\) parameter as the final discriminating variable. An Asimov dataset with SM backgrounds plus pure CP-even VBF signal (\(\tilde{d}=0\)), scaled to the best-fit value of the signal strength in the combined fit when using the \(\Delta \phi ^{\mathrm {sign}}_{jj}\) parameter (\(\mu =2.02^{+0.87}_{-0.77}\)) was used to calculate the expected values. The markers indicate the points where an evaluation was made – the lines are only meant to guide the eye

As described in the previous section, the observed limit on CP-odd couplings is estimated using a global maximum-likelihood fit to the Optimal Observable distributions in data. The observed distribution of \(\Delta \)NLL as a function of the CP-mixing parameter \(\tilde{d}\) for the individual channels separately, and for their combination, is shown in Fig. 6. The \(\tau _{\mathrm {lep}}\tau _{\mathrm {lep}}\) and \(\tau _{\mathrm {lep}}\tau _{\mathrm {had}}\) curves use the best-fit values of all nuisance parameters from the combined fit at each \(\tilde{d}\) point. The expected curve is calculated assuming no CP-odd coupling, with the \(H\rightarrow \tau \tau \) signal scaled to the signal-strength value (\(\mu = 1.55^{+0.87}_{-0.76}\)) determined from the fit for \(\tilde{d}=0\). In the absence of CP violation the curve is expected to have a minimum at \(\tilde{d}=0\). Since the first-order Optimal Observable used in the present analysis is only sensitive to small variations in the considered variable, for large \(\tilde{d}\) values there is no further discrimination power and thus the \(\Delta \)NLL curve is expected to flatten out. The observed curve follows this behaviour and is consistent with no CP violation. The regions \(\tilde{d}<-0.11\) and \(\tilde{d}>0.05\) are excluded at 68% CL. The expected confidence intervals are \([-0.08 , 0.08]~([-0.18 , 0.18])\) for an assumed signal strength of \(\mu =\) 1.55 (1.0). The constraints on the CP-mixing parameter \(\tilde{d}\) based on VBF production can be directly compared to those obtained by studying the Higgs boson decays into vector bosons, as the same relation between the HWW and HZZ couplings as in Ref. [14, 15] is assumed. The 68% CL interval presented in this work is a factor 10 better than the one obtained in Ref. [15].

As a comparison, the same procedure for extracting the CP-mixing parameter \(\tilde{d}\) was applied using the \(\Delta \phi ^{\mathrm {sign}}_{jj}\) observable, previously proposed for this measurement and defined in Eq. 11, rather than the Optimal Observable. The expected \(\Delta \)NLL curves for a SM Higgs boson signal from the combination of both channels for the two CP-odd observables are shown in Fig. 7, allowing a direct comparison, and clearly indicate the better sensitivity of the Optimal Observable. The observed \(\Delta \)NLL curve derived from the \(\Delta \phi ^{\mathrm {sign}}_{jj}\) distribution is also consistent with \(\tilde{d}=0\), as shown in Fig. 8, along with the expectation for a signal with \(\tilde{d}=0\) scaled to the best-fit signal-strength value (\(\mu =2.02^{+0.87}_{-0.77}\)).

9 Conclusions

A test of CP invariance in the Higgs boson coupling to vector bosons has been performed using the vector-boson fusion production mode and the \(H\rightarrow \tau \tau \) decay. The dataset corresponds to 20.3 \(\mathrm{fb}^{-1}\)of \(\sqrt{s}\) = 8 \(\,\mathrm{TeV}\) proton–proton collisions recorded by the ATLAS detector at the LHC. Event selection, background estimation and evaluation of systematic uncertainties are all very similar to the ATLAS analysis that provided evidence of the \(H\rightarrow \tau \tau \) decay. An Optimal Observable is constructed and utilised, and is shown to provide a substantially better sensitivity than the variable traditionally proposed for this kind of study, \(\Delta \phi ^{\mathrm {sign}}_{jj}\). No sign of CP violation is observed. Using only the dileptonic and semileptonic \(H\rightarrow \tau \tau \) channels, and under the assumption \(\tilde{d} = \tilde{d}_B\), values of \(\tilde{d}\) less than \(-0.11\) and greater than 0.05 are excluded at 68% CL.

This 68% CL interval is a factor of 10 better than the one previously obtained by the ATLAS experiment from Higgs boson decays into vector bosons. In contrast, the present analysis has no sensitivity to constrain a 95% CL interval with the dataset currently available – however larger data samples in the future and consideration of additional Higgs boson decay channels should make this approach highly competitive.