1 Introduction

The identification of the flavour of reconstructed \(B^{0}\) and \(B^{0}_{s}\) mesons at production is necessary for the measurements of oscillations and time-dependent CP asymmetries. This procedure is known as flavour tagging and is performed at LHCb by means of several algorithms.

Opposite-side (OS) tagging algorithms rely on the pair production of \(b\) and \(\bar{b}\) quarks and infer the flavour of a given B meson (signal B) from the identification of the flavour of the other b hadron (tagging B). The algorithms use the charge of the lepton (μ, e) from semileptonic b decays, the charge of the kaon from the \(b \to c \to s\) decay chain or the charge of the inclusive secondary vertex reconstructed from b-hadron decay products. All these methods suffer from an intrinsic dilution of the tagging decision, for example due to the possibility of flavour oscillations of the tagging B. This paper describes the optimization and calibration of the OS tagging algorithms, which were performed with the data used for the first LHCb measurements of \(B^{0}_{s}\) mixing and time-dependent CP violation [1–3].

Additional tagging power can be derived from same-side tagging algorithms which determine the flavour of the signal B by exploiting its correlation with particles produced in the hadronization process. The use of these algorithms at LHCb will be described in a forthcoming publication. The use of flavour tagging in previous experiments at hadron colliders is described in Refs. [4, 5].

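The sensitivity of a measured CP asymmetry is directly related to the effective tagging efficiency \(\varepsilon_{\mathrm{eff}}\), or tagging power. The tagging power represents the effective statistical reduction of the sample size: a sample of N events yields the same statistical precision on an asymmetry as \(\varepsilon_{\mathrm{eff}} N\) perfectly tagged events, so that the statistical uncertainty scales as \(1/\sqrt{\varepsilon_{\mathrm{eff}} N}\). It is defined as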

$$ \varepsilon_{\mathrm{eff}} = \varepsilon_{\mathrm{tag}}\mathcal{D}^2 = \varepsilon_{\mathrm{tag}}(1-2\omega)^2, $$
(1)

where \(\varepsilon_{\mathrm{tag}}\) is the tagging efficiency, ω is the mistag fraction and \(\mathcal{D}\) is the dilution. The tagging efficiency and the mistag fraction are defined as

$$ {\varepsilon_{\mathrm{tag}}}= \frac{R + W}{R + W + U} \quad \mbox{and} \quad \omega= \frac{W}{R + W}, $$
(2)

where R, W and U are the numbers of correctly tagged, incorrectly tagged and untagged events, respectively.
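As a minimal illustration of Eqs. (1) and (2), the sketch below computes the tagging efficiency, mistag fraction, dilution and tagging power from the counts R, W and U. The function name and the example counts are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def tagging_performance(n_right: int, n_wrong: int, n_untagged: int) -> dict:
    """Return eps_tag, omega, dilution and eps_eff as defined in Eqs. (1)-(2)."""
    n_tagged = n_right + n_wrong
    eps_tag = n_tagged / (n_tagged + n_untagged)   # fraction of events with a tag decision
    omega = n_wrong / n_tagged                     # fraction of tagged events with a wrong tag
    dilution = 1.0 - 2.0 * omega
    eps_eff = eps_tag * dilution ** 2              # tagging power
    return {"eps_tag": eps_tag, "omega": omega, "dilution": dilution, "eps_eff": eps_eff}


# Hypothetical counts: R = 300, W = 200, U = 500.
perf = tagging_performance(300, 200, 500)
# eps_tag = 0.5, omega = 0.4, eps_eff is approximately 0.02
print(perf)
```

With these numbers, 1000 events carry the same statistical power for an asymmetry measurement as about 20 perfectly tagged events.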

The mistag fraction can be measured in data using flavour-specific decay channels, i.e. those decays where the final state particles uniquely define the quark/antiquark content of the signal B. In this paper, the decay channels \(B^{+} \to J/\psi K^{+}\), \(B^{0} \to J/\psi K^{*0}\) and \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) are used. For charged mesons, the mistag fraction is obtained by directly comparing the tagging decision with the flavour of the signal B, while for neutral mesons it is obtained by fitting the \(B^{0}\) flavour oscillation as a function of the decay time.

The probability of a given tag decision to be correct is estimated from the kinematic properties of the tagging particle and the event itself by means of a neural network trained on Monte Carlo (MC) simulated events to identify the correct flavour of the signal B. When more than one tagging algorithm gives a response for an event, the probabilities provided by each algorithm are combined into a single probability and the decisions are combined into a single decision. The combined probability can be exploited on an event-by-event basis to assign larger weights to events with low mistag probability and thus to increase the overall significance of an asymmetry measurement. In order to get the best combination and a reliable estimate of the event weight, the calculated probabilities are calibrated on data. The default calibration parameters are extracted from the \(B^{+} \to J/\psi K^{+}\) channel. The other two flavour-specific channels are used to perform independent checks of the calibration procedure.

2 The LHCb detector and the data sample

The LHCb detector [6] is a single-arm forward spectrometer designed for the study of CP violation and rare decays of hadrons containing b and c quarks. A vertex detector (VELO) determines with high precision the positions of the primary and secondary vertices as well as the impact parameter (IP) of the reconstructed tracks with respect to the primary vertex. The tracking system also includes a silicon strip detector located in front of a dipole magnet with an integrated field of about 4 Tm, and a combination of silicon strip detectors and straw drift chambers placed behind the magnet. Charged hadron identification is achieved through two ring-imaging Cherenkov (RICH) detectors. The calorimeter system consists of a preshower detector, a scintillator pad detector, an electromagnetic calorimeter and a hadronic calorimeter. It identifies high transverse energy hadron, electron and photon candidates and provides information for the trigger. Five muon stations composed of multi-wire proportional chambers and triple-GEMs (gas electron multipliers) provide fast information for the trigger and muon identification capability.

The LHCb trigger consists of two levels. The first, hardware-based, level selects leptons and hadrons with high transverse momentum, using the calorimeters and the muon detectors. The hardware trigger is followed by a software High Level Trigger (HLT), subdivided into two stages that use the information from all parts of the detector. The first stage performs a partial reconstruction of the event, reducing the rate further and allowing the next stage to fully reconstruct and to select the events for storage up to a rate of 3 kHz.

The majority of the events considered in this paper were triggered by a single hadron or muon track with large momentum, transverse momentum and IP. In the HLT, the channels with a J/ψ meson in the final state were selected by a dedicated di-muon decision that does not apply any requirement on the IP of the muons.

The data used in this paper were taken between March and June 2011 and correspond to an integrated luminosity of 0.37 fb−1. The polarity of the LHCb magnet was reversed several times during the data taking period in order to minimize systematic biases due to possible detector asymmetries.

3 Flavour tagging algorithms

Opposite-side tagging uses the identification of electrons, muons or kaons that are attributed to the other b hadron in the event. It also uses the charge of tracks consistent with originating from a secondary vertex not associated with either the primary or the signal B vertex. These taggers are called electron, muon, kaon and vertex charge taggers, respectively. The tagging algorithms were developed and studied using simulated events. Subsequently, the criteria to select the tagging particles and to reconstruct the vertex charge were re-tuned using the \(B^{+} \to J/\psi K^{+}\) and \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) control channels. An iterative procedure is used to find the selection criteria that maximize the tagging power \(\varepsilon_{\mathrm{eff}}\).

Only charged particles reconstructed with a good quality of the track fit are used. In order to reject poorly reconstructed tracks, the track is required to have a polar angle with respect to the beamline larger than 12 mrad and a momentum larger than 2 GeV/c. Moreover, in order to avoid possible duplications of the signal tracks, the selected particles are required to be outside a cone of 5 mrad formed around any daughter of the signal B. To reject tracks coming from other primary interactions in the same bunch crossing, the impact parameter significance with respect to these pile-up (PU) vertices, \(\mathrm{IP}_{\mathrm{PU}}/\sigma_{\mathrm{IP}_{\mathrm{PU}}} > 3\), is required.
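The preselection requirements above can be sketched as a simple filter. The `Track` record, its field names and the helper functions are hypothetical (they are not the LHCb software interface); momenta are in GeV/c, angles in radians, the z axis is taken along the beamline, and the polar angle is computed from the track direction.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Track:
    """Hypothetical minimal track record used only for this illustration."""
    p: float                       # momentum in GeV/c
    direction: tuple               # momentum direction (x, y, z); z along the beamline
    ip_sig_pu: list = field(default_factory=list)  # IP/sigma_IP w.r.t. each pile-up vertex
    good_fit: bool = True          # stands in for the track-fit quality requirement


def unit(v):
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def opening_angle(u, v):
    """Angle in radians between two direction vectors."""
    cos_a = sum(a * b for a, b in zip(unit(u), unit(v)))
    return math.acos(max(-1.0, min(1.0, cos_a)))


def preselect(track, signal_daughter_dirs):
    """Apply the OS tagging preselection described in the text."""
    if not track.good_fit:
        return False
    polar = opening_angle(track.direction, (0.0, 0.0, 1.0))
    if polar <= 0.012 or track.p <= 2.0:           # 12 mrad and 2 GeV/c
        return False
    # Reject duplicates of signal tracks: 5 mrad cone around each signal daughter.
    if any(opening_angle(track.direction, d) < 0.005 for d in signal_daughter_dirs):
        return False
    # Reject tracks compatible with any pile-up vertex.
    return all(s > 3.0 for s in track.ip_sig_pu)


daughters = [(0.1, 0.0, 1.0)]
ok = Track(p=5.0, direction=(0.0, 0.1, 1.0), ip_sig_pu=[6.0])
print(preselect(ok, daughters))  # True
```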

3.1 Single-particle taggers

The tagging particles are selected by exploiting the properties of the b-hadron decay. A large impact parameter significance with respect to the primary vertex (\(\mathrm{IP}/\sigma_{\mathrm{IP}}\)) and a large transverse momentum \(p_{\mathrm{T}}\) are required. Furthermore, particle identification cuts based on the information from the RICH, calorimeter and muon systems are used to define each tagger. For this purpose, the differences between the logarithm of the likelihood for the muon, electron, kaon or proton hypothesis and that for the pion hypothesis (referred to as \(\mathrm{DLL}_{\mu\pi}\), \(\mathrm{DLL}_{e\pi}\), \(\mathrm{DLL}_{K\pi}\) and \(\mathrm{DLL}_{p\pi}\)) are used. The detailed list of selection criteria is reported in Table 1. Additional criteria are used to identify the leptons. Muons are required not to share hits in the muon chambers with other tracks, in order to avoid misidentifying tracks that are close to a real muon. Electrons are required to be below a certain threshold in the ionization charge deposited in the silicon layers of the VELO, in order to reduce the number of candidates coming from photon conversions close to the interaction point. An additional cut on the ratio of the particle energy E, as measured in the electromagnetic calorimeter, to the momentum p of the candidate electron, measured with the tracking system, \(E/p > 0.6\), is applied.

Table 1 Selection criteria for the OS muon, electron and kaon taggers

In the case of multiple candidates from the same tagging algorithm, the candidate with the highest \(p_{\mathrm{T}}\) is chosen and its charge is used to define the flavour of the signal B.
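A minimal sketch of the single-particle tag decision, assuming the candidates have already passed the Table 1 selection (not reproduced here). It uses the sign convention defined in Sect. 3.3, \(d = +1\) for a signal B containing a \(\bar{b}\) quark: since a b quark produces a negative lepton (\(b \to c \ell^{-} \bar{\nu}\)) and a negative kaon (\(b \to c \to s\)), a negative tagging particle implies a \(\bar{b}\) quark in the signal B, so \(d = -Q\). The `Candidate` type and the function name are hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Hypothetical selected tagging candidate for one tagger (muon, electron or kaon)."""
    pt: float     # transverse momentum in GeV/c
    charge: int   # +1 or -1


def single_particle_decision(candidates: Sequence[Candidate]) -> Optional[int]:
    """Return d = +1 (signal contains b-bar), d = -1 (signal contains b), or None if untagged."""
    if not candidates:
        return None
    best = max(candidates, key=lambda c: c.pt)   # keep the highest-pT candidate
    # A negative lepton or kaon signals a b quark on the opposite side,
    # hence a b-bar quark in the signal B: d = -charge.
    return -best.charge


print(single_particle_decision([Candidate(1.2, +1), Candidate(2.5, -1)]))  # 1
```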

3.2 Vertex charge tagger

The vertex charge tagger is based on the inclusive reconstruction of a secondary vertex corresponding to the decay of the tagging B. The vertex reconstruction starts by building a composite candidate from two tracks with transverse momentum \(p_{\mathrm{T}} > 0.15\) GeV/c and \(\mathrm{IP}/\sigma_{\mathrm{IP}} > 2.5\). The pion mass is assigned to the tracks. Moreover, good quality of the vertex reconstruction is required and track pairs with an invariant mass compatible with a \(K^{0}_{\mathrm{S}}\) meson are excluded. For each reconstructed candidate, the probability that it originates from a b-hadron decay is estimated from the quality of the vertex fit as well as from its geometric and kinematic properties. Among the possible candidates, the one with the highest probability is used. Tracks that are compatible with originating from the two-track vertex but not from the primary vertex are added to form the final candidate. Additional requirements are applied to the tracks associated to the reconstructed secondary vertex: total momentum > 10 GeV/c, total \(p_{\mathrm{T}} > 1.5\) GeV/c, total invariant mass > 0.5 GeV/c² and sum of \(\mathrm{IP}/\sigma_{\mathrm{IP}}\) of all tracks > 10.

Finally, the charge of the tagging B is calculated as the sum of the charges \(Q_i\) of all the tracks associated to the vertex, weighted with their transverse momentum to the power κ

$$ Q_{\mathrm{vtx}} = \frac{\sum_i Q_i p^{\kappa}_{\mathrm{T}i} }{\sum_ip^{\kappa}_{\mathrm{T}i} }, $$
(3)

where the value κ = 0.4 optimizes the tagging power. Events with \(|Q_{\mathrm{vtx}}| < 0.275\) are rejected as untagged.
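Equation (3) and the untagged threshold can be written in a few lines. The decision sign follows the same reasoning as for single particles: a negative vertex charge indicates a tagging hadron containing a b quark, and therefore \(d = +1\) for the signal B. The function names and the example inputs are hypothetical.

```python
def vertex_charge(charges, pts, kappa=0.4):
    """p_T^kappa-weighted charge of the tracks associated to the secondary vertex, Eq. (3)."""
    weights = [pt ** kappa for pt in pts]
    return sum(q * w for q, w in zip(charges, weights)) / sum(weights)


def vertex_charge_decision(charges, pts, kappa=0.4, q_min=0.275):
    """Return d = +1 / -1, or None when |Q_vtx| < q_min (event left untagged)."""
    q_vtx = vertex_charge(charges, pts, kappa)
    if abs(q_vtx) < q_min:
        return None
    return -1 if q_vtx > 0 else +1


# Three tracks with charges (+, +, -) and p_T = (2.0, 1.0, 1.0) GeV/c:
# Q_vtx = 2**0.4 / (2**0.4 + 2), roughly 0.40, above the 0.275 threshold.
print(vertex_charge_decision([+1, +1, -1], [2.0, 1.0, 1.0]))  # -1
```

Raising \(p_{\mathrm{T}}\) to a power κ < 1 gives harder tracks more weight without letting a single track dominate the sum.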

3.3 Mistag probabilities and combination of taggers

For each tagger i, the probability \(\eta_i\) of the tag decision being wrong is estimated from properties of the tagger and of the event itself. This mistag probability is evaluated by means of a neural network trained on simulated \(B^{+} \to J/\psi K^{+}\) events to identify the correct flavour of the signal B, and is subsequently calibrated on data as explained in Sect. 5.

The inputs to each neural network are the signal B transverse momentum, the number of pile-up vertices, the number of tracks preselected as tagging candidates and various geometrical and kinematic properties of the tagging particle (p, \(p_{\mathrm{T}}\) and \(\mathrm{IP}/\sigma_{\mathrm{IP}}\) of the particle) or of the tracks associated to the secondary vertex (the average values of \(p_{\mathrm{T}}\) and of IP, the reconstructed invariant mass and the absolute value of the vertex charge).

If there is more than one tagger available per event, the decisions provided by all available taggers are combined into a final decision on the initial flavour of the signal B. The combined probability P(b) that the meson contains a b-quark is calculated as

$$ P(b) = \frac{p(b)}{p(b)+p(\bar{b})}, \qquad P(\bar{b})=1-P(b), $$
(4)

where

$$ \everymath{\displaystyle} \begin{array}{@{}l} p(b) = \prod_i \biggl( \frac{1+d_i}{2} - d_i (1- \eta_i) \biggr), \\\noalign{\vspace{6pt}} p(\bar{b}) = \prod_i \biggl(\frac{1-d_i}{2} + d_i (1- \eta_i) \biggr). \end{array} $$
(5)

Here, \(d_i\) is the decision taken by the i-th tagger based on the charge of the particle, with the convention \(d_i = 1\) (−1) for the signal B containing a \(\bar{b}\) (b) quark, and \(\eta_i\) is the corresponding predicted mistag probability. The combined tagging decision and the corresponding mistag probability are d = −1 and η = 1 − P(b) if \(P(b) > P(\bar{b})\); otherwise d = +1 and \(\eta = 1 - P(\bar{b})\).

The contribution of taggers with poor tagging power is limited by requiring the mistag probabilities of the kaon and vertex charge taggers to be less than 0.46.
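The combination of Eqs. (4) and (5), including the 0.46 requirement on the kaon and vertex charge taggers, is sketched below. The tagger names used as labels ("muon", "electron", "kaon", "vtx") are hypothetical. As in (5), the taggers are treated as independent.

```python
def combine_taggers(tags, eta_max_poor=0.46):
    """Combine (name, d, eta) tuples following Eqs. (4)-(5).

    d = +1 means the tagger predicts a b-bar quark in the signal B, d = -1 a b quark.
    Returns (d, eta) for the combination, or None if no tagger is usable.
    """
    poor = {"kaon", "vtx"}  # taggers subject to the eta < 0.46 requirement
    used = [(d, eta) for name, d, eta in tags
            if not (name in poor and eta >= eta_max_poor)]
    if not used:
        return None
    p_b = p_bbar = 1.0
    for d, eta in used:
        p_b *= (1 + d) / 2 - d * (1 - eta)
        p_bbar *= (1 - d) / 2 + d * (1 - eta)
    prob_b = p_b / (p_b + p_bbar)       # P(b), Eq. (4)
    prob_bbar = 1.0 - prob_b
    if prob_b > prob_bbar:
        return -1, 1.0 - prob_b
    return +1, 1.0 - prob_bbar


# Two agreeing taggers reinforce each other; the kaon tag is dropped (eta >= 0.46).
print(combine_taggers([("muon", +1, 0.30), ("vtx", +1, 0.40), ("kaon", -1, 0.47)]))
# roughly (1, 0.222)
```

Two agreeing taggers with mistag probabilities 0.30 and 0.40 yield a combined mistag probability of about 0.22, lower than either alone, which is why neglecting their correlation tends to make the combination look too good.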

Due to the correlation among taggers, which is neglected in (5), the combined probability is slightly overestimated. The largest correlation occurs between the vertex charge tagger and the other OS taggers, since the secondary vertex may include one of these particles. To correct for this overestimation, the combined OS probability is calibrated on data, as described in Sect. 5.

4 Control channels

The flavour-specific B decay modes \(B^{+} \to J/\psi K^{+}\), \(B^{0} \to J/\psi K^{*0}\) and \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) are used for the tagging analysis. All three channels are useful to optimize the performance of the OS tagging algorithms and to calibrate the mistag probability. The first two channels are chosen as representative control channels for the decays \(B ^{0}_{s} \to J/\psi \phi\) and \(B ^{0}_{s} \to J/\psi f_{0}\), which are used for the measurement of the \(B ^{0}_{s}\) mixing phase \(\phi_s\) [2, 3], while the last channel allows detailed studies given the high event yield of the semileptonic decay mode. All B decay modes with a J/ψ meson in the final state share the same trigger selection and common offline selection criteria, which ensures a similar performance of the tagging algorithms. Two trigger selections are considered, with or without requirements on the IP of the tracks. They are labeled “lifetime biased” and “lifetime unbiased”, respectively.

4.1 Analysis of the \(B^{+} \to J/\psi K^{+}\) channel

The \(B^{+} \to J/\psi K^{+}\) candidates are selected by combining \(J/\psi \to \mu^{+}\mu^{-}\) and \(K^{+}\) candidates. The J/ψ mesons are selected by combining two muons with transverse momenta \(p_{\mathrm{T}} > 0.5\) GeV/c that form a common vertex of good quality and have an invariant mass in the range 3030–3150 MeV/c². The \(K^{+}\) candidates are required to have transverse momenta \(p_{\mathrm{T}} > 1\) GeV/c and momenta p > 10 GeV/c, and to form a common vertex of good quality with the J/ψ candidate, with a resulting invariant mass within ±90 MeV/c² of the \(B^{+}\) mass. Additional requirements on the particle identification of muons and kaons are applied to suppress the background contamination. To enhance the sample of signal events and reduce the dominant background from prompt J/ψ mesons combined with random kaons, only events with a reconstructed \(B^{+}\) decay time t > 0.3 ps are selected. The decay time t and the invariant mass m of the \(B^{+}\) meson are extracted from a vertex fit that includes a constraint on the associated primary vertex and, for the evaluation of the \(J/\psi K^{+}\) invariant mass, a constraint on the J/ψ mass. In case of multiple B candidates per event, only the one with the smallest vertex fit \(\chi^{2}\) is considered.

The signal events are statistically disentangled from the background, which is dominated by partially reconstructed b-hadron decays to \(J/\psi K^{+} X\) (where X represents any other particle in the decay), by means of an unbinned maximum likelihood fit to the reconstructed \(B^{+}\) mass and decay time. In total ∼85 000 signal events are selected, with a background to signal ratio B/S ∼ 0.035 calculated in a window of ±40 MeV/c² centered around the \(B^{+}\) mass. The mass fit model is based on a double Gaussian distribution peaking at the \(B^{+}\) mass for the signal and an exponential distribution for the background. The time distributions of both the signal and the background are assumed to be exponential, with separate decay constants. The fraction of right, wrong or untagged events in the sample is determined according to a probability density function (PDF), \(\mathcal{P}(r)\), that depends on the tagging response r, defined by

$$ \mathcal{P}(r) = \left\{ \begin{array}{l@{\quad}l} {\varepsilon_{\mathrm{tag}}} ( 1- \omega ) & r=\mbox{``right tag decision''} \\ {\varepsilon_{\mathrm{tag}}} \omega& r=\mbox{``wrong tag decision''} \\ 1-{\varepsilon_{\mathrm{tag}}}& r=\mbox{``no tag decision''}. \end{array} \right. $$
(6)

The parameters ω and \(\varepsilon_{\mathrm{tag}}\) (defined in (2)) are determined separately for signal and background. Figure 1 shows the mass distribution of the selected and tagged events, with the fit result superimposed.
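For a pure signal sample, the likelihood built from (6) depends only on the numbers of right, wrong and untagged events, and its maximum reproduces the estimators of (2). The sketch below, with hypothetical function names and counts, evaluates the PDF and the log-likelihood; the mass dimension and the signal/background separation of the actual fit are deliberately omitted.

```python
import math


def tag_response_pdf(r, eps_tag, omega):
    """P(r) of Eq. (6) for r in {'right', 'wrong', 'none'}."""
    if r == "right":
        return eps_tag * (1.0 - omega)
    if r == "wrong":
        return eps_tag * omega
    if r == "none":
        return 1.0 - eps_tag
    raise ValueError(f"unknown tagging response: {r!r}")


def log_likelihood(counts, eps_tag, omega):
    """Sum of log P(r) over events, given the number of events per response."""
    return sum(n * math.log(tag_response_pdf(r, eps_tag, omega)) for r, n in counts.items())


counts = {"right": 300, "wrong": 200, "none": 500}   # hypothetical
best = log_likelihood(counts, 0.5, 0.4)              # estimators of Eq. (2)
print(best > log_likelihood(counts, 0.52, 0.4), best > log_likelihood(counts, 0.5, 0.38))  # True True
```

Writing the log-likelihood as \((R+W)\ln\varepsilon_{\mathrm{tag}} + U\ln(1-\varepsilon_{\mathrm{tag}}) + R\ln(1-\omega) + W\ln\omega\) shows that the two parameters decouple, which is why the maximum sits exactly at the values of (2).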

Fig. 1

Mass distribution of OS tagged \(B^{+} \to J/\psi K^{+}\) events. Black points are data; the solid blue line, red dotted line and green area are the overall fit, the signal and the background components, respectively.

4.2 Analysis of the \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) channel

The \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) channel is selected by requiring that a muon and the decay \(D ^{*-}\to\overline{D}^{0}(\to K^{+} \pi^{-}) \pi^{-}\) originate from a common vertex, displaced with respect to the pp interaction point. The muon and \(\overline{D}^{0}\) transverse momenta are required to be larger than 0.8 GeV/c and 1.8 GeV/c, respectively. The selection criteria exploit the long \(B^{0}\) and \(\overline{D}^{0}\) lifetimes by applying cuts on the impact parameters of the daughter tracks, on the pointing of the reconstructed \(B^{0}\) momentum to the primary vertex, on the difference between the z coordinates of the \(B^{0}\) and \(\overline{D}^{0}\) vertices, and on the \(\overline {D}^{0}\) flight distance. Additional cuts are applied on the muon and kaon particle identification and on the quality of the fits of all tracks and vertices. In case of multiple B candidates per event, the one with the smallest impact parameter significance with respect to the primary vertex is considered. Only events triggered in the HLT by a single particle with large momentum, large transverse momentum and large IP are used. In total, the sample consists of ∼482 000 signal events.

Even though the final state is only partially reconstructed due to the missing neutrino, the background contamination is small: the background to signal ratio B/S is measured to be ∼0.14 in the signal mass region. The main sources of background are events containing a \(\overline {D}^{0}\) originating from a b-hadron decay (referred to as \(\overline{D}^{0}\)-from-B), events with a \(D^{*-}\) not from a b-hadron decay (prompt D), decays of \(B^{+}\) mesons to the same particles as the signal together with an additional pion (referred to as \(B^{+}\) background) and combinatorial background. The different background sources can be disentangled from the signal by exploiting the different distributions of the observables \(m = m_{K\pi}\), \(\Delta m = m_{K\pi\pi} - m_{K\pi}\), the reconstructed \(B^{0}\) decay time t and the mixing state q. The mixing state is determined by comparing the flavour of the reconstructed signal \(B^{0}\) at decay time with the flavour indicated by the tagging decision (flavour at production time). For unmixed (mixed) events q = +1 (−1), while for untagged events q = 0. The decay time is calculated using the measured \(B^{0}\) decay length, the reconstructed \(B^{0}\) momentum and a correction for the missing neutrino determined from simulation; this correction is parametrized as a function of the reconstructed \(B^{0}\) invariant mass.

An extended unbinned maximum likelihood fit is performed by defining a PDF for the observables (m, Δm, t, q) as the product of one PDF for the masses and one for the t and q observables. For the \(\overline{D}^{0}\) and \(D^{*-}\) mass peaks, two double Gaussian distributions with common mean are used, while a parametric function motivated by the available phase space is used to describe the Δm distributions of the \(\overline{D}^{0}\)-from-B and combinatorial background components. The decay time distribution of the signal consists of mixed, unmixed and untagged events, and is given by

$$ \mathcal{P}^{\mathrm{s}}(t,q) \propto\left\{ \begin{array}{l@{\quad}l} {\varepsilon_{\mathrm{tag}}} a(t) \{ e^{-t/ \tau_{B^0}} [ 1+q(1-2\omega) \cos(\Delta m_{d}t)] \otimes R(t-t') \} & \mbox{if $q=\pm1$} \\\noalign{\vspace{4pt}} (1-{\varepsilon_{\mathrm{tag}}}) a(t) \{ e^{-t/ \tau_{B^0}} \otimes R(t-t') \} & \mbox{if $q=0$}, \end{array} \right. $$
(7)

where \(\Delta m_{d}\) and \(\tau_{B^{0}}\) are the \(B^{0}\mbox{--}\overline{B}^{0}\) mixing frequency and the \(B^{0}\) lifetime. The decay time acceptance function is denoted by a(t) and R(t − t′) is the resolution model, both extracted from simulation. A double Gaussian distribution with common mean is used for the decay time resolution model. In (7) the tagging parameters are assumed to be the same for B and \(\bar{B}\) mesons.
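To make the structure of Eq. (7) concrete, the sketch below evaluates the signal PDF in the idealized case of perfect decay time resolution and uniform acceptance (a(t) = 1, R replaced by a delta function), normalized so that summing over q and integrating over t gives one. It also encodes the mixing state q defined above. The lifetime of 1.52 ps is an approximate, illustrative input, and the function names are hypothetical.

```python
import math


def mixing_state(production_flavour, decay_flavour):
    """q = +1 (unmixed), -1 (mixed) or 0 (untagged).

    Flavours are encoded as +1 for B0 and -1 for B0-bar; production_flavour is None
    when the event is untagged.
    """
    if production_flavour is None:
        return 0
    return +1 if production_flavour == decay_flavour else -1


def signal_pdf(t, q, eps_tag, omega, tau=1.52, dm=0.507):
    """Idealized Eq. (7): no resolution smearing, no acceptance; t in ps, dm in 1/ps."""
    decay = math.exp(-t / tau) / tau          # normalized exponential decay
    if q == 0:
        return (1.0 - eps_tag) * decay
    return 0.5 * eps_tag * decay * (1.0 + q * (1.0 - 2.0 * omega) * math.cos(dm * t))


def total_probability(eps_tag, omega, t_max=40.0, n=40_000):
    """Midpoint-rule integral over t, summed over q; should be close to 1."""
    dt = t_max / n
    return sum(signal_pdf((i + 0.5) * dt, q, eps_tag, omega) * dt
               for i in range(n) for q in (-1, 0, 1))


print(total_probability(0.35, 0.37))  # close to 1
```

The oscillation term cancels in the sum over q = ±1, so the mistag fraction only affects how the tagged events are shared between the mixed and unmixed categories.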

The decay time distributions for the \(B^{+}\) and \(\overline{D}^{0}\)-from-B background components are taken as exponentials convolved with the resolution model and multiplied by the same acceptance function as used for the signal. For the prompt D and combinatorial backgrounds, Landau distributions with independent parameters are used. The dependence on the mixing observable q is the same as for the signal. The tagging parameters \(\varepsilon_{\mathrm{tag}}\) and ω of the signal and of each background component are varied independently in the fit, except for the \(B^{+}\) background, where they are assumed to be equal to those of the signal decay. Figure 2 shows the distributions of the mass and decay time observables used in the maximum likelihood fit. The raw asymmetry is defined as

$$ \mathcal{A}^{\mathrm{raw}}(t) = \frac{N^{\mathrm{unmix}}(t)-N^{\mathrm{mix}}(t)}{N^{\mathrm{unmix}}(t)+N^{\mathrm{mix}}(t)} $$
(8)

where \(N^{\mathrm{mix}}\) (\(N^{\mathrm{unmix}}\)) is the number of tagged events that have (not) oscillated at decay time t. From (7) it follows that the asymmetry for signal is given by

$$ \mathcal{A}(t) = (1-2\omega)\cos(\Delta m_{d}t) . $$
(9)
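Equation (9) can be checked with a toy experiment in which each signal \(B^{0}\) oscillates with probability \((1 - \cos\Delta m_d t)/2\) and its tag is then flipped with probability ω; the raw asymmetry (8) of the resulting unmixed and mixed counts approaches \((1-2\omega)\cos(\Delta m_d t)\). This is a hypothetical illustration at a single decay time, without background, resolution or acceptance effects, and the function names are illustrative.

```python
import math
import random


def raw_asymmetry(n_unmix, n_mix):
    """Eq. (8) for a single decay-time bin."""
    return (n_unmix - n_mix) / (n_unmix + n_mix)


def expected_asymmetry(t, omega, dm=0.507):
    """Eq. (9); t in ps, dm in 1/ps."""
    return (1.0 - 2.0 * omega) * math.cos(dm * t)


def toy_counts(t, omega, n_events, dm=0.507, seed=1):
    """Generate tagged signal decays at decay time t; return (n_unmix, n_mix)."""
    rng = random.Random(seed)
    n_mix = 0
    for _ in range(n_events):
        oscillated = rng.random() < 0.5 * (1.0 - math.cos(dm * t))
        mistagged = rng.random() < omega
        if oscillated != mistagged:   # a mistag flips the observed mixing state
            n_mix += 1
    return n_events - n_mix, n_mix


n_unmix, n_mix = toy_counts(t=2.0, omega=0.35, n_events=200_000)
print(raw_asymmetry(n_unmix, n_mix), expected_asymmetry(2.0, 0.35))
```

The amplitude of the oscillation is reduced by the dilution \(1 - 2\omega\), which is what the fits in this section exploit to measure ω.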

Figure 3 shows the raw asymmetry for the subset of events in the signal mass region that are tagged with the OS tagger combination. At small decay times the measured asymmetry is reduced by the contribution of background events, for which \(\mathcal{A} \simeq 0\). The value of \(\Delta m_{d}\) was fixed to \(\Delta m_{d} = 0.507\ \mathrm{ps}^{-1}\) [7]. Letting the \(\Delta m_{d}\) parameter vary in the fit gives consistent results.

Fig. 2

Distributions of (a) \(K^{+}\pi^{-}\) invariant mass, (b) mass difference \(m(K\pi\pi) - m(K\pi)\) and (c) decay time of the \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) events. Black points with error bars are data; the blue curve is the fit result. The other lines represent signal (red dot-dashed), \(\overline{D}^{0}\)-from-B decay background (gray dashed), \(B^{+}\) background (green short dashed) and prompt D background (magenta solid). The combinatorial background is the magenta filled area.

Fig. 3

Raw mixing asymmetry of \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) events in the signal mass region when using the combination of all OS taggers. Black points are data and the red solid line is the result of the fit. The lower plot shows the pulls of the residuals with respect to the fit.

4.3 Analysis of the \(B^{0} \to J/\psi K^{*0}\) channel

The \(B^{0} \to J/\psi K^{*0}\) channel is used to extract the mistag fraction through a fit of the flavour oscillation of the \(B^{0}\) mesons as a function of the decay time. The flavour of the \(B^{0}\) meson at production time is determined from the tagging algorithms, while the flavour at decay time is determined from the \(K^{*0}\) flavour, which is in turn defined by the kaon charge.

The \(B^{0} \to J/\psi K^{*0}\) candidates are selected from \(J/\psi \to \mu^{+}\mu^{-}\) and \(K^{*0} \to K^{+}\pi^{-}\) decays. The J/ψ mesons are selected as for the \(B^{+} \to J/\psi K^{+}\) channel, described in Sect. 4.1. The \(K^{*0}\) candidates are reconstructed from two good quality charged tracks identified as \(K^{+}\) and \(\pi^{-}\). The reconstructed \(K^{*0}\) meson is required to have a transverse momentum higher than 1 GeV/c, a good quality vertex and an invariant mass within ±70 MeV/c² of the nominal \(K^{*0}\) mass. Combinations of J/ψ and \(K^{*0}\) candidates are accepted as \(B^{0}\) candidates if they form a common vertex of good quality and have an invariant mass in the range 5100–5450 MeV/c². The \(B^{0}\) transverse momentum is required to be higher than 2 GeV/c. The decay time and the invariant mass of the \(B^{0}\) are extracted from a vertex fit with the same procedure as for the \(B^{+} \to J/\psi K^{+}\) channel, by applying a constraint to the associated primary vertex and a constraint to the J/ψ mass. In case of multiple B candidates per event, only the candidate with the smallest vertex \(\chi^{2}\) is kept.

Only events that were triggered by the “lifetime unbiased” selection are kept. The B 0 candidates are required to have a decay time higher than 0.3 ps to remove the large combinatorial background due to prompt J/ψ production. The sample contains ∼33 000 signal events.

The decay time distribution of signal events is parametrized as in (7), without the acceptance correction. The background contribution, with a background to signal ratio B/S ∼ 0.29, is due to misreconstructed b-hadron decays, for which a dependence on the decay time is expected (labeled “long-lived” background). We distinguish two long-lived components. The first corresponds to events where one or more of the four tracks originate from a long-lived particle decay, but where the flavour of the reconstructed \(K^{*0}\) is not correlated with a true b hadron. Its decay time distribution is therefore modeled by a decreasing exponential. In the second long-lived background component, one of the tracks used to build the \(K^{*0}\) originates from the primary vertex, hence the correlation between the \(K^{*0}\) and the B flavour is partially lost. Its decay time distribution is more “signal-like”, i.e. it is a decreasing exponential with an oscillation term, but with a different mistag fraction and lifetime, which are left as free parameters in the fit.

The signal and background decay time distributions are convolved with the same resolution function, extracted from data. The mass distributions, shown in Fig. 4, are described by a double Gaussian distribution peaking at the B 0 mass for the signal component, and by an exponential with the same exponent for both long-lived backgrounds.

Fig. 4

Mass distribution of OS tagged \(B^{0} \to J/\psi K^{*0}\) events. Black points are data; the solid blue line, red dotted line and green area are the overall fit, the signal and the background components, respectively.

The OS mistag fraction is extracted from a fit to all tagged data, with the values for the B 0 lifetime and Δm d fixed to the world average [7]. Figure 5 shows the time-dependent mixing asymmetry in the signal mass region, obtained using the information of the OS tag decision. Letting the Δm d parameter vary in the fit gives consistent results.

Fig. 5

Raw mixing asymmetry of the \(B^{0} \to J/\psi K^{*0}\) events in the signal mass region, for all OS tagged events. Black points are data and the red solid line is the result of the fit. The lower plot shows the pulls of the residuals with respect to the fit.

5 Calibration of the mistag probability on data

For each individual tagger and for the combination of taggers, the calculated mistag probability (η) is obtained on an event-by-event basis from the neural network output. The values are calibrated in a fit using the measured mistag fraction (ω) from the self-tagged control channel \(B^{+} \to J/\psi K^{+}\). A linear dependence between the measured and the calculated mistag probability for signal events is used, as suggested by the data distribution,

$$ \omega(\eta) = p_0 + p_1 \bigl(\eta- \langle\eta\rangle\bigr) , $$
(10)

where \(p_0\) and \(p_1\) are parameters of the fit and \(\langle\eta\rangle\) is the mean calculated mistag probability. This parametrization is chosen to minimize the correlation between the two parameters. Deviations from \(p_0 = \langle\eta\rangle\) and \(p_1 = 1\) would indicate that the calculated mistag probability should be corrected.

In order to extract the \(p_0\) and \(p_1\) calibration parameters, an unbinned maximum likelihood fit to the mass, the tagging decision and the mistag probability η is performed. The fit parametrization takes into account the probability density function of η, \(\mathcal{P}(\eta)\), which is extracted from data for signal and background separately, using events in different mass regions. For example, the PDF for signal events from (6) then becomes

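$$ \mathcal{P}(r,\eta) = \left\{ \begin{array}{l@{\quad}l} {\varepsilon_{\mathrm{tag}}} \bigl( 1- \omega(\eta) \bigr) \mathcal{P}(\eta) & r=\mbox{``right tag decision''} \\ {\varepsilon_{\mathrm{tag}}} \, \omega(\eta) \mathcal{P}(\eta) & r=\mbox{``wrong tag decision''} \\ 1-{\varepsilon_{\mathrm{tag}}}& r=\mbox{``no tag decision''}. \end{array} \right. $$
(11)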

The measured mistag fraction of the background is assumed to be independent of the calculated mistag probability, as confirmed by the distribution of background events.
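The calibration of (10) can be illustrated with a signal-only toy. Because \(\mathcal{P}(\eta)\) does not depend on \(p_0\) and \(p_1\), for pure signal the fit reduces to maximizing \(\sum \ln\omega(\eta_i)\) over wrongly tagged events plus \(\sum \ln[1-\omega(\eta_i)]\) over correctly tagged ones. The sketch below does this with Newton's method, which converges reliably here because this log-likelihood is concave in \((p_0, p_1)\). The toy generator, the flat η distribution and the "true" offset of 0.02 are hypothetical; the mass fit and the background treatment of the actual analysis are omitted.

```python
import random


def calibrated_mistag(eta, p0, p1, mean_eta):
    """Linear calibration omega(eta) of Eq. (10)."""
    return p0 + p1 * (eta - mean_eta)


def fit_calibration(etas, wrong, max_iter=50):
    """Maximum likelihood estimate of (p0, p1) for signal-only tagged events.

    etas: predicted mistag probabilities; wrong: 1 for a wrong tag, 0 for a right tag.
    """
    mean_eta = sum(etas) / len(etas)
    xs = [e - mean_eta for e in etas]
    p0, p1 = mean_eta, 1.0  # start from a perfectly calibrated guess
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, wrong):
            w = p0 + p1 * x
            a = y / w - (1 - y) / (1 - w)               # d lnL / d omega
            b = -y / w ** 2 - (1 - y) / (1 - w) ** 2    # d2 lnL / d omega2 (negative)
            g0 += a
            g1 += a * x
            h00 += b
            h01 += b * x
            h11 += b * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det              # Newton step H^-1 g
        d1 = (h00 * g1 - h01 * g0) / det
        step = 1.0
        # Shrink the step until omega stays inside (0, 1) for every event.
        while any(not (0.0 < p0 - step * d0 + (p1 - step * d1) * x < 1.0) for x in xs):
            step /= 2.0
        p0, p1 = p0 - step * d0, p1 - step * d1
        if abs(d0) + abs(d1) < 1e-10:
            break
    return p0, p1, mean_eta


# Hypothetical toy: flat eta in [0.2, 0.5], true mistag underestimated by 0.02.
rng = random.Random(7)
etas = [rng.uniform(0.2, 0.5) for _ in range(50_000)]
mean_true = sum(etas) / len(etas)
wrong = [int(rng.random() < calibrated_mistag(e, mean_true + 0.02, 1.0, mean_true)) for e in etas]
p0, p1, mean_eta = fit_calibration(etas, wrong)
print(f"p0 - <eta> = {p0 - mean_eta:.3f}, p1 = {p1:.2f}")
```

Centring η on \(\langle\eta\rangle\) in (10) keeps the off-diagonal term of the Hessian small, which is the same decorrelation of \(p_0\) and \(p_1\) mentioned above.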

The calibration is performed on part of the data sample in a two-step procedure. Each tagger is first calibrated individually. The results show that, for each single tagger, only a minor adjustment of \(p_0\) with respect to the starting calibration of the neural network, performed on simulated events, is required. In particular, the largest correction is \(p_0 - \langle\eta\rangle = 0.033 \pm 0.005\) in the case of the vertex charge tagger, while the deviations of the \(p_1\) parameter from unity are about 10 %, similar in size to the corresponding statistical uncertainties. In a second step, the calibrated mistag probabilities are combined and the combined mistag probability is calibrated. This last step is necessary to correct for the small underestimation (\(p_0 - \langle\eta\rangle = 0.022 \pm 0.003\)) of the combined mistag probability due to the correlation among taggers neglected in the combination procedure. The calibrated mistag probability is referred to as \(\eta_{c}\) in the following.

Figure 6 shows the distribution of the calibrated mistag probability for each tagger and for their combination, as obtained for \(B^{+} \to J/\psi K^{+}\) events selected in a ±24 MeV/c² mass window around the \(B^{+}\) mass.

Fig. 6

Distribution of the calibrated mistag probability for the single OS taggers and their combination for \(B^{+} \to J/\psi K^{+}\) events selected in a ±24 MeV/c² mass window around the \(B^{+}\) mass

6 Tagging performance

The tagging performance of the single taggers and of the OS combination, measured after the calibration of the mistag probability, is shown in Tables 2, 3 and 4 for the \(B^{+} \to J/\psi K^{+}\), \(B^{0} \to J/\psi K^{*0}\) and \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) channels, respectively.

Table 2 Tagging performance in the \(B^{+} \to J/\psi K^{+}\) channel. Uncertainties are statistical only
Table 3 Tagging performance in the \(B^{0} \to J/\psi K^{*0}\) channel. Uncertainties are statistical only
Table 4 Tagging performance in the \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) channel. Uncertainties are statistical only

The performance of the OS combination is evaluated in different ways. First, the average performance of the OS combination is calculated, giving the same weight to each event. In this case, the best tagging power is obtained by rejecting the events with a poor predicted mistag probability \(\eta_{c}\) (larger than 0.42), despite the resulting lower \(\varepsilon_{\mathrm{tag}}\). Additionally, to better exploit the tagging information, the tagging performance is determined on independent samples obtained by dividing the data into bins of \(\eta_{c}\). The fits described in the previous sections are repeated for each sub-sample, after which the tagging performances are determined. As the samples are independent, the tagging efficiencies and the tagging powers are summed, and the effective mistag fraction is subsequently extracted from the totals. The total tagging power increases by about 30 % with respect to the average value, as shown in the last line of Tables 2–4.
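The per-bin combination can be written compactly: the tagging efficiencies and powers of independent \(\eta_c\) bins add, and an effective mistag fraction follows from inverting (1), \(\omega_{\mathrm{eff}} = \bigl(1 - \sqrt{\varepsilon_{\mathrm{eff}}/\varepsilon_{\mathrm{tag}}}\bigr)/2\). The two-bin numbers below are hypothetical; they only show why splitting events with different mistag fractions into bins can increase the total tagging power compared with describing them by a single average mistag fraction.

```python
import math


def combine_bins(bins):
    """bins: list of (eps_tag_i, omega_i) measured in independent eta_c bins.

    Returns total eps_tag, total eps_eff and the effective mistag fraction.
    """
    eps_tag = sum(e for e, _ in bins)
    eps_eff = sum(e * (1.0 - 2.0 * w) ** 2 for e, w in bins)
    omega_eff = 0.5 * (1.0 - math.sqrt(eps_eff / eps_tag))   # inverts Eq. (1)
    return eps_tag, eps_eff, omega_eff


bins = [(0.10, 0.20), (0.10, 0.40)]          # hypothetical per-bin results
eps_tag, eps_eff, omega_eff = combine_bins(bins)
# Same events, but described by their average mistag fraction only:
omega_avg = sum(e * w for e, w in bins) / eps_tag
eps_eff_avg = eps_tag * (1.0 - 2.0 * omega_avg) ** 2
print(eps_eff, eps_eff_avg)   # about 0.040 versus 0.032
```

Because \((1-2\omega)^2\) is convex in ω, the sum over bins can never be smaller than the value obtained from the average mistag fraction.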

The measured tagging performance is similar among the three channels. The differences between the \(B^{+} \to J/\psi K^{+}\) and \(B^{0} \to J/\psi K^{*0}\) results are sizeable in absolute terms, but compatible given the large statistical uncertainties of the \(B^{0} \to J/\psi K^{*0}\) results. There are two reasons for the difference in the tagging efficiency between the \(B^{0} \to D^{*-} \mu^{+} \nu_{\mu}\) and the \(B \to J/\psi X\) channels. Firstly, their selections lead to different B momentum spectra, which through production correlations give different momentum spectra of the tagging B. Secondly, the fraction of events passing the hardware trigger due to high transverse momentum leptons or hadrons produced in the opposite B decay differs.

7 Systematic uncertainties

The systematic uncertainties on the calibration parameters \(p_0\) and \(p_1\) are studied by repeating the calibration procedure on \(B^{+} \to J/\psi K^{+}\) events under different conditions. For each condition, the difference between the fitted parameter and its reference value, reported in the first row of Table 5, is evaluated. Several checks are performed, of which the most relevant are reported in Table 6 and described below:

  • The data sample is split according to the run periods and to the magnet polarity, in order to check whether possible asymmetries of the detector efficiency, or of the alignment accuracy, or variations in the data-taking conditions introduce a difference in the tagging calibration.

  • The data sample is split according to the signal flavour, as determined from the reconstructed final state. The calibration of the mistag probability may differ between B flavours because particles and antiparticles interact differently with matter, or because of possible detector asymmetries. In this case a systematic uncertainty has to be assigned, unless the difference is explicitly taken into account when fitting for CP asymmetries.

  • The distribution of the mistag probability in the fit model, \(\mathcal{P}(\eta)\), is varied either by assuming the signal and background distributions to be equal or by swapping them. This probes possible uncertainties related to the fit model.

In addition, the stability of the calibration parameters is verified for different bins of transverse momentum of the signal B.

Table 5 Fit values and correlations of the OS combined mistag calibration parameters measured in the \(B^{+}\to J/\psi K^{+}\), \(B^{0}\to J/\psi K^{*0}\) and \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) channels. The uncertainties are statistical only
Table 6 Systematic uncertainties on the calibration parameters \(p_0\) and \(p_1\) obtained with \(B^{+}\to J/\psi K^{+}\) events

The largest systematic uncertainty in Table 6 originates from the dependence on the signal flavour. As a cross-check, this dependence is also measured with \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) events, by repeating the calibration after splitting the sample according to the signal decay flavour. The differences in this case are \(\delta p_0=\pm 0.009\) and \(\delta p_1=\pm 0.009\), where the latter is smaller than in the \(B^{+}\to J/\psi K^{+}\) channel. For both the run-period and the signal-flavour dependence, the variations \(\delta p_0\) and \(\delta p_1\) are not statistically significant. However, as a conservative estimate of the total systematic uncertainty on the calibration parameters, all the contributions in Table 6 are summed in quadrature.
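
The conservative combination is a quadrature sum of the individual shifts. A minimal sketch follows. The source labels and shift values are hypothetical placeholders, not the entries of Table 6.

```python
import math


def total_systematic(shifts):
    """Sum independent systematic shifts in quadrature (the sign is irrelevant)."""
    return math.sqrt(sum(delta * delta for delta in shifts))


# Hypothetical shifts on p0 from three checks (placeholders, not Table 6).
delta_p0 = {"run period": 0.003, "magnet polarity": -0.004, "signal flavour": 0.012}
print(f"sigma_syst(p0) = {total_systematic(delta_p0.values()):.3f}")
```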

The tagging efficiencies do not depend on the initial flavour of the signal B. In the \(B^{+}\to J/\psi K^{+}\) channel the values are (27.4±0.2) % for the \(B^{+}\) and (27.1±0.2) % for the \(B^{-}\).
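
As a quick check of this statement, the difference between the two efficiencies can be expressed in units of its uncertainty. The sketch assumes the two measurements are independent, with Gaussian uncertainties.

```python
import math


def pull(a, sigma_a, b, sigma_b):
    """Difference of two independent measurements in units of its uncertainty."""
    return (a - b) / math.hypot(sigma_a, sigma_b)


# Tagging efficiencies (in %) for B+ and B- in B+ -> J/psi K+ events.
z = pull(27.4, 0.2, 27.1, 0.2)
print(f"difference = {z:.1f} standard deviations")
```

The result is about one standard deviation, so the difference is not significant.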

8 Comparison of decay channels

The dependence of the calibration of the OS mistag probability on the decay channel is studied. The values of \(p_0\), \(p_1\) and \(\langle\eta_c\rangle\) measured on the whole data sample for each of the three channels are shown in Table 5. The parameters \(p_1\) are compatible with 1 within the statistical uncertainty. The differences \(p_0 - p_1\langle\eta_c\rangle\), shown in the fifth column, are compatible with zero, as expected. The last column shows the correlation coefficients.
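
The role of the offset \(p_0 - p_1\langle\eta_c\rangle\) becomes clear when the calibration is written out. Eq. (10) is not reproduced in this section, so the sketch assumes it is the linear form \(\omega(\eta_c) = p_0 + p_1(\eta_c - \langle\eta_c\rangle)\), which is consistent with the offset quoted in Table 5. Rearranged as \(\omega = (p_0 - p_1\langle\eta_c\rangle) + p_1\eta_c\), a perfectly calibrated tagger (\(\omega = \eta_c\)) has \(p_1 = 1\) and zero offset. The parameter values below are hypothetical.

```python
def calibrated_mistag(eta_c, p0, p1, eta_mean):
    """Assumed linear calibration: omega = p0 + p1 * (eta_c - <eta_c>)."""
    return p0 + p1 * (eta_c - eta_mean)


def calibration_offset(p0, p1, eta_mean):
    """Intercept at eta_c = 0; zero for a perfectly calibrated tagger."""
    return p0 - p1 * eta_mean


# Hypothetical parameters describing a perfectly calibrated tagger.
p0, p1, eta_mean = 0.35, 1.0, 0.35
for eta in (0.20, 0.30, 0.40):
    print(f"eta_c = {eta:.2f} -> omega = {calibrated_mistag(eta, p0, p1, eta_mean):.2f}")
print(f"offset = {calibration_offset(p0, p1, eta_mean):.3f}")
```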

To extract the calibration parameters in the \(B^{0}\to J/\psi K^{*0}\) channel, an unbinned maximum likelihood fit to the mass, decay time and \(\eta_c\) is performed. As for the \(B^{+}\to J/\psi K^{+}\) channel, the fit uses the probability density functions of \(\eta_c\), extracted from data for signal and background separately using the sPlot [8] technique. The results confirm the calibration performed in the \(B^{+}\to J/\psi K^{+}\) channel, albeit with large uncertainties. The results for the \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) channel are obtained from a fit to independent samples corresponding to different ranges of the calculated mistag probability, as shown in Fig. 7. The trigger and offline selections, as well as the signal spectra, differ for this decay channel with respect to the channels containing a J/ψ meson. The agreement in the resulting parameters therefore validates the calibration and its applicability to B decays with different topologies. Figure 8 shows the measured OS mistag fraction as a function of the calculated mistag probability for \(B^{+}\to J/\psi K^{+}\) and \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) signal events. The superimposed linear fit corresponds to the parametrization of Eq. (10) with the parameters of Table 5.
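
A straight-line fit like the one superimposed in Fig. 8 can be sketched as a weighted least-squares fit of the measured mistag fractions, one point per bin of \(\eta_c\), to the linear calibration. This is an illustration under stated assumptions, not the fit used in the analysis. It takes the calibration to be \(\omega = p_0 + p_1(\eta_c - \langle\eta_c\rangle)\) (the assumed form of Eq. (10)), treats the bins as independent points with Gaussian uncertainties, and uses synthetic input.

```python
import math


def fit_calibration(etas, omegas, sigmas, eta_mean):
    """Weighted least-squares fit of omega = p0 + p1 * (eta - eta_mean).

    Returns (p0, p1, sigma_p0, sigma_p1, rho), where rho is the correlation
    coefficient between p0 and p1.
    """
    w = [1.0 / s**2 for s in sigmas]          # weights 1/sigma^2
    x = [eta - eta_mean for eta in etas]      # centred abscissa
    s0 = sum(w)
    sx = sum(wi * xi for wi, xi in zip(w, x))
    sy = sum(wi * yi for wi, yi in zip(w, omegas))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, omegas))
    det = s0 * sxx - sx * sx
    p0 = (sxx * sy - sx * sxy) / det
    p1 = (s0 * sxy - sx * sy) / det
    # Covariance of (p0, p1) from the inverted normal-equation matrix.
    var_p0, var_p1, cov = sxx / det, s0 / det, -sx / det
    rho = cov / math.sqrt(var_p0 * var_p1)
    return p0, p1, math.sqrt(var_p0), math.sqrt(var_p1), rho


# Synthetic points lying exactly on p0 = 0.36, p1 = 0.9 around <eta_c> = 0.35.
etas = [0.20, 0.30, 0.40, 0.45]
omegas = [0.36 + 0.9 * (eta - 0.35) for eta in etas]
p0, p1, dp0, dp1, rho = fit_calibration(etas, omegas, [0.01] * 4, 0.35)
print(f"p0 = {p0:.3f} +- {dp0:.3f}, p1 = {p1:.2f} +- {dp1:.2f}, rho = {rho:.2f}")
```

Centring the abscissa at \(\langle\eta_c\rangle\) reduces the correlation between \(p_0\) and \(p_1\). In this sketch the correlation vanishes exactly when \(\langle\eta_c\rangle\) equals the weighted mean of the bin positions.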

Fig. 7

Raw mixing asymmetry as a function of B decay time in \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) events in the signal mass region, using the OS tagger. Events are split into seven samples of decreasing mistag probability \(\eta_c\)

Fig. 8

Measured mistag fraction (ω) versus calculated mistag probability (\(\eta_c\)) calibrated on \(B^{+}\to J/\psi K^{+}\) signal events for the OS tagger, in background-subtracted events. The left and right plots correspond to \(B^{+}\to J/\psi K^{+}\) and \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) signal events, respectively. Points with error bars are data; the red lines represent the result of the mistag calibration, corresponding to the parameters of Table 5

The output of the calibrated flavour tagging algorithms will be used in a large variety of time-dependent asymmetry measurements involving different B decay channels. Figure 9 shows the calculated mistag distributions in the \(B^{+}\to J/\psi K^{+}\), \(B^{0}\to J/\psi K^{*0}\) and \(B^{0}_{s}\to J/\psi\phi\) channels. These events are tagged, are triggered by the “lifetime unbiased” lines and are required to have a decay time t > 0.3 ps. The event selection for the decay \(B^{0}_{s}\to J/\psi\phi\) is described elsewhere [3]. The distributions of the calculated OS mistag probability are similar among the channels, and its average does not depend on the transverse momentum \(p_{\mathrm{T}}\) of the signal B. It has also been checked that the mistag probability does not depend on the signal B pseudorapidity.

Fig. 9

Top: calibrated mistag probability distribution for (a) \(B^{+}\to J/\psi K^{+}\), (b) \(B^{0}\to J/\psi K^{*0}\) and (c) \(B^{0}_{s}\to J/\psi\phi\) events. Bottom: mean calibrated OS mistag probability as a function of signal \(p_{\mathrm{T}}\) for the (d) \(B^{+}\), (e) \(B^{0}\) and (f) \(B^{0}_{s}\) channels. The plots show signal events extracted with the sPlot technique and with the requirement t > 0.3 ps. The three \(p_{\mathrm{T}}\) distributions are fitted with straight lines and the slopes are compatible with zero

9 Event-by-event results

In order to fully exploit the tagging information in the CP asymmetry measurements, the event-by-event mistag probability is used to weight the events accordingly. The effective efficiency is calculated by summing the squared dilutions obtained from the calibrated mistag probabilities of the signal events, \(\varepsilon_{\mathrm{eff}} = \sum_{i} (1-2\omega(\eta^{i}_{c}))^{2}/N\), where N is the total number of signal events. Note that the use of the per-event mistag probability allows the effective efficiency to be calculated on any set of selected events, including channels that are not flavour specific. Table 7 reports the event-by-event tagging power obtained using the calibration parameters determined with the \(B^{+}\to J/\psi K^{+}\) events, as reported in Table 5. The uncertainties are obtained by propagating the statistical and systematic uncertainties of the calibration parameters. In addition to the values for the three control channels, the result obtained for \(B^{0}_{s}\to J/\psi\phi\) events is shown. For all channels the signal is extracted using the sPlot technique. The results for the tagging power are compatible among the channels containing a J/ψ meson. The higher value for \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) reflects its higher tagging efficiency.
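
The event-by-event estimate can be sketched as below. This is a minimal sketch, not the LHCb code. It assumes the linear calibration \(\omega = p_0 + p_1(\eta_c - \langle\eta_c\rangle)\) and represents each tagged signal candidate by its \(\eta_c\) and an sPlot signal weight (1.0 when no background subtraction is needed). It takes N as the weighted number of signal candidates, tagged or not. The helper name and input values are hypothetical.

```python
import math


def event_by_event_performance(tagged, n_signal, p0, p1, eta_mean):
    """Tagging efficiency, effective mistag and tagging power from per-event eta_c.

    tagged   : list of (eta_c, weight) pairs for tagged signal candidates
    n_signal : (weighted) number of signal candidates, tagged or untagged
    """
    n_tagged = sum(weight for _, weight in tagged)
    sum_d2 = 0.0
    for eta, weight in tagged:
        omega = p0 + p1 * (eta - eta_mean)    # calibrated mistag probability
        sum_d2 += weight * (1.0 - 2.0 * omega) ** 2
    eps_tag = n_tagged / n_signal
    eps_eff = sum_d2 / n_signal               # untagged events contribute zero
    omega_eff = 0.5 * (1.0 - math.sqrt(eps_eff / eps_tag))
    return eps_tag, omega_eff, eps_eff


# Hypothetical input: two tagged candidates out of ten signal candidates.
tagged = [(0.30, 1.0), (0.40, 1.0)]
eps_tag, omega_eff, eps_eff = event_by_event_performance(tagged, 10.0, 0.35, 1.0, 0.35)
print(f"eps_tag = {eps_tag:.2f}, omega = {omega_eff:.3f}, eps_eff = {eps_eff:.3f}")
```

No knowledge of the true signal flavour enters the calculation, which is why it applies equally to channels such as \(B^{0}_{s}\to J/\psi\phi\).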

Table 7 Tagging efficiency, mistag probability and tagging power calculated from event-by-event probabilities for \(B^{+}\to J/\psi K^{+}\), \(B^{0}\to J/\psi K^{*0}\), \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) and \(B^{0}_{s}\to J/\psi\phi\) signal events. The quoted uncertainties are obtained by propagating the statistical (first) and systematic (second) uncertainties on the calibration parameters determined from the \(B^{+}\to J/\psi K^{+}\) events

10 Summary

Flavour tagging algorithms were developed for the measurement of time-dependent asymmetries at the LHCb experiment. The opposite-side algorithms rely on the pair production of b and \(\bar{b}\) quarks and infer the flavour of the signal B meson from the identification of the flavour of the other b hadron. They use the charge of the lepton (μ, e) from semileptonic B decays, the charge of the kaon from the b→c→s decay chain or the charge of the inclusive secondary vertex reconstructed from b-hadron decay products. The decision of each tagger and the probability of that decision being incorrect are combined into a single opposite-side decision and mistag probability. The use of the event-by-event mistag probability fully exploits the tagging information and allows the tagging power to be estimated also in decay channels that are not flavour specific.

The performance of the flavour tagging algorithms was measured on data using three flavour-specific decay modes: \(B^{+}\to J/\psi K^{+}\), \(B^{0}\to J/\psi K^{*0}\) and \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\). The \(B^{+}\to J/\psi K^{+}\) channel was used to optimize the tagging power and to calibrate the mistag probability. The calibration parameters measured in the three channels are compatible within two standard deviations.

By using the calibration parameters determined from \(B^{+}\to J/\psi K^{+}\) events, the OS tagging power was determined to be \(\varepsilon_{\mathrm{tag}}(1-2\omega)^2=(2.10\pm0.08\pm0.24)\,\%\) in the \(B^{+}\to J/\psi K^{+}\) channel, \((2.09\pm0.09\pm0.24)\,\%\) in the \(B^{0}\to J/\psi K^{*0}\) channel and \((2.53\pm0.10\pm0.27)\,\%\) in the \(B^{0}\to D^{*-}\mu^{+}\nu_{\mu}\) channel, where the first uncertainty is statistical and the second is systematic. The evaluation of the systematic uncertainty is currently limited by the size of the available data sample.