1 Introduction

Quarkonia are bound states of either a charm and anti-charm quark pair (charmonia, e.g. \({\mathrm{J}/\psi }\), \(\chi _c\) and \({\psi (\mathrm{2S})}\)) or a bottom and anti-bottom quark pair (bottomonia, e.g. \(\Upsilon \)(1S), \(\Upsilon \)(2S), \(\chi _b\) and \(\Upsilon \)(3S)). While the production of the heavy quark pairs in \(\mathrm{pp}\) collisions is relatively well understood in the context of perturbative QCD calculations [13], their binding into quarkonium states is inherently a non-perturbative process and the understanding of their production in hadronic collisions remains unsatisfactory despite the availability of large amounts of data and the considerable theoretical progress made in recent years [4]. For instance none of the models are able to describe simultaneously different aspects of quarkonium production such as polarization, transverse momentum and energy dependence of the cross sections.

There are mainly three approaches used to describe the hadronic production of quarkonium: the Color-Singlet Model (CSM), the Color Evaporation Model (CEM) and the Non-Relativistic QCD (NRQCD) framework.

In the CSM [57], perturbative QCD is used to model the production of on-shell heavy quark pairs, with the same quantum numbers as the quarkonium into which they hadronize. This implies that only color-singlet quark pairs are considered. Historically, CSM calculations performed at leading order (LO) in \(\alpha _s\), the strong interaction coupling constant, have been unable to reproduce the magnitude and the \({p_\mathrm{T}}\) dependence of the \({\mathrm{J}/\psi }\) production cross section measured by CDF at the Tevatron [8]. Several improvements to the model have been worked out since then: the addition of all next-to-leading order (NLO) diagrams [9] as well as some of the next-to-next-to-leading order (NNLO) [10, 11]; the inclusion of other processes to the production of high \({p_\mathrm{T}}\) quarkonia such as gluon fragmentation [12] or the production of a quarkonium in association with a heavy quark pair [13] and the relaxation of the requirement that the heavy quark pair is produced on-shell before hadronizing into the quarkonium [14]. All these improvements contribute to a better agreement between theory and data but lead to considerably larger theoretical uncertainties and/or to the introduction of extra parameters that are fitted to the data.

In the CEM [1517], the production cross section of a given quarkonium state is considered proportional to the cross section of its constituting heavy quark pair, integrated from the sum of the masses of the two heavy quarks to the sum of the masses of the lightest corresponding mesons (D or B). The proportionality factor for a given quarkonium state is assumed to be universal and independent of its transverse momentum \({p_\mathrm{T}}\) and rapidity \(y\). It follows that the ratio between the yields of two quarkonium states formed out of the same heavy quarks is independent of the collision energy as well as of \({p_\mathrm{T}}\) and \(y\). This model is mentioned here for completeness but is not confronted to the data presented in this paper.

Finally, in the framework of NRQCD [18], contributions to the quarkonium cross section from the heavy-quark pairs produced in a color-octet state are also taken into account, in addition to the color-singlet contributions described above. The neutralization of the color-octet state into a color-singlet is treated as a non-perturbative process. It is expanded in powers of the relative velocity between the two heavy quarks and parametrized using universal long-range matrix elements which are considered as free parameters of the model and fitted to the data. This approach has recently been extended to NLO [1921] and is able to describe consistently the production cross section of quarkonia in \({\mathrm{p}\overline{\mathrm{p}}}\) and \(\mathrm{pp}\) collisions at Tevatron, RHIC and, more recently, at the LHC. However, NRQCD predicts a sizable transverse component to the polarization of the \({\mathrm{J}/\psi }\) meson, which is in contradiction with the data measured for instance at Tevatron [22] and at the LHC [2326].

Most of the observations and discrepancies described above apply primarily to charmonium production. For bottomonium production, theoretical calculations are more robust due to the higher mass of the bottom quark and the disagreement between data and theory is less pronounced than in the case of charmonium [27, 28]. Still, the question of a complete and consistent description of the production of all quarkonium states remains open and the addition of new measurements in this domain will help constraining the various models at hand.

In this paper we present measurements of the inclusive production cross section of several quarkonium states (namely \({\mathrm{J}/\psi }\), \({\psi (\mathrm{2S})}\), \(\Upsilon \)(1S) and \(\Upsilon \)(2S)) using the ALICE detector at forward rapidity (\(2.5<y<4\)) in \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV. Inclusive measurements contain, in addition to the quarkonium direct production, contributions from the decay of higher mass excited states: predominantly \({\psi (\mathrm{2S})}\) and \({\chi _c}\) for the \({\mathrm{J}/\psi }\); \(\Upsilon \)(2S), \({\chi _b}\) and \(\Upsilon \)(3S) for the \(\Upsilon \)(1S), and \(\Upsilon \)(3S) and \({\chi _b}\) for the \(\Upsilon \)(2S). For \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\), they contain as well contributions from non-prompt production, mainly from the decay of \(b\)-mesons. For the \({\mathrm{J}/\psi }\) meson, these measurements represent an increase by a factor of about 80 in terms of luminosity with respect to published ALICE results [29, 30]. For the \({\psi (\mathrm{2S})}\) and the \(\Upsilon \), we present here the first ALICE measurements in \(\mathrm{pp}\) collisions.

This paper is organized as follows: a brief description of the ALICE detectors used for this analysis and of the data sample is provided in Sect. 2; the analysis procedure is described in Sect. 3; in Sect. 4 the results are presented and compared to those obtained by other LHC experiments; finally, in Sect. 5 the results are compared to several theoretical calculations.

2 Experimental apparatus and data sample

2.1 Experimental apparatus

The ALICE detector is extensively described in [31]. The analysis presented in this paper is based on muons detected at forward pseudo-rapidity (\(-4<\eta <-2.5\)) in the muon spectrometer [29]Footnote 1. In addition to the muon spectrometer, the Silicon Pixel Detector (SPD) [32] and the V0 scintillator hodoscopes [33] are used to provide primary vertex reconstruction and a Minimum Bias (MB) trigger, respectively. The T0 Čerenkov detectors [34] are also used for triggering purposes and to evaluate some of the systematic uncertainties on the integrated luminosity determination. The main features of these detectors are listed in the following paragraphs.

The muon spectrometer consists of a front absorber followed by a 3 Tm dipole magnet, coupled to tracking and triggering devices. The front absorber, made of carbon, concrete and steel is placed between 0.9 and 5 m from the Interaction Point (IP). It filters muons from hadrons, thus decreasing the occupancy in the first stations of the tracking system. Muon tracking is performed by means of five stations, positioned between 5.2 and 14.4 m from the IP, each one consisting of two planes of Cathode Pad Chambers. The total number of electronic channels is close to 1.1\(\times 10^{6}\) and the intrinsic spatial resolution is about 70 \(\upmu \)m in the bending direction. The first and the second stations are located upstream of the dipole magnet, the third station is embedded inside its gap and the fourth and the fifth stations are placed downstream of the dipole, just before a 1.2 m thick iron wall (7.2 interaction lengths) which absorbs hadrons escaping the front absorber and low momentum muons (having a total momentum \(p<1.5\) GeV/c at the exit of the front absorber). The muon trigger system is located downstream of the iron wall and consists of two stations positioned at 16.1 and 17.1 m from the IP, each equipped with two planes of Resistive Plate Chambers (RPC). The spatial resolution achieved by the trigger chambers is better than 1 cm, the time resolution is about 2 ns and the efficiency is higher than 95 % [35]. The muon trigger system is able to deliver single and dimuon triggers above a programmable \({p_\mathrm{T}}\) threshold, via an algorithm based on the RPC spatial information [36]. For a given trigger configuration, the threshold is defined as the \({p_\mathrm{T}}\) value for which the single muon trigger efficiency reaches 50 % [35]. Throughout its entire length, a conical absorber (\(\theta <2^{\circ }\)) made of tungsten, lead and steel, shields the muon spectrometer against secondary particles produced by the interaction of large-\(\eta \) primary particles in the beam pipe.

Primary vertex reconstruction is performed using the SPD, the two innermost layers of the Inner Tracking System (ITS) [32]. It covers the pseudo-rapidity ranges \(|\eta |<2\) and \(|\eta |<1.4\), for the inner and outer layers respectively. The SPD has in total about \(10^7\) sensitive pixels on 240 silicon ladders, aligned using \(\mathrm{pp}\) collision data as well as cosmic rays to a precision of 8 \(\upmu \)m.

The two V0 hodoscopes, with 32 scintillator tiles each, are placed on opposite sides of the IP, covering the pseudo-rapidity ranges \(2.8<\eta < 5.1\) and \(-3.7<\eta <-1.7\). Each hodoscope is segmented into eight sectors and four rings of equal azimuthal and pseudo-rapidity coverage, respectively. A logical AND of the signals from the two hodoscopes constitutes the MB trigger, whereas the timing information of the two is used offline to reject beam-halo and beam-gas events, thanks to the intrinsic time resolution of each hodoscope which is better than 0.5 ns.

The T0 detectors are two arrays of 12 quartz Čerenkov counters, read by photomultiplier tubes and located on opposite sides of the IP, covering the pseudo-rapidity ranges \(4.61<\eta <4.92\) and \(-3.28<\eta <-2.97\), respectively. They measure the time of the collision with a precision of \({\sim }40\) ps in \(\mathrm{pp}\) collisions and this information can also be used for trigger purposes.

2.2 Data sample and integrated luminosity

The data used for the analysis were collected in 2011. About 1,300 proton bunches were circulating in each LHC ring and the number of bunches colliding at the ALICE IP was ranging from 33 to 37. The luminosity was adjusted by means of the beam separation in the transverse (horizontal) direction to a value of \(\sim 2\times 10^{30}\) cm\(^{-2}\) s\(^{-1}\). The average number of interactions per bunch crossing in such conditions is about 0.25, corresponding to a pile-up probability of \(\sim \)12 %. The trigger condition used for data taking is a dimuon-MB trigger formed by the logical AND of the MB trigger and an unlike-sign dimuon trigger with a \({p_\mathrm{T}}\) threshold of 1 GeV/c for each of the two muons.

About 4\(\times \)10\(^{6}\) dimuon-MB-triggered events were analyzed, corresponding to an integrated luminosity \(L_\mathrm{{int}}=1.35\pm 0.07\) pb\(^{-1}\). The integrated luminosity is calculated on a run-by-run basis using the MB trigger counts measured with scalers before any data acquisition veto, divided by the MB trigger cross section and multiplied by the dimuon-MB trigger lifetime (75.6 % on average). The MB trigger counts are corrected for the trigger purity (fraction of events for which the V0 signal arrival times on the two sides lie in the time window corresponding to beam-beam collisions) and for pile-up. The MB trigger cross section is measured with the van der Meer (vdM) scan method [37]. The result of the vdM scan measurement [38] is corrected by a factor \(0.990\pm 0.002\) arising from a small modification of the V0 high voltage settings which occurred between the vdM scan and the period when the data were collected. The resulting trigger cross section is \(\sigma _\mathrm{MB}=53.7\pm 1.9(\mathrm{syst})\) mb.

3 Data analysis

The quarkonium production cross section \(\sigma \) is determined from the number of reconstructed quarkonia \(N\) corrected by the branching ratio in dimuon \(\mathrm{BR}_{\mu ^+\mu ^-}\) and the mean acceptance times efficiency \({\langle {A\epsilon }\rangle }\) to account for detector effects and analysis cuts. The result is normalized to the integrated luminosity \(L_\mathrm{{int}}\):

$$\begin{aligned} \sigma =\frac{1}{L_\mathrm{{int}}} \frac{N}{\mathrm{BR}_{\mu ^+\mu ^-} \times {\langle {A\epsilon }\rangle }}, \end{aligned}$$
(1)

with \(\mathrm{BR}_{\mu ^+\mu ^-}=(5.93\pm 0.06)\) %, \((0.78\pm 0.09)\) %, \((2.48\pm 0.05)\) % and \((1.93\pm 0.17)\) % for \({\mathrm{J}/\psi }\), \({\psi (\mathrm{2S})}\), \(\Upsilon \)(1S) and \(\Upsilon \)(2S), respectively [39]. Pile up events have no impact on the reconstruction of the quarkonium yields and are properly accounted for by the luminosity measurement.

3.1 Signal extraction

Quarkonia are reconstructed in the dimuon decay channel and the signal yields are evaluated using a fit to the \(\mu ^+\mu ^-\) invariant mass distributions, as detailed in [29]. In order to improve the purity of the dimuon sample, the following selection criteria are applied:

  • both muon tracks in the tracking chambers must match a track reconstructed in the trigger system;

  • tracks are selected in the pseudo-rapidity range \(-4\le \eta \le -2.5\);

  • the transverse radius of the track, at the end of the front absorber, is in the range \(17.6\le R_\mathrm{{abs}} \le 89.5\) cm;

  • the dimuon rapidity is in the range 2.5 \(\le y \le \) 4;

  • a cut on the product of the total momentum of a given track and its distance to the primary vertex in the tranverse plane (called DCA) is applied for the bottomonium analysis in order to reduce the background under the \(\Upsilon \) signals. It is set to \(6\times \sigma _\mathrm{pDCA}\), where \(\sigma _\mathrm{pDCA}\) is the resolution on this quantity. The cut accounts for the total momentum and angular resolutions of the muon detector as well as for the multiple scattering in the front absorber. This cut is not applied to the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) analyses because it has negligible impact on the signal-to-background ratio for these particles.

These selection criteria help in removing hadrons escaping from (or produced in) the front absorber, low-\({p_\mathrm{T}}\) muons from pion and kaon decays, secondary muons produced in the front absorber and fake muon tracks, without significantly affecting the signals. Applying this selection criteria improves the signal-to-background ratio by 30 % for the \({\mathrm{J}/\psi }\) and by a factor two for the \({\psi (\mathrm{2S})}\). It also allows to reduce the background by a factor three in the \(\Upsilon \) mass region.

The \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) yields are evaluated by fitting the dimuon invariant mass distribution in the mass range \(2 < {m_{\mu \mu }}< 5\) GeV/c\(^2\). The function used in the fit is the sum of either two extended Crystal Ball (CB2) functionsFootnote 2 [40] or two pseudo-Gaussian functions [41] for the signals. The background is described by either a Gaussian with a width that varies linearly with the mass, also called Variable Width Gaussian (VWG), or the product of a fourth order polynomial function and an exponential function (Pol4 \(\times \) Exp).

The normalization factors of the signal functions are left free, together with the position and the width of the \({\mathrm{J}/\psi }\) signal. On the other hand, the position and the width of the \({\psi (\mathrm{2S})}\) are tied to the corresponding parameters of the \({\mathrm{J}/\psi }\) by forcing the mass difference between the two states to be equal to the one given by the Particle Data Group [39] and the mass resolution ratio to match the value obtained from a Monte Carlo (MC) simulation. The tail parameters for the \({\mathrm{J}/\psi }\) are determined by fitting the shape of the \({\mathrm{J}/\psi }\) signal obtained from the simulation. The same tail parameters are used for the \({\psi (\mathrm{2S})}\) as the resonances are separated by only 590 MeV/c\(^{2}\) so that the energy straggling and multiple Coulomb scattering effects of the front absorber on the decay muons are expected to be similar. All the parameters of the functions used to fit the background are left free. An example of fit to the dimuon invariant mass distribution in the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) mass region is shown in the left panel of Fig. 1.

Fig. 1
figure 1

Dimuon invariant mass distribution in the region of charmonia (left) and bottomonia (right). Solid (dotted) lines correspond to signal (background) fit functions. The sum of the various fit functions is also shown as a solid line. For the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\), a combination of two extended Crystal Ball functions is used for the signal and a variable width Gaussian function is used for the background. For the \(\Upsilon \) resonances, a combination of extended Crystal Ball functions is used for the signals and two power law functions for the background

The \(\Upsilon \)(1S), (2S) and (3S) signal extractions are performed as for the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) by fitting the dimuon invariant mass distribution in the mass range \(5 < {m_{\mu \mu }}< 15\) GeV/c\(^2\). Due to the limited statistics, only the \(\Upsilon \)(1S) and \(\Upsilon \)(2S) yields are measured in this analysis. The background is fitted with a sum of either two power law or two exponential functions with all parameters left free. Each of the three \(\Upsilon \) signals (1S, 2S and 3S) is fitted with a Gaussian or a CB2 function. The fit parameters of the \(\Upsilon \)(1S) signal are left free, whereas the width and mass position of the \(\Upsilon \)(2S) and \(\Upsilon \)(3S) are fixed with respect to the ones of the \(\Upsilon \)(1S) in the same way as the \({\psi (\mathrm{2S})}\) parameters are fixed to the \({\mathrm{J}/\psi }\). For the CB2 fit, the tail parameters of the function are fixed using the same method as for the charmonium signal extraction. An example of fit to the dimuon invariant mass distribution in the \(\Upsilon \)’s mass region is shown in the right panel of Fig. 1.

About 70,800 \({\mathrm{J}/\psi }\), 2,000 \({\psi (\mathrm{2S})}\), 380 \(\Upsilon \)(1S) and 100 \(\Upsilon \)(2S) have been measured with signal-to-background ratios (S/B), evaluated within three standard deviations with respect to the quarkonium pole mass, of 4, 0.2, 1 and 0.3, respectively.

In order to determine the \({p_\mathrm{T}}\) differential cross sections, the data sample is divided in thirteen, nine and five transverse momentum intervals for \({\mathrm{J}/\psi }\), \({\psi (\mathrm{2S})}\) and \(\Upsilon \)(1S), respectively. The differential cross section as a function of rapidity is evaluated in six intervals for the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) and three for the \(\Upsilon \)(1S). Given the available statistics, only the measurement of the \({p_\mathrm{T}}\)- and \(y\)-integrated \(\Upsilon \)(2S) cross section is possible. The quarkonium raw yields obtained from the differential study are reported in Sect. 7. For \({\mathrm{J}/\psi }\), the S/B ratio increases from \(2.2\) to \(8.5\) with increasing \({p_\mathrm{T}}\) and from \(3.7\) to \(5.4\) with increasing rapidity. For \({\psi (\mathrm{2S})}\), it increases from \(0.1\) to \(0.6\) with increasing \({p_\mathrm{T}}\) and from \(0.1\) to \(0.2\) with increasing rapidity. For the \(\Upsilon \)(1S), no variation of the S/B ratio is observed within statistical uncertainties.

3.2 Acceptance and efficiency corrections

The measured yields obtained from the fits to the dimuon invariant mass distributions are corrected by the acceptance times efficiency factor \({\langle {A\epsilon }\rangle }\) to determine the production yields of the four resonances.

In order to evaluate the \({\langle {A\epsilon }\rangle }\) factor, simulations of quarkonium production in \(\mathrm{pp}\) collisions at \(\sqrt{s}=7~\mathrm{TeV}\) are performed with realistic \({p_\mathrm{T}}\) and \(y\) distributions, obtained by fitting existing data measured at the same energy for \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) [42, 43], and by scaling CDF data [27] to \(\sqrt{s}=7\) TeV for the \(\Upsilon \). All resonances are forced to decay into two muons. Particle transport is performed using GEANT3 [44] and a realistic detector response is applied to the simulated hits in order to reproduce the performance of the apparatus during data taking. The same analysis cuts as used for the data are applied to the tracks reconstructed from these hits.

The simulations (one for each resonance) are performed on a run-by-run basis, using a realistic description of the ALICE muon spectrometer performance. The misalignment of the muon spectrometer is tuned to reproduce the mass resolution of the \({\mathrm{J}/\psi }\) measured from data. The resonances are generated in a \(y\) range that is wider than the range used for the measurements (\(2.5<y<4\)) in order to account for edge effects. In each \(y\) and \({p_\mathrm{T}}\) interval, the \({\langle {A\epsilon }\rangle }\) factor is calculated as the ratio of the number of reconstructed quarkonia over the number of quarkonia generated in this interval.

The \({\langle {A\epsilon }\rangle }\) factors, averaged over the entire data taking period, are \((13.22\pm 0.02)~\%\) for \({\mathrm{J}/\psi }\), \((16.64\pm 0.02)~\%\) for \({\psi (\mathrm{2S})}\), \((20.93\pm 0.05)~\%\) for \(\Upsilon \)(1S) and \((21.02\pm 0.05)~\%\) for \(\Upsilon \)(2S), where the uncertainties are statistical. The \({\langle {A\epsilon }\rangle }\) correction factors associated to the \({p_\mathrm{T}}\) and \(y\) differential yields are given in Sect. 7.

3.3 Systematic uncertainties

The main sources of systematic uncertainties on the production cross section come from the estimation of the number of measured quarkonia, the acceptance times efficiency correction factor and the integrated luminosity. The uncertainty on the dimuon branching ratio is also taken into account.

The systematic uncertainty on the signal extraction is evaluated using the Root Mean Square (RMS) of the results obtained with different signal functions (CB2 or pseudo-Gaussian functions for charmonia, CB2 or Gaussian functions for bottomonia), different background functions (VWG or Pol4\(\times \)Exp for charmonia, the sum of two exponential or two power law functions for bottomonia) and different fitting ranges (beside the nominal fitting ranges quoted in Sect. 3.1 the ranges \(2.5<{m_{\mu \mu }}<4.5\) GeV/c\(^2\) and \(8<{m_{\mu \mu }}<12\) GeV/c\(^2\) were also used for charmonia and bottomonia, respectively). The tail parameters of the signal functions are also varied within the limits determined by fits to the simulated quarkonium mass distributions in the \({p_\mathrm{T}}\) or \(y\) intervals used in the analysis. Finally, for the quarkonia analysis, different values for the ratio between the \({\psi (\mathrm{2S})}\) and the \({\mathrm{J}/\psi }\) mass resolution have also been tested, estimated using a fit to the \({p_\mathrm{T}}\)- and \(y\)-integrated invariant mass distribution with these parameters left free. The resulting systematic uncertainties averaged over \({p_\mathrm{T}}\) and \(y\) are \(2\) % for the \({\mathrm{J}/\psi }\), \(8\) % for the \({\psi (\mathrm{2S})}\), \(8\) % for the \(\Upsilon \)(1S) and \(9\) % for the \(\Upsilon \)(2S).

The systematic uncertainty on the acceptance times efficiency correction factor has several contributions: the parametrization of the input \({p_\mathrm{T}}\) and \(y\) distributions of the simulated quarkonia, the track reconstruction efficiency, the trigger efficiency and the matching between tracks in the muon tracking and triggering chambers. The acceptance times efficiency correction factors are evaluated assuming that all quarkonium states are unpolarized. If the \(\Upsilon \)(1S) production polarization is fully transverse or fully longitudinal, then the cross section changes by about 37 and 20 %, respectively. This result is consistent with previous studies made for charmonia [29, 30]. There is to date no evidence for a significant quarkonium polarization at \(\sqrt{s}=7\) TeV, neither for \({\mathrm{J}/\psi }\) [23], \({\psi (\mathrm{2S})}\) [24, 25], nor for \(\Upsilon \) [26]. Therefore, no systematic uncertainty due to the quarkonium polarization has been taken into account.

For \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\), the parametrization of the input \({p_\mathrm{T}}\) and \(y\) distributions is based on fits to existing data measured at the same energy and in the same rapidity range [42, 43]. The corresponding systematic uncertainty is obtained by varying these parametrizations within the statistical and systematic uncertainties of the data, and taking the RMS of the resulting \({\langle {A\epsilon }\rangle }\) distribution. Correlations between \({p_\mathrm{T}}\) and \(y\) observed by the LHCb collaboration [43] are also accounted for by evaluating the \({\langle {A\epsilon }\rangle }\) factors for each \({p_\mathrm{T}}\) (\(y\)) distribution measured in smaller \(y\) (\({p_\mathrm{T}}\)) intervals and using the largest difference between the resulting values as an additional systematic uncertainty, quadratically summed to the one obtained using the procedure described above. For the \(\Upsilon \), simulations are based on \({p_\mathrm{T}}\) and \(y\) parametrizations scaled from data measured by CDF [27] to \(\sqrt{s}=7\) TeV. The corresponding systematic uncertainty is evaluated by changing the energy of the scaled CDF data to \(\sqrt{s}=4\) TeV and \(\sqrt{s}=10\) TeV and evaluating the corresponding \({\langle {A\epsilon }\rangle }\). This corresponds to a variation of the input yields of at most 15 % as a function of rapidity and 40 % as a function of \({p_\mathrm{T}}\). We note that extrapolating results obtained at a different collision energy is a conservative approach with respect to using CMS [45, 46] and LHCb [28] data at \(\sqrt{s}=7\) TeV. The resulting uncertainties are 1.7 % for \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\), and 2.4 % for \(\Upsilon \)(1S) and \(\Upsilon \)(2S).

The single muon tracking efficiency can be evaluated both in data [29] and in simulations. A difference of about \(1.6\) % is observed which varies as a function of the muon pseudo-rapidity and \({p_\mathrm{T}}\). The impact of this difference on \({\langle {A\epsilon }\rangle }\) is quantified by replacing the single muon tracking efficiencies obtained from the simulated detector response with the values measured in the data. An additional uncertainty arising from the correlated inefficiency in the tracking chambers was evaluated and amounts to 2.5 % at the dimuon level. The resulting uncertainty on the corrected quarkonium yields amounts to 6.5 % for all resonances.

Concerning the trigger efficiency, a small difference is observed between data and simulations for the trigger response function. To account for this difference, a procedure similar to the one used for the systematic uncertainty on the track reconstruction efficiency is applied. The effect on \({\langle {A\epsilon }\rangle }\) amounts up to \(2\) % for all resonances. Additional uncertainties come from the method used to determine the RPC efficiency from data (\(2\) %) and from the efficiency of the MB trigger condition for events where a quarkonium is produced (\(2\) %). The latter uncertainty is evaluated by means of a sample of events collected with a stand-alone dimuon trigger (without MB condition): the difference between the number of quarkonia reconstructed in such sample with and without the offline requirement of the MB condition is retained as uncertainty.

The difference observed in the simulations for different \(\chi ^2\) cuts on the matching between the tracks reconstructed in the tracking chambers and those reconstructed in the trigger chambers leads to a systematic uncertainty of 1 % on \({\langle {A\epsilon }\rangle }\), independent from \({p_\mathrm{T}}\) and \(y\).

Finally, the uncertainty on the integrated luminosity amounts to 5 %. It includes contributions from the MB trigger cross section (3.5 % [38]), the MB trigger purity (3 %, evaluated by varying the cuts defining the beam-beam and beam-gas collisions), possible effects on the MB trigger cross section from V0 aging between the moment when the vdM scan was performed and the data taking period (1.5 %), the effects of V0 after-pulses and other instrumental effects on the MB trigger counts (1.5 %, evaluated from fluctuations in the ratio of the MB trigger rate to a reference trigger rate provided by the T0).

A summary of the different systematic sources is given in Table 1 and the systematic uncertainties associated to the \({p_\mathrm{T}}\) and \(y\) differential cross sections are listed in Sect. 7. Concerning the \({p_\mathrm{T}}\) and \(y\) dependence of these systematic uncertainties, the uncertainty associated to the luminosity is considered a global scale uncertainty, as is the uncertainty of the quarkonia branching ratio to dimuons. The one associated to the input MC parametrization is considered as largely point-to-point correlated. All other sources are considered as predominantly uncorrelated.

Table 1 Relative systematic uncertainties on the quantities associated to quarkonium cross section measurement. Into brackets, values correspond to the minimum and the maximum as a function of \({p_\mathrm{T}}\) and \(y\)

4 Results

4.1 Integrated and differential production cross sections of \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\)

The measured inclusive \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) production cross sections in \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV in the rapidity range \(2.5<y<4\) are:

\(\sigma _\mathrm{{\mathrm{J}/\psi }}=~6.69\pm 0.04(\mathrm{stat})\pm 0.63(\mathrm{syst})\) \(\upmu \)b, for \(0<{p_\mathrm{T}}<20\) GeV/c,

\(\sigma _{\psi (\mathrm{2S})}=1.13\pm 0.07(\mathrm{stat})\pm 0.19(\mathrm{syst})\) \(\upmu \)b, for \(0<{p_\mathrm{T}}<12\) GeV/c.

The measured \({\mathrm{J}/\psi }\) production cross section is in good agreement with the previously published ALICE result [29, 30].

Figure 2 shows the differential production cross sections of \({\mathrm{J}/\psi }\) (top) and \({\psi (\mathrm{2S})}\) (bottom) as a function of \({p_\mathrm{T}}\) (left) and rapidity (right). In all figures, the error bars represent the statistical uncertainties whereas the boxes correspond to the systematic uncertainties. The systematic uncertainty on the luminosity is quoted in the legend. This analysis extends the \({p_\mathrm{T}}\) range of the \({\mathrm{J}/\psi }\) measurement with respect to the previous ALICE measurement [29, 30] from 8 to 20 GeV/c.

Fig. 2
figure 2

Differential production cross sections of \({\mathrm{J}/\psi }\) (top) and \({\psi (\mathrm{2S})}\) (bottom) as a function of \({p_\mathrm{T}}\) (left) and \(y\) (right). The results are compared to previous ALICE results [29, 30] and LHCb measurements [42, 43]. The open symbols are the reflection of the positive-y measurements with respect to \(y=0\). The vertical error bars and the boxes represent the statistical and systematic uncertainties, respectively

The \({p_\mathrm{T}}\) differential cross sections are compared with the values reported by the LHCb collaboration [42, 43]. The LHCb data points in Fig. 2 correspond to the sum of prompt and \(b\)-meson decays quarkonium productions. For the \({\mathrm{J}/\psi }\) cross sections (Fig. 2, top left), a good agreement is observed between the two experiments. The comparison to the LHCb results for the \({p_\mathrm{T}}\) dependence of \({\psi (\mathrm{2S})}\) cross section (Fig. 2, bottom left) is not straightforward due to the different rapidity ranges. The ALICE measurement tends to be slightly higher than the one reported by LHCb, except at very low \({p_\mathrm{T}}\). Still, the results are in agreement within systematic uncertainties.

The differential cross sections of \({\mathrm{J}/\psi }\) as a function of rapidity (Fig. 2, top right) are compared to the previous measurements reported by ALICE [29, 30] and LHCb [42]. The results are in good agreement. Furthermore, the ALICE \({\mathrm{J}/\psi }\) measurement at mid-rapidity in the di-electron channel complements the forward rapidity measurement and allows to present the \({\mathrm{J}/\psi }\) differential cross section over a broad rapidity range for \({p_\mathrm{T}}\) down to zero. The rapidity dependence of the inclusive \({\psi (\mathrm{2S})}\) production cross section at forward rapidity (Fig. 2, bottom right) is measured for the first time at \(\sqrt{s}=7\) TeV.

The inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio at \(\sqrt{s}=7\) TeV, integrated over \({p_\mathrm{T}}\) and \(y\), is \(\sigma _{\psi (\mathrm{2S})}/\sigma _{\mathrm{J}/\psi }=0.170\pm 0.011(\mathrm{stat.})\pm 0.013(\mathrm{syst})\). To obtain this ratio, the same fit function (CB2 or pseudo-Gaussian function) is used for both resonances, for all the cases described in Sect. 3.3. The mean of the resulting distribution is used as the central value and its RMS is used as the systematic uncertainty on signal extraction. The other sources of systematic uncertainty cancel out in the ratio, except for the uncertainty on the \({\langle {A\epsilon }\rangle }\) factors. As a consequence of the adopted procedure, some differences between this value and the ratio of the integrated cross sections are expected.

Figure 3 presents the \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio as a function of \({p_\mathrm{T}}\) (left) and \(y\) (right). This ratio increases with \({p_\mathrm{T}}\), whereas it shows little or no dependence on rapidity. The comparison with the LHCb measurement (left) shows a reasonable agreement, even though this analysis presents the ratio between inclusive cross sections whereas the LHCb collaboration reports the ratio between prompt particle cross sections, thus removing the contribution from \(b\)-meson decays. Assuming that the \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio is independent of \(y\) over the entire rapidity range, as confirmed by ALICE measurements, and multiplying it by the branching ratio of \({\psi (\mathrm{2S})}\) decaying into \({\mathrm{J}/\psi }\) plus anything \(\mathrm{BR}_{{\psi (\mathrm{2S})}\rightarrow {\mathrm{J}/\psi }}=60.3\pm 0.7\) % [39], one gets the fraction of inclusive \({\mathrm{J}/\psi }\) coming from \({\psi (\mathrm{2S})}\) decay \(f^{\psi (\mathrm{2S})}=0.103\pm 0.007(\mathrm{stat})\pm 0.008(\mathrm{syst})\).

Fig. 3
figure 3

\({\psi (\mathrm{2S})}\)/\({\mathrm{J}/\psi }\) ratio as a function of \({p_\mathrm{T}}\) (left) compared to LHCb measurement [43] and as a function of rapidity (right)

4.2 Integrated and differential production cross sections of \(\Upsilon \)(1S) and \(\Upsilon \)(2S)

The measured inclusive \(\Upsilon \)(1S) and \(\Upsilon \)(2S) production cross sections, integrated over \(2.5 < y < 4\) and \(0<{p_\mathrm{T}}<12\) GeV/c, are:

$$\begin{aligned} \sigma _{\Upsilon (\mathrm{1S})}=54.2\pm 5.0 (\mathrm{stat})\pm 6.7(\mathrm{syst}) \hbox {nb}\\ \sigma _{\Upsilon (\mathrm{2S})}=18.4\pm 3.7(\mathrm{stat})\pm 2.9(\mathrm{syst}) \hbox {nb}. \end{aligned}$$

The total number of \(\Upsilon \)(1S) extracted from the data allows to measure its differential production cross section in five \({p_\mathrm{T}}\) intervals and three rapidity intervals. For the \(\Upsilon \)(2S), on the contrary, no differential analysis could be performed due to the limited number of events.

Figure 4 presents the \(\Upsilon \)(1S) differential production cross section as a function of \({p_\mathrm{T}}\) (left) and the differential cross sections of \(\Upsilon \)(1S) and \(\Upsilon \)(2S) as a function of rapidity (right). The \(\Upsilon \)(1S) \({p_\mathrm{T}}\) differential cross sections are compared to the values reported by the LHCb collaboration [28] in the same rapidity range (\(2.5 < y < 4\)). The results are in good agreement. The \(\Upsilon \)(1S) and \(\Upsilon \)(2S) differential cross sections as a function of rapidity (Fig. 4 right) are presented together with the LHCb [28] and CMS  [45, 46] measurements for \({p_\mathrm{T}}\) down to zero. The measurements from ALICE and LHCb are in good agreement for both \(\Upsilon \) states.

Fig. 4
figure 4

Differential cross section of \(\Upsilon \)(1S) as a function of \({p_\mathrm{T}}\) (left) and differential cross sections of \(\Upsilon \)(1S) and \(\Upsilon \)(2S) as function of rapidity (right), measured by ALICE, LHCb [28] and CMS [45, 46]. The open symbols are reflected with respect to \(y=0\)

The \(\Upsilon \)(2S)-to-\(\Upsilon \)(1S) cross section ratio at \(\sqrt{s}=7\) TeV integrated over \({p_\mathrm{T}}\) and \(y\) is: \(\sigma _{\Upsilon (\mathrm{2S})}/\sigma _{\Upsilon (\mathrm{1S})}=0.34\pm 0.10(\mathrm{stat})\pm 0.02(\mathrm{syst})\). This ratio is in agreement with the one measured by the LHCb experiment [28]. Using a branching ratio for \(\Upsilon \)(2S) decaying into \(\Upsilon \)(1S) plus anything \(\mathrm{BR}_{{\Upsilon (2S)}\rightarrow {\Upsilon (1S)}}=26.5\pm 0.5\) % [39], one gets the fraction of inclusive \(\Upsilon \)(1S) coming from \(\Upsilon \)(2S) decay \(f^{\Upsilon (\mathrm{2S})}=0.090\pm 0.027(\mathrm{stat})\pm 0.005(\mathrm{syst})\).

5 Model comparison

5.1 Differential production cross sections as a function of \({p_\mathrm{T}}\)

The measured inclusive \({\mathrm{J}/\psi }\) differential production cross section as a function of \({p_\mathrm{T}}\) is compared to three theoretical calculations performed in the CSM (Fig. 5): two complete calculations at LO and NLO respectively and a third calculation, called NNLO*, that includes the leading-\({p_\mathrm{T}}\) contributions appearing at NNLO [47]. In agreement with the authors, the calculations are scaled by a factor \(1/0.6\) to account for the fact that they correspond to direct \({\mathrm{J}/\psi }\) production, whereas they are compared to inclusive measurements. This scaling factor is obtained by assuming that about \(20\) % of the inclusive \({\mathrm{J}/\psi }\) come from \({\chi _c}\) decay [48], \(10\) % from \({\psi (\mathrm{2S})}\) (factor \(f^{\psi (\mathrm{2S})}\), Sect. 4) and \(9\) % from \(b\)-mesons [42]. The LO calculation underestimates the data for \({p_\mathrm{T}}>2\) GeV/c and the \({p_\mathrm{T}}\) dependence is much steeper than the measured one. At NLO, the \({p_\mathrm{T}}\) dependence is closer to that of the data, but the calculation still underestimates the measured cross section. The addition of some NNLO contributions further improves the agreement between data and theory concerning the \({p_\mathrm{T}}\) dependence and further reduces the difference between the two, at the price of larger theoretical uncertainties.

Fig. 5
figure 5

Inclusive \({\mathrm{J}/\psi }\) differential production cross section as a function of \({p_\mathrm{T}}\), compared to several scaled CSM calculations for direct \({\mathrm{J}/\psi }\) [47]. Details on the calculations are given in the text

Using a constant scaling factor for the direct-to-inclusive \({\mathrm{J}/\psi }\) production cross section ratio requires that the \({p_\mathrm{T}}\) distributions of direct and decay \({\mathrm{J}/\psi }\) have the same shape. This assumption is a rather crude approximation and for instance the LHCb collaboration has measured a significant increase of the fraction of \({\mathrm{J}/\psi }\) from \(b\)-meson decay with \({p_\mathrm{T}}\) up to \(30\) % for \({p_\mathrm{T}}>14\) GeV/c [42]. Properly accounting for these variations would improve the agreement between data and theory at large \({p_\mathrm{T}}\).

Figure 6 presents the comparison of the inclusive \({\mathrm{J}/\psi }\) differential production cross section (top), the inclusive \({\psi (\mathrm{2S})}\) differential production cross section (middle) and the ratio between the two (bottom) as a function of \({p_\mathrm{T}}\) to two NRQCD calculations for prompt \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) production at NLO from [49] (left) and [19] (right). As discussed with the authors, a number of theoretical uncertainties cancels out when forming the \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) ratio and the theory bands shown in the bottom panels are obtained by taking the ratio of the \({\psi (\mathrm{2S})}\) and \({\mathrm{J}/\psi }\) upper and lower bounds from top and middle panels separately, rather than forming all four combinations.

Fig. 6
figure 6

Inclusive \({\mathrm{J}/\psi }\) differential production cross section (top), inclusive \({\psi (\mathrm{2S})}\) differential production cross section (middle) and inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) ratio (bottom) as a function of \({p_\mathrm{T}}\) compared to two NRQCD calculations from [49] (left) and [19] (right)

The NRQCD calculations include both the same leading order Color-Singlet (CS) contributions as the one shown in Fig. 5 and Color-Octet (CO) contributions that are adjusted to experimental data by means of so-called Long-Range Matrix Elements (LRME). The two calculations differ in the LRME parametrization: the first (left panels of Fig. 6) uses three matrix elements whereas the second (right panels of Fig. 6) uses only two linear combinations of these three elements. Other differences include: the data sets used to fit these matrix elements, the minimum \({p_\mathrm{T}}\) above which the calculation is applicable and the way by which contributions from \({\chi _c}\) decays into prompt \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) productions are accounted for. The first calculation has significantly larger uncertainties than the second for both the \({\mathrm{J}/\psi }\) cross section and the \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) ratio. This is a consequence of the differences detailed above and in particular the fact that the fits start at a lower \({p_\mathrm{T}}\) and include a larger number of data sets.

Both calculations show reasonable agreement with data for all three observables. As it is the case for the CSM calculations, properly accounting for the contribution from \(b\)-meson decays to both \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) inclusive productions in either the data or the theory would further improve the agreement at high \({p_\mathrm{T}}\).

In the CSM, the direct \({\psi (\mathrm{2S})}\) to direct \({\mathrm{J}/\psi }\) ratio is a constant, independent of \({p_\mathrm{T}}\) and rapidity. It corresponds to the square of the ratio between the \({\psi (\mathrm{2S})}\) and \({\mathrm{J}/\psi }\) wave functions at the origin and amounts to about \(0.6\) [47]Footnote 3. This value, scaled by the direct-to-inclusive \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) ratios (\(0.6\) for \({\mathrm{J}/\psi }\), as discussed above, and \(0.85\) for \({\psi (\mathrm{2S})}\) [43]), becomes 0.42. It is larger than the \({p_\mathrm{T}}\)-integrated measurement quoted in Sect. 4 and matches the values measured for \({p_\mathrm{T}}>9\) GeV/c.

Concerning the increase of the inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio as a function of \({p_\mathrm{T}}\) observed in the data, a fraction originates from the contribution of \({\psi (\mathrm{2S})}\) and \({\chi _c}\) decays. Assuming that the direct production of all charmonium states follows the same \({p_\mathrm{T}}\) distribution, as it is the case in the CEM, the transverse momentum of \({\mathrm{J}/\psi }\) coming from the decay of the higher mass resonances must be smaller than the one of the parent particle. This results in an increase of the corresponding contribution to the inclusive cross section ratio as a function of \({p_\mathrm{T}}\). The \({p_\mathrm{T}}\) dependence resulting from this effect on the inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio has been investigated using PYTHIA [51] for decaying the parent particle into a \({\mathrm{J}/\psi }\). The result is normalized to our measured integrated \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio and compared to the data in Fig. 7. As expected, an increase of the ratio is observed with increasing \({p_\mathrm{T}}\) but it is not sufficient to explain the trend observed in the data. This indicates that the increase observed in the data cannot be entirely explained with simple decay kinematics arguments and that other effects must be taken into account. A non-constant ratio can already be expected in the simplest case of CSM, where different diagram contributions to S- and P- wave charmonia production are expected, resulting in different feed-down contributions to \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\). On top of this Color-Octet contributions can also be added, as done in the NRQCD framework. The proper accounting of such contributions is sufficient to reproduce the trend observed in the data, as shown in Fig. 6, bottom panels.

Fig. 7
figure 7

Inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratio as a function of \({p_\mathrm{T}}\) compared to a simulation in which all direct quarkonia are considered to have the same \({p_\mathrm{T}}\) distribution and only kinematic effects due to the decay of higher mass resonances are taken into account, using PYTHIA [51]

In Fig. 8, the inclusive \(\Upsilon \)(1S) differential production cross section as a function of \({p_\mathrm{T}}\) is compared to three CSM calculations [52] (left) and to NRQCD [19] (right).

Fig. 8
figure 8

Differential inclusive production cross section of \(\Upsilon \)(1S) as a function of \({p_\mathrm{T}}\) compared to three scaled CSM calculations of direct \(\Upsilon \)(1S) [52] (left) and a NRQCD calculation of inclusive \(\Upsilon \)(1S) [55, 56] (right)

The CSM calculations are the same as for the \({\mathrm{J}/\psi }\): two complete calculations at LO and NLO respectively and a calculation, called NNLO*, that includes the leading-\({p_\mathrm{T}}\) contributions appearing at NNLO [52]. They have been scaled by a factor \(1/0.6\) to account for the contributions of \(\Upsilon \)(2S) (\(9\) %, factor \(f^{\Upsilon (\mathrm{2S})}\), Sect. 4), \(\Upsilon \)(3S) (\(\sim 1\) % [28]) and \({\chi _b}\) (\({\chi _b}\)(1P) \(\sim 20\) % [53] and \({\chi _b}\)(2P)\(\sim 10\) % [54]) decaying into \(\Upsilon \)(1S). The comparison between these calculations and the data shows qualitatively the same features as for the \({\mathrm{J}/\psi }\) case: the LO calculation underestimates the data for \({p_\mathrm{T}}>4\) GeV/c and falls too rapidly with increasing \({p_\mathrm{T}}\). The \({p_\mathrm{T}}\) dependence of the NLO calculation is closer to that of the data, but the calculation still underestimates the cross section over the full \({p_\mathrm{T}}\) range. A good agreement is achieved at NNLO, but over a limited \({p_\mathrm{T}}\) range and with large theoretical uncertainties.

The NRQCD calculation is performed by the same group as in Fig. 6 (right) for the \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) [19]. It includes all the feed-down contributions from \(\Upsilon \)(2S), \(\Upsilon \)(3S) and \({\chi _b}\). In the limited \({p_\mathrm{T}}\) range of our measurement, the theory overestimates the data. This disagreement becomes smaller for increasing \({p_\mathrm{T}}\) as it is also the case for the LHCb data [28].

In the CSM, the direct \(\Upsilon \)(2S) to direct \(\Upsilon \)(1S) cross section ratio is a constant equal to \(0.45\) [52]. In order to compare this value to the measurement quoted in Sect. 4, it must be scaled by the direct-to-inclusive \(\Upsilon \)(1S) and \(\Upsilon \)(2S) ratios. For \(\Upsilon \)(1S), we use a scaling factor of 0.6, as discussed above. For \(\Upsilon \)(2S), we consider a 5 % contribution from \(\Upsilon \)(3S) [28] and neglect the contribution from \(\chi _b\), which has not been measured to date. We get an upper limit for the \(\Upsilon \)(2S) direct-to-inclusive ratio of 0.95 and consequently a lower limit for the scaled direct \(\Upsilon \)(2S)-to-\(\Upsilon \)(1S) ratio of 0.28. This lower limit is in good agreement with the measurement. We note that the measurement is also in good agreement with a NRQCD calculation performed at LO, as described in [57].

5.2 Differential production cross sections as a function of rapidity

Since the LO CSM calculations described in the previous section extend down to zero \({p_\mathrm{T}}\) they can be integrated over \({p_\mathrm{T}}\) and evaluated as a function of the quarkonium rapidity. The result is compared to the measured inclusive differential cross sections of \({\mathrm{J}/\psi }\) and \(\Upsilon \)(1S) in Fig. 9. As for the \({p_\mathrm{T}}\) differential cross sections, the calculations are scaled by the direct-to-inclusive ratios described in the previous section (\(1/0.6\) for \({\mathrm{J}/\psi }\) and \(\Upsilon \)(1S)). Extending the calculation down to zero \({p_\mathrm{T}}\) results in large theoretical uncertainties: a factor four to five between the lower and upper bounds. The magnitude of the calculations is in agreement with the measurements. It is also worth noting that these calculations have no free parameters.

Fig. 9
figure 9

Differential inclusive production cross sections of \({\mathrm{J}/\psi }\) (left) and \(\Upsilon \)(1S) (right) as a function of \(y\) compared to a CSM calculation at LO [52]

6 Conclusion

In conclusion, the inclusive production cross sections of \({\mathrm{J}/\psi }\), \({\psi (\mathrm{2S})}\), \(\Upsilon \)(1S) and \(\Upsilon \)(2S) as a function of \({p_\mathrm{T}}\) and \(y\) have been measured using the ALICE detector at forward rapidity (\(2.5<y<4\)) in \(\mathrm{pp}\) collisions at a centre of mass energy \(\sqrt{s}=7\) TeV. For \({\mathrm{J}/\psi }\), the measurements reported here represent an increase by a factor of about \(80\) in terms of luminosity with respect to published ALICE results, whereas they are the first ALICE measurements for the other three quarkonium states. The measured inclusive cross sections, integrated over \({p_\mathrm{T}}\) and \(y\) are: \(\sigma _\mathrm{{\mathrm{J}/\psi }}=~6.69\pm 0.04\pm 0.63\) \(\upmu \)b, \(\sigma _{\psi (\mathrm{2S})}=1.13\pm 0.07\pm 0.19\) \(\upmu \)b, \(\sigma _{\Upsilon (\mathrm{1S})}=54.2\pm 5.0\pm 6.7\) nb and \(\sigma _{\Upsilon (\mathrm{2S})}=18.4\pm 3.7\pm 2.9\) nb, where the first uncertainty is statistical and the second one is systematic, assuming no quarkonium polarization. Measuring both \({\mathrm{J}/\psi }\) and \({\psi (\mathrm{2S})}\) cross sections with the same apparatus and the same data set allows deriving the fraction of inclusive \({\mathrm{J}/\psi }\) that comes from \({\psi (\mathrm{2S})}\) decay with reduced systematic uncertainties: \(f^{{\psi (\mathrm{2S})}}=0.103\pm 0.007\pm 0.008\). Similarly, the fraction of inclusive \(\Upsilon \)(1S) that comes from \(\Upsilon \)(2S) decay is \(f^{\Upsilon \mathrm{(2S)}}=0.090\pm 0.027\pm 0.005\).

These results are in good agreement with measurements from the LHCb experiment over similar \({p_\mathrm{T}}\) and \(y\) ranges. For \(\Upsilon \)(1S) and \(\Upsilon \)(2S) they complement the measurements from CMS at mid-rapidity (\(|y|<2.4\)). They are also in good agreement with NRQCD calculations for which the matrix elements have been fitted to data sets from Tevatron, RHIC and LHC, among others. In the CSM, both LO and NLO calculations underestimate the data at large \({p_\mathrm{T}}\) as it was the case at lower energy. The addition of the leading-\({p_\mathrm{T}}\) NNLO contributions helps to reduce this disagreement at the price of larger theoretical uncertainties. LO calculations reproduce qualitatively the data at low \({p_\mathrm{T}}\) and the rapidity dependence of the \({p_\mathrm{T}}\)-integrated cross sections.

7 Integrated and differential quarkonia yields and cross sections

In the following tables, the systematic uncertainties correspond to the quadratic sum of the different sources presented in Sect. 3.3 without the contribution from the luminosity and the branching ratios. \({A\epsilon }\) corresponds to the acceptance times efficiency factor (Tables 2, 3, 4, 5 and 6).

Table 2 Integrated raw yields and cross sections of \({\mathrm{J}/\psi }\), \({\psi (\mathrm{2S})}\), \(\Upsilon (1S)\) and \(\Upsilon (2S)\) for \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV
Table 3 Differential raw yields and cross sections of \({\mathrm{J}/\psi }\) for \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV
Table 4 Differential raw yields and cross sections of \({\psi (\mathrm{2S})}\) for \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV
Table 5 Inclusive \({\psi (\mathrm{2S})}\)-to-\({\mathrm{J}/\psi }\) cross section ratios as a function of \(p_{{\mathrm T}}\) and y for pp collisions at \(\sqrt{s}=7\) TeV
Table 6 Differential raw yields and cross sections of \(\Upsilon \)(1S) for \(\mathrm{pp}\) collisions at \(\sqrt{s}=7\) TeV