Synaptic basis of a sub-second representation of time in a neural circuit model

Barri, A.; Wiechert, M. T.; Jazayeri, M.; DiGregorio, D. A.

doi:10.1038/s41467-022-35395-y

Article
Open access
Published: 22 December 2022

Nature Communications volume 13, Article number: 7902 (2022)

Abstract

Temporal sequences of neural activity are essential for driving well-timed behaviors, but the underlying cellular and circuit mechanisms remain elusive. We leveraged the well-defined architecture of the cerebellum, a brain region known to support temporally precise actions, to explore theoretically whether the experimentally observed diversity of short-term synaptic plasticity (STP) at the input layer could generate neural dynamics sufficient for sub-second temporal learning. A cerebellar circuit model equipped with dynamic synapses produced a diverse set of transient granule cell firing patterns that provided a temporal basis set for learning precisely timed pauses in Purkinje cell activity during simulated delay eyelid conditioning and Bayesian interval estimation. The learning performance across time intervals was influenced by the temporal bandwidth of the temporal basis, which was determined by the input layer synaptic properties. The ubiquity of STP throughout the brain positions it as a general, tunable cellular mechanism for sculpting neural dynamics and fine-tuning behavior.

Introduction

The neuronal representation of time on the sub-second timescale is a fundamental requisite for the perception of time-varying sensory stimuli, generation of complex motor plans, and cognitive anticipation of action^1,2,3,4. But how neural circuits acquire specific temporal contingencies to drive precisely timed behaviors remains elusive. A progressive increase in firing rate (“ramping”) towards a threshold can represent different elapsed times by altering the slope of the ramping behavior. Elapsed time can also be encoded by a population of neurons that fire in a particular sequence (“time cells”)^5,6,7,8. Sequential synaptic connections between neurons (synfire chains) can explain the neural sequences representing bird song⁹ and contribute to time delays necessary to cancel self-generated sensory stimuli in the electrosensory lobe of mormyrid fish¹⁰. Temporal dynamics of neural population activity can also be reproduced by training recurrent neural network models^11,12,13. Nevertheless, the search for a candidate mechanism for generating a temporal reference (biological timer) for neural dynamics is an ongoing challenge.

Short-term synaptic plasticity (STP) is the rapid change in synaptic strength occurring over tens of milliseconds to seconds that is thought to transform presynaptic activity into distinct postsynaptic spike patterns¹⁴. Depression and facilitation of synaptic strength can act as low- and high-pass filters, respectively¹⁵, and synaptic depression can mediate gain modulation^16,17. Network models of neocortical connectivity exhibit improved temporal pattern discrimination when augmented with STP¹⁸. Within recurrent neural networks, the long timescales of cortical synaptic facilitation provide the substrate for working memory¹⁹. Finally, low-gain recurrent network models that include STP also show enriched neural dynamics and generate neural representations of time²⁰. However, experimental evidence of STP-dependent circuit computations is rare and is associated mainly with sensory adaptation²¹.

The cerebellar cortex is a prototypical microcircuit known to be important for generating temporally precise motor²² and cognitive behaviors^23,24,25,26 on the sub-second timescale. It receives mossy fibers (MFs) from various sensory, motor and cortical areas. MFs are thought to convey contextual information and converge onto granule cells (GCs), the most numerous neuron in the brain. The excitatory GCs project onto the inhibitory molecular layer interneurons and Purkinje cells (PCs). PCs, being the sole output neurons of the cerebellar cortex, inhibit neurons in the deep cerebellar nuclei. According to the Marr-Albus-Ito model of cerebellar cortical circuit computations, precisely timed Purkinje cell activity can be learned by adjusting the synaptic weights formed by GCs with differing activity patterns^27,28. This largely feed-forward circuitry has been proposed to learn the temporal contingencies required for prediction from neural sequences across the population of GCs within the input layer²⁹. The synapses between MFs and GCs are highly variable in their synaptic strength and STP time course³⁰. Therefore, we hypothesized that STP of MF-GC synapses could be used as internal timers for a population clock within the cerebellar cortex to generate neural dynamics necessary for temporal learning.

To elaborate this hypothesis, we modeled the cerebellar cortex as a rate-based two-layer perceptron network that includes realistic MF-GC connectivity and STP dynamics. The model reproduces learned PC activity associated with a well-known temporal learning task: delay eyelid conditioning³¹. The timescales of STP determined the temporal characteristics of the GC population activity, which defined the temporal window of PC temporal learning. The width of PC pauses scaled proportionally with the learned time intervals, similar to experimentally observed scalar variability of the eyelid conditioning behavior³². Additionally, we found that STP-driven GC activity was well suited to implement a Bayesian estimator of time intervals³³. We propose that within neural circuits, dynamic synapses serve as tunable clocks that determine the bandwidth of neural circuit dynamics and enable learning temporally precise behaviors.

Results

Cerebellar cortex model with STP

The cerebellar cortex can be modeled as a two-layer perceptron that performs pattern separation of static inputs^27,28,34,35. Cerebellar models of temporal processing are generally supplemented with additional mechanisms that generate temporally varying activity patterns in the GC layer^10,29,36,37. To test whether heterogeneous MF-GC STP is sufficient to support temporal learning, we implemented STP of the MF-GC synapse in a simplified cerebellar cortex model, hereafter referred to as CCM_STP. This model deliberately omits all other potential sources of temporal dynamics. In particular, in most of the simulations presented here, we did not include recurrent connectivity (Fig. 1b). STP was simulated using a parallel vesicle pool model of the MF-GC synapse, similar to ref. 38. It comprises two readily releasable and depletable vesicle pools, synaptic facilitation, and postsynaptic desensitization. To reproduce the observed functional synaptic diversity, we set vesicle fusion probabilities (p_v), synaptic pool sizes (N), and synaptic facilitation to match the relative strengths, paired-pulse ratios, and transient behaviors across five different types of synapses that were previously characterized³⁰ (Fig. 1a₂–a₆). Importantly, the longest timescale in CCM_STP is associated with a 2 s vesicle refilling time constant of the slow vesicle pool (τ_ref = 2 s, Fig. 1a₁). To capture depression over long timescales^38,39, we introduced a phenomenological parameter (p_ref = 0.6) that effectively mimics a simplified form of activity-dependent recovery from depression (see Methods).

**Fig. 1: Cerebellar cortex model with short-term synaptic plasticity within the input layer (CCM_STP).**

The CCM_STP consisted of firing rate units representing MFs, GCs, a single PC, and a single molecular layer interneuron, i.e., each neuron’s activity was represented by a single continuous value corresponding to an instantaneous firing rate. Each GC received 4 MF synapses, randomly selected from the different synapse types according to their experimentally characterized frequency of occurrence³⁰. Importantly, we associated different synapse types with different MF firing rates (Fig. 1b, left panels, see Methods). High p_v MF inputs were paired with high average firing rates (primary sensory groups 1, 2) and low p_v synapses with MF inputs with comparatively low average firing rates (secondary/processed sensory groups 3, 4, 5), according to experimental observations^40,41. We will reconsider this relationship below.

To examine CCM_STP network dynamics, input MF activity patterns were sampled every second from respective firing rate distributions shown in Fig. 1b. Each change in MF patterns evoked transient changes in MF-GC synaptic weights, which in turn generated transient GC firing rate responses that decayed at different rates to a steady state (Fig. 1c). Similar to experimentally recorded PC responses to sensory stimuli in vivo⁴², switches between different MF activity patterns also generated heterogeneous transient changes in the PC firing rate, whose directions and magnitudes were controlled by the ratio of the average excitatory to inhibitory weight (Fig. 1c, bottom). In contrast, when MF-GC STP was removed, the transient GC and PC responses disappeared (Fig. 1d). The amplitude of the firing rate transients increased as the difference from one MF pattern to the next increased, similar to previous theoretical work¹⁶. Sequential delivery of uncorrelated MF firing patterns in CCM_STP (Fig. 1e) generated GC and PC transients with broadly distributed amplitudes (Fig. 1f1,2), which were progressively reduced as the relative change in MF rate decreased (Fig. 1g). Thus, dynamic MF-GC synapses allow both GCs and PCs to represent the relative changes in sensory stimuli.

Simulating PC pauses during eyelid conditioning

We next explored whether MF-GC STP diversity permits learning of precisely timed PC pauses associated with delay eyelid conditioning, a prototypical example of cerebellar cortex-dependent learning. In this task, animals learn to use a conditioned stimulus (CS) to precisely time eyelid closure in anticipation of an aversive unconditioned stimulus (US). This eyeblink is driven by a preceding decrease in PC firing rates^31,43 (Fig. 2a). Since the CS is typically constant until the time of the US and a precisely timed eyelid response can be learned even if the CS is replaced by direct and constant MF stimulation^44,45, we modeled CS delivery in the CCM_STP by an instantaneous switch to a novel MF input pattern that persists over the duration of the CS (Fig. 2a). Most GC activity transients exhibited a characteristic rapid increase or decrease in firing rate, followed by an exponential-like decay in firing rate (Fig. 2c). In contrast to other models of eyelid conditioning²⁹, the activity of most GCs in the CCM_STP peaked only once, occurring shortly (<50 ms) after the CS onset (Fig. 2c). However, the distribution of GC firing rate decay times across the population was highly variable with a fraction of GCs showing decay times to 10% of the transient peak as long as 700 ms (Fig. 2c, d).

**Fig. 2: Simulating Purkinje cell pauses during eyelid conditioning.**

To test whether the GC population dynamics could act as a basis set for learning the precisely timed PC firing rate pauses known to drive the eyelid response, we subjected the GC-PC synaptic weights to a gradient descent-based supervised learning rule⁴⁶. The rule’s target signal consisted of a square pulse (zero firing rate at a specific time bin) at the designated time of the PC firing rate pause (Fig. 2e, dotted line). In the course of learning, there was a progressive acquisition of a pause in the PC firing rate (Fig. 2e). However, without MF-GC STP, the PC pause did not develop (Fig. 2e, pink). We tested learning of different delay intervals ranging from 25 ms to 700 ms and found that PC pauses could be generated for all delays. The PC pause amplitude and temporal precision (time and width) decreased with increasing CS-US delays (Fig. 2f), reminiscent of the shape of PC simple-spike pauses recorded during eyelid conditioning³¹.
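
As a rough illustration of this learning step, the sketch below runs plain gradient descent on the weights of a linear readout over a bank of exponentially decaying traces. It is not the CCM_STP learning rule itself. The decay constants, bin width, learning rate, signed weights (standing in for net excitation minus interneuron inhibition), and the names `basis`, `pc_rate`, and `loss` are all assumptions made for the example; only the square-pulse target (zero rate in one time bin) follows the text.

```python
import math

DT = 0.01                      # assumed 10 ms time bins
N_BINS = 80                    # 0 to 790 ms after CS onset
TIMES = [k * DT for k in range(N_BINS)]

# Hypothetical GC basis: transients with decay constants from 20 ms to ~770 ms.
TAUS = [0.02 * 1.5 ** k for k in range(10)]
basis = [[math.exp(-t / tau) for t in TIMES] for tau in TAUS]
basis.append([1.0] * N_BINS)   # constant unit standing in for steady-state GC activity

TARGET_BIN = 20                # pause at 200 ms (assumed CS-US delay)
target = [1.0] * N_BINS        # normalized baseline PC rate
target[TARGET_BIN] = 0.0       # square pulse: zero rate in a single bin


def pc_rate(weights):
    """Linear readout: weighted sum of the basis traces in every time bin."""
    return [sum(w * g[k] for w, g in zip(weights, basis)) for k in range(N_BINS)]


def loss(weights):
    """Mean squared error (with a factor 1/2) between readout and target."""
    y = pc_rate(weights)
    return 0.5 * sum((a - b) ** 2 for a, b in zip(y, target)) / N_BINS


weights = [0.0] * len(basis)
weights[-1] = 1.0                       # start at the baseline rate
learning_rate = 1.0 / len(basis)        # small enough for a monotone decrease here

initial_loss = loss(weights)
for _ in range(1000):
    y = pc_rate(weights)
    err = [a - b for a, b in zip(y, target)]
    # Gradient of the loss with respect to each weight.
    grad = [sum(e * gk for e, gk in zip(err, g)) / N_BINS for g in basis]
    weights = [w - learning_rate * d for w, d in zip(weights, grad)]
final_loss = loss(weights)
```

With such smooth traces the fit to a one-bin pulse is only approximate; the sketch is meant to show the update rule, not to reproduce Fig. 2e.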

Why might the learned PC pause amplitude and temporal precision be reduced for longer CS-US delays? The parameters associated with the learning algorithm (e.g. the number of iterations) are identical for each CS-US delay. The state of the GC population activity, in contrast, changes throughout the CS. Once all GC activity dynamics reach steady-state, temporal discrimination by PCs is diminished, and interval learning becomes impossible. In other words, for temporal learning to be effective, changes in GC firing rates must be prominent over the relevant timescale. Indeed, in eyeblink conditioning simulations where slow or fast GCs were removed, the efficiency of generating PC pauses for short and long intervals was reduced (Fig. S2). CCM_STP simulations thus demonstrate that a GC temporal basis generated by MF-GC STP is sufficient to reproduce the cerebellar cortex computation underlying delay eyelid conditioning and suggest that the timescale of GC dynamics influences the timescale of behavioral learning.

Analysis of the synaptic mechanism underlying GC transient responses using a reduced model

PC temporal learning requires transient GC activity responses, which in our model only arise from STP at the MF-GC synapse^30,39. How are the dynamics of synapses and GCs determined by quantal and firing rate parameters? The complexity of the full CCM_STP with many interacting parameters makes it difficult to assess the effect of each synaptic parameter. To overcome this challenge, we developed a reduced MF-GC synapse model, which was analytically solvable for an instantaneous and persistent switch of MF rates. This allowed us to identify the key computational building blocks of CCM_STP and explore how they control the overall behavior of the model. Specifically, we omitted short-term facilitation and postsynaptic desensitization and reduced the synaptic model to a single population of high p_v synapses (“drivers”³⁰) and a single population of low p_v synapses (“supporters”³⁰), each with a fast and a slow refilling ready-releasable pool (Fig. 3b), thus obtaining a model where STP results from vesicle depletion only. Each GC received exactly two driver and two supporter MF inputs with random and pairwise distinct identities (Fig. 3a).

**Fig. 3: MF-GC synaptic time constants and their relative weights determine the time course of GC responses.**

In this reduced model, an instantaneous and persistent switch of MF firing rates generates an average postsynaptic current (I_syn(t)) for each vesicle pool that is remarkably simple. It features a sharp transient change, followed by a mono-exponential decay to a steady-state synaptic current amplitude, A_s, (Fig. 3c) and can be generally expressed as

$$I_{syn}(t)=A_{s}+A_{t}e^{-\frac{t}{\tau_{syn}}} \tag{1}$$

Here, A_s is a time-invariant component and ${A}_{t}{e}^{-\frac{t}{{\tau }_{{syn}}}}$ is a transient component with synaptic relaxation time constant τ_syn (Fig. 3c) and amplitude A_t. This transient component determines the synapse’s ability to encode the passage of time.

The solution of the synaptic dynamics model reveals the crucial dependence of τ_syn and A_t on the presynaptic and firing rate parameters (see “Methods”):

$$\tau_{syn}=\frac{\tau_{ref}}{1+\alpha p_{v}m} \tag{2}$$

Here, $\alpha={\tau }_{{ref}}(1-{p}_{{ref}})$, m is the MF firing rate persisting during the CS, and the synaptic parameters p_v, τ_ref, and p_ref are defined as above. Equation (2) shows that τ_syn is inversely related to the MF firing rate during the CS and the release probability, p_v (Fig. 3d). Intuitively, this is because higher p_v and/or m lead to a higher rate of synaptic vesicle fusion, and hence depletion, driving the synaptic response amplitude to steady-state faster. Conversely, slow time constants arise from low p_v and/or low m with the maximum τ_syn being equal to the vesicle recovery time τ_ref.
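
To make the dependence in Eq. (2) concrete, the following minimal sketch evaluates τ_syn for a silent MF and for two firing rates. The defaults τ_ref = 2 s and p_ref = 0.6 are the slow-pool values quoted earlier in the text; the release probability `P_V_SUPPORTER = 0.1` is a hypothetical value chosen only for illustration, and the function name `tau_syn` is ours.

```python
def tau_syn(p_v, m, tau_ref=2.0, p_ref=0.6):
    """Eq. (2): relaxation time constant (in s) of one vesicle pool.

    p_v     -- vesicle fusion probability
    m       -- MF firing rate during the CS (Hz)
    tau_ref -- vesicle refilling time constant (s)
    p_ref   -- phenomenological recovery parameter
    """
    alpha = tau_ref * (1.0 - p_ref)
    return tau_ref / (1.0 + alpha * p_v * m)


P_V_SUPPORTER = 0.1  # hypothetical low release probability

tau_silent = tau_syn(P_V_SUPPORTER, 0.0)   # upper bound: equals tau_ref
tau_25hz = tau_syn(P_V_SUPPORTER, 25.0)
tau_70hz = tau_syn(P_V_SUPPORTER, 70.0)
```

Under these assumed values, raising the rate from 25 Hz to 70 Hz roughly halves the time constant, and the silent-MF case recovers the maximum τ_syn = τ_ref.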

The transient amplitude A_t is given by

$$A_{t}=\frac{N p_{v} m}{1+\alpha p_{v} m}\cdot\frac{\alpha p_{v}(m-m_{pre})}{1+\alpha p_{v} m_{pre}} \tag{3}$$

Here, N is the number of release sites. Importantly, and in contrast to τ_syn, A_t depends on the presynaptic MF firing rate before the CS, m_pre, and the difference between the MF firing rates before and during the CS. In particular, for both rates sufficiently high, A_t becomes a linear function of the normalized difference between m and m_pre, i.e. ${A}_{t}\propto (m-{m}_{{pre}})/{m}_{{pre}}$ (Fig. 3e). A_t is sensitive to the relative and not the absolute change in presynaptic rate, as observed previously¹⁶.
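
The sketch below evaluates Eq. (3) and checks Eqs. (1) to (3) against a forward-Euler integration of a simple depletion equation. That equation, dx/dt = (1 − x)/τ_ref − (1 − p_ref) p_v m x with current N p_v m x, is an assumption: it was chosen because its steady state and relaxation rate reproduce Eqs. (2) and (3), and it is not taken from the paper's Methods. The expression used for A_s follows from the same assumption, and N and p_v are hypothetical values.

```python
import math

TAU_REF, P_REF = 2.0, 0.6            # slow-pool values quoted in the text
ALPHA = TAU_REF * (1.0 - P_REF)


def tau_syn(p_v, m):
    """Eq. (2)."""
    return TAU_REF / (1.0 + ALPHA * p_v * m)


def steady_state(n, p_v, m):
    """A_s under the assumed depletion model: n * p_v * m * steady occupancy."""
    return n * p_v * m / (1.0 + ALPHA * p_v * m)


def transient_amplitude(n, p_v, m, m_pre):
    """Eq. (3)."""
    return steady_state(n, p_v, m) * ALPHA * p_v * (m - m_pre) / (1.0 + ALPHA * p_v * m_pre)


def euler_current(n, p_v, m, m_pre, t_end, dt=1e-5):
    """Integrate the assumed depletion ODE after a rate switch m_pre -> m."""
    x = 1.0 / (1.0 + ALPHA * p_v * m_pre)          # occupancy before the switch
    for _ in range(int(round(t_end / dt))):
        x += dt * ((1.0 - x) / TAU_REF - (1.0 - P_REF) * p_v * m * x)
    return n * p_v * m * x


N_SITES, P_V = 10, 0.8               # hypothetical release sites and fusion probability
m_pre, m, t = 50.0, 200.0, 0.02      # rates in Hz, time in s

closed_form = (steady_state(N_SITES, P_V, m)
               + transient_amplitude(N_SITES, P_V, m, m_pre) * math.exp(-t / tau_syn(P_V, m)))
numeric = euler_current(N_SITES, P_V, m, m_pre, t)

# High-rate regime: the same relative change (a doubling) gives nearly the same A_t.
a_100_to_200 = transient_amplitude(N_SITES, P_V, 200.0, 100.0)
a_200_to_400 = transient_amplitude(N_SITES, P_V, 400.0, 200.0)
```

The last two values differ by only about one percent, which illustrates the relative-change sensitivity described above; a rate decrease (m < m_pre) gives a negative A_t.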

The transient GC activity results from the sum of eight synaptic transient current components (i.e., four inputs, each with two pools). To illustrate the interplay between the A_t and τ_syn, we compared the behavior of each synaptic input for a selected fast and slow GC (Figs. 3f, g). Generally, synaptic inputs from supporters display longer transient currents than synaptic inputs from drivers (Figs. 3f, g, middle panels) due to their lower firing rates (Figs. 3f, g, left panels) and low p_v (Fig. 3b). A_t is largely determined by the relative change in the respective presynaptic MF firing rates, $(m-{m}_{{pre}})/{m}_{{pre}}$ (Figs. 3f, g, left panels). Thus, “fast” GCs are generated when the high p_v driver inputs exhibit large relative changes in firing rates (Fig. 3f). “Slow” GCs are generated from synapses with a small relative change in driver firing rates, but large relative supporter (low p_v) rate changes paired with low supporter rates during the CS (Fig. 3g). Taken together, in the reduced model τ_syn and A_t determine the effective timescales of the GC responses and are explicitly influenced by quantal parameters, synaptic time constants, and the diversity of MF firing rates.
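
The summation of eight transient components can be sketched as follows. The example only adds synaptic currents; it omits the GC threshold and inhibition, so it does not reproduce GC firing rates. The fast-pool refilling time (20 ms), the pool sizes, the release probabilities (0.8 for drivers, 0.1 for supporters), the specific rates, and the use of the same p_ref for both pools are all hypothetical choices for illustration.

```python
import math

P_REF = 0.6
# (tau_ref in s, release sites N) for the fast and the slow pool.
# The slow-pool tau_ref is from the text; the other numbers are hypothetical.
POOLS = [(0.02, 1.0), (2.0, 1.0)]


def component(p_v, m, m_pre, tau_ref, n):
    """Return (A_t, tau_syn) of one pool from Eqs. (2) and (3)."""
    alpha = tau_ref * (1.0 - P_REF)
    tau = tau_ref / (1.0 + alpha * p_v * m)
    a_t = (n * p_v * m / (1.0 + alpha * p_v * m)) \
        * (alpha * p_v * (m - m_pre) / (1.0 + alpha * p_v * m_pre))
    return a_t, tau


def gc_components(inputs):
    """Four inputs x two pools = eight (A_t, tau_syn) pairs."""
    return [component(p_v, m, m_pre, tau_ref, n)
            for (p_v, m_pre, m) in inputs
            for (tau_ref, n) in POOLS]


def decay_time_to_10_percent(comps, dt=0.001, t_max=10.0):
    """First time at which the summed transient falls to 10% of its initial size."""
    peak = abs(sum(a for a, _ in comps))
    if peak == 0.0:
        return 0.0
    t = 0.0
    while t < t_max:
        value = abs(sum(a * math.exp(-t / tau) for a, tau in comps))
        if value <= 0.1 * peak:
            return t
        t += dt
    return t_max


# Each input is (p_v, m_pre, m): two drivers followed by two supporters.
fast_gc = [(0.8, 50.0, 200.0), (0.8, 50.0, 200.0), (0.1, 25.0, 25.0), (0.1, 25.0, 25.0)]
slow_gc = [(0.8, 200.0, 200.0), (0.8, 200.0, 200.0), (0.1, 2.0, 20.0), (0.1, 2.0, 20.0)]

fast_decay = decay_time_to_10_percent(gc_components(fast_gc))
slow_decay = decay_time_to_10_percent(gc_components(slow_gc))
```

In this toy setting the "fast" cell is dominated by driver inputs with a large relative rate change, while the "slow" cell is dominated by low-rate supporter inputs, in line with the description above.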

The explicit influence of synaptic parameters on temporal learning

Our simulations suggest that delay eyelid conditioning across multiple delays necessitates GC population dynamics spanning multiple timescales (Fig. 2, Fig. S2). Since individual GC firing rate dynamics depend on the A_t and τ_syn of their synaptic inputs (Fig. 3), this implies that 1) the spectrum of τ_syn available to the network should cover the relevant timescales and 2) the A_t associated with different τ_syn, which can be understood as the relative weights of synaptic transient components, should be of comparable magnitude across τ_syn. To illustrate these points, we used the reduced CCM_STP to simulate eyelid response learning with different firing rate properties and examined the relationship between τ_syn, A_t, the GC temporal basis, and learning outcome. Importantly, since A_t and τ_syn are not independent, the quantity of interest is their joint distribution. We initially set up a reference simulation by choosing MF firing rate distributions such that the diversity of GC transient responses and the temporal learning performance (Fig. 4a) were comparable to the CCM_STP with native synapses (Fig. 2f). For this case, the joint distribution shows that A_t decreased with increasing τ_syn. Note that A_t is maximal when the MF firing rates increased from zero m_pre to a finite m upon CS onset, maximizing m − m_pre (Eq. 3, see also Fig. S3b, c). We quantified learning accuracy by calculating an error based on 1) the PC response amplitude, 2) its full width at half maximum and 3) the temporal deviation of its minimum from the target delay (Fig. 4a, fifth panel, Fig. S2a, see “Methods”). Importantly, the degradation in temporal precision of the learned PC pauses for longer CS-US intervals was concomitant with the reduction of the A_t associated with longer τ_syn (Fig. 4a). This suggests that inspection of the joint distribution of τ_syn and A_t can provide insight into the temporal learning performance of the network.
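
The three ingredients of the error measure can be extracted from a PC trace roughly as in the sketch below. How the paper combines them into a single error is described in its Methods and is not reproduced here; the function name `pause_features`, the grid-based width estimate, and the synthetic Gaussian-shaped pause are assumptions for illustration.

```python
import math


def pause_features(times, rates, baseline, target_time):
    """Return (amplitude, width at half depth, timing deviation) of a PC pause.

    amplitude -- baseline minus the minimum rate
    width     -- span of the contiguous samples around the minimum whose rate
                 is at or below the half-depth level (resolution: one time step)
    deviation -- time of the minimum minus the target delay
    """
    i_min = min(range(len(rates)), key=lambda k: rates[k])
    amplitude = baseline - rates[i_min]
    half_level = baseline - 0.5 * amplitude
    left = i_min
    while left > 0 and rates[left - 1] <= half_level:
        left -= 1
    right = i_min
    while right < len(rates) - 1 and rates[right + 1] <= half_level:
        right += 1
    return amplitude, times[right] - times[left], times[i_min] - target_time


# Synthetic trace: normalized baseline 1.0 with a Gaussian dip at 200 ms.
times = [k * 0.001 for k in range(500)]
rates = [1.0 - 0.8 * math.exp(-0.5 * ((t - 0.2) / 0.02) ** 2) for t in times]
amplitude, fwhm, deviation = pause_features(times, rates, baseline=1.0, target_time=0.2)
```

For a Gaussian dip the full width at half maximum is about 2.355 times the standard deviation, so the estimate here should be close to 47 ms, limited by the 1 ms grid.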

**Fig. 4: Learning performance depends on MF firing rate distributions.**

When changing only the mean firing rate of supporter MFs (μ_S) from 25 Hz to 70 Hz, the synaptic time constants were shortened due to the inverse relationship between τ_syn and the mossy-fiber firing rate m (Fig. 4b, second panel). Consequently, and expectedly, the distribution of GC firing rate decay times was shifted to shorter values, and learning performance was degraded for all CS-US intervals, except the 25 ms delay (Fig. 4b). Lowering the mean firing rate of driver MFs (μ_D) from 200 Hz to 100 Hz and increasing the standard deviation (σ_D) from 15 Hz to 50 Hz led to an overall increase of the time constants contributed by driver synapses, as well as an increase in their relative weight (A_t; Fig. 4c, second panel, marginals). As a result, the joint probability distribution shows a shift towards faster weighted time constants. It also follows that GC transients are accelerated, and learning precision is decreased for long CS-US intervals. Removing synaptic currents originating from driver synapses only disrupted learning PC pauses for the shortest CS-US interval (Fig. 4d). Reduced model simulations with systematic parameter scans across a wide range of MF firing rate distributions for both synapse types suggested that good synaptic regimes for temporal learning are achieved when driver synaptic weights are comparable to or smaller than those of the slow supporting synapses (Fig. S4).

All the results taken together suggest that optimal learning occurs when the spectrum of τ_syn available to the network covers behaviorally relevant timescales with balanced relative weights (A_t). Synaptic and GC activity timescales can therefore be tuned by simultaneously modulating p_v and the absolute scale of m to provide the necessary distribution of τ_syn, whereas the relative change of MF firing can be used to tune the weight (A_t) of τ_syn.

Firing rate and synaptic parameters that improve temporal learning performance

Thus far, we used the reduced model to explore how MF firing rates and synaptic properties influenced the timescales of GC activity and the temporal precision of learned PC pauses. The model, however, was constrained by (1) the use of only two synapse types, (2) fixed release probabilities (p_v), (3) MF firing rates that were consistently higher for high p_v synapses than their low p_v counterparts, and (4) an equal number of driver and supporter synapses. We next considered how the relaxation of these assumptions and specific parameter combinations could influence the precision of learned PC pauses. In particular, we simulated reduced models where, in addition to MF firing rates, p_v was sampled from continuous distributions.

Equation (2) suggests that a positive correlation between p_v and m should broaden the distribution of τ_syn and broaden the time window of learning. Specifically, we expect learning performance to improve when high (low) firing rate MFs are, on average, paired with high (low) p_v synapses. We chose uniformly distributed p_v and MF firing rates and split both of these equally into two contiguous groups (Fig. 5a). We performed training simulations in which we paired high p_v (driver) synapses with high firing rates, or we paired low p_v (supporter) synapses and high MF firing rates, and vice versa (Fig. 5b). Formally, this is equivalent to adjusting the rank correlation (c_rk) between the p_v category (supporter or driver) and the m category (high or low, Fig. 5b). We found better learning performance when p_v and m were positively correlated (Fig. 5c, Fig. S5). Indeed, primary vestibular afferents that form driver-like synapses have been shown to have high firing rates^30,40 while supporter-like secondary vestibular afferents have low firing rates^30,41.
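
The effect of the pairing on the τ_syn spectrum can be checked with a small numerical sketch. It uses the extreme case of perfectly rank-ordered pairing (every synapse, not just two categories), which is a stronger manipulation than the two-group pairing described above. The ranges for p_v and m, the sample size, and the names `tau_syn` and `log_span` are assumptions for illustration.

```python
import math
import random


def tau_syn(p_v, m, tau_ref=2.0, p_ref=0.6):
    """Eq. (2) with the slow-pool parameters quoted in the text."""
    alpha = tau_ref * (1.0 - p_ref)
    return tau_ref / (1.0 + alpha * p_v * m)


def log_span(taus):
    """Width of the time-constant spectrum in decades."""
    return math.log10(max(taus) / min(taus))


rng = random.Random(0)
# Hypothetical uniform ranges for release probability and MF rate (Hz).
p_v = sorted(rng.uniform(0.05, 0.9) for _ in range(200))
m = sorted(rng.uniform(5.0, 200.0) for _ in range(200))

# Positive rank correlation: low p_v with low m, high p_v with high m.
tau_positive = [tau_syn(p, r) for p, r in zip(p_v, m)]
# Negative rank correlation: low p_v with high m, and vice versa.
tau_negative = [tau_syn(p, r) for p, r in zip(p_v, reversed(m))]

span_positive = log_span(tau_positive)
span_negative = log_span(tau_negative)
```

Because τ_syn depends on the product p_v·m, the positively correlated pairing contains both the smallest and the largest products, so its spectrum is at least as wide as that of any other pairing of the same values.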

**Fig. 5: Correlating release probability and MF firing rates improves learning performance.**

Inspired by the number of synapse types observed experimentally³⁰, we augmented the number of synapse groups from 2 to 5 without changing the p_v and firing rate distributions (Fig. 5d). We reasoned that the introduction of a larger number of MF-GC synapse types would in principle permit a stronger linear correlation between p_v and m to occur (Fig. 5e), leading to a broader τ_syn spectrum (not shown) and an improved learning of PC pauses. Indeed, for high c_rk, the learning performance of the five-group CCM_STP was better than that of the two-group CCM_STP (compare Fig. 5c and Fig. 5f, Fig. S5). These simulation results suggest that good temporal learning performance of CCM_STP can be achieved not simply by generating variability in parameters, but by structuring, or tuning, the relationship between p_v and m.

Equipped with an understanding of how the synaptic and MF rate parameters can generate different synaptic time constants, we set out to further improve the temporal learning for longer CS-US delays by adjusting the variance of the clustered MF rate distributions. To increase the weighting of long τ_syn, we inversely scaled the variance of the MF firing rate distributions with respect to the mean firing rate (Fig. 5g), thereby increasing A_t (Fig. 4c). As expected, PC pause learning was better than when using equal-width MF groups (Fig. 5g, Fig. S5). An additional enhancement of learning performance could be achieved by adding a small fraction of zero-rate MFs to the lowest group (Fig. 5g, 6% zero MFs, same fraction as in Fig. 4a), which provide maximal A_t (see Fig. 4). Finally, taking into account the experimental finding that low p_v synapses are more frequent than high p_v synapses³⁰, we doubled the fraction of MFs and release probabilities in the lowest group, resulting in the best performance of all versions of CCM_STP tested here (Fig. 5g). These simulations show that positive correlations between vesicle release probability and presynaptic firing rate broaden the temporal bandwidth of circuit dynamics and improve temporal learning.

STP permits learning optimal estimates of time intervals

Humans and animals have an unreliable sense of time and their timing behavior exhibits variability that scales linearly with the base interval⁴⁷. Previous work has found that humans seek to optimize their time interval estimates by relying on their prior expectations. A canonical example of this optimization is evident in the so-called ready-set-go task⁴⁸ in which subjects have to measure and subsequently reproduce different time intervals. It has been shown that when the intervals are drawn from a previously learned probability distribution (i.e., prior), subjects integrate their noisy measurements with the prior to generate optimal Bayesian estimates. For example, when the prior distribution is uniform, interval estimates are biased towards the mean of the prior, and biases are generally larger for longer intervals that are associated with more variable measurements (Fig. 6c). Such Bayes-optimal temporal computations are evident in a wide range of timing tasks such as time interval reproduction⁴⁸, coincidence detection⁴⁹, and cue combination⁵⁰.

**Fig. 6: STP-generated temporal basis enables the computation of Bayesian time-interval estimates.**

A recent study developed a cerebellar model called TRACE for temporal Bayesian computations³³. TRACE implements Bayesian integration by incorporating two features. First, it assumes that GCs form a temporal basis set that exhibits temporal scaling. This feature accounts for the scalar variability of timing. Second, it assumes that prior-dependent learning alters the GC-PC synapses. This feature allows the dentate nucleus neurons (DNs) downstream of PCs to represent a Bayesian estimate of the time interval.

In our analysis of eyelid conditioning (Fig. 2), we showed that CCM_STP generates PC firing rate pauses whose width and amplitude linearly scale with time (Fig. 6a). Therefore, we reasoned that CCM_STP might have the requisite features for Bayesian integration. To test this possibility quantitatively, we presented our model with variable intervals drawn from various prior distributions. The interval was introduced as a tonic input to MFs, similar to the CS in the eyelid simulations. The onset of this tonic input caused an abrupt switch of the MF input rates that persisted over the course of a trial. During learning, we subjected the model to intervals sampled randomly from a desired prior distribution.

We tested CCM_STP with five different uniform distributions of ready-set intervals (25–150 ms, 50–200 ms, 100–300 ms, 200–400 ms, 300–500 ms), resulting in PC pauses that broadened for longer interval distributions, and integrated DN activity that could easily match the Bayesian least-square model³³ by adjusting a single parameter, the Weber fraction w_weber (see “Methods”; Fig. 6d, h). The reduced model interval estimates were more similar to the Bayesian estimates than for CCM_STP with native synaptic parameters, especially for the 200–400 ms and 300–500 ms intervals (Fig. 6h–k). Nevertheless, in both cases the CCM_STP simulations show that a GC basis generated by MF-GC STP is sufficient for driving Bayesian-like learning of time intervals spanning several hundreds of milliseconds. It should be noted that our GC temporal basis was not explicitly constructed to accommodate scalar properties. Nevertheless, as in the TRACE model, we observed that interval estimates were biased towards the mean and that these biases were larger for longer intervals. These results suggest that a GC basis set generated from the diverse properties of native MF-GC synapses likely exhibits a scalar property necessary for generating optimally timed behaviors.
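
For reference, a Bayesian least-squares estimate under a uniform prior and scalar (Weber-like) measurement noise can be computed numerically as sketched below. This is the common textbook formulation, assumed here rather than taken from the paper's Methods: the measurement is modeled as Gaussian around the true interval with standard deviation w_weber times that interval, and the estimate is the posterior mean. The value w_weber = 0.1, the grid size, and the function name `bls_estimate` are hypothetical.

```python
import math


def bls_estimate(t_measured, prior_lo, prior_hi, w_weber, n_grid=400):
    """Posterior mean of the interval given one noisy measurement.

    Uniform prior on [prior_lo, prior_hi]; likelihood is Gaussian with
    standard deviation w_weber * t (scalar variability). Times in seconds.
    """
    step = (prior_hi - prior_lo) / n_grid
    numerator = 0.0
    denominator = 0.0
    for k in range(n_grid):
        t = prior_lo + (k + 0.5) * step           # midpoint rule over the prior
        sigma = w_weber * t
        likelihood = math.exp(-0.5 * ((t_measured - t) / sigma) ** 2) / sigma
        numerator += t * likelihood
        denominator += likelihood
    return numerator / denominator


W_WEBER = 0.1                                     # hypothetical Weber fraction
PRIOR = (0.3, 0.5)                                # the 300-500 ms prior from the text

estimate_short = bls_estimate(0.3, *PRIOR, W_WEBER)   # measurement at the lower edge
estimate_long = bls_estimate(0.5, *PRIOR, W_WEBER)    # measurement at the upper edge
```

Measurements at the edges of the prior are pulled towards its interior, which is the bias towards the mean described above.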

Discussion

In order to generate temporally precise behaviors, the brain must establish an internal representation of time. This theoretical study posits that the diversity of synaptic dynamics is a fundamental mechanism for encoding sub-second time in neural circuits. By using eyelid conditioning as a benchmark task for the CCM_STP, we elucidated the conditions under which the variability in MF-GC synaptic dynamics generates a GC temporal basis set that represents elapsed time and is sufficient for temporal learning on a sub-second scale. According to David Marr’s levels of analysis of information processing systems⁵¹, our study connects all three levels, from the circuit computation (learning timed PC pauses) to its underlying algorithm (learning with a temporal basis set), and the fundamental biological mechanism (STP diversity).

STP diversity as a timer for neural dynamics

Cerebellar adaptive filter models posit that GCs act as a heterogeneous bank of filters that decompose MF activity into various time-varying activity patterns - or temporal basis functions - which are selected and summed by a synaptic learning rule at the PC to produce an output firing pattern that generates behaviors that minimize error signals arriving via climbing fibers^36,37. CCM_STP can be viewed as an adaptive filter in which MF-GC synapses act as non-linear elements whose filter properties are determined by the experimentally defined synaptic parameters and modulated by the presynaptic MF firing rates.

Recent theoretical work proposes that a scale-invariant neuronal representation of a temporal stimulus sequence can be obtained by using a population of leaky integrators that produce exponentially decaying neural activity transients⁵². Indeed, exponential-like activity has been observed in the entorhinal cortex—a region that projects to the hippocampus⁶. The exponential-like population activity is reminiscent of the GC temporal basis set in CCM_STP following persistent firing rate changes. However, the MF-GC synaptic inputs are always a mixture of multiple exponential components. Nevertheless, our work suggests that STP could be a plausible biological mechanism explaining exponential dynamics in neuronal populations⁶ and merits further theoretical and experimental investigation.

The use of an instantaneous and persistent change in MF activity was motivated by the fact that eyelid conditioning can be achieved if the CS is replaced with a constant MF stimulation^44,45,53. Recent evidence from pons recordings during reaching suggests that MF activity can be persistent with little dynamics⁵⁴. For dynamic changes in MF rates, STP is likely to generate outputs that are phase-shifted and/or the derivatives of their input⁵⁵. Using heterogeneity of MF-GC STP as a mechanism for adaptive filtering, even time-varying inputs will effectively be diversified within the GC layer and improve the precision of temporal learning.

Synapses within the prefrontal cortex⁵⁶ and at thalamocortical connections⁵⁷ exhibit diverse firing rate inputs and release probabilities⁵⁸, generating synaptic dynamics that could drive complex neural dynamics. Reminiscent of PC firing rate pauses during eyelid conditioning, hippocampal time cells are thought to be generated by a linear combination of exponentially decaying input activity patterns from upstream entorhinal cortical neurons⁶. More generally, it has been shown that STP also provides a critical timing mechanism within a recurrent neural network model of neocortical activity by facilitating temporal pattern discrimination¹⁸. We note that all synapses in that study featured only a single STP timescale, but we expect that the addition of heterogeneous STP would further diversify the network’s dynamics and enhance its computational properties. Thus, these previous studies and our present study underscore the proposal that STP diversity is a tunable timing mechanism for generating neural dynamics across brain regions.

Timing mechanisms in the cerebellar cortex

In addition to MF-GC STP, the cerebellar cortex is equipped with multiple mechanisms potentially enabling temporal learning⁵⁹. Indeed, time-varying MF inputs could directly provide a substrate for learning elapsed time⁶⁰, but whether the observed diversity of MF firing is sufficient to mediate temporally precise learning is unknown and merits further exploration. Within the cerebellar cortex, unipolar brush cells are thought to provide delay lines to diversify GC activity patterns^10,61,62, but these cell types are rare outside the mammalian vestibular cerebellum. The diversity of GC STP⁶³ could add to the diversity of the effective GC-layer basis set⁶⁴. Consistent with the importance of MF-GC STP, delay eyelid conditioning was selectively altered due to the loss of fast EPSCs in AMPAR KO mice⁶⁵. Simulations including realistic NMDA and spillover dynamics⁶⁶ can further enrich the temporal scales available to the network⁶⁷. It would be of particular interest to investigate the role of MF-GC STP in the context of recurrent GC-Golgi cell network models that have been shown to generate rich GC temporal basis sets^12,29. Finally, we note that MF-GC STP and other timing mechanisms described above are not mutually exclusive but presumably act in concert with the diverse intrinsic properties of GCs⁶⁸ and PCs⁶⁹ to cover different timescales of learning or increase mechanistic redundancy.

Predictions of the CCM_STP

Our theory makes several testable predictions. The transient response amplitude of PCs, which is proportional to the relative change in firing rate, can serve as a detector of rapid changes in MF firing patterns (novelty) and thus amplify pattern discrimination similar to that demonstrated for synapse-dependent delay coding³⁰. Consistent with this prediction, single whisker deflections have been shown to generate transient PC activity⁴².

CCM_STP predicts that persistent changes in MF activity would generate exponential-like GC activity profiles (Figs. 2, 4). However, although the majority of simulated GCs shown here are active at the onset of the CS, this is not a necessary feature of CCM_STP. When we included a single, average-subtracting Golgi cell (possibly representing the “common mode” of Golgi cell population activity⁶⁴), more GCs showed delayed-onset firing and the variability of onset and peak times increased (Fig. S6). This did not affect the learning performance of simulated delay eyelid conditioning (Fig. S6). Note that our implementation of Golgi cell feedback is simplified and does not account for reciprocal inhibition between multiple Golgi cells, which in simulations has also been shown to generate diverse GC activity^12,29. To test these predictions, MFs could be driven at constant rates using direct electrical or optogenetic stimulation of the cerebellar peduncle in vivo or the white matter in acute brain slices, with and without intact Golgi cell inhibition. Unfortunately, high-temporal-resolution population recordings of GCs are challenging due to the small size of GC somata. In the future, low-impedance silicon probe recordings⁷⁰ or ultra-fast optical indicators⁷¹ might permit experimentally testing our hypotheses. If successful, we predict that the time course of GC responses should be diverse and exponential-like, with prominent delayed activity in some granule cells when Golgi cells are intact. Furthermore, decreasing or increasing the MF firing rate should in turn slow or accelerate GC responses, respectively. Finally, for complex behavioral experiments in which the MF activity is dynamic (and measurable), one could examine which circuit connectivity of the CCM_STP best reproduces the measured GC activity.

The CCM_STP is one of the few network models directly linking quantal synaptic parameters and presynaptic activity dynamics to population activity dynamics and temporal learning. Figures 3 and 4 show that the relative weight and temporal span of synaptic time constants dictate the distribution of GC firing rate decay times and, in turn, the timescales of temporal learning. Analytical solutions for simple synapse models (Eq. (3)) provide insight into how synaptic parameters influence STP. For example, high levels of correlation between p_v and m, coupled with balanced relative weights of the synaptic time constants, generated a learning performance superior to the native synapses (Fig. 5d). Therefore, CCM_STP predicts that MFs forming driver synapses (high p_v) would have high baseline and stimulated firing rates, while MFs forming supporter synapses (low p_v) would exhibit low baseline and stimulated firing rates, albeit with large relative changes in firing rates. Indeed, vestibular neurons, which have been shown to exhibit high firing rates^72,73, produced MF-GC synapses with high release probability³⁰. In the C3 zone of the anterior lobe in cats, specific firing rates were associated with different MF types⁷⁴. It is tempting to hypothesize that nature tunes presynaptic activity and synaptic dynamics (perhaps by homeostatic or activity-dependent mechanisms) in order to preconfigure the window of temporal associations required for a particular behavior.

Choice of the cerebellar learning rule

The learning rule we used here was adapted from a previous modeling study that investigated cerebellar adaptation of the vestibular ocular reflex and was argued to be biologically plausible⁷⁵. This synaptic weight update rule is mathematically equivalent to a gradient descent in which the error magnitude is transmitted via the climbing fiber⁷⁵. Consequently, the CCM_STP learning rule features graded climbing-fiber responses and a gradual reduction in climbing-fiber spiking that is concomitant with the progression of learning. These phenomena have been observed experimentally^43,76. Moreover, a recent study that thoroughly investigated the role of the climbing fiber spike in cerebellar learning found that the GC and climbing-fiber spike pairings necessary for the induction of long-term depression/potentiation under physiological conditions are compatible with a stochastic gradient descent rule⁴⁶. The CCM_STP learning rule can be seen as a deterministic variant of this.

Synaptic implementation of a Bayesian computation

Bayesian theories of behavior provide an attractive framework for understanding how organisms, including humans, optimize time perception and precise actions despite the cumulative uncertainty in sensory stimuli, neural representations, and generation of actions^48,77. We found that CCM_STP could generate biased time estimates consistent with Bayesian computations. In general, the magnitude of biases for a Bayesian agent depends on the magnitude of timing variability (i.e., Weber fraction). In our simulations, model parameters corresponding to native synapses from the vestibular cerebellum produced biases that were optimal for a typical Weber fraction of 0.12. However, CCM_STP is flexible and can be adjusted to generate optimal biases for a wide range of Weber fractions. The exact relationship between model parameters and w_weber is an important question for future research. We note that the timescales of synaptic properties observed empirically in the vestibular cerebellum³⁰ are only suitable for generating optimal estimates for relatively short time intervals. Therefore, whether the synaptic mechanisms that underlie CCM_STP could accommodate timing behavior for longer timescales remains to be seen. One intriguing hypothesis is that synaptic parameters in different cerebellar regions are tuned to generate optimal estimates for different time intervals, similar to the timing variability observed for cerebellar long-term synaptic plasticity rules⁷⁸.

Methods

MF-GC synapse model

The synaptic weight between the jth MF and the ith GC is denoted by W_ij. The firing rate of the jth MF is represented by m_j(t) and the average current per unit time transmitted by the synapse between GC i and MF j is

$$\begin{array}{c}{I}_{{syn},{ij}}(t)={W}_{{ij}}\left(t\right)\cdot {m}_{j}\left(t\right).\end{array}$$

(1)

Time-dependent MF-GC synaptic weights were modeled using two readily releasable vesicle pools³⁸, each according to the general form established by Tsodyks and Markram⁷⁹. A similar model was shown to accurately describe STP at the MF-GC synapse³⁸. Accordingly, one vesicle pool was comparatively small, with a high release probability and a low rate of recovery from vesicle depletion (0.5 s⁻¹), while the other was comparatively large, with low release probability and a high rate of recovery from depletion (20 ms⁻¹)³⁸. We refer to these pools as 'slow' and 'fast', respectively. In the Hallermann model³⁸, the slow pool is refilled by vesicles from the fast pool. For the sake of mathematical tractability, we modeled the pools as being refilled independently (see scheme in Fig. 1).

To model vesicle depletion, we use the variables x^slow and x^fast, denoting the fraction of neurotransmitter available in the slow and fast vesicle pools. The state of the pools between GC i and MF j at time t is then described by

$$\begin{aligned}{\dot{x}}_{{ij}}^{{slow}}(t) &=\frac{1-{x}_{{ij}}^{{slow}}(t)}{{\tau }_{{ref}}^{{slow}}}-{u}_{{ij}}^{{slow}}(t)\cdot (1-{p}_{{ref}})\cdot {x}_{{ij}}^{{slow}}(t)\cdot {m}_{j}(t)\\ {\dot{x}}_{{ij}}^{{fast}}(t) &=\frac{1-{x}_{{ij}}^{{fast}}(t)}{{\tau }_{{ref}}^{{fast}}}-{u}_{{ij}}^{{fast}}(t)\cdot {x}_{{ij}}^{{fast}}(t)\cdot {m}_{j}(t),\end{aligned}$$

(2)

where, ${\tau }_{{ref}}^{{slow}}$ and ${\tau }_{{ref}}^{{fast}}$ are the time constants of recovery from vesicle depletion for the slow and fast pools, and are identical for all synapses. The variables ${u}_{ij}^{{slow}}(t)$ and ${u}_{{ij}}^{{fast}}(t)$ denote the pools’ respective release probabilities at time t. Experimental data show that, in response to trains of action potentials, MF-GC synapses approach synaptic steady-state transmission with a long time constant^38,39. This feature can be captured with a serial pool model³⁸ (see scheme in Fig. S7). In order to capture this behavior with a parallel pool model, we added the phenomenological parameter p_ref to the slow pool’s dynamical equation. In mechanistic terms, p_ref can be thought of as the probability of immediately refilling a synaptic docking site after the release of a vesicle. This mechanism effectively mimics a simplified form of activity-dependent recovery from depression. The final release probabilities ${u}_{{ij}}^{{slow}}(t)$ and ${u}_{{ij}}^{{fast}}(t)$ are modulated by synaptic facilitation according to

$$\begin{array}{c}{\dot{u}}_{{ij}}^{{slow}}(t) =\frac{{p}_{v,{slow}}^{\alpha }-{u}_{{ij}}^{{slow}}(t)}{{\tau }_{F}^{\alpha }}+{p}_{v,{slow}}^{\alpha }\cdot (1-{u}_{{ij}}^{{slow}}(t))\cdot {m}_{j}(t)\\ {\dot{u}}_{{ij}}^{{fast}}(t) =\frac{{p}_{v,{fast}}^{\alpha }-{u}_{{ij}}^{{fast}}(t)}{{\tau }_{F}^{\alpha }}+{p}_{v,{fast}}^{\alpha }\cdot (1-{u}_{{ij}}^{{fast}}(t))\cdot {m}_{j}(t).\end{array}$$

(3)

Here, ${p}_{v,{fast}}^{\alpha }$ and ${p}_{v,{slow}}^{\alpha }$ denote the release probabilities for the fast and slow pools, respectively, and ${\tau }_{F}^{\alpha }$ is the facilitation time constant. The index α denotes different synapse types (groups from Chabrol et al.³⁰) and varies from 1 to 5. The average number of vesicles released at any time t can be written as:

$$\begin{array}{c}{n}_{{ij}}^{{slow}}(t) ={N}_{{slow}}^{\alpha }\cdot {u}_{{ij}}^{{slow}}(t)\cdot {x}_{{ij}}^{{slow}}(t)\\ {n}_{{ij}}^{{fast}}(t) ={N}_{{fast}}^{\alpha }\cdot {u}_{{ij}}^{{fast}}(t)\cdot {x}_{{ij}}^{{fast}}(t).\end{array}$$

(4)

Postsynaptic receptor desensitization induces an additional component of depression of phasic MF-GC synaptic transmission. As both pools share the same postsynaptic target, we model desensitization via the modulation of a single variable ${q}_{{ij}}(t)$ for each synapse type, which represents the synaptic quantal size and which is influenced by the total number of vesicles released from both pools:

$$\begin{array}{c}{\dot{q}}_{{ij}}(t)=\frac{{q}_{0}-{q}_{{ij}}(t)}{{\tau }_{D}}-{\varDelta }_{D}\cdot {q}_{{ij}}(t)\cdot \frac{{n}_{{ij}}^{{slow}}(t)+{n}_{{ij}}^{{fast}}(t)}{{N}_{{tot}}}\cdot {m}_{j}(t)\end{array}$$

(5)

where ${N}_{{tot}}^{\alpha }={N}_{{slow}}^{\alpha }+{N}_{{fast}}^{\alpha }$, τ_D is the time constant of recovery from desensitization, q₀ is the quantal size in the absence of ongoing stimulation and Δ_D is a proportionality factor that determines the fractional reduction of ${q}_{{ij}}(t)$. As explained below, we set q₀ = 1, i.e. q_ij(t) is normalized. Both τ_D and Δ_D are identical across all synapse types. Finally, the total synaptic weight is equal to the sum of the contributions from both vesicle pools:

$$\begin{array}{c}{W}_{{ij}}(t)={q}_{{ij}}(t)\cdot \left({n}_{{ij}}^{{slow}}(t)+{n}_{{ij}}^{{fast}}(t)\right),\end{array}$$

(6)
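As an illustration of how Eqs. (2)–(6) fit together, the following minimal sketch integrates a single synapse with the forward Euler method at a constant MF rate. It is not the simulation code used for the figures. All parameter defaults are hypothetical placeholders rather than the values of Table 1 (only p_ref = 0.6 and the 0.5 ms step are taken from the text; the fast refilling time constant assumes a reading of 20 ms), and the function name `simulate_synapse` is introduced here for illustration only.

```python
def simulate_synapse(rate_hz, t_end=3.0, dt=0.0005,
                     p_slow=0.6, p_fast=0.2, n_slow=2.0, n_fast=8.0,
                     tau_ref_slow=2.0, tau_ref_fast=0.02, p_ref=0.6,
                     tau_f=0.01, tau_d=0.1, delta_d=0.1, q0=1.0):
    """Euler-integrate Eqs. (2)-(6) for one synapse at a constant MF rate.

    Times are in seconds, rates in Hz. Parameter defaults are illustrative
    assumptions, not fitted values. Returns the list of weights W(t).
    """
    n_tot = n_slow + n_fast
    x_s, x_f = 1.0, 1.0          # both pools start full
    u_s, u_f = p_slow, p_fast    # release probabilities start at baseline
    q = q0                       # normalized quantal size
    weights = []
    for _ in range(int(round(t_end / dt)) + 1):
        n_s = n_slow * u_s * x_s             # Eq. (4), slow pool
        n_f = n_fast * u_f * x_f             # Eq. (4), fast pool
        weights.append(q * (n_s + n_f))      # Eq. (6)
        # Eq. (2): vesicle depletion and recovery
        dx_s = (1.0 - x_s) / tau_ref_slow - u_s * (1.0 - p_ref) * x_s * rate_hz
        dx_f = (1.0 - x_f) / tau_ref_fast - u_f * x_f * rate_hz
        # Eq. (3): facilitation of the release probabilities
        du_s = (p_slow - u_s) / tau_f + p_slow * (1.0 - u_s) * rate_hz
        du_f = (p_fast - u_f) / tau_f + p_fast * (1.0 - u_f) * rate_hz
        # Eq. (5): postsynaptic desensitization
        dq = (q0 - q) / tau_d - delta_d * q * (n_s + n_f) / n_tot * rate_hz
        x_s += dt * dx_s
        x_f += dt * dx_f
        u_s += dt * du_s
        u_f += dt * du_f
        q += dt * dq
    return weights


weights = simulate_synapse(50.0)
print(f"W at onset: {weights[0]:.3f}, W after 3 s at 50 Hz: {weights[-1]:.3f}")
```

With these placeholder values the weight relaxes from its resting value to a lower steady state, reflecting net depression during sustained MF firing.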

Synaptic parameters for generating diverse synaptic strength and dynamics

We set the synaptic parameters of our model to reproduce the average behavior of the 5 MF-GC synapse groups which were determined in ref. 30 based on unitary response current amplitudes, paired-pulse ratios, and response coefficients of variation.

The vesicle pool refilling time constants ${\tau }_{{ref}}^{{slow}}$ and ${\tau }_{{ref}}^{{fast}}$ were set to the values measured at the MF-GC synapse in ref. 38 and were identical for all synapse groups. The time constant of facilitation ${\tau }_{F}^{\alpha }$ for groups 1–4 was taken from ref. 39. The time constant of recovery from desensitization, τ_D, was set equal to the value reported in ref. 38 for all groups, and the parameter Δ_D was chosen so as to obtain the relative reduction in quantal size reported in the same ref. 38. To qualitatively account for the slow approach to steady-state transmission observed in MF-GC synapses^38,39 we set p_ref to a value of 0.6 for all synapse types.

To set the presynaptic quantal parameters, we matched model quantal parameters, q₀, N and p_v, to the average of those measured in ref. 30 for each synapse group. The estimation of the experimental values ${q}_{0,\exp }^{\alpha }$, ${N}_{\exp }^{\alpha }$ and ${p}_{v,\exp }^{\alpha }$ was carried out via multiple-probability fluctuation analysis³⁰, which assumes a single vesicle pool. To constrain the corresponding parameters of our two-pool model, we assumed:

$$\begin{array}{c}{N}_{\exp }^{\alpha } ={N}_{{tot}}^{\alpha }={N}_{{slow}}^{\alpha }+{N}_{{fast}}^{\alpha }\end{array}$$

$$\begin{array}{c}{p}_{v,\exp }^{\alpha } =\frac{{N}_{{slow}}^{\alpha }{p}_{v,{slow}}^{\alpha }+{N}_{{fast}}^{\alpha }{p}_{v,{fast}}^{\alpha }}{{N}_{{tot}}^{\alpha }}\end{array}$$

(7)

while keeping ${p}_{v,{slow}}^{\alpha } > {p}_{v,{fast}}^{\alpha }$. Since the quantal size did not significantly differ between groups³⁰, we set q₀ = 1 for all groups for simplicity. As group 4 featured almost no STP, we modeled these synapses without a slow pool.

The above equations do not have a unique solution. In order to constrain the synaptic parameters further, we additionally required that the relative unitary response current amplitudes between synapse groups and their paired-pulse ratios approximately equal the experimentally measured ones. To account for the fact that group 5’s paired-pulse ratio is larger than one, we set τ_F = 30 ms for this group, as in ref. 30.
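To make the non-uniqueness concrete, the sketch below solves Eq. (7) for the fast-pool parameters once a slow-pool size and release probability have been chosen. The function name `split_two_pools` and the numerical inputs are hypothetical and serve only as an illustration; different admissible slow-pool choices yield different, equally valid fast-pool parameters, which is why additional constraints are needed.

```python
def split_two_pools(n_exp, pv_exp, n_slow, pv_slow):
    """Solve Eq. (7) for (N_fast, p_v_fast) given a chosen slow pool.

    n_exp, pv_exp : single-pool estimates (e.g. from fluctuation analysis)
    n_slow, pv_slow : free choices for the slow pool (assumed by the user)
    """
    n_fast = n_exp - n_slow
    if n_fast <= 0:
        raise ValueError("slow pool must be smaller than the total pool")
    # Weighted average of release probabilities must equal pv_exp
    pv_fast = (n_exp * pv_exp - n_slow * pv_slow) / n_fast
    if not (0.0 <= pv_fast < pv_slow <= 1.0):
        raise ValueError("choice violates 0 <= p_v_fast < p_v_slow <= 1")
    return n_fast, pv_fast


# Two different slow-pool choices that both satisfy Eq. (7)
print(split_two_pools(10.0, 0.3, 2.0, 0.7))
print(split_two_pools(10.0, 0.3, 4.0, 0.45))
```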

Finally, we extracted the relative occurrence of each synapse type from ref. 30.

A set of synaptic parameters that reproduces the behavior of the five synapse groups from ref. 30 that we used in Figs. 1, 2, and 6 is summarized in Table 1.

Table 1 Synaptic parameters used in full model

MF firing rate parameters

MF firing rate distributions of the full CCM_STP were set according to the broad range described in the literature^{40,41,70,72,73,80,81,82,83,84}. MFs forming synapse types 1 and 2, which convey primary sensory information, were set to high firing frequencies according to experimental observations^40,41 (see Fig. 1b, left panels). In contrast, the firing rates for the other synapse types were lower^70,83. For the full model, this led to synapses with high p_v being associated with MF inputs with comparatively higher average firing rates (primary sensory groups 1, 2) and synapses with low p_v being associated with MF inputs with comparatively lower average firing rates (secondary/processed sensory groups 3, 4, 5). We chose to describe MF firing rate distributions by Gaussian distributions whose negative tails were set to zero. Means and standard deviations of the Gaussian distributions were set such that the means and standard deviations of the resulting thresholded distributions resulted in the values summarized in Table 2.
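The relation between the parameters of the underlying Gaussian and the moments of the thresholded distribution can be sketched as follows, using the standard closed-form moments of a rectified Gaussian. The helper names and the example values (10 Hz mean, 10 Hz standard deviation) are illustrative assumptions and are not the values of Table 2.

```python
import math
import random


def rectified_gaussian_moments(mu, sigma):
    """Mean and SD of max(X, 0) for X ~ Normal(mu, sigma)."""
    z = mu / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    mean = mu * cdf + sigma * pdf
    second_moment = (mu * mu + sigma * sigma) * cdf + mu * sigma * pdf
    return mean, math.sqrt(max(second_moment - mean * mean, 0.0))


def sample_mf_rates(mu, sigma, n, rng):
    """Draw n MF rates from a Gaussian whose negative tail is set to zero."""
    return [max(rng.gauss(mu, sigma), 0.0) for _ in range(n)]


rng = random.Random(0)
rates = sample_mf_rates(10.0, 10.0, 100000, rng)
print("analytical mean/SD:", rectified_gaussian_moments(10.0, 10.0))
print("sample mean:", sum(rates) / len(rates))
```

In practice, one would invert this mapping numerically, adjusting the Gaussian mean and standard deviation until the thresholded moments match the target values.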

Table 2 MF firing rate parameters used in the full model

Cerebellar cortical circuit model

The standard cerebellar cortex model with STP (CCM_STP) consists of firing rate units corresponding to 100 MFs, 3000 GCs, a single PC, and a single molecular layer interneuron (MLI). The PC linearly sums excitatory inputs from GCs and inhibition from the MLI. Each GC receives four MF synapses, randomly selected from the different synapse types according to their experimentally characterized frequency of occurrence³⁰. The synaptic inputs to the GCs and their firing rates are given by:

$$\begin{aligned}{I}_{gc,i}(t) &=\sum\limits_{j\in K}{I}_{syn,ij}(t)=\sum\limits_{j\in K}{W}_{ij}(t)\,{m}_{j}(t)\\ {\tau }_{g}\,{\dot{gc}}_{i}(t) &=-g{c}_{i}(t)+{\alpha }_{i}\cdot \max ({I}_{gc,i}(t)-{\theta }_{i},0)\end{aligned}$$

(8)

where the granule cell membrane time constant τ_g = 10 ms. In the above equation, K is a set of four indices, randomly drawn from all MFs. We require that at least one MF per GC belongs to groups 1, 2 or 5, as observed experimentally³⁰. The gain ${\alpha }_{i}$ and threshold ${\theta }_{i}$ are set individually for each GC i as explained below.

MLI activity is assumed to represent the average rate of the GC population, thus allowing each GC to have a net excitatory or inhibitory effect depending on the difference between the MLI-PC inhibitory weight and the respective GC-PC excitatory weight:

$${mli}(t)=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{{gc}}_{i}\left(t\right),$$

(9)

The synaptic weights between the ith GC and the PC and between the MLI and PC were defined as ${J}_{E,i}$ and ${J}_{I}$, respectively. The total synaptic input to the PC is thus given by

$$\begin{aligned}{I}_{{pc}}(t) &=\sum\limits_{i=1}^{N}\frac{{J}_{E,i}}{N}{{gc}}_{i}(t)-{J}_{I}\,{mli}(t)+{I}_{{spont}}\\ &=\frac{1}{N}\sum\limits_{i=1}^{N}({J}_{E,i}-{J}_{I})\,{{gc}}_{i}(t)+{I}_{{spont}}.\end{aligned}$$

(10)

${I}_{{spont}}$ is an input that maintains the spontaneous firing of the PC at 40 Hz.

Finally, the PC firing rate is given by

$$\begin{array}{c}{pc}(t)=\max ({I}_{pc}(t),0).\end{array}$$

(11)

In Fig. 1, the GC-PC weights ${J}_{E,i}$ were drawn from an exponential distribution with mean equal to 1. To decrease or increase the ratio of the average excitatory to inhibitory weight, in Figs. 1c and 1d we set ${J}_{I}=1.025$ and ${J}_{I}=0.975,$ respectively. The full CC model and the reduced model (described below) were numerically integrated using the Euler method with step size 0.5 ms.
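A minimal sketch of one Euler step of the GC dynamics (Eq. (8)) and of the PC readout (Eqs. (9)–(11)) is given below. The function names are hypothetical, and the GC input currents are assumed to be supplied by a separate synapse model; only τ_g = 10 ms, the 0.5 ms step, and the 40 Hz spontaneous PC rate are taken from the text.

```python
def gc_step(gc, i_gc, gain, theta, dt=0.0005, tau_g=0.010):
    """One forward-Euler step of Eq. (8) for all GCs (lists of equal length)."""
    return [g + dt / tau_g * (-g + a * max(i - th, 0.0))
            for g, i, a, th in zip(gc, i_gc, gain, theta)]


def pc_rate(gc, j_e, j_i, i_spont=40.0):
    """PC firing rate from Eqs. (9)-(11): GC excitation minus MLI inhibition."""
    n = len(gc)
    mli = sum(gc) / n                                   # Eq. (9)
    i_pc = sum(j * g for j, g in zip(j_e, gc)) / n - j_i * mli + i_spont  # Eq. (10)
    return max(i_pc, 0.0)                               # Eq. (11)


# A GC driven by a constant suprathreshold current relaxes to gain * (I - theta)
gc = [0.0]
for _ in range(1000):
    gc = gc_step(gc, [3.0], [2.0], [1.0])
print(gc[0])

# With J_E,i = J_I for all i, excitation and inhibition cancel exactly
print(pc_rate([1.0, 2.0, 3.0], [10.0, 10.0, 10.0], 10.0))
```

The second call illustrates why each GC can have a net excitatory or inhibitory effect: only the difference between its weight and the MLI-PC weight enters the PC input.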

GC Threshold and gain adjustment

Changing the statistics of the MF firing rate distributions changes the fraction of active GCs at any given time and the average GC firing rates. To avoid the confounding impact that co-varying these quantities has on learning performance when comparing different MF parameter sets, we adjusted GC thresholds ${\theta }_{i}$ and gains ${\alpha }_{i}$ such that, at steady state, the fraction of active GCs and the average GC firing rates were identical for all MF parameter choices. Specifically, we drew 1000 random MF patterns from the respective firing rate distributions, and we calculated the steady-state values of the synaptic dynamics as follows:

$$\begin{aligned}{\left({u}_{{ij}}^{{slow},\mu }\right)}^{*} &={p}_{v,{slow}}^{\alpha }\cdot \frac{1+{\tau }_{F}^{\alpha }\cdot {m}_{j}^{\mu }}{1+{p}_{v,{slow}}^{\alpha }\cdot {\tau }_{F}^{\alpha }\cdot {m}_{j}^{\mu }}\\ {\left({u}_{{ij}}^{{fast},\mu }\right)}^{*} &={p}_{v,{fast}}^{\alpha }\cdot \frac{1+{\tau }_{F}^{\alpha }\cdot {m}_{j}^{\mu }}{1+{p}_{v,{fast}}^{\alpha }\cdot {\tau }_{F}^{\alpha }\cdot {m}_{j}^{\mu }}\end{aligned}$$

(12)

$$\begin{aligned}{\left({x}_{{ij}}^{{slow},\mu }\right)}^{*} &=\frac{1}{1+{\left({u}_{{ij}}^{{slow},\mu }\right)}^{*}\cdot {\tau }_{{ref}}^{{slow}}\cdot \left(1-{p}_{{ref}}\right)\cdot {m}_{j}^{\mu }}\\ {\left({x}_{{ij}}^{{fast},\mu }\right)}^{*} &=\frac{1}{1+{\left({u}_{{ij}}^{{fast},\mu }\right)}^{*}\cdot {\tau }_{{ref}}^{{fast}}\cdot {m}_{j}^{\mu }}\end{aligned}$$

(13)

$$\begin{array}{c}{\left({q}_{{ij}}^{\mu }\right)}^{*}=\frac{{N}_{{tot}}}{{N}_{{tot}}+{\varDelta }_{D}\cdot {\tau }_{D}\cdot \left({\left({n}_{{ij}}^{{slow},\mu }\right)}^{*}+{\left({n}_{{ij}}^{{fast},\mu }\right)}^{*}\right)\cdot {m}_{j}^{\mu }}\end{array}$$

(14)

With these, we obtained, for each GC, the distribution of steady-state inputs and firing rates:

$$\begin{aligned}{\left({I}_{{gc},i}^{\mu }\right)}^{*} &=\sum\limits_{j\in K}{\left({W}^{\mu }\right)}_{{ij}}^{*}\,{m}_{j}^{\mu }\\ {\left({{gc}}_{i}^{\mu }\right)}^{*} &={\alpha }_{i}\cdot \max \left({\left({I}_{{gc},i}^{\mu }\right)}^{*}-{\theta }_{i},0\right)\end{aligned}$$

(15)

We then adjusted ${\alpha }_{i}$ and ${\theta }_{i}$ for each GC to maintain an average steady-state GC firing rate of 5 Hz for all patterns. The lifetime sparsity of each GC was set to 0.2, which is within the range of experimental observations^84,85. Throughout the article, this adjustment was carried out every time we changed synaptic parameters (Fig. 5), the parameters of the MF firing rate distributions (Fig. 4) or the MF to synapse connectivity (Fig. 5).
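The adjustment can be sketched as follows. The first function evaluates the steady-state synaptic weight from Eqs. (12)–(14); the second sets a GC's threshold and gain from its steady-state inputs across patterns. This is a sketch under stated assumptions rather than the procedure used for the figures: the parameter defaults are hypothetical placeholders, and lifetime sparsity is interpreted here as the fraction of MF patterns for which the GC is active, which may differ from the exact definition used in the simulations.

```python
def steady_state_weight(m, p_slow=0.6, p_fast=0.2, n_slow=2.0, n_fast=8.0,
                        tau_ref_slow=2.0, tau_ref_fast=0.02, p_ref=0.6,
                        tau_f=0.01, tau_d=0.1, delta_d=0.1):
    """Steady-state W* at constant MF rate m (Hz); defaults are placeholders."""
    u_s = p_slow * (1.0 + tau_f * m) / (1.0 + p_slow * tau_f * m)   # Eq. (12)
    u_f = p_fast * (1.0 + tau_f * m) / (1.0 + p_fast * tau_f * m)
    x_s = 1.0 / (1.0 + u_s * tau_ref_slow * (1.0 - p_ref) * m)      # Eq. (13)
    x_f = 1.0 / (1.0 + u_f * tau_ref_fast * m)
    n_rel = n_slow * u_s * x_s + n_fast * u_f * x_f
    n_tot = n_slow + n_fast
    q = n_tot / (n_tot + delta_d * tau_d * n_rel * m)               # Eq. (14)
    return q * n_rel


def fit_threshold_and_gain(inputs, active_fraction=0.2, target_rate=5.0):
    """Choose theta so the GC is active for `active_fraction` of the patterns,
    then choose the gain so its mean steady-state rate equals `target_rate`."""
    ranked = sorted(inputs)
    k = int(round((1.0 - active_fraction) * len(ranked)))
    theta = ranked[k - 1]                      # inputs above theta are active
    drive = [max(i - theta, 0.0) for i in inputs]
    mean_drive = sum(drive) / len(drive)
    if mean_drive == 0.0:
        raise ValueError("no suprathreshold input; cannot set the gain")
    return theta, target_rate / mean_drive


inputs = [float(k) for k in range(1, 101)]    # toy steady-state inputs
theta, gain = fit_threshold_and_gain(inputs)
print(theta, gain, steady_state_weight(50.0))
```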

Supervised learning rule

Purkinje cell pauses associated with eyelid conditioning acquisition were generated by adjusting J_E,i using a supervised learning rule. The target PC firing rate ${I}_{{target}}(t)$ was set as a Dirac pulse in which the PC rate is zero in the time bin around ${t}_{{target}}$ following the start of the CS:

$$\begin{array}{c}{I}_{{target}}(t)={I}_{{spont}}\cdot \left[1-S\left(t-{t}_{{target}}\right)\right]\end{array}$$

(16)

where $S=1$ in the time bin around ${t}_{{target}}$ and $S=0$ otherwise. We quantify the deviation of the PC firing rate from the target rate by the least squares loss E that is to be minimized during learning:

$$\begin{aligned}E &=\frac{1}{2}{\int }_{-{T}_{{pre}}}^{{T}_{{CS}}}\mathrm{d}t\,{\widetilde{w}}_{{err}}^{2}(t)\,{\epsilon }^{2}(t)\\ &=\frac{1}{2}{\int }_{-{T}_{{pre}}}^{{T}_{{CS}}}\mathrm{d}t\,{\widetilde{w}}_{{err}}^{2}(t){\left({I}_{{pc}}(t)-{I}_{{target}}(t)\right)}^{2}\end{aligned}$$

(17)

$[0,{T}_{{CS}}]$ is the time interval after CS onset (at $t=0$) during which we require the PC to follow the target signal and $[-{T}_{{pre}},0]$ is a time interval before CS onset during which the PC should fire at its spontaneous rate. $\epsilon (t)$ denotes the deviation between the target and the actual PC output at time t. ${\widetilde{w}}_{{err}}$ is a factor that we use to increase the sensitivity of the loss function E to the target time, and is given by:

$$\begin{aligned}{\widetilde{w}}_{{err}}(t) &=\frac{{w}_{{err}}(t)}{{\int }_{-{T}_{{pre}}}^{{T}_{{CS}}}\mathrm{d}t'\,{w}_{{err}}(t')}\\ {w}_{{err}}(t) &=\left\{\begin{array}{ll}3.5 & \text{if}\ t={t}_{{target}}\\ 1 & \text{else}\end{array}\right.\end{aligned}$$

(18)

In all main figures, we used ${T}_{{CS}}=1.4\,\mathrm{s}$ and ${T}_{{pre}}=0.1\,\mathrm{s}$.

GC-PC weights ${J}_{E,i}$ were modified during learning using gradient descent to reduce the error E at each step of the learning algorithm:

$$\begin{aligned}{J}_{i} &\leftarrow {J}_{i}+\varDelta {J}_{i}\\ \varDelta {J}_{i} &=-\eta \frac{\partial E}{\partial {J}_{i}}\\ &=-\frac{\eta }{N}{\int }_{-{T}_{{pre}}}^{{T}_{{CS}}}\mathrm{d}t\,{\widetilde{w}}_{{err}}^{2}(t)\cdot \epsilon (t)\cdot {{gc}}_{i}(t)\end{aligned}$$

(19)

Here, $\eta$ is a learning rate. For our simulations, we modified this basic rule in two ways. Firstly, similar to ref. 75, we explicitly simulated a climbing fiber (CF) rate, cf, that is modulated by the error signal $\epsilon (t)={I}_{{pc}}(t)-{I}_{{target}}(t)$ according to

$$\begin{array}{c}{cf}(t)=\max (c{f}_{{spont}}+\beta \epsilon (t),0)\end{array}$$

(20)

where $c{f}_{{spont}}$ is the spontaneous CF rate and β a proportionality factor. The CF rate was then used to update the synaptic weight according to the following equation:

$$\varDelta {J}_{i}=\frac{\eta }{N}{\int }_{-{T}_{{pre}}}^{{T}_{{CS}}}{{{{{\rm{d}}}}}}\,t\,{\widetilde{w}}_{{err}}^{2}(t)\cdot (c{f}_{{spont}}-{cf}(t))\cdot {{gc}}_{i}(t)$$

(21)

where we also set ${J}_{E,i}=0$ when a learning iteration resulted in a negative weight. As the CF rate is required to be positive or zero, this formulation limits the error information transmitted to the PC compared to the simple gradient rule. This learning rule yields synaptic long-term depression when CF and GC are simultaneously active and long-term potentiation when GCs are active alone, consistent with experimental data on GC-PC synaptic plasticity⁵⁹.
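One iteration of the climbing-fiber-based update (Eqs. (20) and (21)) can be sketched in discrete time as below. The integral is replaced by a sum over time bins of width `dt`, which is an assumption of this sketch, as are the function name and the data layout (`gc[i][t]` holds the rate of GC i in bin t, and `w_tilde[t]` the normalized error weight). The momentum term described later in the text is not included.

```python
def cf_learning_step(j_e, j_i, gc, target, w_tilde, dt,
                     i_spont=40.0, eta=0.0025, beta=0.5, cf_spont=1.0):
    """One weight update following Eqs. (20)-(21), discretized in time."""
    n, n_bins = len(j_e), len(target)
    # PC input, Eq. (10)
    i_pc = [sum((j_e[i] - j_i) * gc[i][t] for i in range(n)) / n + i_spont
            for t in range(n_bins)]
    # Rectified climbing fiber rate, Eq. (20)
    cf = [max(cf_spont + beta * (i_pc[t] - target[t]), 0.0)
          for t in range(n_bins)]
    new_j = []
    for i in range(n):
        # Eq. (21): LTD when cf exceeds its spontaneous rate, LTP otherwise
        dj = eta / n * dt * sum(w_tilde[t] ** 2 * (cf_spont - cf[t]) * gc[i][t]
                                for t in range(n_bins))
        new_j.append(max(j_e[i] + dj, 0.0))   # negative weights are reset to 0
    return new_j


# Toy example: two GCs, two time bins, target pause in the second bin
gc = [[1.0, 1.0], [0.0, 2.0]]
print(cf_learning_step([10.0, 10.0], 10.0, gc, [40.0, 0.0], [1.0, 1.0], dt=1.0))
```

In the toy example, both weights decrease because both GCs are active in the bin where the PC rate exceeds the target, and the GC that is more active in that bin is depressed more strongly.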

Furthermore, recent experimental findings suggest that the temporal properties of GC-PC plasticity rules are tuned to compensate for the typical delays expected for error information arriving in the cerebellar cortex⁷⁸. Here, we did not explicitly model CF error information delays, and for the sake of simplicity, directly modeled the timing of PC activity to show that the GC basis set is sufficient to generate an appropriately timed PC pause.

To increase the learning speed, we added a Nesterov acceleration scheme to Eq. (21)⁸⁶, introducing a momentum term to the gradient, i.e. weight updates made during a given iteration of the algorithm depended on the previous iteration. The implementation we chose additionally features an adaptive reset of the momentum term, improving convergence properties⁸⁶. This addition is for practical convenience and does not reflect biological mechanisms.

For the weight learning, we subsampled the simulated GC rates by a factor of 10 and set η = 0.0025, $\beta=0.5$ and the initial distribution of weights to ${J}_{E,i}={J}_{I}=10$ for all $i$. For all eyelid response learning simulations, we chose $c{f}_{{spont}}=1\,\mathrm{Hz}$ (Figs. 2, 4, 5).

Error measure of learned Purkinje cell pause

We defined the error between the PC pause and the ${I}_{{target}}$ (see Fig. 4, S3, S4 and S5) in the following way:

$${\epsilon }_{{tot}}=\left(1-\frac{{\epsilon }_{{amp}}}{{h}_{{spont}}}\right)+\frac{{\epsilon }_{{fwhm}}}{s}+5\cdot \frac{{\epsilon }_{t}}{s}$$

(22)

The first term depends on the amplitude of the PC pause relative to baseline firing, yielding a small error when the amplitude goes to zero. The second term corresponds to the normalized width of the PC pause. Finally, the third term is the normalized deviation of the pause’s minimum from the target time, ${\epsilon }_{t}$. To increase the importance of this term, we scaled it by a factor of 5. The error measure in Figs. S4 and S5 is the sum of ${\epsilon }_{{tot}}$ over all tested delays.
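One possible way to evaluate Eq. (22) on a sampled PC trace is sketched below. Several choices here are assumptions made for illustration: ε_amp is taken as the depth of the pause below the spontaneous rate, the width is measured as the contiguous run of samples below the half-depth level around the minimum, the trace is assumed to start at CS onset, and the normalization constant s is exposed as a parameter with a placeholder default of 1 s.

```python
def pause_error(pc, dt, t_target, h_spont=40.0, s=1.0):
    """Evaluate Eq. (22) for a PC rate trace sampled every dt seconds.

    pc[0] is assumed to correspond to CS onset (t = 0).
    """
    i_min = min(range(len(pc)), key=lambda k: pc[k])
    amp = h_spont - pc[i_min]            # assumed meaning of eps_amp
    half_level = h_spont - 0.5 * amp
    lo = i_min
    while lo > 0 and pc[lo - 1] < half_level:
        lo -= 1
    hi = i_min
    while hi < len(pc) - 1 and pc[hi + 1] < half_level:
        hi += 1
    fwhm = (hi - lo + 1) * dt            # width at half depth
    eps_t = abs(i_min * dt - t_target)   # timing error of the pause minimum
    return (1.0 - amp / h_spont) + fwhm / s + 5.0 * eps_t / s


trace = [40.0] * 10
trace[5] = 0.0                           # full, one-bin pause at t = 0.5 s
print(pause_error(trace, dt=0.1, t_target=0.5))
```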

Reduced CC model

The reduced synaptic model included only two synapse types. We also neglected facilitation and desensitization, yielding constant release probabilities and constant normalized quantal size:

$$\begin{aligned}{u}_{{ij}}^{{slow}}(t) &={p}_{v,{slow}}^{\alpha }\\ {u}_{{ij}}^{{fast}}(t) &={p}_{v,{fast}}^{\alpha }\\ {q}_{{ij}}(t) &=1.\end{aligned}$$

(23)

We obtain for the vesicle pool dynamics:

$$\begin{aligned}{\dot{x}}_{{ij}}^{{slow}}(t) &=\frac{1-{x}_{{ij}}^{{slow}}(t)}{{\tau }_{{ref}}^{{slow}}}-{p}_{v,{slow}}^{\alpha }(1-{p}_{{ref}})\,{x}_{{ij}}^{{slow}}(t)\cdot {m}_{j}(t)\\ {\dot{x}}_{{ij}}^{{fast}}(t) &=\frac{1-{x}_{{ij}}^{{fast}}(t)}{{\tau }_{{ref}}^{{fast}}}-{p}_{v,{fast}}^{\alpha }\,{x}_{{ij}}^{{fast}}(t)\cdot {m}_{j}(t),\end{aligned}$$

(24)

and the total synaptic weight becomes

$${W}_{{ij}}(t)={N}_{{slow}}^{\alpha }\cdot {p}_{v,{slow}}^{\alpha }\cdot {x}_{{ij}}^{{slow}}(t)+{N}_{{fast}}^{\alpha }\cdot {p}_{v,{fast}}^{\alpha }\cdot {x}_{{ij}}^{{fast}}(t).$$

(25)

Here the index $\alpha$ denotes membership in the driver or supporter category. The synaptic currents of the reduced model are computed as in the full model. Each GC receives exactly two driver and two supporter MF inputs with random and pairwise distinct identities. To eliminate any non-synaptic dynamics from the reduced model, we removed the GC membrane time constant yielding GC dynamics that follow the synaptic input instantaneously:

$${{gc}}_{i}(t)={\alpha }_{i}\cdot \max ({I}_{{gc},i}(t)-{\theta }_{i},0).$$

(26)

Finally, GC threshold and gain adjustments were carried out similarly to the full CCM_STP where instead of Eqs. (12) and (14) we used Eq. (23).

Synaptic parameters of the reduced model

The parameters of the reduced model were set to create two synapse types that capture the essence of the experimentally observed synaptic behavior: a strong and fast driver synapse, and a weak and slow supporter synapse. All synaptic parameters of the model used in Figs. 3, 4 and 6 are summarized in Table 3.

Table 3 Synaptic parameters used in reduced model

In Fig. 5, firing rates and release probabilities were randomly drawn from uniform distributions. In detail, the release probabilities of the slow pool, ${p}_{v,{slow}}$, were drawn from distributions with lower and upper bounds of 0.1 and 0.9, respectively (Fig. 5a, d, g, and h), and the corresponding release probabilities of the fast pool were calculated according to ${p}_{v,{fast}}=\tfrac{2}{3}{p}_{v,{slow}}$, keeping them strictly lower. The lower and upper bounds of the distribution of firing rates used in panels a and d were 5 Hz and 270 Hz, resulting in firing rate standard deviations of ${\sigma }_{{rate}}\approx 38.2$ Hz for the two-groups case (Fig. 5a) and ${\sigma }_{{rate}}\approx 15.3$ Hz for the five-groups case (Fig. 5d). The bounds of the distributions in panels g and h were chosen to yield average group firing rates equal to those in panel d and firing rate standard deviations that increased with the group index, i.e. ${\sigma }_{{rate}}\approx \{5.0, 7.6, 10.2, 12.7, 15.3\}$ Hz for groups 1 to 5, respectively. The size of the slow vesicle pool was fixed at ${N}_{{slow}}=4$ and the sizes of the fast vesicle pools were set to decrease with the group index, i.e. ${N}_{{fast}}=\{16, 6\}$ for the two-groups case, and ${N}_{{fast}}=\{16, 12, 8, 6, 6\}$ for the five-groups case. Finally, the desired rank correlation between ${p}_{v}$ identities and MF identities was achieved by creating a Gaussian copula reflecting their statistical dependency and reordering the marginal ${p}_{v}$ and MF distributions accordingly.
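The copula-based reordering can be sketched as follows: correlated Gaussian pairs are drawn, their ranks are computed, and the two marginal samples are then paired according to those ranks, so that each marginal is preserved exactly while the rank dependence follows the Gaussian copula. The function names and toy values are hypothetical, and `rho` denotes the correlation of the latent Gaussian, which is close to but not identical to the resulting rank correlation.

```python
import math
import random


def ranks(values):
    """Rank of each entry (0 = smallest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order):
        result[index] = position
    return result


def couple_by_gaussian_copula(a_values, b_values, rho, rng):
    """Pair two samples so that their ranks follow a Gaussian copula."""
    n = len(a_values)
    z1 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z2 = [rho * z + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for z in z1]
    r1, r2 = ranks(z1), ranks(z2)
    a_sorted, b_sorted = sorted(a_values), sorted(b_values)
    return [(a_sorted[r1[i]], b_sorted[r2[i]]) for i in range(n)]


rng = random.Random(1)
p_v = [rng.uniform(0.1, 0.9) for _ in range(8)]       # toy release probabilities
mf_rates = [rng.uniform(5.0, 270.0) for _ in range(8)]  # toy MF rates (Hz)
for pair in sorted(couple_by_gaussian_copula(p_v, mf_rates, 0.9, rng)):
    print(f"p_v = {pair[0]:.2f}, rate = {pair[1]:.1f} Hz")
```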

Derivation of $\boldsymbol{\tau}_{\boldsymbol{syn}}$ and $\boldsymbol{A}_{\boldsymbol{t}}$

In the reduced model, we derived an analytical solution for the synaptic current driving a GC in response to the CS. Since the equations describing slow and fast vesicle pool dynamics are formally very similar, we describe the derivation for a single slow pool only. Additionally, we suppress all indices for the sake of readability. We assume that the MF rate $m(t)$ switches instantaneously from ${m}_{{preCS}}$ to ${m}_{{CS}}$ at time $t'=0$. Integration of Eq. (24) from $t'=0$ to $t$ yields:

$$x(t)=({x}_{{preCS}}^{*}-{x}_{CS}^{*})\exp \left(-\left(\frac{1}{{\tau }_{{ref}}}+{p}_{v}(1-{p}_{{ref}}){m}_{{CS}}\right)t\right)+{x}_{{CS}}^{*},$$

(27)

Here, ${x}_{{preCS}}^{*}$ and ${x}_{{CS}}^{*}$ denote the steady-state values of x before (preCS) and after (CS) the firing rate switch. They are given by

$${x}_{\gamma }^{*}=\frac{1}{1+\alpha {p}_{v}{m}_{\gamma }},$$

(28)

with

$$\alpha =\begin{cases}{\tau }_{{ref}}(1-{p}_{{ref}}) & \text{for slow pool}\\ {\tau }_{{ref}} & \text{for fast pool}\end{cases}$$

(29)

Equation (27) defines the synaptic time constant that governs the speed of transition from a steady-state value before the CS to a steady-state value during the CS:

$${\tau }_{{syn}}={\tau }_{{ref}}\cdot {x}_{CS}^{*}=\frac{{\tau }_{{ref}}}{1+\alpha {p}_{v}{m}_{{CS}}}$$

(30)

This equation is similar to one derived previously⁵⁵,⁸⁷. The total synaptic current per unit time for a single pool during the CS is given by

$${I}_{{syn}}(t)=N{p}_{v}x(t){m}_{{CS}}$$

(31)

Combining Eqs. (27) and (31), we obtain

$$\begin{aligned}{I}_{{syn}}(t)&=\frac{N{p}_{v}{m}_{CS}}{1+\alpha {p}_{v}{m}_{CS}}\left[1+\frac{\alpha {p}_{v}({m}_{CS}-{m}_{preCS})}{1+\alpha {p}_{v}{m}_{preCS}}\exp \left(-\frac{t}{{\tau }_{syn}}\right)\right]\\ &=\underbrace{{A}_{s}}_{\text{steady state}}+\underbrace{{A}_{t}}_{\text{transient amplitude}}\exp \left(-\frac{t}{{\tau }_{syn}}\right)\end{aligned}$$

(32)

Thus, the transient amplitude for a single vesicle pool is

$${A}_{t}=\frac{N{p}_{v}{m}_{{CS}}}{1+\alpha {p}_{v}{m}_{{CS}}}\frac{\alpha {p}_{v}({m}_{{CS}}-{m}_{{preCS}})}{1+\alpha {p}_{v}{m}_{{preCS}}}$$

(33)

For a single synapse, the total transient amplitude is the sum of the individual fast pool and slow pool transients:

$${A}_{t}^{{tot}}={A}_{t}^{{slow}}+{A}_{t}^{{fast}}$$

(34)
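Equations (28)–(34) translate directly into a few lines of code. The Python sketch below evaluates the steady-state occupancy, the synaptic time constant, and the steady-state and transient amplitudes for one slow and one fast pool, then sums the two transients. All numerical values are placeholders chosen for this example (they are not the Table 3 parameters), the function names are hypothetical, rates are in Hz and time constants in seconds.

```python
import math


def alpha(tau_ref, p_ref, pool):
    """Eq. (29): tau_ref * (1 - p_ref) for the slow pool, tau_ref for the fast pool."""
    return tau_ref * (1.0 - p_ref) if pool == "slow" else tau_ref


def x_star(a, p_v, m):
    """Eq. (28): steady-state value of x at MF rate m."""
    return 1.0 / (1.0 + a * p_v * m)


def tau_syn(tau_ref, a, p_v, m_cs):
    """Eq. (30): time constant of the transition after the rate switch."""
    return tau_ref * x_star(a, p_v, m_cs)


def amplitudes(n, p_v, a, m_pre, m_cs):
    """Eqs. (32)-(33): steady-state amplitude A_s and transient amplitude A_t."""
    a_s = n * p_v * m_cs * x_star(a, p_v, m_cs)
    a_t = a_s * a * p_v * (m_cs - m_pre) * x_star(a, p_v, m_pre)
    return a_s, a_t


def current(t, n, p_v, tau_ref, a, m_pre, m_cs):
    """Eq. (32): current of a single pool at time t after CS onset."""
    a_s, a_t = amplitudes(n, p_v, a, m_pre, m_cs)
    return a_s + a_t * math.exp(-t / tau_syn(tau_ref, a, p_v, m_cs))


# Placeholder parameters (hypothetical, for illustration only).
m_pre, m_cs = 10.0, 50.0
slow = dict(n=4, p_v=0.6, tau_ref=0.5, a=alpha(0.5, 0.5, "slow"))
fast = dict(n=16, p_v=0.4, tau_ref=0.02, a=alpha(0.02, 0.5, "fast"))

a_t_slow = amplitudes(slow["n"], slow["p_v"], slow["a"], m_pre, m_cs)[1]
a_t_fast = amplitudes(fast["n"], fast["p_v"], fast["a"], m_pre, m_cs)[1]
a_t_tot = a_t_slow + a_t_fast  # Eq. (34)
print(tau_syn(slow["tau_ref"], slow["a"], slow["p_v"], m_cs), a_t_tot)
```

A useful sanity check is that the current at CS onset, $A_s+A_t$, equals $N p_v m_{CS} x^{*}_{preCS}$: the pool is still at its pre-CS occupancy when the rate switches.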

To generate the surface plots in Fig. 4 and Fig. S3, we generated 10⁵ firing rates from each of the driver and supporter MF rate distributions and used Eqs. (30), (33) and (34) to calculate the corresponding values of ${A}_{t}$ and ${\tau }_{{syn}}$. From these, the plots of the joint ${A}_{t}$ and ${\tau }_{{syn}}$ distribution and the marginal distributions were generated using a two- or one-dimensional kernel density estimator, respectively⁸⁸. Note that, formally, ${\tau }_{{syn}}$ is maximal when ${m}_{{CS}}=0$. In that case, however, there is no synaptic transmission, as ${A}_{t}^{{tot}}={A}_{t}^{{slow}}={A}_{t}^{{fast}}=0$. When plotting the joint ${A}_{t}$-${\tau }_{{syn}}$ distribution in Fig. 4 and Fig. S3, we therefore omitted time constants and transient amplitudes corresponding to ${m}_{{CS}}=0$.

Bayesian estimation of time intervals

To learn the mapping between $t_m$ and $t_e$, we presented CCM_STP with variable intervals ($t_s$) drawn from various prior distributions and subjected to measurement noise. The interval was introduced as a tonic input to MFs, similar to the CS in the eyelid simulations. The onset of this tonic input caused an abrupt switch of the MF input rates that persisted over the course of a trial. For each iteration of our learning algorithm, we generated target signals sampled randomly from one of five different uniform prior distributions: 25–150 ms, 50–200 ms, 100–300 ms, 200–400 ms, and 300–500 ms. Learning was carried out separately for each interval distribution and for 12,000 iterations. We found that to achieve the correct biases for the two longest interval distributions, we had to introduce a higher CF baseline firing rate, $cf_{spont}=5$ Hz. The other learning parameters were kept the same as in the eyelid learning simulations.

In keeping with ref. 33, we modeled the DN neuron as an integrator, whose rate was calculated according to

$$dn\left(t\right)=\int \left({I}_{{ext}}-{J}_{{pc}}\,{pc}(t)\right)\mathrm{d}t,$$

(35)

where ${J}_{{pc}}$ is the weight of the inhibitory PC-DN synapse and ${I}_{{ext}}=\left\langle {pc}\right\rangle$ is an external excitatory input to the DN. ${I}_{{ext}}$ was set equal to the average PC firing rate during the interval period to ensure that excitation and inhibition onto the DN are of comparable size. For simplicity, we set ${J}_{{pc}}=1$.
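A discrete-time version of the integrator in Eq. (35) can be sketched as follows. The PC trace (an 80 Hz baseline with a pause centred at 300 ms), the 1 ms time step, and the function name `dn_integrator` are assumptions made for this example; the integral is approximated with a simple forward-Euler sum.

```python
import math


def dn_integrator(pc, dt, j_pc=1.0):
    """Running integral of I_ext - J_pc * pc(t), as in Eq. (35)."""
    i_ext = sum(pc) / len(pc)  # I_ext = <pc>: mean PC rate over the trace
    dn, acc = [], 0.0
    for rate in pc:
        acc += (i_ext - j_pc * rate) * dt  # forward-Euler step
        dn.append(acc)
    return dn


# Hypothetical PC trace: 80 Hz baseline with a pause centred at 300 ms.
dt = 0.001  # s
times = [k * dt for k in range(500)]
pc = [80.0 - 60.0 * math.exp(-((t - 0.3) ** 2) / (2.0 * 0.03 ** 2)) for t in times]
dn = dn_integrator(pc, dt)
print(f"DN peak at t = {times[dn.index(max(dn))]:.3f} s")
```

Because $I_{ext}$ equals the mean of the trace and $J_{pc}=1$, excitation and inhibition cancel on average and the integral returns to zero at the end of the trace.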

In order to map the DN rate to a time axis (Fig. 6f, j), we rescaled every individual DN output curve according to:

$$\widehat{{dn}}\left(t\right)=\left({t}_{s,\max }-{t}_{s,\min }\right)\frac{{dn}\left(t\right)-{{dn}}_{\min }}{{{dn}}_{\max }-{{dn}}_{\min }}+{t}_{s,\min },$$

(36)

where ${t}_{s,\max }$ and ${t}_{s,\min }$ are the maximum and minimum of the respective prior interval and ${{dn}}_{\max }$ and ${{dn}}_{\min }$ are the maximum and minimum values of the DN firing rate. Since the transformation described in Eq. (36) is linear, the essential features exhibited by the DN firing rate (i.e. its biases) are preserved.
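The rescaling of Eq. (36) is a min-max normalization onto the prior range. A minimal Python sketch follows; the DN values and the function name `rescale_dn` are hypothetical.

```python
def rescale_dn(dn, ts_min, ts_max):
    """Eq. (36): map a DN trace linearly onto the interval [ts_min, ts_max]."""
    dn_min, dn_max = min(dn), max(dn)
    if dn_max == dn_min:
        raise ValueError("DN trace is constant; Eq. (36) is undefined")
    scale = (ts_max - ts_min) / (dn_max - dn_min)
    return [scale * (value - dn_min) + ts_min for value in dn]


# Hypothetical DN output, rescaled to the 100-300 ms prior.
dn = [-0.4, -0.1, 0.3, 0.9, 1.2]
dn_hat = rescale_dn(dn, ts_min=100.0, ts_max=300.0)
print(dn_hat)
```

Since the map is linear with a positive slope, the ordering and relative spacing of the DN values are unchanged.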

To show how the theoretical Bayesian least squares (BLS) interval estimate can be obtained, we follow the reasoning from ref. 33. It is assumed that to estimate a time interval, $t_s$, subjects perform a noisy measurement, $t_m$, according to:

$$p\left({t}_{m}|{t}_{s}\right)=\frac{1}{\sqrt{2\pi {({w}_{{weber}}{t}_{s})}^{2}}}{e}^{-\frac{{({t}_{s}-{t}_{m})}^{2}}{2{({w}_{{weber}}{t}_{s})}^{2}}}.$$

(37)

Note that the standard deviation of the measurement $t_m$ increases with the length of the interval $t_s$ with proportionality factor $w_{weber}$, the Weber fraction. Given the prior distribution of time intervals, $\Pi ({t}_{s})$, the posterior distribution of $t_s$ given $t_m$ is:

$$p\left({t}_{s}|{t}_{m}\right)\propto \Pi \left({t}_{s}\right)p\left({t}_{m}|{t}_{s}\right).$$

(38)

The BLS estimate is the expected value of $t_s$ under this posterior:

$${t}_{e}=E\left[{t}_{s}|{t}_{m}\right].$$

(39)

We performed a least squares fit of the BLS model to the CCM_STP outputs (from all five interval distributions simultaneously) with w_weber as a single free parameter.
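The BLS estimate of Eqs. (37)–(39) and the single-parameter fit can be sketched numerically as below. The posterior mean is approximated on a regular grid over the prior support, and the Weber fraction is found by a plain grid search. The grid resolution, the candidate values, and the function names are assumptions of this example. Synthetic outputs generated from the BLS model itself stand in for the CCM_STP outputs, and the optimizer used in the study is not specified here.

```python
import math


def likelihood(t_m, t_s, w):
    """Eq. (37): Gaussian measurement noise with standard deviation w * t_s."""
    sd = w * t_s
    return math.exp(-((t_s - t_m) ** 2) / (2.0 * sd * sd)) / (math.sqrt(2.0 * math.pi) * sd)


def bls_estimate(t_m, ts_min, ts_max, w, n_grid=400):
    """Eqs. (38)-(39): posterior mean of t_s for a uniform prior on [ts_min, ts_max]."""
    step = (ts_max - ts_min) / (n_grid - 1)
    grid = [ts_min + k * step for k in range(n_grid)]
    weights = [likelihood(t_m, t_s, w) for t_s in grid]  # the uniform prior cancels
    return sum(t * wt for t, wt in zip(grid, weights)) / sum(weights)


def fit_weber(data, candidates):
    """Least-squares grid search; data holds (t_m, ts_min, ts_max, output) tuples."""
    def sse(w):
        return sum((bls_estimate(tm, lo, hi, w) - out) ** 2 for tm, lo, hi, out in data)
    return min(candidates, key=sse)


# The five uniform priors from the text (ms), fitted simultaneously.
priors = [(25.0, 150.0), (50.0, 200.0), (100.0, 300.0), (200.0, 400.0), (300.0, 500.0)]
candidates = [0.05 + 0.01 * k for k in range(16)]  # hypothetical search grid
w_true = candidates[7]                             # used to generate synthetic outputs

data = []
for lo, hi in priors:
    for k in range(9):
        t_m = lo + k * (hi - lo) / 8.0
        data.append((t_m, lo, hi, bls_estimate(t_m, lo, hi, w_true)))

w_hat = fit_weber(data, candidates)
print(w_hat)
```

The estimates are biased towards the centre of each prior: a measurement at the upper bound yields an estimate below it, and a measurement at the lower bound yields an estimate above it.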

Recurrent Golgi cell inhibition

To probe the effect of recurrent inhibition in the reduced CCM_STP, we added one Golgi cell (GoC) that received excitatory inputs from all GCs and formed inhibitory synapses onto all GCs. For simplicity, we assumed that the GoC fires with a rate $goc$ equal to the average GC firing rate, similarly to the MLI, and that all GoC to GC synapses have identical weights, $J_{goc}$:

$$\begin{aligned}goc(t)&=\frac{1}{N}\sum _{i=1}^{N}g{c}_{i}(t)=\langle gc(t)\rangle \\ {I}_{gc,i}(t)&=\sum _{j\in K}{W}_{ij}(t){m}_{j}(t)-{J}_{goc}\cdot goc(t)\\ g{c}_{i}(t)&={\alpha }_{i}\cdot \max ({I}_{gc,i}(t)-{\theta }_{i},0).\end{aligned}$$

(40)

The above equations imply that, in this configuration, the GoC acts as an activity-dependent GC threshold.
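To make the self-consistency in Eq. (40) concrete, the sketch below solves for the GoC rate that equals the mean GC rate it produces, given fixed excitatory drives. It assumes a shared threshold and gain for all GCs and uses bisection. The drives, parameter values, and function names are hypothetical, and this is not the procedure used in the simulations.

```python
import random


def mean_gc_rate(exc, j_goc, gain, theta, g):
    """Population-averaged GC rate from Eq. (40) when the GoC fires at rate g."""
    return sum(gain * max(e - j_goc * g - theta, 0.0) for e in exc) / len(exc)


def solve_goc_rate(exc, j_goc, gain, theta, tol=1e-10):
    """Find g with g = <gc>(g) by bisection.

    <gc>(g) is non-increasing in g, so g - <gc>(g) is strictly increasing
    and has a single root in [0, <gc>(0)].
    """
    lo, hi = 0.0, mean_gc_rate(exc, j_goc, gain, theta, 0.0)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mid - mean_gc_rate(exc, j_goc, gain, theta, mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rng = random.Random(1)
exc = [rng.gauss(30.0, 5.0) for _ in range(1000)]  # hypothetical drives sum_j W_ij m_j
j_goc, gain, theta = 0.5, 2.0, 25.0                # hypothetical parameters

g = solve_goc_rate(exc, j_goc, gain, theta)
print(f"GoC rate {g:.3f}, effective GC threshold {theta + j_goc * g:.3f}")
```

The printed effective threshold, $\theta + J_{goc}\cdot goc$, shows how the GoC shifts the GC threshold in proportion to the population activity.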

To ensure that the overall GC activity level in the reduced CCM_STP with GoC inhibition is comparable to the case without, we require the same criterion as above: an average GC rate of 5 Hz and a fraction of activated GCs of 0.2 in steady state. Since the average GC input now depends on the average GC firing rate itself, manual adjustment of GC thresholds, ${\theta }_{i}$, and gains, ${\alpha }_{i}$, carried out as above, is not feasible.

Instead, a steady-state solution of Eqs. (40) satisfying our requirements has to be found numerically. We first set up the CC network without the GoC and adjusted GC thresholds, ${\theta }_{i}$, and gains, ${\alpha }_{i}$, according to the procedure described above. Note that in the reduced model, because every GC receives the same combination of inputs (i.e. two supporter and two driver inputs), both ${\theta }_{i}$ and ${\alpha }_{i}$ are similar across GCs. We thus made the additional simplification of setting $\theta =\mathrm{E}({\theta }_{i})$ and $\alpha =\mathrm{E}({\alpha }_{i})$ for all GCs. We then reduced GC thresholds by 10% and introduced the GoC.

To obtain the average steady-state GC firing rate we assumed that the synaptic currents of a single GC are normally distributed across MF input patterns or, equivalently, across GCs. Mean and variance of the GC inputs are:

$$\begin{aligned}\left\langle {I}_{gc}^{*}\right\rangle &=\mathrm{E}\left({I}_{gc,i}^{*}\right)=\mathrm{E}\left(\sum _{j\in K}{W}_{ij}^{*}\cdot {m}_{j}\right)-{J}_{goc}\cdot \left\langle g{c}^{*}\right\rangle \\ {\sigma }_{{I}^{*}}^{2}&=\mathrm{Var}\left({I}_{gc,i}^{*}\right)=\mathrm{Var}\left(\sum _{j\in K}{W}_{ij}^{*}\cdot {m}_{j}\right)\end{aligned}$$

(41)

We can then express the average GC firing rate in the $N\to \infty$ limit as:

$$\left\langle g{c}^{*}\right\rangle =\alpha {\int }_{-\infty }^{+\infty }\max \left(\left\langle {I}_{gc}^{*}\right\rangle +{\sigma }_{{I}^{*}}\cdot \xi -\tilde{\theta },\,0\right)\exp \left(-\frac{{\xi }^{2}}{2}\right)\frac{d\xi }{\sqrt{2\pi }}$$

(42)

where $\widetilde{\theta }=0.9\theta$. The fraction of active GCs $f$ can be written as:

$$f=\frac{1}{2}{{{{{\rm{erfc}}}}}}\left(\frac{\theta -\left\langle {I}_{{gc}}^{*}\right\rangle }{\sqrt{2}{\sigma }_{{I}^{*}}}\right)$$

(43)

We can now impose that

$$\begin{aligned}\left\langle {gc}^{*}\right\rangle &=5\,\mathrm{Hz}\\ f&=0.2\end{aligned}$$

(44)

and find a self-consistent solution of Eqs. (41), (42), and (43) by adjusting the parameters ${J}_{{goc}}$ and $\alpha$. To do so, we used the hybrid numerical root-finder from the GNU Scientific Library⁸⁹ with default step size.
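The structure of Eqs. (41)–(44) can be illustrated with a short Python sketch that avoids a general root-finder. Once both targets are imposed, Eq. (43) fixes the mean input, Eq. (41) then gives $J_{goc}$, and Eq. (42) is linear in $\alpha$. This is an illustrative shortcut under the Gaussian assumption, not the GSL-based C++ procedure used in the study. The input statistics and threshold below are hypothetical, and the thresholds follow the equations as printed ($\tilde{\theta}$ in Eq. (42), $\theta$ in Eq. (43)).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def mean_rectified(mu, sigma, theta):
    """E[max(mu + sigma * xi - theta, 0)] for xi ~ N(0, 1): the integral in Eq. (42)."""
    a = (mu - theta) / sigma
    return (mu - theta) * STD_NORMAL.cdf(a) + sigma * STD_NORMAL.pdf(a)


def active_fraction(mu, sigma, theta):
    """Eq. (43): fraction of GCs whose input exceeds the threshold."""
    return 0.5 * math.erfc((theta - mu) / (math.sqrt(2.0) * sigma))


def solve_j_and_gain(exc_mean, sigma, theta, target_rate=5.0, target_f=0.2):
    """Return (J_goc, alpha, <I*>) satisfying Eqs. (41)-(44)."""
    # Eq. (43) with f fixed pins the mean input <I*>.
    mu = theta - sigma * STD_NORMAL.inv_cdf(1.0 - target_f)
    # Eq. (41) with <gc*> = target_rate then determines J_goc.
    j_goc = (exc_mean - mu) / target_rate
    # Eq. (42) is linear in the gain; the threshold there is 0.9 * theta.
    gain = target_rate / mean_rectified(mu, sigma, 0.9 * theta)
    return j_goc, gain, mu


# Hypothetical input statistics: E(sum W* m), sigma_I* and theta.
j_goc, gain, mu = solve_j_and_gain(exc_mean=30.0, sigma=5.0, theta=20.0)
print(j_goc, gain, active_fraction(mu, 5.0, 20.0), gain * mean_rectified(mu, 5.0, 18.0))
```

The closed form used in `mean_rectified` is the standard expression for the mean of a rectified Gaussian variable, $(\mu-\theta)\,\Phi(a)+\sigma\,\varphi(a)$ with $a=(\mu-\theta)/\sigma$.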

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

No experimental data were generated in this study.

Code availability

Figures were generated with MATLAB (R2019b) and Python (3.8). All simulations were performed in C++11 using the GNU Scientific Library (2.6)⁸⁹ and the Armadillo library (11.0.1)⁹⁰. The code is available in the following GitHub repository: https://github.com/alessandrobarri/cerebellar_cortex_input_STP.

References

Broome, B. M., Jayaraman, V. & Laurent, G. Encoding and decoding of overlapping odor sequences. Neuron 51, 467–482 (2006).
Crowe, D. A., Averbeck, B. B., Chafee, M. V. & Georgopoulos, A. P. Dynamics of parietal neural activity during spatial cognitive processing. Neuron 47, 885–891 (2005).
Harvey, C. D., Coen, P. & Tank, D. W. Choice-specific sequences in parietal cortex during a virtual-navigation decision task. Nature 484, 62–68 (2012).
Sauerbrei, B. A. et al. Cortical pattern generation during dexterous movement is input-driven. Nature 577, 386–391 (2020).
Zhou, S., Masmanidis, S. C. & Buonomano, D. V. Neural sequences as an optimal dynamical regime for the readout of time. Neuron 108, 651–658.e5 (2020).
Bright, I. M. et al. A temporal record of the past with a spectrum of time constants in the monkey entorhinal cortex. Proc. Natl Acad. Sci. 117, 20274–20283 (2020).
MacDonald, C. J., Lepage, K. Q., Eden, U. T. & Eichenbaum, H. Hippocampal “time cells” bridge the gap in memory for discontiguous events. Neuron 71, 737–749 (2011).
Pastalkova, E., Itskov, V., Amarasingham, A. & Buzsaki, G. Internally generated cell assembly sequences in the rat hippocampus. Science 321, 1322–1327 (2008).
Long, M. A., Jin, D. Z. & Fee, M. S. Support for a synaptic chain model of neuronal sequence generation. Nature 468, 394–399 (2010).
Kennedy, A. et al. A temporal basis for predicting the sensory consequences of motor commands in an electric fish. Nat. Neurosci. 17, 416–422 (2014).
Laje, R. & Buonomano, D. V. Robust timing and motor patterns by taming chaos in recurrent neural networks. Nat. Neurosci. 16, 925–933 (2013).
Yamazaki, T. & Tanaka, S. The cerebellum as a liquid state machine. Neural Netw. 20, 290–297 (2007).
Toyoizumi, T. & Abbott, L. F. Beyond the edge of chaos: amplification and temporal integration by recurrent networks in the chaotic regime. Phys. Rev. E: Stat. Nonlin. Soft Matter Phys. 84, 051908 (2011).
Dittman, J. S., Kreitzer, A. C. & Regehr, W. G. Interplay between facilitation, depression, and residual calcium at three presynaptic terminals. J. Neurosci. 20, 1374–1385 (2000).
Abbott, L. F. & Regehr, W. G. Synaptic computation. Nature 431, 796–803 (2004).
Abbott, L. F., Varela, J. A., Sen, K. & Nelson, S. B. Synaptic depression and cortical gain control. Science 275, 220–224 (1997).
Rothman, J. S., Cathala, L., Steuber, V. & Silver, R. A. Synaptic depression enables neuronal gain control. Nature 457, 1015–1018 (2009).
Buonomano, D. V. & Merzenich, M. M. Temporal information transformed into a spatial code by a neural network with realistic properties. Science 267, 1028–1030 (1995).
Mongillo, G., Barak, O. & Tsodyks, M. Synaptic theory of working memory. Science 319, 1543–1546 (2008).
Buonomano, D. V. & Maass, W. State-dependent computations: spatiotemporal processing in cortical networks. Nat. Rev. Neurosci. 10, 113–125 (2009).
Chadderton, P., Schaefer, A. T., Williams, S. R. & Margrie, T. W. Sensory-evoked synaptic integration in cerebellar and cerebral cortical neurons. Nat. Rev. Neurosci. 15, 71–83 (2014).
Popa, L. S., Hewitt, A. L. & Ebner, T. J. Predictive and Feedback Performance Errors Are Signaled in the Simple Spike Discharge of Individual Purkinje Cells. J. Neurosci. 32, 15345–15358 (2012).
Burguière, E. et al. Spatial navigation impairment in mice lacking cerebellar LTD: a motor adaptation deficit? Nat. Neurosci. 8, 1292–1294 (2005).
Moberget, T., Gullesen, E. H., Andersson, S., Ivry, R. B. & Endestad, T. Generalized role for the cerebellum in encoding internal models: evidence from semantic processing. J. Neurosci. 34, 2871–2878 (2014).
Gao, Z. et al. A cortico-cerebellar loop for motor planning. Nature 563, 113–116 (2018).
Chabrol, F. P., Blot, A. & Mrsic-Flogel, T. D. Cerebellar contribution to preparatory activity in motor neocortex. Neuron 103, 506–519.e4 (2019).
Marr, D. A theory of cerebellar cortex. J. Physiol. 202, 437–470 (1968).
Albus, J. S. A theory of cerebellar function. Math. Biosci. 10, 25–61 (1971).
Medina, J. F. & Mauk, M. D. Computer simulation of cerebellar information processing. Nat. Neurosci. 3, 1205–1211 (2000).
Chabrol, F. P., Arenz, A., Wiechert, M. T., Margrie, T. W. & DiGregorio, D. A. Synaptic diversity enables temporal coding of coincident multisensory inputs in single neurons. Nat. Neurosci. 18, 718–727 (2015).
Halverson, H. E., Khilkevich, A. & Mauk, M. D. Relating cerebellar Purkinje cell activity to the timing and amplitude of conditioned eyelid responses. J. Neurosci. 35, 7813–7832 (2015).
White, N. E., Kehoe, E. J., Choi, J. S. & Moore, J. W. Coefficients of variation in timing of the classically conditioned eyeblink in rabbits. Psychobiology 28, 520–524 (2000).
Narain, D., Remington, E. D., Zeeuw, C. I. D. & Jazayeri, M. A cerebellar mechanism for learning prior distributions of time intervals. Nat. Commun. 9, 469 (2018).
Litwin-Kumar, A., Harris, K. D., Axel, R., Sompolinsky, H. & Abbott, L. F. Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.e7 (2017).
Cayco-Gajic, N. A., Clopath, C. & Silver, R. A. Sparse synaptic connectivity is required for decorrelation and pattern separation in feedforward networks. Nat. Commun. 8, 1116 (2017).
Fujita, M. Adaptive filter model of the cerebellum. Biol. Cybern. 45, 195–206 (1982).
Dean, P., Porrill, J., Ekerot, C.-F. & Jörntell, H. The cerebellar microcircuit as an adaptive filter: experimental and computational evidence. Nat. Rev. Neurosci. 11, 30–43 (2010).
Hallermann, S. et al. Bassoon speeds vesicle reloading at a central excitatory synapse. Neuron 68, 710–723 (2010).
Saviane, C. & Silver, R. A. Fast vesicle reloading and a large pool sustain high bandwidth transmission at a central synapse. Nature 439, 983–987 (2006).
Park, H. J., Lasker, D. M. & Minor, L. B. Static and dynamic discharge properties of vestibular-nerve afferents in the mouse are affected by core body temperature. Exp. Brain Res. 200, 269–275 (2010).
Arenz, A., Silver, R. A., Schaefer, A. T. & Margrie, T. W. The contribution of single synapses to sensory representation in vivo. Science 321, 977–980 (2008).
Bosman, L. W. J. et al. Encoding of whisker input by cerebellar Purkinje cells: Whisker encoding by Purkinje cells. J. Physiol. 588, 3757–3783 (2010).
Ohmae, S. & Medina, J. F. Climbing fibers encode a temporal-difference prediction error during cerebellar learning in mice. Nat. Neurosci. 18, 1798–1803 (2015).
Steinmetz, J. E., Lavond, D. G. & Thompson, R. F. Classical conditioning of the rabbit eyelid response with mossy fiber stimulation as the conditioned stimulus. Bull. Psychon. Soc. 23, 245–248 (1985).
Khilkevich, A., Zambrano, J., Richards, M.-M. & Mauk, M. D. Cerebellar implementation of movement sequences through feedback. eLife 7, e06262 (2018).
Bouvier, G. et al. Cerebellar learning using perturbations. eLife 45, (2018).
Gibbon, J. Scalar expectancy theory and Weber’s law in animal timing. Psychol. Rev. 84, 47 (1977).
Jazayeri, M. & Shadlen, M. N. Temporal context calibrates interval timing. Nat. Neurosci. 13, 1020–1026 (2010).
Miyazaki, M., Nozaki, D. & Nakajima, Y. Testing Bayesian models of human coincidence timing. J. Neurophysiol. 94, 395–399 (2005).
Egger, S. W., Remington, E. D., Chang, C.-J. & Jazayeri, M. Internal models of sensorimotor integration regulate cortical dynamics. Nat. Neurosci. 22, 1871–1882 (2019).
Marr, D. Vision: A Computational Investigation Into the Human Representation and Processing of Visual Information (MIT Press, 1982).
Shankar, K. H. & Howard, M. W. A scale-invariant internal representation of time. Neural Comput 24, 134–193 (2012).
Albergaria, C., Silva, N. T., Pritchett, D. L. & Carey, M. R. Locomotor activity modulates associative learning in mouse cerebellum. Nat. Neurosci. 21, 725–735 (2018).
Guo, J.-Z. et al. Disrupting cortico-cerebellar communication impairs dexterity. eLife 10, e65906 (2021).
Puccini, G. D., Sanchez-Vives, M. V. & Compte, A. Integrated mechanisms of anticipation and rate-of-change computations in cortical circuits. PLoS Comput. Biol. 3, e82 (2007).
Wang, Y. et al. Heterogeneity in the pyramidal network of the medial prefrontal cortex. Nat. Neurosci. 9, 534–542 (2006).
Diaz-Quesada, M., Martini, F. J., Ferrati, G., Bureau, I. & Maravall, M. Diverse thalamocortical short-term plasticity elicited by ongoing stimulation. J. Neurosci. 34, 515–526 (2014).
Buzsáki, G. & Mizuseki, K. The log-dynamic brain: how skewed distributions affect network operations. Nat. Rev. Neurosci. 15, 264–278 (2014).
Gao, Z., van Beugen, B. J. & De Zeeuw, C. I. Distributed synergistic plasticity and cerebellar learning. Nat. Rev. Neurosci. 13, 619–635 (2012).
Gilmer, J. I., Farries, M. A., Kilpatrick, Z., Delis, I. & Person, A. L. An Emergent Temporal Basis Set Robustly Supports Cerebellar Time-series Learning. https://doi.org/10.1101/2022.01.06.475265 (2022).
Zampini, V. et al. Mechanisms and functional roles of glutamatergic synapse diversity in a cerebellar circuit. eLife 5, e15872 (2016).
Guo, C., Huson, V., Macosko, E. Z. & Regehr, W. G. Graded heterogeneity of metabotropic signaling underlies a continuum of cell-intrinsic temporal responses in unipolar brush cells. Nat. Commun. 12, 5491 (2021).
Dorgans, K. et al. Short-term plasticity at cerebellar granule cell to molecular layer interneuron synapses expands information processing. eLife 8, e41586 (2019).
Gurnani, H. & Silver, R. A. Multidimensional population activity in an electrically coupled inhibitory circuit in the cerebellar cortex. Neuron 109, 1739–1753.e8 (2021).
Kita, K. et al. GluA4 enables associative memory formation by facilitating cerebellar expansion coding. bioRxiv https://doi.org/10.1101/2020.12.04.412023 (2020).
DiGregorio, D. A., Nusser, Z. & Silver, R. A. Spillover of glutamate onto synaptic AMPA receptors enhances fast transmission at a cerebellar synapse. Neuron 35, 521–533 (2002).
Yamazaki, T. & Tanaka, S. A spiking network model for passage-of-time representation in the cerebellum: Cerebellar passage-of-time representation. Eur. J. Neurosci. 26, 2279–2292 (2007).
Straub, I. et al. Gradients in the mammalian cerebellar cortex enable Fourier-like transformation and improve storing capacity. eLife 9, e51771 (2020).
Johansson, F., Jirenhed, D.-A., Rasmussen, A., Zucca, R. & Hesslow, G. Memory trace and timing mechanism localized to cerebellar Purkinje cells. Proc. Natl Acad. Sci. 111, 14930–14934 (2014).
Van Dijck, G. et al. Probabilistic identification of cerebellar cortical neurones across species. PLoS ONE 8, e57669 (2013).
Liu, Z. et al. Sustained deep-tissue voltage recording using a fast indicator evolved for two-photon microscopy. Cell 185, 48 (2022).
Sadeghi, S. G., Chacron, M. J., Taylor, M. C. & Cullen, K. E. Neural variability, detection thresholds, and information transmission in the vestibular system. J. Neurosci. 27, 771–781 (2007).
Medrea, I. & Cullen, K. E. Multisensory integration in early vestibular processing in mice: the encoding of passive vs. active motion. J. Neurophysiol. 110, 2704–2717 (2013).
Bengtsson, F. & Jorntell, H. Sensory transmission in cerebellar granule cells relies on similarly coded mossy fiber inputs. Proc. Natl Acad. Sci. 106, 2389–2394 (2009).
Clopath, C., Badura, A., De Zeeuw, C. I. & Brunel, N. A Cerebellar learning model of vestibulo-ocular reflex adaptation in wild-type and mutant mice. J. Neurosci. 34, 7203–7215 (2014).
Najafi, F. & Medina, J. F. Beyond “all-or-nothing” climbing fibers: graded representation of teaching signals in Purkinje cells. Front. Neural Circuits 7, 115 (2013).
Remington, E. D., Narain, D., Hosseini, E. A. & Jazayeri, M. Flexible sensorimotor computations through rapid reconfiguration of cortical dynamics. Neuron 98, 1005–1019.e5 (2018).
Suvrathan, A., Payne, H. L. & Raymond, J. L. Timing rules for synaptic plasticity matched to behavioral function. Neuron 92, 959–967 (2016).
Markram, H., Wang, Y. & Tsodyks, M. Differential signaling via the same axon of neocortical pyramidal neurons. Proc. Natl Acad. Sci. 95, 5323–5328 (1998).
Van Kan, P. L., Gibson, A. R. & Houk, J. C. Movement-related inputs to intermediate cerebellum of the monkey. J. Neurophysiol. 69, 74–94 (1993).
Beraneck, M. & Cullen, K. E. Activity of vestibular nuclei neurons during vestibular and optokinetic stimulation in the alert mouse. J. Neurophysiol. 98, 1549–1565 (2007).
Dale, A. & Cullen, K. E. The nucleus prepositus predominantly outputs eye movement-related information during passive and active self-motion. J. Neurophysiol. 109, 1900–1911 (2013).
Muzzu, T., Mitolo, S., Gava, G. P. & Schultz, S. R. Encoding of locomotion kinematics in the mouse cerebellum. PLoS ONE 13, e0203900 (2018).
Chen, S., Augustine, G. J. & Chadderton, P. Serial processing of kinematic signals by cerebellar circuitry during voluntary whisking. Nat. Commun. 8, 232 (2017).
Giovannucci, A. et al. Cerebellar granule cells acquire a widespread predictive feedback signal during motor learning. Nat. Neurosci. 20, 727–734 (2017).
O’Donoghue, B. & Candes, E. Adaptive restart for accelerated gradient schemes. Found. Comput. Math. 15, 715–732 (2015).
Goldman, M. S., Maldonado, P. & Abbott, L. F. Redundancy reduction and sustained firing with stochastic depressing synapses. J. Neurosci. 22, 584–591 (2002).
Botev, Z. I., Grotowski, J. F. & Kroese, D. P. Kernel density estimation via diffusion. Ann. Stat. 38, 2916–2957 (2010).
Galassi, M. & Theiler, J. GNU Scientific Library Reference Manual. 3rd edn.
Sanderson, C. & Curtin, R. Armadillo: a template-based C++ library for linear algebra. J. Open Source Softw. 1, 26 (2016).

Acknowledgements

A.B. thanks Gianluigi Mongillo and Zuzanna Piwkowska Zvonkine for helpful discussions. We thank the DiGregorio Lab for feedback on this manuscript. This work is supported by the Institut Pasteur, Centre National de la Recherche Scientifique, Fondation pour la Recherche Médicale (FRM EQU202003010555), Fondation pour l’Audition (FPA-RD-2018-8), BioPsy Laboratory of Excellence, and the Agence Nationale de la Recherche (ANR-17-CE16-0019, and ANR-18-CE16-0018, ANR-19-CE16 0019-02, ANR-21-CE16-0036-01), which were awarded to the laboratory of DAD.

Author information

Authors and Affiliations

Institut Pasteur, Université Paris Cité, Synapse and Circuit Dynamics Laboratory, CNRS UMR 3571, Paris, France
A. Barri, M. T. Wiechert & D. A. DiGregorio
McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, MA, USA
M. Jazayeri
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA, USA
M. Jazayeri

Authors

A. Barri
M. T. Wiechert
M. Jazayeri
D. A. DiGregorio

Contributions

All simulations and analyses were performed by A.B. A.B., M.W., M.J., and D.A.D. conceived the project and wrote the manuscript.

Corresponding authors

Correspondence to A. Barri or D. A. DiGregorio.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Communications thanks the anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Barri, A., Wiechert, M.T., Jazayeri, M. et al. Synaptic basis of a sub-second representation of time in a neural circuit model. Nat Commun 13, 7902 (2022). https://doi.org/10.1038/s41467-022-35395-y

Received: 31 March 2022
Accepted: 29 November 2022
Published: 22 December 2022
DOI: https://doi.org/10.1038/s41467-022-35395-y
