Paper • Open access

Characterization of anomalous diffusion classical statistics powered by deep learning (CONDOR)

Alessia Gentili and Giorgio Volpe

Published 7 July 2021 • © 2021 The Author(s). Published by IOP Publishing Ltd
J. Phys. A: Math. Theor. 54 314003 • DOI 10.1088/1751-8121/ac0c5d
Part of the special issue Characterisation of Physical Processes from Anomalous Diffusion Data

Abstract

Diffusion processes are important in several physical, chemical, biological and human phenomena. Examples include molecular encounters in reactions, cellular signalling, the foraging of animals, the spread of diseases, as well as trends in financial markets and climate records. Deviations from Brownian diffusion, known as anomalous diffusion (AnDi), can often be observed in these processes, when the growth of the mean square displacement in time is not linear. An ever-increasing number of methods has thus appeared to characterize anomalous diffusion trajectories based on classical statistics or machine learning approaches. Yet, characterization of anomalous diffusion remains challenging to date as testified by the launch of the AnDi challenge in March 2020 to assess and compare new and pre-existing methods on three different aspects of the problem: the inference of the anomalous diffusion exponent, the classification of the diffusion model, and the segmentation of trajectories. Here, we introduce a novel method (CONDOR) which combines feature engineering based on classical statistics with supervised deep learning to efficiently identify the underlying anomalous diffusion model with high accuracy and infer its exponent with a small mean absolute error in single 1D, 2D and 3D trajectories corrupted by localization noise. Finally, we extend our method to the segmentation of trajectories where the diffusion model and/or its anomalous exponent vary in time.


1. Introduction

Anomalous diffusion (AnDi) refers to diffusion phenomena characterized by a mean square displacement (MSD) that grows in time with an exponent α that is either smaller (subdiffusion) or greater (superdiffusion) than one (standard Brownian diffusion) [1]. Typically, this is represented by a nonlinear power-law scaling ($\mathrm{MSD}(t)\sim {t}^{\alpha }$) [1]. The growing interest in anomalous diffusion shown by the physics community lies in the fact that it extends the concept of Brownian diffusion and helps describe a broad range of phenomena of both natural and human origin [2]. Examples include molecular encounters in reactions [3], cellular signalling [4, 5], the foraging of animals [6], search strategies [7], human travel patterns [8, 9], the spread of epidemics and pandemics [10, 11], as well as trends in financial markets [12, 13] and climate records [14].

Various methods have been introduced over time to characterize anomalous diffusion in real data. Traditional methods are based on the direct analysis of trajectory statistics [15] via the examination of features such as the MSD [16–19], the velocity autocorrelation function [20] and the power spectral density [21, 22] among others [15, 23–34]. More recently, the blossoming of machine learning approaches has expanded the toolbox of available methods for the characterization of anomalous diffusion with algorithms based on random forests [35, 36], gradient boosting techniques [35], recurrent neural networks [37] and convolutional neural networks [38, 39].

Despite the significant progress made, the characterization of anomalous diffusion data still presents non-negligible challenges, especially when trying to keep the approach insightful, easy to implement and parameter-free [40]. First, the measurement of anomalous diffusion from experimental trajectories is often hampered by the lack of significant statistics (i.e. single and short trajectories), by limited localization precision, by irregular sampling, as well as by changes of the diffusion properties in time. Second, these limitations become even more pronounced when dealing with non-ergodic processes [1], as the information of interest can be well hidden within individual trajectories. Finally, while machine learning approaches are powerful and parameter-free, more traditional statistics-based approaches typically offer deeper insight into the underlying diffusion process.

These shortcomings have resulted in the launch of the AnDi challenge by a team of international scientists in March 2020 (https://andi-challenge.org) [40, 41]. The goal of the challenge was to push the methodological frontiers in the characterization of three different aspects of anomalous diffusion for 1D, 2D and 3D noise-corrupted trajectories [41]: (1) the inference of the anomalous diffusion exponent α; (2) the classification of the diffusion model; and (3) the segmentation of trajectories characterized by a change in anomalous diffusion exponent and/or model at an unspecified time instant.

Here, in response to the AnDi challenge, we describe a novel method (CONDOR: Classifier Of aNomalous DiffusiOn tRajectories) for the characterization of single anomalous diffusion trajectories based on the combination of classical statistics analysis tools and supervised deep learning. Specifically, our method applies a deep feed-forward neural network to cluster parameters extracted from the statistical features of individual trajectories. By combining advantages from both approaches (classical statistics analysis and machine learning), CONDOR is highly performant and parameter-free. It allows us to identify the underlying anomalous diffusion model with high accuracy (≳84%) and infer its exponent α with a small mean absolute error (≲0.16) in single 1D, 2D and 3D trajectories corrupted by localization noise. CONDOR was always among the top-performing methods in the AnDi challenge for both the inference of the α-exponent and the model classification of 1D, 2D and 3D anomalous diffusion trajectories. In particular, CONDOR was the leading method for the inference task in 2D and 3D and for the classification task in 3D. Finally, in this article, we extend CONDOR to the segmentation of trajectories.

2. Methods

CONDOR combines classical statistics analysis of single anomalous diffusion trajectories with supervised deep learning. The trajectories used in this work are generated numerically (with the andi-dataset Python package [42]) from the 5 models considered in the AnDi challenge: the annealed transient time motion (attm) [43], the continuous-time random walk (ctrw) [44], the fractional Brownian motion (fbm) [45], the Lévy walk (lw) [46] and the scaled Brownian motion (sbm) [47]. Briefly, AnDi challenge datasets contain 1D, 2D or 3D trajectories from these 5 models sampled at regular time steps (tn = nΔt, with n a positive integer and Δt a unitary time interval). The anomalous exponent α varies in the range [0.05, 2] with steps of 0.05, as allowed by the respective model [41]. The shortest trajectories have at least 10 samples per dimension (Tmin = 10Δt). Prior to the addition of the localization error, the trajectories have a unitary standard deviation σs of the distribution of the displacements. Along each dimension i (i = 1, ..., d with d = 1, 2, 3), the trajectories are then corrupted with additive Gaussian noise (with zero mean and variance σi ) associated to a finite localization precision to better simulate experimental endeavours [41, 42]. The variance σi of the additive Gaussian noise can assume values of 0.1, 0.5 or 1.0. The overall level of noise can then be computed as ${\sigma }_{\text{noise}}=\sqrt{{\sum }_{i=1}^{d}\enspace {\sigma }_{i}^{2}}$ [42]. The inverse of this value is then used to compute the signal-to-noise ratio (SNR). Figure 1 shows examples of such trajectories without noise (figures 1(a)–(c)) and with the addition of corrupting noise (figure 1(d)). By visual inspection, these examples show how differences between single trajectories are not always explicit, thus making the characterization of anomalous diffusion challenging. Already in the absence of noise, the same diffusion process (e.g. for α = 1 in figure 1(a) and for α = 0.5 in figure 1(b)) can be described by different models and the same model (e.g. sbm in figure 1(c)) can assume different values of the exponent α. These occurrences are not always easy to set apart (e.g. attm and fbm in figure 1(b)). The presence of corrupting noise (figure 1(d)) and/or short trajectories (shaded areas in figures 1(a)–(c)) adds extra complexity.
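As an illustration of this corruption step (not part of the released code [48]), the following minimal MATLABTM sketch adds localization noise to a trajectory and computes the resulting SNR; the variable names, the stand-in trajectory and the choice σi = 0.5 are assumptions of this sketch, and σi is treated as the standard deviation of the added noise, consistent with the expression for σnoise above.

```matlab
% Minimal sketch: corrupt a 2D trajectory with localization noise and
% compute the SNR. The trajectory is a stand-in with unit-variance steps.
x = cumsum(randn(100, 2));            % stand-in 2D trajectory (T = 100, d = 2)
sigma = [0.5 0.5];                    % per-dimension noise level sigma_i
xn = x + sigma.*randn(size(x));       % additive zero-mean Gaussian noise
sigma_noise = sqrt(sum(sigma.^2));    % overall noise level
SNR = 1/sigma_noise;                  % signal-to-noise ratio
```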


Figure 1. Representative anomalous diffusion trajectories. (a)–(c) Examples of single 1D anomalous diffusion trajectories in the absence of corrupting noise for (a) different models of standard diffusion (anomalous diffusion exponent α = 1), for (b) different models of anomalous diffusion (anomalous diffusion exponent α = 0.5), and for (c) the same anomalous diffusion model (sbm) with different values of α. The models considered here are the annealed transient time motion (attm), the continuous-time random walk (ctrw), the fractional Brownian motion (fbm), the Lévy walk (lw) and the scaled Brownian motion (sbm). Note that fbm and sbm processes are indistinguishable when α = 1 as in (a). (d) Same trajectories as in (a) corrupted with Gaussian noise (of zero mean and variance σnoise = 1) associated to a finite localization precision, corresponding to a signal-to-noise ratio (SNR) of 1. For better comparison, each sequence of positions xn taken at times tn is represented after subtracting its average and after rescaling its displacements to their standard deviation to give the normalized sequence ${\hat{x}}_{n}$ [35]. The grey-shaded areas highlight a small portion (10 data points) of each trajectory.


In this section, we provide a detailed description of our method (CONDOR) to characterise such single anomalous diffusion trajectories. The fundamental intuition behind our approach is that different trajectories produced by the same diffusion model should share similar statistical features that set them apart from other models [1]. Nonetheless, in the presence of noise or with few data points, these similarities and differences need not be explicit in single trajectories (figure 1). Thus, CONDOR first manipulates single trajectories to extract this hidden statistical information and then employs the power of deep learning to analyse and cluster it. CONDOR's workflow is described in section 2.1 and shown in figure 2. We will first discuss the feature engineering step used to preprocess the trajectories and extract a set of N relevant features primarily based on classical statistics analysis (section 2.2). These features become the inputs for the deep-learning part of the characterization algorithm based on deep feed-forward neural networks (figure 3), which, in order, classifies the diffusion model and infers the anomalous diffusion exponent α (figure 2). These two steps are performed by multiple neural networks, whose architecture and connections are explained in section 2.3. Section 2.4 discusses the training of CONDOR's neural networks. Section 2.5 discusses how CONDOR's performance varies with the number of inputs. In section 2.6, we explain how CONDOR's outputs can be used to perform the segmentation of trajectories, where the model and/or its anomalous exponent change in time. Finally, in section 2.7, we provide a brief explanation of the code behind CONDOR [48].


Figure 2. CONDOR workflow. CONDOR's workflow is divided into three main parts: preprocessing (section 2.2), classification and inference (section 2.3). In the preprocessing step, the vector of positions x associated to a single trajectory is used to extract a vector I of N = 1 + 92d inputs primarily based on classical statistics (with d = 1, 2, 3 being the trajectory dimension). The vector I is the basic input for each neural network in the classification and inference algorithms. The output of the classification is determined by the output of up to three deep feed-forward neural networks and is the predicted model $\tilde {M}$ among five possible categories: the annealed transient time motion (attm), the continuous-time random walk (ctrw), the fractional Brownian motion (fbm), the Lévy walk (lw) and the scaled Brownian motion (sbm). Finally, the inference algorithm predicts the value of the exponent α ∈ [0.05, 2] basing its decision on I and the predicted $\tilde {M}$ for each trajectory. In particular, the predicted $\tilde {\alpha }$ is the arithmetic average of two distinct predictions: ${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$, where $\tilde {M}$ is used as a selection element or as an additional input for the deep learning algorithm, respectively.


Figure 3. Basic neural network architecture in CONDOR. Architecture of the deep three-layer feed-forward neural network chosen for both classification and inference tasks. Inputs In (with n = 1, 2, ..., N) are processed by two hidden layers and an output layer to yield outputs Oq (with q = 1, 2, ..., Q). The hidden layers have K = 20 neurons each with a sigmoidal activation function (∫). The output layer has Q neurons with a softmax activation function (σ). Each neuron has a summer (${\Sigma}$) that gathers its weighted inputs and bias to return a scalar output. For each neuron, weights v and bias b are optimised during the training of the network.


2.1. CONDOR workflow

CONDOR's workflow comprises three main activities executed in order (figure 2):

  • The preprocessing of the trajectories to extract the inputs for CONDOR's neural networks;
  • The classification of the anomalous diffusion model M;
  • The inference of the anomalous diffusion exponent α.

Additionally, a fourth activity (i.e. segmentation) can be executed in case one is also interested in identifying a change in time of the underlying model and/or of the α exponent in a trajectory.

The first step (preprocessing) manipulates the trajectories to extract a vector I of inputs primarily based on classical statistics. The number of inputs is independent of the length of the trajectory but depends on its dimensionality. A detailed description of these inputs and of the feature engineering process used to extract them follows in section 2.2.

In the following activity (classification), the inputs calculated in the previous step are used by a series of three neural networks to identify the underlying anomalous diffusion model behind each trajectory (figure 2). A detailed description of the architecture of these networks follows in section 2.3. The first network (step 1 in figure 2) performs the main classification step, after which most of the trajectories are already allocated to the correct model. The following two networks (steps 2 and 3 in figure 2) perform an additional check on the classification obtained from the previous steps (step 2 on step 1, and step 3 on steps 1 and 2 combined) to either confirm their classification or reassign it for a small percentage of the trajectories (usually ⩽10%). Practically, the output of the first network is directly used to classify trajectories as lw and is key to inform the following two networks (steps 2 and 3 in figure 2) on which trajectories in the dataset to select for the additional check. Specifically, the second network (step 2) accepts the trajectories classified as attm, fbm or sbm by the first network (step 1) and reverifies their classification; the third network (step 3) instead takes the trajectories classified as ctrw by the first network (step 1) and those classified as attm by the second (step 2) and reverifies their classification. Steps 2 and 3 are particularly useful to avoid misclassification of attm trajectories. Indeed, in the attm model, the diffusion coefficient can vary in space or time (in the AnDi challenge only time variations were considered [41, 43]). This feature can be easily confused with features of other processes: namely, the time-dependent diffusivity of the sbm [47], the time irregularities between consecutive steps in the ctrw [44] and the correlations among the increments in the fbm [45].
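For concreteness, the cascade logic can be sketched in MATLABTM as follows; the trained networks net1, net2 and net3, the input matrix I and the numeric labels 1–5 (for attm, ctrw, fbm, lw and sbm, respectively) are assumptions of this illustration, not part of the released code [48].

```matlab
% Minimal sketch of the three-step classification cascade.
M = vec2ind(net1(I));                        % step 1: main 5-way classification
sel2 = ismember(M, [1 3 5]);                 % trajectories labelled attm, fbm or sbm
map2 = [1 3 5];                              % 3-way output -> 5-way labels
M(sel2) = map2(vec2ind(net2(I(:, sel2))));   % step 2: re-verify attm/fbm/sbm
sel3 = (M == 1) | (M == 2);                  % attm from step 2, ctrw from step 1
map3 = [1 2];                                % 2-way output -> 5-way labels
M(sel3) = map3(vec2ind(net3(I(:, sel3))));   % step 3: re-verify attm/ctrw
```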

The next activity (inference) is to determine the anomalous diffusion exponent α (figure 2). The estimated value $\tilde {\alpha }$ is obtained as the arithmetic average of the outputs of two different approaches based on deep learning (figure 2), as the two approaches lead to predicting an underestimated α value (${\tilde {\alpha }}_{1}$) and an overestimated α value (${\tilde {\alpha }}_{2}$), respectively. In the first approach, each trajectory is sent to a different set of networks (out of five possible sets, one per model) using the output $\tilde {M}$ of the classification as a selection element. In the second approach (figure 2), each trajectory is sent to the same set of networks where the output $\tilde {M}$ of the model classification is used as an additional input together with the list of inputs identified in the feature engineering step. A detailed description of the architecture of the networks for the inference task follows in section 2.3.

Finally, the outputs of both classification and inference can additionally be used to identify the presence of a change point in a trajectory. This activity is not based on the direct use of neural networks but rather on the analysis of CONDOR's classification and inference outputs on all segments of each trajectory identified with an overlapping moving window. A detailed description of the algorithm used for the segmentation task follows in section 2.6.

2.2. Feature engineering and CONDOR input definition

Before the classification and inference steps (figure 2), we analysed each trajectory to extract N = 1 + 92d inputs (with d = 1, 2, 3 being the trajectory dimensionality) primarily based on classical statistics (table 1). We stress that the total number of inputs is independent of the duration of each trajectory and only depends on its dimensionality. The inputs used in this work were ultimately selected by trial and error based on the final performance of the deep learning algorithm.

Table 1. Set of 92d quantities (with d = 1, 2, 3) used to extract statistical information from each dimension of every trajectory as part of the N inputs for the deep learning algorithm. The approximate entropy (ApEni ) is introduced as a way to quantify the amount of regularity and unpredictability of fluctuations over the considered time series. $m=2,3,\dots ,\frac{{T}_{\mathrm{min}}}{{\Delta}t}-1$ and u = 1, 2. Note that we do not define ApEni for Δi (m) when $m=\frac{{T}_{\mathrm{min}}}{{\Delta}t}-2=8$ and $m=\frac{{T}_{\mathrm{min}}}{{\Delta}t}-1=9$, as the numerical calculation of the approximate entropy requires at least three data points, which is not always the case for short trajectories.

| | ${\hat{\mathbf{w}}}^{i}$ | ${\hat{\mathbf{W}}}^{i}$ | Δi (m) | ri | fi | IPSDi | ΔMSDi | WLi (u) | IACFi | fAsymi | tAsymi | AbChi |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| μi | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | | | | |
| ${\mu }_{1/2}^{i}$ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | | | |
| σi | | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | | | | |
| ${\beta }_{1}^{i}$ | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | | | | |
| ${\beta }_{2}^{i}$ | ✓ | ✓ | ✓ | | ✓ | ✓ | ✓ | ✓ | | | | |
| ApEni | ✓ | ✓ | (✓) | | ✓ | ✓ | ✓ | ✓ | | | | |
| Other | | | | | | | | | ✓ | ✓ | ✓ | ✓ |
| #/Dimension | 5 | 6 | 46 | 1 | 6 | 6 | 6 | 12 | 1 | 1 | 1 | 1 |

To extract the N inputs for every trajectory in 1D, 2D and 3D, we analysed the vector of the positions along each dimension ${\mathbf{x}}^{i}\equiv ({x}_{1}^{i},{x}_{2}^{i},\dots \enspace {x}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}}^{i})$ (with i = 1, ... d) independently, after first subtracting its average and rescaling its displacements ${w}_{j}^{i}={x}_{j+1}^{i}-{x}_{j}^{i}$ (with $j=1,\dots \enspace \frac{{T}_{\mathrm{max}}}{{\Delta}t}-1$) to their standard deviation to obtain the normalised displacements ${\hat{w}}_{j}^{i}$ [35]. Practically, for each dimension, this step produces new time sequences ${\hat{\mathbf{x}}}^{i}\equiv ({\hat{x}}_{1}^{i},{\hat{x}}_{2}^{i},\dots \enspace {\hat{x}}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}}^{i})$ of positions reconstructed from the cumulative sum of the normalised displacements of the original trajectories. Therefore these new time sequences have zero mean ($\langle {\hat{\mathbf{x}}}^{i}\rangle =\mathbf{0}$) and their displacements have a unitary standard deviation. Importantly, this step allows for a better comparison between trajectories of different dimensions, durations and noise levels while keeping the performance of the deep learning algorithm optimal [35].
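A minimal MATLABTM sketch of this normalisation along one dimension follows (the stand-in trajectory and variable names are assumptions of this sketch):

```matlab
% Minimal sketch of the preprocessing step along one dimension.
x = cumsum(randn(50, 1));       % stand-in 1D trajectory (column of positions)
w = diff(x);                    % displacements w_j = x_{j+1} - x_j
w_hat = w/std(w);               % normalised displacements (unit standard deviation)
x_hat = cumsum([0; w_hat]);     % positions rebuilt from the cumulative sum
x_hat = x_hat - mean(x_hat);    % subtract the average to obtain zero mean
```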

The first input is ln Tmax with Tmax being the duration of each trajectory. Independently of the dimensionality d of the trajectory, this input is only defined once, and is useful to enable the deep learning algorithm to properly weight different trajectories based on their duration.

The next subset of the N inputs is then extracted from sixteen mathematical quantities calculated from the new time sequences ${\hat{\mathbf{x}}}^{i}$ to obtain as much statistical information as possible from each individual trajectory. For each dimension i of a trajectory, the quantities in question are (table 1):

  • The vector of the normalised displacements ${\hat{\mathbf{w}}}^{i}\equiv \left({\hat{w}}_{1}^{i},{\hat{w}}_{2}^{i},\dots \enspace {\hat{w}}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}-1}^{i}\right)$, where ${\hat{w}}_{j}^{i}={\hat{x}}_{j+1}^{i}-{\hat{x}}_{j}^{i}$ with $j=1,\dots \enspace \frac{{T}_{\mathrm{max}}}{{\Delta}t}-1$;
  • The vector ${\hat{\mathbf{W}}}^{i}\equiv \left(\vert {\hat{w}}_{1}^{i}\vert ,\vert {\hat{w}}_{2}^{i}\vert ,\dots \enspace \vert {\hat{w}}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}-1}^{i}\vert \right)$ of the absolute values of the normalized displacements ${\hat{\mathbf{w}}}^{i}$;
  • The eight vectors ${\mathbf{\Delta }}^{i}(m)\equiv \left({{\Delta}}_{1}^{i}(m),\dots \enspace {{\Delta}}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}-m}^{i}(m)\right)$ of the absolute values ${{\Delta}}_{j}^{i}(m)=\vert {\hat{x}}_{j+m}^{i}-{\hat{x}}_{j}^{i}\vert $ of the displacements ${\hat{x}}_{j+m}^{i}-{\hat{x}}_{j}^{i}$ (with $j=1,\dots \enspace \frac{{T}_{\mathrm{max}}}{{\Delta}t}-m$) taken at time lags τ = mΔt with $m=2,3,\dots \frac{{T}_{\mathrm{min}}}{{\Delta}t}-1$;
  • The vector ${\mathbf{r}}^{i}\equiv \left({r}_{1}^{i},{r}_{2}^{i},\dots \enspace {r}_{\frac{{T}_{\mathrm{max}}}{{\Delta}t}-2}^{i}\right)$ of the ratios between consecutive normalised displacements ${r}_{j}^{i}=\frac{{\hat{w}}_{j+1}^{i}}{{\hat{w}}_{j}^{i}}$ with $j=1,\dots \enspace \frac{{T}_{\mathrm{max}}}{{\Delta}t}-2$;
  • The vector ${\mathbf{f}}^{\enspace i}\equiv \left(\vert {\hat{X}}_{1}^{i}\vert ,\vert {\hat{X}}_{2}^{i}\vert ,\dots \enspace \vert {\hat{X}}_{\lceil \frac{{T}_{\mathrm{max}}}{2{\Delta}t}\rceil }^{i}\vert \right)$ of the absolute values of the discrete Fourier transform of ${\hat{\mathbf{w}}}^{i}$ centered at the zero-frequency component and normalised to the trajectory length, ${\hat{\mathbf{X}}}^{i}=\frac{{\Delta}t}{{T}_{\mathrm{max}}}\mathcal{F}\left\{{\hat{\mathbf{w}}}^{i}\right\}$;
  • The vector $\mathbf{I}\mathbf{P}\mathbf{S}{\mathbf{D}}^{i}\equiv \left(\vert \mathrm{I}\mathrm{P}\mathrm{S}{\mathrm{D}}_{1}^{i}\vert ,\vert \mathrm{I}\mathrm{P}\mathrm{S}{\mathrm{D}}_{2}^{i}\vert ,\dots \enspace \vert \mathrm{I}\mathrm{P}\mathrm{S}{\mathrm{D}}_{\lfloor \frac{{T}_{\mathrm{max}}}{\delta {\Delta}t}\rfloor }^{i}\vert \right)$ with ${\mathrm{I}\mathrm{P}\mathrm{S}\mathrm{D}}_{j}^{i}={\Vert}{\hat{\mathbf{X}}}^{i}(j\delta ){{\Vert}}^{2}=\frac{1}{{\delta }^{2}}{\Vert}\mathcal{F}\left\{{\hat{\mathbf{w}}}^{i}(j\delta )\right\}{{\Vert}}^{2}$ $\left(j=1,...\lfloor \frac{{T}_{\mathrm{max}}}{\delta {\Delta}t}\rfloor \right)$, where ${\hat{\mathbf{w}}}^{i}(j\delta )\equiv ({\hat{w}}_{(j-1)\delta +1}^{i},\dots \enspace {\hat{w}}_{j\delta }^{i})$ are all segments of ${\hat{\mathbf{w}}}^{i}$ selected with a non-overlapping moving window of width δΔt (with δ = 3). Practically, the values ${\mathrm{I}\mathrm{P}\mathrm{S}\mathrm{D}}_{j}^{i}$ correspond to the integrals of the normalised power spectral densities of ${\hat{\mathbf{w}}}^{i}(j\delta )$. Here, the power spectral densities are estimated as a periodogram and normalised to the length δΔt of the segments ${\hat{\mathbf{w}}}^{i}(j\delta )$ (a code sketch of this quantity follows the list);
  • The vector $\mathbf{\Delta }\mathbf{M}\mathbf{S}{\mathbf{D}}^{i}\equiv \left({\Delta}\mathrm{M}\mathrm{S}{\mathrm{D}}_{1}^{i},{\Delta}\mathrm{M}\mathrm{S}{\mathrm{D}}_{2}^{i},\dots \enspace {\Delta}\mathrm{M}\mathrm{S}{\mathrm{D}}_{\lfloor \frac{{T}_{\mathrm{max}}}{2{\Delta}t}\rfloor -2}^{i}\right)$ of the absolute values of the variations ${\Delta}{\mathrm{M}\mathrm{S}\mathrm{D}}_{j}^{i}=\left\vert \frac{{\mathrm{M}\mathrm{S}\mathrm{D}}_{j+1}^{i}-{\mathrm{M}\mathrm{S}\mathrm{D}}_{j}^{i}}{{j}^{2}{\Delta}{t}^{2}}\right\vert $ (with $j=1,\dots \enspace \lfloor \frac{{T}_{\mathrm{max}}}{2{\Delta}t}\rfloor -2$) of the time-averaged mean squared displacement (MSD) normalised to the corresponding time lag τ = jΔt of ${\mathrm{M}\mathrm{S}\mathrm{D}}_{j}^{i}$;
  • Two vectors WLi (u) (with u = 1, 2) of the absolute values of the numerical approximation to the first two translations τ = (u − 1)Δt of the continuous wavelet transform of ${\hat{\mathbf{w}}}^{i}$ (calculated with the default settings of the MATLABTM function cwt). The continuous wavelet transform is a continuous mathematical function used for the analysis of localised variations in noisy time series; it is calculated by performing the convolution of a signal (here, ${\hat{\mathbf{w}}}^{i}$) with a orthonormal family of functions ${\phi }_{u}^{s}(t)=\phi (\frac{t-u}{s})$ obtained by continuously scaling (by s) and translating (by τ) the mother wavelet ϕ(t) (here, the analytic Morse wavelet) [49].
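As a concrete example of these quantities, the sketch below computes the IPSDi inputs directly from the definition above, reusing w_hat from the preprocessing sketch; computing the squared norm of the normalised DFT of each segment, rather than calling a periodogram routine, should be equivalent up to the stated normalisation.

```matlab
% Minimal sketch of the IPSD inputs (delta = 3 as in the text).
delta = 3;
nseg = floor(numel(w_hat)/delta);            % number of non-overlapping windows
IPSD = zeros(1, nseg);
for j = 1:nseg
    seg = w_hat((j-1)*delta + (1:delta));    % j-th segment of w_hat
    IPSD(j) = sum(abs(fft(seg)/delta).^2);   % (1/delta^2)*||F{w_hat(j delta)}||^2
end
```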

For each of these quantities (except ri ) and along each trajectory's dimension i, we calculated the mean (μi ), the median (${\mu }_{1/2}^{i}$), the standard deviation (σi ), the skewness (${\beta }_{1}^{i}$) and the kurtosis (${\beta }_{2}^{i}$) (table 1). Note that the standard deviation of ${\hat{\mathbf{w}}}^{i}$ is always one by definition and hence is not used as an input. Additionally, for each quantity (except ri ), we also calculated the approximate entropy (ApEni ), which is a statistical measure used to quantify the amount of regularity and unpredictability of fluctuations in time signals (here calculated numerically with the MATLABTM function approximateEntropy) [50]: the closer to zero the value of ApEni , the more regular and predictable the time series is. For ri , we only calculated the median ${\mu }_{1/2}^{i}$ instead. For every dimension i of the trajectory, based on the aforementioned quantities, we obtain 88 inputs.
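For instance, the six statistics for one of the sixteen quantities (here ${\hat{\mathbf{W}}}^{i}$, again reusing w_hat from the sketches above) can be gathered as follows; skewness, kurtosis and approximateEntropy are the toolbox functions cited in the text, and the grouping into a single vector is our illustration.

```matlab
% Minimal sketch: the six statistics extracted from the vector of absolute
% normalised displacements.
W_hat = abs(w_hat);
inputs = [mean(W_hat), median(W_hat), std(W_hat), ...
          skewness(W_hat), kurtosis(W_hat), approximateEntropy(W_hat)];
```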

To further improve the performance of the deep learning algorithm, the following extra four inputs were also added for each dimension to extract additional information about regular and irregular time patterns in the trajectories (correlations, power variations and abrupt changes):

  • The integral IACFi of the autocorrelation function ACFi (τ) of ${\hat{\mathbf{w}}}^{i}$ normalised to the trajectory length over a time window of width δΔt (with δ = 3) to the right of the autocorrelation peak at τ = 0, i.e. calculated as $\frac{{\Delta}t}{{T}_{\mathrm{max}}}{\sum }_{k=0}^{2}\mathrm{A}\mathrm{C}\mathrm{F}(k{\Delta}t)$ (see the sketch after this list);
  • The level of frequency asymmetry fAsymi in the normalized power spectral density of ${\hat{\mathbf{w}}}^{i}$ calculated as ${\sum }_{k}\vert {\hat{X}}_{k}^{i}{\vert }^{2}\enspace \mathrm{sgn}(k-\frac{{F}_{\mathrm{N}}}{2})$ with FN being the Nyquist frequency and $k=1,\dots \enspace \lceil \frac{{T}_{\mathrm{max}}}{2{\Delta}t}\rceil $;
  • The level of time asymmetry tAsymi in the time series of all IPSDi calculated as ${\sum }_{j}{\mathrm{I}\mathrm{P}\mathrm{S}\mathrm{D}}_{j}^{i}\enspace \mathrm{sgn}(j-\lceil \frac{{T}_{\mathrm{max}}}{2\delta {\Delta}t}\rceil )$ with $j=1,\dots \enspace \lfloor \frac{{T}_{\mathrm{max}}}{\delta {\Delta}t}\rfloor $;
  • The absence of abrupt changes AbChi in the variance of the elements of ${\hat{\mathbf{x}}}^{i}$, defined as ${\mathrm{A}\mathrm{b}\mathrm{C}\mathrm{h}}^{i}=1-\frac{{N}_{\text{AC}}{\Delta}t}{{T}_{\mathrm{max}}}$, with NAC being the number of points where an abrupt change is identified using the MATLABTM function ischange with a high threshold value (threshold = 20) to reduce these instances to points where there is a significant change in signal variance [51].
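A minimal sketch of the first of these inputs (IACFi) follows, assuming a unitary Δt, the normalised sample autocorrelation returned by xcorr, and the w_hat and x_hat variables from the preprocessing sketch above.

```matlab
% Minimal sketch of the IACF input: sum of the autocorrelation at lags
% 0, 1 and 2 (delta = 3), normalised to the trajectory length.
acf = xcorr(w_hat, 2, 'coeff');       % autocorrelation up to lag 2 (peak at lag 0)
IACF = sum(acf(3:5))/numel(x_hat);    % lags 0, 1, 2 to the right of the peak
```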

2.3. CONDOR architecture

In the model classification task (figure 2), the N inputs calculated in the previous step are first fed to a deep three-layer feed-forward neural network (architecture in figure 3) with two sigmoid hidden layers of twenty neurons each and one softmax output layer of five neurons to predict the model $\tilde {M}$ of anomalous diffusion among the five possible categories (attm, ctrw, fbm, lw and sbm). We settled on this specific architecture with two dense layers because it is deep enough to efficiently classify noisy input vectors while avoiding the slower learning rate of an excessively complex architecture. The output of the previous neural network (step 1 in figure 2) is then refined by two additional networks (steps 2 and 3 in figure 2) with similar architectures (but with three output neurons and two output neurons, respectively) to improve the classification among trajectories respectively classified as attm, fbm and sbm and as attm and ctrw, as these models can be more easily confused. Alternatively, step 1 can be replaced by a binary classifier to extract trajectories classified as lw, followed by a second classifier with four outputs to classify the remaining trajectories among attm, ctrw, fbm and sbm. Steps 2 and 3 are then applied directly to the outputs of this second classifier. While this alternative approach offers a slightly improved performance after step 1 in lower dimensions (≈1%), the overall performance in all dimensions is optimal after steps 2 and 3 and is independent of the step 1 implementation.
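The basic architecture of figure 3 can be sketched with the patternnet function mentioned in section 2.7; the explicit transfer-function assignments below are our reading of the description above, not a verbatim excerpt of the released code.

```matlab
% Minimal sketch of the basic network of figure 3 (step-1 classifier, Q = 5).
net = patternnet([20 20], 'trainscg');    % two hidden layers of K = 20 neurons
net.layers{1}.transferFcn = 'logsig';     % sigmoidal hidden layers
net.layers{2}.transferFcn = 'logsig';
net.layers{3}.transferFcn = 'softmax';    % softmax output layer (Q neurons)
```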

Once the model is identified, the next step is to infer the anomalous diffusion exponent α. The estimated value $\tilde {\alpha }$ is obtained as the arithmetic average of the outputs of two different approaches based on deep learning (figure 2). In the first approach (${\tilde {\alpha }}_{1}$ in figure 2), the output $\tilde {M}$ of the classification is used to cluster the trajectories into five categories based on the identified model. Each category is analysed independently by a different set of neural networks (architectures as in figure 3). The neural networks for attm and ctrw have five output categories predicting ${\tilde {\alpha }}_{1}$ in the range 0.05 to 1 [41, 42]. The neural network for lw has five output categories predicting ${\tilde {\alpha }}_{1}$ in the range 1 to 2 [41, 42]. Finally, the prediction for fbm and sbm is based on two levels of neural networks (1 × 2), with a single network in the first level having two equally spaced output categories predicting ${\tilde {\alpha }}_{1}$ in the range 0.05 to 2; each category is then refined by an additional neural network with five equally spaced output categories in the corresponding subrange identified by the network in the first level. In the second approach (${\tilde {\alpha }}_{2}$ in figure 2), the output $\tilde {M}$ of the model classification is used as an additional input together with the list of inputs identified in section 2.2. This new set of inputs feeds two levels of neural networks (1 × 4), each with the same overall architecture as in figure 3. The only neural network in the first level has four equally spaced output categories predicting ${\tilde {\alpha }}_{2}$ in the range 0.05 to 2; each of these categories is then refined by an additional neural network with five equally spaced output categories in the corresponding α subrange identified by the network in the first level. As both approaches (${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$) rely on the results of the classification, misclassification of trajectories can be detrimental for the inference task. If misclassification is of concern (e.g. in the presence of strong localization noise), a more robust approach for inference is to use a variation of the second approach alone (${\tilde {\alpha }}_{2}$) where the model classification is no longer used as an additional input.
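A minimal sketch of the 1 × 4 two-level scheme for ${\tilde {\alpha }}_{2}$ on a single trajectory follows; the networks netL1 and netL2{1..4} are hypothetical, I is the trajectory's input column with the predicted model Mtilde appended, and mapping each output category to the centre of its α subrange is an assumption of this sketch.

```matlab
% Minimal sketch of the two-level refinement behind alpha2.
in2 = [I; Mtilde];                     % predicted model appended as extra input
c1 = vec2ind(netL1(in2));              % coarse category (1..4) over [0.05, 2]
c2 = vec2ind(netL2{c1}(in2));          % refined category (1..5) within c1
w1 = (2 - 0.05)/4;                     % width of a coarse subrange
w2 = w1/5;                             % width of a refined subrange
alpha2 = 0.05 + (c1 - 1)*w1 + (c2 - 0.5)*w2;   % centre of the refined subrange
% final estimate: alpha_hat = (alpha1 + alpha2)/2, with alpha1 obtained from
% the model-selected networks of the first approach
```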

2.4. Training of CONDOR

After defining their architecture, we trained each network of CONDOR independently on a set of trajectories for which either the ground-truth model (for classification) or the ground-truth value of α (for inference) was known. The networks that we used during the AnDi challenge (and whose results are discussed in this manuscript) were typically trained on well-balanced datasets containing trajectories of different lengths corrupted by different noise levels (tables S1–S12). Due to how the AnDi challenge was set up, the ground truth information about the models was not available for the datasets used for the inference task in the challenge. We thus used the results from our own classification to train the networks for the inference task, which indeed rely on knowing the model (or its prediction). As our classification method is highly performant, errors due to misclassification are small. Nonetheless, if the ground truth about the model is also available, the networks for inference can also be trained directly on this information, thus minimizing the potential reduction in performance due to misclassification (see section 3.3). Each training was performed using the scaled conjugate gradient backpropagation (scg) algorithm (for speed) on mixed datasets with up to three hundred thousand trajectories from different models and values of α, with different durations and levels of corrupting noise [42]. The key training information is concisely summarised in table 2.
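For one network, the training configuration of table 2 translates into settings like the following sketch (assumed variables: net as created in the sketch of section 2.3, I the N-by-Nt input matrix, T the Q-by-Nt one-hot matrix of ground-truth targets).

```matlab
% Minimal sketch of the training configuration of table 2.
net.divideParam.trainRatio = 0.70;    % 70% training
net.divideParam.valRatio = 0.15;      % 15% validation
net.divideParam.testRatio = 0.15;     % 15% test
net.trainParam.max_fail = 100;        % allowed validation fails
[net, tr] = train(net, I, T);         % scaled conjugate gradient backpropagation
probs = net(I);                       % class probabilities for each trajectory
```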

Table 2. Summary of CONDOR architecture and training key information for both classification task and inference task.

| Task | Classification | Inference |
|---|---|---|
| Architecture | Feed-forward neural network, two hidden layers (20, 20), output layer (2, 3 or 5) | Feed-forward neural network, two hidden layers (20, 20), output layer (2, 4 or 5) |
| # Networks | 3 | 14 |
| Output | Probability of the input belonging to each model | Probability of the α exponent belonging to a specific range |
| Training algorithm | Scaled conjugate gradient backpropagation | Scaled conjugate gradient backpropagation |
| Training dataset size | 300k trajectories (see tables S1 and S2) | 300k trajectories (see tables S3–S12) |
| Training dataset usage | 70% training, 15% validation, 15% test | 70% training, 15% validation, 15% test |
| Performance function | Cross-entropy | Cross-entropy |
| Validation fails | 100 | 100 |

For training the networks used for classification, we typically used well-balanced sets of Nt trajectories extracted from a dataset of 300k trajectories per dimension (≈60k trajectories per model per dimension). While the training of the network for step 1 (figures 2 and 4) requires a balanced combination of trajectories of all models (tables S1 and S2), the networks for steps 2 and 3 are trained on subsets of the full training dataset, which only contain trajectories of the relevant models (tables S1 and S2): hence, for step 2, we only used trajectories belonging to attm, fbm or sbm (i.e. up to ≈180k trajectories per dimension); and, for step 3, we only used trajectories belonging to attm and ctrw (i.e. up to ≈120k trajectories per dimension). Similarly, for training the networks used for inference, we typically used well-balanced sets of Nt trajectories extracted from a dataset of 300k trajectories per dimension (≈75k trajectories per dimension for each of the following α ranges: $\alpha \in \left(0,0.5\right]$, $\alpha \in \left(0.5,1\right]$, $\alpha \in \left(1,1.5\right]$ and $\alpha \in \left(1.5,2\right]$). For the first approach (${\tilde {\alpha }}_{1}$ in figures 2 and 4), the networks for each model are trained using a subset of the full dataset containing the trajectories that have been classified as the corresponding model by CONDOR. Each network is trained using only those trajectories classified as the corresponding model and with all possible α values returned by the specific network (i.e. allowed by the corresponding model, section 2.3) (tables S3–S10). In the case of attm, ctrw and lw, misclassified trajectories can have a ground truth value of α that falls outside the range of values allowed by the specific model. To account for this during the training of the networks associated to these three models, we assign these instances to the closest output α category (i.e. the category that minimises the error in estimating α). For the second approach (${\tilde {\alpha }}_{2}$ in figures 2 and 4), each network is trained purely using trajectories selected based on their α value from the full training dataset, for which the ground truth value of α is known. In particular, each network is trained using those trajectories with an α value falling among all possible output α categories returned by the specific network (tables S11 and S12).


Figure 4. Evolution of CONDOR performance with the size of the training dataset. (a)–(c) F1 score as a function of the overall number of trajectories (Nt) (i.e. including fractions for training, validation and testing) employed to train the three consecutive networks of the classification task (figure 2). The Nt trajectories are a subset of the total 1D, 2D or 3D training datasets for classification of 300k trajectories each and the F1 score is calculated only on the test fraction (15% of trajectories). Nt is normalized to the size of a single AnDi challenge dataset (Nc = 10k trajectories). The F1 score is given at the output of (a) the first network, (b) the second network and (c) the third network used to refine the model predictions in the classification task (figure 2). At each of these steps, only the best performing network (black squares) is carried forward to the next step. (d) and (e) MAE as a function of overall number of trajectories Nt/Nc (i.e. including fractions for training, validation and testing) for (d) ${\tilde {\alpha }}_{1}$ (dashed lines) and ${\tilde {\alpha }}_{2}$ (dotted lines) and (e) for $\tilde {\alpha }$ (solid lines) as defined in figure 2. The Nt trajectories are a subset of the total 1D, 2D or 3D training datasets for inference of 300k trajectories each and the MAE is calculated only on the test fraction (15% of trajectories). For each Nt/Nc, the arithmetic average $\tilde {\alpha }$ is calculated using the best performing networks used to infer ${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$ with the same number of training trajectories. The networks with the lowest MAE for ${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$ are also shown in (d) (black circles). In all panels, each training point was obtained by incrementally increasing the previous set of Nt trajectories used for training with an additional well-balanced set of 10k trajectories from the full dataset. The shaded areas correspond to one standard deviation around the mean values (calculated from the performance of 5 distinct training attempts).


The sets of Nt trajectories used for training each data point in figure 4 were randomly split among training (70%), validation (15%) and test (15%) trajectories. Overall, training with scg is very efficient on a standard desktop computer (processor Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz), with the size of the training dataset being the main limiting factor. Training the three consecutive networks used for classification (figure 2) took at most 1.5 h, while training the fourteen networks for the inference task took at most 3 h. Training time can be reduced by up to 2 orders of magnitude by employing a GPU. After training, CONDOR is very computationally efficient on new data. For example, CONDOR can classify ≈80k trajectories/s on a standard desktop computer.

Figure 4 shows the performance of the classification task and of the inference task as a function of the size of the training dataset. We evaluated the performance of the classification with the F1 score:

${F}_{1}=\frac{\mathrm{T}\mathrm{P}}{\mathrm{T}\mathrm{P}+\frac{1}{2}\left(\mathrm{F}\mathrm{P}+\mathrm{F}\mathrm{N}\right)},$

where TP, FP and FN are the true positive, false positive and false negative rates, respectively. In particular, we computed the micro average of the F1 score with the Scikit-learn Python library [52]. F1 can vary between 0 (no trajectory classified correctly) and 1 (all trajectories classified correctly). For balanced datasets, such as those of the AnDi challenge (i.e. where each class is equally represented), the micro average of the F1 score coincides with the accuracy obtained as TP/Ntraj, where Ntraj is the number of trajectories in the dataset. To evaluate the performance of the inference of the α-exponent, we used the mean absolute error (MAE) instead:

$\mathrm{MAE}=\frac{1}{{N}_{\text{traj}}}{\sum }_{j=1}^{{N}_{\text{traj}}}\vert {\tilde {\alpha }}_{j}-{\alpha }_{j}\vert ,$

where ${\tilde {\alpha }}_{j}$ and αj are the predicted and the ground truth values of the anomalous diffusion exponent of the jth trajectory, respectively. MAE has a lower limit of 0, reached when the exponents of all the Ntraj trajectories are predicted exactly.

Figures 4(a)–(c) show the F1 score measured on the test fraction (15% of trajectories) of the training dataset at the output of the three consecutive networks used for classification (figure 2) for the 1D, 2D and 3D cases. For each step, we increased the number of trajectories used for training progressively until the F1 score reached near saturation. Past this point, any gain in F1 score is relatively small and by far outweighed by the increased computational cost associated to longer training times. The performance improves with the dimension d of the trajectories and, interestingly, at each step, saturation of the F1 score for higher d appears to take place with smaller training datasets, as each trajectory contains intrinsically d times more information. Only the best performing network at each step is selected (black squares in figures 4(a)–(c)) and carried forward to the next step (the details on how these networks were trained are provided in tables 2, S1–S12). Independently of the dimensionality of the problem, the first network produces the highest increase of the F1 score with the size of the training dataset (figure 4(a)): for example, going from 10k to 100k trajectories, the F1 score increases approximately by 4% in 1D, 5% in 2D and 6% in 3D. The combined contribution of the following two networks to the overall F1 score is less pronounced, especially for higher dimensions (figures 4(b) and (c)); nonetheless, they play a key role in refining the classification among attm, fbm and sbm (figure 4(b)) or among attm and ctrw (figure 4(c)), thus boosting the final F1 score approximately by 4% in 1D, 3% in 2D and 1% in 3D.

Similarly, figures 4(d) and (e) show the MAE obtained on the test fraction (15% of trajectories) of the training dataset for the inference of the α-exponent according to ${\tilde {\alpha }}_{1}$, ${\tilde {\alpha }}_{2}$ and their arithmetic average $\tilde {\alpha }$ (as defined in figure 2) for the 1D, 2D and 3D cases. As for F1, we report the MAE of each method as a function of the size of the training dataset progressively until it reaches near saturation. In general, the MAE decreases with the number of trajectories used for training and with their dimension d. As noted earlier, independently of the dimensionality of the problem, the arithmetic average $\tilde {\alpha }$ provides a better estimate of the α exponent with respect to both ${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$, leading to an overall MAE at saturation that is better than those obtained with the individual methods by at least 0.01 in all the dimensions.

As can be seen in tables S2, S7–S10 and S12 and in figure 4, the optimal size of the specific training subset varies among the different networks of CONDOR. Increasing this size past the optimal point can lead to overfitting the training dataset and has a detrimental effect on CONDOR's performance. For instance, this can be clearly appreciated in the training of the second classification step (figure 4(b), step 2).

2.5. CONDOR performance vs number of inputs

In general, the performance of each network of CONDOR increases with the number of inputs used: when very few inputs are used (approximately up to N/3), adding an extra input causes a big relative change in performance; when several inputs are already in use (approximately above 2N/3), the relative change in performance obtained by adding any extra input is smaller, as CONDOR reaches a near-saturation performance value. On average, we observed that this optimal performance is always obtained when the full set of inputs is used. This near-saturation value is reached faster in higher dimensions, as CONDOR has more statistical information available to learn from (i.e. removing an input is more detrimental in lower dimensions). Interestingly, different subsets of inputs can ultimately provide a similar (although not optimal) performance. This means that different inputs can convey similar statistical information that CONDOR learns to extract and distinguish. The presence of some repeated information is not detrimental to the optimal performance achieved by CONDOR and, indeed, helps CONDOR converge to the same performance each time the training of its networks is repeated; when a smaller subset of inputs is used instead, different trainings can lead to a higher variability in the final performance, and the training of CONDOR should be repeated for best results.

2.6. Segmentation with CONDOR

The output of the classification and the inference can also be used to identify possible change points in a trajectory. In practice, a trajectory of length Tmax is divided into Tmax − B + 1 smaller segments using an overlapping moving window of width B (with B ⩽ Tmax). The width of the moving window is a trade-off between the possibility of identifying change points close to either end of a trajectory (which increases for smaller B) and the accuracy in predicting the correct time of the change point (which increases with B due to smaller errors in predicting the model and the α exponent associated to each segment). Each of these segments is then fed to CONDOR for classification and inference, thus generating two time series of Tmax − B + 1 values of the predicted model and α exponent. The occurrence of abrupt changes (up to two) in the root-mean-square (rms) level of these time series is used to identify the point of change of the model and/or of the α exponent [51]. To identify abrupt changes in the algorithm for the segmentation of trajectories, we used the MATLABTM function findchangepts to simultaneously detect significant changes in the rms level of the two-dimensional vector containing the time series of predicted models and α exponents. We set the function findchangepts to detect at most two significant changes: if no abrupt change is identified, we set the change point at the time instant corresponding to the center of the first moving window; if just one abrupt change is identified, we set the change point for the corresponding trajectory at the time instant where it was identified; if two abrupt changes are identified, we set the change point half-way between the two time instants where they were identified. This approach unavoidably introduces errors when the actual change point in the trajectory appears for t ⩽ B/2 or t ⩾ Tmax − B/2. To minimise the impact associated to this uncertainty in the attribution of the change point near the beginning and the end of the trajectory, each trajectory where no change point is identified with a given width B can be reanalysed with windows of increasingly smaller width, at the expense of a lower performance in the determination of the model and the α exponent (due to the use of shorter segments for classification and inference).
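A minimal sketch of this change-point logic follows; the per-window prediction vectors (models and alphas, each of length Tmax − B + 1) and the mapping of a window index back to trajectory time via an offset of B/2 are assumptions of this illustration.

```matlab
% Minimal sketch of the segmentation step on the per-window predictions.
S = [models; alphas];                               % two-dimensional signal
k = findchangepts(S, 'Statistic', 'rms', 'MaxNumChanges', 2);
if isempty(k)
    tc = B/2;              % centre of the first moving window
elseif numel(k) == 1
    tc = k + B/2;          % time instant where the change was identified
else
    tc = mean(k) + B/2;    % half-way between the two detected changes
end
```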

To evaluate CONDOR's performance in identifying the presence of a change point, we calculated the root mean square error (RMSE) of the change point localization as in the AnDi challenge [41]:

$\mathrm{RMSE}=\sqrt{\frac{1}{{N}_{\text{traj}}}{\sum }_{j=1}^{{N}_{\text{traj}}}{\left({\tilde {t}}_{j}-{t}_{j}\right)}^{2}},$

where ${\tilde {t}}_{j}$ is the prediction and tj the ground truth value of the change point, respectively, of a single trajectory in a dataset of Ntraj trajectories. Similar to the MAE, the RMSE has a lower limit of 0, reached when all the change points of the Ntraj trajectories are correctly identified.

2.7. CONDOR codes

We have implemented all neural networks using the patternnet function in the MATLABTM-based Deep Learning ToolboxTM. We provide the MATLABTM implementations of the key functionalities of CONDOR [48]. Nonetheless, our approach is independent of the deep-learning framework used for its implementation. Moreover, it is also possible to exchange neural network models between the Deep Learning ToolboxTM and other platforms, such as those based on TensorFlowTM.

3. Results

3.1. CONDOR overall performance on the datasets of the AnDi challenge

The best performing networks identified during the training step (black symbols in figure 4) were carried forward and used in the prediction step, where we classified the anomalous diffusion model and inferred the α exponent of the same datasets used in the AnDi challenge (with 10k trajectories each) for testing purposes [41]. Figure 5(a) shows the overall F1 score obtained on these test datasets: F1 = 0.844, F1 = 0.879 and F1 = 0.913 for the 1D, 2D and 3D cases, respectively. Figure 5(b) shows the overall MAE instead: MAE = 0.161, MAE = 0.141 and MAE = 0.121 for the 1D, 2D and 3D cases, respectively. These values enabled CONDOR to be one of the top-performing methods in the AnDi challenge (https://andi-challenge.org) [40].


Figure 5. CONDOR performance on test datasets. (a) F1 and (b) MAE obtained using CONDOR on the 1D, 2D and 3D datasets used in the classification and inference tasks of the AnDi challenge (10k trajectories each). In (a), the stacked bar charts show the contribution of the three consecutive networks used for classification (figure 2) to the total F1 score. In (b), the grouped bar charts show the MAE associated to the inference of the α-exponent according to ${\tilde {\alpha }}_{1}$, ${\tilde {\alpha }}_{2}$ and their arithmetic average $\tilde {\alpha }$, which provides the best estimate of the α exponent (figure 2).


3.2. Analysis of CONDOR performance in the model classification task

Figure 6 provides further detail on CONDOR performance in the classification task by value of α (figures 6(a)–(c)), by model (figure 6(d)), by strength of the localization noise (figure 6(e)) and by trajectory length (figures 6(f)–(h)). These data largely support CONDOR robustness of performance across the parameter space.


Figure 6. Detailed analysis of CONDOR performance for the classification task. CONDOR classification performance (F1 score) on the 1D, 2D and 3D datasets of the AnDi challenge used for the classification task (as in figure 5(a)) (a)–(c) by value of α for the (a) 1D, (b) 2D and (c) 3D case, (d) by model, (e) by signal-to-noise ratio (SNR), and (f)–(h) by trajectory length for the (f) 1D, (g) 2D and (h) 3D case, respectively. In (a)–(c) each bar corresponds to an increment in α of 0.05. In (a)–(c) and (e) the dotted lines represent the overall F1 as in figure 5(a). In (f)–(h), the trends are fitted to a power law ($y=c-a{x}^{-b}$, with a, b, c > 0) for ease of visualization. The shaded areas highlight the values of the F1 score above the average value (figure 5(a)).


CONDOR is able to predict the correct model relatively consistently across the values of α, with the exception of α ≃ 0.05 in 1D and 2D and of α ≃ 1 for all dimensions (figures 6(a)–(c)). This is not surprising. In fact, for α ≃ 0.05, all trajectories are close to stationary and show a very low degree of motion, which can be dominated by the localization noise. These trajectories are therefore harder to classify, particularly in lower dimensions where a smaller amount of information is intrinsically available. Similarly, for α ≃ 1, all models converge to Brownian motion, making it much more challenging to distinguish among them. Interestingly, in 1D and 2D, CONDOR shows an above-average performance in classifying superdiffusive trajectories, thus compensating for what is lost in the classification of subdiffusive trajectories.

Consistently across the dimensions (1D, 2D and 3D), CONDOR recognises two models (ctrw and lw) extremely well, with F1 ⩾ 0.94 for ctrw and F1 ⩾ 0.98 for lw (figure 6(d)). This is understandable, as both models show typical features that set them apart from the remaining models and are relatively robust to the presence of localization noise (figure 1): i.e. long ballistic steps in lw trajectories and long waiting times in ctrw trajectories [1]. For the remaining three models (attm, fbm and sbm), CONDOR performs progressively better moving from 1D to 3D. As noted earlier (figure 1), the distinctive features among these models are more subtle and can be better evaluated for higher dimensions, as they are intrinsically richer in information content. While CONDOR performs equally well with fbm and sbm (F1 ≈ 0.81–0.82 in 1D, F1 ≈ 0.85–0.86 in 2D and F1 ≈ 0.87–0.89 in 3D), it struggles more with attm, as this model can be more easily confused with others (figure 1). Nonetheless, CONDOR's performance increases significantly with the dimension of the attm trajectories, from F1 ≈ 0.66 in 1D to F1 ≈ 0.88 in 3D.

The performance of CONDOR with different values of SNR gives the expected results (figure 6(e)). The F1 score increases for higher values of SNR, reaching F1 scores of 0.89, 0.92 and 0.95 for 1D, 2D and 3D, respectively (i.e. well above the overall F1 scores in figure 5(a)). However, even when the noise overcomes the signal (SNR ⩽ 1), the loss in classification performance is less than 10% of these high values for the levels of noise considered here.

Finally, figures 6(f)–(h) show CONDOR's classification performance as a function of the duration of the trajectory. Due to the richer information content, classification is increasingly better for longer trajectories and for higher dimensions. The trends in figures 6(f)–(h) are well fitted by power law functions (dashed lines), which show how, above a certain threshold in duration, CONDOR performs above the average F1 scores (delimited by the shaded areas), reaching F1 values of up to 0.96 in 1D, 0.97 in 2D and 0.99 in 3D. Moreover, this threshold decreases when increasing the dimension, from ≈340 time points in 1D to ≈280 in 3D. Note that, even though we did not perform a length-specific training for our submissions to the AnDi challenge, by directly training CONDOR's neural networks for classification only with short trajectories (⩽50 time steps), its performance in classifying them can be boosted. This possibility is particularly useful for the classification of experimental trajectories, which are often short or interrupted by missing data points [37, 53]. For example, we were able to improve the F1 score on short trajectories (⩽30 time steps) in figures 6(f)–(h) by ≈15% on average.

3.3. Analysis of CONDOR performance in the α-exponent inference task

Figure 7 provides further detail on CONDOR's performance in the α-exponent inference task by value of α (figures 7(a)–(c)), by model (figure 7(d)), by strength of the localization noise (figure 7(e)) and by trajectory length (figures 7(f)–(h)). These data largely confirm the observations on CONDOR's performance for the classification task, thus confirming its robustness across the parameter space.


Figure 7. Detailed analysis of CONDOR performance for the α-exponent inference task. CONDOR α-exponent inference performance (MAE) on the 1D, 2D and 3D datasets of the AnDi challenge used for the inference task (as in figure 5(b)) (a)–(c) by value of α for the (a) 1D, (b) 2D and (c) 3D case, (d) by model, (e) by signal-to-noise ratio (SNR), and (f)–(h) by trajectory length for the (f) 1D, (g) 2D and (h) 3D case, respectively. In (a)–(c) each bar corresponds to an increment in α of 0.05. In (a)–(c) and (e) the dotted lines represent the overall MAE as in figure 5(b). In (f)–(h), the trends are fitted to a power law ($y=a{x}^{-b}+c$, with a, b, c > 0) for ease of visualization. The shaded areas highlight the values of the MAE below the average value (figure 5(b)).


In figures 7(a)–(c), we can see how CONDOR performs in a relatively consistent way across the α values, with the exception of α = 0.05 in the 1D and 2D cases (MAE = 0.239 in 1D and MAE = 0.302 in 2D). This is primarily due to a relatively large proportion of 1D and 2D attm trajectories with α = 0.05 being assigned to other α values: for example, for these trajectories, ≈10% of the predicted values of α are greater than 1 (against only 0.03% of the 3D attm trajectories with α = 0.05). As in the classification case, since these subdiffusive trajectories are almost stationary, the inference is largely influenced by the localization noise.

As already noted for the model classification (figure 6(a)), CONDOR also infers the α-exponent very well and consistently across the dimensions for two models (ctrw and lw), with MAE ⩽ 0.128 for ctrw and MAE ⩽ 0.141 for lw (figure 7(d)). For the three remaining models (attm, fbm and sbm), CONDOR again performs progressively better moving from 1D to 3D. The best results across the dimensions are obtained for fbm (MAE ≈ 0.133 in 1D, 0.100 in 2D and 0.083 in 3D), followed by sbm (MAE ≈ 0.173 in 1D, 0.134 in 2D and 0.107 in 3D). Once again, as in the classification task, CONDOR struggles more with attm (MAE ≈ 0.274 in 1D, 0.269 in 2D and 0.201 in 3D).

Figure 7(e) confirms the observations made for the classification task regarding CONDOR's performance at different SNR values. For high SNRs, the MAE values in each dimension are well below the average MAE (figure 5(b)): MAE = 0.136 in 1D, 0.121 in 2D and 0.112 in 3D. As expected, the MAE increases at higher noise levels in all dimensions, but it remains below ≈0.2 for the noise levels considered here.

Figures 7(f)–(h) show CONDOR's performance in inferring the α exponent as a function of trajectory duration. As for the classification task, CONDOR's performance improves for higher dimensions and longer trajectories, with the MAE becoming increasingly lower in agreement with prior observations [35, 37]. The trends in figures 7(f)–(h) are well fitted by power-law functions (dashed lines), which show how, above a certain threshold in duration, CONDOR's performance is significantly below the average MAE (delimited by the shaded areas), reaching values as small as 0.083 in 1D, 0.085 in 2D and 0.065 in 3D. As for the classification task, this threshold decreases with increasing dimension, from ≈350 time points in 1D to ≈300 in 3D. Also for the inference task, one could in principle improve the prediction of α for short trajectories (⩽40 time steps) by training CONDOR's neural networks only on short trajectories (⩽50 time steps). Differently from the classification task, however, training CONDOR's networks for both classification and inference on short trajectories (which we did not perform as part of our submissions to the AnDi challenge) does not significantly reduce the error on the inference of the α exponent. The main limiting factor is the amount of information that the statistical observables used as inputs can extract from short trajectories: on such trajectories, CONDOR already performs at its best with networks trained on trajectories of mixed length.

Finally, it is important to note that CONDOR's inference performance relies on the performance of the classification task. Indeed, the inference of the α exponent uses the predicted model either as a selection element or as an additional input, so the misclassification of trajectories can be directly detrimental for the inference task too. To test the robustness to misclassification more quantitatively, we computed the MAE obtained with both inference methods (${\tilde {\alpha }}_{1}$ and ${\tilde {\alpha }}_{2}$ in figure 2) and their average ($\tilde {\alpha }$) in the best-case scenario (all trajectories classified correctly) and in the worst-case scenario (all trajectories assigned to any possible model but the correct one), using the ground-truth model information in the AnDi challenge dataset for the inference task. For reference, we also compared our results with the MAE obtained by assigning to each trajectory a random α value drawn from a uniform distribution (averaged over 5 different attempts).

Table 3 shows the correlation between CONDOR's inference performance and misclassification. A perfect classification (MAEBest) allows us to slightly reduce the overall MAE reported in the AnDi challenge (MAEChal, as in figure 5(b)). This further confirms that CONDOR is highly performant in classifying anomalous diffusion trajectories and, consequently, in inferring their anomalous exponent. When classification is good, averaging between the two methods is also likely to help reduce the impact of potential misclassifications. Nonetheless, when misclassification becomes more of an issue (e.g. with trajectories heavily corrupted by localization noise), the overall performance of the inference starts to deteriorate. Interestingly, this deterioration is primarily driven by the first method (${\tilde {\alpha }}_{1}$, where the classification outcome is a selection parameter for the choice of the network), while the second inference method (${\tilde {\alpha }}_{2}$, where the classification outcome is an additional input rather than a selection parameter) is much more robust to classification errors and becomes, on its own, a better choice than the average of the two methods. An even better choice when misclassification is of serious concern is a revised version of the second inference method where the classification is no longer used as an additional input (MAEMod): this revised method, fully independent of the classification step, achieves a performance comparable to that obtained during the AnDi challenge with the first approach (${\tilde {\alpha }}_{1}$) alone. Under all circumstances, CONDOR performs substantially better than assigning the α exponent at random (MAERand).

Table 3. Comparison of the mean absolute errors (MAE) obtained with different approaches to the inference task on the AnDi challenge dataset: MAEChal corresponds to the standard outcome of CONDOR (figure 5(b)); MAEBest and MAEWorst represent the outcomes of CONDOR when all trajectories are either classified correctly or completely misclassified, respectively; MAEMod is obtained from the second approach to the inference task (${\tilde {\alpha }}_{2}$ in figure 2) when the predicted model is not added to the inputs of the corresponding neural networks. Finally, MAERand is obtained by assigning to each trajectory a random value of α drawn from a uniform distribution; this value is an average over 5 different attempts (the standard deviation is also provided).

                                MAEChal  MAEBest  MAEWorst  MAEMod  MAERand
1D  ${\tilde {\alpha }}_{1}$    0.171    0.159    0.467     0.176   0.665 ± 0.003
    ${\tilde {\alpha }}_{2}$    0.170    0.159    0.316
    $\tilde {\alpha }$          0.161    0.149    0.391
2D  ${\tilde {\alpha }}_{1}$    0.153    0.142    0.496     0.151   0.662 ± 0.002
    ${\tilde {\alpha }}_{2}$    0.147    0.143    0.257
    $\tilde {\alpha }$          0.141    0.133    0.377
3D  ${\tilde {\alpha }}_{1}$    0.131    0.127    0.531     0.132   0.666 ± 0.003
    ${\tilde {\alpha }}_{2}$    0.127    0.125    0.245
    $\tilde {\alpha }$          0.121    0.117    0.388
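For context (not from the paper), the MAERand values in table 3 are consistent with the analytic expectation for two independent uniform random variables: if both the ground-truth exponents and the random guesses were uniform on [0.05, 2] (an assumption about the challenge α range), the expected MAE would be $(2 - 0.05)/3 = 0.65$, close to the reported ≈0.66. A quick simulation:

```python
# Quick consistency check of MAE_Rand against the analytic value (b - a)/3
# for two independent uniform variables on [a, b] (assumed alpha range).
import numpy as np

rng = np.random.default_rng(42)
a, b, n = 0.05, 2.0, 100_000
alpha_true = rng.uniform(a, b, size=n)   # assumed ground-truth distribution
alpha_rand = rng.uniform(a, b, size=n)   # random guesses, as for MAE_Rand

mae = np.mean(np.abs(alpha_rand - alpha_true))
print(f"simulated MAE: {mae:.3f}; analytic (b - a)/3 = {(b - a) / 3:.3f}")
```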

4. Expanding CONDOR to the segmentation of trajectories

We tested our algorithm for the segmentation of trajectories on the same datasets used in the AnDi challenge for the segmentation task [42]. For each dimension, the dataset is composed of 10k trajectories, each 200 time steps long [41]. Like the datasets for the classification and inference tasks, these datasets are corrupted by Gaussian noise corresponding to a finite localization precision. Moreover, in these datasets, at least one of the diffusion model and the α-exponent changes at a random time ti within the duration of each trajectory. To predict the change point, we first used a moving window of width B1 = 50, as explained in section 2.6. We then used two moving windows of width B2 = 40 and B3 = 20 sequentially to refine the prediction of the change point in trajectories where the previous moving window did not detect a change (a schematic sketch of this search is given at the end of this section). As our segmentation algorithm is based on short segments, we retrained CONDOR's networks for classification and inference using only short trajectories (⩽50 data points) for improved segmentation performance.

The results of our segmentation are reported in figure 8. As shown in figure 8(a), the error in determining the change point is strongly influenced by its location in time, reaching its smallest value when the change point lies near the centre of each trajectory: whilst the overall RMSE values for 1D, 2D and 3D trajectories are 53.45, 51.81 and 50.54 respectively, the RMSE reaches average values of 26.00 ± 0.02 in 1D, 24.20 ± 0.04 in 2D and 23.01 ± 0.37 in 3D when the change point is located between 80 and 120 time steps. Interestingly, when the change point location can be determined using the larger window of width B1 (i.e. for 25 < ti < 175), even the overall RMSE is significantly reduced (RMSE equal to 39.38 in 1D, 38.41 in 2D and 36.75 in 3D, figure 8(a)), due to CONDOR's better performance in predicting the model and the α exponent when more time points are available.

Once the position of the change point is known, CONDOR can be used to classify the model and infer the α exponent for each segment of the trajectory determined by its segmentation. We used either the networks identified in figure 4 for long segments (Tmax > 40) or those trained on short trajectories for short segments (Tmax ⩽ 40). The overall F1 score (i.e. calculated over all possible segment lengths) is 0.533 in 1D, 0.534 in 2D and 0.572 in 3D, while the overall MAE is 0.372 in 1D, 0.381 in 2D and 0.371 in 3D. When looking at the F1 score and the MAE as a function of segment length, figures 8(b) and (c) confirm how CONDOR's performance in classifying the model and inferring the α exponent increases with the length of the trajectory segment, which is here determined by the location of the change point in time. The extra uncertainty introduced by the localization of the change point, however, leads to slightly lower F1 and slightly higher MAE values (for both their overall and length-specific values) across the dimensions with respect to those in figures 6(f)–(h) and 7(f)–(h) for trajectories of the same length.

When compared against the other methods submitted to the AnDi challenge for the segmentation task (and using the same metric) [40], our method is always outperformed by the best method for segmentation in each dimension. Nonetheless, it reaches an overall satisfactory performance comparable with the other submissions beyond the top-ranking one: our method would have ranked 2nd in 3D (out of 4) and 4th in 1D and 2D (out of 5). Importantly, considering that the segmentation task was the least subscribed of the AnDi challenge, our method adds a further available tool to study the segmentation of anomalous diffusion trajectories, and could be useful for a preliminary assessment of the presence of segmented trajectories in real data based on CONDOR's outputs.
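The moving-window procedure described above can be sketched schematically as follows. This is not the authors' implementation: the actual decision rule is given in section 2.6 (not reproduced here), and predict_window is a hypothetical callable standing in for CONDOR's short-trajectory networks. In this simplified version, a change point is declared where the predictions of consecutive windows first disagree, trying progressively narrower windows until a change is found.

```python
# Schematic sketch of a multi-scale moving-window change-point search.
# predict_window(segment) is a hypothetical function returning a model
# label (or a discretized alpha) for a short trajectory segment.
def find_change_point(traj, predict_window, widths=(50, 40, 20)):
    """Estimate a change-point index in traj, or return None if none is found."""
    T = len(traj)
    for B in widths:                 # try progressively narrower windows
        labels = [predict_window(traj[t:t + B]) for t in range(T - B + 1)]
        for t in range(1, len(labels)):
            if labels[t] != labels[t - 1]:
                return t + B // 2    # centre of the first disagreeing window
    return None
```

In practice, predict_window would wrap the classification (and inference) networks retrained on short trajectories (⩽50 data points), as described above; once the change point is located, each of the two segments can be fed back to CONDOR for model classification and α inference.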


Figure 8. Detailed analysis of CONDOR performance for the segmentation task. CONDOR segmentation results on the 1D, 2D and 3D datasets of the AnDi challenge used for the segmentation task (10k trajectories each). (a) CONDOR segmentation performance evaluated by the root mean square error (RMSE) of the change-point localization as a function of the ground-truth value of the change point (ti). The dotted lines represent the overall RMSE for each dimension when using moving windows of width B1 = 50, B2 = 40 and B3 = 20 sequentially (see text). The dash-dotted lines represent the average RMSE for each dimension on all change points identified with the moving window of width B1 only (i.e. for 25 < ti < 175). (b) Model classification and (c) α-exponent inference with CONDOR on each of the two trajectory segments identified through segmentation, reported by (b) F1 score and (c) MAE for the first (solid lines) and the second (dashed lines) trajectory segment as a function of ti. The overall F1 score (i.e. calculated over all possible segment lengths) is 0.533 in 1D, 0.534 in 2D and 0.572 in 3D; the overall MAE is 0.372 in 1D, 0.381 in 2D and 0.371 in 3D.


5. Conclusion

In conclusion, we have introduced a new method (CONDOR) for the characterisation of single anomalous diffusion trajectories in response to the AnDi challenge [40, 41]. Our method combines tools from classical statistical analysis with deep learning techniques. CONDOR can identify the anomalous diffusion model and its exponent with high accuracy, in relatively short times and without the need for any a priori information. Moreover, CONDOR's performance is robust to the addition of localization noise. As a consequence, CONDOR was among the top-performing methods in the AnDi challenge (https://andi-challenge.org), performing progressively better for higher-dimensional trajectories. A deeper understanding of the relative relevance of the different inputs to the deep learning architecture can be gained by employing advanced interpretability algorithms [54]. This information could be important to optimise the performance of CONDOR as a function of the most relevant input parameters and, in the future, to develop an in-depth understanding of how the various statistical observables are connected to each model being classified and each α value being inferred. At the expense of an increased computational cost, performance can be further improved by expanding the set of inputs fed to the deep learning algorithm and by employing more advanced network architectures (e.g. convolutional or recurrent neural networks [55]). While most advanced machine learning techniques operate as black boxes, we envisage that CONDOR, being partially based on classical statistical analysis, can help characterise the underlying diffusion processes with greater physical insight in many real-life phenomena, from molecular encounters and cellular signalling, to search strategies and the spread of diseases, to trends in financial markets and climate records.

Acknowledgments

The authors would like to thank the organisers of the AnDi challenge for providing a stimulating scientific environment around anomalous diffusion and its characterization. The authors acknowledge sponsorship for this work by the US Office of Naval Research Global (Award No. N62909-18-1-2170).

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).
