A Spatio-Temporal Model and Inference Tools for Longitudinal Count Data on Multicolor Cell Growth

PuXue Qiao; Christina Mølck; Davide Ferrari; Frédéric Hollande

doi:10.1515/ijb-2018-0008

Published by De Gruyter July 7, 2018

From the journal The International Journal of Biostatistics

Abstract

Multicolor cell spatio-temporal image data have become important for investigating organ development and regeneration, malignant growth or immune responses by tracking different cell types both in vivo and in vitro. Statistical modeling of image data from common longitudinal cell experiments poses significant challenges. These arise from complex spatio-temporal interactions between different cell types and from difficulties in measuring single-cell trajectories. Current analysis methods focus mainly on univariate cases and often do not consider the spatio-temporal effects on cell growth that arise between different cell populations. In this paper, we propose a conditional spatial autoregressive model to describe multivariate cell count data on the lattice, and we develop the corresponding inference tools. The proposed methodology is computationally tractable and enables researchers to estimate a complete statistical model of multicolor cell growth. We apply our methodology to real experimental data to investigate how interactions between cancer cells and fibroblasts, which are normally present in the tumor microenvironment, affect their growth. We also compare the performance of our methodology with that of the multivariate conditional autoregressive (MCAR) model in both simulations and real data applications.

Keywords: spatio-temporal lattice model; count data; multicolor cell growth

1 Introduction

Longitudinal image data based on fluorescent proteins play a crucial role in both in vivo and in vitro analysis of biological processes such as gene expression and cell lineage fate. Assessing the growth patterns of different cell types within a heterogeneous population and monitoring their interactions enables biomedical researchers to determine the role of each cell type in important biological processes, such as organ development and regeneration, malignant growth or immune responses, under various experimental conditions. For example, tumor progression has been shown to be affected by bidirectional interactions among cancer cells, or between cancer cells and cells from the microenvironment, including tumor-infiltrating immune cells [1]. Studying these interactions in a laboratory setting is therefore highly relevant. However, it is complicated by the difficulty of dissecting the effect of each cell type as soon as the number of cell types exceeds two. In the present study we used longitudinal image data collected from multicolor live-cell imaging growth experiments. These were co-cultures of cancer cells and fibroblasts (a key cell type in the tumor microenvironment), as well as of behaviourally distinct (cloned) cancer cells. Using a high-content imaging system, we acquired characteristics of each individual cell at subsequent times, including fluorescent properties, spatial coordinates and morphological features. The motivation of this work was to design a model that determines the spatio-temporal growth interactions between these multiple cell populations.

In longitudinal growth experiments, two important goals are to determine the growth rates of different cell populations and to assess how interactions between cell types may affect their growth. Whilst a wide range of descriptive data analysis approaches have been used in applications, inference based on a comprehensive model of multicolor cell data remains an open research area. The main challenges are the presence of complicated spatio-temporal interactions amongst cells and the difficulty of tracking individual cells across time from image data. Typical longitudinal experiments consist of a relatively small number of measurements (e.g. 5 to 20 images taken every few hours), which is adequate for monitoring cell growth. Tracking individual cells would typically require more frequent measurements. This would make the experiments less practical because of the storage cost of very large image files and the cytotoxicity induced by the imaging process.

Tracking individual cell trajectories is difficult because of cell migration, overlapping cells, changes in cell morphology, image artifacts, cell death and division. By contrast, obtaining cell counts by cell type (represented by a certain color) is straightforward and can easily be automated. To describe the spatial distribution of different cell types, we propose to divide an image into a number of contiguous regions (tiles) that form a regular lattice, as shown in Figure 1(a). We then record the number of cells of each color in each tile at subsequent time points and model the spatial and temporal dependencies of cell growth based on these counts.

Figure 1:

(a) Microscope images for the cancer cell growth data obtained from a high-content imager (Operetta, Perkin Elmer) at the initial and final time points of the experiment. In each image, colors for non-fluorescent fibroblasts, as well as red and green fluorescent cancer cells are merged. (b) Illustration of the local structure for the model in (1). The two planes correspond to 3×3 tiles at times t and t+1. The average number of cells of color c in a given tile at time t+1 is assumed to depend on the number of cells of other colors in contiguous neighboring tiles at time t.

To model spatio-temporal data, one could choose to approximate the spatio-temporal process by a spatial process of time series, that is, to view the process as a multivariate spatial process where the multivariate dependencies are inherited from temporal dependencies. In other words, it can be seen as a temporal extension of spatial processes.

The most popular way of specifying a spatial process is through the conditionally autoregressive (CAR) model proposed by Besag [2]. Waller, Carlin, Xia, and Gelfand [3] extend the CAR model to a spatio-temporal setting by allowing spatial effects to vary across time. However, their model lacks a specification of temporal dependence, as also noted by Knorr-Held [4]. More recently, Quick, Waller, and Casper [5] proposed a multivariate space-time CAR (MSTCAR) model. This is essentially a multivariate CAR model in which both temporal and between-group dependencies are modelled as multivariate dependencies. Other works related to spatial processes of time series include Sans, Schmidt, Nobre, et al. [6] and Quick, Waller, and Casper [7].

Alternatively, one can think of the process as a time series of spatial processes, or a spatial extension of time series. This is the approach we take in our spatio-temporal modelling. The underlying notion is that “the temporal dependence is more natural to model than the spatial dependence” [8].

Following Cox et al. [9], it is useful to distinguish two approaches to the analysis of time series data that are common in the spatio-temporal modelling literature: parameter-driven and observation-driven models. In a parameter-driven model, the dependence between subsequent observations is modelled by a latent stochastic process, which evolves independently of the past history of the observation process. In contrast, in an observation-driven model, time dependence arises because the conditional expectation of the outcome given the past depends explicitly on past values.

For multivariate count data, the advantage of parameter-driven models is that one can easily assume the conditional expectation of the observed process (on the log scale), treated as a latent process, to be (multivariate) normal. There is extensive work on latent spatio-temporal models in the Bayesian framework. Examples include models for Gaussian data based on a (multivariate) Gaussian process with additive error [10, 11, 12, 13], and models for Poisson data whose conditional expectation follows a Gaussian latent process ([14, 15] and Chapter 7 of [8]). Another example is models for Poisson data with a multivariate log-gamma latent process [16]. However, estimating the parameters of parameter-driven models requires considerable computational effort, as does predicting the latent process.

On the other hand, observation-driven models allow inference in a (penalized) maximum likelihood framework and can therefore be fitted easily, even for quite complex regression models [17]. Schrödle, Held, and Rue [18] proposed a parameter-driven spatio-temporal model and compared it with a similar observation-driven model proposed by Paul, Held, and Toschke [19]. They concluded that the parameter-driven model performs slightly better in terms of prediction in some cases. However, the computation time for the observation-driven model is mostly less than a second. In contrast, fitting the parameter-driven model takes several hours, if it converges at all, because of the complexity of the latent autoregressive process. Moreover, their model contains only five parameters. In our application, the number of parameters of interest grows quadratically with the number of cell populations, which would make parameter-driven models intractable even for a moderate number of cell populations.

Therefore, we choose to work with a spatial extension of observation-driven time series models. Zeger and Qaqish [20] review various observation-driven time series models estimated by quasi-likelihood. Fokianos and Tjøstheim [21] develop and study the probabilistic properties of a log-linear autoregressive time series model for Poisson data, as an extension of the model considered by Fokianos, Rahbek, and Tjøstheim [22]. See Scott et al. [23] and Kedem and Fokianos [24] for a complete review.

The literature on observation-driven spatio-temporal models, however, is relatively sparse. Held, Höhle, and Hofmann [25] propose a multivariate time series model whose parameters are allowed to vary across space. Paul et al. [19] extend the model so that spatial dependence is captured by additional parameters. These parameters quantify the “directed influence” of neighboring areas at previous time points on the observation of interest. Paul and Held [26] further extend the model by introducing random effects. Note that these approaches model the conditional expectation of the count data directly; that is, they use an identity link function instead of the canonical log link. The parameters are therefore required to be positive to ensure that the resulting conditional expectation is positive. Knorr-Held and Richardson [27] propose a space-time model for surveillance data. Apart from separate seasonal and spatial components, their model includes an autoregressive term with a latent indicator.

In this paper, we develop a conditional spatio-temporal model for multivariate count data on tiled images and apply it in the context of longitudinal cancer cell monitoring experiments. Our model enables us to measure the growth rate of each cell population and the changes in growth due to local cross-population interactions. Specifically, we consider a multivariate Poisson model whose intensity has a log-linear form similar to those in [27] and [21]. We quantify the spatio-temporal impacts of different cell populations in neighboring tiles through model parameters, as illustrated in Figure 1(b). Impacts are allowed to be positive or negative. Unlike in models that describe between-group dependence through a covariance matrix, influences do not have to be symmetric in our model. Another main advantage of the proposed framework is that it accommodates spatio-temporal cell interactions for heterogeneous cell populations within a relatively parsimonious statistical model.

Since the model complexity can be very large in the presence of many cell types, it is also important to address the question of how to select an appropriate model by retaining only the meaningful spatio-temporal interactions between cell populations. We carry out model selection using common criteria for parametric models, the Akaike and Bayesian information criteria (AIC and BIC).

The remainder of the paper is organized as follows. In Section 2, we introduce the conditional spatio-temporal lattice model for multivariate count data and develop maximum likelihood inference tools; in the same section, we discuss the asymptotic properties of our estimator and its standard errors. In Section 3, we study the performance of our methodology using simulated data and compare it with that of the multivariate conditional autoregressive (MCAR) model. In Section 4, we apply our method, as well as the MCAR model, to analyze datasets from an in vitro experiment in which cancer cells are co-cultured with fibroblasts. In Section 5, we conclude and give final remarks.

2 Methods

2.1 Multicolor spatial autoregressive model on the lattice

Let $\mathcal{L} \subset \mathbb{N}^2$ be a discrete lattice. In the context of our application, the lattice is obtained by tiling a microscope image into $n_L$ tiles, denoted by $\mathcal{L}_n \subset \mathcal{L}$. The total number of tiles $n_L$ is a monotonically increasing function of $n$. Various forms of lattice can be used, for example regular or hexagonal lattices. For simplicity, we tile the image into $n \times n$ regular rectangular tiles, so that $n_L = n^2$. An example of a tiled image with $n = 10$ is shown in Figure 1(a). We write $i \sim j$ for a pair of neighboring tiles $\{i, j\}$ if tiles $i$ and $j$ share a border or coincide ($i = j$). Each tile may contain cells of different colors. We therefore let $C = \{1, \dots, n_C\}$ be a finite set of colors, where $n_C$ is the total number of colors. Let $Y = \{Y_t, t = 1, \dots, T\}$ be the sample of observations. Here $Y_t = \{Y_t^{(c)}, c \in C\}$ is the collection of observations at time point $t$, and $Y_t^{(c)} = (Y_{1,t}^{(c)}, \dots, Y_{n_L,t}^{(c)})^\top$ is the vector of observed frequencies for color $c$ on the lattice $\mathcal{L}_n$ at time $t$. The joint distribution of the spatio-temporal process on the lattice is difficult to specify. This is due to local spatial interactions between neighboring tiles and global interactions occurring at the level of the entire image. An additional issue is that cells tend to cluster together due to the cell division process and other biological mechanisms. It is therefore not uncommon to observe low counts in a considerable portion of tiles. In typical longitudinal experiments, the number of time points seldom goes beyond 50 due to experimental, storage and processing costs, while $n_L$ can be relatively large. We therefore work in a framework where $T$ is fixed, while $n_L$ is allowed to grow to infinity.

We suppose that, conditionally on the previous time point, the count for the $i$th tile follows a Poisson distribution, $Y_{i,t}^{(c)} \mid Y_{t-1} \sim \mathrm{Pois}(\lambda_{i,t}^{(c)})$. The intensity is modeled through the canonical log link $v_{i,t}^{(c)} = \log \lambda_{i,t}^{(c)}$, where $v_{i,t}^{(c)}$ takes the following spatial autoregressive form:

$$v_{i,t}^{(c)} = \alpha^{(c)} + \sum_{c' \in C} \beta^{(c|c')} S_{i,t-1}^{(c')}, \tag{1}$$

$$S_{i,t-1}^{(c')} = \frac{1}{n_i} \sum_{j \in \mathcal{L}_n :\, i \sim j} \log\bigl(1 + Y_{j,t-1}^{(c')}\bigr), \tag{2}$$

for all $c \in C$ and $t = 1, \dots, T$, where $n_i = \#\{j \in \mathcal{L}_n : i \sim j\}$ is the number of tiles in the neighborhood of tile $i$. Although we adopt regular grids for simplicity, the model is readily applicable to other tiling strategies. Changing the tiling strategy would only change the realisations of $S_{i,t-1}^{(c')}$ in (2).
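As a minimal sketch of how the statistic in (2) can be computed on the regular $n \times n$ tiling, consider the NumPy function below (`neighbourhood_stat` is our own, hypothetical name). It assumes that "sharing a border" means sharing an edge, so $n_i$ equals 5 for interior tiles, 4 for edge tiles and 3 for corner tiles. With a different tiling, only the neighbourhood sums would change.

```python
import numpy as np


def neighbourhood_stat(Y_prev):
    """Compute S_{i,t-1}^{(c')} of eq. (2) for every colour and tile.

    Y_prev: array of shape (nC, n, n) holding the counts Y_{j,t-1}^{(c')}.
    Returns (S, n_i): S has shape (nC, n, n) and n_i has shape (n, n).
    Assumption: the neighbourhood of a tile is the tile itself plus the
    tiles sharing an edge with it (no wrap-around at the image border).
    """
    L = np.log1p(np.asarray(Y_prev, dtype=float))  # log(1 + Y); zero counts map to 0
    total = L.copy()                                # contribution of j = i
    n_i = np.ones(L.shape[-2:])                     # neighbourhood sizes
    total[:, 1:, :] += L[:, :-1, :]                 # tile above
    n_i[1:, :] += 1
    total[:, :-1, :] += L[:, 1:, :]                 # tile below
    n_i[:-1, :] += 1
    total[:, :, 1:] += L[:, :, :-1]                 # tile to the left
    n_i[:, 1:] += 1
    total[:, :, :-1] += L[:, :, 1:]                 # tile to the right
    n_i[:, :-1] += 1
    return total / n_i, n_i


# Usage: one colour on a 3 x 3 grid, with log(1 + Y) = 1 only in the corner tile.
Y_prev = np.zeros((1, 3, 3))
Y_prev[0, 0, 0] = np.e - 1
S, n_i = neighbourhood_stat(Y_prev)
print(n_i)   # 3 at the corners, 4 on the edges, 5 in the centre
print(S[0])  # 1/3 at the corner, 1/4 at its two neighbours, 0 elsewhere
```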

Here, we assume that the counts in different tiles at time $t$ are conditionally independent given the information at time $t-1$, i.e.

$$P\bigl(Y_{i,t}^{(c)}, Y_{j,t}^{(c')} \mid Y_{t-1}\bigr) = P\bigl(Y_{i,t}^{(c)} \mid Y_{t-1}\bigr)\, P\bigl(Y_{j,t}^{(c')} \mid Y_{t-1}\bigr),$$

for all $c, c' \in C$, $t = 1, \dots, T$ and $i, j \in \mathcal{L}_n$ with $i \neq j$. This does not imply that $Y_{i,t}^{(c)}$ and $Y_{j,t}^{(c')}$ are independent. Rather, their spatio-temporal dependence arises through the structure of the intensity $\lambda_{i,t}^{(c)}$ in (1). Conditional independence is a commonly used assumption for spatio-temporal models in non-Gaussian settings [3, 28], since it is exceedingly difficult to work with multivariate non-Gaussian distributions [8].

The elements of the parameter vector $\alpha = (\alpha^{(1)}, \dots, \alpha^{(n_C)})^\top$ are main effects corresponding to a baseline average count for cells of each color. The spatio-temporal interactions are measured through the statistic $S_{i,t-1}^{(c')}$ in (2). This statistic averages the log-transformed counts $\log(1 + Y_{j,t-1}^{(c')})$ of cells of color $c'$ over the neighborhood of tile $i$ at time $t-1$, and thus summarises how many such cells are nearby. Hence, the autoregressive parameter $\beta^{(c|c')}$ is interpreted as a positive or negative change, on the log scale, in the average number of cells of color $c$ due to interactions with cells of color $c'$ in neighbouring tiles. A positive (or negative) sign of $\beta^{(c|c')}$ means that the presence of cells of color $c'$ in neighboring tiles promotes (or inhibits) the growth of cells of color $c$. The spatio-temporal effects $\beta^{(c|c')}$, $c, c' \in C$, are collected in the $n_C \times n_C$ weighted incidence matrix $B$. This matrix may be used to generate weighted directed graphs, as in the example of Figure 2. There, the nodes correspond to cell types and the directed edges are negative or positive spatio-temporal interactions between cell types.
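To illustrate how $B$ translates into a weighted directed graph, the sketch below lists one edge per non-zero entry. It assumes the storage convention `B[c, c_prime]` $= \beta^{(c|c')}$, with rows as the affected colour and columns as the influencing colour; this convention is our choice. The colour labels are hypothetical. The input is the sparse matrix $B_3$ used in the simulations of Section 3. Diagonal entries $\beta^{(c|c)}$ show up as self-loops.

```python
import numpy as np


def interaction_edges(B, labels, tol=0.0):
    """Return directed edges (source, target, weight, effect) for |beta^{(c|c')}| > tol.

    Assumes B[c, c'] = beta^{(c|c')}: the row is the colour whose growth is
    affected, the column is the colour whose neighbouring cells exert the effect,
    so each entry becomes an edge c' -> c.
    """
    edges = []
    for c, target in enumerate(labels):
        for c_prime, source in enumerate(labels):
            weight = float(B[c, c_prime])
            if abs(weight) > tol:
                effect = "promotes" if weight > 0 else "inhibits"
                edges.append((source, target, weight, effect))
    return edges


# B_3 from Section 3, with hypothetical labels for the three colours.
B3 = np.array([[0.7, -0.7, 0.7],
               [0.0, 0.7, 0.0],
               [0.0, 0.0, 0.7]])
edges = interaction_edges(B3, ["c1", "c2", "c3"])
for source, target, weight, effect in edges:
    print(f"{source} -> {target}: {weight:+.1f} ({effect})")
```

The resulting edge list can be passed to any graph-drawing tool. Setting `tol` above zero would hide weak estimated interactions.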

Equation (1) could be extended to a more specific form, for example,

$$v_{i,t}^{(c)} = \alpha^{(c)} + \sum_{c' \in C} \Bigl[\beta_1^{(c|c')} S_{i,t-1}^{(c')} + \beta_0^{(c|c')} \log\bigl(1 + Y_{i,t-1}^{(c')}\bigr)\Bigr],$$

where $\beta_1^{(c|c')}$ is interpreted as the effect that cells of color $c'$ in neighbouring (but not the same) tiles have on the growth of cells of color $c$, and $\beta_0^{(c|c')}$ as the effect of cells of color $c'$ in the same tile. However, we keep the model in (1) because we have no evidence that the more complex model is advantageous from a model selection viewpoint.

We work with a log-linear form for the autoregressive equation of $v_{i,t}^{(c)}$ in eq. (1), in which we add 1 to the counts $Y_{i,t-1}^{(c)}$ at time $t-1$ and then apply a logarithmic transform. This offers several advantages over the more commonly used linear form. First, $\lambda_{i,t}^{(c)}$ and $Y_{i,t-1}^{(c)}$ are transformed to the same scale. Moreover, the model can accommodate both positive and negative correlations. By contrast, a stationary model cannot account for positive association if past counts are included directly as explanatory variables. For example, with the model $v_{i,t} = \alpha + \beta Y_{i,t-1}$ for a single color, the intensity would be $\lambda_{i,t} = \exp(\alpha)\exp(\beta Y_{i,t-1})$. This may make the Poisson means unstable if $\beta > 0$, since $\lambda_{i,t}$ can then increase exponentially fast. Finally, adding 1 to $Y_{i,t-1}^{(c)}$ copes with zero counts, which arise often. $\log Y_{i,t-1}^{(c)}$ is undefined when $Y_{i,t-1}^{(c)} = 0$, whereas $\log(1 + Y_{i,t-1}^{(c)})$ maps zero counts to zero.
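The instability of the raw-count specification can be seen with a deterministic caricature: replace each count by its conditional mean and iterate the intensity for a single tile and color. The sketch below uses the hypothetical helper `mean_skeleton` and illustrative values $\alpha = -0.1$, $\beta = 0.7$ and a starting count of 10. It is not a simulation of the Poisson model, only a way to compare the two specifications.

```python
import math


def mean_skeleton(alpha, beta, y0, steps, transform_counts):
    """Iterate y_t = lambda_t for one tile and one colour.

    transform_counts=False: v_t = alpha + beta * y_{t-1}            (raw counts)
    transform_counts=True:  v_t = alpha + beta * log(1 + y_{t-1})  (as in eq. (1))
    Each count is replaced by its conditional mean, so the path is deterministic.
    """
    path = [float(y0)]
    for _ in range(steps):
        x = math.log1p(path[-1]) if transform_counts else path[-1]
        v = alpha + beta * x
        if v > 700:  # math.exp overflows a double just above v = 709
            path.append(math.inf)
            break
        path.append(math.exp(v))
    return path


raw = mean_skeleton(-0.1, 0.7, 10, 50, transform_counts=False)
logged = mean_skeleton(-0.1, 0.7, 10, 50, transform_counts=True)
print(raw)                   # about 992 after one step, then overflow
print(round(logged[-1], 3))  # settles near 1.91
```

With raw counts the recursion overflows after three steps. With $\log(1 + y)$ the intensity is $e^{\alpha}(1 + y)^{\beta}$, which grows sublinearly for $0 < \beta < 1$, and the path settles at a fixed point.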

2.2 Likelihood inference

Let $\theta = (\alpha^\top, \mathrm{vec}(B)^\top)^\top \in \mathbb{R}^p$ be the overall parameter vector. Here $\alpha$ is the $n_C$-dimensional vector defined in Section 2.1, $B$ is the $n_C \times n_C$ matrix of color interaction effects, and $p = n_C(1 + n_C)$ is the total number of parameters. In this section, we develop a weighted maximum likelihood estimator for our model, based on the weighted likelihood

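$$L_n(\theta) = \prod_{t=1}^{T} \prod_{c \in C} \prod_{i \in \mathcal{L}_n} P\bigl(Y_{i,t}^{(c)} \mid Y_{t-1}; \theta\bigr)^{w_{i,t}^{(c)}} = \prod_{t=1}^{T} \prod_{c \in C} \prod_{i \in \mathcal{L}_n} \left( \frac{e^{-\lambda_{i,t}^{(c)}(\theta)}\, \lambda_{i,t}^{(c)}(\theta)^{y_{i,t}^{(c)}}}{y_{i,t}^{(c)}!} \right)^{w_{i,t}^{(c)}}, \tag{3}$$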

where $\lambda_{i,t}^{(c)}(\theta)$ is the expected number of cells of color $c$ in tile $i$ at time $t$, defined through (1), and the weights $w_{i,t}^{(c)}$ are given constants. The weighted maximum likelihood estimator (MLE), $\hat\theta$, is obtained by maximizing the weighted log-likelihood function

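$$\ell_n(\theta) = \sum_{i \in \mathcal{L}_n} \sum_{t=1}^{T} \sum_{c \in C} w_{i,t}^{(c)} \Bigl[ Y_{i,t}^{(c)} v_{i,t}^{(c)}(\theta) - \exp\bigl\{v_{i,t}^{(c)}(\theta)\bigr\} \Bigr], \tag{4}$$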

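where $v_{i,t}^{(c)}(\theta) \equiv \log \lambda_{i,t}^{(c)}(\theta)$. The terms $-w_{i,t}^{(c)} \log y_{i,t}^{(c)}!$, which do not depend on $\theta$, have been dropped. Equivalently, $\hat\theta$ is obtained by solving the weighted estimating equations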

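$$0 = u_n(\theta) \equiv \frac{1}{n_L} \nabla \ell_n(\theta) = \frac{1}{n_L} \sum_{i \in \mathcal{L}_n} \sum_{t=1}^{T} \gamma_{i,t}(\theta) \otimes \nabla v_{i,t}, \tag{5}$$

where $\gamma_{i,t}(\theta) = \bigl(w_{i,t}^{(1)}\{y_{i,t}^{(1)} - \exp v_{i,t}^{(1)}(\theta)\}, \dots, w_{i,t}^{(n_C)}\{y_{i,t}^{(n_C)} - \exp v_{i,t}^{(n_C)}(\theta)\}\bigr)^\top$ is the vector of weighted residuals and $\otimes$ denotes the Kronecker product. $\nabla$ is the gradient operator with respect to $\theta$, and $\nabla v_{i,t} \equiv (1, S_{i,t-1}^{(1)}, \dots, S_{i,t-1}^{(n_C)})^\top$ is the gradient of $v_{i,t}^{(c)}(\theta)$ with respect to the parameters $(\alpha^{(c)}, \beta^{(c|1)}, \dots, \beta^{(c|n_C)})$ of color $c$. This gradient is the same for every $c$.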
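Equations (4) and (5) can be checked numerically. The sketch below assumes the parameters are stacked color by color, $\theta^{(c)} = (\alpha^{(c)}, \beta^{(c|1)}, \dots, \beta^{(c|n_C)})$, which is the ordering implied by $\gamma_{i,t} \otimes \nabla v_{i,t}$. It uses random statistics and counts rather than lattice data, since only the algebraic form of (4) and (5) is being checked. The function names are illustrative.

```python
import numpy as np


def design(S):
    """Rows of nabla v_{i,t} = (1, S^{(1)}_{i,t-1}, ..., S^{(nC)}_{i,t-1})."""
    return np.column_stack([np.ones(len(S)), S])


def loglik(theta, S, Y, W):
    """Weighted log-likelihood (4), without the constant -log(y!) terms.

    theta: (nC, nC + 1); row c is (alpha^{(c)}, beta^{(c|1)}, ..., beta^{(c|nC)}).
    S: (N, nC) statistics S_{i,t-1}^{(c')} for N tile-time pairs (i, t).
    Y, W: (N, nC) counts Y_{i,t}^{(c)} and weights w_{i,t}^{(c)}.
    """
    v = design(S) @ theta.T  # v_{i,t}^{(c)}, shape (N, nC)
    return np.sum(W * (Y * v - np.exp(v)))


def gradient(theta, S, Y, W):
    """n_L * u_n(theta): the sum over (i, t) of gamma_{i,t} (x) nabla v_{i,t}."""
    D = design(S)
    gamma = W * (Y - np.exp(D @ theta.T))  # weighted residuals, shape (N, nC)
    # einsum forms every Kronecker product gamma_{i,t} (x) nabla v_{i,t} and sums them.
    return np.einsum("nc,nk->ck", gamma, D).reshape(-1)


rng = np.random.default_rng(0)
N, nC = 200, 2
S = rng.uniform(0.0, 2.0, size=(N, nC))
Y = rng.poisson(2.0, size=(N, nC)).astype(float)
W = np.ones((N, nC))  # unit weights, as in the application
theta = rng.normal(0.0, 0.3, size=(nC, nC + 1))

# Central finite differences of (4) should reproduce the analytic score (5).
h = 1e-6
numeric = np.zeros(theta.size)
for k in range(theta.size):
    e = np.zeros(theta.size)
    e[k] = h
    e = e.reshape(theta.shape)
    numeric[k] = (loglik(theta + e, S, Y, W) - loglik(theta - e, S, Y, W)) / (2 * h)
print(np.max(np.abs(numeric - gradient(theta, S, Y, W))))  # close to zero
```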

Specific weights could be used to address the presence of outliers. Following Ferrari and Vecchia [29] and La Vecchia, Camponovo, and Ferrari [30], the influence of strong outliers could be avoided by taking weights of the form $w_{i,t}^{(c)} = \exp\{-(1-q)[Y_{i,t}^{(c)} v_{i,t}^{(c)}(\theta) - \exp(v_{i,t}^{(c)}(\theta))]\}$, with $q < 1$ a tuning constant. However, for the current application we use constant weights, all equal to 1.

Our empirical results show that this choice performs reasonably well in terms of estimation accuracy in all our numerical examples. It also guarantees optimal variance for the estimator $\hat\theta$ under correct model specification. The solution to eq. (5) is obtained by a standard Fisher scoring algorithm, which we found to be stable and fast to converge in all our numerical examples.
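Here is a minimal sketch of the fitting procedure. It rests on several assumptions: edge-sharing neighbourhoods in (2), unit weights, a hypothetical constant seeding count of 1 at $t = 0$ (the seeding density is not stated), and data simulated from Model 1 of Section 3. Each color has its own parameters, so eq. (5) splits into $n_C$ separate Poisson log-linear score equations. Each is solved here by Fisher scoring. The information matrix used in each update has the same form as the block $\hat H^{(c)}$ in eq. (7). All function names are ours.

```python
import numpy as np


def neighbour_stat(Y):
    """S of eq. (2) for counts Y of shape (nC, n, n): tile plus edge-sharing tiles."""
    n = Y.shape[1]
    P = np.pad(np.log1p(Y), ((0, 0), (1, 1), (1, 1)))  # zero padding outside the image
    ones = np.pad(np.ones((n, n)), 1)
    shifts = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]   # self, up, down, left, right
    total = sum(P[:, r:r + n, c:c + n] for r, c in shifts)
    count = sum(ones[r:r + n, c:c + n] for r, c in shifts)
    return total / count


def simulate(alpha, B, n, T, y0, rng):
    """Y_0 holds y0 cells of every colour in every tile; then Y_t | Y_{t-1} ~ Pois."""
    Y = [np.full((len(alpha), n, n), float(y0))]
    for _ in range(T):
        v = alpha[:, None, None] + np.einsum("ck,kij->cij", B, neighbour_stat(Y[-1]))
        Y.append(rng.poisson(np.exp(v)).astype(float))
    return np.stack(Y)  # shape (T + 1, nC, n, n)


def design(Y):
    """Rows (1, S^{(1)}_{i,t-1}, ..., S^{(nC)}_{i,t-1}) for every tile i and t = 1..T."""
    S = np.stack([neighbour_stat(Y[t - 1]) for t in range(1, len(Y))])
    X = S.transpose(0, 2, 3, 1).reshape(-1, S.shape[1])
    return np.column_stack([np.ones(len(X)), X])


def fisher_scoring(X, y, w=None, tol=1e-10, max_iter=50):
    """Solve one colour's block of eq. (5); return theta^{(c)} and H^{(c)} of eq. (7)."""
    w = np.ones_like(y) if w is None else w
    theta = np.zeros(X.shape[1])
    theta[0] = np.log(y.mean())  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = np.exp(X @ theta)
        info = X.T @ (X * (w * mu)[:, None])
        step = np.linalg.solve(info, X.T @ (w * (y - mu)))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.exp(X @ theta)
    return theta, X.T @ (X * (w * mu)[:, None])


def fit(Y):
    """Fit every colour separately (the information matrix is block diagonal)."""
    X = design(Y)
    blocks = [fisher_scoring(X, Y[1:, c].reshape(-1)) for c in range(Y.shape[1])]
    alpha_hat = np.array([theta[0] for theta, _ in blocks])
    B_hat = np.array([theta[1:] for theta, _ in blocks])
    return alpha_hat, B_hat, [H for _, H in blocks]


rng = np.random.default_rng(2018)
alpha = np.full(3, -0.1)
B1 = np.array([[0.7, -0.7, 0.7], [0.7, 0.7, -0.7], [-0.7, 0.7, 0.7]])
Y = simulate(alpha, B1, n=25, T=10, y0=1, rng=rng)  # y0 = 1 is a hypothetical seeding
alpha_hat, B_hat, H_hat = fit(Y)
print(np.round(alpha_hat, 2))
print(np.round(B_hat, 2))
```

The Poisson log link is canonical, so Fisher scoring here coincides with Newton-Raphson. This is one reason the iterations tend to converge quickly from an intercept-only start.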

Finally, in practical applications it is also important to select an appropriate model by retaining only the meaningful spatio-temporal interactions between cell populations, and to avoid over-parametrized models. Model selection plays an important role by balancing goodness of fit and model complexity. Here, we select non-zero model parameters using traditional model selection criteria. These are the Akaike information criterion, $\mathrm{AIC} = -2\ell(\hat\theta) + 2p$, and the Bayesian information criterion, $\mathrm{BIC} = -2\ell(\hat\theta) + p\log(n_L T)$, where $p$ is the number of non-zero parameters in the candidate model.
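The likelihood factorizes over colors and the colors share no parameters. Both criteria are therefore sums of per-color terms, and best-subset selection can be carried out one color at a time. The sketch below enumerates all $2^{n_C}$ subsets of $\beta^{(c|\cdot)}$ for each color, always keeping $\alpha^{(c)}$. It runs on data simulated from the sparse Model 3 of Section 3, with a hypothetical seeding count of 1. The helper names are ours. The $-\log y!$ terms are dropped because they are identical for every candidate model.

```python
import itertools

import numpy as np


def neighbour_stat(Y):
    """S of eq. (2) for counts Y of shape (nC, n, n): tile plus edge-sharing tiles."""
    n = Y.shape[1]
    P = np.pad(np.log1p(Y), ((0, 0), (1, 1), (1, 1)))
    ones = np.pad(np.ones((n, n)), 1)
    shifts = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]  # self, up, down, left, right
    total = sum(P[:, r:r + n, c:c + n] for r, c in shifts)
    return total / sum(ones[r:r + n, c:c + n] for r, c in shifts)


def simulate(alpha, B, n, T, y0, rng):
    Y = [np.full((len(alpha), n, n), float(y0))]
    for _ in range(T):
        v = alpha[:, None, None] + np.einsum("ck,kij->cij", B, neighbour_stat(Y[-1]))
        Y.append(rng.poisson(np.exp(v)).astype(float))
    return np.stack(Y)


def design(Y):
    S = np.stack([neighbour_stat(Y[t - 1]) for t in range(1, len(Y))])
    X = S.transpose(0, 2, 3, 1).reshape(-1, S.shape[1])
    return np.column_stack([np.ones(len(X)), X])


def poisson_loglik_at_mle(X, y, tol=1e-10, max_iter=50):
    """Fisher scoring for one colour; returns the maximised log-likelihood (4)."""
    theta = np.zeros(X.shape[1])
    theta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ theta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    v = X @ theta
    return float(np.sum(y * v - np.exp(v)))


def best_subset(Y, criterion):
    """Boolean (nC, nC) mask of retained beta^{(c|c')}; alpha^{(c)} is always kept."""
    X, nC = design(Y), Y.shape[1]
    n_obs = (Y.shape[0] - 1) * Y.shape[2] * Y.shape[3]  # n_L * T
    penalty = 2.0 if criterion == "AIC" else np.log(n_obs)
    selected = np.zeros((nC, nC), dtype=bool)
    for c in range(nC):
        y = Y[1:, c].reshape(-1)
        best = np.inf
        for k in range(nC + 1):
            for subset in itertools.combinations(range(nC), k):
                cols = [0] + [1 + j for j in subset]
                crit = -2.0 * poisson_loglik_at_mle(X[:, cols], y) + penalty * len(cols)
                if crit < best:
                    best = crit
                    selected[c] = False
                    selected[c, list(subset)] = True
    return selected


rng = np.random.default_rng(3)
alpha = np.full(3, -0.1)
B3 = np.array([[0.7, -0.7, 0.7], [0.0, 0.7, 0.0], [0.0, 0.0, 0.7]])
Y = simulate(alpha, B3, n=25, T=10, y0=1, rng=rng)
sel_bic = best_subset(Y, "BIC")
sel_aic = best_subset(Y, "AIC")
print(sel_bic.astype(int))  # 1 marks a retained beta^{(c|c')}
print(sel_aic.astype(int))
```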

2.3 Asymptotic properties and standard errors

In this section, we give an overview of the asymptotic behavior of the estimator introduced in Section 2.2. We consider a fixed number of time points, $T$, whilst the lattice $\mathcal{L}_n$ is allowed to grow. This reflects the notion that the statistician may choose an increasingly fine tiling grid as the number of cells increases. Suppose the regularity conditions stated in the Appendix hold. Then $\sqrt{n_L}\,\{n_L^{-1} H_n(\theta_0)\}^{1/2}(\hat\theta_n - \theta_0)$ converges in distribution, as $n_L \to \infty$, to a $p$-variate normal distribution with zero mean vector and identity covariance matrix, with $H_n(\theta)$ given in (6). Asymptotic normality of $\hat\theta_n$ follows from the limit theorems for M-estimators of nonlinear spatial models developed by Jenish and Prucha [31]. One condition required for this behaviour is that $Y_t$ has constant entries at the initial time point $t = 0$. This is quite realistic, since cells are typically seeded randomly at the beginning of the experiment. Our proofs mainly verify α-mixing conditions and $L_2$ uniform integrability of the score functions $u_{i,t}(\theta)$, which ensure a pointwise law of large numbers. Together with stochastic equicontinuity, this yields the uniform law of large numbers required by Jenish and Prucha [31].

The asymptotic variance of $\hat\theta$ is $V_n(\hat\theta) = H_n^{-1}(\theta_0)$, where $H_n(\theta)$ is the $p \times p$ (negative expected) Hessian matrix

$$H_n(\theta) = -E\bigl[\nabla^2 \ell_n(\theta)\bigr] = -E\Bigl[\sum_{i \in \mathcal{L}_n} \nabla u_i(\theta)\Bigr], \tag{6}$$

with $u_i(\theta) = u_{i,1}(\theta) + \dots + u_{i,T}(\theta)$ the partial score function for the $i$th tile. Direct evaluation of $H_n(\theta)$ may be challenging, since the expectations in (6) are intractable. Thus, we estimate $H_n(\theta)$ by the empirical counterpart

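$$\hat H_n(\theta) = \begin{pmatrix} \hat H^{(1)}(\theta) & 0 & \cdots & 0 \\ 0 & \hat H^{(2)}(\theta) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \hat H^{(n_C)}(\theta) \end{pmatrix},$$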

where

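$$\hat H^{(c)}(\theta) = \sum_{i \in \mathcal{L}_n} \sum_{t=1}^{T} w_{i,t}^{(c)} \exp\bigl\{v_{i,t}^{(c)}(\theta)\bigr\}\, \nabla v_{i,t} \nabla v_{i,t}^\top. \tag{7}$$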

Note that these estimators approximate the quantities in (6) by conditional expectations. Our numerical results suggest that this variance approximation yields confidence intervals with coverage close to the nominal level $(1-\alpha)$. Besides the above formulas, we also consider confidence intervals obtained by a parametric bootstrap. Specifically, we generate $B$ bootstrap samples $Y^*_{(1)}, \dots, Y^*_{(B)}$ by sampling at subsequent times from the conditional model specified in eqs. (1) and (2) with $\theta = \hat\theta$. From these bootstrap samples we obtain bootstrap estimates $\hat\theta^*_{(1)}, \dots, \hat\theta^*_{(B)}$. These are used to estimate $\mathrm{var}(\hat\theta)$ by the usual covariance estimator $\hat V_{\mathrm{boot}}(\hat\theta) = \sum_{b=1}^{B} (\hat\theta^*_{(b)} - \bar\theta^*)(\hat\theta^*_{(b)} - \bar\theta^*)^\top / (B - 1)$, where $\bar\theta^* = \sum_{b=1}^{B} \hat\theta^*_{(b)} / B$. Finally, a $(1-\alpha)100\%$ confidence interval for $\theta_j$ is obtained as $\hat\theta_j \pm z_{1-\alpha/2} \{\hat V\}_{jj}^{1/2}$. Here $z_q$ is the $q$-quantile of the standard normal distribution, and $\hat V$ is an estimate of $\mathrm{var}(\hat\theta)$, obtained either as $\hat H_n^{-1}(\hat\theta)$ from eq. (7) or by bootstrap resampling.
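The two variance estimates can be sketched together. The sketch assumes unit weights, edge-sharing neighbourhoods, a hypothetical constant seeding count of 1, and data from Model 1 of Section 3. The plug-in standard errors come from the block-diagonal $\hat H_n(\hat\theta)$ of eq. (7). The bootstrap regenerates entire paths from the fitted conditional model, starting from the same $Y_0$, and refits. Parameters are stacked color by color as $(\alpha^{(c)}, \beta^{(c|1)}, \dots, \beta^{(c|n_C)})$. The number of bootstrap replicates (200) is an arbitrary illustrative choice, and the function names are ours.

```python
from statistics import NormalDist

import numpy as np


def neighbour_stat(Y):
    """S of eq. (2) for counts Y of shape (nC, n, n): tile plus edge-sharing tiles."""
    n = Y.shape[1]
    P = np.pad(np.log1p(Y), ((0, 0), (1, 1), (1, 1)))
    ones = np.pad(np.ones((n, n)), 1)
    shifts = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]  # self, up, down, left, right
    total = sum(P[:, r:r + n, c:c + n] for r, c in shifts)
    return total / sum(ones[r:r + n, c:c + n] for r, c in shifts)


def simulate(alpha, B, n, T, y0, rng):
    Y = [np.full((len(alpha), n, n), float(y0))]
    for _ in range(T):
        v = alpha[:, None, None] + np.einsum("ck,kij->cij", B, neighbour_stat(Y[-1]))
        Y.append(rng.poisson(np.exp(v)).astype(float))
    return np.stack(Y)


def design(Y):
    S = np.stack([neighbour_stat(Y[t - 1]) for t in range(1, len(Y))])
    X = S.transpose(0, 2, 3, 1).reshape(-1, S.shape[1])
    return np.column_stack([np.ones(len(X)), X])


def fisher_scoring(X, y, tol=1e-10, max_iter=50):
    theta = np.zeros(X.shape[1])
    theta[0] = np.log(y.mean())
    for _ in range(max_iter):
        mu = np.exp(X @ theta)
        step = np.linalg.solve(X.T @ (X * mu[:, None]), X.T @ (y - mu))
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta, X.T @ (X * np.exp(X @ theta)[:, None])  # H^{(c)} of eq. (7)


def fit(Y):
    """Return theta stacked by colour and the block-diagonal estimate of H_n."""
    X = design(Y)
    blocks = [fisher_scoring(X, Y[1:, c].reshape(-1)) for c in range(Y.shape[1])]
    theta = np.concatenate([b[0] for b in blocks])
    k = X.shape[1]
    H = np.zeros((theta.size, theta.size))
    for c, (_, H_c) in enumerate(blocks):
        H[c * k:(c + 1) * k, c * k:(c + 1) * k] = H_c
    return theta, H


nC, n, T = 3, 25, 10
rng = np.random.default_rng(11)
alpha = np.full(nC, -0.1)
B1 = np.array([[0.7, -0.7, 0.7], [0.7, 0.7, -0.7], [-0.7, 0.7, 0.7]])
Y = simulate(alpha, B1, n, T, y0=1, rng=rng)
theta_hat, H_hat = fit(Y)
se_hess = np.sqrt(np.diag(np.linalg.inv(H_hat)))

# Parametric bootstrap: regenerate whole paths from the fitted model and refit.
fitted = theta_hat.reshape(nC, nC + 1)  # row c: (alpha^{(c)}, beta^{(c|.)})
n_boot = 200
boot = np.array([fit(simulate(fitted[:, 0], fitted[:, 1:], n, T, 1, rng))[0]
                 for _ in range(n_boot)])
se_boot = np.sqrt(np.diag(np.cov(boot, rowvar=False)))  # divisor n_boot - 1

z = NormalDist().inv_cdf(0.975)  # 95% intervals
ci = np.column_stack([theta_hat - z * se_hess, theta_hat + z * se_hess])
print(np.round(np.column_stack([theta_hat, se_hess, se_boot]), 3))
```

The two sets of standard errors are expected to be close, in line with the similar coverage of $\hat V_{\mathrm{est}}$ and $\hat V_{\mathrm{boot}}$ reported in Table 2.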

3 Monte Carlo simulations

In our Monte Carlo experiments, we generate data from a Poisson model as follows. At time $t = 0$, we populate the $n_L$ tiles with equal counts for cells of different colors. For $t = 1, \dots, T$, observations are drawn from the multivariate Poisson model $Y_{i,t}^{(c)} \mid Y_{t-1} \sim \mathrm{Pois}(\lambda_{i,t}^{(c)})$, $c \in C$. Recall that the rate $\lambda_{i,t}^{(c)}$ defined in Section 2.1 contains the autoregressive coefficients $\beta^{(c|c')}$, which are collected in the $n_C \times n_C$ matrix $B$.

We assess the performance of the MLE under different settings concerning the size and sparsity of $B$. Consider the three models with the following choices of $B$:

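$$B_1 = \begin{pmatrix} 0.7 & -0.7 & 0.7 \\ 0.7 & 0.7 & -0.7 \\ -0.7 & 0.7 & 0.7 \end{pmatrix}, \quad B_2 = \begin{pmatrix} 0.05 & -0.15 & 0.25 \\ 0.35 & 0.45 & -0.55 \\ -0.65 & 0.75 & 0.85 \end{pmatrix}, \quad B_3 = \begin{pmatrix} 0.7 & -0.7 & 0.7 \\ 0 & 0.7 & 0 \\ 0 & 0 & 0.7 \end{pmatrix}.$$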

We denote by Model $i$ the model corresponding to $B_i$, $i = 1, 2, 3$. In Model 1, all the effects in $B$ have the same size; in Model 2, the effects have decreasing sizes; Model 3 is the same as Model 1, but with some interactions exactly equal to zero.

We set $\alpha^{(1)} = \dots = \alpha^{(n_C)} = -0.1$ for all three models. These parameter choices correspond to a generated process $Y$ with moderate growth.
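The data-generating process of this section can be sketched as follows. The sketch seeds every tile with one cell of each color at $t = 0$; this is a hypothetical choice, since the seeding count is not stated. It also uses edge-sharing neighbourhoods in (2). It draws one path from each of Models 1-3 with $n = 25$ and $T = 10$ and prints the average count per tile at the first and last time points. The function names are ours.

```python
import numpy as np


def neighbour_stat(Y):
    """S of eq. (2) for counts Y of shape (nC, n, n): tile plus edge-sharing tiles."""
    n = Y.shape[1]
    P = np.pad(np.log1p(Y), ((0, 0), (1, 1), (1, 1)))  # zero padding outside the image
    ones = np.pad(np.ones((n, n)), 1)
    shifts = [(1, 1), (0, 1), (2, 1), (1, 0), (1, 2)]   # self, up, down, left, right
    total = sum(P[:, r:r + n, c:c + n] for r, c in shifts)
    return total / sum(ones[r:r + n, c:c + n] for r, c in shifts)


def simulate(alpha, B, n, T, y0, rng):
    """Equal counts y0 at t = 0, then Y_t | Y_{t-1} ~ Pois(exp(v_t)) with v_t from eq. (1)."""
    Y = [np.full((len(alpha), n, n), float(y0))]
    for _ in range(T):
        v = alpha[:, None, None] + np.einsum("ck,kij->cij", B, neighbour_stat(Y[-1]))
        Y.append(rng.poisson(np.exp(v)).astype(float))
    return np.stack(Y)  # shape (T + 1, nC, n, n)


models = {
    "Model 1": np.array([[0.7, -0.7, 0.7], [0.7, 0.7, -0.7], [-0.7, 0.7, 0.7]]),
    "Model 2": np.array([[0.05, -0.15, 0.25], [0.35, 0.45, -0.55], [-0.65, 0.75, 0.85]]),
    "Model 3": np.array([[0.7, -0.7, 0.7], [0.0, 0.7, 0.0], [0.0, 0.0, 0.7]]),
}
alpha = np.full(3, -0.1)
rng = np.random.default_rng(1)
paths = {name: simulate(alpha, B, n=25, T=10, y0=1, rng=rng) for name, B in models.items()}
for name, Y in paths.items():
    # Average count per tile for each colour at t = 0 and at t = T.
    print(name, Y[0].mean(axis=(1, 2)), np.round(Y[-1].mean(axis=(1, 2)), 2))
```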

Tables 1 and 2 show results based on 1000 Monte Carlo runs generated from Models 1-3, for $n = 25$, $n_C = 3$ and $T = 10$ and $25$. Table 1 reports Monte Carlo estimates of the squared bias and variance of $\hat\theta$. Both the squared bias and the variance of our estimator are quite small in all three models. The variance decreases by more than half as $T$ grows from 10 to 25, while the squared bias remains negligible. The variances in Model 2 are somewhat larger than those in the other two models due to the difficulty of estimating parameters close to zero.

Table 1:

Monte Carlo estimates of the squared bias ($\times 10^{-6}$) and variance ($\times 10^{-4}$) of the MCLE for three models with $T = 10, 25$ time points. Simulation standard errors are shown in parentheses. The three models differ in terms of the coefficients $\beta^{(c|c')}$, $c, c' \in C$, as described in Section 3: non-zero equal effects (Model 1), non-zero decreasing interactions (Model 2), and sparse effects (Model 3). For all models, $\alpha^{(c)} = -0.1$, $c = 1, 2, 3$. Estimates are based on 1000 Monte Carlo runs.

	Bias² (T=10)	Var (T=10)	Bias² (T=25)	Var (T=25)
Model 1	0.45(0.57)	5.75(0.26)	0.29(0.32)	2.36(0.11)
Model 2	0.64(0.91)	9.66(0.42)	0.67(0.71)	4.45(0.20)
Model 3	0.77(0.97)	8.09(0.36)	0.52(0.51)	3.47(0.16)

In Table 2, we report the coverage probability of symmetric confidence intervals of the form $\hat\theta \pm z_{1-\alpha/2}\,\widehat{\mathrm{sd}}(\hat\theta)$, where $z_q$ is the $q$-quantile of the standard normal distribution, for $\alpha = 0.01, 0.05, 0.10$. The standard error $\widehat{\mathrm{sd}}(\hat\theta)$ is obtained as the square root of the diagonal elements of one of two estimates of $V_n(\hat\theta)$. These are the plug-in estimate $\hat V_{\mathrm{est}}$ and the parametric bootstrap estimate $\hat V_{\mathrm{boot}}$, both described in Section 2.3. The coverage probabilities of the confidence intervals are very close to the nominal level for both methods.

Table 2:

Monte Carlo estimates of the coverage probability (%) of $(1-\alpha)100\%$ confidence intervals $\hat\theta \pm z_{1-\alpha/2}\,\widehat{\mathrm{sd}}(\hat\theta)$, with $\widehat{\mathrm{sd}}(\hat\theta)$ obtained using the bootstrap ($\hat V_{\mathrm{boot}}$) and sandwich ($\hat V_{\mathrm{est}}$) estimators of Sections 2 and 3. The three models differ in terms of the coefficients $\beta^{(c|c')}$, $c, c' \in C$, as described in Section 3: non-zero equal effects (Model 1), non-zero decreasing interactions (Model 2), and sparse effects (Model 3). For all models, $\alpha^{(c)} = -0.1$, $c = 1, 2, 3$; estimates are based on 1000 Monte Carlo runs.

α	Model	V̂boot (T=10)	V̂est (T=10)	V̂boot (T=25)	V̂est (T=25)
0.01	Model 1	98.6	99.0	98.9	99.0
0.01	Model 2	99.0	99.0	98.8	98.9
0.01	Model 3	98.9	99.0	98.9	98.9
0.05	Model 1	94.2	95.2	94.9	95.0
0.05	Model 2	95.2	95.1	95.0	95.3
0.05	Model 3	95.4	95.5	94.9	95.1
0.10	Model 1	89.2	90.3	90.1	90.3
0.10	Model 2	90.6	90.0	89.7	90.0
0.10	Model 3	90.6	90.6	90.2	90.2

Table 3 shows model selection results based on 1000 Monte Carlo samples from Model 3, using the AIC and BIC given in Section 2, for $n = 25$ and $T = 10, 25$. We report the Type A error (a term is not selected when it actually belongs to the true model) and the Type B error (a term is selected when it is not in the true model). Neither criterion makes Type A errors. As expected, AIC tends to over-select, with a Type B error of about 10% for both values of $T$. BIC clearly outperforms AIC, with a Type B error of about 0.2% that decreases slightly as $T$ grows.
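Type A and Type B error rates can be computed from a stack of selected supports. The denominators are an assumption on our part. The sketch normalises Type A errors by the number of true non-zero terms and Type B errors by the number of zero terms, pooled over Monte Carlo runs. The toy input uses the support of $B_3$ and two hypothetical runs, and the function name is ours.

```python
import numpy as np


def selection_error_rates(selected, truth):
    """Percentage Type A and Type B errors pooled over Monte Carlo runs.

    selected: boolean (R, nC, nC); selected[r, c, c'] is True if beta^{(c|c')}
    was retained in run r. truth: boolean (nC, nC), True where the true
    coefficient is non-zero. Assumed normalisation: Type A over true terms,
    Type B over null terms.
    """
    selected = np.asarray(selected, dtype=bool)
    truth = np.broadcast_to(np.asarray(truth, dtype=bool), selected.shape)
    type_a = 100.0 * np.mean(~selected[truth])  # true terms that were dropped
    type_b = 100.0 * np.mean(selected[~truth])  # null terms that were kept
    return float(type_a), float(type_b)


truth = np.array([[0.7, -0.7, 0.7], [0.0, 0.7, 0.0], [0.0, 0.0, 0.7]]) != 0  # support of B_3
runs = np.stack([truth, truth.copy()])
runs[1, 1, 0] = True  # the second (hypothetical) run wrongly keeps beta^{(2|1)}
print(selection_error_rates(runs, truth))  # (0.0, 12.5)
```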

Table 3:

Monte Carlo estimates for % Type A error (a term is not selected when it actually belongs to the true model) and % Type B error (a term is selected when it is not in the true model) using AIC and BIC criteria. Results are based on 1000 Monte Carlo samples generated from Model 3 with n=25 and T=10,25.

	Type A (T=10)	Type B (T=10)	Type A (T=25)	Type B (T=25)
AIC	0.00	10.00	0.00	10.38
BIC	0.00	0.22	0.00	0.20

Finally, we compare the performance of our model with that of the following multivariate conditional autoregressive (MCAR) model proposed by Leroux, Lei, and Breslow [32]:

$$Y_{i,t}^{(c)} \sim \mathrm{Pois}\bigl(\exp\{x_{i,t}^\top \beta + Z_i\}\bigr),$$

where $Z_i$, $i \in \mathcal{L}_n$, are random effects with conditional distribution

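$$Z_i \mid Z_{-i} \sim N\left( \frac{\rho \sum_{j \in \mathcal{L}_n :\, j \sim i} Z_j}{\rho n_i + 1 - \rho},\ \frac{\Sigma_Z}{\rho n_i + 1 - \rho} \right),$$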

where $\rho$ is a spatial autocorrelation parameter, with $\rho = 0$ corresponding to independence and $\rho = 1$ to the intrinsic model. $\Sigma_Z$ is an $n_C T \times n_C T$ between-variable covariance matrix, which is assumed to have no fixed structure, and $n_i$ is the number of tiles in the neighborhood of tile $i$, as defined in Section 2.1. Let $\beta = (\alpha^\top, \mathrm{vec}(B)^\top)^\top$ be the vector of regression parameters, where $B$ is defined in Section 2.1 and $\alpha$ is the intercept. Let the covariate $x_{i,t}$ be an $n_C^2$-dimensional vector consisting of $n_C$ vectors of the form $(S_{i,t-1}^{(1)}, \dots, S_{i,t-1}^{(n_C)})$. Here $S_{i,t-1}^{(c)}$, defined in eq. (2), carries the information from the neighbouring tiles at the previous time point.

An independent Gaussian prior, $N(0, 100000)$, is specified for each regression parameter in $\beta$, and a uniform prior on the unit interval, $U(0, 1)$, is specified for $\rho$. For the covariance matrix $\Sigma_Z$, we assume an inverse Wishart prior with identity scale matrix and $n_C T$ degrees of freedom.

To compare the MLE under our model with the estimators obtained from the MCAR model, we generate 1000 data sets from Model 1. The MCAR model is estimated by MCMC sampling, using the R package CARBayes by Lee [33]. Table 4 shows Monte Carlo estimates of the squared bias, the variance, the coverage probability of 95% confidence intervals and the computation time. These are given for $n, T \in \{10, 25\}$ and $n_C = 1, 2, 3$. Two of the settings, $n = 25, n_C = 3, T = 10$ and $n = 25, n_C = 3, T = 25$, are the same as those shown for Model 1 in Table 1. For MCAR, we also show results for two MCMC settings. These are (1) MCAR1, with 1000 MCMC samples generated and 200 discarded as burn-in, and (2) MCAR2, with 5000 samples generated and 100 discarded. Coverage probabilities for our model are computed from the intervals $\hat\theta \pm z_{0.975}\,\widehat{\mathrm{sd}}(\hat\theta)$, where $z_q$ is the $q$-quantile of the standard normal distribution. The standard error $\widehat{\mathrm{sd}}(\hat\theta)$ is obtained as the square root of the diagonal elements of $V_n(\hat\theta)$ described in Section 2.3.

Table 4:

Monte Carlo estimates of the squared bias ($\times 10^{-6}$), variance ($\times 10^{-4}$), coverage probability (%) of 95% confidence intervals and computation time for $n, T \in \{10, 25\}$ and $n_C = 1, 2, 3$. Results are given for the MLE of our model (rows "Model 1") and for MCAR. MCAR1 uses 1000 MCMC samples with 200 discarded as burn-in, and MCAR2 uses 5000 samples with 100 discarded. The true values of the regression parameters are given by $B_1$. Estimates are obtained from 1000 Monte Carlo runs.

n = 10		T=10				T=25
nC	Method	Bias²	Var	95%	Time	Bias²	Var	95%	Time
nC= 1	Model 1	2.89(3.40)	21.22(0.98)	94.5	1s	1.29(1.62)	10.18(0.46)	94.6	3s
	MCAR1	420.20(44.63)	33.98(1.60)	90.1	4s	395.58(29.60)	13.75(0.60)	93.4	10s
	MCAR2	381.20(44.99)	30.07(1.33)	96.7	17s	143.09(15.44)	13.54(0.58)	94.8	45s
nC= 2	Model 1	3.86(3.87)	43.81(1.95)	95.3	2s	3.51(1.96)	33.64(1.52)	94.6	3s
	MCAR1	348.09(34.23)	90.85(4.11)	89.7	8s	177.33(15.29)	31.63(1.41)	92.7	25s
	MCAR2	202.02(38.77)	85.59(3.97)	93.7	34s	26.82(9.87)	32.64(1.56)	93.1	105s
nC= 3	Model 1	4.83(5.10)	34.66(1.59)	94.9	3s	4.4(2.48)	14.21(0.65)	94.7	7s
	MCAR1	217.09(17.38)	44.89(2.08)	82.6	12s	72.72(6.68)	13.73(0.62)	88.2	46s
	MCAR2	82.64(14.25)	38.71(1.77)	93.0	52s	13.64(3.75)	13.27(0.61)	93.0	190s

n = 25		T=10				T=25
		Biasˆ2	Varˆ	95%	Time	Biasˆ2	Varˆ	95%	Time

nC= 1	Model 1	0.51(0.56)	3.18(0.14)	94.9	10s	0.40(0.37)	1.73(0.08)	94.5	23s
	MCAR1	20.98(3.82)	3.99(0.17)	93.4	31s	41.21(1.63)	2.41(0.08)	92.0	70s
	MCAR2	4.35(1.57)	4.64(0.21)	95.6	145s	10.37(3.75)	1.98(0.11)	93.5	345s
nC= 2	Model 1	0.76(0.34)	13.22(0.56)	94.4	8s	0.59(0.67)	5.69(0.25)	94.4	19s
	MCAR1	26.17(5.61)	14.33(0.62)	91.9	54s	16.14(1.65)	5.54(0.24)	92.6	157s
	MCAR2	10.84(4.67)	13.87(0.60)	93.7	260s	3.15(0.83)	5.44(0.24)	92.9	2290s
nC= 3	Model	0.67(0.66)	5.91(0.27)	94.6	24s	0.31(0.44)	2.35(0.14)	94.9	55s
	MCAR1	15.42(2.17)	12.66(0.59)	60.6	82s	14.10(1.48)	4.43(0.28)	30.5	300s
	MCAR2	2.13(2.14)	6.53(0.30)	92.3	390s	0.64(0.73)	2.16(0.13)	92.7	3387s

Overall, our method performs better than MCAR on the kind of data we generate, especially when $n$ and/or $T$ is small, with much smaller bias and variance as well as much shorter computation time. The performance of MCAR improves considerably as the model becomes more complex (i.e. larger $n_C$) and as $n$ and $T$ increase. For $n = 25, T = 25$ and $n_C = 3$, MCAR performs almost as well as our model; however, it takes almost an hour (3387 s) to obtain the estimates, while our method requires less than a minute (55 s). Moreover, MCAR appears to require a larger MCMC sample size for its coverage probabilities to reach the nominal level as the model becomes more complex, whereas the coverage probabilities of our model remain stable and close to the nominal level in all cases.

4 Analysis of the cancer cell growth data

Cancer cell behaviour is believed to be determined by several factors, including genetic profile and differentiation state. However, the presence of other cancer cells and non-cancer cells has also been shown to have a great impact on overall tumor behaviour [34, 35]. It is therefore important to be able to dissect and quantify these interactions in complex culture systems. The data sets in this section come from a cancer cell-fibroblast co-culture experiment. They consist of counts of cells of each type (different cancer cell populations expressing different fluorescent proteins, and non-fluorescent fibroblasts) from 9 consecutive images taken every 8 hours over a period of 3 days using the Operetta high-content imager (Perkin Elmer). Information on cell type (fluorescent profile) and spatial coordinates for each individual cell was extracted using the associated software (Harmony, Perkin Elmer).

Each image was subsequently tiled using a 25×25 regular grid.

We choose the number of tiles to balance the fit of the model against the ability to capture local impacts between cell populations. Smaller tiles make it possible to detect local impacts between cell populations, which is one of the objectives of our analysis. However, if the tiles are too small, most tiles will contain no cells, and the conditional Poisson model would not fit the data well. On the other hand, when the tiles are too large, the model fits the data well (the conditional Poisson model is then approximately a conditional normal model), but information on local impacts is lost. We recommend an average of 0 to 20 cells per tile, since for such a choice our diagnostic and goodness-of-fit analyses suggest that the conditional Poisson model fits the data well whilst enabling us to measure local correlation effects between populations.
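The tiling step can be sketched as follows, assuming that the exported data give one (x, y) centroid and one integer cell-type label per cell. The function `tile_counts`, the 1000 × 1000 image extent and the uniformly scattered synthetic cells are hypothetical; the Harmony export format is not described here.

```python
import numpy as np

def tile_counts(x, y, cell_type, n_types, width, height, grid=25):
    """Count cells of each type in each tile of a grid x grid regular lattice.

    Returns an array of shape (n_types, grid, grid); entry [c, r, k] is the
    number of type-c cells whose centroid falls in tile row r, column k.
    """
    # Map coordinates to tile indices; clip so points on the far edge stay inside.
    col = np.clip((np.asarray(x) / width * grid).astype(int), 0, grid - 1)
    row = np.clip((np.asarray(y) / height * grid).astype(int), 0, grid - 1)
    counts = np.zeros((n_types, grid, grid), dtype=int)
    # np.add.at accumulates correctly when several cells fall in the same tile.
    np.add.at(counts, (np.asarray(cell_type), row, col), 1)
    return counts

rng = np.random.default_rng(0)
n_cells = 6000
x = rng.uniform(0, 1000, n_cells)   # hypothetical image width of 1000 units
y = rng.uniform(0, 1000, n_cells)
types = rng.choice(3, size=n_cells, p=[0.25, 0.25, 0.5])  # R:G:F seeded 1:1:2
counts = tile_counts(x, y, types, n_types=3, width=1000, height=1000)

# Average number of cells (all types) per tile: 6000 cells / 625 tiles = 9.6,
# which can be compared with the recommended range above.
print(counts.sum(axis=0).mean())
```

Changing `grid` is the only knob needed to explore the trade-off described above between sparse tiles and loss of local information.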

4.1 Cancer cell-fibroblast co-culture experiment

In this experiment, cancer cells are co-cultured with fibroblasts, a predominant cell type in the tumor microenvironment that is believed to affect tumor progression, partly due to interactions with and activation by cancer cells [34]. Fibroblasts (F) are non-fluorescent, whereas cancer cells fluoresce in either the red (R) or the green (G) channel due to the experimental expression of mCherry or GFP proteins, respectively. Cells were initially seeded at a ratio of 1:1:2 (R:G:F).

Model selection and inference. We applied our methodology to quantify the magnitude and direction of the impacts that the considered cell types have on each other's growth. To select the relevant terms in the intensity expression (1), we carry out model selection using the Bayesian information criterion (BIC). Table 5 shows the estimated parameters for the full and the BIC-selected models, with bootstrap 95% confidence intervals in parentheses. Figure 2 illustrates the estimated spatio-temporal impacts between cell types as a directed graph, in which solid and dashed arrows represent significant and non-significant impacts, respectively, at the 95% confidence level. The significant impacts coincide with the parameters selected by BIC.
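To make the BIC comparison concrete, the sketch below computes BIC = −2ℓ + k log N for Poisson log-linear fits of a single cell type's counts. It relies on the fact, visible in the block-diagonal Hessian of Proposition 1 in the Appendix, that the likelihood separates across cell types, so each type's equation can be fitted on its own. The Newton-Raphson fitter `fit_poisson`, the covariates `s1` and `s2` and the simulated counts are hypothetical stand-ins, not the authors' implementation.

```python
import math
import numpy as np

def fit_poisson(X, y, n_iter=100, tol=1e-10):
    """Poisson log-linear MLE by Newton-Raphson (column 0 of X is the intercept)."""
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start at the intercept-only fit
    for _ in range(n_iter):
        lam = np.exp(X @ beta)
        score = X.T @ (y - lam)
        info = X.T @ (X * lam[:, None])  # negative Hessian of the log-likelihood
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def poisson_loglik(X, y, beta):
    eta = X @ beta
    log_fact = sum(math.lgamma(v + 1.0) for v in y)
    return float(np.sum(y * eta - np.exp(eta)) - log_fact)

def bic(X, y):
    """BIC = -2 * log-likelihood + (number of parameters) * log(number of observations)."""
    beta = fit_poisson(X, y)
    return -2.0 * poisson_loglik(X, y, beta) + X.shape[1] * math.log(len(y))

# Hypothetical data: two candidate lagged statistics, only s1 affects the counts.
rng = np.random.default_rng(1)
N = 2000
s1 = rng.uniform(0, 2, N)
s2 = rng.uniform(0, 2, N)
y = rng.poisson(np.exp(-0.5 + 0.8 * s1))
ones = np.ones(N)
candidates = {
    "s1": np.column_stack([ones, s1]),
    "s1 + s2": np.column_stack([ones, s1, s2]),
    "s2": np.column_stack([ones, s2]),
}
scores = {name: bic(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)
print(scores, "selected:", best)
```

The log N penalty is what removes weak cross-type terms such as $\hat\beta^{(G|F)}$; with many cell types an exhaustive search over subsets grows quickly, which motivates the penalized approach discussed in Section 5.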

The interactions within each cell type ($\hat\beta^{(c|c)}$, $c = R, G, F$) are significant, which is consistent with healthy growing cells. As anticipated, the effects $\hat\beta^{(c|c)}$ for the cancer cells are larger than those for the slower-growing fibroblasts. The validity of the estimated parameters is also supported by the similar sizes of the parameters for the green and red cancer cells, which is expected since the red and green cancer cells are biologically identical except for the fluorescent protein they express. The estimated effects within both types of cancer cells ($\hat\beta^{(c|c)}$, $c = R, G$) are larger than the impact they have on one another ($\hat\beta^{(G|R)}$ and $\hat\beta^{(R|G)}$). This is not surprising, since $\hat\beta^{(c|c)}$ ($c = R, G$) reflects not only impacts between cells from the same population but also cell proliferation. The fact that we are able to detect impacts between the red and green cancer cells indicates that our methodology is sensitive enough to detect biologically relevant impacts, even though no interactions were found between the cancer cells and the fibroblasts. The absence of such interactions might be due to our use of normal fibroblasts, which had not previously been in contact with cancer cells and thus had not been activated to support tumor progression, as is the case for cancer-activated fibroblasts.


Figure 2:

Directed graph showing fitted spatio-temporal interactions between GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F). Solid and dashed arrows represent significant and non-significant interactions between cell types, respectively, at the 95% confidence level.

Goodness-of-fit and one-step-ahead prediction. To illustrate the goodness-of-fit of the estimated model, we generate cell counts $\hat y_{i,t}^{(c)}$ for each type in each tile from the $\mathrm{Pois}(\hat\lambda_{i,t}^{(c)})$ distribution for $t \ge 1$, where $\hat\lambda_{i,t}^{(c)}$ is computed from the observations at time $t-1$, with parameters estimated from the entire dataset. In Figure 4, we compare the observed and generated cell counts for GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F) across the entire image. The solid and dashed curves are close for all cell types, suggesting that the model fits the data reasonably well. As anticipated, the overall growth rates of the red and green cancer cells are similar and noticeably larger than that of the fibroblasts.
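The generation step can be sketched as follows. The intensity uses the form written in eq. (9) of the Appendix, $\lambda_{i,t}^{(c)} = \exp\{\alpha^{(c)} + \sum_{c'} \beta^{(c|c')} \sum_{j \in N_{i,1}} \log(y_{j,t-1}^{(c')} + 1)\}$, with $N_{i,1}$ taken to be the tile itself plus its four edge neighbours (our reading of $\|i - j\| \le 1$ on the lattice). The parameter values and the placeholder "observed" array are hypothetical; if eq. (2) in the main text scales the neighbourhood statistic differently, the helper `neighbour_log_sum` would need to change accordingly.

```python
import numpy as np

def neighbour_log_sum(y):
    """For each tile, sum log(1 + y) over the tile and its 4 edge neighbours.

    y: array of shape (n_types, n, n). Tiles outside the image contribute 0,
    so boundary tiles have smaller neighbourhoods.
    """
    z = np.log1p(y)
    s = z.copy()
    s[:, 1:, :] += z[:, :-1, :]   # neighbour above
    s[:, :-1, :] += z[:, 1:, :]   # neighbour below
    s[:, :, 1:] += z[:, :, :-1]   # neighbour to the left
    s[:, :, :-1] += z[:, :, 1:]   # neighbour to the right
    return s

def one_step_intensity(y_prev, alpha, beta):
    """lambda[c] = exp(alpha[c] + sum_c' beta[c, c'] * S[c']) computed from time t - 1."""
    s = neighbour_log_sum(y_prev)                        # shape (nC, n, n)
    eta = alpha[:, None, None] + np.einsum("ck,kij->cij", beta, s)
    return np.exp(eta)

def generate_counts(y_obs, alpha, beta, rng):
    """Draw y_hat[t] ~ Pois(lambda_hat[t]) from observed counts at t - 1 (t >= 1; y_hat[0] stays 0)."""
    y_hat = np.zeros_like(y_obs)
    for t in range(1, y_obs.shape[0]):
        y_hat[t] = rng.poisson(one_step_intensity(y_obs[t - 1], alpha, beta))
    return y_hat

# Hypothetical parameters for 3 cell types (row: affected type c, column: source type c').
alpha = np.array([-1.0, -1.0, -0.5])
beta = np.array([[0.20, 0.05, 0.00],
                 [0.05, 0.20, 0.00],
                 [0.00, 0.00, 0.15]])
rng = np.random.default_rng(7)
y_obs = rng.poisson(2.0, size=(9, 3, 25, 25))   # placeholder data: 9 time points
y_hat = generate_counts(y_obs, alpha, beta, rng)
print(y_obs[1:].sum(axis=(2, 3)))   # observed totals per time point and type
print(y_hat[1:].sum(axis=(2, 3)))   # generated totals, as compared in Figure 4
```

Summing over tiles, as in the last two lines, gives the whole-image curves that are plotted against each other in a goodness-of-fit figure.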

To assess the prediction performance of our method, we consider one-step-ahead forecasting using parameters estimated from a moving window of five time points. Figure 3 shows quantiles of the observed cell counts against the predicted counts for each tile. The lower and upper 95% confidence bounds are computed non-parametrically as $\hat F_1^{-1}(\hat F_0(y_t^{(c)}) - 0.95)$ and $\hat F_1^{-1}(\hat F_0(y_t^{(c)}) + 0.95)$, where $\hat F_0$ and $\hat F_1$ are the empirical distributions of the observations and predictions at time $t$, respectively [36]. The identity line falls within the confidence bands in each plot, indicating satisfactory prediction performance.
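The points of a QQ-plot such as those in Figure 3 can be obtained by matching empirical quantiles of the observed and predicted per-tile counts at a common grid of probability levels. The sketch below assumes NumPy's default (linear interpolation) quantile definition and omits the confidence bands; `qq_pairs` and the Poisson toy data are hypothetical.

```python
import numpy as np

def qq_pairs(observed, predicted, n_points=99):
    """Empirical quantiles of observed and predicted counts at common probability levels."""
    probs = np.linspace(0.01, 0.99, n_points)
    q_obs = np.quantile(np.ravel(observed), probs)
    q_pred = np.quantile(np.ravel(predicted), probs)
    return q_obs, q_pred

rng = np.random.default_rng(3)
obs = rng.poisson(4.0, size=(25, 25))    # observed counts per tile at time t
pred = rng.poisson(4.0, size=(25, 25))   # a well-calibrated one-step-ahead forecast
q_obs, q_pred = qq_pairs(obs, pred)
# Points close to the identity line (q_pred == q_obs) indicate similar distributions.
print(np.max(np.abs(q_obs - q_pred)))
```

Plotting `q_obs` on the horizontal axis and `q_pred` on the vertical axis reproduces the layout described in the Figure 3 caption.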

Figure 3:

QQ-plots for cell growth, comparing observed (horizontal axis) and one-step-ahead predicted (vertical axis) cell counts per tile on the entire image at times t = 6, 7, 8 for GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F). One-step-ahead predictions are based on the model fitted using a moving window of five time points.

Comparison with MCAR model. Next, we compare the estimates and the goodness-of-fit of our model on the real data with those of the MCAR model. Parameter estimates are shown in Table 5, with 95% confidence intervals given in parentheses. Results from both models are mostly consistent: both show that impacts within each cell type ($\hat\beta^{(c|c)}$, $c = R, G, F$) are significant, that the effects $\hat\beta^{(c|c)}$ for cancer cells are larger than those for the slower-growing fibroblasts, that the green and red cancer cells have a positive impact on each other, and that cancer cells have no impact on fibroblasts. The only difference is that the MCAR model shows a negative impact of fibroblasts on the green cancer cells only, while our model detects no significant impact of fibroblasts on either cancer cell type. Since the red and green cancer cells are biologically identical except for the fluorescent protein they express, we expect symmetric results for the two cancer cell populations.

Table 5:

Estimated parameters for the full model, the BIC-selected model and the MCAR model, based on the cancer cell growth data described in Section 4. Bootstrap 95% confidence intervals based on 50 bootstrap samples are given in parentheses; "/" marks parameters excluded by BIC.

	Full model
c =	G	R	F
α̂(c)	−0.99 (−1.19, −0.79)	−0.50 (−0.70, −0.30)	−0.26 (−0.45, −0.06)
β̂(G|c)	1.23 (1.10, 1.35)	0.34 (0.21, 0.48)	0.12 (−0.03, 0.27)
β̂(R|c)	0.28 (0.17, 0.38)	1.09 (0.96, 1.21)	0.02 (−0.09, 0.13)
β̂(F|c)	0.10 (−0.01, 0.21)	0.02 (−0.07, 0.12)	0.92 (0.81, 1.03)

	BIC model
c =	G	R	F
α̂(c)	−0.88 (−1.04, −0.72)	−0.49 (−0.66, −0.31)	−0.19 (−0.36, −0.02)
β̂(G|c)	1.24 (1.11, 1.37)	0.35 (0.21, 0.48)	/
β̂(R|c)	0.28 (0.17, 0.39)	1.09 (0.96, 1.21)	/
β̂(F|c)	/	/	0.93 (0.82, 1.04)

	MCAR
c =	G	R	F
α̂(c)	−0.45 (−0.54, −0.38)	−0.45 (−0.54, −0.38)	−0.45 (−0.54, −0.38)
β̂(G|c)	1.06 (0.93, 1.16)	0.16 (0.09, 1.16)	−0.15 (−0.22, −0.09)
β̂(R|c)	0.25 (0.15, 0.31)	1.01 (0.92, 1.08)	0.05 (−0.02, 0.10)
β̂(F|c)	0.03 (−0.06, 0.20)	0.03 (−0.07, 0.19)	0.96 (0.83, 1.08)

Figure 4 shows, in addition to the observed (solid curve) and generated (dashed curve) cell counts from our model, the cell counts generated from the MCAR model (dotted curve) for the green cancer cells (G), red cancer cells (R) and fibroblasts (F) across the entire image. The dashed curves are slightly closer to the solid ones than the dotted curves are, which suggests that our model may be more appropriate than the MCAR model for analysing this type of data.


Figure 4:

Goodness-of-fit of the estimated models. Observed (solid) and predicted (dashed for our model, dotted for the MCAR model) numbers of GFP cancer cells (G), mCherry cancer cells (R) and fibroblasts (F) for the entire image. Predicted cell counts $\hat y_{i,t}^{(c)}$ for each cell type in each tile are generated from the conditional Poisson model with intensity $\hat\lambda_{i,t}^{(c)}$ defined in eqs. (1) and (2), where the coefficients $\hat\beta^{(c|c')}$ are estimated from the entire dataset.

5 Conclusion and final remarks

In this paper, we introduced a conditional spatial autoregressive model and accompanying inference tools for multivariate spatio-temporal cell count data. The new methodology enables one to measure the overall cell growth rate in longitudinal experiments and spatio-temporal interactions with either homogeneous or heterogeneous cell populations. The proposed inference approach is computationally tractable and strikes a good balance between computational feasibility and statistical accuracy. Numerical findings from simulated and real data in Sections 3 and 4 confirm the validity of the proposed approach in terms of prediction, goodness-of-fit and estimation accuracy.

The data sets described in this paper serve as a proof of concept that the proposed methodology works. However, the potential applications and the relevant questions that the methodology can help to answer in cancer cell biology are plentiful. Building on the examples given in this paper, the methodology can be used to study interactions between cancer cells and a wide range of cancer-relevant cell types, such as cancer-activated fibroblasts, macrophages and other immune cells, when co-cultured. Since a substantial proportion of cancer cells in tumors are in close proximity to other cell types that have been shown to affect tumor progression, such co-cultures are more representative of the situation in a patient than cancer cells studied on their own. Beyond reporting the final cell number, the presented approach can dissect which cell types affect the growth of others, and to what extent, in complex heterogeneous populations. This could be relevant in a drug discovery setting to determine whether a drug affects cancer cell growth through internal effects (on other cancer cells) or by interfering with the interaction between the cancer cells and other cell types. Drugs with different targets and mechanisms of action are particularly sought after, as they provide a wider target profile, increasing the chance of patients responding and reducing the risk of tumors becoming resistant. The impact of different genes and associated pathways in different cell types on inter-cellular interactions can also be studied by genetically modifying the cell type(s) in question before mixing the cells together, which could help identify new potential drug targets. Our approach is also applicable to other studies where local spatial cell-cell interactions are believed to affect cell growth, such as studies of neurodegenerative diseases [37] and wound healing/tissue regeneration [38].
In addition to evaluating cell growth, our approach can also be used to study transitions between cellular phenotypes upon interaction with other cell types, provided that the phenotypes studied can be distinguished from one another in the image data. Finally, issues may arise when cells become too confluent (dense), as this may cause segmentation problems in the imaging system; cells that become completely confluent are also likely to progressively stop growing. To measure over longer periods of time, experiments can be performed in larger wells/plates or with smaller starting cell numbers.

Our method offers several practical advantages to researchers interested in analysing multivariate count data on heterogeneous cell populations. First, the conditional Poisson model does not require tracking individual cells across time, a process that is often difficult to automate due to cell movement, morphology changes between time points, and complications related to the storage of large data files. Second, we are able to quantify local spatio-temporal interactions between different cell populations from a very simple experimental set-up in which the different cell populations are grown together in a single experimental condition (co-culture). An alternative, purely experimental strategy would require monitoring the different cell types alone and together at different cell densities (number of cells per condition) in order to infer potential interactions. However, such an approach would not allow evaluation of the spatial relations in the co-culture conditions and would still restrict the number of simultaneously tested cell types to two.

In the future, we foresee several useful extensions of the current methodology, possibly enabling the treatment of more complex experimental settings. First, complex experiments involving a large number of cell populations, $n_C$, would imply an over-parametrized model, since the number of parameters grows as $n_C(n_C+1)$. This large number of parameters would be detrimental to both statistical accuracy and reliable optimization of the likelihood objective function $\ell_n(\theta)$ in (4). To address these issues, we plan to explore a penalized likelihood of the form $\ell_n(\theta) - \mathrm{pen}_\lambda(\theta)$, where $\mathrm{pen}_\lambda(\theta)$ is a nonnegative sparsity-inducing penalty function. For example, in a different likelihood setting, Bardic et al. [39] consider the $L_1$-type penalty $\mathrm{pen}_\lambda(\theta) = \lambda \sum_k |\theta_k|$, $\lambda > 0$.
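A minimal sketch of the penalized objective for a single Poisson log-linear equation is given below, using proximal gradient descent (ISTA) with soft-thresholding. The choice of solver, the backtracking step rule and leaving the intercept unpenalized are our assumptions for illustration; the text above only proposes the penalized likelihood. The design matrix of four hypothetical neighbourhood statistics and the simulated counts are also made up.

```python
import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * |.|: shrink each entry towards zero by tau."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def l1_poisson(X, y, lam, n_iter=5000, tol=1e-9):
    """Minimise mean(exp(X th) - y * X th) + lam * sum_{k >= 1} |th_k| by ISTA.

    Column 0 of X is an intercept and is not penalised (an assumption of this sketch).
    Step sizes are chosen by backtracking on the quadratic upper bound.
    """
    n_obs, p = X.shape
    penalised = np.arange(p) > 0
    theta = np.zeros(p)
    theta[0] = np.log(y.mean())

    def smooth(th):
        eta = X @ th
        return np.mean(np.exp(eta) - y * eta)   # negative log-likelihood / n, up to a constant

    step = 0.1
    for _ in range(n_iter):
        grad = X.T @ (np.exp(X @ theta) - y) / n_obs
        f_old = smooth(theta)
        while True:
            z = theta - step * grad
            new = np.where(penalised, soft_threshold(z, step * lam), z)
            diff = new - theta
            if smooth(new) <= f_old + grad @ diff + diff @ diff / (2 * step):
                break
            step *= 0.5
        theta = new
        if np.max(np.abs(diff)) < tol:
            break
    return theta

rng = np.random.default_rng(11)
n_obs = 3000
S = rng.uniform(0, 2, size=(n_obs, 4))          # four hypothetical candidate statistics
X = np.column_stack([np.ones(n_obs), S])
y = rng.poisson(np.exp(-0.5 + 0.8 * S[:, 0]))   # only the first statistic matters
theta = l1_poisson(X, y, lam=0.05)
print(np.round(theta, 3))   # irrelevant coefficients are typically set exactly to zero
```

Because the soft-thresholding step returns exact zeros, the penalty performs estimation and term selection in one pass, which is the property that would replace exhaustive BIC searches when $n_C$ is large.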

Second, for certain experiments, it would be desirable to modify the statistics in eq. (2) to include additional information on cell growth such as the distance between heterogeneous cells, and covariates describing cell morphology.

Third, it would be useful to develop a more principled way to select the tile size (or number of tiles), and to consider tiling the microscope image with a hexagonal lattice, which is a more natural choice in real applications, since the distances between neighboring tiles would be more even than on a regular lattice.

Finally, although numerical results (not reported here) show that our method is quite robust in the presence of mild outliers (around 5% contaminated data), we expect that severe or numerous outliers will have some influence on the estimates, since the Poisson score function is unbounded. To address this problem, the log-likelihood scores in eq. (5) should be replaced by a robust alternative. Following Ferrari and La Vecchia [29] and La Vecchia et al. [30], robustness can be obtained by the so-called q-entropy estimation method, obtained simply by replacing the usual logarithm in the log-likelihood estimating equation by the q-logarithm function

$$\log_q(x) = \begin{cases} \dfrac{x^{1-q} - 1}{1-q}, & q \ne 1, \\[4pt] \log(x), & q = 1, \end{cases} \qquad x > 0.$$

This ensures a bounded influence function for the implied estimator and therefore guarantees control of the bias under contamination.
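The effect of the q-logarithm can be illustrated in the simplest possible setting, an i.i.d. Poisson mean; this is our simplification and not the spatio-temporal model itself. Differentiating $\log_q f(y;\lambda)$ gives $f(y;\lambda)^{1-q}$ times the usual score $y/\lambda - 1$, so each observation's score is down-weighted by its likelihood raised to the power $1-q$. The value q = 0.9, the data and the bracketing interval are hypothetical choices; for q < 1 the estimating equation can have several roots, so the sketch only searches an interval around the bulk of the data.

```python
import math

def log_q(x, q):
    """q-logarithm: (x**(1 - q) - 1) / (1 - q) for q != 1, natural log for q == 1."""
    if x <= 0:
        raise ValueError("log_q is defined for x > 0")
    if q == 1:
        return math.log(x)
    return (x ** (1 - q) - 1) / (1 - q)

def poisson_logpmf(y, lam):
    return y * math.log(lam) - lam - math.lgamma(y + 1)

def lq_score(lam, data, q):
    """Sum of f(y; lam)^(1 - q) * (y / lam - 1), the derivative of sum log_q f(y; lam)."""
    return sum(math.exp((1 - q) * poisson_logpmf(y, lam)) * (y / lam - 1) for y in data)

def lq_poisson_mean(data, q, lo, hi, tol=1e-10):
    """Bisection for a root of lq_score in [lo, hi]; requires a sign change there."""
    g_lo = lq_score(lo, data, q)
    if g_lo * lq_score(hi, data, q) > 0:
        raise ValueError("no sign change in the bracket")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = lq_score(mid, data, q)
        if g_mid * g_lo > 0:
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

clean = [3, 4, 5, 4, 3, 5, 4, 6, 2, 4]
contaminated = clean + [100]                     # one gross outlier
print(sum(contaminated) / len(contaminated))     # MLE (sample mean): 140 / 11, about 12.7
print(lq_poisson_mean(contaminated, q=0.9, lo=0.5, hi=12.0))   # stays close to 4
```

With q = 1 the weights are all one and the root is the ordinary MLE, so q acts as a tuning parameter trading efficiency at the model for robustness to contamination.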

Funding statement: This article was supported by the Australian National Health and Medical Research Council (1049561, 1064987 and 1069024).

Acknowledgements

The authors wish to acknowledge support from the Australian National Health and Medical Research Council grants 1049561, 1064987 and 1069024 to Frédéric Hollande. Christina Mølck is supported by the Danish Cancer Society.

Appendix

In the first part of this section, we provide technical lemmas required to prove asymptotic properties of the estimator θˆn.

Denote by $E_t[\cdot]$ the expectation with respect to $Y_t = \{Y_{i,t}, i \in L_n\}$, and by $E[\cdot]$ the expectation with respect to $Y = \{Y_t, t = 1, \dots, T\}$. Let $N_{i,r}$ be the set of tiles in the neighborhood of radius $r$ around tile $i$; specifically, for two locations $i$ and $j$, we say $j \in N_{i,r}$ if $\|i - j\| \le r$. Thus, the neighborhood defined in Section 2 has radius 1, i.e. $\{j : j \sim i\} = \{j : j \in N_{i,1}\}$. Denote $n_r = \max_{i \in L_n} |N_{i,r}| = 2r^2 + 2r + 1$; in fact, $|N_{i,r}| = n_r$ for any tile $i$ that is not on the boundary of the image.
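The count $n_r = 2r^2 + 2r + 1$ (also used in Lemma 4 through $n_{2T} = 8T^2 + 4T + 1$) is the number of lattice points within L1 (city-block) distance $r$ of a tile. The short check below enumerates lattice offsets to confirm this; reading $\|\cdot\|$ as the L1 norm is our assumption, since the section does not name the norm, and the Euclidean ball agrees with the formula only for $r \le 2$.

```python
def ball_size(r, norm):
    """Number of integer offsets (a, b) with norm(a, b) <= r, i.e. |N_{i,r}| for an interior tile."""
    return sum(1 for a in range(-r, r + 1) for b in range(-r, r + 1) if norm(a, b) <= r)

def l1_norm(a, b):
    return abs(a) + abs(b)

def l2_norm(a, b):
    return (a * a + b * b) ** 0.5

for r in range(1, 6):
    # Columns: r, formula 2r^2 + 2r + 1, count under the L1 norm, count under the Euclidean norm.
    print(r, 2 * r * r + 2 * r + 1, ball_size(r, l1_norm), ball_size(r, l2_norm))
```

The Euclidean count first departs from the formula at r = 3 (29 tiles versus 25).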

In the remainder of this paper we use the following assumptions:

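A.1: The parameter space $\Theta$ is a compact subset of $\mathbb{R}^p$, and $\theta_0$ is the unique maximiser of $\ell(\theta) = \lim_{n_L \to \infty} \ell_n(\theta)$.
A.2: The $(n_C+1) \times n_L T$ matrix $(\nabla v_{1,1}, \nabla v_{1,2}, \dots, \nabla v_{1,T}, \nabla v_{2,1}, \dots, \nabla v_{n,T})$ has full rank.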

Lemma 1.

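Let $Y_1, \dots, Y_n$ be independent Poisson random variables with means $\lambda_1, \dots, \lambda_n$, respectively, where $n$ is a finite positive integer. Then for any positive integer $h$,

$$E\Big[\max_{i=1,\dots,n} Y_i^h\Big] \le n^h \max_{i=1,\dots,n} E\big[Y_i^h\big].$$

Proof.

$$E\Big[\max_{i=1,\dots,n} Y_i^h\Big] \le E\Big[\Big(\sum_{i=1}^n Y_i\Big)^h\Big] \le n^{h-1} E\Big[\sum_{i=1}^n Y_i^h\Big] \ \text{(convexity)} \le n^h \max_{i=1,\dots,n} E\big[Y_i^h\big].$$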

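Lemma 2.

Denote $\tilde Y_{N_{i,r},t} = \max_{j \in N_{i,r},\, c \in C} Y_{j,t}^{(c)}$, with corresponding observation $\tilde y_{N_{i,r},t}$ and conditional mean $\tilde\lambda_{N_{i,r},t}$. Then

$$E\big(\tilde Y_{N_{i,r},t} + 1\big)^B \le w_{r,t} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha} \big(1 + \tilde y_{N_{i,r+t},0}\big)^{Bk}, \qquad t = 1, 2, \dots, T, \tag{8}$$

where

$$f_t(k) = \sum_{h=k/B}^{B^{t-1}} e^{\tilde\alpha h}\, g(k, Bh)\, f_{t-1}(h), \quad g(a,b) = \sum_{h=a}^{b} \binom{b}{h} \left\{ {h \atop a} \right\}, \quad f_1(k) = g(k,B) = \sum_{h=k}^{B} \binom{B}{h} \left\{ {h \atop k} \right\}, \quad w_{r,t} = \prod_{k=0}^{t-1} n_{r+k}^{2 n_{r+k}},$$

$\left\{ {\cdot \atop \cdot} \right\}$ denotes the Stirling number of the second kind, $\tilde\alpha = \max_{c \in C} \alpha^{(c)}$ and $B = \max_c \big(\sum_{c' \in C} \beta^{(c|c')}\big)\, n_1$.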

Proof

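$$\lambda_{i,t}^{(c)} = \exp\Big\{\alpha^{(c)} + \sum_{c' \in C} \beta^{(c|c')} \sum_{j \in N_{i,1}} \log\big(y_{j,t-1}^{(c')} + 1\big)\Big\} \le e^{\tilde\alpha} \big(\tilde y_{N_{i,1},t-1} + 1\big)^B. \tag{9}$$

Similarly, for any $c \in C$, we have $\lambda_{N_{i,r},t}^{(c)} \le e^{\tilde\alpha} \big(\tilde y_{N_{i,r+1},t-1} + 1\big)^B$, since $\{j' \in N_{j,1} : j \in N_{i,r}, i \in L_n, r > 0\} = \{j \in N_{i,r+1} : i \in L_n, r > 0\}$.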

Next, we proceed by induction.

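For $T = 1$, by the conditional independence assumption and Lemma 1, we have

$$\begin{aligned}
E_{T-1}\Big[E_T\big[(\tilde Y_{N_{i,r},T} + 1)^B \mid Y_{T-1}\big]\Big] &= E_{T-1}\Big[\sum_{h=0}^{B} \binom{B}{h} E_T\Big[\max_{j \in N_{i,r},\, c \in C} \big(Y_{j,T}^{(c)}\big)^h\Big] \,\Big|\, Y_{T-1}\Big] \\
&< n_r^{2n_r}\, E_{T-1}\Big[\sum_{k=0}^{B} \sum_{h=k}^{B} \binom{B}{h} \left\{ {h \atop k} \right\} \tilde\lambda_{N_{i,r},T}^{k}\Big] \\
&\le w_{r,1} \sum_{k=0}^{B} f_1(k)\, e^{k\tilde\alpha}\, E_{T-1}\big(1 + \tilde Y_{N_{i,r+1},T-1}\big)^{Bk}.
\end{aligned}$$

Since $T - 1 = 0$ and $Y_t$ has constant entries at time point 0, $E_{T-1}\big(1 + \tilde Y_{N_{i,r+1},T-1}\big)^{Bk} = \big(1 + \tilde y_{N_{i,r+1},0}\big)^{Bk}$.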

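Suppose eq. (8) is true for $T = t$; then for $T = t + 1$ we have

$$\begin{aligned}
&E_{T-t-1} E_{T-t} E_{T-t+1} \cdots E_T\big[(\tilde Y_{N_{i,r},T} + 1)^B \mid Y_{T-1}, \dots, Y_{T-t-1}\big] \\
&\quad\le E_{T-t-1}\Big[w_{r,t} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha}\, E_{T-t}\big(1 + \tilde Y_{N_{i,r+t},T-t}\big)^{Bk} \,\Big|\, Y_{T-t-1}\Big] \\
&\quad= w_{r,t} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha} \sum_{k'=0}^{Bk} \binom{Bk}{k'} E_{T-t-1}\Big[E_{T-t}\big(\tilde Y_{N_{i,r+t},T-t}^{k'}\big) \,\Big|\, Y_{T-t-1}\Big] \\
&\quad\le w_{r,t} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha} \sum_{k'=0}^{Bk} \binom{Bk}{k'} E_{T-t-1}\Big[n_{r+t}^{2n_{r+t}} \max_{j \in N_{i,r+t},\, c \in C} E_{T-t}\big(Y_{j,T-t}^{(c)}\big)^{k'} \,\Big|\, Y_{T-t-1}\Big] \\
&\quad= w_{r,t+1} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha} \sum_{k''=0}^{Bk} \sum_{k'=k''}^{Bk} \binom{Bk}{k'} \left\{ {k' \atop k''} \right\} E_{T-t-1}\big[\tilde\lambda_{N_{i,r+t},T-t}^{k''}\big] \\
&\quad\le w_{r,t+1} \sum_{k''=0}^{B^{t+1}} \sum_{k=k''/B}^{B^t} f_t(k)\, e^{k\tilde\alpha}\, g(k'', Bk)\, e^{k''\tilde\alpha}\, E_{T-t-1}\big(1 + \tilde Y_{N_{i,r+t+1},T-t-1}\big)^{Bk''} \\
&\quad= w_{r,t+1} \sum_{k''=0}^{B^{t+1}} f_{t+1}(k'')\, e^{k''\tilde\alpha} \big(1 + \tilde y_{N_{i,r+t+1},0}\big)^{Bk''}.
\end{aligned}$$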

Lemma 3.

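Given Assumption A.1, for any finite constants $a, b \ge 0$ and $\theta \in \Theta$, $E\big[(\lambda_{i,t}^{(c)})^a (S_{i,t-1}^{(c')})^b\big] < \infty$ for all $c, c' \in C$, $i \in L_n$ and $t = 1, \dots, T$.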

Proof.

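By the definition of $f_t(k)$ given in Lemma 2, $f_t(k)$ is bounded for all bounded $t$ under Assumption A.1. Thus, Lemma 2 implies

$$\begin{aligned}
E\big[(\lambda_{i,t}^{(c)})^a (S_{i,t-1}^{(c')})^b\big] &= E\Big[\Big(\sum_{j \in N_{i,1}} \log\big(1 + Y_{j,t-1}^{(c')}\big)\Big)^b (\lambda_{i,t}^{(c)})^a\Big] \le E\Big[\big(1 + \tilde Y_{N_{i,1},t-1}\big)^{bB} (\lambda_{i,t}^{(c)})^a\Big] \\
&\le E\Big[e^{a\tilde\alpha} \big(1 + \tilde Y_{N_{i,1},t-1}\big)^{(a+b)B}\Big] \le e^{a\tilde\alpha} w_{1,t} \sum_{k=0}^{B^t} f_t(k)\, e^{k\tilde\alpha} \big(1 + \tilde y_{N_{i,1+t},0}\big)^{Bk} < \infty.
\end{aligned}$$

For simplicity, define the distance between tiles $i$ and $j$ as $d(i,j) = r$ if $r - 1 < \|i - j\| \le r$.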

Lemma 4.

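For any $i \in L_n$ and $t_1 = 1, \dots, T$,

$$\mathrm{Cov}(Y_{i,t_1}, Y_{j,t_2}) = 0 \quad \text{for all } j \in L_n,\ t_2 = 1, \dots, T, \ \text{if } d(i,j) > t_1 + t_2,$$

and

$$\big|\{(j, t_2) : \mathrm{Cov}(Y_{j,t_2}, Y_{i,t_1}) \ne 0;\ j \in L_n,\ t_2 = 1, \dots, T\}\big| \le T(8T^2 + 4T + 1).$$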

Proof.

Let $N_{i,t}^* = \{j : \mathrm{Cov}(Y_{j,0}, Y_{i,t}) \ne 0;\ j \in L\}$ be the collection of counts in tiles at time 0 that are correlated with the count in tile $i$ at time $t$ ($Y_{i,t}$). Due to the neighborhood structure of the autoregressive term described in Section 2, $N_{i,t}^*$ is a neighbourhood around tile $i$ with radius equal to $t$.

Since $Y_t$ has constant entries at time 0, we have $\mathrm{Cov}(Y_{i,t_1}, Y_{j,t_2}) = 0$ if $N_{i,t_1}^* \cap N_{j,t_2}^* = \emptyset$, which is true when $d(i,j) > t_1 + t_2$.

For any $(i, t_1) \in D_n$, $\{(j, t_2) : N_{i,t_1}^* \cap N_{j,t_2}^* \ne \emptyset\}$ is a neighborhood around tile $i$ with radius $t_1 + t_2$.

Since $n_r = 2r^2 + 2r + 1$, we have

$$\big|\{(j, t_2) : N_{i,t_1}^* \cap N_{j,t_2}^* \ne \emptyset\}\big| \le T\,\big|\{j : N_{i,T}^* \cap N_{j,T}^* \ne \emptyset\}\big| = T n_{2T} = T(8T^2 + 4T + 1).$$

In the second part of this section, we study the asymptotic properties of the estimator θˆn.

Proposition 1

(Existence and uniqueness) If Assumptions A.1 and A.2 hold, then there exists a unique maximiser of $\ell_n(\theta)$, denoted by $\hat\theta_n$.

Proof.

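First, since $\Theta$ is compact and $\ell_n(\theta)$ is continuous, at least one maximiser of $\ell_n(\theta)$ exists. Next, we prove that the maximiser is unique. The $p \times p$ Hessian matrix of $-\ell_n(\theta)$ can be written as the block-diagonal matrix

$$H_n(\theta) = -\nabla^2 \ell_n(\theta) = \begin{pmatrix} H_n^{(1)}(\theta) & 0 & \cdots & 0 \\ 0 & H_n^{(2)}(\theta) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & H_n^{(n_C)}(\theta) \end{pmatrix},$$

where $H_n^{(c)}(\theta) = \sum_{i \in L_n} \sum_{t=1}^T \exp\{v_{i,t}^{(c)}(\theta)\}\, \nabla v_{i,t} \nabla v_{i,t}^\top$ is an $(n_C+1) \times (n_C+1)$ matrix. Each matrix $\nabla v_{i,t} \nabla v_{i,t}^\top$ is positive semidefinite with rank 1. By Assumption A.2, $\sum_{i \in L_n} \sum_{t=1}^T \nabla v_{i,t} \nabla v_{i,t}^\top$ has full rank, so for any $x \ne 0$ at least one $\nabla v_{i,t}^\top x$ is non-zero. Since $\exp\{v_{i,t}^{(c)}(\theta)\} > 0$, it follows that $x^\top H_n^{(c)}(\theta) x = \sum_{i,t} \exp\{v_{i,t}^{(c)}(\theta)\} (\nabla v_{i,t}^\top x)^2 > 0$, i.e. $H_n^{(c)}(\theta)$ is positive definite for all $c \in C$ and $\theta \in \Theta$. This shows that $-\ell_n(\theta)$ is strictly convex, which implies that $\hat\theta_n$ is unique.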

Proposition 2

[Consistency] If the regularity assumption A.1 holds, then $\hat\theta_n \to_p \theta_0$ as $n_L \to \infty$.

Proof.

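We proceed by verifying the conditions of Theorem 2 in [31]. First, we show that the score functions are $L^p$-uniformly integrable for $p < 3$, i.e.

$$\lim_{n\to\infty} \sup_{i \in L_n,\, t = 1,\dots,T} \sup_{\theta \in \Theta} E\big[|u_{i,t}(\theta)|^p\, I\big(|u_{i,t}(\theta)| > k\big)\big] \to 0, \quad \text{as } k \to \infty. \tag{10}$$

The general form of each entry of $u_{i,t}(\theta)$ is $\big(\lambda_{i,t}^{(c)} - y_{i,t}^{(c)}\big) S_{i,t-1}^{(c')}$; taking $p = 3$, we have

which is finite by Lemma 3. This gives the $L^3$-boundedness of $u_{i,t}(\theta)$, i.e.

$$\lim_{n\to\infty} \sup_{i \in L_n,\, t = 1,\dots,T} \sup_{\theta \in \Theta} E\big[|u_{i,t}^{(c)}(\theta)|^3\big] < \infty,$$

which implies $L^p$-uniform integrability for $p < 3$.

Second, we show the stochastic equicontinuity of $u_{i,t}(y;\theta)$, i.e.

$$\lim_{n\to\infty} \sup_{i \in L_n,\, t = 1,\dots,T} P\Big(\sup_{\theta, \theta' \in \Theta,\ \|\theta - \theta'\| < \delta} |u_{i,t}(\theta) - u_{i,t}(\theta')| > \epsilon\Big) = 0.$$

The gradient $\nabla u_{i,t}(\theta)$ is a $p \times p$ matrix, each column of which is either $\frac{\partial \gamma_{i,t}(\theta)}{\partial \beta^{(c|c')}} \otimes \nabla v_{i,t}$ or $\frac{\partial \gamma_{i,t}(\theta)}{\partial \alpha^{(c)}} \otimes \nabla v_{i,t}$, with

$$\frac{\partial \gamma_{i,t}(\theta)}{\partial \beta^{(c|c')}} = \big(0, \dots, 0, \lambda_{i,t}^{(c)} S_{i,t}^{(c)}, 0, \dots\big), \qquad \frac{\partial \gamma_{i,t}(\theta)}{\partial \alpha^{(c)}} = \big(0, \dots, 0, \lambda_{i,t}^{(c)}, 0, \dots\big).$$

Thus, the non-zero entries of $E \sup_{\theta \in \Theta} |\nabla u_{i,t}(\theta)|$ have the general form $E \sup_{\theta \in \Theta} \lambda_{i,t}^{(c)} S_{i,t}^{(c)} S_{i,t}^{(c')}$, which are bounded by an argument analogous to that of Lemma 3.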

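Third, we check the α-mixing conditions. Let $U$ and $V$ be two subsets of $D_n$, and let $\sigma(U) = \sigma\{Y_{i,t}; (i,t) \in U\}$ be the σ-algebra generated by the random variables $Y_{i,t}$, $(i,t) \in U$.

Define

$$\alpha(U, V) = \sup\big\{|P(A \cap B) - P(A)P(B)| : A \in \sigma(U),\ B \in \sigma(V)\big\}.$$

Then the α-mixing coefficient for the random field $\{Y_{i,t}, i \in L_n, t = 1, \dots, T\}$ is defined as

$$\alpha(k, l, m) = \sup\big\{\alpha(U, V) : |U| \le k,\ |V| \le l,\ d(U, V) \ge m\big\}.$$

Following Bai et al. [40], in an $a$-dimensional space we need: (a) there exists $\delta > 0$ such that $\sum_{m=1}^\infty m^{a-1} \alpha(1,1,m)^{\delta/(2+\delta)} < \infty$; (b) for $k + l \le 4$, $\sum_{m=1}^\infty m^{a-1} \alpha(k,l,m) < \infty$; (c) there exists $\epsilon > 0$ such that $\alpha(1, \infty, m) = O(m^{-a-\epsilon})$, where $k, l, m \in \mathbb{N}$ and $d(U, V) = \min\{\|i - j\| : i \in U, j \in V\}$ is the distance between the sets $U$ and $V$.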

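For any fixed $i_1, \dots, i_k \in L_n$, $k < \infty$, and $t_1 = 0, \dots, T$, consider $U = \{Y_{i_1,t_1} = y_{i_1,t_1}, \dots, Y_{i_k,t_1} = y_{i_k,t_1}\}$ and $V = \{Y_{j,t_2} = y_{j,t_2};\ j \in L_n,\ t_2 = 0, \dots, T\}$; then $|U| = k$ and $|V| \to \infty$ as $n \to \infty$. By Lemma 4, we have $P(Y_{i,t_1} = y_{i,t_1}, Y_{j,t_2} = y_{j,t_2}) - P(Y_{i,t_1} = y_{i,t_1}) P(Y_{j,t_2} = y_{j,t_2}) = 0$ if $d(i,j) > t_1 + t_2$. Thus, $\alpha(U, V) = 0$ for any $|U| = k$ provided that $d(U, V) > 2T$, that is, $\alpha(k, \infty, m) = 0$ if $m > 2T$.

This implies all three mixing conditions: since $\alpha(k, l, m) \le \alpha(k, \infty, m) = 0$ for $m > 2T$, the series in (a) and (b) have only finitely many non-zero terms, and (c) holds trivially.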

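Finally, by Theorem 3 in Jenish and Prucha [31], the uniform integrability in eq. (10) and mixing condition (a) ensure that the score functions $u_{i,t}(y;\theta)$ satisfy a pointwise law of large numbers in the sense that

$$\frac{1}{n_L} \sum_{i \in L_n} \sum_{t=1}^T \sup_{\theta \in \Theta} \big(u_{i,t}(y;\theta) - E u_{i,t}(y;\theta)\big) \to_p 0, \quad \text{as } n_L \to \infty,$$

for all $\theta \in \Theta$.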

Proposition 3.

If the regularity assumptions A.1 and A.2 hold, then $\sqrt{n_L}\, V_n(\theta_0)^{1/2} (\hat\theta_n - \theta_0)$ converges in distribution to a $p$-variate normal distribution with zero mean vector and identity covariance matrix, as $n_L \to \infty$.

Proof.

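First, we show the uniform law of large numbers for $\nabla u_n(\theta)$:

$$\sup_\theta \big|\nabla u_n(\theta) - E \nabla u_n(\theta)\big| \to_p 0, \quad \text{as } n_L \to \infty, \tag{11}$$

where $u_n(\theta) = \nabla \ell_n(\theta) / n_L$ as defined in Section 2. Note that

$$\begin{aligned}
\mathrm{Var}\, \nabla u_n(\theta) &= \frac{1}{n_L^2} \mathrm{Var}\Big(\sum_{i \in L_n} \sum_{t=1}^T \nabla u_{i,t}(\theta)\Big) \\
&= \frac{1}{n_L^2} \sum_{i \in L_n} \sum_{t=1}^T \mathrm{Var}\, \nabla u_{i,t}(\theta) + \frac{1}{n_L^2} \sum_{i \in L_n} \sum_{t_1=1}^T \sum_{j \in L_n,\, j \ne i} \sum_{t_2=1,\, t_2 \ne t_1}^T \mathrm{Cov}\big(\nabla u_{i,t_1}(\theta), \nabla u_{j,t_2}(\theta)\big).
\end{aligned} \tag{12}$$

The first term in eq. (12) is $O(n_L^{-1})$, since $\mathrm{Var}\, \nabla u_{i,t}(\theta) \le E|\nabla u_{i,t}(\theta)|^2$, which is shown to be finite in the proof of Proposition 2.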

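For the second term in eq. (12), by Lemma 4 we have

$$\frac{1}{n_L^2} \sum_{i \in L_n} \sum_{t_1=1}^T \sum_{j \in L_n,\, j \ne i} \sum_{t_2=1,\, t_2 \ne t_1}^T \mathrm{Cov}\big(\nabla u_{i,t_1}(\theta), \nabla u_{j,t_2}(\theta)\big) \le \frac{1}{n_L^2} \sum_{i \in L_n} \sum_{t_1=1}^T T(8T^2 + 4T + 1) \max_{j : d(i,j) \le 2T,\ t_2 \ne t_1} \big|\mathrm{Cov}\big(\nabla u_{i,t_1}(\theta), \nabla u_{j,t_2}(\theta)\big)\big|,$$

where $\big|\mathrm{Cov}\big(\nabla u_{i,t_1}(\theta), \nabla u_{j,t_2}(\theta)\big)\big| \le E\big|\nabla u_{i,t_1}(\theta) \nabla u_{j,t_2}(\theta)\big| \le E|\nabla u_{i,t_1}(\theta)|^2 + E|\nabla u_{j,t_2}(\theta)|^2$ is finite by Lemma 2. Thus, the second term in eq. (12) is also of order $O(n_L^{-1})$ elementwise, which means $\mathrm{Var}\, \nabla u_n(\theta) \to 0$ as $n \to \infty$. Therefore, eq. (11) follows by Chebyshev's inequality.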

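Second, $V_n(\theta) = \frac{1}{n_L} \mathrm{Var}\big(\sum_{i \in L_n} \sum_{t=1}^T u_{i,t}(\theta)\big) = -\frac{1}{n_L} E\big(\sum_{i \in L_n} \sum_{t=1}^T \nabla u_{i,t}(\theta)\big) = \frac{1}{n_L} H_n(\theta)$, which is shown to be positive definite under Assumption A.2 in Proposition 1. Thus, together with the uniform integrability in eq. (10) and the mixing conditions, Theorem 1 in [31] gives

$$\sqrt{n_L}\, V_n(\theta)^{-1/2} u_n(\theta) \to_d N(0, I_p). \tag{13}$$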

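Finally, by Taylor's expansion,

$$\begin{aligned}
u_n(\hat\theta_n) = 0 &= u_n(\theta_0) + \nabla u_n(\theta_0)(\hat\theta_n - \theta_0) + \tfrac{1}{2} \nabla^2 u_n(\tilde\theta_n)(\hat\theta_n - \theta_0)^2 \\
\Rightarrow \quad 0 &= \sqrt{n_L}\, V_n(\theta_0)^{-1/2} u_n(\theta_0) + \sqrt{n_L}\, V_n(\theta_0)^{-1/2} \nabla u_n(\theta_0)(\hat\theta_n - \theta_0) + \tfrac{1}{2} \sqrt{n_L}\, V_n(\theta_0)^{-1/2} \nabla^2 u_n(\tilde\theta_n)(\hat\theta_n - \theta_0)^2,
\end{aligned} \tag{14}$$

where $\tilde\theta_n$ is a vector with elements between $\hat\theta_n$ and $\theta_0$. Since $\hat\theta_n = \theta_0 + o_p(1)$ by Proposition 2, we have $(\hat\theta_n - \theta_0)^2 = (\hat\theta_n - \theta_0)\, o_p(1)$. The second derivative $\nabla^2 u_n(\theta)$ is a $p \times p \times p$ array with entries equal either to 0 or to $\lambda_{i,t}^{(c)} S_{i,t-1}^{(c_1)} S_{i,t-1}^{(c_2)} S_{i,t-1}^{(c_3)}$, where $i = 1, \dots, n$, $t = 1, \dots, T$ and $c, c_1, c_2, c_3 \in C$. Due to the structure of $\lambda_{i,t}^{(c)}$ and $S_{i,t-1}^{(c)}$ in Section 2, all non-zero elements of $\nabla^2 u_n(\theta)$ are monotone with respect to $\theta$. Thus, there exists $\theta_s \in \Theta$ such that $\nabla^2 u_n(\theta_s) \ge \nabla^2 u_n(\theta)$ for all $\theta \in \Theta$. Therefore, $E \sup_{\theta \in \Theta} |\nabla^2 u_n(\theta)| = \sup_{\theta \in \Theta} E |\nabla^2 u_n(\theta)|$,

which can be shown to be finite by an argument analogous to that of Lemma 3.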

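Thus, eq. (14) can be written as

$$0 = \sqrt{n_L}\, V_n(\theta_0)^{-1/2} u_n(\theta_0) + \sqrt{n_L}\, V_n(\theta_0)^{-1/2} \big(\nabla u_n(\theta_0) + o_p(1)\big)(\hat\theta_n - \theta_0).$$

By eq. (11), $\nabla u_n(\theta_0) \to_p E \nabla u_n(\theta_0) = -V_n(\theta_0)$, since $\ell_n(\theta)$ is the full likelihood. Therefore, by eqs. (12) and (13), we have

$$\sqrt{n_L}\, V_n(\theta_0)^{1/2} (\hat\theta_n - \theta_0) \to_d N(0, I_p).$$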

References

[1] Medema JP, Vermeulen L. Microenvironmental regulation of stem cells in intestinal homeostasis and cancer. Nature. 2011;474:318–326.10.1038/nature10212Search in Google Scholar PubMed

[2] Besag J. Spatial interaction and the statistical analysis of lattice systems. J Royal Stat Soci Series B Methodol. 1974;192–236.10.1111/j.2517-6161.1974.tb00999.xSearch in Google Scholar

[3] Waller LA, Carlin BP, Xia H, Gelfand AE. Hierarchical spatio-temporal mapping of disease rates. J Am Stat Assoc. 1997;92:607–617.10.1080/01621459.1997.10474012Search in Google Scholar

[4] Knorr-Held L. Bayesian modelling of inseparable space-time variation in disease risk. 1999.Search in Google Scholar

[5] Quick H, Waller LA, Casper M. A multivariate space–time model for analysing county level heart disease death rates by race and sex. J R Stat Soc: Ser C. Appl Stat. 2017.10.1111/rssc.12215Search in Google Scholar

[6] Sans Ó, Schmidt AM, Nobre AA, et al. Bayesian spatio-temporal models based on discrete convolutions. Can J Stat. 2008;36:239–258.10.1002/cjs.5550360205Search in Google Scholar

[7] Quick H, Waller LA, Casper M. Hierarchical multivariate space-time methods for modeling counts with an application to stroke mortality data. arXiv preprint arXiv:1602.04528. 2016.Search in Google Scholar

[8] Cressie N, Wikle CK. Statistics for spatio-temporal data. John Wiley & Sons, 2011.Search in Google Scholar

[9] Cox DR, Gudmundsson G, Lindgren G, Bondesson L, Harsaae E, Laake P, Juselius K, Lauritzen SL. Statistical analysis of time series: Some recent developments [with discussion and reply]. Scand J Stat. 1981;93–115.Search in Google Scholar

[10] Bradley JR, Holan SH, Wikle CK. Multivariate spatio-temporal models for high-dimensional areal data with application to longitudinal employer-household dynamics. Ann Appl Stat. 2015;9:1761–1791.10.1214/15-AOAS862Search in Google Scholar

[11] Bradley JR, Holan SH, Wikle CK. Multivariate spatio-temporal survey fusion with application to the american community survey and local area unemployment statistics. Stat. 2016;5:224–233.10.1002/sta4.120Search in Google Scholar

[12] Shaddick G, Wakefield J. Modelling daily multivariate pollutant data at multiple sites. J R Stat Soc: Ser C. Appl Stat. 2002;51:351–372.10.1111/1467-9876.00273Search in Google Scholar

[13] Wikle CK, Berliner LM, Cressie N. Hierarchical bayesian space-time models. Environ Ecol Stat. 1998;5:117–154.10.1023/A:1009662704779Search in Google Scholar

[14] Holan S, Wikle C. Hierarchical dynamic generalized linear mixed models for discrete-valued spatio-temporal data. Handbook of Discrete–Valued Time Series, 2015.Search in Google Scholar

[15] Mugglin AS, Cressie N, Gemmell I. Hierarchical statistical modelling of influenza epidemic dynamics in space and time. Stat Med. 2002;21:2703–2721.10.1002/sim.1217Search in Google Scholar PubMed

[16] Bradley JR, Holan SH, Wikle CK. Computationally efficient multivariate spatio-temporal models for high-dimensional count-valued data. Bayesian Anal. 2017. doi:10.1214/17-BA1069

[17] Davis RA, Dunsmuir WT, Streett SB. Observation-driven models for Poisson counts. Biometrika. 2003;90:777–790. doi:10.1093/biomet/90.4.777

[18] Schrödle B, Held L, Rue H. Assessing the impact of a movement network on the spatiotemporal spread of infectious diseases. Biometrics. 2012;68:736–744. doi:10.1111/j.1541-0420.2011.01717.x

[19] Paul M, Held L, Toschke AM. Multivariate modelling of infectious disease surveillance data. Stat Med. 2008;27:6250–6267. doi:10.1002/sim.3440

[20] Zeger SL, Qaqish B. Markov regression models for time series: a quasi-likelihood approach. Biometrics. 1988;1019–1031. doi:10.2307/2531732

[21] Fokianos K, Tjøstheim D. Log-linear Poisson autoregression. J Multivariate Anal. 2011;102:563–578. doi:10.1016/j.jmva.2010.11.002

[22] Fokianos K, Rahbek A, Tjøstheim D. Poisson autoregression. J Am Stat Assoc. 2009;104:1430–1439. doi:10.1198/jasa.2009.tm08270

[23] Dunsmuir WT, Scott DJ, et al. The glarma package for observation-driven time series regression of counts. J Stat Softw. 2015;67:1–36. doi:10.18637/jss.v067.i07

[24] Kedem B, Fokianos K. Regression models for time series analysis, vol. 488. John Wiley & Sons, 2005.

[25] Held L, Höhle M, Hofmann M. A statistical framework for the analysis of multivariate infectious disease surveillance counts. Stat Modell. 2005;5:187–199. doi:10.1191/1471082X05st098oa

[26] Paul M, Held L. Predictive assessment of a non-linear random effects model for multivariate time series of infectious disease counts. Stat Med. 2011;30:1118–1136. doi:10.1002/sim.4177

[27] Knorr-Held L, Richardson S. A hierarchical model for space–time surveillance data on meningococcal disease incidence. J R Stat Soc Ser C Appl Stat. 2003;52:169–183. doi:10.1111/1467-9876.00396

[28] Wikle CK, Anderson CJ. Climatological analysis of tornado report counts using a hierarchical Bayesian spatiotemporal model. J Geophys Res Atmos. 2003;108. doi:10.1029/2002JD002806

[29] Ferrari D, La Vecchia D. On robust estimation via pseudo-additive information. Biometrika. 2011;99:238–244. doi:10.1093/biomet/asr061

[30] La Vecchia D, Camponovo L, Ferrari D. Robust heart rate variability analysis by generalized entropy minimization. Comput Stat Data Anal. 2015;82:137–151. doi:10.1016/j.csda.2014.09.001

[31] Jenish N, Prucha IR. Central limit theorems and uniform laws of large numbers for arrays of random fields. J Econom. 2009;150:86–98. doi:10.1016/j.jeconom.2009.02.009

[32] Leroux BG, Lei X, Breslow N. Estimation of disease rates in small areas: a new mixed model for spatial dependence. In: Statistical models in epidemiology, the environment, and clinical trials. Springer, 2000:179–191.

[33] Lee D. CARBayes: an R package for Bayesian spatial modeling with conditional autoregressive priors. J Stat Softw. 2013;55:1–24. doi:10.18637/jss.v055.i13

[34] Kalluri R, Zeisberg M. Fibroblasts in cancer. Nat Rev Cancer. 2006;6:392–401. doi:10.1038/nrc1877

[35] Tabassum DP, Polyak K. Tumorigenesis: it takes a village. Nat Rev Cancer. 2015;15:473–483. doi:10.1038/nrc3971

[36] Koenker R. Quantile regression. No. 38. Cambridge University Press, 2005. doi:10.1017/CBO9780511754098

[37] Garden GA, La Spada AR. Intercellular (mis)communication in neurodegenerative disease. Neuron. 2012;73:886–901. doi:10.1016/j.neuron.2012.02.017

[38] Leoni G, Neumann P, Sumagin R, et al. Wound repair: role of immune–epithelial interactions. Mucosal Immunol. 2015;8:959–968. doi:10.1038/mi.2015.63

[39] Bradic J, Fan J, Wang W. Penalized composite quasi-likelihood for ultrahigh dimensional variable selection. J R Stat Soc Ser B Stat Methodol. 2011;73:325–349. doi:10.1111/j.1467-9868.2010.00764.x

[40] Bai Y, Song PX, Raghunathan T. Joint composite estimating functions in spatiotemporal models. J R Stat Soc Ser B Stat Methodol. 2012;74:799–824. doi:10.1111/j.1467-9868.2012.01035.x

Received: 2017-07-20

Revised: 2018-05-08

Accepted: 2018-06-19

Published Online: 2018-07-07
