Minimum message length analysis of multiple short time series

doi:10.1016/j.spl.2015.09.021

Statistics & Probability Letters

Volume 110, March 2016, Pages 318-328

https://doi.org/10.1016/j.spl.2015.09.021

Abstract

This paper applies the Bayesian minimum message length principle to the multiple short time series problem, yielding satisfactory estimates for all model parameters as well as a test for autocorrelation. Connections with the method of conditional likelihood are also discussed.

Introduction

Consider data $Y = {(y_{1}, \dots, y_{m})}^{'} \in R^{m \times n}$ comprising $m$ sequences $y_{i} = {(y_{i, 1}, \dots, y_{i, n})}^{'} \in R^{n}$ generated by the following stationary first-order Gaussian autoregressive model:

$y_{i, j} = μ_{i} + ε_{i, j}, \qquad (1)$

$ε_{i, j} = ρ ε_{i, j - 1} + v_{i, j}, \qquad (2)$

where $i = 1, \dots, m$ and $j = 1, \dots, n$, $μ = {(μ_{1}, \dots, μ_{m})}^{'} \in R^{m}$ are the sequence means, $ρ \in (- 1, 1)$ is a common autoregressive parameter and $v_{i, j}$ denotes the innovations, which are independently and identically distributed as $N (0, τ)$ with variance $τ$. The aim of this paper is to make inferences about the parameters $θ = {(μ, ρ, τ)}^{'} \in R^{m + 2}$ given data sampled from the model (1)–(2). The sequences are considered exchangeable, in the sense that inferences made about the model parameters should be invariant under the interchange of any pair of sequences in the matrix $Y$. This model appears frequently in epidemiological and medical studies in which several measurements have been made over time on a large number of people. In this case, the autocorrelation parameter $ρ$ is of particular interest, as it represents how well the physical quantity “tracks” over time.
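To make the sampling model concrete, the following minimal Python sketch simulates data from (1)–(2). The function name `simulate_short_series`, its arguments, the use of NumPy and the example parameter values are assumptions made for illustration and do not come from the paper. The first deviation of each sequence is drawn from $N (0, τ / (1 - ρ^{2}))$, which is the marginal distribution implied by stationarity of (2).

```python
import numpy as np


def simulate_short_series(mu, rho, tau, n, rng=None):
    """Draw one stationary Gaussian AR(1) sequence of length n per entry of mu.

    mu  : sequence means (length m)
    rho : common autoregressive parameter, assumed to lie in (-1, 1)
    tau : innovation variance
    Returns an (m, n) array Y with rows y_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    mu = np.asarray(mu, dtype=float)
    m = mu.shape[0]

    eps = np.empty((m, n))
    # Stationary start: Var(eps_{i,1}) = tau / (1 - rho^2).
    eps[:, 0] = rng.normal(0.0, np.sqrt(tau / (1.0 - rho**2)), size=m)
    for j in range(1, n):
        v = rng.normal(0.0, np.sqrt(tau), size=m)  # innovations v_{i,j}
        eps[:, j] = rho * eps[:, j - 1] + v        # equation (2)

    return mu[:, None] + eps                       # equation (1)


# Illustrative values only: many short sequences (m = 100, n = 5).
rng = np.random.default_rng(0)
Y = simulate_short_series(mu=rng.normal(size=100), rho=0.4, tau=1.0, n=5, rng=rng)
```

Passing an explicit `rng` keeps the simulation reproducible, which is convenient when comparing estimators on the same data.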

Making inferences about $ρ$ in this setting is complicated by the fact that the number of parameters grows with the number of sequences $m$, and a straightforward application of the maximum likelihood principle leads to inconsistent estimates of both $τ$ and $ρ$. A likelihood-based solution to the problem of estimating $ρ$ in the model (1)–(2), using the method of approximate conditional likelihood, was presented by Cruddas et al. (1989) and shown to yield significant improvements over the standard maximum likelihood estimates. Two frequentist test procedures for the presence of autocorrelation are also discussed in Cox and Solomon (1988).

A solution within the Bayesian framework of inference would be of great value. Unfortunately, with sensible priors that reflect the invariance properties required of the problem, the usual approach of analysing the posterior distribution, formed from the product of the prior distribution and the likelihood, is unsatisfactory. The posterior distribution does not concentrate probability mass around the true parameter values even as the number of sequences $m \to \infty$, and parameter estimation based on this posterior is consequently inconsistent. This paper demonstrates that estimation based on the alternative information-theoretic Bayesian principle of minimum message length (Wallace, 2005) leads to satisfactory estimates of all parameters $θ$ and provides a simple basis for testing for autocorrelation.

This paper has three aims: (1) to produce satisfactory point estimates for all parameters of the first-order Gaussian autoregressive model, (2) to produce a suitable test for autocorrelation, and (3) to demonstrate the resolution of a difficult estimation problem using the minimum message length principle.

Section snippets

Minimum message length

The minimum message length (MML) principle (Wallace, 2005; Wallace and Boulton, 1968; Wallace and Freeman, 1987) is a Bayesian principle for inductive inference based on information theory. The essential idea behind the minimum message length principle is that compressing data is equivalent to learning structure in the data. The key measure of the quality of fit of a model to data is the length of the data after it has been compressed by the model under consideration. As the compressed data

Wallace–Freeman estimates

Inference using the Wallace–Freeman estimator requires specifying a likelihood function, the corresponding Fisher information matrix and prior densities over all parameters. In the multiple short time series setting specified by (1)–(2), the negative log-likelihood function for the parameters $θ = {(μ, ρ, τ)}^{'} \in R^{m + 2}$ is $- \log p (Y | θ) = \frac{m n}{2} \log (2 π τ) - \frac{m}{2} \log (1 - ρ^{2}) + \frac{1}{2 τ} \sum_{i = 1}^{m} T_{i} (μ_{i}, ρ),$ where $T_{i} (μ_{i}, ρ) = \sum_{j = 1}^{n} {(y_{i, j} - μ_{i})}^{2} + ρ^{2} \sum_{j = 2}^{n - 1} {(y_{i, j} - μ_{i})}^{2} - 2 ρ \sum_{j = 2}^{n} (y_{i, j} - μ_{i}) (y_{i, j - 1} - μ_{i}) .$ The determinant of the Fisher information matrix for $θ$ is $| J$
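As an illustrative sketch (not the authors' code), the negative log-likelihood stated above can be evaluated directly in Python with NumPy. The function name `neg_log_likelihood` and its argument layout are assumptions introduced here; the body follows the expression for $- \log p (Y | θ)$ and $T_{i} (μ_{i}, ρ)$ term by term.

```python
import numpy as np


def neg_log_likelihood(Y, mu, rho, tau):
    """Negative log-likelihood of the stationary Gaussian AR(1) model (1)-(2).

    Y   : (m, n) array of sequences
    mu  : length-m array of sequence means
    rho : common autoregressive parameter in (-1, 1)
    tau : innovation variance (> 0)
    """
    Y = np.asarray(Y, dtype=float)
    m, n = Y.shape
    e = Y - np.asarray(mu, dtype=float)[:, None]  # deviations y_{i,j} - mu_i

    # T_i(mu_i, rho): sum over j = 1..n, plus rho^2 times the sum over the
    # interior points j = 2..n-1, minus 2 rho times the lag-one cross products.
    T = (
        np.sum(e**2, axis=1)
        + rho**2 * np.sum(e[:, 1:n - 1] ** 2, axis=1)
        - 2.0 * rho * np.sum(e[:, 1:] * e[:, :-1], axis=1)
    )

    return (
        0.5 * m * n * np.log(2.0 * np.pi * tau)
        - 0.5 * m * np.log(1.0 - rho**2)
        + T.sum() / (2.0 * tau)
    )
```

A convenient numerical check is that this value agrees with the negative log-density of $m$ independent multivariate normal vectors whose covariance matrix has entries $τ ρ^{| j - k |} / (1 - ρ^{2})$, which is the covariance implied by a stationary first-order autoregression.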

Estimation of $ρ$

The Wallace–Freeman estimates $\hat{ρ}$ of $ρ$ were compared against the approximate conditional likelihood estimates ${\hat{ρ}}_{λ}$ derived in Cruddas et al. (1989) and the regular maximum likelihood estimates. Due to the translation invariance of the estimates for $μ$ given by (12), and the fact that the estimates for $ρ$ and $τ$ are based on the residuals $(y_{i, j} - {\hat{μ}}_{i})$ for both the Wallace–Freeman procedure and approximate conditional likelihood, the particular choice of $μ_{i}$ will have no effect on the behaviour of the

Comparison with approximate conditional likelihood

The approximate conditional likelihood procedure can be shown to be equivalent to a restricted Wallace–Freeman estimator. In particular, the approximate conditional likelihood estimate for a parameter of interest $ψ$ orthogonal to nuisance parameters $λ$ can be written as $\hat{ψ} = \arg \min_{ψ} I_{87} (y, ψ, {\hat{λ}}_{ML} (ψ))$ with prior densities for $(ψ, λ)$ given by $π_{λ} (λ) \propto 1,$ $π_{ψ} (ψ) \propto J_{ψ} {(ψ)}^{1 / 2},$ where ${\hat{λ}}_{ML} (ψ)$ is the maximum likelihood estimate of the orthogonalised parameters given $ψ$. Although this estimate is permissible in the sense that

References (15)

J.O. Berger et al., Noninformative priors and Bayesian testing for the AR(1) model, Econ. Theory (1994)
J.H. Conway et al., Sphere Packing, Lattices and Groups (1998)
D.R. Cox et al., On testing for serial correlation in large numbers of small samples, Biometrika (1988)
A.M. Cruddas et al., A time series illustration of approximate conditional likelihood, Biometrika (1989)
D.L. Dowe et al., Resolving the Neyman–Scott problem by minimum message length
P. Druilhet et al., Invariant HPD credible sets and MAP estimators, Bayesian Anal. (2007)
D.F. Schmidt, A new message length formula for parameter estimation and model selection

There are more references available in the full text version of this article.

