Multivariate regression model selection from small samples using Kullback's symmetric divergence
Introduction
The multivariate regression model, also known as the multiresponse regression model, is a common model for describing the relationships between multiple variables. Parameter estimation is an important step in multivariate modelling; it is usually carried out by maximum likelihood or by least squares, given the order of the model. A companion to the problem of parameter estimation is the problem of model selection, which consists of choosing an appropriate model from a class of candidate models to characterize the data at hand. Such a determination is often facilitated by a model selection criterion, in which one only has to evaluate two simple terms that trade off data fit against model complexity. In this paper, the problem of multivariate regression model selection from a small sample is considered. Multivariate modelling has been used successfully in a variety of applications, among them speech modelling and synthesis, adaptive control, and source number detection. The difficulty when working with multivariate regression models comes from the rapid increase in parameter count as model complexity increases, in comparison to the univariate regression model. This rapid increase causes many model selection criteria to perform poorly [1].
A model selection criterion can be designed to estimate an expected overall discrepancy, a quantity which reflects the degree of similarity between a fitted approximating model and the generating or “true” model. Estimation of Kullback's information [2] is the key to deriving the Akaike Information Criterion (AIC) [3], the first model selection criterion to gain widespread acceptance. From the estimation of Kullback's symmetric divergence [4] follows the Kullback Information Criterion (KIC) [5], a more recently developed model selection criterion. Many other criteria based on other principles (or that use other measures of model quality) have been introduced and studied, among them the Bayesian Information Criterion, BIC [6], and the Minimum Description Length, MDL [7], which are based on Bayesian and coding theory, respectively. No single model selection criterion is always better than another; certain criteria perform best for specific model types.
KIC serves as an asymptotically unbiased estimator of a variant of the Kullback symmetric divergence between the generating model and the fitted approximating model, under the assumption that the true model is correctly specified or overfitted. As the dimension of the candidate model, k, increases relative to the sample size, n, KIC becomes a strongly negatively biased estimate of this divergence and leads to the choice of over-parameterized models. A bias-corrected version of KIC, denoted KICc, where “c” stands for correction, has recently been proposed for univariate linear regression and autoregressive models [8]. This corrected version not only reduces the bias but also strongly improves model selection in small-sample settings. Yet the basic form of KICc is similar to that of KIC, so the improvement in selection performance comes without an increase in computational cost.
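For a Gaussian regression fit, KIC takes the same two-term form as AIC but with a heavier complexity penalty, roughly 3k in place of AIC's 2k (consistent with the asymptotic 3/2 ratio between the two penalties noted in work citing this paper). A hedged sketch follows; exact small-sample constants differ across formulations, so treat this as illustrative only:

```python
import numpy as np

def kic(y, y_hat, k):
    """KIC-style score for a Gaussian fit: -2 log L + 3k.
    (AIC uses 2k; exact constants vary across formulations.)"""
    n = len(y)
    sigma2 = np.mean((y - y_hat) ** 2)              # ML noise-variance estimate
    neg2loglik = n * (np.log(2 * np.pi * sigma2) + 1)
    return neg2loglik + 3 * k

# The heavier per-parameter cost makes KIC less prone than AIC to
# over-parameterized choices on small samples.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 0.2 * rng.standard_normal(20)         # data of true order 1
scores = []
for order in range(5):
    coefs = np.polyfit(x, y, order)
    scores.append(kic(y, np.polyval(coefs, x), k=order + 2))
best_order = int(np.argmin(scores))
```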
The correction of KIC proposed in [8] is not appropriate for multivariate regression models. In this case, KICc produces a biased estimate of the Kullback symmetric divergence and leads to underfitting through the choice of an under-parameterized model. The reason is that multivariate regression models contain more unknown parameters than the corresponding univariate regression models.
In this paper, a new information criterion, denoted KICvc, where “vc” stands for vector correction, is proposed as a bias correction of KIC for multivariate regression models. This is achieved using arguments similar to those employed in the derivations of [1] and KICc [8]. The bias of KICvc, in comparison to KIC, is dramatically reduced, leading to improved order selection, as shown by a simulation example. KICvc is shown to outperform classical criteria in small-sample multivariate regression model selection.
The remainder of this paper is organized as follows. In Section 2, we briefly review KIC and its corrected version KICc. In Section 3, we introduce KICvc, the bias-corrected version of KIC for multivariate regression model selection, together with a practically useful approximation. In Section 4, we examine the probabilities of overfitting of the proposed criterion, both in small samples and asymptotically. KICvc is compared with other criteria in a simulation experiment described in Section 5. Concluding remarks are given in Section 6. A theoretical justification of the proposed criterion is also presented.
Section snippets
A brief review of the derivation of KIC and KICc
Suppose a collection of data has been generated according to an unknown parametric model. We try to find a parametric model which provides a suitable approximation for it.
Consider a (k+1)-dimensional parametric family whose parameter vector consists of k elements corresponding to the model's parameters and an additional parameter corresponding to the noise variance. The vector of parameter estimates is obtained by maximizing the likelihood
Derivation of KICvc
In order to study multivariate model selection, we need first to define the true or generating model and the approximating or fitted model.
Suppose that the generating model for the vector response data is given by Y = XB + E, where the rows of the matrix Y of dimension n × p correspond to p response variables on each of n individuals, X is an n × k design matrix, and B is a k × p matrix of unknown regression parameters. The rows of the error matrix E of dimension n × p are assumed to be
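Under assumed dimensions (the paper's own settings are elided here), the generating model Y = XB + E and its maximum likelihood / least squares fit can be sketched as:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, p = 30, 3, 2                        # assumed: sample size, regressors, responses
X = rng.standard_normal((n, k))           # n x k design matrix
B = rng.standard_normal((k, p))           # k x p true regression parameters
E = 0.5 * rng.standard_normal((n, p))     # n x p error matrix with i.i.d. Gaussian rows
Y = X @ B + E                             # n x p response matrix

# Multivariate least squares, which coincides with the ML estimate
# of B under Gaussian errors.
B_hat = np.linalg.lstsq(X, Y, rcond=None)[0]     # k x p
residuals = Y - X @ B_hat
Sigma_hat = residuals.T @ residuals / n          # ML estimate of the p x p error covariance
```

The p × (p + 1)/2 free elements of the error covariance are what drive the rapid parameter growth mentioned in the introduction: the multivariate model estimates k·p regression coefficients plus a full covariance matrix, rather than k coefficients plus one variance.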
Overfitting properties
An overfitted model is defined as a model that has more parameters than the optimal model. Under consistency, the optimal model is the true model, whereas under asymptotic efficiency it is the candidate model closest to the true model. Overfitting is analysed here by comparing an overfitted model to a reduced model of lower order. The overfitting properties of the criteria KIC and KICvc are described here through the probabilities of overfitting both in small samples and
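The asymptotic overfitting probabilities can be illustrated with the standard likelihood-ratio argument: adding m superfluous parameters to a correctly specified model changes minus twice the log-likelihood by an asymptotically chi-square(m) amount, so a criterion with per-parameter penalty c overfits by m with probability P(χ²_m > c·m). The sketch below compares an AIC-type penalty (c = 2) with a KIC-type penalty (c = 3); the paper's exact finite-sample expressions are not reproduced here:

```python
from scipy.stats import chi2

# P(overfit by m parameters) = P(chi2_m > c * m) for a criterion with
# per-parameter penalty c; larger c gives a smaller overfitting probability.
for m in (1, 2, 5):
    p_aic = chi2.sf(2 * m, df=m)   # AIC-type penalty, e.g. ~0.157 for m = 1
    p_kic = chi2.sf(3 * m, df=m)   # KIC-type penalty, e.g. ~0.083 for m = 1
```

This matches the direction of the comparison reported in the literature citing this paper: the heavier KIC-type penalty yields a uniformly smaller asymptotic probability of overfitting.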
Simulation results
To investigate the effectiveness of the proposed model selection criterion, we consider the problem of selecting the order of a bivariate regression model from a small sample data set in a simple example. A thousand simulated realizations of sample sizes and were generated from each of the two models
Model 1:
Model 2:There were 10 candidate models stored in an matrix with a
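The experiment described above can be mimicked in outline: generate data from a known multivariate regression, score 10 nested candidate models with a KIC-style penalty, and count which order wins across realizations. All constants below (sample size, true order, coefficients, trial count) are placeholders, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, true_k, max_k = 25, 2, 3, 10   # placeholder settings
trials = 200

def neg2loglik(Y, Y_hat):
    """Multivariate Gaussian -2 log-likelihood, up to constants shared by all candidates."""
    n = len(Y)
    S = (Y - Y_hat).T @ (Y - Y_hat) / n
    return n * np.linalg.slogdet(S)[1]

def n_params(k, p):
    """Regression coefficients plus the free elements of the error covariance."""
    return k * p + p * (p + 1) / 2

wins = np.zeros(max_k + 1, dtype=int)
for _ in range(trials):
    X = rng.standard_normal((n, max_k))
    B = np.zeros((max_k, p))
    B[:true_k] = 1.0                                 # only the first true_k regressors matter
    Y = X @ B + 0.5 * rng.standard_normal((n, p))
    scores = []
    for k in range(1, max_k + 1):                    # nested candidates of increasing order
        Xk = X[:, :k]
        B_hat = np.linalg.lstsq(Xk, Y, rcond=None)[0]
        scores.append(neg2loglik(Y, Xk @ B_hat) + 3 * n_params(k, p))  # KIC-style penalty
    wins[1 + int(np.argmin(scores))] += 1
```

Tabulating `wins` over candidate orders is how the selection frequencies reported in studies of this kind are produced.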
Conclusion
The results in Section 5 suggest that KICvc should function as an effective multivariate regression model selection criterion in small-sample applications. KICvc has two major strengths: first, it is based on Kullback's symmetric divergence, which provides additional information about model dissimilarity compared with the directed divergence originally used to derive AIC-based criteria. Second, KICvc is an unbiased estimator of the adopted variant of the Kullback symmetric divergence, rather than an asymptotically unbiased one as for KIC. This makes KICvc outperform KIC in
References (21)
A large-sample model selection criterion based on Kullback's symmetric divergence, Statist. Probab. Lett. (1999)
Modeling by shortest data description, Automatica (1978)
A criterion for model selection in the presence of incomplete data based on Kullback's symmetric divergence, Signal Processing (2005)
An Akaike information criterion for model selection in the presence of incomplete data, J. Statist. Plann. Inference (1998)
Model selection for multivariate regression in small samples, Biometrics (1994)
On information and sufficiency, Ann. Math. Statist. (1951)
A new look at the statistical model identification, IEEE Trans. Automat. Control (1974)
An invariant form of the prior probability in estimation problems, J. Roy. Statist. Soc. (1946)
Estimating the dimension of a model, Ann. Statist. (1978)
A small sample model selection criterion based on the Kullback symmetric divergence, IEEE Trans. Signal Process. (2004)
Cited by (12)
The weighted average information criterion for multivariate regression model selection, Signal Processing (2013). Citation excerpt: “Consequently, KICvc has smaller probability of overfitting than does MAICc. It can be easily seen that K is asymptotically equivalent to 3A/2 and KICvc is more consistent than MAICc [2]. Moreover, KICvc in small samples offers a good compromise between the risks of overfitting and underfitting, i.e., creates a balance between both risks [2,29].”
Two-dimensional DOA estimation of coherent signals using acoustic vector sensor array, Signal Processing (2012)
Radar detection theory of sliding window processes, Radar Detection Theory of Sliding Window Processes (2017)
Kullback–Leibler divergence and the Pareto–Exponential approximation, SpringerPlus (2016)
A Model-Free De-Drifting Approach for Detecting BOLD Activities in fMRI Data, Journal of Signal Processing Systems (2015)
1. National ICT Australia is funded by the Australian Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Center of Excellence Program.