Signal Processing

Volume 86, Issue 8, August 2006, Pages 2074-2084

Multivariate regression model selection from small samples using Kullback's symmetric divergence

https://doi.org/10.1016/j.sigpro.2005.10.009

Abstract

The Kullback Information Criterion, KIC, and its univariate bias-corrected version, KICc, are two recently developed criteria for model selection. Both criteria can be viewed as estimators of the expected Kullback symmetric divergence, and each includes a fixed bias-correction term. In this paper, a new small-sample model selection criterion for multivariate regression models is developed. The proposed criterion is named KICvc, where the notation “vc” stands for vector correction; it can be considered an extension of KIC to multivariate regression models. KICvc adjusts KIC to be an unbiased estimator of the variant of the Kullback symmetric divergence, assuming that the true model is correctly specified or overfitted. Furthermore, KICvc provides better multivariate regression model order, or dimension, choices than KIC in small samples. Simulation results show that the proposed criterion estimates the model order more accurately than other asymptotically efficient methods when applied to multivariate regression model selection in small samples. As a result, KICvc serves as an effective tool for selecting a multivariate regression model of appropriate dimension. A theoretical justification of the proposed criterion is presented.

Introduction

The multivariate regression model, also known as the multiresponse regression model, is a common model for describing the relationships between multiple variables. Parameter estimation is an important step in multivariate modelling. Usually, parameter estimation is carried out by maximum likelihood or by least squares, given the order of the model. A companion to the problem of parameter estimation is the problem of model selection, which consists of choosing an appropriate model from a class of candidate models to characterize the data at hand. This determination is often facilitated by a model selection criterion, where one only has to evaluate two simple terms that trade off data fit against model complexity. In this paper, the problem of multivariate regression model selection when the sample size is small is considered. Multivariate modelling has been used successfully in a variety of applications, among them speech modelling and synthesis, adaptive control, and source number detection. The difficulty when working with multivariate regression models comes from the rapid increase in parameter count as model complexity increases, in comparison to the univariate regression model. This rapid increase causes many model selection criteria to perform poorly [1].
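
As a concrete illustration of the estimation step mentioned above, the following sketch fits a multivariate regression model by least squares, which coincides with maximum likelihood under Gaussian errors. The function name and the use of NumPy are illustrative choices, not taken from the paper.

```python
import numpy as np

def fit_multivariate_regression(X, Y):
    """Least-squares fit of Y = X B + U, with Y (n x p) and X (n x k)."""
    B_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)  # k x p coefficient estimates
    U = Y - X @ B_hat                              # n x p residual matrix
    Sigma_hat = (U.T @ U) / Y.shape[0]             # ML estimate of the error covariance
    return B_hat, Sigma_hat
```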

A model selection criterion can be designed to estimate an expected overall discrepancy, a quantity that reflects the degree of similarity between a fitted approximating model and the generating, or “true”, model. Estimation of Kullback's information [2] is the key to deriving the Akaike Information Criterion (AIC) [3], the first model selection criterion to gain widespread acceptance. From the estimation of Kullback's symmetric divergence [4] follows the Kullback Information Criterion (KIC) [5], a recently developed model selection criterion. Many other criteria based on other principles (or that use other measures of model quality) have been introduced and studied, among them the Bayesian Information Criterion, BIC [6], and the Minimum Description Length, MDL [7], which are based on Bayesian theory and coding theory, respectively. No single model selection criterion will always be better than another; certain criteria perform best for specific model types.

KIC serves as an asymptotically unbiased estimator of a variant of the Kullback symmetric divergence between the generating model and the fitted approximating model, under the assumption that the true model is correctly specified or overfitted. As the dimension of the candidate model, k, increases relative to the sample size, n, KIC becomes a strongly negatively biased estimator of this variant of the Kullback symmetric divergence and leads to the choice of over-parameterized models. A bias-corrected version of KIC, denoted KICc, where in this case “c” stands for correction, has recently been proposed for univariate linear regression and autoregressive models [8]. This corrected version not only reduces the bias but also strongly improves model selection in small-sample settings. Yet the basic form of KICc is similar to that of KIC, meaning that the improvement in selection performance comes without an increase in computational cost.
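
For orientation, KIC shares the penalized-likelihood form of AIC but with a heavier penalty, 3k rather than 2k, where k counts the estimated parameters. The sketch below evaluates both criteria from a candidate's maximized log-likelihood; it reflects the commonly stated large-sample forms, not the small-sample corrections this paper is concerned with.

```python
def aic(loglik, k):
    # Akaike Information Criterion: -2 log-likelihood + 2k
    return -2.0 * loglik + 2.0 * k

def kic(loglik, k):
    # Kullback Information Criterion: -2 log-likelihood + 3k
    return -2.0 * loglik + 3.0 * k

# The selected model minimizes the criterion over the candidate set, e.g.
# best = min(candidates, key=lambda m: kic(m.loglik, m.k))
```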

The correction of KIC proposed in [8] is not appropriate for multivariate regression models. In this case, KICc produces a biased estimate of the Kullback symmetric divergence and leads to underfitting, i.e., the choice of an under-parameterized model. The reason is that multivariate regression models contain more unknown parameters than the corresponding univariate regression models.

In this paper, a new information criterion, denoted KICvc, where the notation “vc” stands for vector correction, is proposed as a bias correction of KIC for multivariate regression models. This is achieved using arguments similar to those employed in the derivations of AICc [1] and KICc. The bias of KICvc is dramatically reduced in comparison to that of KIC, leading to improved order selection, as shown by a simulation example. KICvc is shown to outperform classical criteria in small-sample multivariate regression model selection.

The remainder of this paper is organized as follows. In Section 2, we briefly review KIC and its corrected version KICc. In Section 3, we introduce KICvc, the bias-corrected version of KIC for multivariate regression model selection, together with an approximation for practical use. In Section 4, we examine the probabilities of overfitting of the proposed criterion, both in small samples and asymptotically. KICvc is compared with other criteria in a simulation experiment described in Section 5. Concluding remarks are given in Section 6. A theoretical justification of the proposed criterion is also presented.

Section snippets

A brief review of the derivation of KIC and KICc

Suppose a collection of data yn=(y1,…,yn) has been generated according to an unknown parametric model p(y|θ0). We try to find a parametric model which provides a suitable approximation for p(y|θ0).

Let Mk={p(y|θk) : θk∈Θk} denote a (k+1)-dimensional parametric family, where θk consists of k elements that correspond to the model's parameters and an additional parameter, which corresponds to the noise variance. Let θ̂k denote the vector of parameter estimates obtained by maximizing the likelihood
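
In the Gaussian regression case, this maximization has a closed form: the k coefficients come from least squares and the noise variance from the residuals. A minimal sketch with illustrative names follows; the maximized log-likelihood it returns is the fit term used by criteria of the AIC/KIC family.

```python
import numpy as np

def gaussian_mle(X, y):
    """MLE for y = X theta + e, e ~ N(0, sigma^2 I): k coefficients plus one variance."""
    theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ theta_hat
    n = len(y)
    sigma2_hat = resid @ resid / n                            # ML variance estimate
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)  # maximized log-likelihood
    return theta_hat, sigma2_hat, loglik
```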

Derivation of KICvc

In order to study multivariate model selection, we first need to define the true, or generating, model and the approximating, or fitted, model.

Suppose that the generating model for the vector response data y1,…,yn, y∈Rp, is given by

Y = Xβ0 + U0,

where the rows of the matrix Y of dimension n×p correspond to p response variables on each of n individuals, X is an n×k0 design matrix, and β0 is a k0×p matrix of unknown regression parameters. The rows of the error matrix U0 of dimension n×p are assumed to be
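
To make the generating model concrete, the following sketch draws data from Y = Xβ0 + U0 with the rows of U0 i.i.d. N(0, Σ0). The specific dimensions and the Gaussian design matrix are illustrative assumptions, not the paper's specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k0, p = 25, 3, 2                       # sample size, true order, response dimension
X = rng.standard_normal((n, k0))          # illustrative n x k0 design matrix
beta0 = rng.standard_normal((k0, p))      # true k0 x p regression parameter matrix
Sigma0 = np.eye(p)                        # true error covariance
U0 = rng.multivariate_normal(np.zeros(p), Sigma0, size=n)  # rows i.i.d. N(0, Sigma0)
Y = X @ beta0 + U0                        # n x p response matrix
```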

Overfitting properties

An overfitted model is defined as a model that has more parameters than the optimal model. Under consistency, the optimal model is the true model, whereas under asymptotic efficiency it is the candidate model closest to the true model. Overfitting is analysed here by comparing an overfitted model of order k0+l to the reduced model of order k0. The overfitting properties of the criteria KIC and KICvc are described here through the probabilities of overfitting both in small samples and
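
The probability of overfitting can also be approximated by Monte Carlo simulation: generate repeated data sets from the true model and count how often a criterion prefers the order-k0+l fit to the order-k0 fit. The sketch below assumes hypothetical `simulate` and `criterion` callables; it illustrates the comparison, not the paper's analytical derivation.

```python
def overfit_probability(criterion, simulate, k0, l, reps=1000):
    """Fraction of replications in which the overfitted model (order k0 + l)
    scores below, i.e. beats, the true-order model under the criterion."""
    count = 0
    for _ in range(reps):
        X, Y = simulate()                 # fresh data set from the true model
        if criterion(X[:, :k0 + l], Y) < criterion(X[:, :k0], Y):
            count += 1
    return count / reps
```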

Simulation results

To investigate the effectiveness of the proposed model selection criterion, we consider the problem of selecting the order of a bivariate (p=2) regression model from a small sample data set in a simple example. A thousand simulated realizations of sample sizes n=25 and n=40 were generated from each of the two models

Model 1: yi = [1, 1]^T + [1, 1]^T xi,1 + [1/4, 1]^T xi,2 + ε0i,  ε0i ∼ N(0, Σ0),

Model 2: yi = [1, 1]^T + [1, 1]^T xi,1 + [1/4, 1]^T xi,2 + [0, 1]^T xi,3 + ε0i,  ε0i ∼ N(0, Σ0),

where in both models Σ0 is the 2×2 identity matrix. There were 10 candidate models stored in an n×10 matrix X with a
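
The experiment can be reproduced in outline as follows. The candidate regressors and the exact penalty are assumptions here: nested candidates with i.i.d. Gaussian regressors after an intercept column, and KIC's large-sample 3k penalty with k counting both regression and covariance parameters. The paper's KICvc correction itself is given in the full text.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 25, 2, 1000
beta0 = np.array([[1.0, 1.0],    # intercept coefficients (1, 1)^T
                  [1.0, 1.0],    # coefficients of x_{i,1}: (1, 1)^T
                  [0.25, 1.0]])  # coefficients of x_{i,2}: (1/4, 1)^T

def neg2_loglik(X, Y):
    """-2 x maximized Gaussian log-likelihood of the fit, up to an additive constant."""
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    U = Y - X @ B
    Sigma = (U.T @ U) / len(Y)
    return len(Y) * np.log(np.linalg.det(Sigma))

counts = np.zeros(10, dtype=int)
for _ in range(reps):
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 9))])  # 10 nested candidates
    Y = X[:, :3] @ beta0 + rng.standard_normal((n, p))              # Model 1, Sigma0 = I
    scores = [neg2_loglik(X[:, :k], Y) + 3 * (k * p + p * (p + 1) / 2)
              for k in range(1, 11)]
    counts[int(np.argmin(scores))] += 1

print({order: int(c) for order, c in enumerate(counts, start=1)})  # selection frequencies
```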

Conclusion

The results in Section 5 suggest that KICvc should function as an effective multivariate regression model selection criterion in small-sample applications. KICvc has two major strengths: first, it is based on Jn(θ0,θk), which provides additional information about model dissimilarity compared with In(θ0,θk), originally used to derive AIC-based criteria. Second, KICvc is an unbiased estimator of Jn(θ0,θ̂k), rather than an asymptotically unbiased one, as is the case for KIC. This makes KICvc outperform KIC in

References (21)


1. National ICT Australia is funded by the Australian Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Center of Excellence Program.
