Domain selection for the varying coefficient model via local polynomial regression
Introduction
The varying coefficient model (Cleveland et al., 1991; Hastie and Tibshirani, 1993) assumes that the covariate effect may vary depending on the value of an underlying variable, such as time. It has been used in a variety of applications, such as longitudinal data analysis, and is given by where the predictor vector represents features, and correspondingly, denotes the effect of different features over the domain of the variable . is the response we are interested in and denotes the random error satisfying and .
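As a sketch under assumed notation, the varying coefficient model is commonly written as follows. The symbols $y$, $x$, $u$, $\beta(u)$, $\epsilon$ and $\sigma^2$ are illustrative choices, and the moment conditions shown are the usual ones for this model class rather than a quotation of the paper's display.

```latex
y = x^{\top}\beta(u) + \epsilon, \qquad
x = (x_1,\dots,x_p)^{\top}, \qquad
\beta(u) = \bigl(\beta_1(u),\dots,\beta_p(u)\bigr)^{\top}, \qquad
E(\epsilon) = 0, \quad \mathrm{Var}(\epsilon) = \sigma^2 .
```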
The varying coefficient model has been extensively studied, and many methods have been proposed to estimate its parameters. The first group of estimation methods is based on local polynomial smoothing. Examples include, but are not limited to, Fan and Gijbels (1996), Wu et al. (1998), Hoover et al. (1998), Kauermann and Tutz (1999) and Fan and Zhang (2008). The second group consists of polynomial spline-based methods, such as Huang et al. (2002), Huang et al. (2004), Huang and Shen (2004) and references therein. The last group is based on smoothing splines, as introduced by Hastie and Tibshirani (1993), Hoover et al. (1998), Chiang et al. (2001) and many others. In this paper, we not only consider estimation for the varying coefficient model, but also wish to identify the regions in the domain of the underlying variable where predictors have an effect and the regions where they do not. This is related to, but different from, variable selection: selection methods attempt to decide whether a variable is active or not, whereas our interest is in identifying regions.
For variable selection in a traditional linear model, various shrinkage methods have been developed. They include the least absolute shrinkage and selection operator (LASSO) (Tibshirani, 1996), the smoothly clipped absolute deviation (SCAD) penalty (Fan and Li, 2001), the adaptive LASSO (Zou, 2006) and many others. Although the LASSO penalty gives sparse solutions, it leads to biased estimates for large coefficients due to the linearity of the L1 penalty. To remedy this bias issue, Fan and Li (2001) proposed the SCAD penalty and showed that the SCAD penalized estimator enjoys the oracle property, in the sense that it not only selects the correct submodel consistently, but also has the same asymptotic covariance matrix as the ordinary least squares estimate obtained when the true subset model is known a priori. To achieve variable selection for grouped variables, Yuan and Lin (2006) developed the group LASSO penalty, which penalizes coefficients as a group, as arises for a factor in analysis of variance. As with the LASSO, the group LASSO estimators do not enjoy the oracle property. As a remedy, Wang et al. (2007) proposed the group SCAD penalty, which again selects variables in a group manner.
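The bias argument can be read off the penalty functions themselves. As a sketch, with tuning parameter $\lambda > 0$ and SCAD shape parameter $a > 2$ (Fan and Li, 2001, suggest $a = 3.7$), the two penalties on a scalar coefficient $\theta$ are shown below; the symbol names are illustrative.

```latex
p^{\mathrm{LASSO}}_{\lambda}(\theta) = \lambda\,|\theta| ,
\qquad
p^{\mathrm{SCAD}}_{\lambda}(\theta) =
\begin{cases}
\lambda\,|\theta| , & |\theta| \le \lambda , \\[4pt]
\dfrac{2a\lambda\,|\theta| - \theta^{2} - \lambda^{2}}{2(a-1)} , & \lambda < |\theta| \le a\lambda , \\[8pt]
\dfrac{(a+1)\,\lambda^{2}}{2} , & |\theta| > a\lambda .
\end{cases}
```

The LASSO penalty keeps growing linearly in $|\theta|$, so every nonzero coefficient is shrunk by the same amount. The SCAD penalty is constant for $|\theta| > a\lambda$, so its derivative vanishes there and large coefficients are left essentially unshrunk.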
For the varying coefficient model, some existing work focuses on identifying the nonzero coefficient functions, which achieves component selection for the varying coefficient functions. However, each estimated coefficient function is then either zero everywhere or nonzero everywhere. For example, Wang et al. (2008) considered the varying coefficient model under the framework of a B-spline basis and used the group SCAD to select the significant coefficient functions. Wang and Xia (2009) combined local constant regression with group SCAD penalization to select the components, while Leng (2009) directly applied the component selection and smoothing operator (Lin and Zhang, 2006).
In this paper, we consider a different problem: detecting the nonzero regions for each component of the varying coefficient functions. Specifically, we aim to estimate the nonzero domain of each coefficient function, that is, the regions where the corresponding component of the predictor vector has an effect on the response. To this end, we incorporate local polynomial smoothing together with penalized regression. More specifically, we combine local linear smoothing and the group SCAD shrinkage method into one framework, which estimates not only the coefficient functions but also their nonzero regions. The proposed method involves two tuning parameters, namely the bandwidth used in local polynomial smoothing and the shrinkage parameter used in the regularization method. We propose methods to select these two tuning parameters. Our theoretical results show that the resulting estimators have the same asymptotic bias and variance as the original local polynomial regression estimators.
The rest of the paper is organized as follows. Section 2 reviews local polynomial estimation for the varying coefficient model. Section 3 describes our methodology, including the penalized estimation method and the tuning procedure. Asymptotic properties are presented in Section 4. Simulation examples in Section 5 are used to evaluate the finite-sample performance of the proposed method. In Section 6, we apply our method to real data. We conclude with some discussion in Section 7.
Local polynomial regression for the varying coefficient model
Suppose we have independent and identically distributed (i.i.d.) samples from the population satisfying model (1). As the coefficient vector consists of unspecified functions, a smoothing method must be incorporated for its estimation. In this article, we adopt local linear smoothing for this varying coefficient model (Fan and Zhang, 1999). For in a small neighborhood of , we can approximate the function , by a linear function For a
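The local linear idea described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the Epanechnikov kernel, the function names `epanechnikov` and `local_linear_vcm`, the bandwidth value, and the toy data are all hypothetical choices. At each target point, the coefficient functions and their derivatives are obtained by kernel-weighted least squares on the predictors and on the predictors multiplied by the distance to the target point.

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel K(t) = 0.75 * (1 - t^2) on [-1, 1], zero elsewhere."""
    return 0.75 * np.maximum(1.0 - t**2, 0.0)

def local_linear_vcm(u0, U, X, y, h):
    """Local linear fit of a varying coefficient model at the point u0.

    U : (n,) index variable, X : (n, p) predictors, y : (n,) response,
    h : bandwidth (assumed given). Returns (a_hat, b_hat), the estimated
    coefficient functions and their first derivatives at u0.
    """
    n, p = X.shape
    d = U - u0
    w = epanechnikov(d / h) / h              # kernel weights K_h(U_i - u0)
    Z = np.hstack([X, X * d[:, None]])       # local design [x_i, x_i * (U_i - u0)]
    sw = np.sqrt(w)
    # Weighted least squares, solved as ordinary least squares on sqrt-weighted data
    coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
    return coef[:p], coef[p:]

# Hypothetical toy data: the second coefficient function is zero on [0, 0.5]
rng = np.random.default_rng(0)
n = 500
U = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta2 = np.where(U > 0.5, 4.0 * (U - 0.5), 0.0)
y = X[:, 0] * np.sin(2 * np.pi * U) + X[:, 1] * beta2 + 0.1 * rng.normal(size=n)

grid = np.linspace(0.1, 0.9, 9)
fits = np.array([local_linear_vcm(u0, U, X, y, h=0.15)[0] for u0 in grid])
print(np.round(fits, 2))  # rows: grid points; columns: estimated coefficients
```

With noisy data, the unpenalized estimates of the second coefficient on the left half of the grid will typically be small but not exactly zero, which is the gap that a penalized version of the fit is meant to close.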
Penalized local polynomial regression estimation
In practice, it is often of interest to detect the nonzero region of each function component of the vector . To achieve this goal, shrinkage methods can be applied. Notice that for as long as for . Hence, if the function estimates are zero over certain regions, the corresponding derivative estimates should also be zero. Consequently, we treat each coefficient function and its derivative as a group and penalize them jointly for each component. To achieve variable selection as well as
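The grouping idea can be illustrated with a short Python sketch. It evaluates a SCAD penalty on the Euclidean norm of each (function value, derivative) pair at a single point. The function names, the scaling of the derivative by the bandwidth `h`, and the example numbers are assumptions made for illustration; the exact penalty and scaling used in the paper are not reproduced here.

```python
import numpy as np

def scad(t, lam, a=3.7):
    """SCAD penalty of Fan and Li (2001), evaluated elementwise at |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t                                            # |t| <= lam
    quad = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))    # lam < |t| <= a*lam
    flat = lam**2 * (a + 1) / 2                                 # |t| > a*lam
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, flat))

def group_scad_penalty(a_hat, b_hat, h, lam, a=3.7):
    """Penalize each pair (a_j, h * b_j) through its Euclidean norm.

    a_hat, b_hat : (p,) local estimates of the functions and derivatives.
    Scaling b_hat by h is an assumed convention that puts both terms on a
    comparable scale. Returns (total penalty, group norms).
    """
    a_hat = np.asarray(a_hat, dtype=float)
    b_hat = np.asarray(b_hat, dtype=float)
    norms = np.sqrt(a_hat**2 + (h * b_hat) ** 2)
    return scad(norms, lam, a).sum(), norms

# Hypothetical local estimates for p = 2 components at one point
penalty, norms = group_scad_penalty([0.0, 3.0], [0.0, 40.0], h=0.1, lam=1.0)
print(penalty, norms)
```

Because the penalty acts on the group norm, a component is set to zero at a point only when both its value and its derivative are shrunk to zero together, which matches the reasoning that a function that is zero over a region also has zero derivative there.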
Asymptotic properties
Without loss of generality, we assume that only the first components of are nonzero. Denote and . Denote for and . Recall that , which is the effective sample size. Our objective function can be written as
Denote . We first state the following conditions:
Conditions (A)
- (A1)
The bandwidth satisfies and .
- (A2)
, where
Simulation example
Simulation studies are conducted to examine the performance of our penalized local polynomial regression approach and compare it with that of the local polynomial regression method. Specifically, local linear approximation is used for our proposed approach and the unpenalized local polynomial comparison approach.
Real data application
We apply our method to the Boston housing data, which has been analyzed by various authors; see, for instance, Harrison and Rubinfeld (1978), Belsley et al. (1980) and Ibacache-Pulgar et al. (2013). The data set is based on the 1970 US census and consists of the median value of owner-occupied homes for 506 census tracts in the Boston area. The aim of the study is to find the association between the median house value (MHV) and various predictors. We treat the median house value as the response and
Conclusion and discussions
In this paper, we propose domain selection for the varying coefficient model using penalized local polynomial regression. Our method can identify the zero regions for each coefficient function component and perform estimation simultaneously. We further prove that our estimators enjoy the oracle property, in the sense that they have the same asymptotic distribution as the local polynomial estimates obtained when the true sparsity is known. We have evaluated our method using both simulation examples
Acknowledgments
The authors are grateful for support from NIH P01-CA142538 and NSF DMS-1308400.
References (29)
- Harrison and Rubinfeld (1978). Hedonic housing prices and the demand for clean air. J. Environ. Econ. Manag.
- Leng (2009). A simple approach for varying-coefficient model selection. J. Statist. Plann. Inference.
- et al. (2000). Variable bandwidth selection in varying-coefficient models. J. Multivariate Anal.
- Chiang et al. (2001). Smoothing spline estimation for varying coefficient models with repeatedly measured dependent variables. J. Amer. Statist. Assoc.
- Cleveland et al. (1991). Local regression models.
- et al. (1992). Variable bandwidth and local linear regression smoothers. Ann. Statist.
- et al. (1995). Data-driven bandwidth selection in local polynomial fitting: variable bandwidth and spatial adaptation. J. R. Stat. Soc. Ser. B.
- et al. (2005). Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli.
- Fan and Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc.
- Fan and Zhang (1999). Statistical estimation in varying coefficient models. Ann. Statist.
- Fan and Zhang (2008). Statistical methods with varying coefficient models. Stat. Interface.
- Hastie and Tibshirani (1993). Varying-coefficient models. J. R. Stat. Soc. Ser. B.