Article

A Parametric Quantile Regression Model for Asymmetric Response Variables on the Real Line

1 Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1530000, Chile
2 Departamento de Estatística, Universidade Federal do Rio Grande do Norte, Natal, RN 59000-000, Brazil
3 Departamento de Matemáticas, Escuela Superior Politécnica del Litoral, ESPOL, Guayaquil 090150, Ecuador
4 Departamento de Matemáticas, Facultad de Ciencias Básicas, Universidad de Antofagasta, Antofagasta 1240000, Chile
* Author to whom correspondence should be addressed.
Symmetry 2020, 12(12), 1938; https://doi.org/10.3390/sym12121938
Submission received: 30 October 2020 / Revised: 16 November 2020 / Accepted: 19 November 2020 / Published: 25 November 2020

Abstract

In this paper, we introduce a novel parametric quantile regression model for asymmetric response variables, in which the response variable follows a power skew-normal distribution. Under a new, convenient parameterization, this distribution is very useful for modeling different quantiles of a response variable on the real line. The maximum likelihood method is employed to estimate the model parameters. In addition, we present a local influence study under different perturbation settings. Some numerical results on the estimators in finite samples are illustrated. To show the practical potential of our model, we apply it to a real dataset.

1. Introduction

Frequently, in real life, we find continuous data on the real line that are asymmetric; such data cannot be modeled by well-known symmetric distributions such as the normal, Student-t, Cauchy, Laplace and logistic distributions. It is therefore of interest to propose more flexible models for the highly skewed data that arise in several areas.
In this context, the seminal work of Azzalini [1] introduced a skew-symmetric family of distributions, built by using a symmetric distribution as a kernel. When the kernel is the normal distribution, this construction gives rise to the well-known skew-normal (SN) distribution. The SN distribution has a skewness parameter that makes it a reasonable model for skewed data. Furthermore, the SN family includes the normal distribution and has several properties that coincide with or resemble those of the normal distribution (Azzalini [1,2]). However, the SN distribution has limited flexibility: for moderate values of the skewness parameter, nearly all the mass accumulates on either the positive or the negative half-line, as determined by the sign of the skewness parameter. In such cases, the SN distribution closely resembles the half-normal density, with a nearly linear shape on the side with smaller mass (Arellano-Valle et al. [3]).
Another alternative for modeling skewed data is the family of power-symmetric distributions (see Pewsey et al. [4]), whose most widely used member is the power-normal (PN) distribution. This family is discussed in, among others, Lehmann [5], Durrans [6], Gupta and Gupta [7] and Castillo et al. [8]. Extensions and applications of the PN distribution can be found in a series of papers by Martínez-Flórez et al. [9,10,11,12,13].
A unification of the SN and PN distributions, the power skew-normal (PSN) distribution, was proposed by Martínez-Flórez et al. [11]; it generalizes both the SN and PN distributions. Although inference for the SN distribution has been widely studied, the PSN distribution has received much less attention, even though, as a generalization of the SN distribution, it has attractive features: (i) it includes the SN and PN distributions as particular cases, and (ii) it provides a greater range for the skewness and kurtosis coefficients than the SN distribution (see Table 1), which makes it more flexible for modeling highly skewed data, as frequently arise in practice. However, the expectation and variance of the PSN distribution cannot be expressed in closed form, which makes the distribution unsuitable for mean-based regression modeling (Martínez-Flórez et al. [11]). Fortunately, the cumulative distribution function (cdf) of the PSN distribution has a simple form that depends on Owen's T function (defined in the next section). This facilitates the calculation of the quantile function (the inverse of the cdf), allowing its use in the quantile regression (QR) framework. Quantile regression quantifies the association between the explanatory variables and a given quantile of the dependent variable. In this study, we propose a linear quantile regression model based on the PSN distribution, adopting a new parameterization of the distribution indexed by quantile, precision and shape parameters. Inference is conducted via maximum likelihood.
The rest of the paper proceeds as follows. In Section 2, we introduce a new parameterization of the PSN distribution indexed by location (quantile), precision and shape parameters, together with the associated quantile regression model; elements of the maximum likelihood (ML) method are also presented. Section 3 presents local influence measures under three different perturbation schemes, and in Section 4 a real dataset is analyzed to show the applicability of the proposed reparameterized PSN (RPSN)-based QR model. The final section summarizes the contributions of the paper.

2. A PSN Distribution Parameterized by Its Quantile Parameter, and Its Associated Quantile Regression Model

In this section, we briefly review the PSN distribution of Martínez-Flórez et al. [11] and introduce a reparameterized PSN (RPSN) distribution indexed by its quantile, which allows us to use this distribution in the context of QR models.
The probability density function (pdf) of the PSN distribution is given by
$$f(y;\theta) = \frac{\alpha}{\sigma}\,\phi_\lambda(z)\left[\Phi_\lambda(z)\right]^{\alpha-1},$$
where $\theta = (\mu,\sigma,\lambda,\alpha)$, $z = (y-\mu)/\sigma$, and $\phi_\lambda(\cdot)$ and $\Phi_\lambda(\cdot)$ denote the pdf and cdf of the standard skew-normal model, given by
$$\phi_\lambda(y) = 2\phi(y)\Phi(\lambda y)\quad\text{and}\quad \Phi_\lambda(y) = \int_{-\infty}^{y}\phi_\lambda(t)\,dt = \Phi(y) - 2T(y,\lambda),$$
where $\phi(\cdot)$ and $\Phi(\cdot)$ denote the pdf and cdf of the standard normal distribution and $T(\cdot,\cdot)$ is Owen's T function, defined as
$$T(y,\lambda) = \frac{1}{2\pi}\int_0^{\lambda}\frac{e^{-\frac{1}{2}y^2(1+t^2)}}{1+t^2}\,dt.$$
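As a rough illustration of these definitions, the Python sketch below evaluates Owen's T function by composite Simpson quadrature of the integral above and uses it for the skew-normal pdf and cdf. The helper names (`owens_t`, `sn_pdf`, `sn_cdf`) and the quadrature rule are our own assumptions, not the authors' implementation; a dedicated routine such as SciPy's `scipy.special.owens_t` would be preferable in practice.

```python
import math


def norm_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x: float) -> float:
    """Standard normal cdf Phi(x), written via erfc for better tail accuracy."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def owens_t(h: float, a: float, n: int = 1000) -> float:
    """Owen's T(h, a) by composite Simpson's rule on [0, a] with n (even) panels.

    A negative `a` gives a negative step, which reproduces T(h, -a) = -T(h, a).
    """
    if a == 0.0:
        return 0.0
    step = a / n
    total = 0.0
    for k in range(n + 1):
        t = k * step
        value = math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
        weight = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += weight * value
    return total * step / 3.0 / (2.0 * math.pi)


def sn_pdf(y: float, lam: float) -> float:
    """Standard skew-normal pdf: phi_lambda(y) = 2 phi(y) Phi(lambda y)."""
    return 2.0 * norm_pdf(y) * norm_cdf(lam * y)


def sn_cdf(y: float, lam: float) -> float:
    """Standard skew-normal cdf: Phi_lambda(y) = Phi(y) - 2 T(y, lambda)."""
    return norm_cdf(y) - 2.0 * owens_t(y, lam)
```

For example, `sn_cdf(0.0, 1.0)` should be close to 0.25, which equals 1/2 − arctan(1)/π.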
Moreover, the cdf of the PSN model is given by
$$F(y;\theta) = \left[\Phi_\lambda(z)\right]^{\alpha} = \left[\Phi(z) - 2T(z,\lambda)\right]^{\alpha}.$$
Note that the cases α = 1 and λ = 0 correspond to the well-known SN and PN models, respectively. The main advantage of the PSN model is that it provides a greater range for the skewness and kurtosis coefficients than the SN and PN models; Table 1 shows these ranges.
The $r$-th moment of the distribution depends on the expected values $E\{[\Phi_\lambda^{-1}(Y)]^s\}$, $s = 1,\dots,r$, where $Y$ has a beta distribution with shape parameters $\alpha$ and 1. For this reason, some characteristics of interest, such as the mean and variance, have cumbersome forms. The quantiles of the model must also be computed numerically, since the skew-normal quantile function $\Phi_\lambda^{-1}$ has no closed form. Consequently, reparameterizations based on these quantities are neither interpretable nor convenient for this model. On the other hand, since Owen's T function satisfies $T(0,\lambda) = (2\pi)^{-1}\arctan(\lambda)$, we note that
$$F(\mu;\theta) = \left[\frac{1}{2} - \frac{1}{\pi}\arctan(\lambda)\right]^{\alpha}.$$
For this reason, if we consider the restriction
$$\alpha = \alpha(\lambda,\tau) = \frac{\log(\tau)}{\log\left[\frac{1}{2} - \frac{1}{\pi}\arctan(\lambda)\right]},\qquad(1)$$
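A minimal sketch of the restriction in (1), assuming nothing beyond the formulas above: it computes α(λ, τ) and checks that the resulting PSN cdf evaluated at μ equals τ. The function names are hypothetical.

```python
import math


def alpha_restricted(lam: float, tau: float) -> float:
    """alpha(lambda, tau) from restriction (1): makes mu the tau-th quantile."""
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie in (0, 1)")
    # Phi_lambda(0) = 1/2 - arctan(lambda)/pi always lies strictly in (0, 1),
    # so both logarithms are negative and alpha is positive.
    base = 0.5 - math.atan(lam) / math.pi
    return math.log(tau) / math.log(base)


def cdf_at_location(lam: float, tau: float) -> float:
    """F(mu; theta) = [1/2 - arctan(lambda)/pi]^alpha with alpha = alpha(lambda, tau)."""
    return (0.5 - math.atan(lam) / math.pi) ** alpha_restricted(lam, tau)
```

For instance, λ = 0 and τ = 0.5 give α = 1 (the normal median case), while λ = 1 and τ = 0.5 give α = log(0.5)/log(0.25) = 0.5.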
we have $F(\mu;\theta) = \tau$, so that $\mu$ is directly the $\tau$-th quantile of the distribution. For a fixed $\tau$ and with $\alpha(\lambda,\tau)$ as in (1), we obtain a flexible model for quantile regression. This parameterization has not been proposed previously in the statistical literature. Hence, we can rewrite the PSN distribution in terms of the parameters $\mu$, $\sigma$ and $\lambda$, with cumulative distribution function now given by
$$F(y;\mu,\sigma,\lambda) = \left[\Phi(z) - 2T(z,\lambda)\right]^{\log(\tau)/\log\left[\frac{1}{2} - \frac{1}{\pi}\arctan(\lambda)\right]},$$
where the quantile level $\tau\in(0,1)$ is assumed to be known. Hereafter, we write $Y\sim\text{RPSN}(\mu,\sigma,\lambda)$ to indicate that $Y$ is a random variable following a reparameterized PSN distribution with quantile parameter $\mu$, precision parameter $\sigma$ and shape parameter $\lambda$. Figure 1 shows the RPSN density with location and scale parameters fixed at 0 and 1, respectively. In all the curves, zero is the specified $\tau$-th quantile. Note also that the curves are not necessarily symmetric for $\tau = 0.5$ (the median case).
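Because $F(y) = [\Phi_\lambda(z)]^{\alpha}$, the RPSN quantile function can be written as $F^{-1}(u) = \mu + \sigma\,\Phi_\lambda^{-1}(u^{1/\alpha})$. The Python sketch below implements the RPSN cdf and this quantile function, inverting the skew-normal cdf by bisection; the bisection bracket, the Simpson-rule Owen's T and all names are our own assumptions, and the helpers are repeated so the block runs on its own.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def owens_t(h: float, a: float, n: int = 1000) -> float:
    """Owen's T(h, a) by composite Simpson's rule on [0, a] (n even)."""
    if a == 0.0:
        return 0.0
    step = a / n
    total = 0.0
    for k in range(n + 1):
        t = k * step
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += w * math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    return total * step / (6.0 * math.pi)


def sn_cdf(z: float, lam: float) -> float:
    return norm_cdf(z) - 2.0 * owens_t(z, lam)


def alpha_restricted(lam: float, tau: float) -> float:
    return math.log(tau) / math.log(0.5 - math.atan(lam) / math.pi)


def rpsn_cdf(y, mu, sigma, lam, tau):
    """F(y; mu, sigma, lambda) of the RPSN model at quantile level tau."""
    z = (y - mu) / sigma
    # Clip tiny negative values caused by cancellation far in the tails.
    return max(sn_cdf(z, lam), 0.0) ** alpha_restricted(lam, tau)


def rpsn_quantile(u, mu, sigma, lam, tau, iters=60):
    """u-th quantile mu + sigma * Phi_lambda^{-1}(u^(1/alpha)), by bisection."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")
    target = u ** (1.0 / alpha_restricted(lam, tau))  # required value of Phi_lambda(z)
    lo, hi = -40.0, 40.0  # assumed wide enough to bracket the root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sn_cdf(mid, lam) < target:
            lo = mid
        else:
            hi = mid
    return mu + sigma * 0.5 * (lo + hi)
```

Calling `rpsn_quantile(tau, mu, sigma, lam, tau)` should return (numerically) `mu`, which is exactly the property the reparameterization is designed to have. The same function can generate random draws by inversion, e.g. `rpsn_quantile(u, ...)` with `u` uniform on (0, 1), although this pure-Python version is slow.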
Let $Y_1,\dots,Y_n$ be independent random variables, where each $Y_i$, $i = 1,\dots,n$, follows the RPSN distribution with quantile parameter $\mu_i$, precision parameter $\sigma_i$ and shape parameter $\lambda_i$. Suppose that, for a given $\tau\in(0,1)$, the quantile, precision and shape parameters satisfy the functional relations
$$g_1(\mu_i(\tau)) = \eta_{i1}(\tau) = x_{i1}^\top\beta_1(\tau),\qquad g_2(\sigma_i(\tau)) = \eta_{i2}(\tau) = x_{i2}^\top\beta_2(\tau)\qquad\text{and}\qquad g_3(\lambda_i(\tau)) = \eta_{i3}(\tau) = x_{i3}^\top\beta_3(\tau),\qquad(2)$$
where $\beta_j(\tau) = (\beta_{j1}(\tau),\dots,\beta_{jp_j}(\tau))^\top\in\mathbb{R}^{p_j}$, $j = 1,2,3$, are vectors of unknown regression coefficients, assumed to be functionally independent, with $p_1+p_2+p_3 < n$; $\eta_{ij}(\tau)$ are the linear predictors; and $x_{ij} = (x_{ij1},\dots,x_{ijp_j})^\top$ contains the observations on $p_j$ known regressors, for $i = 1,\dots,n$. Moreover, the covariate matrices $X_j = (x_{1j},\dots,x_{nj})^\top$ are assumed to have rank $p_j$, for $j = 1,2,3$. The link functions $g_1:\mathbb{R}\to\mathbb{R}$, $g_2:\mathbb{R}^+\to\mathbb{R}$ and $g_3:\mathbb{R}\to\mathbb{R}$ in (2) must be strictly monotone and at least twice differentiable; in addition, $g_2^{-1}$ must take only positive values, so that $\sigma_i > 0$ (as with the log link). Hence $\mu_i = g_1^{-1}(x_{i1}^\top\beta_1)$, $\sigma_i = g_2^{-1}(x_{i2}^\top\beta_2)$ and $\lambda_i = g_3^{-1}(x_{i3}^\top\beta_3)$, where $g_j^{-1}(\cdot)$ is the inverse function of $g_j(\cdot)$.
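The log-likelihood function for $\theta = \theta(\tau) = (\beta_1(\tau)^\top,\beta_2(\tau)^\top,\beta_3(\tau)^\top)^\top$ has the form $\ell(\theta) = \sum_{i=1}^n\ell_i$, where
$$\ell_i = \ell(z_i,\mu_i,\sigma_i,\lambda_i) = \log\alpha(\lambda_i,\tau) - \log(\sigma_i) + \log\phi_{\lambda_i}(z_i) + \left[\alpha(\lambda_i,\tau) - 1\right]\log\Phi_{\lambda_i}(z_i),\qquad(3)$$
with $z_i = (y_i - \mu_i)/\sigma_i$.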
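The log-likelihood contribution in (3) can be coded directly. The sketch below is a hypothetical pure-Python version (names and the Simpson-rule Owen's T are our assumptions); it is handy for checking that the density integrates to one and that the case λ = 0, τ = 0.5 (so α = 1) reduces to the normal log-density.

```python
import math


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def owens_t(h: float, a: float, n: int = 1000) -> float:
    """Owen's T(h, a) by composite Simpson's rule on [0, a] (n even)."""
    if a == 0.0:
        return 0.0
    step = a / n
    total = 0.0
    for k in range(n + 1):
        t = k * step
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        total += w * math.exp(-0.5 * h * h * (1.0 + t * t)) / (1.0 + t * t)
    return total * step / (6.0 * math.pi)


def alpha_restricted(lam: float, tau: float) -> float:
    return math.log(tau) / math.log(0.5 - math.atan(lam) / math.pi)


def rpsn_logpdf(y, mu, sigma, lam, tau):
    """Per-observation log-likelihood l_i of Equation (3)."""
    a = alpha_restricted(lam, tau)
    z = (y - mu) / sigma
    log_phi = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_sn_pdf = math.log(2.0) + log_phi + math.log(norm_cdf(lam * z))  # log phi_lambda(z)
    # log Phi_lambda(z); note Phi(z) - 2T(z, lambda) loses precision for large
    # negative z when lambda > 0, so a tail-stable evaluation would be needed there.
    log_sn_cdf = math.log(norm_cdf(z) - 2.0 * owens_t(z, lam))
    return math.log(a) - math.log(sigma) + log_sn_pdf + (a - 1.0) * log_sn_cdf


def rpsn_loglik(ys, mus, sigmas, lams, tau):
    """l(theta) = sum of l_i, with observation-specific mu_i, sigma_i, lambda_i."""
    return sum(rpsn_logpdf(y, m, s, l, tau) for y, m, s, l in zip(ys, mus, sigmas, lams))
```

In a regression fit, `mus`, `sigmas` and `lams` would come from the inverse links applied to the linear predictors in (2).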
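The $(p_1+p_2+p_3)\times 1$ score vector of the model is given by
$$\dot\ell(\theta) = \begin{pmatrix}\partial\ell(\theta)/\partial\beta_1\\ \partial\ell(\theta)/\partial\beta_2\\ \partial\ell(\theta)/\partial\beta_3\end{pmatrix} = \begin{pmatrix}X_1^\top W_{\beta_1}^{1/2}\dot\ell_\mu\\ X_2^\top W_{\beta_2}^{1/2}\dot\ell_\sigma\\ X_3^\top W_{\beta_3}^{1/2}\dot\ell_\lambda\end{pmatrix},\qquad(4)$$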
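where $W_{\beta_j} = \operatorname{diag}(w_{\beta_j1},\dots,w_{\beta_jn})$, with $w_{\beta_1i} = (\partial\mu_i/\partial\eta_{i1})^2$, $w_{\beta_2i} = (\partial\sigma_i/\partial\eta_{i2})^2$ and $w_{\beta_3i} = (\partial\lambda_i/\partial\eta_{i3})^2$, and $\dot\ell_\xi = (\dot\ell_{\xi_1},\dots,\dot\ell_{\xi_n})^\top$ for $\xi\in\{\mu,\sigma,\lambda\}$, with $\dot\ell_{\xi_i} = \partial\ell(z_i,\mu_i,\sigma_i,\lambda_i)/\partial\xi_i$. These elements are given in Appendix A.1.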
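Computationally, each block of (4) is a weighted cross-product. The Python sketch below assembles one block, $X_j^\top W_{\beta_j}^{1/2}\dot\ell_\xi$, from per-observation derivatives. It assumes increasing link functions (so that the square root of $w_{\beta_ji}$ is the derivative $\partial\xi_i/\partial\eta_{ij}$ itself) and uses an identity link and a log link purely as hypothetical examples, since the text does not fix the links.

```python
import math
from typing import Callable, List, Sequence


def score_block(X: Sequence[Sequence[float]], eta: Sequence[float],
                dinv_link: Callable[[float], float],
                dl: Sequence[float]) -> List[float]:
    """Return X^T W^{1/2} dl for one parameter block of the score vector.

    X: n x p covariate matrix (rows are x_i^T); eta: linear predictors;
    dinv_link: derivative d xi / d eta of the inverse link (assumed > 0);
    dl: per-observation derivatives dl_i / d xi_i.
    """
    n, p = len(X), len(X[0])
    w_half = [dinv_link(e) for e in eta]  # diagonal of W^{1/2}
    return [sum(X[i][k] * w_half[i] * dl[i] for i in range(n)) for k in range(p)]


def identity_link_deriv(eta: float) -> float:
    """mu = eta, so d mu / d eta = 1."""
    return 1.0


def log_link_deriv(eta: float) -> float:
    """sigma = exp(eta), so d sigma / d eta = exp(eta) = sigma."""
    return math.exp(eta)
```

Stacking the three blocks gives the gradient that a numerical optimizer would use when solving the likelihood equations.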
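The Hessian matrix of the model is
$$H(\theta) = \begin{pmatrix}H_{\beta_1\beta_1} & H_{\beta_1\beta_2} & H_{\beta_1\beta_3}\\ \cdot & H_{\beta_2\beta_2} & H_{\beta_2\beta_3}\\ \cdot & \cdot & H_{\beta_3\beta_3}\end{pmatrix} = \begin{pmatrix}X_1^\top\ddot\ell_{\mu\mu}W_{\beta_1}X_1 & X_1^\top\ddot\ell_{\mu\sigma}W_{\beta_1}^{1/2}W_{\beta_2}^{1/2}X_2 & X_1^\top\ddot\ell_{\mu\lambda}W_{\beta_1}^{1/2}W_{\beta_3}^{1/2}X_3\\ \cdot & X_2^\top\ddot\ell_{\sigma\sigma}W_{\beta_2}X_2 & X_2^\top\ddot\ell_{\sigma\lambda}W_{\beta_2}^{1/2}W_{\beta_3}^{1/2}X_3\\ \cdot & \cdot & X_3^\top\ddot\ell_{\lambda\lambda}W_{\beta_3}X_3\end{pmatrix},\qquad(5)$$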
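where $\ddot\ell_{\xi\xi'} = \operatorname{diag}(\ddot\ell_{\xi_1\xi_1'},\dots,\ddot\ell_{\xi_n\xi_n'})$ for $\xi,\xi'\in\{\mu,\sigma,\lambda\}$, with $\ddot\ell_{\xi_i\xi_i'} = \partial^2\ell_i/\partial\xi_i\partial\xi_i'$, and the entries marked "·" follow by symmetry. These elements are detailed in Appendix A.1. Expression (5) omits the terms involving second derivatives of the inverse link functions (such as $\partial^2\mu_i/\partial\eta_{i1}^2$), which vanish for identity links.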
The ML estimators $\hat\beta_1(\tau)$, $\hat\beta_2(\tau)$ and $\hat\beta_3(\tau)$ of $\beta_1(\tau)$, $\beta_2(\tau)$ and $\beta_3(\tau)$, respectively, are obtained by simultaneously solving the nonlinear system of equations $\dot\ell(\theta) = 0_{p_1+p_2+p_3}$, where $0_r$ denotes an $r$-dimensional vector of zeros. Since these estimators have no closed-form expressions, numerical methods for solving systems of nonlinear equations are required.

3. Local Influence

Global influence is related to case deletion, i.e., the effect of dropping a case from the dataset (Cook [14]). The likelihood displacement (LD) is defined as $\text{LD}(\omega) = 2[\ell(\hat\theta) - \ell(\hat\theta(\omega))]$, where $\hat\theta(\omega)$ is the ML estimate of $\theta$ under the perturbed model defined by a perturbation vector $\omega = (\omega_1,\dots,\omega_n)^\top$. Cook [14] studied $\text{LD}(\omega)$ around the non-perturbation vector $\omega_0$, for which $\hat\theta(\omega_0) = \hat\theta$. The normal curvature in the direction of a unit vector $d$ ($\|d\| = 1$) is defined as $C_d(\hat\theta) = 2|d^\top\Delta_\omega^\top\ddot\ell_{\hat\theta\hat\theta}^{-1}\Delta_\omega d|$, where $\ddot\ell_{\hat\theta\hat\theta}$ is the Hessian of $\ell(\theta)$ and $\Delta_\omega = \partial^2\ell(\theta,\omega)/\partial\theta\,\partial\omega^\top$, both evaluated at $\theta = \hat\theta$ and $\omega = \omega_0$. Hence, $C_{d_{\max}}$ is twice the largest eigenvalue of $B = -\Delta_{\omega_0}^\top\ddot\ell_{\hat\theta\hat\theta}^{-1}\Delta_{\omega_0}$, and $d_{\max}$ is the corresponding unit eigenvector. The index plot of $d_{\max}$ suggests how to perturb the model (or data) to obtain large changes in the estimate of $\theta$.
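A small pure-Python sketch of this computation: given the Hessian at the MLE and $\Delta_\omega$, it forms $B$, finds its largest eigenvalue by power iteration and returns $C_{d_{\max}}$ (twice that eigenvalue, following the definition of $C_d$ above), $d_{\max}$ and $B$. The function names, the Gauss-Jordan solver and the use of power iteration are our own choices; with NumPy available, `numpy.linalg.eigh` on $B$ would be the natural tool.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def solve(A: Sequence[Sequence[float]], B: Sequence[Sequence[float]]) -> Matrix:
    """Return A^{-1} B by Gauss-Jordan elimination with partial pivoting."""
    p = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(p)]  # augmented matrix [A | B]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(p):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[p:] for row in M]


def local_influence(hessian, delta, iters=500):
    """C_dmax, d_max and B = -Delta^T H^{-1} Delta.

    hessian: p x p Hessian at the MLE (assumed negative definite, so B is PSD);
    delta:   p x n matrix Delta_omega evaluated at (theta_hat, omega_0).
    """
    p, n = len(delta), len(delta[0])
    hinv_delta = solve(hessian, delta)  # H^{-1} Delta, p x n
    B = [[-sum(delta[k][i] * hinv_delta[k][j] for k in range(p)) for j in range(n)]
         for i in range(n)]
    v = [1.0 + 0.1 * i for i in range(n)]  # arbitrary, non-symmetric start vector
    for _ in range(iters):  # power iteration
        w = [sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n, B
        v = [x / norm for x in w]
    top = sum(v[i] * sum(B[i][j] * v[j] for j in range(n)) for i in range(n))
    k = max(range(n), key=lambda i: abs(v[i]))
    if v[k] < 0:  # fix the sign of the eigenvector for plotting
        v = [-x for x in v]
    return 2.0 * top, v, B
```

An index plot of `abs(d)` for the returned `d` highlights the observations driving the largest curvature. The diagonal of `B` is also often used for a total local influence measure, $C_i = 2|b_{ii}|$; the paper does not define $C_i$ explicitly, so treat that choice as an assumption.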
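For three common perturbation schemes, we compute the matrix
$$\Delta_\omega = \frac{\partial^2\ell(\theta,\omega)}{\partial\theta\,\partial\omega^\top} = \begin{pmatrix}\Delta_{\omega,\beta_1}\\ \Delta_{\omega,\beta_2}\\ \Delta_{\omega,\beta_3}\end{pmatrix},$$
where $\Delta_{\omega,\beta_j} = \partial^2\ell(\theta,\omega)/\partial\beta_j\,\partial\omega^\top$, $j = 1,2,3$.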
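When the analytic expressions for $\Delta_\omega$ are cumbersome, a central finite-difference approximation offers a useful cross-check. The sketch below is a generic helper (hypothetical name `delta_fd`) for any perturbed log-likelihood `loglik(theta, omega)`; for example, a case-weight perturbation has the form $\ell(\theta,\omega) = \sum_i\omega_i\ell_i$, and an additive response perturbation replaces $y_i$ by $y_i + \omega_i S_Y$.

```python
from typing import Callable, List, Sequence


def delta_fd(loglik: Callable[[List[float], List[float]], float],
             theta: Sequence[float], omega0: Sequence[float],
             h: float = 1e-4) -> List[List[float]]:
    """Central-difference approximation of Delta = d^2 l / (d theta d omega^T).

    Entry [k][i] approximates the mixed partial derivative with respect to
    theta_k and omega_i at (theta, omega0).
    """
    p, n = len(theta), len(omega0)
    out = [[0.0] * n for _ in range(p)]
    for k in range(p):
        for i in range(n):
            total = 0.0
            # f(+,+) - f(+,-) - f(-,+) + f(-,-), divided by 4 h^2 below.
            for sk, si in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
                th = list(theta)
                om = list(omega0)
                th[k] += sk * h
                om[i] += si * h
                total += sk * si * loglik(th, om)
            out[k][i] = total / (4.0 * h * h)
    return out
```

For a normal location model with $\ell_i = -(y_i - \theta)^2/2$, the case-weight column $i$ should equal $y_i - \theta$, and the response-perturbation column should equal $S_Y$, matching $-S_Y\,\partial^2\ell_i/\partial\theta^2$.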

3.1. Case Weights Perturbation

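In this scheme, the perturbed log-likelihood function is defined as $\ell(\theta,\omega) = \sum_{i=1}^n\omega_i\ell_i$, where $\ell_i = \ell(z_i,\mu_i,\sigma_i,\lambda_i)$ is defined in (3) and $0\le\omega_i\le1$, for $i = 1,\dots,n$. Here, $\omega_0 = (1,\dots,1)^\top$ and
$$\Delta_\omega = \begin{pmatrix}X_1^\top W_{\beta_1}^{1/2}\operatorname{diag}(\dot\ell_\mu)\\ X_2^\top W_{\beta_2}^{1/2}\operatorname{diag}(\dot\ell_\sigma)\\ X_3^\top W_{\beta_3}^{1/2}\operatorname{diag}(\dot\ell_\lambda)\end{pmatrix}.$$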

3.2. Case Response Perturbation

We now consider an additive perturbation of the $i$-th response, $y_i(\omega_i) = y_i + \omega_i S_{Y_i}$, where $\omega_i\in\mathbb{R}$ and $S_{Y_i}$ is a scale factor. A usual choice is $S_{Y_i} = S_Y$, the sample standard deviation of $Y$. Here, $\omega_0 = (0,\dots,0)^\top$. Under this scheme, the perturbed log-likelihood function is $\ell(\theta,\omega) = \sum_{i=1}^n\ell(z_i(\omega_i),\mu_i,\sigma_i,\lambda_i)$, where $z_i(\omega_i) = (y_i(\omega_i) - \mu_i)/\sigma_i$. Since $y_i$ and $\mu_i$ enter $\ell_i$ only through $z_i$, we have $\partial\ell_i/\partial y_i = -\partial\ell_i/\partial\mu_i$, and therefore
$$\Delta_\omega = -S_Y\begin{pmatrix}X_1^\top W_{\beta_1}^{1/2}\ddot\ell_{\mu\mu}\\ X_2^\top W_{\beta_2}^{1/2}\ddot\ell_{\mu\sigma}\\ X_3^\top W_{\beta_3}^{1/2}\ddot\ell_{\mu\lambda}\end{pmatrix}.$$

3.3. Case Continuous Covariate Perturbation

Consider an additive perturbation of a particular continuous covariate included in the quantile parameter, say $x_t$, $t\in\{1,\dots,p_1\}$, by setting $x_{it}(\omega_i) = x_{it} + \omega_i S_{X_{it}}$, where $S_{X_{it}}$ is a scale factor. Again, a usual choice is $S_{X_{it}} = S_{X_t}$, the sample standard deviation of $X_t$, and $\omega_0 = (0,\dots,0)^\top$. Under this covariate perturbation scheme, the log-likelihood function is $\ell(\theta,\omega) = \sum_{i=1}^n\ell_i$ evaluated at $\mu_i(\omega_i) = g_1^{-1}(x_{i1}(\omega_i)^\top\beta_1)$, where $x_{i1}(\omega_i) = x_{i1} + \omega_i S_{X_t}J_t$ and $J_t$ is a $p_1$-dimensional vector of zeros except for a one in the $t$-th position. Assuming that $x_t$ enters only the quantile predictor, and neglecting the terms involving $\partial^2\mu_i/\partial\eta_{i1}^2$ (which vanish for the identity link), we obtain
$$\Delta_\omega = S_{X_t}\begin{pmatrix}\beta_{1t}X_1^\top W_{\beta_1}^{1/2}\ddot\ell_{\mu\mu}W_{\beta_1}^{1/2} + J_t\,\dot\ell_\mu^\top W_{\beta_1}^{1/2}\\ \beta_{1t}X_2^\top W_{\beta_2}^{1/2}\ddot\ell_{\mu\sigma}W_{\beta_1}^{1/2}\\ \beta_{1t}X_3^\top W_{\beta_3}^{1/2}\ddot\ell_{\mu\lambda}W_{\beta_1}^{1/2}\end{pmatrix}.$$

4. Real Data Analysis

In this section, we present an application to data on 202 Australian athletes from the Australian Institute of Sport. These data were discussed in Cook and Weisberg [15]. To illustrate the proposed model, we consider the quantile regression model $\text{bmi}_i = \mu_i(\tau) + \sigma_i(\tau)\,\epsilon_i(\tau)$, where $\epsilon_i(\tau)\sim\text{RPSN}(0,1,\lambda)$ at quantile level $\tau$ and
$$\mu_i(\tau) = \beta_{10}(\tau) + \beta_{11}(\tau)\,\text{lbm}_i + \beta_{12}(\tau)\,\text{sex}_i,\qquad \sigma_i(\tau) = \beta_{20}(\tau) + \beta_{21}(\tau)\,\text{lbm}_i.$$
Here, the response bmi is the body mass index, while the covariates lbm and sex are the lean body mass and sex of the athletes, respectively. Note that λ is not modeled by covariates, and sex was not included in the scale parameter because a preliminary analysis showed that the corresponding coefficient was not significant (for any τ ∈ (0, 1)). The same problem was studied by Galarza et al. [16] with a class of skewed distributions (SKD), but with a regression structure only in the quantile parameter. For comparison purposes, we considered the skewed normal (SKN) and skewed Student-t (SKT) models, which belong to the SKD class. We also considered the Gamma-Sinh Cauchy (GSC) model, with covariates only in the quantile parameter. Table 2 shows the Akaike Information Criterion (AIC; Akaike [17]) for these models. Except for τ = 0.25, the RPSN-QR model attained the minimum AIC for all the considered quantiles.
Table 3 displays the MLEs and the corresponding standard errors (SE) of the proposed model fitted for τ = 0.10, 0.50 and 0.90. Note that there is a positive relationship between the response variable (bmi) and lbm at all quantiles. We also observe that the quantile intercepts increase as τ increases. Regarding the parameter λ, the greater τ, the greater the estimate of λ.
Figure 2 shows point estimates and 95% confidence intervals (CIs) for the model parameters under the RPSN-QR model for different quantiles. As τ increases, the coefficients of lean body mass and of sex become larger. Moreover, lbm and sex are significant in explaining all the quantiles modeled through μ_i. Figure 3 presents the estimated quantiles 0.10, 0.25, 0.50, 0.75 and 0.90 of bmi in terms of lbm and the sex of the athlete.
Table 4 presents the p-values of the Kolmogorov–Smirnov (KS; Kolmogorov [18]) test of normality applied to the quantile residuals (Dunn and Smyth [19]) of the models fitted at different quantiles τ. In all cases, the KS test did not reject the null hypothesis of normality. Therefore, the RPSN model is appropriate for modeling all the quantiles in this problem.
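For a continuous response, the quantile residuals are $r_i = \Phi^{-1}\{F(y_i;\hat\theta)\}$, which should look like a standard normal sample when the model is adequate. The sketch below assumes the fitted cdf values $F(y_i;\hat\theta)$ are already available (for instance from an RPSN cdf routine) and computes the residuals and the one-sample KS statistic against N(0, 1) using only Python's standard library. The p-value uses the asymptotic Kolmogorov distribution and ignores that parameters were estimated, so it is only a rough guide and need not reproduce Table 4 exactly; `scipy.stats.kstest` is a standard alternative.

```python
import math
from statistics import NormalDist
from typing import List, Sequence, Tuple


def quantile_residuals(cdf_values: Sequence[float]) -> List[float]:
    """r_i = Phi^{-1}(F(y_i; theta_hat)); cdf values must lie strictly in (0, 1)."""
    std = NormalDist()
    return [std.inv_cdf(u) for u in cdf_values]


def ks_normal(residuals: Sequence[float], terms: int = 100) -> Tuple[float, float]:
    """KS statistic D against N(0, 1) and its asymptotic p-value."""
    std = NormalDist()
    xs = sorted(residuals)
    n = len(xs)
    d = max(max(i / n - std.cdf(x), std.cdf(x) - (i - 1) / n)
            for i, x in enumerate(xs, start=1))
    t = math.sqrt(n) * d
    # Kolmogorov limiting survival function: 2 * sum (-1)^(k-1) exp(-2 k^2 t^2).
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
                  for k in range(1, terms + 1))
    return d, min(max(p, 0.0), 1.0)
```

If the fitted cdf values were perfectly evenly spread, e.g. $(i - 0.5)/n$, the statistic would equal $0.5/n$, the smallest value attainable.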
We also performed a local influence analysis. Figure 4 shows this analysis under the three perturbation schemes discussed in Section 3 for τ = 0.5, and Appendix A.2 shows it for the quantiles τ = 0.1, 0.25, 0.75 and 0.9. Observations 75, 162 and 178 are detected as potentially influential for all these quantiles, and observation 53 also appears for τ = 0.9.
To check the impact of the potentially influential cases on inference, we consider the relative change (RC) of each parameter estimate and of its SE after removing those cases:
$$\text{RC}_{\theta_{j(i)}} = 100\%\times\left|\frac{\hat\theta_j - \hat\theta_{j(i)}}{\hat\theta_j}\right|\qquad\text{and}\qquad \text{RC}_{\text{SE}(\theta_{j(i)})} = 100\%\times\left|\frac{\text{SE}(\hat\theta_j) - \text{SE}(\hat\theta_{j(i)})}{\text{SE}(\hat\theta_j)}\right|,$$
where $\theta_j$ is any component of the vector $\theta = \theta(\tau)$, and $\hat\theta_{j(i)}$ and $\text{SE}(\hat\theta_{j(i)})$ denote the ML estimate of $\theta_j$ and its SE, respectively, after dropping the $i$-th observation. Table 5 shows these RCs for the non-intercept regression coefficients when observations 53, 75, 162 and 178 are removed. The RCs are greater for the estimates than for their SEs. The significance of $\beta_{11}(\tau)$ and $\beta_{12}(\tau)$ is maintained, whereas $\beta_{21}(\tau)$ is no longer significant at the 5% level. More combinations of dropped observations are presented in Appendix A.2.
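The RC measure above is simple arithmetic; the hypothetical helpers below compute it for a single value and for a dictionary of named estimates (or SEs) from the full and reduced fits. The numbers in the usage line are made up for illustration and are not taken from the athletes analysis.

```python
from typing import Dict


def relative_change(full: float, dropped: float) -> float:
    """RC in percent: 100 * |(full - dropped) / full|."""
    if full == 0.0:
        raise ZeroDivisionError("RC is undefined when the full-data value is zero")
    return 100.0 * abs((full - dropped) / full)


def rc_table(full: Dict[str, float], dropped: Dict[str, float]) -> Dict[str, float]:
    """Apply relative_change to every parameter (or SE) present in both fits."""
    return {name: relative_change(full[name], dropped[name])
            for name in full if name in dropped}
```

For example, `rc_table({"b11": 2.0}, {"b11": 1.5})` returns `{"b11": 25.0}`.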

5. Concluding Remarks

Extending quantile regression methods to asymmetric response variables on the real line is a promising area of research. In this paper, we have introduced a novel flexible parametric quantile regression model for asymmetric response variables, which can be very useful for modeling response variables on the real line at different quantiles. The proposed model is built on the PSN distribution through a new parameterization indexed by quantile, precision and shape parameters, in which a function of any quantile of the response variable is given by a linear predictor defined by regression parameters and explanatory variables. We adopted a frequentist approach, estimating the model parameters by maximum likelihood. An application to a real dataset was presented and discussed. The results showed that the model is adequate and revealed in detail which covariates influence the response at different quantile levels. Finally, there are many possible extensions of the current work, for instance, mixtures of RPSN regression models to accommodate multimodality, a semi-parametric component with a functional covariate to model nonlinearity of the response, and measurement errors, among others. An in-depth investigation of these topics is beyond the scope of this work and will be considered elsewhere.

Author Contributions

Conceptualization, D.I.G., M.B. and C.E.G.; Formal analysis, D.I.G., M.B., C.E.G. and H.W.G.; Investigation, D.I.G., M.B., C.E.G. and H.W.G.; Methodology, D.I.G., M.B. and C.E.G.; Software, D.I.G. and M.B.; Supervision, C.E.G. and H.W.G.; Validation, D.I.G. and M.B. All authors have read and agreed to the published version of the manuscript.

Funding

The research of H.W. Gómez was supported by Grant PUENTE UA, Chile.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Appendix A.1. Details for Score and Hessian

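For the score vector in Equation (4), the elements $\partial\ell_i/\partial\xi_i$, with $\xi\in\{\mu,\sigma,\lambda\}$, are given by
$$\frac{\partial\ell_i}{\partial\mu_i} = -\frac{u_i}{\sigma_i},\qquad \frac{\partial\ell_i}{\partial\sigma_i} = -\frac{1 + z_iu_i}{\sigma_i},\qquad \frac{\partial\ell_i}{\partial\lambda_i} = \alpha_i'\left[\frac{1}{\alpha_i} + \log\Phi_{\lambda_i}(z_i)\right] + z_i\,m_0(\lambda_iz_i) - \frac{(\alpha_i - 1)\,q_i}{1+\lambda_i^2},$$
where $\alpha_i = \alpha(\lambda_i,\tau)$, $m_\lambda(z) = \phi_\lambda(z)/\Phi_\lambda(z)$ (so that $m_0(z) = \phi(z)/\Phi(z)$), $q_i = m_0(\lambda_iz_i)\,m_{\lambda_i}(z_i) = 2\phi(z_i)\phi(\lambda_iz_i)/\Phi_{\lambda_i}(z_i)$,
$$u_i = \lambda_i\,m_0(\lambda_iz_i) - z_i + (\alpha_i - 1)\,m_{\lambda_i}(z_i)\qquad\text{and}\qquad \alpha_i' = \frac{\partial\alpha(\lambda_i,\tau)}{\partial\lambda_i} = \frac{\alpha_i^2}{\log(\tau)\,(1+\lambda_i^2)\left[\frac{\pi}{2} - \arctan(\lambda_i)\right]}.$$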
For the Hessian in Equation (5), the elements $\partial^2\ell_i/\partial\xi_i\partial\xi_i'$, with $\xi,\xi'\in\{\mu,\sigma,\lambda\}$, are given by
$$\frac{\partial^2\ell_i}{\partial\mu_i^2} = \frac{u_i'}{\sigma_i^2},\qquad \frac{\partial^2\ell_i}{\partial\mu_i\,\partial\sigma_i} = \frac{u_i + z_iu_i'}{\sigma_i^2},\qquad \frac{\partial^2\ell_i}{\partial\sigma_i^2} = \frac{1 + 2z_iu_i + z_i^2u_i'}{\sigma_i^2},$$
$$\frac{\partial^2\ell_i}{\partial\mu_i\,\partial\lambda_i} = -\frac{v_i}{\sigma_i},\qquad \frac{\partial^2\ell_i}{\partial\sigma_i\,\partial\lambda_i} = -\frac{z_iv_i}{\sigma_i},$$
$$\frac{\partial^2\ell_i}{\partial\lambda_i^2} = \alpha_i''\left[\frac{1}{\alpha_i} + \log\Phi_{\lambda_i}(z_i)\right] - \left(\frac{\alpha_i'}{\alpha_i}\right)^2 - \frac{2\alpha_i'q_i}{1+\lambda_i^2} + z_i^2\,m_0'(\lambda_iz_i) + \frac{(\alpha_i - 1)\,q_i}{1+\lambda_i^2}\left[\lambda_iz_i^2 - \frac{q_i}{1+\lambda_i^2} + \frac{2\lambda_i}{1+\lambda_i^2}\right],$$
where
$$u_i' = \lambda_i^2\,m_0'(\lambda_iz_i) - 1 + (\alpha_i - 1)\,m_{\lambda_i}'(z_i),\qquad v_i = m_0(\lambda_iz_i) + \lambda_iz_i\,m_0'(\lambda_iz_i) + \alpha_i'\,m_{\lambda_i}(z_i) + (\alpha_i - 1)\,q_i\left[z_i + \frac{m_{\lambda_i}(z_i)}{1+\lambda_i^2}\right],$$
$$\alpha_i'' = \frac{2(\alpha_i')^2}{\alpha_i} - \alpha_i'\,\frac{2\lambda_i\left[\frac{\pi}{2} - \arctan(\lambda_i)\right] - 1}{(1+\lambda_i^2)\left[\frac{\pi}{2} - \arctan(\lambda_i)\right]},$$
$m_0'(x) = -m_0(x)\,[x + m_0(x)]$ and $m_\lambda'(z) = \lambda\,m_0(\lambda z)\,m_\lambda(z) - z\,m_\lambda(z) - m_\lambda^2(z)$.
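Expressions of this kind are easy to get wrong, so a cheap safeguard is to compare them with central finite differences of $\ell_i$ in (3). The Python sketch below codes the score elements and the entries $\partial^2\ell_i/\partial\mu_i^2$, $\partial^2\ell_i/\partial\mu_i\partial\lambda_i$ and $\partial^2\ell_i/\partial\lambda_i^2$, obtained by differentiating (3) directly with $m_\lambda(z) = \phi_\lambda(z)/\Phi_\lambda(z)$, and reports the largest discrepancy. It is a consistency check under our own helper names and a Simpson-rule Owen's T, not code from the original study.

```python
import math


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def owens_t(h, a, n=1000):
    """Owen's T(h, a) by composite Simpson's rule on [0, a] (n even)."""
    if a == 0.0:
        return 0.0
    s, tot = a / n, 0.0
    for k in range(n + 1):
        t2 = (k * s) ** 2
        w = 1.0 if k in (0, n) else (4.0 if k % 2 else 2.0)
        tot += w * math.exp(-0.5 * h * h * (1.0 + t2)) / (1.0 + t2)
    return tot * s / (6.0 * math.pi)


def alphas(lam, tau):
    """alpha(lambda, tau) of restriction (1) and its first two lambda-derivatives."""
    lt, c = math.log(tau), math.pi / 2.0 - math.atan(lam)
    a = lt / math.log(0.5 - math.atan(lam) / math.pi)
    g, dg = (1.0 + lam * lam) * c, 2.0 * lam * c - 1.0
    a1 = a * a / (lt * g)
    return a, a1, 2.0 * a1 * a1 / a - a1 * dg / g


def parts(y, mu, sigma, lam, tau):
    """alpha, alpha', alpha'', z, Phi_lambda(z), m_0(lambda z), m_lambda(z), q."""
    a, a1, a2 = alphas(lam, tau)
    z = (y - mu) / sigma
    F = norm_cdf(z) - 2.0 * owens_t(z, lam)
    m0 = norm_pdf(lam * z) / norm_cdf(lam * z)
    ml = 2.0 * norm_pdf(z) * norm_cdf(lam * z) / F
    return a, a1, a2, z, F, m0, ml, m0 * ml


def loglik(y, mu, sigma, lam, tau):
    a, _, _, z, F, _, _, _ = parts(y, mu, sigma, lam, tau)
    return (math.log(a) - math.log(sigma) + math.log(2.0 * norm_pdf(z) * norm_cdf(lam * z))
            + (a - 1.0) * math.log(F))


def score(y, mu, sigma, lam, tau):
    a, a1, _, z, F, m0, ml, q = parts(y, mu, sigma, lam, tau)
    u = lam * m0 - z + (a - 1.0) * ml
    d_lam = a1 * (1.0 / a + math.log(F)) + z * m0 - (a - 1.0) * q / (1.0 + lam * lam)
    return (-u / sigma, -(1.0 + z * u) / sigma, d_lam)


def hessian_entries(y, mu, sigma, lam, tau):
    """(d2l/dmu2, d2l/dmu dlambda, d2l/dlambda2)."""
    a, a1, a2, z, F, m0, ml, q = parts(y, mu, sigma, lam, tau)
    k = 1.0 + lam * lam
    dm0 = -m0 * (lam * z + m0)        # m_0'(lambda z)
    dml = lam * q - z * ml - ml * ml  # m_lambda'(z)
    u_z = lam * lam * dm0 - 1.0 + (a - 1.0) * dml
    v = m0 + lam * z * dm0 + a1 * ml + (a - 1.0) * q * (z + ml / k)
    d_ll = (a2 * (1.0 / a + math.log(F)) - (a1 / a) ** 2 - 2.0 * a1 * q / k
            + z * z * dm0 + (a - 1.0) * (q / k) * (lam * z * z - q / k + 2.0 * lam / k))
    return (u_z / sigma ** 2, -v / sigma, d_ll)


def max_gap(y, mu, sigma, lam, tau, h=1e-6):
    """Largest |analytic - central difference| over the six checked derivatives."""
    def cd(f, i, idx=None):
        plus, minus = [y, mu, sigma, lam, tau], [y, mu, sigma, lam, tau]
        plus[i] += h
        minus[i] -= h
        fp, fm = f(*plus), f(*minus)
        if idx is not None:
            fp, fm = fp[idx], fm[idx]
        return (fp - fm) / (2.0 * h)
    analytic = score(y, mu, sigma, lam, tau) + hessian_entries(y, mu, sigma, lam, tau)
    numeric = (cd(loglik, 1), cd(loglik, 2), cd(loglik, 3),
               cd(score, 1, 0), cd(score, 3, 0), cd(score, 3, 2))
    return max(abs(an - nu) for an, nu in zip(analytic, numeric))
```

A gap of order 1e-8 or smaller at a few arbitrary parameter values indicates that the analytic expressions and the log-likelihood agree.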

Appendix A.2. Local Influence

In this section, we present additional information for the local influence analysis of the athletes dataset discussed in Section 4.
Figure A1. Index plots for $C_i(\hat\beta_1)$ (left), $C_i(\hat\beta_2)$ (center) and $C_i(\hat\beta_3)$ (right) under the weight perturbation (upper), response perturbation (center) and covariate perturbation (lower) schemes for the RPSN model with τ = 0.1.
Figure A2. Index plots for $C_i(\hat\beta_1)$ (left), $C_i(\hat\beta_2)$ (center) and $C_i(\hat\beta_3)$ (right) under the weight perturbation (upper), response perturbation (center) and covariate perturbation (lower) schemes for the RPSN model with τ = 0.25.
Figure A3. Index plots for $C_i(\hat\beta_1)$ (left), $C_i(\hat\beta_2)$ (center) and $C_i(\hat\beta_3)$ (right) under the weight perturbation (upper), response perturbation (center) and covariate perturbation (lower) schemes for the RPSN model with τ = 0.75.
Figure A4. Index plots for $C_i(\hat\beta_1)$ (left), $C_i(\hat\beta_2)$ (center) and $C_i(\hat\beta_3)$ (right) under the weight perturbation (upper), response perturbation (center) and covariate perturbation (lower) schemes for the RPSN model with τ = 0.9.
Table A1. RCs (in %) in ML estimates and their corresponding SEs for the indicated parameters, with respective p-values, for the athletes dataset when observations 75 and 178 are dropped separately.
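| Dropped case | Parameter | Measure | τ = 0.10 | τ = 0.25 | τ = 0.50 | τ = 0.75 | τ = 0.90 |
|---|---|---|---|---|---|---|---|
| 75 | $\beta_{11}(\tau)$ | RC | 5.31 | 7.22 | 10.82 | 16.2 | 22.57 |
| | | RCSE | 0.23 | 0.20 | 0.17 | 0.11 | 0.04 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{12}(\tau)$ | RC | 1.82 | 5.03 | 10.02 | 16.09 | 22.08 |
| | | RCSE | 0.15 | 0.05 | 0.08 | 0.07 | 0.17 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{21}(\tau)$ | RC | 6.77 | 9.84 | 14.27 | 19.20 | 23.84 |
| | | RCSE | 0.65 | 0.93 | 1.05 | 0.71 | 0.33 |
| | | p-value | 0.0118 | 0.0105 | 0.0095 | 0.0086 | 0.0078 |
| 178 | $\beta_{11}(\tau)$ | RC | 0.72 | 2.62 | 6.30 | 11.88 | 18.50 |
| | | RCSE | 0.17 | 0.15 | 0.12 | 0.07 | 0.00 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{12}(\tau)$ | RC | 0.12 | 3.36 | 8.60 | 14.88 | 21.06 |
| | | RCSE | 0.13 | 0.06 | 0.09 | 0.07 | 0.18 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{21}(\tau)$ | RC | 22.91 | 25.43 | 29.09 | 33.17 | 37.01 |
| | | RCSE | 0.75 | 0.47 | 0.31 | 0.61 | 1.61 |
| | | p-value | 0.0449 | 0.0418 | 0.0393 | 0.0371 | 0.0352 |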
Table A2. RCs (in %) in ML estimates and their corresponding SEs for the indicated parameter and respective p-values for the athletes dataset when observations {75, 178} and {75, 162, 178} are dropped separately.
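| Dropped cases | Parameter | Measure | τ = 0.10 | τ = 0.25 | τ = 0.50 | τ = 0.75 | τ = 0.90 |
|---|---|---|---|---|---|---|---|
| 75 and 178 | $\beta_{11}(\tau)$ | RC | 6.30 | 8.16 | 11.69 | 17.01 | 23.29 |
| | | RCSE | 0.41 | 0.39 | 0.34 | 0.28 | 0.19 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{12}(\tau)$ | RC | 1.75 | 5.32 | 10.58 | 16.84 | 22.97 |
| | | RCSE | 0.29 | 0.08 | 0.03 | 0.18 | 0.27 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{21}(\tau)$ | RC | 31 | 33.34 | 36.67 | 40.38 | 43.87 |
| | | RCSE | 0.04 | 0.27 | 0.42 | 0.11 | 0.91 |
| | | p-value | 0.0674 | 0.0633 | 0.0600 | 0.0572 | 0.0546 |
| 75, 162 and 178 | $\beta_{11}(\tau)$ | RC | 5.43 | 7.27 | 10.80 | 16.13 | 22.45 |
| | | RCSE | 0.57 | 0.54 | 0.50 | 0.43 | 0.34 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{12}(\tau)$ | RC | 1.36 | 5.12 | 10.53 | 16.91 | 23.14 |
| | | RCSE | 0.39 | 0.18 | 0.12 | 0.26 | 0.35 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{21}(\tau)$ | RC | 43.37 | 45.46 | 48.35 | 51.53 | 54.51 |
| | | RCSE | 0.29 | 0.61 | 0.77 | 0.46 | 0.56 |
| | | p-value | 0.1300 | 0.1251 | 0.1212 | 0.1178 | 0.1149 |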

References

1. Azzalini, A. A class of distributions which includes the normal ones. Scand. J. Stat. 1985, 12, 171–178.
2. Azzalini, A. Further results on a class of distributions which includes the normal ones. Statistica 1986, 46, 199–208.
3. Arellano-Valle, R.B.; Gómez, H.W.; Quintana, F.A. A New Class of Skew-Normal Distributions. Commun. Stat. Theory Methods 2004, 33, 1465–1480.
4. Pewsey, A.; Gómez, H.W.; Bolfarine, H. Likelihood-based inference for power distributions. Test 2012, 21, 775–789.
5. Lehmann, E.L. The power of rank tests. Ann. Math. Stat. 1953, 24, 23–43.
6. Durrans, S.R. Distributions of fractional order statistics in hydrology. Water Resour. Res. 1992, 28, 1649–1655.
7. Gupta, D.; Gupta, R.C. Analyzing skewed data by power normal model. Test 2008, 17, 197–210.
8. Castillo, N.O.; Gallardo, D.I.; Bolfarine, H.; Gómez, H.W. Truncated power-normal distribution with application to non-negative measurements. Entropy 2018, 20, 433.
9. Martínez-Flórez, G.; Arnold, B.C.; Bolfarine, H.; Gómez, H.W. The alpha-power tobit model. Commun. Stat. Theory Methods 2013, 42, 633–643.
10. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Doubly censored power-normal regression models with inflation. Test 2015, 24, 265–286.
11. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Skew-normal alpha-power model. Statistics 2014, 48, 1414–1428.
12. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. The log alpha-power asymmetric distribution with application to air pollution. Environmetrics 2014, 25, 44–56.
13. Martínez-Flórez, G.; Bolfarine, H.; Gómez, H.W. Asymmetric regression models with limited responses with an application to antibody response to vaccine. Biom. J. 2013, 55, 156–172.
14. Cook, R.D. Detection of influential observation in linear regression. Technometrics 1977, 19, 15–18.
15. Cook, R.D.; Weisberg, S. An Introduction to Regression Graphics; Wiley: New York, NY, USA, 1994.
16. Galarza, C.E.; Lachos, V.H.; Barbosa, C.; Castro, L.M. Robust quantile regression using a generalized class of skewed distributions. Stat 2017, 6, 113–130.
17. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
18. Kolmogorov, A.N. Sulla determinazione empirica di una legge di distribuzione. Giorn. Ist. Ital. Attuar. 1933, 4, 83–91.
19. Dunn, P.; Smyth, G. Randomized quantile residuals. J. Comput. Graph. Stat. 1996, 5, 236–244.
Figure 1. Pdf of the RPSN(μ = 0, σ = 1, λ) distribution for different values of λ: τ = 0.1 (left panel); τ = 0.5 (center panel); τ = 0.9 (right panel). Values of λ are −5 (black line), −1.5 (red line), −0.5 (blue line), 0 (green line), 0.5 (orange line), 1.5 (magenta line) and 5 (purple line).
Figure 2. Athletes dataset: Point estimates (center line) and 95% confidence intervals (CIs) for model parameters under RPSN-QR model.
Figure 3. Data analysis: Fitted RPSN-QR model lines for the response (left panel for males, center panel for females) and scale parameter (right panel) over the grid τ = { 0.10 , 0.25 , 0.50 , 0.75 , 0.90 } .
Figure 4. Index plots for $C_i(\hat\beta_1)$ (left), $C_i(\hat\beta_2)$ (center) and $C_i(\hat\beta_3)$ (right) under the weight perturbation (upper), response perturbation (center) and covariate perturbation (lower) schemes for the RPSN model with τ = 0.5.
Table 1. Range for skewness and kurtosis coefficients for SN, PN and PSN models.
| Coefficient | SN | PN | PSN |
|---|---|---|---|
| Skewness | (−0.9953, 0.9953) | [−0.6115, 0.9007] | [−1.6476, 0.9953) |
| Kurtosis | [3, 3.8692) | [1.7170, 4.3556] | [1.4672, 5.4386] |
Table 2. AIC criterion for different models parameterized in terms of the quantile.
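| τ | SKN | SKT | GSC | RPSN (σ constant) | RPSN (modeling σ) |
|---|---|---|---|---|---|
| 0.10 | 1097.74 | 817.77 | 803.08 | 808.64 | 801.37 |
| 0.25 | 1084.46 | 803.90 | 801.96 | 811.08 | 803.11 |
| 0.50 | 1095.56 | 810.99 | 854.38 | 815.79 | 806.76 |
| 0.75 | 1151.40 | 854.57 | 861.37 | 824.56 | 814.01 |
| 0.90 | 1220.96 | 914.43 | 865.16 | 838.78 | 825.95 |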
Table 3. Estimates and SE for parameters in athletes dataset in RPSN-quantile regression (QR) model for different values of τ .
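| Parameter | Est. (τ = 0.10) | SE | p-value | Est. (τ = 0.50) | SE | p-value | Est. (τ = 0.90) | SE | p-value |
|---|---|---|---|---|---|---|---|---|---|
| $\beta_{10}(\tau)$ | 6.4642 | 1.1552 | - | 6.7727 | 1.0867 | - | 6.1798 | 1.0859 | - |
| $\beta_{11}(\tau)$ | 2.3077 | 0.3742 | <0.0001 | 2.5008 | 0.3728 | <0.0001 | 2.9324 | 0.3695 | <0.0001 |
| $\beta_{12}(\tau)$ | 0.2037 | 0.0157 | <0.0001 | 0.2299 | 0.0147 | <0.0001 | 0.2728 | 0.0151 | <0.0001 |
| $\beta_{20}(\tau)$ | 0.7633 | 0.7952 | - | 0.2252 | 0.3996 | - | −0.6261 | 0.2700 | - |
| $\beta_{21}(\tau)$ | 0.0096 | 0.0036 | 0.0040 | 0.0108 | 0.0037 | 0.0017 | 0.0125 | 0.0035 | 0.0002 |
| $\beta_{30}(\tau)$ | −1.1940 | 1.4984 | - | −0.8381 | 0.5984 | - | −0.4916 | 0.2588 | - |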
Table 4. p-values for normality K-S test for residuals under our RPSN-QR model for the athletes dataset for different quantiles τ ’s.
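| τ | p-value |
|---|---|
| 0.10 | 0.995 |
| 0.15 | 0.996 |
| 0.20 | 0.991 |
| 0.25 | 0.976 |
| 0.30 | 0.951 |
| 0.35 | 0.914 |
| 0.40 | 0.864 |
| 0.45 | 0.853 |
| 0.50 | 0.837 |
| 0.55 | 0.765 |
| 0.60 | 0.810 |
| 0.65 | 0.777 |
| 0.70 | 0.683 |
| 0.75 | 0.604 |
| 0.80 | 0.524 |
| 0.85 | 0.394 |
| 0.90 | 0.191 |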
Table 5. Relative changes (RC) (in %) in ML estimates and their corresponding SE’s for the indicated parameter and respective p-values for the athletes dataset when observations 53, 75, 162 and 178 are dropped.
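| Dropped cases | Parameter | Measure | τ = 0.10 | τ = 0.25 | τ = 0.50 | τ = 0.75 | τ = 0.90 |
|---|---|---|---|---|---|---|---|
| 53, 75, 162 and 178 | $\beta_{11}(\tau)$ | RC | 7.20 | 9.06 | 12.56 | 17.82 | 24.05 |
| | | RCSE | 0.74 | 0.74 | 0.69 | 0.63 | 0.52 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{12}(\tau)$ | RC | 1.98 | 5.63 | 10.93 | 17.22 | 23.38 |
| | | RCSE | 0.47 | 0.26 | 0.20 | 0.34 | 0.42 |
| | | p-value | <0.0001 | <0.0001 | <0.0001 | <0.0001 | <0.0001 |
| | $\beta_{21}(\tau)$ | RC | 34.55 | 36.83 | 40.05 | 43.61 | 46.96 |
| | | RCSE | 0.23 | 0.53 | 0.69 | 0.36 | 0.66 |
| | | p-value | 0.0806 | 0.0762 | 0.0727 | 0.0697 | 0.0670 |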

Share and Cite

MDPI and ACS Style

Gallardo, D.I.; Bourguignon, M.; Galarza, C.E.; Gómez, H.W. A Parametric Quantile Regression Model for Asymmetric Response Variables on the Real Line. Symmetry 2020, 12, 1938. https://doi.org/10.3390/sym12121938

AMA Style

Gallardo DI, Bourguignon M, Galarza CE, Gómez HW. A Parametric Quantile Regression Model for Asymmetric Response Variables on the Real Line. Symmetry. 2020; 12(12):1938. https://doi.org/10.3390/sym12121938

Chicago/Turabian Style

Gallardo, Diego I., Marcelo Bourguignon, Christian E. Galarza, and Héctor W. Gómez. 2020. "A Parametric Quantile Regression Model for Asymmetric Response Variables on the Real Line" Symmetry 12, no. 12: 1938. https://doi.org/10.3390/sym12121938
