Skip to main content
Log in

Influence measures in nonparametric regression model with symmetric random errors

  • Review Paper
  • Published:
Statistical Methods & Applications Aims and scope Submit manuscript

Abstract

In this paper we present several diagnostic measures for the class of nonparametric regression models with symmetric random errors, which includes all continuous and symmetric distributions. In particular, we derive some diagnostic measures of global influence such as residuals, leverage values, Cook’s distance and the influence measure proposed by Peña (Technometrics 47(1):1–12, 2005) to measure the influence of an observation when it is influenced by the rest of the observations. A simulation study to evaluate the effectiveness of the diagnostic measures is presented. In addition, we develop the local influence measure to assess the sensitivity of the maximum penalized likelihood estimator of smooth function. Finally, an example with real data is given for illustration.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory, pp 267–281

  • Boor C (1978) A practical guide to splines. Springer Verlag, Berlin

    Book  MATH  Google Scholar 

  • Buja A, Hastie T, Tibshirani R (1989) Linear smoothers and additive models. Ann Stat 17:453–510

    MathSciNet  MATH  Google Scholar 

  • Cook RD (1977) Detection of influential observation in linear regression. Technometrics 19(1):15–18

    MathSciNet  MATH  Google Scholar 

  • Cook RD (1986) Assessment of local influence. J Roy Stat Soc: Ser B (Methodol) 48(2):133–155

    MathSciNet  MATH  Google Scholar 

  • Cook RD, Weisberg S (1982) Residuals and influence in regression. Chapman and Hall, New York

    MATH  Google Scholar 

  • Craven P, Wahba G (1978) Smoothing noisy data with spline functions. Numer Math 31(4):377–403

    Article  MathSciNet  MATH  Google Scholar 

  • Dunn P, Smyth G (1996) Randomized quatile residuals. J Comput Graph Stat 52(5):236–244

    Google Scholar 

  • Eilers P, Marx B (1996) Flexible smoothing with b-splines and penalties. Stat Sci 11:89–121

    Article  MathSciNet  MATH  Google Scholar 

  • Emami H (2018) Local influence for Liu estimators in semiparametric linear models. Stat Pap 59(2):529–544

    Article  MathSciNet  MATH  Google Scholar 

  • Eubank R (1984) The hat matrix for smoothing splines. Stat Probab Lett 2(1):9–14

    Article  MathSciNet  MATH  Google Scholar 

  • Eubank R (1985) Diagnostics for smoothing splines. J R Stat Soc Ser B (Methodological) 47:332–341

    MathSciNet  MATH  Google Scholar 

  • Eubank R, Gunst R (1986) Diagnostics for penalized least-squares estimators. Stat Probab Lett 4(5):265–272

    Article  MathSciNet  MATH  Google Scholar 

  • Eubank RL, Thomas W (1993) Detecting heteroscedasticity in nonparametric regression. J Roy Stat Soc: Ser B (Methodol) 55(1):145–155

    MathSciNet  MATH  Google Scholar 

  • Fang K, Kotz S, Ng KW (1990) Symmetric multivariate and related distributions. Chapman and Hall, London

    Book  MATH  Google Scholar 

  • Ferreira CS, Paula GA (2017) Estimation and diagnostic for skew-normal partially linear models. J Appl Stat 44(16):3033–3053

    Article  MathSciNet  MATH  Google Scholar 

  • Fung WK, Zhu ZY, Wei BC, He X (2002) Influence diagnostics and outlier tests for semiparametric mixed models. J R Stat Soc Ser B (Statistical Methodology) 64(3):565–579

    Article  MathSciNet  MATH  Google Scholar 

  • Green PJ, Silverman BW (1993) Nonparametric regression and generalized linear models: a roughness penalty approach. CRC Press, Boca Raton

    Book  MATH  Google Scholar 

  • Hastie T, Tibshirani R (1990) Generalized additive models, vol 4343. CRC Press, Boca Raton

    MATH  Google Scholar 

  • Huber PJ (2004) Robust statistics, vol 523. John Wiley & Sons, New Jersey

    Google Scholar 

  • Ibacache-Pulgar G, Paula GA (2011) Local influence for student-t partially linear models. Comput Stat Data Anal 55(3):1462–1478

    Article  MathSciNet  MATH  Google Scholar 

  • Ibacache-Pulgar G, Paula GA, Galea M (2012) Influence diagnostics for elliptical semiparametric mixed models. Stat Model 12(2):165–193

    Article  MathSciNet  MATH  Google Scholar 

  • Ibacache-Pulgar G, Paula GA, Cysneiros FJA (2013) Semiparametric additive models under symmetric distributions. TEST 22(1):103–121

    Article  MathSciNet  MATH  Google Scholar 

  • Kim C (1996) Cook’s distance in spline smoothing. Stat Probab Lett 31(2):139–144

    Article  MathSciNet  MATH  Google Scholar 

  • Kim C, Park BU, Kim W (2002) Influence diagnostics in semiparametric regression models. Stat Probab Lett 60(1):49–58

    Article  MathSciNet  MATH  Google Scholar 

  • Peña D (2005) A new statistic for influence in linear regression. Technometrics 47(1):1–12

    Article  MathSciNet  Google Scholar 

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6(2):461–464

    Article  MathSciNet  MATH  Google Scholar 

  • Silverman BW (1985) Some aspects of the spline smoothing approach to non-parametric regression curve fitting. J Roy Stat Soc: Ser B (Methodol) 47(1):1–21

    MATH  Google Scholar 

  • Speckman P (1988) Kernel smoothing in partial linear models. J Roy Stat Soc: Ser B (Methodol) 50(3):413–436

    MathSciNet  MATH  Google Scholar 

  • Thomas W (1991) Influence diagnostics for the cross-validated smoothing parameter in spline smoothing. J Am Stat Assoc 86(415):693–698

    Article  MathSciNet  Google Scholar 

  • Türkan S, Toktamis Ö (2013) Detection of influential observations in semiparametric regression model. Revista Colombiana de Estadística 36(2):271–284

    MathSciNet  MATH  Google Scholar 

  • Vanegas L, Paula G (2016) An extension of log-symmetric regression models: R codes and applications. J Stat Comput Simul 86(9):1709–1735

    Article  MathSciNet  MATH  Google Scholar 

  • Wahba G (1983) Bayesian “confidence intervals’’ for the cross-validated smoothing spline. J Roy Stat Soc: Ser B (Methodol) 45(1):133–150

    MathSciNet  MATH  Google Scholar 

  • Wei WH (2004) Derivatives diagnostics and robustness for smoothing splines. Comput Stat Data Anal 46(2):335–356

    Article  MathSciNet  MATH  Google Scholar 

  • Wood SN (2003) Thin plate regression splines. J R Stat Soc Ser B (Statistical Methodology) 65(1):95–114

    Article  MathSciNet  MATH  Google Scholar 

  • Zhang D, Lin X, Raz J, Sowers M (1998) Semiparametric stochastic mixed models for longitudinal data. J Am Stat Assoc 93(442):710–719

    Article  MathSciNet  MATH  Google Scholar 

  • Zhu ZY, He X, Fung WK (2003) Local influence analysis for penalized gaussian likelihood estimators in partially linear models. Scand J Stat 30(4):767 780

    Article  MathSciNet  MATH  Google Scholar 

Download references

Acknowledgements

This research was funded by FONDECYT 11130704, Chile, and DID S-2017-32, Universidad Austral de Chile, grant.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Germán Ibacache-Pulgar.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Appendix

1.1 Hessian matrix

Let \({\mathbf {L}}_{\mathrm {p}}\left( p^{*} \times p^{*}\right)\) be the Hessian matrix with \(\left( j^{*}, \ell ^{*}\right)\) -element given by \(\partial ^{2} L_{\mathrm {p}}(\varvec{\theta }, \varvec{\alpha }) / \partial \theta _{j^{*}} \theta _{\ell ^{*}}\) for \(j^{*}, \ell ^{*}=1, \ldots , p^{*} .\) After some algebraic manipulations we find

$$\begin{aligned} \begin{array}{l}{ \frac{\partial ^{2} L_{\mathrm {p}}(\varvec{\theta }, \varvec{\alpha })}{\partial {\mathbf {f}} \partial {\mathbf {f}}^{T}}=-\frac{1}{\phi } {\mathbf {N}}^{T} {\mathbf {D}}_{a} {\mathbf {N}}-\alpha {\mathbf {K}}} , \\ {\frac{\partial ^{2} L_{\mathrm {p}}(\varvec{\theta }, \varvec{\alpha })}{\partial \phi ^{2}}=\frac{1}{\phi ^{2}}\left\{ \frac{n}{2}+\varvec{\delta }^{T} {\mathbf {D}}_{\zeta ^{\prime }} \varvec{\delta }-\frac{1}{\phi } \varvec{\epsilon }^{T} {\mathbf {D}}_{v} \varvec{\epsilon }\right\} } {\mathrm {and}}\\ {\frac{\partial ^{2} L_{\mathrm {p}}(\varvec{\theta }, \varvec{\alpha })}{\partial {\mathbf {f}} \partial \phi }=\frac{2}{\phi ^{2}} {\mathbf {N}}^{T} {\mathbf {b}}}, \end{array} \end{aligned}$$

where \({\mathbf {D}}_{a}={\mathrm {diag}}_{1 \le i \le n}\left( a_{i}\right) , {\mathbf {D}}_{\zeta ^{\prime }}={\mathrm {diag}}_{1 \le i \le n}\left( \zeta _{i}^{\prime }\right) , {\mathbf {b}}=\left( b_{1}, \ldots , b_{n}\right) ^{T}, \varvec{\delta }=\left( \delta _{1}, \ldots , \delta _{n}\right) ^{T}\), \(\varvec{\epsilon } = \left( \epsilon _{1}, \ldots , \epsilon _{n}\right) ^{T}, a_{i}=-2\left( \zeta _{i}+2 \zeta _{i}^{\prime } \delta _{i}\right) , b_{i}=\left( \zeta _{i}+\zeta _{i}^{\prime } \delta _{i}\right) \epsilon _{i}, \epsilon _{i} = \textit{y}_{i} - \mu _{i}, \zeta _{i}^{\prime }=\frac{d \zeta _{i}}{d \delta _{i}}\) and \(\delta _{i}=\phi ^{-1} \epsilon _{i}^{2}.\)

1.1.1 Cases-weight perturbation

Let us consider cases-weight perturbation for the observations in the penalized log-likelihood function as

$$\begin{aligned} L_{\mathrm {p}}(\varvec{\theta }, \alpha | \varvec{\omega })=\sum _{i=1}^{n} \omega _{i} L_{i}(\varvec{\theta })-\frac{\alpha }{2} {\mathbf {f}}^{T} {\mathbf {K f}}, \end{aligned}$$

where \(\varvec{\omega }=\left( \omega _{1}, \ldots , \omega _{n}\right) ^{T}\) is the vector of weights, with \(0 \le \omega _{i} \le 1,\) for \(i=1, \ldots , n\). In this case, the vector of no perturbation is given by \(\varvec{\omega }_{0}=(1, \ldots , 1)^{T}\). Differentiating \(L_{\mathrm {p}}(\varvec{\theta }, \alpha | \varvec{\omega })\) with respect to the elements of \(\varvec{\theta }\) and \(\omega _{i},\) we obtain after some algebraic manipulation

$$\begin{aligned} \left. \frac{\partial ^{2} L_{\mathrm {p}_{i}}(\varvec{\theta }, \alpha | \varvec{\omega })}{\partial {\mathbf {f}} \partial \omega _{i}}\right| _{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{2}{\widehat{\phi }} \widehat{\zeta }_{i} \widehat{\epsilon }_{i} {\mathbf {n}}_{i} \quad \text {and}\\ \left. \frac{\partial ^{2} L_{\mathrm {p}_{i}}(\varvec{\theta }, \alpha | \varvec{\omega })}{\partial \phi \partial \omega _{i}}\right| _{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{1}{2 \widehat{\phi }}-\frac{1}{2 \widehat{\phi }} \widehat{\zeta }_{i} \widehat{\delta }_{i} , \end{aligned}$$

where \(\widehat{\epsilon }_{i} = y_i - \widehat{\mu {_i}}\), for \(i=1, \ldots , n\).

1.1.2 Scale perturbation

Under scale parameter perturbation scheme it is assumed that \({\mathrm {y}}_{i} \sim S(\mu _{i}, \omega _{i}^{-1} \phi , g ),\) where \(\omega =\left( \omega _{1}, \ldots , \omega _{n}\right) ^{T}\) is the vector of perturbations, with \(\omega _{i}>0,\) for \(i=1, \ldots , n .\) In this case, the vector of no perturbation is given by \(\omega _{0}=(1, \ldots , 1)^{T}\) such that \(L_{\mathrm {p}}(\varvec{\theta }, \alpha | \varvec{\omega =\omega _{0}})=L_{\mathrm {p}}(\varvec{\theta }, \alpha ).\) Taking differentials of \(L_{\mathrm {P}}(\varvec{\theta }, \alpha | \varvec{\omega })\) with respect to the elements of \(\varvec{\theta }\) and \(\omega _{i},\) we obtain after some algebraic manipulation

$$\begin{aligned} \frac{ \partial ^{2} L_{\mathrm {p}_{i}} (\varvec{\theta }, \alpha | \varvec{\omega })}{\partial {\mathbf {f}} \partial \omega _{i}}|_{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{2}{\widehat{\phi }}\left\{ \widehat{\zeta }_{i}^{\prime } \hat{\delta }_{i}+\widehat{\zeta }_{i}\right\} \widehat{\epsilon }_{i} {\mathbf {n}}_{i}^{T}\\ \frac{\partial ^{2} L_{\mathrm {p}_{i}}(\varvec{\theta }, \alpha | \varvec{\omega })}{\partial \phi \partial \omega _{i}}|_{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{1}{2\widehat{\phi }}-\frac{1}{\hat{\phi }}\left\{ \widehat{\zeta }_{i}^{\prime } \hat{\delta }_{i}+\widehat{\zeta }_{i}\right\} \hat{\delta }_{i} , \end{aligned}$$

where \(\widehat{\epsilon }_{i} = y_i - \widehat{\mu {_i}}\), for \(i=1, \ldots , n\).

1.1.3 Response variable perturbation

To perturb the response variable values we consider \(y_{i \omega }=y_{i}+\omega _{i},\) for \(i=1, \ldots , n\), where \(\omega =\left( \omega _{1}, \ldots , \omega _{n}\right) ^{T}\) is the vector of perturbations. Here, the vector of no perturbation is given by \(\varvec{\omega }_{0}=(0, \ldots , 0)^{T}\) and the perturbed penalized log-likelihood function is constructed from (5) with \({\mathrm {y}}_{i}\) replaced by \({\mathrm {y}}_{i \omega },\) that is,

$$\begin{aligned} L_{\mathrm {p}}(\varvec{\theta }, \alpha | \varvec{\omega })=L(\varvec{\theta } | \varvec{\omega })-\frac{\alpha }{2} {\mathbf {f}}^{T} {\mathbf {K}} {\mathbf {f}}, \end{aligned}$$

where \(L(\cdot )\) is given by (4) with \(\delta _{i \omega }=\phi ^{-1}\left( {\mathrm {y}}_{i \omega }-\mu _{i}\right) ^{2}\) in the place of \(\delta _{i} .\) Differentiating \(L_{\mathrm {p}}(\varvec{\theta }, \alpha | \varvec{\omega })\) with respect to the elements of \(\varvec{\theta }\) and \(\omega _{i},\) we obtain, after some algebraic manipulation, that

$$\begin{aligned} \frac{\partial ^{2} L_{\mathrm {p}_{i}}({\varvec{\theta }}, \alpha | {\varvec{\omega }})}{\partial {\mathbf {f}} \partial \omega _{i}}|_{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{1}{\widehat{\phi }}\left\{ 2 \widehat{\zeta _{i}}^{\prime } \widehat{\delta }_{i}+2 \widehat{\zeta }_{i} \right\} \quad \text {and} \\ \frac{\partial ^{2} L_{\mathrm {P}_{i}}({\varvec{\theta }}, \alpha | {\varvec{\omega }})}{\partial \phi \partial \omega _{i}}|_{\theta =\hat{\theta }, \omega =\omega _{0}}= & {} -\frac{\widehat{\epsilon }_{i}}{\widehat{\phi }^{2}}\left\{ 2 \widehat{\zeta }_{i}^{\prime } \widehat{\delta }_{i}+2 \widehat{\zeta }_{i}\right\} , \end{aligned}$$

where \(\widehat{\epsilon }_{i} = y_i - \widehat{\mu {_i}}\), for \(i=1, \ldots , n\).

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Ibacache-Pulgar, G., Villegas, C., López-Gonzales, J.L. et al. Influence measures in nonparametric regression model with symmetric random errors. Stat Methods Appl 32, 1–25 (2023). https://doi.org/10.1007/s10260-022-00648-z

Download citation

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10260-022-00648-z

Keywords

Navigation