Partitioned algorithms for maximum likelihood and other non-linear estimation

Statistics and Computing

Abstract

There are a variety of methods in the literature which seek to make iterative estimation algorithms more manageable by breaking the iterations into a greater number of simpler or faster steps. Those algorithms which deal at each step with a proper subset of the parameters are called in this paper partitioned algorithms. Partitioned algorithms in effect replace the original estimation problem with a series of problems of lower dimension. The purpose of the paper is to characterize some of the circumstances under which this process of dimension reduction leads to significant benefits.

Four types of partitioned algorithms are distinguished: reduced objective function methods, nested (partial Gauss-Seidel) iterations, zigzag (full Gauss-Seidel) iterations, and leapfrog (non-simultaneous) iterations. Emphasis is given to Newton-type methods using analytic derivatives, but a nested EM algorithm is also given. Nested Newton methods are shown to be equivalent to applying the same Newton method to the reduced objective function, and are applied to separable regression and generalized linear models. Nesting is shown generally to improve the convergence of Newton-type methods, both by improving the quadratic approximation to the log-likelihood and by improving the accuracy with which the observed information matrix can be approximated. Nesting is recommended whenever a subset of parameters is relatively easily estimated. The zigzag method is shown to produce a stable but generally slow iteration; it is fast and recommended when the parameter subsets have approximately uncorrelated estimates. The leapfrog iteration has less guaranteed properties in general, but is similar to nesting and zigzagging when the parameter subsets are orthogonal.
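The contrast between nesting and zigzagging can be illustrated on a toy separable regression. The sketch below is not taken from the paper: the model y = beta * exp(-alpha * x), the noise-free data, the starting value and the function name `fit` are all assumptions made for illustration, and the two update rules are one plausible reading of "nested" and "zigzag" with Gauss-Newton steps. In both variants the linear parameter beta is solved exactly for the current alpha. The zigzag variant then takes a Gauss-Newton step in alpha with beta held fixed, while the nested variant takes a Gauss-Newton step on the reduced (beta-eliminated) least-squares problem.

```python
import math


def fit(x, y, alpha, method, tol=1e-10, max_iter=200):
    """Fit y ~ beta * exp(-alpha * x) by a partitioned Gauss-Newton iteration.

    method="zigzag": alternate an exact solve for beta with one
        Gauss-Newton step for alpha, treating beta as fixed.
    method="nested": Gauss-Newton on the reduced problem in alpha,
        with beta eliminated (projected out) at every step.
    Returns (alpha, beta, iterations_used). Illustrative sketch only.
    """
    it = 0
    for it in range(1, max_iter + 1):
        f = [math.exp(-alpha * xi) for xi in x]
        ff = sum(fi * fi for fi in f)
        # Easy subset: closed-form least-squares beta for this alpha.
        beta = sum(fi * yi for fi, yi in zip(f, y)) / ff
        r = [yi - beta * fi for yi, fi in zip(y, f)]      # residuals
        j = [-beta * xi * fi for xi, fi in zip(x, f)]     # d(mean)/d(alpha)
        jj = sum(ji * ji for ji in j)
        if method == "nested":
            # Remove from j its projection onto f, so the step accounts
            # for beta being re-estimated as alpha moves.
            fj = sum(fi * ji for fi, ji in zip(f, j))
            jj -= fj * fj / ff
        step = sum(ji * ri for ji, ri in zip(j, r)) / jj
        alpha += step
        if abs(step) < tol:
            break
    f = [math.exp(-alpha * xi) for xi in x]
    beta = sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * fi for fi in f)
    return alpha, beta, it


# Hypothetical noise-free data generated with alpha = 0.7, beta = 2.0.
x = [0.5 * k for k in range(9)]
y = [2.0 * math.exp(-0.7 * xi) for xi in x]

for method in ("nested", "zigzag"):
    a, b, n = fit(x, y, alpha=0.3, method=method)
    print(method, round(a, 6), round(b, 6), "iterations:", n)
```

In this sketch the two variants differ only in the denominator of the alpha step. When the derivative vector `j` is nearly orthogonal to the basis vector `f`, the projection term is small and the two steps nearly coincide, which is consistent with the abstract's remark that zigzagging is fast when the parameter subsets have approximately uncorrelated estimates.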

Cite this article

Smyth, G.K. Partitioned algorithms for maximum likelihood and other non-linear estimation. Stat Comput 6, 201–216 (1996). https://doi.org/10.1007/BF00140865


  • DOI: https://doi.org/10.1007/BF00140865
