Elsevier

Economic Modelling

Volume 33, July 2013, Pages 593-604
Pareto–lognormal distributions: Inequality, poverty, and estimation from grouped income data

https://doi.org/10.1016/j.econmod.2013.04.046

Highlights

  • Estimation and properties of the Pareto–lognormal distribution are considered.

  • Moment distribution functions, inequality and poverty measures are derived.

  • Generalized method of moments estimation from grouped data is considered.

  • Performance is compared with that of the generalized beta-2 distribution.

Abstract

The (double) Pareto–lognormal is an emerging parametric distribution for income that has a sound underlying generating process, good theoretical properties, and some limited favorable evidence of its fit to data. We extend existing results for this distribution in three directions. (1) We derive closed-form formulae for its moment distribution functions, and for various inequality and poverty measures. (2) We show how it can be estimated from grouped data using the generalized method of moments (GMM). (3) Using grouped data from ten countries, we compare its performance with that of another leading 4-parameter income distribution, the generalized beta-2 distribution. The results confirm that, when using grouped data, both distributions provide a good fit, with the double Pareto–lognormal distribution outperforming the generalized beta-2 distribution in four out of ten cases.

Introduction

The estimation of parametric income distributions is useful for assessing inequality and poverty, and for welfare comparisons over time and space. This is particularly so when income data are available in grouped form, such as income and population shares, making the use of nonparametric methods less desirable. For more than a century considerable effort has been directed towards providing new parametric functional forms for income distributions. A major motivation of such studies has been to suggest functions with theoretical properties and statistical fits that are better than those of functions already appearing in the literature. The large number of functional forms that have been suggested includes, but is not limited to, the lognormal, gamma, Pareto, Weibull, Dagum, Singh-Maddala, beta-2, and generalized beta-2 distributions, and the quest continues. See Kleiber and Kotz (2003) for a comprehensive review. Despite the large number of distributions that have appeared, there has been no clear winner. An emerging 4-parameter distribution for income and other phenomena is the double Pareto–lognormal distribution (hereafter dPLN) developed by Reed (2003) and studied further by Reed and Jorgensen (2004). It has good properties with a sound theoretical justification (Reed, 2003), and, according to Reed and Wu (2008), provides a very good fit to income data.

The purpose of this paper is to provide some further results for this distribution. We derive closed form solutions for various inequality and poverty measures in terms of the parameters of the dPLN distribution and show how they relate to the corresponding inequality and poverty measures for the lognormal and Pareto distributions. When grouped data are available as income and population proportions, and mean income is also observed, enabling calculation of group mean incomes, the generalized method of moments (GMM) framework developed in Hajargasht et al. (2012) is convenient for estimation. We provide the moment distribution functions which are required for estimating the dPLN distribution within this framework. Results from estimating this model are compared with those from estimating another leading 4-parameter distribution known as the generalized beta of the second kind (hereafter GB2), developed by McDonald (1984) and McDonald and Xu (1995). Using grouped data from ten regions (China rural, China urban, India rural, India urban, Pakistan, Russia, Poland, Brazil, Nigeria and Iran), we estimate the GB2, a 3-parameter Pareto–lognormal, and the dPLN distributions, and compare their performance in terms of goodness of fit. The results suggest that all three distributions provide good fits, and there is not one particular distribution that dominates the others over all datasets.

The paper is organized as follows. In Section 2 we briefly review the dPLN distribution. Expressions are provided for its moments and moment distribution functions, and for various inequality and poverty measures. The GB2 distribution that we later compare with the dPLN is reviewed in Section 3. Sections 2 and 3 contain results about the dPLN and GB2 distributions that are useful irrespective of whether or not the data are in grouped form. In Section 4 we summarize the GMM methodology developed in Hajargasht et al. (2012) for using grouped data to estimate the parameters of a general income distribution, and show how it can be applied to the dPLN distribution. This methodology draws on the results for moments and moment distribution functions that are provided in Section 2. Section 5 contains a description of the data used to illustrate the theoretical framework, and the results. The results include parameter estimates and their standard errors, test results for excess moment conditions, mean-square-error comparisons for goodness-of-fit, Gini and Theil coefficients, and coefficients of variation. Concluding remarks are provided in Section 6.

Section snippets

Pareto–lognormal income distributions

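The probability density function (pdf) of the dPLN distribution with parameters (m, σ, α, β) is

$$f_{dPLN}(y; m, \sigma, \alpha, \beta) = \frac{\alpha\beta}{(\alpha+\beta)\,y}\,\phi\!\left(\frac{\ln y - m}{\sigma}\right)\left[R(x_1) + R(x_2)\right], \qquad y > 0,$$

where R(t) = [1 − Φ(t)]/ϕ(t) is a Mills' ratio, ϕ(.) and Φ(.) are, respectively, the pdf and cumulative distribution function (cdf) for a standard normal random variable, and

$$x_1 = \alpha\sigma - \frac{\ln y - m}{\sigma} \quad\text{and}\quad x_2 = \beta\sigma + \frac{\ln y - m}{\sigma}.$$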

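The cdf of the dPLN distribution has been derived as (Reed and Jorgensen, 2004)

$$F_{dPLN}(y; m, \sigma, \alpha, \beta) = \int_0^y f(t)\,dt = \Phi\!\left(\frac{\ln y - m}{\sigma}\right) - \phi\!\left(\frac{\ln y - m}{\sigma}\right)\frac{\beta R(x_1) - \alpha R(x_2)}{\alpha+\beta}.$$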
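As an illustration, the dPLN pdf and cdf can be evaluated with a few lines of standard-library Python. This is a minimal sketch, not the authors' code: the function names (`mills`, `dpln_pdf`, `dpln_cdf`) are hypothetical, the formulas are taken in the Mills'-ratio form of Reed and Jorgensen (2004) as restated in the comments, and the Mills' ratio is computed naively, so accuracy will degrade for very large arguments.

```python
import math


def norm_pdf(t):
    """Standard normal pdf, phi(t)."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)


def norm_cdf(t):
    """Standard normal cdf, Phi(t), via the complementary error function."""
    return 0.5 * math.erfc(-t / math.sqrt(2.0))


def mills(t):
    """Mills' ratio R(t) = [1 - Phi(t)] / phi(t).

    Naive evaluation: fine for moderate t, inaccurate for very large t.
    """
    return 0.5 * math.erfc(t / math.sqrt(2.0)) / norm_pdf(t)


def _args(y, m, sigma, alpha, beta):
    # z = (ln y - m)/sigma, x1 = alpha*sigma - z, x2 = beta*sigma + z
    z = (math.log(y) - m) / sigma
    return z, alpha * sigma - z, beta * sigma + z


def dpln_pdf(y, m, sigma, alpha, beta):
    """Assumed form: alpha*beta/((alpha+beta)*y) * phi(z) * [R(x1) + R(x2)]."""
    z, x1, x2 = _args(y, m, sigma, alpha, beta)
    scale = alpha * beta / ((alpha + beta) * y)
    return scale * norm_pdf(z) * (mills(x1) + mills(x2))


def dpln_cdf(y, m, sigma, alpha, beta):
    """Assumed form: Phi(z) - phi(z) * [beta*R(x1) - alpha*R(x2)]/(alpha+beta)."""
    z, x1, x2 = _args(y, m, sigma, alpha, beta)
    return norm_cdf(z) - norm_pdf(z) * (beta * mills(x1) - alpha * mills(x2)) / (alpha + beta)
```

A quick consistency check is to compare a finite-difference slope of `dpln_cdf` with `dpln_pdf` at the same point; the two should agree closely if the pdf and cdf are coded consistently. The parameter values used for such a check (for example m = 0, σ = 0.5, α = 3, β = 2) are arbitrary and are not estimates from the paper.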

Two attractive features of the dPLN distribution

The generalized beta distribution of the second kind

For a distribution with which to compare the performance of the PLN and dPLN distributions, we chose the GB2 distribution whose pdf with positive parameters (a, b, p, q) is

$$f(y; a, b, p, q) = \frac{a\,y^{ap-1}}{b^{ap}\,B(p,q)\left[1 + (y/b)^{a}\right]^{p+q}}, \qquad y > 0,$$

where B(⋅,⋅) is the beta function. Like the dPLN, the GB2 income distribution is derived from a reasonable economic model. Parker (1999) shows how it arises from a neoclassical model with optimizing firm behavior under uncertainty where the shape parameters p and q become functions of the
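For readers who want to evaluate the GB2 density numerically, the following is a minimal sketch in standard-library Python. The function name `gb2_pdf` is hypothetical, and working on the log scale with `math.lgamma` is a design choice made here (to avoid overflow in the beta function for large p or q), not something prescribed by the paper.

```python
import math


def gb2_pdf(y, a, b, p, q):
    """GB2 density, assuming the form
    a * y**(a*p - 1) / (b**(a*p) * B(p, q) * (1 + (y/b)**a)**(p + q)), y > 0.
    """
    if y <= 0:
        return 0.0
    # log B(p, q) from log-gamma functions
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    log_f = (
        math.log(a)
        + (a * p - 1.0) * math.log(y)
        - a * p * math.log(b)
        - log_beta
        - (p + q) * math.log1p((y / b) ** a)
    )
    return math.exp(log_f)
```

With a = b = p = q = 1 the expression reduces to 1/(1 + y)^2, which gives a simple hand check of the implementation.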

GMM estimation from grouped data

We are concerned with estimation of the dPLN distribution from grouped data in the form typically provided by the World Bank's Povcal website or by the World Institute for Development Economics Research (WIDER). These data comprise population shares that we will denote by c_i and corresponding income shares that we will denote by s_i for a number of groups, say i = 1, 2, …, N. We
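When overall mean income is also observed, group mean incomes follow directly from the population and income shares: group i holds a share s_i of total income and a share c_i of the population, so its mean is the overall mean times s_i / c_i. The sketch below illustrates this arithmetic. It assumes the shares are per-group (not cumulative) proportions; the function name and the example figures are hypothetical and are not taken from the paper's data.

```python
def group_mean_incomes(pop_shares, income_shares, mean_income):
    """Return the mean income of each group.

    pop_shares    -- per-group population shares c_i (assumed to sum to 1)
    income_shares -- per-group income shares s_i (assumed to sum to 1)
    mean_income   -- overall mean income
    """
    if len(pop_shares) != len(income_shares):
        raise ValueError("pop_shares and income_shares must have equal length")
    return [mean_income * s / c for c, s in zip(pop_shares, income_shares)]


# Hypothetical quintile data, for illustration only.
c = [0.2, 0.2, 0.2, 0.2, 0.2]
s = [0.05, 0.10, 0.15, 0.25, 0.45]
means = group_mean_incomes(c, s, mean_income=1000.0)
```

As a sanity check, the population-weighted average of the group means should reproduce the overall mean income.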

Description of data and empirical analysis

To compare the relative performance of the GB2, PLN and dPLN distributions we chose a sample of countries from the large number on the World Bank website. The grouped data that are available are such that “income” is typically measured as household per capita expenditure or consumption, although for some countries per capita income is used. A possible disadvantage of the data is that, despite allowance being made for household size, no allowance is made for

Concluding remarks

The Pareto–lognormal and double Pareto–lognormal distributions have been advocated as good choices for modeling income distributions because of their sound theoretical base and their superior empirical goodness-of-fit. We have added to the existing literature in four ways: (1) Expressions for inequality measures in terms of the parameters of the distributions have been derived. (2) We show how the distributions can be estimated from grouped data using the generalized method of moments, and we
