Pareto–lognormal distributions: Inequality, poverty, and estimation from grouped income data
Introduction
The estimation of parametric income distributions is useful for assessing inequality and poverty, and for welfare comparisons over time and space. This is particularly so when income data are available only in grouped form, such as income and population shares, which makes nonparametric methods less attractive. For more than a century considerable effort has been directed towards providing new parametric functional forms for income distributions. A major motivation of such studies has been to suggest functions with theoretical properties and statistical fits that are better than those of functions already appearing in the literature. The many functional forms that have been suggested include, but are not limited to, the lognormal, gamma, Pareto, Weibull, Dagum, Singh–Maddala, beta-2, and generalized beta-2 distributions, and the quest continues. See Kleiber and Kotz (2003) for a comprehensive review. Despite the large number of distributions that have appeared, there has been no clear winner. An emerging 4-parameter distribution for income and other phenomena is the double Pareto–lognormal distribution (hereafter dPLN) developed by Reed (2003) and studied further by Reed and Jorgensen (2004). It has a sound theoretical justification (Reed, 2003) and, according to Reed and Wu (2008), provides a very good fit to income data.
The purpose of this paper is to provide some further results for this distribution. We derive closed form solutions for various inequality and poverty measures in terms of the parameters of the dPLN distribution and show how they relate to the corresponding inequality and poverty measures for the lognormal and Pareto distributions. When grouped data are available as income and population proportions, and mean income is also observed, enabling calculation of group mean incomes, the generalized method of moments (GMM) framework developed in Hajargasht et al. (2012) is convenient for estimation. We provide the moment distribution functions which are required for estimating the dPLN distribution within this framework. Results from estimating this model are compared with those from estimating another leading 4-parameter distribution known as the generalized beta of the second kind (hereafter GB2), developed by McDonald (1984) and McDonald and Xu (1995). Using grouped data from ten regions (China rural, China urban, India rural, India urban, Pakistan, Russia, Poland, Brazil, Nigeria and Iran), we estimate the GB2, a 3-parameter Pareto–lognormal, and the dPLN distributions, and compare their performance in terms of goodness of fit. The results suggest that all three distributions provide good fits, and there is not one particular distribution that dominates the others over all datasets.
The paper is organized as follows. In Section 2 we briefly review the dPLN distribution. Expressions are provided for its moments and moment distribution functions, and for various inequality and poverty measures. The GB2 distribution that we later compare with the dPLN is reviewed in Section 3. Sections 2 and 3 contain results about the dPLN and GB2 distributions that are useful irrespective of whether or not the data are in grouped form. In Section 4 we summarize the GMM methodology developed in Hajargasht et al. (2012) for using grouped data to estimate the parameters of a general income distribution, and show how it can be applied to the dPLN distribution. This methodology draws on the results for moments and moment distribution functions that are provided in Section 2. Section 5 contains a description of the data used to illustrate the theoretical framework, and the results. The results include parameter estimates and their standard errors, test results for excess moment conditions, mean-square-error comparisons for goodness-of-fit, Gini and Theil coefficients, and coefficients of variation. Concluding remarks are provided in Section 6.
Section snippets
Pareto–lognormal income distributions
The probability density function (pdf) of the dPLN distribution with parameters (m, σ, α, β) is expressed in terms of the Mills' ratio R(t) = [1 − Φ(t)]/ϕ(t), where ϕ(.) and Φ(.) are, respectively, the pdf and cumulative distribution function (cdf) for a standard normal random variable,
The cdf of the dPLN distribution has been derived as (Reed and Jorgensen, 2004)
Two attractive features of the dPLN distribution
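The role of the Mills' ratio in the dPLN pdf and cdf can be illustrated with a small numerical sketch. The formulas coded below are the forms usually attributed to Reed and Jorgensen (2004), written with z = (ln y − m)/σ; they are stated here as an assumption, since the equations themselves are not reproduced in this excerpt. The function names, the parameter values, and the log-grid integrator are hypothetical choices made for illustration, and the naive Mills' ratio is only reliable for moderate arguments.

```python
import math

SQRT2 = math.sqrt(2.0)
SQRT2PI = math.sqrt(2.0 * math.pi)

def norm_pdf(t):
    return math.exp(-0.5 * t * t) / SQRT2PI

def norm_cdf(t):
    return 0.5 * math.erfc(-t / SQRT2)

def mills(t):
    # R(t) = [1 - Phi(t)] / phi(t); naive evaluation, adequate for moderate |t|
    return 0.5 * math.erfc(t / SQRT2) / norm_pdf(t)

def dpln_pdf(y, m, sigma, alpha, beta):
    # Assumed form: f(y) = ab/(a+b) * phi(z)/y * [R(a*sigma - z) + R(b*sigma + z)]
    z = (math.log(y) - m) / sigma
    scale = alpha * beta / (alpha + beta)
    return scale * norm_pdf(z) / y * (mills(alpha * sigma - z) + mills(beta * sigma + z))

def dpln_cdf(y, m, sigma, alpha, beta):
    # Assumed form: F(y) = Phi(z) - phi(z) * [b*R(a*sigma - z) - a*R(b*sigma + z)] / (a+b)
    z = (math.log(y) - m) / sigma
    bracket = beta * mills(alpha * sigma - z) - alpha * mills(beta * sigma + z)
    return norm_cdf(z) - norm_pdf(z) * bracket / (alpha + beta)

def integrate_log(func, lo, hi, n=4000):
    # Trapezoid rule for the integral of func(y) dy over [exp(lo), exp(hi)],
    # using the substitution y = exp(u) so that dy = y du.
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(y) * y
    return total * h

# Hypothetical parameter values, chosen only to exercise the formulas.
params = dict(m=0.0, sigma=0.5, alpha=3.0, beta=2.0)

total_mass = integrate_log(lambda y: dpln_pdf(y, **params), -12.0, 12.0)
mean_numeric = integrate_log(lambda y: y * dpln_pdf(y, **params), -12.0, 12.0)

# Closed-form mean (requires alpha > 1): ab / ((a-1)(b+1)) * exp(m + sigma^2/2)
a, b = params["alpha"], params["beta"]
mean_closed = a * b / ((a - 1.0) * (b + 1.0)) * math.exp(params["m"] + 0.5 * params["sigma"] ** 2)

cdf_numeric = integrate_log(lambda y: dpln_pdf(y, **params), -12.0, math.log(1.5))
print(total_mass, mean_numeric, mean_closed, dpln_cdf(1.5, **params), cdf_numeric)
```

The checks are internal consistency checks only: the density should integrate to approximately one, the numerical mean should agree with the closed-form mean, and the cdf should agree with the integrated pdf. They do not validate the formulas against the paper's own notation.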
The generalized beta distribution of the second kind
For a distribution with which to compare the performance of the PLN and dPLN distributions, we chose the GB2 distribution, whose pdf with positive parameters (a, b, p, q) is expressed in terms of the beta function B(⋅,⋅). Like the dPLN, the GB2 income distribution is derived from a reasonable economic model. Parker (1999) shows how it arises from a neoclassical model with optimizing firm behavior under uncertainty where the shape parameters p and q become functions of the
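As a rough illustration of the comparison distribution, the sketch below codes the GB2 density in the form in which it usually appears in the literature, f(y) = a y^(ap−1) / [b^(ap) B(p, q) (1 + (y/b)^a)^(p+q)]. This form is an assumption here because the excerpt does not reproduce the equation. The function names, parameter values, and integrator are hypothetical, and the mean formula used for the check requires aq > 1.

```python
import math

def log_beta(p, q):
    # log of the beta function B(p, q), via log-gamma for numerical stability
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def gb2_pdf(y, a, b, p, q):
    # Assumed standard GB2 density, evaluated on the log scale
    log_f = (math.log(a) + (a * p - 1.0) * math.log(y) - a * p * math.log(b)
             - log_beta(p, q) - (p + q) * math.log1p((y / b) ** a))
    return math.exp(log_f)

def gb2_mean(a, b, p, q):
    # Mean of the GB2: b * B(p + 1/a, q - 1/a) / B(p, q), valid when a*q > 1
    return b * math.exp(log_beta(p + 1.0 / a, q - 1.0 / a) - log_beta(p, q))

def integrate_log(func, lo, hi, n=4000):
    # Trapezoid rule for the integral of func(y) dy, with y = exp(u)
    h = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        y = math.exp(lo + i * h)
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * func(y) * y
    return total * h

# Hypothetical parameter values for illustration only.
gb2_params = dict(a=3.0, b=2.0, p=1.2, q=1.5)

gb2_mass = integrate_log(lambda y: gb2_pdf(y, **gb2_params), -15.0, 15.0)
gb2_mean_numeric = integrate_log(lambda y: y * gb2_pdf(y, **gb2_params), -15.0, 15.0)
print(gb2_mass, gb2_mean_numeric, gb2_mean(**gb2_params))
```

Working on the log scale with `lgamma` and `log1p` is a design choice that avoids overflow in the beta function and in the power term for large incomes.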
GMM estimation from grouped data
We are concerned with estimation of the dPLN distribution from grouped data in the form typically provided by the World Bank's Povcal website or by the World Institute for Development Economics Research (WIDER). These data comprise population shares that we will denote by c_i and corresponding income shares that we will denote by s_i for a number of groups, say i = 1, 2, …, N. We
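The introduction notes that observing mean income alongside the shares enables calculation of group mean incomes. A minimal sketch of that arithmetic follows, assuming the group mean is overall mean income multiplied by s_i / c_i. The quintile shares and the mean income value are made-up numbers, and the function name is hypothetical.

```python
def group_mean_incomes(pop_shares, income_shares, mean_income):
    """Return the mean income of each group, computed as mean_income * s_i / c_i."""
    if len(pop_shares) != len(income_shares):
        raise ValueError("pop_shares and income_shares must have the same length")
    return [mean_income * s / c for c, s in zip(pop_shares, income_shares)]

# Hypothetical grouped data: five groups, each holding 20% of the population.
pop_shares = [0.2, 0.2, 0.2, 0.2, 0.2]           # c_i, sums to one
income_shares = [0.05, 0.10, 0.15, 0.25, 0.45]   # s_i, sums to one
mean_income = 1000.0                             # overall mean income (made up)

group_means = group_mean_incomes(pop_shares, income_shares, mean_income)

# Consistency check: the population-weighted average of the group means
# recovers the overall mean income.
recovered_mean = sum(c * y for c, y in zip(pop_shares, group_means))
print(group_means, recovered_mean)
```

These group means are the kind of quantity that, together with the population shares, could enter moment conditions in a GMM setup. The estimation step itself is not sketched here.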
Description of data and empirical analysis
To compare the relative performance of the GB2, PLN and dPLN distributions we chose a sample of countries from the large number on the World Bank website. The grouped data that are available are such that “income” is typically measured as household per capita expenditure or consumption, although for some countries per capita income is used. A possible disadvantage of the data is that, despite allowance being made for household size, no allowance is made for
Concluding remarks
The Pareto–lognormal and double Pareto–lognormal distributions have been advocated as good choices for modeling income distributions because of their sound theoretical base and their superior empirical goodness-of-fit. We have added to the existing literature in four ways: (1) Expressions for inequality measures in terms of the parameters of the distributions have been derived. (2) We show how the distributions can be estimated from grouped data using the generalized method of moments, and we
References (23)
- et al., A generalization of the beta distribution with applications, Journal of Econometrics (1995)
- The generalized beta as a model for the distribution of earnings, Economics Letters (1999)
- The Pareto, Zipf and other power laws, Economics Letters (2001)
- The Pareto law of incomes — an explanation and extension, Physica A (2003)
- An axiomatic characterization of the Watts index, Economics Letters (1993)
- et al., Age, luck and inheritance
- A model of income distribution, The Economic Journal (1953)
- et al., Global income distributions and inequality, 1993 and 2000: incorporating country-level inequality modelled with beta distributions, The Review of Economics and Statistics (2012)
- A new model of income distribution: the Pareto–lognormal distribution
- Measuring Inequality (2011)
- Les Inégalités Économiques