Elsevier

Journal of Econometrics

Volume 142, Issue 2, February 2008, Pages 698-714

Manipulation of the running variable in the regression discontinuity design: A density test

https://doi.org/10.1016/j.jeconom.2007.05.005

Abstract

Standard sufficient conditions for identification in the regression discontinuity design are continuity of the conditional expectation of counterfactual outcomes in the running variable. These continuity assumptions may not be plausible if agents are able to manipulate the running variable. This paper develops a test of manipulation related to continuity of the running variable density function. The methodology is applied to popular elections to the House of Representatives, where sorting is neither expected nor found, and to roll call voting in the House, where sorting is both expected and found.

Introduction

One reason for the increasing popularity in economics of regression discontinuity applications is the perception that the identifying assumptions are quite weak. However, while some applications of the design can be highly persuasive, many are subject to the criticism that public knowledge of the treatment assignment rule may invalidate the continuity assumptions at the heart of identification.

Consider a hypothetical example. A doctor plans to randomly assign heart patients to a statin and a placebo to study the effect of the statin on heart attack within 10 years. The doctor randomly assigns patients to two different waiting rooms, A and B, and plans to give those in A the statin and those in B the placebo. If some of the patients learn of the planned treatment assignment mechanism, we would expect them to proceed to waiting room A. If the doctor fails to divine the patients’ contrivance and follows the original protocol, random assignment of patients to separate waiting rooms may be undone by patient sorting after random assignment. In the regression discontinuity context, an analogous evaluation problem may occur in the common case where the treatment assignment rule is public knowledge (cf., Lee, 2007).

In this paper, I propose a formal test for sorting of this type. The test is based on the intuition that, in the example above, we would expect waiting room A to become crowded. In the regression discontinuity context, this is analogous to expecting the density of the running variable to be discontinuous at the cutoff, with surprisingly many individuals just barely qualifying for a desirable treatment assignment and surprisingly few failing to qualify. This test will be informative when manipulation of the running variable is monotonic, in a sense to be made specific below.
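The crowding intuition can be illustrated with a small simulation. This is only a sketch under assumed values: the standard normal latent variable, the 30% manipulation rate, and the window width of 0.1 are hypothetical choices, not quantities taken from the paper. Manipulation is monotonic here because agents only ever move from just below the cutoff to just above it.

```python
import numpy as np

rng = np.random.default_rng(0)
cutoff = 0.0
n = 100_000

# Hypothetical latent running variable whose density is smooth at the cutoff.
latent = rng.normal(0.0, 1.0, n)

# Monotonic manipulation (illustrative assumption): 30% of agents who would
# fall short by less than 0.1 move themselves to just above the cutoff.
# Nobody moves in the opposite direction.
falls_short = (latent > cutoff - 0.1) & (latent < cutoff)
manipulates = falls_short & (rng.random(n) < 0.3)
observed = np.where(manipulates, cutoff + rng.uniform(0.0, 0.1, n), latent)

def counts_near_cutoff(r, width=0.1):
    """Count observations in equal-width windows just below and just above the cutoff."""
    below = int(np.sum((r >= cutoff - width) & (r < cutoff)))
    above = int(np.sum((r >= cutoff) & (r < cutoff + width)))
    return below, above

print("no manipulation:  ", counts_near_cutoff(latent))
print("with manipulation:", counts_near_cutoff(observed))
```

Without manipulation the two windows hold roughly equal counts; with manipulation the window just above the cutoff is visibly more crowded, which is the pattern a density test looks for.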

The proposed test is based on an estimator for the discontinuity at the cutoff in the density function of the running variable. The test is implemented as a Wald test of the null hypothesis that the discontinuity is zero. The estimator, which is a simple extension of the local linear density estimator (Cheng et al., 1997), proceeds in two steps. In the first step, one obtains a finely gridded histogram. In the second step, one smooths the histogram using local linear regression, separately on either side of the cutoff. To efficiently convey sensitivity of the discontinuity estimate to smoothing assumptions, one may augment a graphical presentation of the second-step smoother with the first-step histogram, analogous to presenting local averages along with an estimated conditional expectation.
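The two steps can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the triangular kernel, and the user-supplied `binsize` and `bandwidth` are assumptions made here, and the automatic smoothing-parameter selection and the standard error needed for the Wald test (both discussed in Section 3) are not reproduced.

```python
import numpy as np

def density_discontinuity(r, cutoff, binsize, bandwidth):
    """Sketch: (1) fine histogram, (2) local linear smoothing on each side.

    Returns the estimated density limits to the left and right of the cutoff
    and their log difference. Kernel choice (triangular) is an assumption.
    """
    r = np.asarray(r, dtype=float)
    n = r.size

    # Step 1: finely gridded histogram with edges aligned so that no bin
    # straddles the cutoff.
    lo = cutoff - np.ceil((cutoff - r.min()) / binsize) * binsize
    hi = cutoff + np.ceil((r.max() - cutoff) / binsize) * binsize
    n_bins = int(round((hi - lo) / binsize))
    edges = lo + binsize * np.arange(n_bins + 1)
    counts, _ = np.histogram(r, bins=edges)
    mids = edges[:-1] + binsize / 2
    heights = counts / (n * binsize)  # normalized so the histogram integrates to about 1

    # Step 2: weighted linear regression of bin heights on bin midpoints,
    # run separately on each side; the intercept estimates the density
    # limit at the cutoff.
    def boundary_fit(side):
        x = mids[side] - cutoff
        y = heights[side]
        w = np.clip(1 - np.abs(x) / bandwidth, 0, None)  # triangular kernel weights
        keep = w > 0
        X = np.column_stack([np.ones(keep.sum()), x[keep]])
        sw = np.sqrt(w[keep])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
        return beta[0]

    f_left = boundary_fit(mids < cutoff)
    f_right = boundary_fit(mids > cutoff)
    # The log difference is undefined (nan) if either fitted limit is not positive.
    return f_left, f_right, np.log(f_right) - np.log(f_left)
```

A call such as `density_discontinuity(r, cutoff=0.0, binsize=0.02, bandwidth=0.5)` returns both one-sided estimates alongside the log difference, so the fitted limits can be plotted over the first-step histogram to show how sensitive the estimate is to the smoothing choices.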

This test complements existing specification checks in regression discontinuity applications. Authors routinely report on the smoothness of pre-determined characteristics around the cutoff (e.g., DiNardo and Lee, 2004). If the particular pre-determined characteristics available to the researcher are relevant to the problem, this method should be informative about any sorting around the discontinuity. However, in some applications pre-determined characteristics are either not available, or those which are available are not relevant to the outcome under study. By way of contrast, the density test may always be conducted, since data on the running variable are required for any analysis. The method is also useful in applications where a discontinuous density function is itself the object of interest. For example, Saez (1999, 2002) measures tax avoidance using the discontinuity in the density of income reported to the Internal Revenue Service.

To show how the estimator works in practice, I apply the methodology to two distinct settings. The first setting is popular elections to the United States House of Representatives, considered in Lee's (2001, 2007) incumbency study. In this context, it is natural to assume that the density function of the Democratic vote share is continuous at 50%. The data do not reject this prediction. The second setting is roll call votes in the House. In this context, the vote tally for a given bill is expected to be subject to manipulation. Although the number of representatives would seem to make coordination between members difficult, these problems are overcome by a combination of the repeated game aspect of roll call votes and the fact that a representative's actual vote becomes public knowledge, enabling credible commitments and vote contracting. In this setting, the density test provides strong evidence of manipulation.

The remainder of the paper is organized as follows. Section 2 defines manipulation and distinguishes between partial and complete manipulation. Section 3 describes the estimator and discusses smoothing parameter methods and inference procedures. Section 4 motivates the manipulation problem with a hypothetical job training program. Section 5 presents the results of a small simulation study. Section 6 presents the empirical analysis, and Section 7 concludes. Appendix A gives a proof of the proposition of Section 3, and Appendix B describes the data.

Section snippets

Identification under partial and complete manipulation

Let $Y_i$ denote an outcome and $D_i$ a binary treatment. The outcome depends on treatment according to

$$Y_i = \alpha_i + \beta_i D_i = \bar{\alpha} + \bar{\beta} D_i + \varepsilon_i, \tag{1}$$

where $\alpha_i$ and $\beta_i$ are random variables with means $\bar{\alpha}$ and $\bar{\beta}$, respectively, and $\varepsilon_i = \alpha_i - \bar{\alpha} + (\beta_i - \bar{\beta}) D_i$ (cf., appendices of Card, 1999). In counterfactual notation, $\alpha_i = Y_{i0}$ and $\beta_i = Y_{i1} - Y_{i0}$, where $Y_{i0}$ is the outcome that would obtain, were $D_i = 0$, and $Y_{i1}$ is the outcome that would obtain, were $D_i = 1$. Eq. (1) is viewed as a structural equation, in the sense that the manner in which $i$ is

Estimation and inference procedures

To estimate potentially discontinuous density functions, economists have used either traditional histogram techniques (DiNardo and Lee, 2004; Saez, 2002), or kernel density estimates which smooth over the point of potential discontinuity (DiNardo et al., 1996; Saez, 1999; Jacob and Lefgren, 2004). Neither procedure allows for point estimation or inference. One could estimate a kernel density function separately for points to the left and right of the point of discontinuity, but at boundaries a

Theoretical example

To motivate the potential for identification problems caused by manipulation, consider a simple labor supply model. Agents strive to maximize the present discounted value of utility from income over two periods. Each agent chooses to work full- or part-time in each period. Part-time work requires supplying a fraction $f_i$ of full-time labor supply and receiving a fraction $f_i$ of full-time income. Each worker has a different fraction $f_i$, which is determined unilaterally by the employer prior to

Simulation evidence

Table 1 presents the results of a small simulation study on the performance of $\hat{\theta}$ as an estimator and as part of a testing procedure. In the table, “Design I” corresponds to the data generating process underlying panel C from Fig. 2: 50,000 independent draws from the $N(12, 3)$ distribution. There are 1000 replication data sets used. For each data set, I calculate $\hat{\theta}$ using the bin size and bandwidth produced by the algorithm specified in Section 3.2 (“A. Basic, Basic”). In addition to the “basic”

Empirical example

One of the better examples of the regression discontinuity design is the incumbency study of Lee (2001). Political scientists have postulated that there is an incumbency advantage for both parties and individual candidates, whereby having won the election once makes it easier to win the election subsequently. Credibly establishing the magnitude of any incumbency advantage is challenging because of strong selection effects. Lee notes that in a two-party system with majority rule, incumbency is

Conclusion

This paper describes identification problems encountered in the regression discontinuity design pertaining to manipulation of the running variable and describes a simple test for manipulation. The test involves estimation of the discontinuity in the density function of the running variable at the cutoff. Consistency and asymptotic normality of the estimator of the log discontinuity in the density at the cutoff were demonstrated theoretically, and inference procedures were discussed. The methodology was applied to two

Acknowledgments

I thank two anonymous referees for comments, the editors for multiple suggestions that substantially improved the paper, Jack Porter, John DiNardo, and Serena Ng for discussion, Jonah Gelbach for computing improvements, and Ming-Yen Cheng for manuscripts. Any errors are my own.

References (50)

  • D.J. Aigner et al. On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review (1976)
  • J.D. Angrist et al. Identification of causal effects using instrumental variables. Journal of the American Statistical Association (1996)
  • G.S. Becker. The Economics of Discrimination (1957)
  • T. Bouezmarni et al. Consistency of asymmetric kernel density estimators and smoothed histograms with application to income data. Econometric Theory (2005)
  • D.E. Card. The causal effect of education on earnings
  • Cheng, M.-Y., 1994. On boundary effects of smooth curve estimators (dissertation). Unpublished manuscript Series #...
  • M.-Y. Cheng. A bandwidth selector for local linear density estimators. Annals of Statistics (1997)
  • M.-Y. Cheng. Boundary aware estimators of integrated density products. Journal of the Royal Statistical Society, Series B (1997)
  • Cheng, M.-Y., Fan, J., Marron, J.S., 1993. Minimax efficiency of local polynomial fit estimators at boundaries....
  • M.-Y. Cheng et al. On automatic boundary corrections. The Annals of Statistics (1997)
  • V. Chernozhukov et al. Likelihood estimation and inference in a class of nonregular econometric models. Econometrica (2004)
  • C. Chu et al. Estimation of jump points and jump values of a density function. Statistica Sinica (1996)
  • D.B. Cline et al. Kernel estimation of densities with discontinuities or discontinuous derivatives. Statistics (1991)
  • A.C. Davison et al. Bootstrap Methods and their Application (1997)
  • A. Deaton. The Analysis of Household Surveys: A Microeconomic Approach to Development Policy (1997)
  • J. DiNardo et al. Labor market institutions and the distribution of wages, 1973–1992: a semi-parametric approach. Econometrica (1996)
  • J.E. DiNardo et al. Economic impacts of new unionization on private sector employers: 1984–2001. Quarterly Journal of Economics (2004)
  • J. Fan et al. Local Polynomial Modelling and its Applications (1996)
  • W. Gawronski et al. On density estimation by means of Poisson's distribution. Scandinavian Journal of Statistics (1980)
  • W. Gawronski et al. Smoothing histograms by means of lattice- and continuous distributions. Metrika (1981)
  • Greenberg, D., 2000. Was Nixon robbed? The Legend of the Stolen 1960 Presidential Election....
  • Hahn, J., Todd, P., van der Klaauw, W., 1999. Identification and estimation of treatment effects with a regression...
  • J. Hahn et al. Identification and estimation of treatment effects with a regression discontinuity design. Econometrica (2001)
  • P. Hall. Effect of bias estimation on coverage accuracy of bootstrap confidence intervals for a probability density. The Annals of Statistics (1992)
  • P. Hall et al. Performance of wavelet methods for functions with many discontinuities. Annals of Statistics (1996)