Manipulation of the running variable in the regression discontinuity design: A density test
Introduction
One reason for the increasing popularity in economics of regression discontinuity applications is the perception that the identifying assumptions are quite weak. However, while some applications of the design can be highly persuasive, many are subject to the criticism that public knowledge of the treatment assignment rule may invalidate the continuity assumptions at the heart of identification.
Consider a hypothetical example. A doctor plans to randomly assign heart patients to a statin and a placebo to study the effect of the statin on heart attack within 10 years. The doctor randomly assigns patients to two different waiting rooms, A and B, and plans to give those in A the statin and those in B the placebo. If some of the patients learn of the planned treatment assignment mechanism, we would expect them to proceed to waiting room A. If the doctor fails to divine the patients’ contrivance and follows the original protocol, random assignment of patients to separate waiting rooms may be undone by patient sorting after random assignment. In the regression discontinuity context, an analogous evaluation problem may occur in the common case where the treatment assignment rule is public knowledge (cf., Lee, 2007).
In this paper, I propose a formal test for sorting of this type. The test is based on the intuition that, in the example above, we would expect waiting room A to become crowded. In the regression discontinuity context, this is analogous to expecting the density of the running variable to be discontinuous at the cutoff, with surprisingly many individuals just barely qualifying for a desirable treatment assignment and surprisingly few failing to qualify. This test will be informative when manipulation of the running variable is monotonic, in a sense to be made specific below.
The proposed test is based on an estimator for the discontinuity at the cutoff in the density function of the running variable. The test is implemented as a Wald test of the null hypothesis that the discontinuity is zero. The estimator, which is a simple extension of the local linear density estimator (Cheng et al., 1997), proceeds in two steps. In the first step, one obtains a finely gridded histogram. In the second step, one smooths the histogram using local linear regression, separately on either side of the cutoff. To efficiently convey sensitivity of the discontinuity estimate to smoothing assumptions, one may augment a graphical presentation of the second-step smoother with the first-step histogram, analogous to presenting local averages along with an estimated conditional expectation.
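The two steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `density_discontinuity` and the `binsize` and `bandwidth` values in the demo are hypothetical and chosen by hand, not by any automatic selection procedure. The triangle kernel and the 24/5 constant in the standard error are also assumptions, taken from the usual asymptotic variance of a triangle-kernel local linear fit at a boundary, not from the text above.

```python
import numpy as np


def density_discontinuity(r, cutoff, binsize, bandwidth):
    """Sketch of a two-step estimate of the log density discontinuity at `cutoff`.

    Returns (theta, se, z): the log difference in density heights,
    an approximate standard error, and the Wald z-statistic.
    """
    r = np.asarray(r, dtype=float)
    n = r.size

    # Step 1: finely gridded histogram whose bins never straddle the cutoff.
    idx = np.floor((r - cutoff) / binsize).astype(int)
    lo, hi = idx.min(), idx.max()
    counts = np.bincount(idx - lo, minlength=hi - lo + 1)
    mid = cutoff + (np.arange(lo, hi + 1) + 0.5) * binsize  # bin midpoints
    height = counts / (n * binsize)                         # normalized counts

    # Step 2: local linear regression of bin heights on bin midpoints,
    # run separately on each side; the intercept is the height at the cutoff.
    def boundary_fit(mask):
        x = mid[mask] - cutoff
        y = height[mask]
        w = np.maximum(0.0, 1.0 - np.abs(x) / bandwidth)  # triangle kernel (assumed)
        keep = w > 0
        X = np.column_stack([np.ones(keep.sum()), x[keep]])
        sw = np.sqrt(w[keep])
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[keep] * sw, rcond=None)
        return beta[0]

    f_right = boundary_fit(mid > cutoff)
    f_left = boundary_fit(mid < cutoff)

    theta = np.log(f_right) - np.log(f_left)
    # Approximate standard error; the 24/5 constant is an assumption tied
    # to the triangle kernel, not something derived in the text above.
    se = np.sqrt((24.0 / 5.0) / (n * bandwidth) * (1.0 / f_right + 1.0 / f_left))
    return theta, se, theta / se


# Hypothetical demo: a continuous density versus one with sorting.
rng = np.random.default_rng(0)
r = rng.normal(size=20_000)

# Half of the units just below the cutoff move to just above it.
movers = (r > -0.2) & (r < 0.0) & (rng.random(r.size) < 0.5)
r_sorted = np.where(movers, -r, r)

for label, data in [("no sorting", r), ("sorting", r_sorted)]:
    theta, se, z = density_discontinuity(data, cutoff=0.0, binsize=0.02, bandwidth=0.5)
    print(f"{label}: theta={theta:.3f}, se={se:.3f}, z={z:.2f}")
```

Plotting `height` against `mid` alongside the two fitted lines would give the kind of graphical presentation described above, with the first-step histogram showing how sensitive the estimate is to the smoothing choices.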
This test complements existing specification checks in regression discontinuity applications. Authors routinely report on the smoothness of pre-determined characteristics around the cutoff (e.g., DiNardo and Lee, 2004). If the particular pre-determined characteristics the researcher has at their disposal are relevant to the problem, this method should be informative about any sorting around the discontinuity. However, in some applications pre-determined characteristics are either not available, or those which are available are not relevant to the outcome under study. By way of contrast, the density test may always be conducted, since data on the running variable are required for any analysis. The method is also useful in applications where a discontinuous density function is itself the object of interest. For example, Saez (1999, 2002) measures tax avoidance using the discontinuity in the density of income reported to the Internal Revenue Service.
To show how the estimator works in practice, I apply the methodology to two distinct settings. The first setting is popular elections to the United States House of Representatives, considered in Lee's (2001, 2007) incumbency study. In this context, it is natural to assume that the density function of the Democratic vote share is continuous at 50%. The data do not reject this prediction. The second setting is roll call votes in the House. In this context, the vote tally for a given bill is expected to be subject to manipulation. Although the number of representatives would seem to make coordination between members difficult, these problems are overcome by a combination of the repeated game aspect of roll call votes and the fact that a representative's actual vote becomes public knowledge, enabling credible commitments and vote contracting. In this setting, the density test provides strong evidence of manipulation.
The remainder of the paper is organized as follows. Section 2 defines manipulation and distinguishes between partial and complete manipulation. Section 3 describes the estimator and discusses smoothing parameter methods and inference procedures. Section 4 motivates the manipulation problem with a hypothetical job training program. Section 5 presents the results of a small simulation study. Section 6 presents the empirical analysis, and Section 7 concludes. Appendix A gives a proof of the proposition of Section 3, and Appendix B describes the data.
Section snippets
Identification under partial and complete manipulation
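Let $Y_i$ denote an outcome and $D_i$ a binary treatment. The outcome depends on treatment according to
$$Y_i = \alpha_i + \beta_i D_i = \alpha + \beta D_i + \varepsilon_i \qquad (1)$$
where $\alpha_i$ and $\beta_i$ are random variables with means $\alpha$ and $\beta$, respectively, and $\varepsilon_i = \alpha_i - \alpha + (\beta_i - \beta) D_i$ (cf., appendices of Card, 1999). In counterfactual notation, $\alpha_i = Y_{i0}$ and $\beta_i = Y_{i1} - Y_{i0}$, where $Y_{i0}$ is the outcome that would obtain, were $D_i = 0$, and $Y_{i1}$ is the outcome that would obtain, were $D_i = 1$. Eq. (1) is viewed as a structural equation, in the sense that the manner in which i is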
Estimation and inference procedures
To estimate potentially discontinuous density functions, economists have used either traditional histogram techniques (DiNardo and Lee, 2004, Saez, 2002), or kernel density estimates which smooth over the point of potential discontinuity (DiNardo et al., 1996, Saez, 1999, Jacob and Lefgren, 2004). Neither procedure allows for point estimation or inference. One could estimate a kernel density function separately for points to the left and right of the point of discontinuity, but at boundaries a
Theoretical example
To motivate the potential for identification problems caused by manipulation, consider a simple labor supply model. Agents strive to maximize the present discounted value of utility from income over two periods. Each agent chooses to work full- or part-time in each period. Part-time work requires supplying a fraction of full-time labor supply and receiving a fraction of full-time income. Each worker has a different fraction, which is determined unilaterally by the employer prior to
Simulation evidence
Table 1 presents the results of a small simulation study on the performance of the discontinuity estimator, both as an estimator and as part of a testing procedure. In the table, “Design I” corresponds to the data generating process underlying panel C from Fig. 2: 50,000 independent draws from the distribution. There are 1000 replication data sets used. For each data set, I calculate the estimate using the binsize and bandwidth produced by the algorithm specified in Section 3.2 (“A. Basic, Basic”). In addition to the “basic”
Empirical example
One of the better examples of the regression discontinuity design is the incumbency study of Lee (2001). Political scientists have postulated that there is an incumbency advantage for both parties and individual candidates, whereby having won the election once makes it easier to win the election subsequently. Credibly establishing the magnitude of any incumbency advantage is challenging because of strong selection effects. Lee notes that in a two-party system with majority rule, incumbency is
Conclusion
This paper describes identification problems encountered in the regression discontinuity design pertaining to manipulation of the running variable and describes a simple test for manipulation. The test involves estimation of the discontinuity in the density function of the running variable at the cutoff. Consistency and asymptotic normality of the estimator of the log discontinuity in the density at the cutoff were demonstrated theoretically, and inference procedures were discussed. The methodology was applied to two
Acknowledgments
I thank two anonymous referees for comments, the editors for multiple suggestions that substantially improved the paper, Jack Porter, John DiNardo, and Serena Ng for discussion, Jonah Gelbach for computing improvements, and Ming-Yen Cheng for manuscripts. Any errors are my own.
References (50)
- et al., 1976. On the estimation of production frontiers: maximum likelihood estimation of the parameters of a discontinuous density function. International Economic Review.
- et al., 1996. Identification of causal effects using instrumental variables. Journal of the American Statistical Association.
- The Economics of Discrimination, 1957.
- et al., 2005. Consistency of asymmetric kernel density estimators and smoothed histograms with application to income data. Econometric Theory.
- The causal effect of education on earnings.
- Cheng, M.-Y., 1994. On boundary effects of smooth curve estimators (dissertation). Unpublished manuscript Series #...
- A bandwidth selector for local linear density estimators, 1997. Annals of Statistics.
- Boundary aware estimators of integrated density products, 1997. Journal of the Royal Statistical Society, Series B.
- Cheng, M.-Y., Fan, J., Marron, J.S., 1993. Minimax efficiency of local polynomial fit estimators at boundaries....
- et al., 1997. On automatic boundary corrections. The Annals of Statistics.