Elsevier

Journal of Econometrics

Volume 142, Issue 2, February 2008, Pages 655-674

Regression discontinuity inference with specification error

https://doi.org/10.1016/j.jeconom.2007.05.003

Abstract

A regression discontinuity (RD) research design is appropriate for program evaluation problems in which treatment status (or the probability of treatment) depends on whether an observed covariate exceeds a fixed threshold. In many applications the treatment-determining covariate is discrete. This makes it impossible to compare outcomes for observations “just above” and “just below” the treatment threshold, and requires the researcher to choose a functional form for the relationship between the treatment variable and the outcomes of interest. We propose a simple econometric procedure to account for uncertainty in the choice of functional form for RD designs with discrete support. In particular, we model deviations of the true regression function from a given approximating function—the specification errors—as random. Conventional standard errors ignore the group structure induced by specification errors and tend to overstate the precision of the estimated program impacts. The proposed inference procedure that allows for specification error also has a natural interpretation within a Bayesian framework.

Introduction

In the classic regression-discontinuity (RD) design (Thistlethwaite and Campbell, 1960), the treatment status of an observation is determined by whether an observed covariate is above or below a known threshold. If the covariate is predetermined, it may be plausible to think that treatment status is “as good as randomly assigned” among the subsample of observations that fall just above and just below the threshold. As in a true experiment, no functional form assumptions are necessary to estimate program impacts when the treatment-determining covariate is continuous: one simply compares average outcomes in small neighborhoods on either side of the threshold. The width of these neighborhoods can be made arbitrarily small as the sample size grows, ensuring that observed and unobserved characteristics of observations in the treatment and control groups are identical in the limit. This idea underlies the approach of Hahn et al. (2001) and Porter (2003), who describe non-parametric and semi-parametric estimators of RD gaps.

In many applications where the RD design seems compelling, however, the covariate that determines treatment is inherently discrete or is only reported in coarse intervals. For example, government programs like Medicare and Medicaid have sharp age-related eligibility rules that lend themselves to an RD framework, but in most data sets age is only recorded in months or years. In the discrete case it is no longer possible to compute averages within arbitrarily small neighborhoods of the cutoff point, even with an infinite amount of data. Instead, researchers have to choose a particular functional form for the model relating the outcomes of interest to the treatment-determining variable. Indeed, with an irreducible gap between the “control” observations just below the threshold and the “treatment” observations just above, the causal effect of the program is not even identified in the absence of a parametric assumption about this function.

In this paper we propose a simple procedure for inference in RD designs in which the treatment-determining covariate is discrete. The basic idea is to model the deviation between the expected value of the outcome and the predicted value from a given functional form as a random specification error. Modeling potential specification error in this way has a number of immediate implications. Most importantly, it introduces a common component of variance for all the observations at any given value of the treatment-determining covariate. This creates a problem similar to the one analyzed by Moulton (1990) for multi-level models in which some of the covariates are only measured at a higher level of aggregation (e.g., micro models with state-level covariates). Random specification errors can be easily incorporated in inference by constructing sampling errors that include a grouped error component for different values of the treatment-determining covariate. The use of “clustered” standard errors will generally lead to wider confidence intervals that reflect the imperfect fit of the parametric function away from the discontinuity point.
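The grouped-error idea can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's own implementation: the function name `ols_with_clustered_se`, the simulated data, and all parameter values are assumptions made for the example, and the cluster-robust variance is the basic sandwich form with no small-sample correction. Clusters are the distinct values of the discrete treatment-determining covariate.

```python
import numpy as np

def ols_with_clustered_se(y, X, groups):
    """OLS coefficients with conventional and cluster-robust standard errors."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef

    # Conventional variance: assumes independent, homoskedastic errors.
    sigma2 = resid @ resid / (n - p)
    se_conventional = np.sqrt(np.diag(sigma2 * XtX_inv))

    # Cluster-robust sandwich: sum the score within each cell of the
    # covariate, so a common error component per cell is allowed for.
    meat = np.zeros((p, p))
    for g in np.unique(groups):
        mask = groups == g
        score = X[mask].T @ resid[mask]
        meat += np.outer(score, score)
    se_clustered = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return coef, se_conventional, se_clustered

# Simulated example (all values hypothetical): 20 cells, cutoff at 0,
# a linear approximating function, and one random specification error per cell.
rng = np.random.default_rng(0)
cells = np.arange(-10, 10)
n_per_cell = 50
x = np.repeat(cells, n_per_cell)
d = (x >= 0).astype(float)
a = rng.normal(0.0, 0.5, size=cells.size)
y = 1.0 + 2.0 * d + 0.1 * x + np.repeat(a, n_per_cell) + rng.normal(0.0, 1.0, x.size)

X = np.column_stack([np.ones_like(d), d, x])
coef, se_conv, se_clust = ols_with_clustered_se(y, X, groups=x)
print("estimated discontinuity:", coef[1])
print("conventional SE:", se_conv[1], "clustered SE:", se_clust[1])
```

In this simulated setting the clustered standard error on the treatment coefficient comes out larger than the conventional one, which is the pattern the text describes; with real data the size of the gap depends on how poorly the chosen functional form fits.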

More subtly, inference in an RD design involves extrapolation from observations below the threshold to construct a counterfactual for observations above the threshold. As in a classic out-of-sample forecasting problem, the sampling error of the counterfactual prediction for the point of support just beyond the threshold includes a term reflecting the expected contribution of the specification error at that point. Since the estimated (local) treatment effect is just the difference between the mean outcome for these observations and the counterfactual prediction, the precision of the estimated treatment effect depends on whether one assumes that the same specification error would prevail in the counterfactual world. If so, this error component vanishes. If not, the confidence interval for the local treatment effect has to be widened even further.

The paper is organized as follows. Section 2 describes the RD framework and why discreteness in the treatment-determining covariate implies that the treatment effect is not identified without assuming a parametric functional form. Section 3 describes the proposed inference procedure under a model where specification errors are considered random. Section 4 describes a modified procedure under less restrictive assumptions about the specification errors. Section 5 proposes an alternative, efficient estimator for the treatment effect, and Section 6 relates this estimator to a Bayesian approach. Section 7 concludes.

The problem of discreteness

To illustrate how discreteness causes problems for identification in an RD framework, consider the following potential outcomes formulation. There is a binary indicator $D$ of treatment status, which is determined by whether an observed covariate $X$ is above or below a known threshold $x_0$: $D = 1[X \geq x_0]$. Let $Y_1$ represent the potential outcome if an observation receives

Random specification error

Suppose a polynomial is chosen to approximate $h(\cdot)$. The regression in Eq. (2) can be re-written as $Y_{ij} = \alpha_0 + D_j \beta_0 + X_j \gamma_0 + a_j + \varepsilon_{ij}$, where $X_j$ is a row vector of polynomial terms in $x_j$ (with the normalization $x_k = 0$), and $a_j \equiv h(x_j) - X_j \gamma_0$ is specification error: the degree to which the true function $h(\cdot)$ deviates from the polynomial function.
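To make the role of the specification error concrete, the sketch below fits the polynomial regression by OLS and compares each cell mean with its fitted value. It is an illustration under stated assumptions, not the paper's procedure: the helper names `polynomial_terms` and `cell_level_deviations`, the `order` argument, and the noise-free example data are all hypothetical. The gaps it returns are sample analogues of the cell-level deviations from the polynomial (in real data they also contain sampling noise).

```python
import numpy as np

def polynomial_terms(x, cutoff, order):
    """Polynomial terms in (x - cutoff), so the cutoff cell is normalized to 0."""
    xc = np.asarray(x, dtype=float) - cutoff
    return np.column_stack([xc ** p for p in range(1, order + 1)])

def cell_level_deviations(y, x, cutoff, order=1):
    """Fit Y on [1, D, polynomial terms]; return cells, coefficients,
    and (cell mean of Y) - (fitted value) for each distinct value of x."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    d = (x >= cutoff).astype(float)
    design = np.column_stack([np.ones_like(y), d, polynomial_terms(x, cutoff, order)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    cells = np.unique(x)
    deviations = np.array([y[x == c].mean() - fitted[x == c].mean() for c in cells])
    return cells, coef, deviations

# Noise-free illustration: the true function is quadratic with a jump of 2 at 0.
x = np.repeat(np.arange(-5, 5), 3)
y = 1.0 + 2.0 * (x >= 0) + 0.05 * x ** 2
_, _, dev_linear = cell_level_deviations(y, x, cutoff=0, order=1)
_, _, dev_quadratic = cell_level_deviations(y, x, cutoff=0, order=2)
print("max |deviation|, linear fit:   ", np.max(np.abs(dev_linear)))
print("max |deviation|, quadratic fit:", np.max(np.abs(dev_quadratic)))
```

A linear approximation leaves systematic cell-level deviations here, while the quadratic fit, which matches the true function, leaves essentially none.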

Mis-specification of counterfactual functions

In this section, we show that the special structure of an RD design implies that in some circumstances, the clustered standard errors may still understate the variability of $\hat{\beta}$. If the specification error is random, then it is necessary to decide how the error in estimating $E[Y_1 \mid X = x_k]$ is related to the specification error in estimating $E[Y_0 \mid X = x_k]$. As shown below, if the errors are assumed to be identical, then the approach described above is appropriate. If the errors are independent, then the

Efficient estimation

When the specification errors $a_{1j}$ and $a_{0j}$ are assumed to be different, there is an estimator for $E[Y_1 - Y_0 \mid X = 0]$ that is more efficient than the OLS estimator $\hat{\beta}$. This is because the least squares estimate of $\beta_0$ amounts to the difference between the prediction for $E[Y_1 \mid X = 0]$ and the prediction for $E[Y_0 \mid X = 0]$, using data away from the discontinuity threshold. While it is necessary to make such an extrapolation for $E[Y_0 \mid X = 0]$ (since this quantity is unobservable), information on $E[Y_1 \mid X = 0]$ is available

Relation to Bayesian estimation

There is a close connection between the proposed estimator $\hat{\beta}^*$ and a Bayesian approach to the problem. Specifically, the confidence intervals proposed above can be interpreted as Bayesian posterior intervals.

For example, note that (14) can be re-written as $\hat{\beta}^* = [\lambda \bar{Y}_k + (1 - \lambda)(\hat{\alpha} + \hat{\beta})] - \hat{\alpha}$. The expression in brackets can be viewed as an estimate of $E[Y_1 \mid X = 0]$ (a $\lambda$-weighted average of the $k$th cell mean and the predicted value from the regression), and the term $\hat{\alpha}$ as an estimate of $E[Y_0 \mid X = 0]$.
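The weighted-average form of the estimator can be checked with a small numerical sketch. The function name `beta_star` and the input values are hypothetical, and the weight is simply taken as given here; how it is chosen is not shown in this excerpt, and restricting it to the interval [0, 1] is an assumption of the sketch.

```python
def beta_star(lam, ybar_k, alpha_hat, beta_hat):
    """Weighted estimate: [lam * ybar_k + (1 - lam) * (alpha_hat + beta_hat)] - alpha_hat.

    lam       : weight on the cell mean at the cutoff (assumed to lie in [0, 1])
    ybar_k    : mean outcome in the cell at the cutoff
    alpha_hat : regression intercept, the counterfactual prediction at the cutoff
    beta_hat  : OLS estimate of the discontinuity
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1] (assumption of this sketch)")
    # Bracketed term: estimate of the treated mean outcome at the cutoff.
    y1_at_cutoff = lam * ybar_k + (1.0 - lam) * (alpha_hat + beta_hat)
    return y1_at_cutoff - alpha_hat

# Hypothetical inputs: cell mean 5.0, intercept 1.0, OLS discontinuity 2.0.
print(beta_star(0.0, 5.0, 1.0, 2.0))  # weight 0: reduces to the OLS estimate
print(beta_star(1.0, 5.0, 1.0, 2.0))  # weight 1: cell mean minus intercept
print(beta_star(0.5, 5.0, 1.0, 2.0))  # equal weights
```

With weight 0 the expression collapses to the OLS discontinuity estimate, and with weight 1 it uses only the observed cell mean at the cutoff for the treated outcome.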

Consider a simple

Summary

This paper draws attention to functional form issues in the estimation of RD designs when the index variable determining treatment, X, has discrete support. In the discrete case, the conditions for non-parametric or semi-parametric methods are not satisfied; indeed, the treatment effect is not non-parametrically identified. Our goal is to formally incorporate uncertainty in the necessary parametric modeling of the underlying RD function.

We have proposed a procedure for inference that explicitly

Acknowledgments

We are grateful to Guido Imbens and Thomas Lemieux for helpful suggestions, and to Michael Jansson, James Powell, Keisuke Hirano, Bill Evans, and participants in the 2003 Banff International Research Station Regression Discontinuity Conference for helpful discussions and suggestions.

References

  • J. Angrist et al. Empirical strategies in labor economics.
  • J. Angrist et al. Using Maimonides’ rule to estimate the effect of class size on scholastic achievement. Quarterly Journal of Economics (1998).
  • R.L. Brown et al. Techniques for testing for the constancy of regression relationships over time (with discussion). Journal of the Royal Statistical Society B (1975).
  • D. Card et al. Using discontinuous eligibility rules to identify the effects of the federal Medicaid expansions on low income children. Review of Economics and Statistics (2004).
  • G. Chamberlain. Quantile regression, censoring, and the structure of wages.
  • J. DiNardo et al. Economic impacts of new unionization on private sector employers: 1984–2001. Quarterly Journal of Economics (2004).
  • J. Hahn et al. Identification and estimation of treatment effects with a regression–discontinuity design. Econometrica (2001).
