
Repeated observations and the estimation of causal effects

Overview

  • Interrupted time series models

  • Regression discontinuity design

  • Panel data

    • Traditional adjustment strategies

    • Model-based approaches

Interrupted time series models (ITS)

\begin{align*} Y_t = f(t) + D_t b + e_t \end{align*}
  1. before the treatment is introduced (for \(t < t^*\)), \(D_t = 0\) and \(Y_t = Y^0_t\)

  2. after the treatment is in place (from \(t^*\) through \(T\)), \(D_t = 1\) and \(Y_t = Y^1_t\)

The causal effect of the treatment is then \(\delta_t = Y_t^1 - Y^0_t\) for time periods \(t^*\) through \(T\). Because \(Y_t = Y^1_t\) in these periods, this equals \(\delta_t = Y_t - Y^0_t\), so only the counterfactual \(Y^0_t\) has to be inferred. The crucial assumption is that the observed values of \(Y_t\) before \(t^*\) can be used to specify \(f(t)\) for all time periods, including those after treatment.
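A minimal sketch of this logic, assuming a linear \(f(t)\) and simulated data (the trend, the effect \(b = 2\), the noise level, and the helper name `its_effects` are illustrative assumptions, not taken from the text): fit \(f(t)\) on the pre-treatment periods only, extrapolate it as the counterfactual \(Y^0_t\), and take \(\hat\delta_t = Y_t - \hat f(t)\) afterwards.

```python
import numpy as np


def its_effects(y, t_star):
    """Return delta_t = Y_t - f_hat(t) for t >= t_star (hypothetical helper).

    ASSUMPTION: f(t) is linear and is fitted on pre-treatment periods (t < t_star) only.
    """
    t = np.arange(len(y))
    pre = t < t_star
    # np.polyfit with deg=1 returns [slope, intercept]
    slope, intercept = np.polyfit(t[pre], y[pre], deg=1)
    f_hat = intercept + slope * t  # extrapolated counterfactual Y^0_t
    return y[~pre] - f_hat[~pre]


# Illustrative series: linear trend plus a jump of b = 2 from t* = 20 on
rng = np.random.default_rng(0)
T, t_star, b = 40, 20, 2.0
t = np.arange(T)
D = (t >= t_star).astype(float)
y = 5.0 + 0.3 * t + D * b + rng.normal(0.0, 0.1, T)

delta_hat = its_effects(y, t_star)
print(round(delta_hat.mean(), 2))  # should be near 2
```

The linear form of \(f(t)\) carries all of the identification here: if the true pre-trend were curved, the extrapolated counterfactual, and with it \(\hat\delta_t\), would be biased.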

As an application, consider Operation Ceasefire, which involved meetings with gang-involved youth engaged in gang conflict. Gang members were offered educational, employment, and other social services if they committed to refraining from gang-related deviance.


Strategies to strengthen ITS analysis

  • Assess the effect of the cause on multiple outcomes that should be affected by the cause.

  • Assess the effect of the cause on outcomes that should not be affected by the cause.

  • Assess the effect of the cause within subgroups across which the causal effect should vary in predictable ways.

  • Adjust for trends in other variables that may affect or be related to the underlying time series of interest.

  • Assess the impact of the termination of the cause in addition to its initiation.
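The second strategy can be sketched as a placebo check: apply the same estimator to an outcome the intervention should not touch and look for an estimate near zero. Both series, the linear pre-trend, and the helper `its_mean_effect` are hypothetical assumptions for illustration.

```python
import numpy as np


def its_mean_effect(y, t_star):
    # Fit a linear pre-trend on t < t_star; return the mean post-period gap
    t = np.arange(len(y))
    pre = t < t_star
    slope, intercept = np.polyfit(t[pre], y[pre], deg=1)
    return np.mean(y[~pre] - (intercept + slope * t[~pre]))


rng = np.random.default_rng(1)
T, t_star = 40, 20
t = np.arange(T)
D = (t >= t_star).astype(float)

# Hypothetical outcomes: one the cause should affect, one it should not
targeted = 10.0 - 0.1 * t - 1.5 * D + rng.normal(0.0, 0.2, T)
placebo = 4.0 + 0.05 * t + rng.normal(0.0, 0.2, T)

eff_targeted = its_mean_effect(targeted, t_star)  # should be near -1.5
eff_placebo = its_mean_effect(placebo, t_star)    # should be near 0
print(eff_targeted, eff_placebo)
```

A sizable "effect" on the placebo outcome would suggest that something other than the cause changed around \(t^*\).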

Panel data

We now add a time dimension to the potential outcomes, writing \(Y^d_t\) for \(d = 0, 1\).

Seminal paper

  • Card and Krueger (1995, 2000)

We briefly discuss the exposition from Angrist & Pischke (2008).


We are interested in the average effect of treatment on the treated in the post-treatment period:

\begin{align*} E[{Y_1}^1 - {Y_1}^0 | D = 1] = E[{Y_1}^1 | D = 1] - \underbrace{E[{Y_1}^0 | D = 1]}_{\text{counterfactual}} \end{align*}

Assuming common trends in the absence of treatment, the counterfactual can be expressed in terms of other expectations:

\begin{align*}\begin{array}{ll} E[{Y_1}^0 - {Y_0}^0 | D = 1] &= E[{Y_1}^0 - {Y_0}^0 | D = 0] \\ E[{Y_1}^0 | D = 1] &= E[{Y_1}^0 - {Y_0}^0 | D = 0] + E[{Y_0}^0 | D = 1] \\[20pt] E[{Y_1}^1 - {Y_1}^0 | D = 1] &= E[{Y_1}^1 | D = 1] - E[{Y_1}^0 | D = 0] + E[{Y_0}^0 | D = 0] - E[{Y_0}^0 | D = 1] \end{array}\end{align*}

Moving to observed outcomes, where \(T\) indicates the time period in the conditioning set:

\begin{align*} E[{Y_1}^1 - {Y_1}^0 | D = 1] & = E[Y | D = 1, T = 1] - E[Y | D = 1, T = 0]\\ & \quad - (E[Y | D = 0, T = 1] - E[Y | D = 0, T = 0]) \end{align*}
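A quick simulation can confirm this identity. In the sketch below (all parameter values are assumptions), treated units start 2 points higher, both groups share a common trend of +1 without treatment, and the true effect on the treated is 3. The difference in differences of observed means recovers the effect, while a simple post-period comparison also picks up the baseline gap.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000
D = rng.binomial(1, 0.4, n)                      # group indicator
level = 2.0 * D + rng.normal(0.0, 1.0, n)        # treated start 2 points higher
y0_pre = level + rng.normal(0.0, 1.0, n)         # Y_0^0
y0_post = level + 1.0 + rng.normal(0.0, 1.0, n)  # Y_1^0: common trend of +1
y1_post = y0_post + 3.0                          # Y_1^1: true effect of 3

# Observed data: everyone shows Y^0 in T = 0; only the treated show Y^1 in T = 1
y_pre = y0_pre
y_post = np.where(D == 1, y1_post, y0_post)

did = (y_post[D == 1].mean() - y_pre[D == 1].mean()) - (
    y_post[D == 0].mean() - y_pre[D == 0].mean()
)
att = (y1_post - y0_post)[D == 1].mean()
naive = y_post[D == 1].mean() - y_post[D == 0].mean()
print(did, att, naive)  # did and att near 3; naive near 5
```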

We can now map these observed objects to Table 5.2.

\begin{align*} E[Y | D = 1, T = 1] & = 21.03 \\ E[Y | D = 1, T = 0] & = 20.44 \\ E[Y | D = 0, T = 1] & = 21.17 \\ E[Y | D = 0, T = 0] & = 23.33 \\ \end{align*}
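Plugging these four means into the expression for observed outcomes gives the difference-in-differences estimate; a minimal arithmetic check:

```python
# Observed means keyed by (D, T), as listed above
y = {(1, 1): 21.03, (1, 0): 20.44, (0, 1): 21.17, (0, 0): 23.33}

change_treated = y[(1, 1)] - y[(1, 0)]  # about 0.59
change_control = y[(0, 1)] - y[(0, 0)]  # about -2.16
did = change_treated - change_control
print(round(did, 2))  # 2.75
```

The treated group's mean rose by about 0.59 while the control group's fell by about 2.16, so the estimate is driven largely by the decline among controls.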

Demonstration

We consider how alternative estimators perform in a world where:

  • no Catholic elementary or middle schools exist

  • all students consider entering either public or Catholic high schools at the end of eighth grade

  • a pretreatment achievement test score is available for eighth grade

\begin{align*} \text{difference-in-difference} &\qquad Y_{i10} - Y_{i8} = a + D_i^* c + e_i \\ \end{align*}
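With a binary \(D_i^*\), the OLS slope \(c\) in this regression equals the difference in mean changes \(Y_{i10} - Y_{i8}\) between treated and untreated students. A sketch with NumPy; the helper `did_regression` and the simulated data are hypothetical.

```python
import numpy as np


def did_regression(y8, y10, d):
    """OLS of Y_i10 - Y_i8 on a constant and D_i*; returns (a, c)."""
    dy = y10 - y8
    X = np.column_stack([np.ones_like(dy), d])
    (a, c), *_ = np.linalg.lstsq(X, dy, rcond=None)
    return a, c


rng = np.random.default_rng(3)
n = 5_000
d = rng.binomial(1, 0.3, n).astype(float)
y8 = 98.0 + d + rng.normal(0.0, 1.0, n)             # treated start higher
y10 = y8 + 2.0 + 1.5 * d + rng.normal(0.0, 1.0, n)  # true effect of 1.5

a, c = did_regression(y8, y10, d)
dy = y10 - y8
diff_means = dy[d == 1].mean() - dy[d == 0].mean()
print(c, diff_means)  # equal up to floating-point error
```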


Control outcomes

\begin{align*} Y^0_{i8} & = 98 + O_i + U_i + X_i + E_i + \nu^0_{i8} \\ Y^0_{i9} & = 99 + O_i + U_i + X_i + E_i + \nu^0_{i9} \\ Y^0_{i10} & = 100 + O_i + U_i + X_i + E_i + \nu^0_{i10} \\ \end{align*}

The control outcome \(Y^0_{it}\) follows a linear time trend, rising by one point per grade; we will also consider a scenario with diverging trends.
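A sketch of how the parallel-trend control outcomes could be simulated. The text does not give distributions for \(O_i, U_i, X_i, E_i\) or \(\nu^0_{it}\); the sketch assumes independent standard normals, and the helper `control_outcomes` is hypothetical. The divergent scenario is not sketched because its form is not specified here.

```python
import numpy as np


def control_outcomes(n, rng):
    """Simulate Y^0_{i8}, Y^0_{i9}, Y^0_{i10} (hypothetical helper).

    ASSUMPTION: O, U, X, E and each nu^0_{it} are independent standard normals.
    """
    O, U, X, E = (rng.normal(size=n) for _ in range(4))
    person = O + U + X + E  # time-invariant part shared across grades
    # Intercepts 98, 99, 100 for grades 8, 9, 10, i.e. 90 + grade
    return {g: 90 + g + person + rng.normal(size=n) for g in (8, 9, 10)}


rng = np.random.default_rng(0)
y0 = control_outcomes(100_000, rng)
print({g: round(float(y.mean()), 2) for g, y in y0.items()})  # roughly 98, 99, 100
```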

Treated outcomes

\begin{align*} Y^1_{i9} & = Y^0_{i9} + \delta_i^\prime + \delta_i^{\prime\prime}\\ Y^1_{i10} & = Y^0_{i10} + (1 + \delta_i^\prime) + \delta_i^{\prime\prime}\\ \end{align*}

The treatment effect increases over time: for each student it is one point larger in grade 10 than in grade 9.
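The one-point increase follows directly from the two equations, whatever the distributions of \(\delta_i^\prime\) and \(\delta_i^{\prime\prime}\). A sketch with placeholder draws (all distributions below are hypothetical; only the construction of \(Y^1\) follows the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000
# ASSUMPTION: placeholder draws; only the construction of Y^1 follows the equations
y0_9 = 99.0 + rng.normal(size=n)
y0_10 = 100.0 + rng.normal(size=n)
delta_p = rng.normal(size=n)   # delta_i'  (hypothetical distribution)
delta_pp = rng.normal(size=n)  # delta_i'' (hypothetical distribution)

y1_9 = y0_9 + delta_p + delta_pp
y1_10 = y0_10 + (1.0 + delta_p) + delta_pp

effect_9 = y1_9 - y0_9
effect_10 = y1_10 - y0_10
print(np.allclose(effect_10 - effect_9, 1.0))  # True
```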

Treatment selection

\begin{align*} \text{baseline}\qquad & Pr[D^*_i = 1 \mid O_i, U_i] = \frac{exp(- 3.8 + O_i + U_i)}{1 + exp(- 3.8 + O_i + U_i)} \\ \text{self-selection on gains}\qquad & Pr[D^*_i = 1 \mid O_i, U_i] = \frac{exp(- 7.3 + O_i + U_i +5 \delta^{\prime\prime})}{1 + exp(- 7.3 + O_i + U_i + 5 \delta^{\prime\prime})} \\ \text{self-selection on pretest}\qquad & Pr[D^*_i = 1 \mid O_i, U_i] = \frac{exp(- 7.3 + O_i + U_i + k(Y_{i8} - E[Y_{i8}]))}{1 + exp(- 7.3 + O_i + U_i + k(Y_{i8} - E[Y_{i8}]))} \end{align*}
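Each selection rule is a logistic function of a linear index. A sketch of the three probabilities; the function names are hypothetical, \(k\) is left as a parameter because the text does not fix it, \(E[Y_{i8}]\) is approximated by the sample mean, and the standard normal draws for \(O_i, U_i\) are an assumption.

```python
import numpy as np


def logistic(z):
    # exp(z) / (1 + exp(z)), written in an equivalent form
    return 1.0 / (1.0 + np.exp(-z))


def pr_baseline(O, U):
    return logistic(-3.8 + O + U)


def pr_gains(O, U, delta_pp):
    return logistic(-7.3 + O + U + 5.0 * delta_pp)


def pr_pretest(O, U, y8, k):
    # ASSUMPTION: E[Y_i8] is replaced by the sample mean of y8
    return logistic(-7.3 + O + U + k * (y8 - y8.mean()))


rng = np.random.default_rng(0)
n = 10_000
O, U = rng.normal(size=n), rng.normal(size=n)  # ASSUMED standard normal
D_star = rng.binomial(1, pr_baseline(O, U))   # one treatment draw per student
print(round(float(pr_baseline(0.0, 0.0)), 4))  # 0.0219
print(D_star.mean())
```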

Why is the average control outcome higher among the (eventually) treated?

[8]:
num_agents, selection, trajectory = 10, "baseline", "parallel"
df = get_sample_panel_demonstration(num_agents, selection, trajectory)
df.groupby(["D_ever", "Grade"])["Y"].mean()
[8]:
D_ever  Grade
0       8              NaN
        9        97.858309
        10       98.398170
Name: Y, dtype: float64
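One way to see the answer: \(O_i\) and \(U_i\) enter both the selection rule and every control outcome \(Y^0_{it}\), so students with high \(O_i + U_i\) are more likely to be treated and also have higher \(Y^0_{it}\). A sketch under the baseline rule, assuming independent standard normal components (not specified in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
# ASSUMPTION: all latent components are independent standard normals
O, U, X, E = (rng.normal(size=n) for _ in range(4))
y0_8 = 98 + O + U + X + E + rng.normal(size=n)

p = 1.0 / (1.0 + np.exp(-(-3.8 + O + U)))  # baseline selection rule
D = rng.binomial(1, p)

gap = y0_8[D == 1].mean() - y0_8[D == 0].mean()
print(gap)  # positive: selection favours high O + U, which also raises Y^0
```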

How do our standard estimators perform in these settings?

[10]:
num_agents = 1000
for selection in [
    "baseline",
    "self-selection on gains",
    "self-selection on pretest",
]:
    for trajectory in ["parallel", "divergent"]:
        print("\n Selection: {:}, Trajectory: {:}".format(selection, trajectory))
        df = get_sample_panel_demonstration(num_agents, selection, trajectory)
        # Report the coefficient on D for each estimator
        for estimator in ["naive", "diff"]:
            rslt = get_panel_estimates(estimator, df)
            print("{:10}: {:5.3f}".format(estimator, rslt.params["D"]))

 Selection: baseline, Trajectory: parallel
naive     : 15.278
diff      : 9.416

 Selection: baseline, Trajectory: divergent
naive     : 15.363
diff      : 9.774

 Selection: self-selection on gains, Trajectory: parallel
naive     : 14.151
diff      : 11.358

 Selection: self-selection on gains, Trajectory: divergent
naive     : 15.986
diff      : 12.460

 Selection: self-selection on pretest, Trajectory: parallel
naive     : 14.082
diff      : 8.971

 Selection: self-selection on pretest, Trajectory: divergent
naive     : 16.011
diff      : 10.543


Resources