
Self-selection, heterogeneity, and causal graphs

Introduction

Alternatives to back-door identification

The next chapters deal with:

  • instrumental variables

  • front-door identification with causal mechanisms

  • conditioning estimators using pretreatment variables

Why do we need to consider alternatives?

\(\rightarrow\) selection on unobservables / nonignorability of treatment

What makes an unobservable?

  • simple confounding, stable unobserved common cause of treatment and outcome variable

  • subtle confounding, direct self-selection into the treatment based on accurate perceptions of the individual level treatment effect

Selection on unobservables as a combination of two features:

  • treatment effect heterogeneity

  • self-selection
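
In the potential-outcome notation used in the simulation below, with potential outcomes \(Y_1\) and \(Y_0\), the combination can be stated compactly: the individual-level effect

\[\delta = Y_1 - Y_0\]

varies across individuals (heterogeneity), and the probability of treatment take-up \(\Pr(D = 1)\) depends on \(\delta\) (self-selection). Together, these imply \(E[\delta \mid D = 1] \neq E[\delta]\): those who take the treatment have systematically different effects than the population at large.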

Nonignorability and selection on the unobservables

Selection on observables

[Figure: causal graph for selection on observables]

Selection on unobservables

[Figure: causal graph for selection on unobservables]

Selection on the unobservables and the utility of additional posttreatment measures of the outcome

Catholic school example

  • claim that Catholic schools are more effective than public schools in teaching mathematics and reading to equivalent high school students

  • conditioning on family background and motivation to learn

  • those enrolling in Catholic schools have the most to gain from doing so

Notation

  • \(Y_{10}\), observed score on a standardized achievement test given in tenth grade (and, analogously, \(Y_{12}\) for twelfth grade)

  • \(D\), causal variable taking value one if the student attends a Catholic school

  • \(U\), unobserved motivation to learn, differences in home environment, anticipation of the causal effect itself

  • \(X\), determinants of achievement tests that have no direct causal effect on school sector or selection

  • \(O\), ultimate background variables that affect all other variables in the graph

We proceed in two steps:

  • assess identification for given directed graphs

  • examine structure of directed graph itself

[Figure: causal graphs for the Catholic school example, subfigures (a) and (b)]

We cannot identify the causal effect of \(D\) on \(Y_{10}\) in subfigure (a), but we can identify the effect of \(D\) on \(Y_{12}\) in subfigure (b) by conditioning on \(Y_{10}\). However, at what cost?

\(Y_{10}\) blocks all back-door paths; however, as a posttreatment variable it is a descendant of \(D\) and thus violates Condition 2 of the back-door criterion. As such, conditioning on it adjusts away some of the total causal effect of \(D\) on \(Y_{12}\).
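
A minimal sketch, with purely hypothetical numbers and a randomized \(D\) so that only the Condition 2 problem is at play, shows how conditioning on the posttreatment \(Y_{10}\) strips out the part of the effect of \(D\) on \(Y_{12}\) that runs through \(Y_{10}\):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

np.random.seed(123)
num_students = 100_000

# D is randomized here, so there is no back-door path to worry about.
d = np.random.binomial(1, 0.5, size=num_students)
y_10 = 100 + 10 * d + np.random.normal(size=num_students)
# The total effect of D on Y_12 is 2 + 10 = 12, part of it mediated by Y_10.
y_12 = 102 + 2 * d + y_10 + np.random.normal(size=num_students)

df_sketch = pd.DataFrame({"D": d, "Y_10": y_10, "Y_12": y_12})
print(smf.ols("Y_12 ~ D", data=df_sketch).fit().params["D"])         # close to 12
print(smf.ols("Y_12 ~ D + Y_10", data=df_sketch).fit().params["D"])  # close to 2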

Let \(E\) denote a student’s test-taking ability and allow for a direct effect of \(U\) on both achievement scores. Is this perhaps a more complete picture?

[Figure: causal graph with test-taking ability \(E\) and direct effects of \(U\) on both achievement scores]

Revisiting the economic implications of the imposed graph, back-door adjustment by \(Y_{10}\) is ineffective once again. In fact, \(Y_{10}\) is now a collider variable that induces a noncausal dependence between \(D\) and \(Y_{12}\).
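
The collider logic can be isolated in a small sketch with hypothetical variables (not the notebook's simulation): \(D\) and \(U\) are independent by construction, yet conditioning on their common descendant induces an association between them.

import numpy as np

np.random.seed(123)
num_draws = 100_000

d = np.random.normal(size=num_draws)
u = np.random.normal(size=num_draws)
# The intermediate outcome is a collider: both D and U point into it.
y_10 = d + u + np.random.normal(size=num_draws)

# Unconditionally, D and U are (nearly) uncorrelated.
print(f"Unconditional correlation: {np.corrcoef(d, u)[0, 1]:6.3f}")

# Conditioning on a slice of the collider induces a clearly negative correlation.
subset = np.abs(y_10) < 0.5
print(f"Conditional correlation:   {np.corrcoef(d[subset], u[subset])[0, 1]:6.3f}")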

Panel Data Demonstration

The motivation behind this example is to show how self-selection can be modeled in the data generating process and that, once it is present, conventional strategies do not recover the underlying causal effect.

[2]:
# Imports used throughout this demonstration.
from itertools import product

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def get_propensity_score(o, u):
    """Get the propensity score."""
    level = -3.8 + o + u
    return np.exp(level) / (1 + np.exp(level))


def get_treatment_status(o, u):
    """Sample the treatment status."""
    # Following the causal graph, the treatment indicator is only a function
    # of the background characteristics O and the unobservable U.
    p = get_propensity_score(o, u)
    return np.random.choice([1, 0], p=[p, 1 - p])


def get_covariates():
    """Get covariates."""
    o, e = np.random.normal(size=2)
    x, u = o + np.random.normal(size=2)
    return o, u, x, e
[3]:
def get_potential_outcomes(grade, o, x, e, u, scenario=0, selection=False):
    """Get potential outcomes.

    Sampling of potential outcome of an individual for the panel data demonstration.

    Args:
        grade: an integer for the grade the individual is in.
        o, x, e : floats of observable characteristics.
        u: a float of unobservable characteristic.
        scenario: an integer for the scenario: (0) no role for E, (1) role for E.
        selection: a boolean indicating whether there is selection on unobservables.

    Returns:
        A tuple of potential outcomes (Y_0, Y_1).

    """
    # We want to make sure we only pass in valid input.
    assert scenario in range(2)
    assert selection in [True, False]
    assert grade in [10, 11, 12]

    # There is a natural progression in test scores.
    level = dict()
    level[10] = 100
    level[11] = 101
    level[12] = 102

    if scenario == 0:
        y_0 = level[grade] + o + u + x + np.random.normal()
    elif scenario == 1:
        y_0 = level[grade] + o + u + x + e + np.random.normal()
    else:
        raise NotImplementedError

    # Sampling of treatment effects. The key difference for selection on unobservables is in how
    # the overall treatment effect depends on the unobservable U that also affects the choice
    # probability. This was the major criticism of Coleman's work.
    delta_1 = np.random.normal(loc=10, scale=1)
    if selection:
        delta_2 = np.random.normal(loc=u)
    else:
        delta_2 = np.random.normal()

    if grade == 10:
        y_1 = y_0 + delta_1 + delta_2
    elif grade == 11:
        y_1 = y_0 + (1 + delta_1) + delta_2
    elif grade == 12:
        y_1 = y_0 + (2 + delta_1) + delta_2

    return y_0, y_1
[4]:
def get_sample_panel_demonstration(num_agents=1000, scenario=0, selection=False, seed=123):
    """Get sample for demonstration.

    Create a random sample for the demonstration of the usefulness (or lack thereof) of having
    additional posttreatment measures of the outcome.

    Args:
        num_agents: an integer for the number of agents in the sample
        scenario: an integer that indicates whether to include E as a determinant of test scores
        selection: a boolean variable indicating whether selection on unobservables is an issue
        seed: an integer setting the random seed

    Returns:
        A dataframe with the simulated sample.

    """
    # We first initialize an empty DataFrame that holds the information for each individual
    # and each time period.
    columns = ["Y", "D", "O", "X", "E", "U", "Y_1", "Y_0"]
    index = product(range(num_agents), [10, 11, 12])
    index = pd.MultiIndex.from_tuples(index, names=("Identifier", "Grade"))
    df = pd.DataFrame(columns=columns, index=index)

    # Now we are ready to simulate the sample with the desired characteristics.
    np.random.seed(seed)
    for i in range(num_agents):

        o, u, x, e = get_covariates()
        d = get_treatment_status(o, u)
        for grade in [10, 11, 12]:
            y_0, y_1 = get_potential_outcomes(grade, o, x, e, u, scenario, selection)
            y = d * y_1 + (1 - d) * y_0
            df.loc[(i, grade), :] = [y, d, o, x, e, u, y_1, y_0]

    # Finally some type definitions for pretty output.
    df = df.astype(float)
    df = df.astype({"D": int})

    return df
[5]:
num_agents, scenario, selection = 1000, 0, False
df = get_sample_panel_demonstration(num_agents, scenario, selection)
df.head()
[5]:
                            Y  D         O         X         E         U         Y_1        Y_0
Identifier Grade
0          10       95.841898  0 -1.085631 -0.802652  0.997345 -2.591925  105.586179  95.841898
           11       98.499140  0 -1.085631 -0.802652  0.997345 -2.591925  106.765876  98.499140
           12       97.072351  0 -1.085631 -0.802652  0.997345 -2.591925  110.597380  97.072351
1          10       97.544210  0 -0.619191 -0.042445 -0.769433 -0.492665  108.934451  97.544210
           11      100.583068  0 -0.619191 -0.042445 -0.769433 -0.492665  112.137966 100.583068

What is the average treatment effect and how does it depend on the presence of selection?

[6]:
num_agents, scenario = 1000, 0

# This setup allows us to freeze some arguments of the function
# that do not change during the analysis.
from functools import partial  # noqa: E402

simulate_sample = partial(get_sample_panel_demonstration, num_agents, scenario)

for selection in [False, True]:
    print(f" Selection {selection}")
    df = simulate_sample(selection)
    for grade in [10, 12]:
        df_grade = df.loc[(slice(None), grade), :]
        stat = (df_grade["Y_1"] - df_grade["Y_0"]).mean()
        print(f" Grade {grade}:  ATE {stat:5.3f}")
    print("\n")
 Selection False
 Grade 10:  ATE 10.090
 Grade 12:  ATE 12.014


 Selection True
 Grade 10:  ATE 10.116
 Grade 12:  ATE 12.039


The average treatment effects are unaffected by selection. But how does the picture change when we look at subsets of the population?

[7]:
for selection in [False, True]:
    print(f" Selection {selection}")
    df = simulate_sample(selection)
    for grade in [10, 12]:
        subset = df.loc[(slice(None), grade), :]

        stat = list()
        for status in range(2):
            df_status = subset.query(f"D == {status}")
            stat.append((df_status["Y_1"] - df_status["Y_0"]).mean())

        print(" Grade {:}:  ATC {:7.3f}   ATT {:7.3f}".format(grade, *stat))
    print("\n")
 Selection False
 Grade 10:  ATC  10.098   ATT  10.012
 Grade 12:  ATC  12.001   ATT  12.134


 Selection True
 Grade 10:  ATC   9.958   ATT  11.655
 Grade 12:  ATC  11.861   ATT  13.776


[8]:
for grade in [10, 12]:
    for model in ["Y ~ D", "Y ~ D + X + O"]:
        df_grade = df.loc[(slice(None), grade), :]
        rslt = smf.ols(formula=model, data=df_grade).fit()
        stat = rslt.params["D"]
        print("Grade: {}  Model: {:}".format(*[grade, model]))
        print("   Estimated Treatment Effect: {:5.3f}\n".format(stat))
Grade: 10  Model: Y ~ D
   Estimated Treatment Effect: 15.767

Grade: 10  Model: Y ~ D + X + O
   Estimated Treatment Effect: 12.181

Grade: 12  Model: Y ~ D
   Estimated Treatment Effect: 17.849

Grade: 12  Model: Y ~ D + X + O
   Estimated Treatment Effect: 14.331

None of the estimates come even close to our parameters of interest.
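
To see why, we can exploit the simulated potential outcomes, which would be unobservable in practice, to decompose the naive difference in means into the ATT and a baseline selection bias. This is a diagnostic sketch for the current sample with selection; the variable names are our own.

for grade in [10, 12]:
    df_grade = df.loc[(slice(None), grade), :]
    treated = df_grade.query("D == 1")
    control = df_grade.query("D == 0")

    # The naive difference in means is identical to the coefficient on D in "Y ~ D".
    naive = treated["Y"].mean() - control["Y"].mean()

    # It decomposes into the ATT plus the baseline bias E[Y_0 | D = 1] - E[Y_0 | D = 0].
    att = (treated["Y_1"] - treated["Y_0"]).mean()
    baseline_bias = treated["Y_0"].mean() - control["Y_0"].mean()

    print(f"Grade {grade}:  Naive {naive:6.3f} = ATT {att:6.3f} + Baseline bias {baseline_bias:6.3f}")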

Causal graphs for complex patterns of self-selection

We want to make sure that complex patterns of self-selection can be represented by directed graphs.

Separate graphs for separate latent classes

Groups

  • \(G=1\), selection of schools mainly for lifestyle reasons, such as proximity to home and taste for school culture

  • \(G=2\), selection of schools to maximize expected achievement

[Figure: separate causal graphs for the two latent classes \(G=1\) and \(G=2\)]

What economic mechanisms are represented by each of the arrows? Why would we expect them to differ across the two groups?

  • families in the second group are more likely to send their children to charter schools, \(d_2 > d_1\)

  • parents with higher levels of education are more likely to send their children to charter schools as they value distinctive forms of education \(\alpha_1, \alpha_2 > 0\) and are able to support their children with their homework \(\beta_1, \beta_2 > 0\).

  • existing research suggests \(\delta_1, \delta_2 > 0\) and \(\delta_2 > \delta_1\), as families in the second group put more effort into matching their children to schools

What happens if we block the back-door path by conditioning on \(P\) but ignore the existence of the two latent classes? If \(P\) is associated with latent class membership, then we do not properly weight the stratum-specific treatment effects, as there is heterogeneity within strata.
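
A small, purely hypothetical simulation (the numbers are our own, chosen only for illustration) makes the weighting problem concrete: within each stratum of \(P\), treated families disproportionately come from the class with the larger effect, so the within-stratum difference in means overstates the stratum-specific average effect.

import numpy as np

np.random.seed(123)
num_families = 100_000

# Latent class membership and a binary measure of parental background P.
g = np.random.choice([1, 2], size=num_families, p=[0.6, 0.4])
p = np.random.binomial(1, 0.5, size=num_families)

# The second class selects into charter schools more strongly (d_2 > d_1).
d = np.random.binomial(1, 0.2 + 0.2 * p + 0.3 * (g == 2))

# Class-specific treatment effects, larger for the achievement maximizers.
delta = np.where(g == 1, 5.0, 10.0)
y = 50.0 + 2.0 * p + d * delta + np.random.normal(size=num_families)

for val in [0, 1]:
    stratum = p == val
    naive = y[stratum & (d == 1)].mean() - y[stratum & (d == 0)].mean()
    ate = delta[stratum].mean()
    print(f"P = {val}: difference in means {naive:5.2f} vs. stratum ATE {ate:5.2f}")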

A single graph that represents all latent classes

We now let \(G\) capture effect heterogeneity.

[Figure: a single causal graph with \(G\) capturing effect heterogeneity]

We outline increasingly elaborate ideas about the economic mechanisms that determine class membership and how they modify the structure of the causal graph.

  • no arrow from \(P\) to \(G\) implies that parents who have higher levels of education are no more likely to know of the educational policy dialogue that claims that charter schools have advantages.

  • no arrow from \(G\) to \(Y\) implies that there is in fact (on average) no treatment effect heterogeneity.

[Figure: causal graph variants without the arrows discussed above]

Self-selection into the latent class

We now elaborate on the mechanism that determines class membership. We assume that \(G\) is at least in part determined by a variable that measures a family’s subjective expectations of their child’s likely benefit from attending a charter school instead of a regular school.

[Figure: causal graph with self-selection into the latent class]

However, these expectations are potentially based on access to information that is often related to parental background.