
Conditioning estimators

Introduction

Approaches to the estimation of causal effects

  • conditioning on variables that block all back-door paths from the causal variable to the outcome variable

  • using exogenous variation in an appropriate instrumental variable to isolate covariation in the causal variable and the outcome variable

  • establishing an exhaustive and isolated mechanism that intercepts the effect of the causal variable on the outcome variable and then calculating the causal effect as it propagates through that mechanism

Conditioning and directed graphs

[Figure: causal graph with the paths \(D \rightarrow Y\) and \(D \leftarrow C \rightarrow O \rightarrow Y\)]

This graph is an example where a simple mean comparison between the treated and the untreated is not informative about the effect of the treatment.

  • The total association between \(D\) and \(Y\) is an unknown composite of the true causal effect \(D \rightarrow Y\) and the noncausal association between \(D\) and \(Y\).

Conditioning strategies

  • balancing the determinants of treatment assignment (e.g. matching estimators)

  • adjusting for all other causes of the outcome (e.g. regression estimators)
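The contrast between a naive mean comparison and a conditioning estimator can be illustrated with a small simulation. The following is a minimal sketch and not part of the original analysis: it assumes linear structural equations for a graph with the paths \(D \rightarrow Y\) and \(D \leftarrow C \rightarrow O \rightarrow Y\), and the coefficient values, the true effect of 1.0, and the helper `ols_coefficient_on_d` are chosen for illustration only. For simplicity, both conditioning strategies are implemented as linear regressions rather than as matching estimators.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0  # assumed value, for illustration only

# Assumed structural equations for D <- C -> O -> Y and D -> Y
C = rng.normal(size=n)
O = 0.8 * C + rng.normal(size=n)
D = (C + rng.normal(size=n) > 0).astype(float)
Y = true_effect * D + 1.5 * O + rng.normal(size=n)

# Naive comparison of means mixes the causal effect with the
# noncausal association that runs through C and O
naive = Y[D == 1].mean() - Y[D == 0].mean()


def ols_coefficient_on_d(controls):
    """Return the OLS coefficient on D when regressing Y on D and controls."""
    X = np.column_stack([np.ones(n), D] + controls)
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return beta[1]


# Strategy 1: condition on the determinant of treatment assignment, C
adjusted_c = ols_coefficient_on_d([C])
# Strategy 2: condition on the other cause of the outcome, O
adjusted_o = ols_coefficient_on_d([O])

print(f"naive difference in means: {naive:6.3f}")
print(f"adjusting for C:           {adjusted_c:6.3f}")
print(f"adjusting for O:           {adjusted_o:6.3f}")
```

Under these assumptions, either conditioning set blocks the single back-door path, so both adjusted estimates should be close to the assumed true effect, while the naive comparison is not.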

Back-door path

A back-door path is a path between any causally ordered sequence of two variables that begins with a directed edge that points to the first variable. In the example above, we have two paths: (1) \(D \rightarrow Y\), and (2) \(D\leftarrow C \rightarrow O \rightarrow Y\). The former is a causal path, while the latter is a back-door path.
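The definition can be made mechanical. The sketch below encodes the two paths named above as a list of directed edges, enumerates every path between \(D\) and \(Y\) while ignoring edge direction, and flags a path as a back-door path if its first edge points into \(D\). The function names `all_paths` and `is_back_door` are hypothetical helpers written for this illustration.

```python
# Directed edges of the example graph, written as (cause, effect)
edges = [("D", "Y"), ("C", "D"), ("C", "O"), ("O", "Y")]


def all_paths(edges, start, end):
    """Enumerate all paths between start and end, ignoring edge direction."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    paths = []

    def extend(path):
        node = path[-1]
        if node == end:
            paths.append(path)
            return
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in path:  # never visit a node twice
                extend(path + [nxt])

    extend([start])
    return paths


def is_back_door(path, edges):
    """A back-door path begins with an edge that points into the start node."""
    return (path[1], path[0]) in edges


paths = all_paths(edges, "D", "Y")
for path in paths:
    kind = "back-door path" if is_back_door(path, edges) else "not a back-door path"
    print(" - ".join(path), ":", kind)
```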

LaLonde dataset

What was the graph behind our analysis of the LaLonde dataset?

[Figure: causal graph behind the analysis of the LaLonde dataset]

Illustration of collider variables

We introduced collider variables earlier. They will play a very important role going forward, because conditioning on a collider variable that lies along a back-door path does not help to block that path but instead creates new associations. We therefore start with an illustration of how conditioning on a collider induces a conditional association between two variables that have no unconditional association.

[Figure: causal graph in which SAT and motivation both determine admission]

[2]:
import numpy as np
import pandas as pd
import seaborn as sns
from scipy import stats

num_individuals = 250

# Initialize empty data frame
columns = ["SAT", "motivation", "admission"]
df = pd.DataFrame(columns=columns, index=range(num_individuals))

# Motivation and SAT score are drawn independently of each other
df["motivation"] = np.random.normal(size=num_individuals)
df["SAT"] = np.random.normal(size=num_individuals)

# Both together determine college admission:
# individuals above the 85th percentile of the combined score are admitted
score = df["motivation"] + df["SAT"]
cutoff = np.percentile(score, 85)
df["admission"] = score > cutoff
df.head()
[2]:
        SAT  motivation  admission
0 -1.030933    1.127653      False
1  0.163968    1.051246      False
2  0.446524   -0.906215      False
3  0.441111    1.977908       True
4  1.190139    0.551189       True
[3]:
def get_joint_distribution(df):
    sns.jointplot(x="SAT", y="motivation", data=df)

    stat = stats.pearsonr(df["SAT"], df["motivation"])[0]
    print(f"The Pearson correlation coefficient is {stat:7.3f}")


get_joint_distribution(df)
The Pearson correlation coefficient is   0.023
[Figure: joint plot of SAT and motivation for the full sample]

What happens if we condition on college admission \(C\), i.e. on a collider variable?

[4]:
get_joint_distribution(df.query("admission == True"))
The Pearson correlation coefficient is  -0.836
[Figure: joint plot of SAT and motivation for admitted individuals only]

Conditioning on a collider variable that lies along a back-door path does not help to block the back-door path but instead creates new associations.

The back-door criterion

The back-door criterion allows us to determine whether or not conditioning on a given set of observed variables will identify the causal effect of interest.

  • Step 1 Write down the back-door paths from the causal variable to the outcome variable, determine which ones are unblocked, and then search for a candidate conditioning set of observed variables that will block all unblocked back-door paths.

  • Step 2 If a candidate conditioning set is found that blocks all back-door paths, inspect the patterns of descent in the graph in order to verify that the variables in the candidate conditioning set do not block or otherwise adjust away any portion of the causal effect of interest.

If one or more back-door paths connect the causal variable to the outcome variable, the causal effect is identified by conditioning on a set of variables \(Z\) if

Condition 1 All back-door paths between the causal variable and the outcome variable are blocked after conditioning on \(Z\), which will always be the case if each back-door path

  • contains a chain of mediation \(A\rightarrow C \rightarrow B\) where the middle variable \(C\) is in \(Z\)

  • contains a fork of mutual dependence \(A \leftarrow C \rightarrow B\), where the middle variable \(C\) is in \(Z\)

  • contains an inverted fork of mutual causation \(A \rightarrow C \leftarrow B\), where the middle variable \(C\) and all of \(C\)’s descendants are not in \(Z\)

and …

Condition 2 No variables in \(Z\) are descendants of the causal variable that lie on (or descend from other variables that lie on) any of the directed paths that begin at the causal variable and reach the outcome variable.
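Condition 1 can be checked mechanically for a given graph. The sketch below is one possible implementation and not a reference algorithm: it walks along each back-door path, looks at every consecutive triple of variables, and applies the three blocking rules listed above. The edge set is the earlier example with the paths \(D \rightarrow Y\) and \(D \leftarrow C \rightarrow O \rightarrow Y\), and all function names are hypothetical helpers introduced for this illustration.

```python
# Directed edges of the earlier example, written as (cause, effect)
edges = {("D", "Y"), ("C", "D"), ("C", "O"), ("O", "Y")}


def descendants(node, edges):
    """Return all variables reachable from node along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in edges:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def undirected_paths(edges, start, end, path=None):
    """Enumerate all paths from start to end, ignoring edge direction."""
    path = path or [start]
    if path[-1] == end:
        return [path]
    paths = []
    for a, b in sorted(edges):
        for here, there in ((a, b), (b, a)):
            if here == path[-1] and there not in path:
                paths += undirected_paths(edges, start, end, path + [there])
    return paths


def is_blocked(path, edges, z):
    """Check whether conditioning on the set z blocks the given path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        is_collider = (left, mid) in edges and (right, mid) in edges
        if is_collider:
            # Inverted fork: blocks unless mid or one of its descendants is in z
            if not ({mid} | descendants(mid, edges)) & z:
                return True
        elif mid in z:
            # Chain or fork: blocks if the middle variable is in z
            return True
    return False


def satisfies_condition_1(edges, cause, outcome, z):
    """Check whether z blocks every back-door path from cause to outcome."""
    back_door = [
        p for p in undirected_paths(edges, cause, outcome) if (p[1], p[0]) in edges
    ]
    return all(is_blocked(p, edges, set(z)) for p in back_door)


for z in [set(), {"C"}, {"O"}]:
    print(sorted(z), satisfies_condition_1(edges, "D", "Y", z))

# The collider illustration: conditioning on admission opens the path
collider_edges = {("SAT", "admission"), ("motivation", "admission")}
collider_path = ["SAT", "admission", "motivation"]
print(is_blocked(collider_path, collider_edges, set()))
print(is_blocked(collider_path, collider_edges, {"admission"}))
```

The sketch covers only Condition 1; whether a conditioning set also satisfies Condition 2 has to be checked separately.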

Let’s revisit our example earlier and test our vocabulary.

[Figure: the causal graph from the earlier example]

We have a chain of mediation \(C \rightarrow O \rightarrow Y\) and a fork of mutual dependence \(D \leftarrow C \rightarrow O\).

We will now work through two more advanced examples where we focus only on the first condition of the back-door criterion. Let’s start with a simple example and apply the idea of back-door identification to a graph where we consider conditioning on a lagged outcome variable \(Y_{t - 1}\).

[Figure: causal graph with a lagged outcome variable \(Y_{t - 1}\)]

There exist two back-door paths, and \(Y_{t - 1}\) lies on both of them. However, conditioning on it does not satisfy the back-door criterion: it blocks one path, but \(Y_{t - 1}\) is a collider variable on the other path, so conditioning on it unblocks that path.

Let us practice our understanding on some interesting graph structures. The back-door algorithm is also available here for your reference.

Let’s study the following causal graph:

[Figure: causal graph for the candidate conditioning sets below]

Consider the following three candidate conditioning sets. Any thoughts?

  • \(\{F\}\)

  • \(\{A\}\)

  • \(\{A, B\}\)

Finally, let’s focus on the second condition.

  • Condition 2 No variables in \(Z\) are descendants of the causal variable that lie on (or descend from other variables that lie on) any of the directed paths that begin at the causal variable and reach the outcome variable.

We first look at a graph that illustrates what a descendant is and remind ourselves of the difference between a direct and an indirect effect.

[Figure: causal graph illustrating descendants, including the variable \(N\)]

Conditioning on \(N\) (in addition to either \(C\) or \(O\)) does not satisfy the back-door criterion due to its violation of the second condition.
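Condition 2 can also be checked mechanically. The sketch below assumes a hypothetical graph in which \(N\) mediates part of the effect of \(D\) on \(Y\), that is \(D \rightarrow N \rightarrow Y\) in addition to \(D \rightarrow Y\) and \(D \leftarrow C \rightarrow O \rightarrow Y\); the edge set is an assumption made for illustration and may differ from the graph shown above. The helper `violates_condition_2` is likewise hypothetical and returns the variables in a candidate set that lie on, or descend from a variable on, a directed path from the causal variable to the outcome.

```python
# Assumed edges, written as (cause, effect); hypothetical illustration only
edges = {
    ("C", "D"), ("C", "O"), ("O", "Y"),  # back-door path through C and O
    ("D", "Y"),                          # direct effect
    ("D", "N"), ("N", "Y"),              # indirect effect through N
}


def descendants(node, edges):
    """Return all variables reachable from node along directed edges."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for a, b in edges:
            if a == current and b not in found:
                found.add(b)
                stack.append(b)
    return found


def violates_condition_2(edges, cause, outcome, z):
    """Return the variables in z that violate Condition 2."""
    # Descendants of the cause that lie on a directed path to the outcome
    on_causal_paths = {
        v
        for v in descendants(cause, edges)
        if v == outcome or outcome in descendants(v, edges)
    }
    # Add everything that descends from a variable on such a path
    forbidden = set(on_causal_paths)
    for v in on_causal_paths:
        forbidden |= descendants(v, edges)
    return set(z) & forbidden


print(violates_condition_2(edges, "D", "Y", {"C", "N"}))
print(violates_condition_2(edges, "D", "Y", {"O"}))
```

An empty result means that the candidate set does not adjust away any part of the causal effect in the assumed graph.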

How about this causal structure:

[Figure: causal graph for the candidate conditioning set \(\{O, B\}\)]

Let’s evaluate the candidate conditioning set \(\{O, B\}\) together.

By now you have probably recognized the mechanical nature of checking the back-door criterion for a given causal graph. Here are some automated tools that make your life easier in the future and also allow you to practice your own understanding.