Conditioning estimators
Introduction
Approaches to the estimation of causal effects
conditioning on variables that block all back-door paths from the causal variable to the outcome variable
using exogenous variation in an appropriate instrumental variable to isolate covariation in the causal variable and the outcome variable
establishing an exhaustive and isolated mechanism that intercepts the effect of the causal variable on the outcome variable and then calculating the causal effect as it propagates through the mechanism
Conditioning and directed graphs
This graph is an example where a simple mean comparison between the treated and untreated is not informative about the effect of the treatment.
The total association between \(D\) and \(Y\) is an unknown composite of the true causal effect \(D \rightarrow Y\) and the noncausal association between \(D\) and \(Y\).
Conditioning strategies
balancing the determinants of treatment assignment (e.g. matching estimators)
adjusting for all other causes of the outcome (e.g. regression estimators)
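Both strategies can be tried out on simulated data. The following is a minimal sketch, assuming a data-generating process with the structure \(D \leftarrow C \rightarrow O \rightarrow Y\) and \(D \rightarrow Y\); all coefficients, the sample size, and the variable names are illustrative assumptions and not taken from the lecture. Stratifying on \(C\) stands in for the balancing strategy, and a regression of \(Y\) on \(D\) and \(O\) stands in for the adjustment strategy.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
true_effect = 1.0  # assumed causal effect of D on Y

# Assumed data-generating process: D <- C -> O -> Y and D -> Y.
C = rng.binomial(1, 0.5, size=n)        # binary confounder
D = rng.binomial(1, 0.2 + 0.6 * C)      # treatment depends on C
O = 2.0 * C + rng.normal(size=n)        # other cause of Y, driven by C
Y = true_effect * D + 1.5 * O + rng.normal(size=n)

# Naive mean comparison mixes the causal effect with the back-door association.
naive = Y[D == 1].mean() - Y[D == 0].mean()

# Strategy 1: balance the determinant of treatment by stratifying on C,
# then average the within-stratum differences using the stratum shares.
stratified = 0.0
for c in (0, 1):
    in_stratum = C == c
    diff = Y[in_stratum & (D == 1)].mean() - Y[in_stratum & (D == 0)].mean()
    stratified += diff * in_stratum.mean()

# Strategy 2: adjust for the other cause of the outcome by regressing Y on D and O.
X = np.column_stack([np.ones(n), D, O])
regression = np.linalg.lstsq(X, Y, rcond=None)[0][1]

print(f"naive:      {naive:6.3f}")
print(f"stratified: {stratified:6.3f}")
print(f"regression: {regression:6.3f}")
```

Under these assumptions the naive comparison should overstate the effect, while both conditioning strategies should land close to the assumed value of `true_effect`.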
Back-door path
A back-door path is a path between any causally ordered sequence of two variables that begins with a directed edge that points to the first variable. In the example above, we have two paths: (1) \(D \rightarrow Y\), and (2) \(D\leftarrow C \rightarrow O \rightarrow Y\). The former is a causal path, while the latter is a back-door path.
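The definition can be applied mechanically. The following is a minimal sketch in plain Python, assuming the example graph is encoded as a list of directed edges; the helper name `back_door_paths` is hypothetical. It enumerates all paths between the two variables while ignoring edge direction and keeps those whose first edge points into the causal variable.

```python
# Directed edges of the example graph: D <- C -> O -> Y and D -> Y.
edges = [("C", "D"), ("C", "O"), ("O", "Y"), ("D", "Y")]


def back_door_paths(edges, cause, outcome):
    """Return all paths from cause to outcome whose first edge points into cause."""
    edge_set = set(edges)

    # Build the undirected skeleton so that paths may run against edge direction.
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    paths = []

    def extend(path):
        last = path[-1]
        if last == outcome:
            # Back-door path: the first edge is directed into the causal variable.
            if (path[1], cause) in edge_set:
                paths.append(path)
            return
        for nxt in sorted(neighbors.get(last, ())):
            if nxt not in path:  # simple paths only, no repeated nodes
                extend(path + [nxt])

    extend([cause])
    return paths


print(back_door_paths(edges, "D", "Y"))  # [['D', 'C', 'O', 'Y']]
```

The causal path \(D \rightarrow Y\) is excluded because its first edge points away from \(D\).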
LaLonde dataset
What was the graph behind our analysis of the LaLonde dataset?
Illustration of collider variables
We introduced collider variables earlier. They play an important role going forward because conditioning on a collider variable that lies along a back-door path does not help to block that path, but instead creates new associations. We therefore start with an illustration of how conditioning on a collider induces a conditional association between two variables that have no unconditional association.
[2]:
import numpy as np
import pandas as pd

num_individuals = 250

# Initialize empty data frame
columns = ["SAT", "motivation", "admission"]
df = pd.DataFrame(columns=columns, index=range(num_individuals))

# Motivation and SAT score are drawn independently of each other
df["motivation"] = np.random.normal(size=num_individuals)
df["SAT"] = np.random.normal(size=num_individuals)

# Both together determine college admission: the top 15% of the combined score are admitted
score = df["motivation"] + df["SAT"]
cutoff = np.percentile(score, 85)
df["admission"] = score > cutoff

df.head()
[2]:
| | SAT | motivation | admission |
|---|---|---|---|
| 0 | -1.030933 | 1.127653 | False |
| 1 | 0.163968 | 1.051246 | False |
| 2 | 0.446524 | -0.906215 | False |
| 3 | 0.441111 | 1.977908 | True |
| 4 | 1.190139 | 0.551189 | True |
[3]:
import seaborn as sns
from scipy import stats


def get_joint_distribution(df):
    # Plot the joint distribution and report the linear association
    sns.jointplot(x="SAT", y="motivation", data=df)
    stat = stats.pearsonr(df["SAT"], df["motivation"])[0]
    print(f"The Pearson correlation coefficient is {stat:7.3f}")


get_joint_distribution(df)
The Pearson correlation coefficient is 0.023
What happens if we condition on college admittance \(C\), i.e. on a collider variable?
[4]:
get_joint_distribution(df.query("admission == True"))
The Pearson correlation coefficient is -0.836
Conditioning on a collider variable that lies along a back-door path does not help to block the back-door path but instead creates new associations.
The back-door criterion
The back-door criterion allows us to determine whether or not conditioning on a given set of observed variables will identify the causal effect of interest.
Step 1 Write down the back-door paths from the causal variable to the outcome variable, determine which ones are unblocked, and then search for a candidate conditioning set of observed variables that will block all unblocked back-door paths.
Step 2 If a candidate conditioning set is found that blocks all back-door paths, inspect the patterns of descent in the graph in order to verify that the variables in the candidate conditioning set do not block or otherwise adjust away any portion of the causal effect of interest.
If one or more back-door paths connect the causal variable to the outcome variable, the causal effect is identified by conditioning on a set of variables \(Z\) if
Condition 1 All back-door paths between the causal variable and the outcome variable are blocked after conditioning on \(Z\), which will always be the case if each back-door path
contains a chain of mediation \(A\rightarrow C \rightarrow B\) where the middle variable \(C\) is in \(Z\)
contains a fork of mutual dependence \(A \leftarrow C \rightarrow B\), where the middle variable \(C\) is in \(Z\)
contains an inverted fork of mutual causation \(A \rightarrow C \leftarrow B\), where the middle variable \(C\) and all of \(C\)’s descendants are not in \(Z\)
and …
Condition 2 No variables in \(Z\) are descendants of the causal variable that lie on (or descend from other variables that lie on) any of the directed paths that begin at the causal variable and reach the outcome variable.
Let’s revisit our example earlier and test our vocabulary.
We have a chain of mediation \(C \rightarrow O \rightarrow Y\) and a fork of mutual dependence \(D \leftarrow C \rightarrow O\).
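The three blocking rules of Condition 1 can be sketched as a small path checker. This is an illustrative sketch in plain Python and not the backdoor algorithm referenced in the lecture; the helper names `descendants` and `is_blocked` are hypothetical, and the collider graph at the end simply encodes the SAT, motivation, and admission illustration from earlier.

```python
def descendants(edges, node):
    """Return all nodes reachable from node along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def is_blocked(edges, path, Z):
    """Check whether conditioning on Z blocks the given path."""
    edge_set, Z = set(edges), set(Z)
    for left, mid, right in zip(path, path[1:], path[2:]):
        is_collider = (left, mid) in edge_set and (right, mid) in edge_set
        if is_collider:
            # Inverted fork: blocks unless the collider or a descendant is in Z.
            if not (({mid} | descendants(edges, mid)) & Z):
                return True
        elif mid in Z:
            # Chain or fork: blocks once the middle variable is in Z.
            return True
    return False


# Example graph: D <- C -> O -> Y and D -> Y.
edges = [("C", "D"), ("C", "O"), ("O", "Y"), ("D", "Y")]
back_door = ["D", "C", "O", "Y"]
print(is_blocked(edges, back_door, set()))   # False
print(is_blocked(edges, back_door, {"C"}))   # True
print(is_blocked(edges, back_door, {"O"}))   # True

# Collider illustration: SAT -> admission <- motivation.
collider_edges = [("SAT", "admission"), ("motivation", "admission")]
collider_path = ["SAT", "admission", "motivation"]
print(is_blocked(collider_edges, collider_path, set()))          # True
print(is_blocked(collider_edges, collider_path, {"admission"}))  # False
```

The last two lines mirror the simulation above: the path through the collider is blocked by default and opens once we condition on admission.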
We will now work through two more advanced examples where we focus only on the first condition of the back-door criterion. Let’s start with a simple example and apply the idea of back-door identification to a graph where we consider conditioning on a lagged outcome variable \(Y_{t -1}\).
There exist two back-door paths, and \(Y_{t - 1}\) lies on both of them. However, conditioning on it does not satisfy the back-door criterion: it blocks one path, but \(Y_{t - 1}\) is a collider variable on the other, so conditioning on it unblocks that path.
Let us practice our understanding on some interesting graph structures. The backdoor algorithm is also available here for your reference.
Let’s study the following causal graph:
Consider the following three candidate conditioning sets. Any thoughts?
\(\{F\}\)
\(\{A\}\)
\(\{A, B\}\)
Finally, let’s focus on the second condition.
Condition 2 No variables in \(Z\) are descendants of the causal variable that lie on (or descend from other variables that lie on) any of the directed paths that begin at the causal variable and reach the outcome variable.
We first look at a graph that illustrates what a descendant is and remind ourselves of the difference between a direct and an indirect effect.
Conditioning on \(N\) (in addition to either \(C\) or \(O\)) does not satisfy the back-door criterion due to its violation of the second condition.
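Condition 2 can also be checked mechanically. The sketch below assumes, purely for illustration, that \(N\) mediates part of the effect via \(D \rightarrow N \rightarrow Y\) in addition to the direct effect \(D \rightarrow Y\); the graph in the figure may differ, and the helper names `descendants` and `condition_2_violations` are hypothetical.

```python
def descendants(edges, node):
    """Return all nodes reachable from node along directed edges."""
    children = {}
    for a, b in edges:
        children.setdefault(a, set()).add(b)
    found, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def condition_2_violations(edges, cause, outcome, Z):
    """Return the members of Z that violate Condition 2."""
    # Descendants of the cause that lie on a directed path to the outcome.
    on_causal_path = {
        v
        for v in descendants(edges, cause)
        if v == outcome or outcome in descendants(edges, v)
    }
    # Variables that descend from those on-path variables are ruled out as well.
    forbidden = set(on_causal_path)
    for v in on_causal_path:
        forbidden |= descendants(edges, v)
    return set(Z) & forbidden


# Assumed graph: D <- C -> O -> Y, D -> Y, and D -> N -> Y.
edges = [("C", "D"), ("C", "O"), ("O", "Y"), ("D", "Y"), ("D", "N"), ("N", "Y")]
print(condition_2_violations(edges, "D", "Y", {"C"}))       # set()
print(condition_2_violations(edges, "D", "Y", {"C", "N"}))  # {'N'}
print(condition_2_violations(edges, "D", "Y", {"O", "N"}))  # {'N'}
```

An empty result means the candidate set passes Condition 2; Condition 1 still has to be checked separately.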
How about this causal structure:
Let’s evaluate the candidate conditioning set \(\{O, B\}\) together.
By now you have probably recognized the mechanical nature of checking the back-door criterion for a given causal graph. Here are some automated tools that will make your life easier in the future and also allow you to test your own understanding.