8  Causal Inference

Statistics teachers often proudly declare to their students that correlation is not causation, emphasizing that just because two things move together does not mean that one is causing the other. We have discussed examples of this before. So, the question that (at least for me) always itched in the back of my mind was “Okay. Well, what is causality then? What does it mean for one thing to cause another?”

Our final few chapters cover treatment effects/causal inference for policy analysis. Strictly speaking, we could have an entire concentration on this in Georgia State’s public policy school. So, as an introduction, this chapter seeks to provide you with the basic philosophy of causal inference, specifically, what it is as a concept. We will cover the potential outcomes framework, randomization of treatments for experiments, and how we can use the regression modeling we have discussed so far to estimate causal effects in the idealized setup of randomization. Finally, we cover threats to validity in the context of impact evaluation.

8.1 What Is Causality?

As I mentioned in the chapter on correlation, humans have evolved to think hypothetically. It is how we have survived for as long as we have. Causal inference in particular demands that we imagine another world that we believe could exist, but does not exist.

Definition 8.1 (Counterfactual) A counterfactual is what we would have seen for an outcome if some intervention had not taken place, in contrast to what we actually see (a realized outcome, one might say). This unrealized, imagined outcome is called the counterfactual because it runs contrary to the observed facts.

We as human beings think in this manner all the time.

  • How would the American economy have evolved post 1860 if the Civil War never happened?

  • What if a school did a new math curriculum? Would math scores improve?

  • How would gun homicide statistics look, 6 months from now, if a state didn’t pass gun control policies?

  • How would a grocery store’s in-store sales have evolved if it had not switched entirely to self-checkout scanners?

  • Did Nebraska’s repeal of the tampon tax affect tampon use? How would tampon sales have evolved if Nebraska did not get rid of the tax?

  • How would New Orleans’ outward migration have looked if Hurricane Katrina did not happen?

We as humans believe a counterfactual is a byproduct of what we call a treatment effect, or the actual quantitative impact of the treatment, policy, or intervention. The treatment effect may be thought of as the consequence of an action: if I throw my phone into the pool from my balcony, the treatment effect is that my phone goes from \(\text{working}=1\) to \(\text{working}=0\).

Definition 8.2 (Treatment Effect) Mathematically, we can represent the treatment effect as \(\tau_{i} = y_{i}^{1} - y_{i}^{0}\), where \(y_{i}^{1}\) denotes the outcome for unit \(i\) under treatment and \(y_{i}^{0}\) denotes the outcome for that same unit without treatment. For a treated unit, we observe \(y_{i}^{1}\) but never \(y_{i}^{0}\).

That is, we assume treatments, interventions, or policies have some nonzero effect on the outcome of interest.
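To make the notation concrete, the phone example above works out to

\[ \tau_{\text{phone}} = y^{1}_{\text{phone}} - y^{0}_{\text{phone}} = 0 - 1 = -1, \]

where \(y^{1}_{\text{phone}}\) is the phone’s working status after I throw it into the pool (the treatment) and \(y^{0}_{\text{phone}}\) is its working status had I left it alone.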

Note

Note that the Greek letter \(\tau\) (“tau”) is used as a symbol for the treatment effect, yet you may use other symbols in your writing should you wish. By the way, the subscript \(i\) here indexes the units of interest. You may use any subscript you wish. If you have outcomes for a single state over time, nothing is wrong with saying \(y_{t}\), where \(t\) indexes some generic time period and \(y\) is whatever outcome we’re concerned with. If you have outcomes for multiple cities in a cross-sectional dataset, nothing is wrong with writing something like \(y_c\). Similarly, if you are referring to a panel data structure (which you will be) of cities over a period of weeks, nothing is wrong with saying something like “Our outcome is the unemployment rate for each city over time, or \(y_{cw}\)”, so long as you are consistent with the notation you use.

8.2 The Fundamental Problem of Causal Inference

Unfortunately for us, we get but one copy of reality. The American Civil War ended in 1865. The real GDP of the U.S. in 1866 was roughly 88.8 billion dollars (in today’s money). So, the formula for the treatment effect for the year 1866 is \(\tau_{i} = 88.8 - \,?\) Why is there a question mark here? Well, we cannot literally look at two realities. We have only one reality, the one where the United States’ Civil War happened. We cannot see the reality where it did not happen (not that we would really want to, by the way). It is impossible to access that reality in the universe we exist in. Thus, counterfactuals are things we can estimate, guess about, and speculate on, but never ever see in real life. Before we get into how we would estimate counterfactuals statistically, though, let us use a more relatable example.

Suppose I am going to school today. I think the way I take to school (Way A) is quicker than Way B. This gives us a set of two ways to take, \(d \in \{0,1\}\) (read as “d in 0 1”), where \(d=0\) means we have taken Way B and \(d=1\) means we have taken Way A. The outcome of interest \(y\) is the commute time associated with each way we take.

We may express this scenario formally as \(d \mapsto y\left(d\right)\). This means that the commute time we ultimately see/observe is a function of the road we choose. And this makes sense: assuming no traffic, of course, we’d imagine that the highway takes less time than the backstreets. We may represent the outcomes of each way as \(y^A\) and \(y^B\), where naturally \(y^A\) is how long it takes if we take my way and \(y^B\) is how long it takes if we go the other way. Here, \(\tau\) is the difference in minutes between the time it took me by taking Way A and the time it would’ve taken me had I taken Way B. In fact, I did this as I wrote this. I used Google Maps to tell me how long the drive from my apartment in Marietta to Georgia Tech would be. Using the highway, it takes 14 minutes. But one of the options when I avoid highways takes 23 minutes.

Way Taken          | Indicator \(d\) | Commute Time             | Outcome \(y\)
-------------------|-----------------|--------------------------|----------------------
Way A              | \(d=1\)         | \(y^A = 14\) min         | \(y = y^A = 14\) min
Way B              | \(d=0\)         | \(y^B = 23\) min         | \(y = y^B = 23\) min
Treatment Effect   | \(\tau\)        | \(y^A - y^B = -9\) min   | N/A

To do this in Stata, let’s get this exact same effect size with regression, using the model \(y_i=\beta_0+\beta_1x_i\), where \(y_i\) is our commute time, \(\beta_0\) is the constant (the time we would expect the trip to take if we’d taken Way B), \(\beta_1\) is the effect of taking Way A, and \(x_i\) is an indicator equal to 1 if we took Way A and 0 if we took Way B.

If you’ve been paying attention (or reading), you remember from last class that the way we interpret the OLS coefficients/betas/estimates for a continuous variable is “a one unit increase in \(x\) leads to a \(\beta\) change in \(y\), compared to other similar units with the same characteristics”. In the case of a dichotomous variable, however (such as the indicator for taking Way A), we interpret \(\beta_1\) as the difference in commute times between taking Way A and Way B.

// Again, in a Stata do file, paste this code

clear *                          // clears all data and results from memory
set obs 2                        // sets the two observations
cls                              // clears the output on the screen

g commutetime = 14 in 1          // Way A
replace commutetime = 23 in 2    // Way B

g taken = 1 in 1                 // we took Way A
replace taken = 0 if taken == .  // but not Way B

reg commutetime i.taken          // using regression to estimate the impact

See how this returns the same coefficient as above? The way we interpret the taken coefficient is that the expected commute time when taking Way A is 9 minutes less than what we’d expect had I taken Way B. Similarly, the constant tells us that when we do not take Way A (that is, when \(x=0\)), the expected commute is 23 minutes.


------------------------------------------------------------------------------
 commutetime | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
     1.taken |         -9          .        .       .            .           .
       _cons |         23          .        .       .            .           .
------------------------------------------------------------------------------
Important

Suppose I do indeed take Way A, as I would, and that it in fact takes 14 minutes. Does this mean the effect of Way A is in general quicker by 9 minutes? No, not exactly. Maybe I do take Way A, but traffic builds up on Way A while it does not build up on the other way. Or maybe Way A still takes 14 minutes, but the other way, \(y^B\), happens to (this time) take 20 minutes instead of 23, meaning our treatment effect is now \(14-20=-6\). The problem inherent here is that I cannot take both ways at once. I have a choice to make, and once I choose I must commit to it. I can either take my way or the other way; I can’t do both on the same day at the same time. Thus, because of this choice, I can only guess as to what \(y^B\)’s travel time actually would have been for me on that day. Only one outcome exists in reality.

Definition 8.3 (Fundamental Problem of Causal Inference) The fundamental problem of causal inference is captured by what is called the switching equation, the idea that we observe units as treated or untreated, but never both: \(y_{i}=dy_{i}^{1}+\left(1-d \right)y_{i}^0\). The “switch” refers to the literal switch to being treated, when once you were previously untreated. That is, we only observe some of our potential outcomes, so termed because their possible values are a function of treatment assignment.

Let’s break down the order of operations from Definition 8.3, shall we? If we take Way A (\(d=1\)), we get \(y=y^A \times 1 +\left(1-1\right)y^B\), or just \(y^A\), since anything multiplied by 0 is 0 and \(y^A \times 1\) is just \(y^A\). In other words, taking Way A necessitates that we do not take Way B. If we take Way B (\(d=0\)), we get \(y=y^A \times 0 +\left(1-0\right)y^B\), or just \(y^B\), because now \(y^A \times 0=0\) and \(\left(1-0\right)y^B\) is just \(1 \times y^B\). Taking Way B means we cannot take Way A.

In other words, the 9 minute effect size is but an estimate of an imperfectly accounted for reality. The treatment effect is inherently unobservable because the counterfactual effect is in principle unobservable. The counterfactual, and therefore the treatment effect, are both things we have to estimate.
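Before moving on, here is a tiny Stata sketch of the switching equation from Definition 8.3. It is only a toy: we can write down both \(y^1\) and \(y^0\) because we invented them ourselves, which is exactly what real data never lets us do.

// A toy illustration of the switching equation with invented potential outcomes
clear *
set obs 4
gen y1 = 10 + _n              // made-up outcome under treatment
gen y0 = 8 + 2*_n             // made-up outcome without treatment
gen d  = mod(_n, 2)           // pretend every other unit is treated
gen y  = d*y1 + (1 - d)*y0    // the switching equation: only one potential outcome shows up in y
list d y1 y0 y, noobs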

8.3 Randomized Controlled Trials

Establishing causality and generating counterfactuals are all about comparisons. Typically, we compare a group of one or more units that did an intervention or policy to units that did not do the same policy. We use regression as a vehicle to facilitate this comparison. Before we do this for a real policy example though, let’s think about how this is done in a (close to) ideal setting.

Definition 8.4 (Randomized Controlled Trial) A randomized controlled trial is a form of study design where we assign a treatment at random to a certain number of people or units or entities. In such a design, random assignment means every unit has the same known probability of receiving the treatment (often 0.5 in a simple two-group design), independent of its characteristics.

In medicine, we must test drugs in order to see if they work before we allow them to be used on humans in a broader sense. We use randomized controlled trials to try to establish the efficacy of drugs. In this case, the individual is the unit of interest. There are \(N\) such individuals. Those who get the treatment we call the treatment group with their number being \(N_{\text{tr}}\). Those who do not get the treatment are called the control group (or, sometimes we call the untreated group the donor pool), with their number being \(N_{\text{co}}=N-N_{\text{tr}}\).

We randomize treatments by using a computer to flip a coin across our \(N\) individuals/units to determine whether each one gets the treatment. If treatment assignment is truly random, any other covariates that may influence the outcome do not predict treatment status. We call these additional covariates that cloud the relationship between the treatment and the outcome confounders.
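A minimal sketch of that computerized coin flip in Stata might look like the following (the seed and variable name here are just for illustration):

// A minimal sketch of random assignment via a computerized coin flip
clear *
set seed 2024                      // any seed; it just makes the flips reproducible
set obs 1000                       // N units
gen treated = runiform() > 0.5     // "heads" (probability 0.5) gets the treatment
tab treated                        // roughly half the sample lands in each group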

Say we wish to study the impact of a vaccine on recovery time. We cannot just give the vaccine to some people and not others in a non-random way (i.e., giving the vaccine to anyone who wants it), because other variables may confound the relationship between recovery rates and the desire to take the vaccine.

Perhaps those who took the vaccine are younger on average than those who didn’t; maybe they wanted to be extra safe from the virus. Or maybe younger people had better baseline health characteristics (compared to older adults). This means, on average, those who took the vaccine would recover from COVID (say) quicker than the control group, not entirely because of the vaccine but because they were already healthier or younger on average compared to the control group.

Example 8.1 (Vaccine) When the coin flip decides who gets the vaccine, then in a large enough representative sample, our treatment and control groups are balanced across all confounders, on average. We say “balanced” because when all study participants of all ages, races, and so on are equally likely to be given the vaccine or not, the average difference in recovery time can be better attributed to the vaccine instead of other factors such as age. The way we practically check this is to take the average of the covariates in the treatment and control groups after we have randomized. Ideally, the average age (in this case) should be similar across both groups.
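As a small illustration, a balance check in Stata might look like the sketch below; the variable names vaccine (a 0/1 treatment indicator) and age are hypothetical.

// A minimal sketch of a post-randomization balance check (hypothetical variable names)
mean age, over(vaccine)   // average age in each group should be similar
ttest age, by(vaccine)    // a formal test of the difference in mean age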

After we have randomized our treatment successfully, we can then calculate what the effect size is. We can do this using averages or OLS regression. In the case of a randomized controlled trial, the estimand of interest is the average treatment effect.

Definition 8.5 (Average Treatment Effect) We may compute the average treatment effect as \(\text{ATE}=\frac{1}{N}\sum_{i=1}^{N}\left(y_i^1-y_i^0\right)\). Estimating it with a simple difference in group means is only valid when the treatment is randomized.

Suppose the observed outcome under treatment for one person (\(i=1\)) is 8, but their untreated potential outcome would have been 5, and that for \(i=2\) the observed outcome is 7, but without the treatment it would have been 6 (again, supposing these are the results of a randomized experiment; in practice we would only ever see one of the two numbers for each person). Here is an example of how this might be calculated by hand.

\[ \begin{aligned} \text{ATE} &= \frac{1}{N} \sum_{i=1}^{N} \left( y_i^1 - y_i^0 \right) \\ &= \frac{1}{2} \left( (y_1^1 - y_1^0) + (y_2^1 - y_2^0) \right) \\ &= \frac{1}{2} \left( (8 - 5) + (7 - 6) \right) \\ &= \frac{1}{2} \left( 3 + 1 \right) \\ &= \frac{1}{2} \times 4 \\ &= 2 \end{aligned} \]

Here is how we’d estimate the ATE in Stata.


clear *
set seed 1455
set obs 2000

gen age = rnormal(40, 10)      // average age is 40, standard dev is 10 years
gen gender = round(runiform()) // 0/1 gender indicator, roughly 50/50

// Height: mean 175 cm (SD 8) when gender == 1, mean 165 cm (SD 7) otherwise
gen height = cond(gender == 1, rnormal(175, 8), rnormal(165, 7))

// Income: a baseline function of age, height, and gender, plus some random noise
generate income = 20000 + 500*age + 100*height + 10000*gender + rnormal(0, 5000)

// Unbalanced treatment assignment: treatment based on gender, age, or height
gen treat_unbalanced = cond(gender == 1, (age > 40 & height > 170), (age > 35 & height > 160))
cls

// Our imbalanced covariates
mean age income gender height, over(treat_unbalanced) // focus on the differences in means

// Our biased ATE
regress income i.treat_unbalanced age height i.gender // use OLS to estimate the treatment effect

/*
We see that guys are much more likely to make more money already.
We also see that every centimeter taller you are, you'll make more money.
We also see every year older you are, you'll make 484 more dollars, roughly.
The treatment effect is very small by comparison since the current effects are
dominated by gender. To have a truly accurate estimate, we have to randomize.
*/

generate treat_randomized = runiform() > 0.5

// check for balance...
mean age income gender height, over(treat_randomized)
// these are definitely balanced

// Post-treatment income
generate income_post = income + 5000 * treat_randomized

// See how income was affected with the randomized treatment. This is our unbiased ATE
regress income_post treat_randomized age height gender
regress income_post treat_randomized
mean income_post, over(treat_randomized)

// HERE is our ATE
di 67129.88 - 61573.52

/* See how regardless of whether we adjust for age, height, and gender, the
coefficient for treat_randomized is still pretty much the same?

This is because randomization takes care of the other
stuff that affects the outcome. */

Here I assign a treatment totally at random and use OLS to estimate the causal impact (5000 in this case). The estimated impact is not quite 5000, but it is very close. As we discussed last class, the bigger the sample size, the more closely we approximate the true effect size (change the sample size from 2,000 to 200,000 and tell me what you get).



      Source |       SS           df       MS      Number of obs   =     2,000
-------------+----------------------------------   F(4, 1995)      =   1276.99
       Model |  1.2433e+11         4  3.1083e+10   Prob > F        =    0.0000
    Residual |  4.8560e+10     1,995  24340786.6   R-squared       =    0.7191
-------------+----------------------------------   Adj R-squared   =    0.7186
       Total |  1.7289e+11     1,999  86489211.5   Root MSE        =    4933.6

----------------------------------------------------------------------------------
     income_post | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
treat_randomized |   5143.774   220.9541    23.28   0.000     4710.449    5577.099
             age |   476.3055   10.80702    44.07   0.000     455.1113    497.4997
          height |   93.74094    15.0525     6.23   0.000     64.22068    123.2612
          gender |   10047.37   267.2298    37.60   0.000     9523.291    10571.45
           _cons |   22088.73   2530.033     8.73   0.000     17126.94    27050.51
----------------------------------------------------------------------------------

reg income_post treat_randomized

      Source |       SS           df       MS      Number of obs   =     2,000
-------------+----------------------------------   F(1, 1998)      =    195.86
       Model |  1.5435e+10         1  1.5435e+10   Prob > F        =    0.0000
    Residual |  1.5746e+11     1,998  78807250.6   R-squared       =    0.0893
-------------+----------------------------------   Adj R-squared   =    0.0888
       Total |  1.7289e+11     1,999  86489211.5   Root MSE        =    8877.3

----------------------------------------------------------------------------------
     income_post | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-----------------+----------------------------------------------------------------
treat_randomized |   5556.364   397.0268    13.99   0.000     4777.734    6334.994
           _cons |   61573.52   279.3331   220.43   0.000      61025.7    62121.33
----------------------------------------------------------------------------------

We can of course do something similar with panel data too.

clear *
set seed 1455
cls

set obs 400
// Number of people

gen id = _n
// Generate a unique ID for each person

expand 4
// Create 4 observations per person

gen time = _n
// Generate time variable (_n = observation number)

bysort id: replace time = _n
// Reset time to run 1 through 4 within each person

// Step 2: Randomly assign treatment

gen treat = 0
// Initialize treatment to 0 for all

gen rand = runiform()
bysort id (time): replace rand = rand[1]
// One random draw per person (copy the first row's draw to all of that person's rows)

bysort id: replace treat = 1 if rand < 0.5 & time >= 3
// Assign treatment starting at time 3
// If the person's draw is below .5, they're treated

// Step 3: Generate age
gen age = 20 + 5*runiform()
// Generate age between 20 and 25

// Step 4: Define income

gen income = 50000 + 200*age + 1000*treat + rnormal(0, 1000)
// Baseline income with effects of age and treatment

drop rand   // Drop the random number variable as it is no longer needed

cls

xtset id time
// Define the panel structure

// as usual, you may ask Stata for help via
// "help [command]",
// such as, help egen

bys id: egen evertreat = max(treat)

xtdescribe

mean age if time < 3, over(evertreat)
// check for balance in the pre period, given a random treatment

// one may use a bar graph to visualize balance

graph bar (mean) age if time < 3, over(evertreat) blabel(bar)

The income equation above does include a simulated treatment effect of 1,000 dollars, but the main point of this block is to show how a randomized treatment looks in the setup of a randomized controlled trial with panel data. In this case we observe multiple time periods for multiple units, and we assign some to be treated at times 3 and 4 and the others never treated. Note my use of xtset to declare the data as panel data (\(N >1\) and \(T> 1\)) and xtdescribe to tell me the characteristics of the dataset.
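If you want to see that simulated effect recovered, a quick extension of the do-file (a sketch I am adding here, not part of the original code) is to regress income on the treatment:

// A sketch (assumed extension, not part of the original do-file):
// with a randomized treatment, either regression should recover roughly
// the 1,000-dollar effect built into the income equation above.
reg income treat age           // pooled OLS, adjusting for age
xtreg income treat age, fe     // fixed effects (within) estimator, same idea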

8.4 Problems With Randomization

The central issue with randomization is that there are some interventions (in fact, most of them) that researchers simply cannot randomize, mostly due to ethical concerns. We cannot simply have half of the states in the nation raise wages and the other half do nothing; that simply is not how the government works. Plus, as we will see below in more detail, selection biases and other threats to the validity of statistical findings are often an issue. After all, many treatments of interest have explicit assignment mechanisms (i.e., this neighborhood has high crime rates, therefore we elect to send more police as a response to crime). Beyond this, sometimes our available set of control units may differ in important ways from the unit that is treated. Furthermore, randomization is simply very expensive and time consuming; indeed, the world does not often provide good laboratory conditions. Thus, your task as a policy analyst is to use statistical analysis/regression to adjust for the factors nature has not already controlled for.

Example 8.2 (Turkey) In February of 2023, Turkey had an earthquake. Suppose we are interested in the effect of this earthquake on the local economic outcomes (e.g., housing prices/insurance) for the affected cities or some bigger picture outcome for the entire country. Well, researchers cannot randomize earthquakes to strike certain cities versus others, and even if we could, this would be morally unacceptable. So assuming we were comparing cities in Turkey that were affected to those that were not, the affected areas may differ in their baseline characteristics from unaffected areas. For example, maybe poorer areas were more vulnerable than richer ones. For a cross-country comparison, maybe building codes would explain the differences in the effect of the earthquake which, in turn, affect the economic implications for Turkey versus another unexposed nation.

Example 8.3 (Weed) Another example is cannabis legalization. We cannot flip coins to have some states legalize cannabis and others not. Cannabis’ legality is decided by the preferences of the legislature, and sometimes the voters. Thus, we could run into the problem of selection bias (as in, maybe some states are more likely to legalize cannabis than others). We also run into confounding biases. If we wish to see how legal cannabis affected alcohol sales for Oregon, then we need to consider what other factors may affect alcohol consumption aside from the policy of interest. That is, Oregon may differ from other states (say, Alabama or Mississippi) on key characteristics that make the causal comparison unreasonable. Maybe the price of alcohol between Oregon and a set of other states was not similar enough. Maybe Oregon simply had different economic conditions that made alcohol consumption more or less likely. Perhaps cultural factors would lead to higher levels of alcohol consumption anyways, absent cannabis legalization. The fact that we cannot randomize means that researchers cannot make plausible the unconfoundedness assumption (or, lack of omitted variable bias) which underlies OLS regression models in a wide variety of contexts.

8.5 Threats to Validity

Of course, the subject of the next chapter will be how to properly address the lack of randomization (in some cases). For now, though, we cover the kinds of threats to the validity of causal estimates, some of which will be quite helpful in characterizing the plausibility of the causal estimates you produce. That is, all of the ideas I list below are threats to a study’s conclusions. Internal validity has to do with whether the study is studying what it purports to; that is, how sound are the processes of the study? External validity deals with how we may generalize our findings to a bigger population.

8.5.1 Internal Validity

  1. History: Bias occurs when external events influence the outcome in some way. Example? Holidays. Suppose we are a grocery store and we wish to see if the introduction of some new product will increase sales. Suppose for the moment that this is a new chocolate bar. If we introduce the chocolate bar this week and Valentine’s Day is next week, we cannot really attribute the growth of rose sales to the chocolate bar, even though roses and chocolate are complementary goods in many cases. Why not? It’s Valentine’s Day! People are more likely to buy things in general that day and in the days leading up to it, so we couldn’t attribute the change in sales to the chocolate bar unless we made additional adjustments to our sales estimates. Another example is the one I mentioned in the correlation chapter, where the apparent rise in school shootings at the start of the school year has to do with the fact that schools just opened; schools cannot have shootings while they are closed.

  2. Maturation: Natural changes in participants over time that affect outcomes independently of the treatment. For example, suppose the new chocolate bar contains cannabis, and we wish to see whether it affected sales in the 6 months following its introduction. We’d need to be sure that the treated unit, relative to its comparison units, has the same internal composition over that period. In other words, what if over the next 6 months the area continues to gentrify, bringing in people with higher incomes? Maybe spending at the store would have increased anyhow, independent of a new product.

  3. Testing: The effect of giving a test or treatment more than once. Not really relevant for policy scientists unless you’re doing a true experiment with humans, but perhaps repeated tests mean people get wise to the format of the test, such that the outcomes are no longer being meaningfully tested.

  4. Instrumentation: Changes in the way we measure the outcome over time. This is pretty common, in fact. Instrumentation has even been advocated for in the context of COVID-19 policy. If the CDC is testing COVID cases one way, and then they either stop testing or make important changes to how they test, this may invalidate the estimated causal effect of some policy because it may make the policy look much more or less effective than it in fact was. The more consistently your outcome is measured, the better off your estimates will be.

  5. Statistical Regression: Extreme results tend to revert back to the average trend over time. Suppose we say Haitian immigrants are the cause of increases in welfare use, rising property values, or car accidents. Perhaps an empirical effect exists, but presumably (due to maturation over time), new residents will get jobs and adjust more into the community (i.e., obtaining more driver’s licenses). So, while there may be a spike in some outcome after an intervention, it is likely that the outcome will return to trend, especially for short-lived shocks that aren’t expected to have very long term effects. In the example I just linked to, “Local rents [in Springfield, Ohio] did increase at the third-fastest pace among cities from May 2022 through the end of 2023, rising at a 14.6% annualized pace, data from Zillow shows. But the market also appears to be normalizing: Rents this year have risen at a modest 3.2% pace, 68th fastest among 400 cities sampled.” By the way, this is another paper topic you could do. You could, if you wished, estimate the causal impact of the arrival of Haitian migrants on the price of rent in Springfield, Ohio, compared to cities that had no such arrival, since data on this exists.

  6. Selection: Baseline differences between treated and untreated units which make their outcomes incomparable. The classic example in economics is the benefit of education on income. People who get PhDs and numerous advanced degrees likely differ substantially from the rest of the population who do not. Maybe they, on average, come from wealthier families or are dissimilarly situated compared to those who don’t go to college. In other words, maybe people get education on the grounds that they will make more money as a result; or maybe such people are more motivated in a sense, and would have made more money anyways without the education.

  7. Experimental Mortality (Attrition): Participants leave the study over time. Say a grocery store goes all self-checkout on the grounds that it will make more money. We compare that one grocery store to 50 others, and of those 50, say 20 close over the 6 month study period. Unless we account for this, it is likely that a store doing experimental things, from a corporate perspective, is already “built different” from the ones that closed. In particular, the ones that did close may have closed for reasons that make them dissimilar to the treated unit. So, keeping the stores that closed in the sample may bias our estimates, because those control units were likely evolving differently from the treated unit anyhow.

  8. Selection-Maturation Interaction: When different groups mature differently, this leads to heterogeneous trends that make it tough to estimate the treatment effect. I mentioned gentrification above; what if one city, or a cluster of them, matures at a much different rate compared to the treated units? There may be reasons for this difference in maturation.

Satisfying these may not always be relevant for every study. For example, if all your units remain in the study, you likely do not need to worry about attrition. If nobody can increase their baseline chances of being treated, selection may not be a big issue.

8.5.2 Threats to External Validity

These are threats that compromise the generalizability of a study’s findings.

  1. Interaction of Testing and Treatment: The act of testing may change participants’ responses. Not really relevant for many public policy settings where we typically study macro-level interventions.

  2. Interaction of Selection and Treatment: If the treatment effects are unit specific, we cannot generalize about them. If we wish to see whether an NBA training program affected basketball skills, we cannot include normal people in the control group, since NBA players will smoke any normal human being. Their baseline differences are so gigantic that the effect of the program would be wildly overstated.

  3. Multiple-Treatment Interference: Multiple treatments harm our ability to isolate the impact of one single treatment.

As above, satisfying these external threats to validity may not be necessary in every study, since each treatment is unique. If only one intervention occurs (which we may have to verify in some cases), we don’t have to worry about interference of treatments.

8.6 Summary

Unlike the OLS chapter, which was very math intensive, this chapter is likely the most conceptually heavy. Thinking in a causal manner requires practice. It is not simply a scientific tool; it is a framework through which we can evaluate the world. It forces us to step back and consider, step by step, the mechanisms through which policy affects outcomes. If you do this for long enough, you will even hear causal claims like “This politician reduced crime” and inquire, implicitly, about what policies they did or did not enact that brought about that hypothesized reduction.

To say it differently, causal inference forces you to ask whether things are true in a systematic way; whether things are true BECAUSE of a certain thing, or more accurately, the degree to which they are true because of that thing. In industry, academia, and non-profits, it is the causal mindset, largely within the potential outcomes framework, that has taken the throne for impact evaluation, policy studies, and decision-making.