
class: center, middle, inverse, title-slide

.title[
# PSY 505: Foundations of Statistical Methods in Psychological Science
]
.subtitle[
## More Causality: Assumptions and Threats
]
.author[
### Jason Geller, Ph.D.
]
.institute[
### Princeton University
]
.date[
### 2022-09-13
]

---

- Give the average treatment effect (ATE) in the population and the estimated treatment effect based on a simple comparison of the treatment and control groups

<table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> names </th>
   <th style="text-align:right;"> x </th>
   <th style="text-align:right;"> z </th>
   <th style="text-align:right;"> y0 </th>
   <th style="text-align:right;"> y1 </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Cody </td>
   <td style="text-align:right;"> 3 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 5 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Henna </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 10 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Jamie </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 3 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Branson </td>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 12 </td>
   <td style="text-align:right;"> 13 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Nicole </td>
   <td style="text-align:right;"> 5 </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Sarah </td>
   <td style="text-align:right;"> 10 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 8 </td>
   <td style="text-align:right;"> 9 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Karen </td>
   <td style="text-align:right;"> 2 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 4 </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Claire </td>
   <td style="text-align:right;"> 11 </td>
   <td style="text-align:right;"> 1 </td>
   <td style="text-align:right;"> 9 </td>
   <td style="text-align:right;"> 13 </td>
  </tr>
</tbody>
</table>

- Inferring causality requires three things. What are they?

- What is the fundamental problem of causal inference? How can we solve it?

---
# Notation

- _Indexing experimental individuals/units:_ the subscript `$i$` indexes units 1 to `$N$`

- _Defining treatment assignment:_ The variable `$z_i$` indicates whether the ith individual is assigned to receive the treatment

- _Defining treatment:_ The variable `$d_i$` indicates whether the ith subject is treated

- `$z_i = 1$` means the ith subject was assigned to receive the treatment

- `$z_i = 0$` means the ith subject was not assigned to receive the treatment

- `$d_i = 1$` means the ith subject receives the treatment

- `$d_i = 0$` means the ith subject does not receive the treatment

---
# Potential and Observed Outcomes: Switching Equation

- Causal inference is a missing data problem!

- The observed outcome `$Y_i$` reveals only one of the two underlying potential outcomes

`\begin{equation}
  Y_i = Y_i(1) z_i + Y_i(0) (1 - z_i)
\end{equation}`

- Treatment applied (`$z_i = 1$`): `$$Y_i = 1 \cdot Y_i(1) + 0 \cdot Y_i(0) = Y_i(1)$$`
- Treatment not applied (`$z_i = 0$`): `$$Y_i = 0 \cdot Y_i(1) + 1 \cdot Y_i(0) = Y_i(0)$$`
---
# Potential Outcomes

- Regardless of which treatment an individual receives, all individuals have a potential response in the event that treatment is or is not received

- Potential outcomes are written `$Y_i(d)$`, where the argument `$d$` indexes the treatment
    - `$Y_i(1)$` is the potential outcome if the ith individual was treated
    - `$Y_i(0)$` is the potential outcome if the ith individual was not treated

- Potential outcomes are fixed attributes of each individual and represent the outcome that would be observed hypothetically if that individual were treated or untreated  
---
# Conditional Potential Outcomes

- Potential outcomes for a subset of subjects

- `$Y_i(d) | X = x$` denotes potential outcomes when the condition `$X = x$` holds for individual `$i$`

- `$Y_i(0) | d_i = 0$`: untreated potential outcome for individuals who do not receive the treatment

- `$Y_i(0) | d_i = 1$`: untreated potential outcome for individuals who do receive the treatment

- `$Y_i(1) | d_i = 0$`: treated potential outcome for individuals who do not receive the treatment

- `$Y_i(1) | d_i = 1$`: treated potential outcome for individuals who do receive the treatment   
---
# Estimation of the ATE

`\begin{equation}
    \begin{split}
\mathrm{ATE} &= \frac{1}{N} \sum_{i=1}^{N}{\tau_i} \\
&= \frac{1}{N} \sum_{i=1}^{N}{(Y_{i}(1) - Y_{i}(0))} \\
&= \frac{1}{N} \sum_{i=1}^{N}{Y_{i}(1)} - \frac{1}{N} \sum_{i=1}^{N}{Y_{i}(0)} \\
&= \mu_{Y(1)} - \mu_{Y(0)}
    \end{split}
\end{equation}`

in which `$\mu_{Y(1)}$` is the average value of `$Y_i(1)$` for all individuals and `$\mu_{Y(0)}$` is the average value of `$Y_i(0)$` for all individuals.
---
# Estimation of the ATE in Experiments

In experimental studies, researchers estimate `$\mu_{Y(1)}$` using the mean `$\widehat{\mu}_{Y(1)}$` of the observed `$Y_i(1)$` (treated units) and `$\mu_{Y(0)}$` using the mean `$\widehat{\mu}_{Y(0)}$` of the observed `$Y_i(0)$` (untreated units). We have:

`\begin{equation}
\widehat{\mathrm{ATE}} = \widehat{\mu}_{Y(1)} - \widehat{\mu}_{Y(0)}
\end{equation}`

in which `$\widehat{\mathrm{ATE}}$` is the estimated ATE, `$\widehat{\mu}_{Y(1)}$` is the estimated `$\mu_{Y(1)}$`, and `$\widehat{\mu}_{Y(0)}$` is the estimated `$\mu_{Y(0)}$`.  
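
As a minimal sketch in Python (the variable names are illustrative, and `z` is treated as the treatment actually received, i.e. full compliance is assumed), the true ATE and the difference-in-means estimate for the eight people in the opening table could be computed like this:

```python
# Potential outcomes and treatment indicator from the opening table.
names = ["Cody", "Henna", "Jamie", "Branson", "Nicole", "Sarah", "Karen", "Claire"]
z = [0, 0, 1, 0, 0, 1, 1, 1]
y0 = [5, 8, 5, 12, 4, 8, 4, 9]
y1 = [5, 10, 3, 13, 2, 9, 1, 13]

def mean(values):
    return sum(values) / len(values)

# True ATE: average of the individual effects Y_i(1) - Y_i(0).
# Only computable here because the table shows both potential outcomes.
true_ate = mean([a - b for a, b in zip(y1, y0)])

# Switching equation: each person reveals only one potential outcome.
y_obs = [a * t + b * (1 - t) for a, b, t in zip(y1, y0, z)]

# Estimated ATE: observed treated mean minus observed control mean.
treated_mean = mean([y for y, t in zip(y_obs, z) if t == 1])
control_mean = mean([y for y, t in zip(y_obs, z) if t == 0])
est_ate = treated_mean - control_mean

print(true_ate, est_ate)
```

With these numbers the true ATE is 0.125 while the simple comparison gives -0.75, so a single estimate can miss the truth even in sign.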
---
# Precision of Individual Experiments

- Do experiments inevitably provide precise estimates of the ATE?

- An estimate from just one experiment is only a best guess about the true value of the ATE

- The estimated ATE from any single experiment may be too high or too low

- Our dataset is just one of many possible datasets that could have been created via random assignment. If we redid the exact same random assignment procedure, different units would be allocated to the treatment and control groups!

- So what is the point?  
---
# Bias

- What is bias?

> An estimator is __unbiased__ if it yields the true ATE __in expectation__ (i.e., on average)

- The average estimated ATE across all possible random assignments is equal to the true ATE 
  
- Assumptions: necessary conditions for experimental estimates of the ATE to be unbiased   
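
One way to see unbiasedness concretely is to enumerate every possible random assignment for a tiny dataset. The sketch below reuses the potential outcomes from the opening table and assumes complete random assignment of exactly 4 of the 8 people to treatment (that design is an assumption, not something the table states):

```python
from fractions import Fraction
from itertools import combinations

# Potential outcomes from the opening table (normally never jointly observed).
y0 = [5, 8, 5, 12, 4, 8, 4, 9]
y1 = [5, 10, 3, 13, 2, 9, 1, 13]
n = len(y0)
true_ate = Fraction(sum(y1) - sum(y0), n)

estimates = []
# Enumerate every way to assign exactly 4 of the 8 people to treatment.
for treated in combinations(range(n), 4):
    control = [i for i in range(n) if i not in treated]
    treated_mean = Fraction(sum(y1[i] for i in treated), len(treated))
    control_mean = Fraction(sum(y0[i] for i in control), len(control))
    estimates.append(treated_mean - control_mean)

average_estimate = sum(estimates) / len(estimates)

print(len(estimates))                   # number of possible assignments
print(min(estimates), max(estimates))   # individual estimates vary
print(average_estimate == true_ate)     # the average equals the true ATE
```

Exact fractions are used so the equality check is not affected by floating-point rounding.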
---
# ATT and ATU

- Average treatment effect on the treated (ATT)

- Effect for those with treatment
--

- Average treatment effect on the untreated (ATU)

- Effect for those without treatment

---

.smaller.sp-after[
<table>
 <thead>
  <tr>
   <th style="text-align:center;"> Person </th>
   <th style="text-align:center;"> Age </th>
   <th style="text-align:center;"> Treated </th>
   <th style="text-align:center;"> Outcome with program </th>
   <th style="text-align:center;"> Outcome without program </th>
   <th style="text-align:center;"> Effect </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:center;"> 1 </td>
   <td style="text-align:center;"> Old </td>
   <td style="text-align:center;"> TRUE </td>
   <td style="text-align:center;"> **80** </td>
   <td style="text-align:center;"> 60 </td>
   <td style="text-align:center;"> **20** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 2 </td>
   <td style="text-align:center;"> Old </td>
   <td style="text-align:center;"> TRUE </td>
   <td style="text-align:center;"> **75** </td>
   <td style="text-align:center;"> 70 </td>
   <td style="text-align:center;"> **5** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 3 </td>
   <td style="text-align:center;"> Old </td>
   <td style="text-align:center;"> TRUE </td>
   <td style="text-align:center;"> **85** </td>
   <td style="text-align:center;"> 80 </td>
   <td style="text-align:center;"> **5** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 4 </td>
   <td style="text-align:center;"> Old </td>
   <td style="text-align:center;"> FALSE </td>
   <td style="text-align:center;"> 70 </td>
   <td style="text-align:center;"> **60** </td>
   <td style="text-align:center;"> **10** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 5 </td>
   <td style="text-align:center;"> Young </td>
   <td style="text-align:center;"> TRUE </td>
   <td style="text-align:center;"> **75** </td>
   <td style="text-align:center;"> 70 </td>
   <td style="text-align:center;"> **5** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 6 </td>
   <td style="text-align:center;"> Young </td>
   <td style="text-align:center;"> FALSE </td>
   <td style="text-align:center;"> 80 </td>
   <td style="text-align:center;"> **80** </td>
   <td style="text-align:center;"> **0** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 7 </td>
   <td style="text-align:center;"> Young </td>
   <td style="text-align:center;"> FALSE </td>
   <td style="text-align:center;"> 90 </td>
   <td style="text-align:center;"> **100** </td>
   <td style="text-align:center;"> **-10** </td>
  </tr>
  <tr>
   <td style="text-align:center;"> 8 </td>
   <td style="text-align:center;"> Young </td>
   <td style="text-align:center;"> FALSE </td>
   <td style="text-align:center;"> 85 </td>
   <td style="text-align:center;"> **80** </td>
   <td style="text-align:center;"> **5** </td>
  </tr>
</tbody>
</table>
]

.pull-left.small[
`$\text{ATT} = (\bar{Y}_\text{T}\ |\ P = 1) - (\bar{Y}_\text{T}\ |\ P = 0)$`

`$\text{ATU} = (\bar{Y}_\text{U}\ |\ P = 1) - (\bar{Y}_\text{U}\ |\ P = 0)$`
]

.pull-right.small[
`$\text{ATT} = \frac{20 + 5 + 5 + 5}{4} = 8.75$`

`$\text{ATU} = \frac{10 + 0 - 10 + 5}{4} = 1.25$`
]
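
The same arithmetic can be sketched in Python. The tuples below copy the table rows; the naive comparison is added for illustration and uses only the outcomes that would actually be observed (the bold cells):

```python
# Rows from the table: (treated, outcome with program, outcome without program).
people = [
    (True, 80, 60), (True, 75, 70), (True, 85, 80), (False, 70, 60),
    (True, 75, 70), (False, 80, 80), (False, 90, 100), (False, 85, 80),
]

def mean(values):
    return sum(values) / len(values)

effects_treated = [w - wo for t, w, wo in people if t]
effects_untreated = [w - wo for t, w, wo in people if not t]

att = mean(effects_treated)                      # effect among the treated
atu = mean(effects_untreated)                    # effect among the untreated
ate = mean(effects_treated + effects_untreated)  # effect among everyone

# Naive comparison: observed treated outcomes minus observed untreated outcomes.
naive = (mean([w for t, w, wo in people if t])
         - mean([wo for t, w, wo in people if not t]))

print(att, atu, ate, naive)
```

Here the ATT (8.75), ATU (1.25), and ATE (5) are all positive, yet the naive comparison is -1.25 because the two groups differ in their outcomes without the program.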
---
class:middle center
# Conditions, Assumptions, and Threats to Causal Identification

---
# Independence: A Necessary Condition

Treatment status is statistically independent of potential outcomes and background attributes `$(X)$`:

`$$ D_i \perp\!\!\!\perp Y_i(0), \: Y_i(1), \: X $$`

This means that knowing whether an individual is treated provides no information about the individual’s potential outcomes or background attributes.
---
# Random Assignment

- __In expectation:__ proper randomization of participants into experimental conditions creates groups that are similar on every single dimension except for the treatment

- __In expectation:__ Random assignment of individuals to different environments `$E_0$` and `$E_1$` creates subpopulations that have the exact same characteristics at the moment they enter these environments. 
  - Same heart rate, amount of sleep, age, income, or level of stress, etc.
---
# Lessons from R Simulations

- Groups are comparable!

- This holds for every possible characteristic of participants, measured or unmeasured

**The only difference between treatment and control is the
presence vs. absence of treatment (in expectation)**
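
A minimal Python sketch of this kind of simulation is given below (it is not the in-class R code). The attribute, its distribution, the sample size, and the seed are all hypothetical choices:

```python
import random
import statistics

random.seed(505)  # arbitrary seed so the sketch is reproducible

# Hypothetical background attribute (say, age) for 1,000 participants.
ages = [random.gauss(40, 10) for _ in range(1000)]

def balance_gap(values):
    """Randomly split into two equal groups; return the gap in group means."""
    shuffled = random.sample(values, len(values))
    half = len(values) // 2
    return statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])

# Repeat the random assignment many times and record the gap each time.
gaps = [balance_gap(ages) for _ in range(500)]

print(round(statistics.mean(gaps), 3))   # average gap across randomizations
print(round(statistics.stdev(gaps), 3))  # spread of the gap in a single randomization
```

Any single randomization leaves some gap between groups, but the gaps average out to roughly zero, which is what "in expectation" means here.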

---
# Causal Inference Assumption 1: Excludability (A.K.A. Exclusion Restriction)

- The _only_ relevant causal agent is receipt of the treatment

- The exclusion restriction breaks down if:

  - Treatment assignment `$z_i$` sets in motion causes of `$Y_i$` other than the treatment `$d_i$`

  - Asymmetries in measurement between conditions

  - Noncompliance with the treatment
---
# Treatment Assignment Brings in Other Causes

- Study causal effect of writing fiction on students' creativity.

- Treatment group: invitation to "enroll in a writing program that will increase their creativity"

---
# Asymmetries in Measurement

- Experimenter in charge of measuring the outcome of interest knows treatment status

- Participants know their treatment status and the study's hypotheses

--
- **Double blind procedure**

---
# Noncompliance with the Treatment

- Assumption that participants _comply_ with (or _adhere_ to) their randomly assigned experimental condition

- Why would participants not comply? 
--

- How can noncompliance introduce bias in estimates of the population ATE?

- Invalidates treatment assignment   
  
  - Participants self-select into or select out from their assigned condition   
  
  - Participants who do not comply often have different potential outcomes schedules   
  
---
# Assumption 2: SUTVA

- Stable Unit Treatment Value Assumption
  
  - Consistency: Well-defined treatment

- E.g., Exercise

<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/JB2di69FmhE" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>
---

# Assumption 2: SUTVA

- No interference: the treatment received by individual `$i$` has no effect on the outcome `$Y_j$` of any other individual `$j$`
  
  - E.g., Tutoring and grades
  
  - Is this realistic?

---
class:middle center

# Threats to Internal Validity

---
# Internal Validity

>  Extent to which you can be confident that a cause-and-effect relationship established in a study cannot be explained by other factors 
---
# Selection
<br>
<br>

- If people can choose to enroll in a program, those who enroll will be different from those who do not

--
- How to fix

- Randomization into treatment and control groups
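
To make the contrast concrete, here is a hedged Python sketch comparing self-selection with random assignment. The outcome distribution, the constant effect of 5, and the enrollment rule are all hypothetical:

```python
import random
import statistics

random.seed(505)  # arbitrary seed for reproducibility

TRUE_EFFECT = 5  # hypothetical constant treatment effect
# Hypothetical untreated outcomes for 10,000 people.
y0 = [random.gauss(50, 10) for _ in range(10_000)]

def diff_in_means(y0, treated):
    """Observed treated mean minus observed untreated mean."""
    treated_y = [y + TRUE_EFFECT for y, t in zip(y0, treated) if t]
    control_y = [y for y, t in zip(y0, treated) if not t]
    return statistics.mean(treated_y) - statistics.mean(control_y)

# Self-selection: people who would do well anyway choose to enroll.
self_selected = [y > 50 for y in y0]
# Random assignment: a coin flip decides who is treated.
randomized = [random.random() < 0.5 for _ in y0]

print(round(diff_in_means(y0, self_selected), 2))  # far above the true effect
print(round(diff_in_means(y0, randomized), 2))     # close to the true effect
```

Under self-selection the comparison mixes the program effect with pre-existing differences between enrollees and non-enrollees; randomization removes that second component in expectation.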
---
# Maturation
<br>
<br>

- Growth is expected naturally

- E.g. programs targeted at childhood development contend with the fact that children develop on their own too

- How to fix

- Use a comparison group to remove the trend

---
# Attrition

- If the people who leave a program or study differ from those who stay, the estimated effects will be biased

- How to fix

- Check characteristics of those who stay and those who leave

---
# History

- An external event during the study changes participants' outcomes, independent of the treatment

- How to fix

- Rule out by using a control/comparison group.

---
# Instrumentation

- The meaning of a measuring instrument changes over repeated use. Changes are not due to the treatment.

--
- How to fix

- Solve by using masked coders
  - Randomly assigning coders to stimuli
  - Training
---
# Testing

- Repeated exposure to questions or tasks will make people improve naturally

- How to fix

- Change tests 
  - Maybe don't offer pre-tests
  - Use a control group that receives the test

---
# Regression to the mean
<br>
<br>

- People with extreme scores tend to become less extreme over time

- How to fix

- Don't select super high or super low performers
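
A small simulation can illustrate the pattern. The sketch below assumes a hypothetical model in which each test score is stable skill plus independent luck; nobody receives any treatment:

```python
import random
import statistics

random.seed(505)  # arbitrary seed for reproducibility

n = 10_000
# Hypothetical model: score = stable skill + independent luck on each test.
skill = [random.gauss(100, 10) for _ in range(n)]
test1 = [s + random.gauss(0, 10) for s in skill]
test2 = [s + random.gauss(0, 10) for s in skill]

# Select the top 10% on the first test; no treatment is applied to anyone.
cutoff = sorted(test1)[int(0.9 * n)]
top = [i for i in range(n) if test1[i] >= cutoff]

top_mean_1 = statistics.mean(test1[i] for i in top)
top_mean_2 = statistics.mean(test2[i] for i in top)

print(round(top_mean_1, 1), round(top_mean_2, 1))
```

The selected group scores lower on the second test, though still above the overall average, purely because their first-test luck does not repeat.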

???

This isn’t because the universe trends toward some average. An extreme value results from a combination of systematic factors and random luck, and extreme luck is rare, so it is unlikely to repeat on the next measurement.
---
# Hawthorne effect

- Observing people makes them behave differently

- How to fix

- Hide? 
  - Use completely unobserved control groups

???

Experiments in 1924-1932 at Hawthorne Works

---

# John Henry effect

- Control group works hard to prove they're as good as the treatment group

- How to fix

- Keep two groups separate
---
# Statistical Conclusion Validity

- Are the statistics correct?

---
# Power

.box-inv-1[A training program causes incomes to rise by $40]

.center.small[
<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Person </th>
   <th style="text-align:left;"> Group </th>
   <th style="text-align:center;"> Before </th>
   <th style="text-align:center;"> After </th>
   <th style="text-align:center;"> Difference </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> 295 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 122.09 </td>
   <td style="text-align:center;"> 229.04 </td>
   <td style="text-align:center;"> 106.95 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 126 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 205.60 </td>
   <td style="text-align:center;"> 199.84 </td>
   <td style="text-align:center;"> -5.76 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 400 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 133.25 </td>
   <td style="text-align:center;"> 130.40 </td>
   <td style="text-align:center;"> -2.85 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 94 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 270.11 </td>
   <td style="text-align:center;"> 206.56 </td>
   <td style="text-align:center;"> -63.54 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 250 </td>
   <td style="text-align:left;"> Control </td>
   <td style="text-align:center;"> 344.37 </td>
   <td style="text-align:center;"> 222.89 </td>
   <td style="text-align:center;"> -121.49 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> 59 </td>
   <td style="text-align:left;"> Treatment </td>
   <td style="text-align:center;"> 312.41 </td>
   <td style="text-align:center;"> 268.06 </td>
   <td style="text-align:center;"> -44.35 </td>
  </tr>
</tbody>
</table>
]

---

# Power

.pull-left[
.box-1.small[Survey 10 participants]

]

.pull-right[
.box-1.small[Survey 200 participants]

]

---

# What's the right sample size?

.box-inv-1[Use a statistical power calculator to<br>make sure you can potentially detect an effect]

.center[
<figure>
  <img src="power-search.png" alt="Google power calculator" title="Google power calculator" width="50%">
</figure>
]
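
For intuition about what such a calculator does, here is a rough Python sketch using a normal approximation for a two-group comparison. It takes the effect of 40 from the training-program example; the standard deviation of 100 and the equal group sizes are assumptions made for illustration:

```python
from math import sqrt
from statistics import NormalDist

def power_two_group(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two group means."""
    std_normal = NormalDist()
    se = sd * sqrt(2 / n_per_group)            # standard error of the difference
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se                        # true effect in standard-error units
    # Probability that the test statistic lands in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)

# The effect of 40 is from the slides; the standard deviation of 100 is an assumption.
for total_n in (10, 200):
    print(total_n, round(power_two_group(40, 100, total_n // 2), 2))
```

Under these assumptions, power is roughly 0.1 with 10 participants and roughly 0.8 with 200. A real analysis would typically use a t-test based calculation, which gives slightly lower power in small samples.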

---
# Test assumptions

- Every statistical test has certain assumptions

- For instance, for OLS:

.center.float-left.smaller[
.box-1[Linearity]&ensp;.box-1[Homoscedasticity]&ensp;.box-1[Independence]&ensp;.box-1[Normality]
]

.box-inv-1.medium.sp-before[Make sure you're doing the stats correctly]

---

# Fishing and p-hacking

- Wouldn't it be awesome to run thousands of models with different combinations of variables until you find coefficients that are statistically significant?
--
.center[
<figure>
  <img src="phack.png" alt="p-hacking" title="p-hacking" width="60%">
</figure>
]

???

<https://projects.fivethirtyeight.com/p-hacking/>

---
# Spurious statistical significance

- If the *p* threshold is 0.05 and you test 20 outcomes with no true effect, about 1 is expected to be significant by chance

.center[
<figure>
  <img src="xkcd.png" alt="spurious" title="spurious" width="60%">
</figure>
]
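
A short calculation, assuming the 20 tests are independent and none of the effects is real:

`$$E[\text{false positives}] = 20 \times 0.05 = 1$$`

`$$P(\text{at least one false positive}) = 1 - (1 - 0.05)^{20} \approx 0.64$$`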
---
# Observational Design

- Relative to the experimental design, observational designs have weaker internal validity (less control) but stronger external validity (more naturalistic)

- Observational design usually requires longitudinal data that comprise both pretest and post-test time periods
  - Key to a good observational design is high-quality comparison, so one must think really hard about the control group and document equivalence

---
# Observational Design

- __Natural experiment__: Naturally occurring event with pseudo-random variation in treatment status
    - Unplanned intervention (“shock”): Mother Nature and government policy changes are frequently the source of natural experiments
    - Assignment is outside the control of the investigator, but also of the participants

- __Quasi-experiment__: Units “choose” whether they receive treatment, but that choice can sometimes be controlled via design and statistical adjustments
    - Assignment is fully controlled by the participants
    - Chief threat to internal validity is selection bias (i.e., endogenous treatment assignment)
---
# Problem Set 1

- On all platforms

- Due September 23

- Turn in both the .Rmd and PDF files