class: center, middle, inverse, title-slide

.title[
# PSY 503: Foundations of Statistical Methods in Psychological Science
]
.subtitle[
## Causality, Potential Outcomes, and Experiments
]
.author[
### Jason Geller, Ph.D. (he/him/his)
]
.institute[
### Princeton University
]
.date[
### 2022-09-12
]

---

# Today

.blockquote[
## Aim

- Working definition of causality

- Look at a method to formally explain causation

- Better understand the role of causality in psychology
]

---

# Heider and Simmel (1944)

- What is going on here?

<br>
<br>

<center>
<iframe width="600" height="400" src="https://www.youtube.com/embed/VTNmLt7QX8E" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

---

# Causal Relationships?

--

- Lighting fireworks causes noise

--

- Rooster crows cause the sunrise

--

- Getting a Ph.D. from Princeton increases your earnings

--

- Colds go away a few days after you take vitamin C

--

- Hurricanes with female names cause more deaths

---
class: middle center

How do we know if **X** causes **Y**?
-- .box-inv-3.medium[**X** causes **Y** if…] -- .box-inv-3.medium[…we intervene and change **X**<br>without changing anything else…] -- .box-inv-3.medium[…and **Y** changes] --- # **Y** "listens to" **X** <br> <br> > "A variable **X** is a cause of a variable **Y** if **Y** in any way relies on **X** for its value.… **X** is a cause of **Y** if **Y** listens to **X** and decides its value in response to what it hears" <br>-Pearl, Glymour, and Jewell 2016, 5–6 --- class: middle center > __Causation = Correlation + time order + nonspuriousness__ -- --- # Correlation ≠ Causation > Associations: - Describes the world as it happened - No meaningful directionality <img src="nickcage.jpeg" width="75%" style="display: block; margin: auto;" /> --- <br> <br> <br> <img src="hurricane.png" width="75%" style="display: block; margin: auto;" /> --- # Causality in Psychology <br> <br> - Psychology and social sciences: - Generally interested in __identifying__ and __quantifying__ causal relationships - What does this mean? --- # "Causes of Effects" or "Effects of Causes"? -- - Psychology studies generally focus on the __"effects of causes"__ -- - __Effects__ have __many causes__ - Brain chemistry - Hormones - Sensory cues - Prenatal environment - Early experiences - Genes -- **list goes on and on** --- # What do we learn from studies of the effects of causes? 
- Causal relationships - Manipulating `\(X\)` 💥 `\(Y\)` - Direction of effects - Manipulating `\(X\)` ⬆️/ ⬇️ `\(Y\)` - Manipulating `\(X\)` ⬆️/ ⬇️ the probability of `\(Y\)` - Magnitude of effects --- class: middle center # .blockquote[ ### <svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M144 208c-17.7 0-32 14.3-32 32s14.3 32 32 32 32-14.3 32-32-14.3-32-32-32zm112 0c-17.7 0-32 14.3-32 32s14.3 32 32 32 32-14.3 32-32-14.3-32-32-32zm112 0c-17.7 0-32 14.3-32 32s14.3 32 32 32 32-14.3 32-32-14.3-32-32-32zM256 32C114.6 32 0 125.1 0 240c0 47.6 19.9 91.2 52.9 126.3C38 405.7 7 439.1 6.5 439.5c-6.6 7-8.4 17.2-4.6 26S14.4 480 24 480c61.5 0 110-25.7 139.1-46.3C192 442.8 223.2 448 256 448c141.4 0 256-93.1 256-208S397.4 32 256 32zm0 368c-26.7 0-53.1-4.1-78.4-12.1l-22.7-7.2-19.5 13.8c-14.3 10.1-33.9 21.4-57.5 29 7.3-12.1 14.4-25.7 19.9-40.2l10.6-28.1-20.6-21.8C69.7 314.1 48 282.2 48 240c0-88.2 93.3-160 208-160s208 71.8 208 160-93.3 160-208 160z"></path></svg> ] ## Take a few minutes and think about causality in your research
---
class: center middle

# The Rubin Causal Model: A Powerful Framework to Study Causal Effects

---

# Rubin Causal Model

- Framework developed by Donald Rubin (Rubin, 1974, 1975)

- Mathematical definition of causal effects at the individual level

- Shows that causal effects cannot be measured directly for a single individual

.footnote[[1] Other theories: Judea Pearl's **The Book of Why**]

---

# What is Causality (Formally)?

- Notion of causality is tied to an __action__ applied to a __unit__

--

- In Psychology:

  - __Unit__: Target of study (__individuals__, __classrooms__)
      - **i**

  - __Treatment__ (action): Intervention received by one group of units but not the other
      - **d** or **z**

  - __Response__: Outcome of study
      - **Y**

---

# What is Causality (Formally)?

- Let `\(d_i\)` be a treatment (e.g., testing)

- Let `\(Y_i\)` be an outcome (e.g., performance on test)

<br>
<br>

> A treatment `\(d_i\)` has a causal effect on an outcome `\(Y_i\)` for individual `\(i\)` (e.g., a student) if applying `\(d_i\)` to individual `\(i\)` changes `\(Y_i\)` (e.g., quizzing changes that student's test performance)

---

# Core Concept: Potential Outcomes (Counterfactuals)

- Each individual has different __potential outcomes__ in alternative environments

- To measure the causal effect of a treatment `\(d_i\)` for individual `\(i\)`:

  - Measure the outcome of interest `\(Y_i\)` for individual `\(i\)` in two environments `\(E_0\)` and `\(E_1\)` that differ in only one aspect: `\(d_i\)`

--

- Example:

  - Does increasing the minimum wage raise the unemployment rate?

  - Unemployment rate went up after the minimum wage increased

  - Would the unemployment rate have gone up, had the minimum wage increase not occurred?
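---

A minimal R sketch of the two-environment idea, using the deck's quizzing example with made-up scores. The names `y_E0`, `y_E1`, and `d` are hypothetical and chosen only for illustration. The point is that the causal effect needs both outcomes, while real data contain only one of them.

```r
# Hypothetical potential outcomes for ONE student (made-up scores, not real data)
y_E0 <- 61   # test score in environment E0: no quizzing (d = 0)
y_E1 <- 49   # test score in environment E1: quizzing (d = 1)

# If both environments could be observed, the causal effect would be their difference
effect <- y_E1 - y_E0
effect

# In practice the student experiences only one environment
d <- 1                                    # assumption: this student was quizzed
observed <- if (d == 1) y_E1 else y_E0    # the outcome we get to see
counterfactual <- NA                      # the outcome we never see
c(observed = observed, counterfactual = counterfactual)
```

With `d <- 0` the roles flip: `y_E0` is observed and `y_E1` becomes the missing counterfactual.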
---

# Potential Outcomes Notation

- `\(E_0\)`: `\(d_i = 0\)`, treatment was not applied to individual `\(i\)`

- `\(E_1\)`: `\(d_i = 1\)`, treatment was applied to individual `\(i\)`

- Imagine we can observe both `\(Y_i(0)\)` and `\(Y_i(1)\)` for the exact same individual `\(i\)` in `\(E_0\)` and `\(E_1\)`, respectively.

- For individual `\(i\)`, the causal effect `\(\tau_i\)` of the treatment `\(d_i\)` is defined as the difference between two potential outcomes:

\begin{equation}
\tau_i = Y_i(1) - Y_i(0)
\end{equation}

---

# Implications

- If `\(\tau_i = 0\)`, testing has no causal effect on `\(Y_i\)`

- If `\(\tau_i \neq 0\)`, testing has a causal effect on `\(Y_i\)`

- The _magnitude_ of the causal effect for individual `\(i\)` is `\(\tau_i\)`, such that

`$$\tau_i = Y_i(1) - Y_i(0)$$`

---
class: middle center

# Anyone see a problem here?

---

# Fundamental Problem of Causal Inference (Holland, 1986)

<img src="potentialoutcomes.JPG" width="40%" height="3%" style="display: block; margin: auto;" />

--

- Prediction problem (do we care about this at the individual level?)

---
class: center middle main-title section-title-7

# Causal Effects in Populations (ATE)

---

# Population

<br>
<br>

- A **population** is a set of units defined a priori by the researcher

> The term population refers to the entire group of individuals from a specified group

---

# Average Treatment Effect

.box-inv-7.medium[Solution: Use averages instead]

$$
\text{ATE} = E(Y_1 - Y_0) = E(Y_1) - E(Y_0)
$$

> .box-7[Difference between average/expected value when<br>program is on vs.
expected value when program is off] $$ \delta = (\bar{Y}\ |\ P = 1) - (\bar{Y}\ |\ P = 0) $$ --- # Hypothetical Scenario: Set Up - Let our _population of interest_ be students in this class (**N** = 8) -- - Let `\(Y_i\)` be performance on test for a student `\(i\)` -- - Let `\(\tau_i\)` be the effect of quizzing or testing for individual `\(i\)` --- # Hypothetical schedule of potential outcomes <table> <thead> <tr> <th style="text-align:center;"> Student </th> <th style="text-align:center;"> `\(Y_i\)`(0) </th> <th style="text-align:center;"> `\(Y_i\)`(1) </th> <th style="text-align:center;"> `\(\tau_i\)` </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 61 </td> <td style="text-align:center;"> 49 </td> <td style="text-align:center;"> -12 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 95 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> -93 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 27 </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> 19 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 13 </td> <td style="text-align:center;"> 91 </td> <td style="text-align:center;"> 78 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 66 </td> <td style="text-align:center;"> 62 </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 11 </td> <td style="text-align:center;"> 53 </td> <td style="text-align:center;"> 42 </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> 80 </td> <td 
style="text-align:center;"> 48 </td> </tr> </tbody> </table>

---

# Average Treatment Effect (ATE) in the population

- ATE in this population?

  - Take the average of the last column

.pull-left.small[
`\(\delta = (\bar{Y}\ |\ P = 1) - (\bar{Y}\ |\ P = 0)\)`
]

--

```r
avg <- mean(c(-12, -93, 2, 19, 78, 62, 42, 48)) # mean of the tau_i column
avg
```

```
## [1] 18.25
```

- __Conclusion:__ on average in this population, the use of quizzing increases test performance by 18.25 points.

---

# ATE: Formal Definition

\begin{equation}
\mathrm{ATE} = \frac{1}{N} \sum_{i=1}^{N} \tau_i
\end{equation}

- Population ATE is defined as the sum of `\(\tau_i\)` divided by N, the number of individuals `\(i\)` in the population

- Describes how the outcome of interest `\(Y_i\)` would change on average in the population if the treatment were applied to every single individual in the population rather than to none.

- It is an extremely important concept in psychology

  - Identify and quantify average effects of treatments in populations

---

# Problems?

<br>
<br>

- `\(\tau_i\)` is forever unknown

- We generally don't have access to the entire population of interest

- How do we __estimate__ population average treatment effects?

---
class: middle center

# Experimental Design

---

# Experimental Design

- What is an experiment?

> __Experiment__: Any study conducted under controlled conditions to measure the impact of a novel treatment or other manipulation

> __Randomized experiment__: Chance process determines assignment of units to treatment conditions (e.g., coin flip, random number generator)

---

# Random?

<br>
<br>

- Every person (or unit) has some chance (i.e., a non-zero probability) of being selected into the treatment or control group

- The selection is based upon a random process (e.g., names out of a hat, a random number generator, rolls of dice, etc.)

---

# Experimental Design

- Why randomized experiments?
  - Random assignment ensures that outcomes of the untreated (control) group are an appropriate counterfactual for the treated (experimental) group

  - Randomization makes the groups comparable on average, so there is no selection bias in the estimate of the ATE

  - Balances observed as well as unobserved variables (in expectation)

---

# Experimental Design

- Helps us:

  - Identify the presence of causal effects
      - Does a causal effect exist at all?
      - Statistical significance

  - Estimate the magnitude of causal effects
      - What is the direction of the causal effect?
      - Practical significance
          - Is the causal effect relevant?

---

# Random Processes in R

- Sampling without replacement

```r
# set.seed?
# sample(x, size, replace = FALSE, prob = NULL)
sample(1:10, 5)
```

```
## [1] 10  6  7  1  4
```

- Sampling with replacement

```r
sample(1:10, 5, replace = TRUE)
```

```
## [1]  8 10  8  7  2
```

---

# Randomization Magic

- Simulation in R

  - Randomly assign individuals from a population of size `\(N = 500\)` to one of two groups

  - Repeat this many, many times: 100,000 times

  - Compare the average characteristics of the individuals that are assigned to each group

**Open RStudio**

---

# Randomization Magic

<br>
<br>

```r
library(tidyverse)

# Simulate age of 500 participants (range: 18-99)
# store in an object called: age_vector
age_vector <- sample(18:99, size = 500, replace = TRUE)
head(age_vector)
```

```
## [1] 77 44 37 97 81 37
```

---

```r
# Randomly assign each individual to an experimental group
# store in an object called: random_assign
# experimental groups are called: "treatment" vs. "control"
condition <- c("control", "treatment")
random_assign <- sample(condition, size = 500, replace = TRUE)
head(random_assign)
```

```
## [1] "treatment" "treatment" "control"   "treatment" "treatment" "treatment"
```

---

```r
# Put these two vectors into a dataset (tibble)
# call this dataset: assignment_tibble
assignment_tibble <- tibble(age_vector, random_assign)

# Calculate mean age for each experimental group
assignment_tibble %>%
  group_by(random_assign) %>%
  summarise(mean = mean(age_vector))
```

```
## # A tibble: 2 × 2
##   random_assign  mean
##   <chr>         <dbl>
## 1 control        57.2
## 2 treatment      59.6
```

---

```r
#################################################################
### simulate characteristics of group for 100,000 experiments ###
#################################################################

# create containers in which we will store the 100,000 average ages of each group
control_container <- rep(NA, 100000)
treatment_container <- rep(NA, 100000)

# write for loop
for (i in 1:100000){
  age_vector <- sample(18:99, size = 500, replace = TRUE)
  random_assign <- sample(c("control", "treatment"), size = 500, replace = TRUE)
  control_container[i] <- mean(age_vector[random_assign == "control"])
  treatment_container[i] <- mean(age_vector[random_assign == "treatment"])
}

mean(control_container)
```

```
## [1] 58.49775
```

```r
mean(treatment_container)
```

```
## [1] 58.49937
```

---

# Sample Practice

```r
# Imagine you are writing a paper and need to decide the author order of 5 authors. How would you do this?

# Find 10 random numbers between 0 and 100

# Draw 5 random letters from the uppercase alphabet
```

---

# Randomization Magic

```r
ed_data <- read_csv("https://docs.google.com/spreadsheets/d/e/2PACX-1vTQ9AvbzZ2DBIRmh5h_NJLpC_b4u8-bwTeeMxwSbGX22eBkKDt7JWMqnuBpAVad6-OXteFcjBY4dGqf/pub?gid=300215043&single=true&output=csv")

glimpse(ed_data)
```

```
## Rows: 335
## Columns: 10
## $ ID           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17…
## $ FEMALE       <dbl> 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, …
## $ MINORITY     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, …
## $ MOM_ED       <chr> "Some college", "Vocational/technical program", "Some col…
## $ DAD_ED       <chr> "Vocational/technical program", "Some college", "Bachelor…
## $ SES_CONT     <dbl> -0.27, -0.03, 0.48, -0.03, -0.66, 1.53, 0.20, 0.07, -0.32…
## $ READ_pre     <dbl> 27.40, 32.48, 48.25, 43.86, 36.12, 95.84, 33.81, 33.08, 3…
## $ MATH_pre     <dbl> 18.68, 30.58, 31.57, 31.41, 24.24, 49.75, 27.10, 27.35, 2…
## $ Trt_rand     <dbl> 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, …
## $ Trt_non_rand <dbl> 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, …
```

---

```r
# Compare baseline characteristics across the randomly assigned groups
ed_data %>%
  group_by(Trt_rand) %>%
  summarise_if(is.numeric, mean) %>%
  select(-c(ID, Trt_non_rand))
```

```
## # A tibble: 2 × 6
##   Trt_rand FEMALE MINORITY SES_CONT READ_pre MATH_pre
##      <dbl>  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
## 1        0  0.542    0.327    0.296     46.8     39.0
## 2        1  0.569    0.293    0.320     48.2     39.9
```

```r
# Experimental set up
```

---

# Example

- Let our population of interest be all students in this class (**N** = 8)

- Let's imagine that we can run an experiment on the entire population of interest

- __Random assignment:__

  - Control (restudy) vs.
Treatment condition (testing) - Let `\(z_i\)` indicate assignment of student `\(i\)` to an experimental condition - `\(z_i = 0\)` if student `\(i\)` was assigned to the control condition - `\(z_i = 1\)` if student `\(i\)` was assigned to the treatment condition - Assume _two sided compliance_: `\(d_i = z_i\)` --- # Hypothetical Experimental Dataset <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:center;"> Student </th> <th style="text-align:center;"> `\(Z_i\)` </th> <th style="text-align:center;"> `\(Y_i\)` </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 32 </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 49 </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 2 </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 12 </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 46 </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 91 </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 66 </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 53 </td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> 0 </td> <td style="text-align:center;"> 80 </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 57 </td> </tr> 
</tbody> </table> --- # Hypothetical Experimental Dataset with Potential Outcomes <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:center;"> Student </th> <th style="text-align:center;"> `\(Y_i\)`(1) </th> <th style="text-align:center;"> `\(Y_i\)`(0) </th> <th style="text-align:center;"> `\(\tau_i\)` </th> </tr> </thead> <tbody> <tr> <td style="text-align:center;"> 1 </td> <td style="text-align:center;"> 32 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> 49 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 3 </td> <td style="text-align:center;"> 2 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 4 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> 12 </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 5 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> 46 </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 6 </td> <td style="text-align:center;"> 91 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 7 </td> <td style="text-align:center;"> 66 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 8 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> 53 </td> <td style="text-align:center;"> ? 
</td> </tr> <tr> <td style="text-align:center;"> 9 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> 80 </td> <td style="text-align:center;"> ? </td> </tr> <tr> <td style="text-align:center;"> 10 </td> <td style="text-align:center;"> 57 </td> <td style="text-align:center;"> NA </td> <td style="text-align:center;"> ? </td> </tr> </tbody> </table>

---

# Potential and Observed Outcomes: Switching Equation

- Causal inference is a missing data problem!

- Observed outcome `\(Y_i\)` -> Underlying potential outcomes

\begin{equation}
Y_i = Y_i(1)z_i + Y_i(0)(1-z_i)
\end{equation}

- Treatment Applied (`\(z_i = 1\)`):

`$$Y_i = 1 \cdot Y_i(1) + 0 \cdot Y_i(0)$$`

`$$Y_i = Y_i(1)$$`

- Treatment Not Applied (`\(z_i = 0\)`):

`$$Y_i = 0 \cdot Y_i(1) + 1 \cdot Y_i(0)$$`

`$$Y_i = Y_i(0)$$`

???

The switching equation works a lot like Schrödinger’s cat paradox. Schrödinger’s cat is placed in a sealed box and receives a dose of poison when an atom emits radiation. As long as the box is sealed, there is no way we can know whether the cat is dead or alive. When we open the box, we observe either a dead cat or a living cat, but we cannot observe the cat both alive and dead at the same time. The switching equation is like opening the box: it collapses the observed outcome into one of the two potential ones.

---

# Urn Example