class: center, middle, inverse, title-slide .title[ # PSY 503: Foundations of Statistical Methods in Psychological Science ] .subtitle[ ## More LM: Moderation/Interactions ] .author[ ### Jason Geller, Ph.D. (he/him/his) ] .institute[ ### Princeton University ] .date[ ### Updated:2022-11-27 ] --- # Outline - Testing interactions/moderation analysis - Categorical by Continuous - Continuous by Continuous - Categorical by Categorical (Monday) --- # Interactions ```r lm(plant_growth ~ sun_exposure + water) ``` --- # Interactions ```r lm(plant_growth ~ sun_exposure + water) ``` <img src="sun-water.bmp" width="100%" style="display: block; margin: auto;" /> --- # Interactions ```r lm(plant_growth ~ sun_exposure * water) ``` <img src="plant-sun.png" width="100%" style="display: block; margin: auto;" /> --- # Interactions ```r lm(plant_growth ~ sun_exposure * water) ``` <img src="sun-inter.png" width="100%" style="display: block; margin: auto;" /> --- # What is a moderator? <img src="moderation1.png" width="100%" style="display: block; margin: auto;" /> `$$Y=\beta_{0}+\beta_{1}*X+\epsilon$$` --- # What is a moderator? - A moderator variable Z is a variable that alters the strength of the relationship between X and Y <img src="moderation2.png" width="90%" style="display: block; margin: auto;" /> --- # What Do Interactions Look Like? <img src="types-of-interaction-flat.png" width="90%" style="display: block; margin: auto;" /> --- class: middle # Categorical x Continuous Interactions --- # Today's Dataset - Student evaluations for a sample of 463 courses taught by 94 professors from the University of Texas at Austin - Six students rated the professors' physical appearance ```r evals_agegend=read_csv("https://raw.githubusercontent.com/jgeller112/psy503-psych_stats/master/evals.csv") evals1= evals_agegend %>% dplyr::select(ID, score, age, gender) ``` ``` ## ID prof_ID score age ## Min. : 1.0 Min. : 1.00 Min. :2.300 Min. 
:29.00 ## 1st Qu.:116.5 1st Qu.:20.00 1st Qu.:3.800 1st Qu.:42.00 ## Median :232.0 Median :43.00 Median :4.300 Median :48.00 ## Mean :232.0 Mean :45.15 Mean :4.175 Mean :48.37 ## 3rd Qu.:347.5 3rd Qu.:70.50 3rd Qu.:4.600 3rd Qu.:57.00 ## Max. :463.0 Max. :94.00 Max. :5.000 Max. :73.00 ## bty_avg gender ethnicity language ## Min. :1.667 female:195 minority : 64 english :435 ## 1st Qu.:3.167 male :268 not minority:399 non-english: 28 ## Median :4.333 ## Mean :4.418 ## 3rd Qu.:5.500 ## Max. :8.167 ## rank pic_outfit pic_color cls_did_eval ## teaching :102 formal : 77 black&white: 78 Min. : 5.00 ## tenure track:108 not formal:386 color :385 1st Qu.: 15.00 ## tenured :253 Median : 23.00 ## Mean : 36.62 ## 3rd Qu.: 40.00 ## Max. :380.00 ## cls_students cls_level ## Min. : 8.00 lower:157 ## 1st Qu.: 19.00 upper:306 ## Median : 29.00 ## Mean : 55.18 ## 3rd Qu.: 60.00 ## Max. :581.00 ``` --- # Research Question - Does Age and Sex (Males, Females) of the instructor influence instructor ratings? - DV: Evals - IV: - Age - Gender - Age*Gender Interaction --- # Scatterplot ```r ggplot(evals1, aes(x = age, y = score, color = gender)) + geom_point() + labs(x = "Age", y = "Teaching Score", color = "Gender") + geom_smooth(method = "lm", se = FALSE) ``` <img src="More_LM_interactions_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" /> --- # How to Conduct Moderation Analysis? 
- Moderation analysis can be conducted by adding one or more interaction terms to a regression model
- If Z is a moderator of the relation between X and Y, we can fit the regression model

`$$\begin{eqnarray*} Y & = & \beta_{0}+\beta_{1}*X+\beta_{2}*Z+\beta_{3}*X*Z+\epsilon\\ & = & \begin{cases} \beta_{0}+\beta_{1}*X+\epsilon & \mbox{For females}(Z=0)\\ \beta_{0}+\beta_{2}+(\beta_{1}+\beta_{3})*X+\epsilon & \mbox{For males}(Z=1) \end{cases} \end{eqnarray*}$$`

- When Z=0 (females), the effect of X on Y is β1+β3∗0=β1
- When Z=1 (males), the effect of X on Y is β1+β3∗1=β1+β3

---

# Steps for Moderation Analysis

A moderation analysis typically consists of the following steps:

1. Compute the interaction term XZ=X*Z
2. Fit a multiple regression model with X, Z, and XZ as predictors
3. Test whether the regression coefficient for XZ (interaction) is significant
4. If so, interpret the moderation effect (ignore main effects)
5. Display the moderation effect graphically

---

# Steps for moderation analysis

- Compute the interaction term XZ=X*Z
  - Center continuous variables
  - Centering solves two problems:
    - Interpretation
    - Multicollinearity

```r
evals_interact <- evals1 %>%
  mutate(age_c = datawizard::center(age),
         gender_trt = ifelse(gender == "female", 0, 1),
         inter = age_c * gender_trt)
```

---

# Steps for moderation analysis

- Fit a multiple regression model with X, Z, and XZ as predictors

```r
lm(score ~ age_c * gender_trt, data = evals_interact) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 4.04 | 0.0408 | 99 | 9.18e-312 |
| age_c | -0.0175 | 0.00447 | -3.92 | 0.000103 |
| gender_trt | 0.208 | 0.0527 | 3.95 | 8.88e-05 |
| age_c:gender_trt | 0.0135 | 0.00553 | 2.45 | 0.0148 |

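
A minimal sketch of what the interaction coefficient encodes, using simulated data rather than the evals data (the names `x`, `z`, `y`, `sim` and the generating values are made up for illustration). With a 0/1 moderator, `b1` is the slope in the `z = 0` group and `b1 + b3` is the slope in the `z = 1` group, which should match the slopes from fitting each group separately:

```r
# Simulated data: assumed names and effect sizes, for illustration only
set.seed(503)
n <- 200
x <- rnorm(n)                      # centered continuous predictor
z <- rep(c(0, 1), each = n / 2)    # dummy-coded moderator
y <- 4 - 0.5 * x + 0.2 * z + 0.4 * x * z + rnorm(n, sd = 0.5)
sim <- data.frame(x, z, y)

fit <- lm(y ~ x * z, data = sim)
b <- coef(fit)

slope_z0 <- unname(b["x"])             # simple slope when z = 0
slope_z1 <- unname(b["x"] + b["x:z"])  # simple slope when z = 1

# Same slopes from separate regressions within each group
sep_z0 <- unname(coef(lm(y ~ x, data = subset(sim, z == 0)))["x"])
sep_z1 <- unname(coef(lm(y ~ x, data = subset(sim, z == 1)))["x"])

c(slope_z0 = slope_z0, sep_z0 = sep_z0, slope_z1 = slope_z1, sep_z1 = sep_z1)
```

The single model is usually preferred over separate fits because it also gives a direct test of the difference between the two slopes (the `x:z` coefficient).
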

---

# Steps for moderation analysis

- Test whether the regression coefficient for XZ is significant

```r
lm(score~age_c*gender_trt, data=evals_interact) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 4.04 | 0.0408 | 99 | 9.18e-312 |
| age_c | -0.0175 | 0.00447 | -3.92 | 0.000103 |
| gender_trt | 0.208 | 0.0527 | 3.95 | 8.88e-05 |
| age_c:gender_trt | 0.0135 | 0.00553 | 2.45 | 0.0148 |

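
As a worked reading of these coefficients (using the rounded estimates in the table, so the last digit may differ slightly from software output), the fitted model can be written as one regression line per group, with females coded 0 and males coded 1:

`$$\begin{eqnarray*} \hat{Y}_{female} & = & 4.04 - 0.0175*\mathrm{age_c}\\ \hat{Y}_{male} & = & (4.04+0.208) + (-0.0175+0.0135)*\mathrm{age_c}\\ & = & 4.248 - 0.0040*\mathrm{age_c} \end{eqnarray*}$$`
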

---

# Interpretation

`$$\hat{Y}= b_0 + b_1 X + b_2 Z + b_3 X*Z$$`

- `\(b_0\)`: the intercept, or the predicted outcome when X = 0 and Z = 0
- `\(b_1\)`: the simple effect or slope of `\(X\)`: for a one-unit change in `\(X\)`, the predicted change in `\(Y\)` at `\(Z = 0\)`
- `\(b_2\)`: the offset (difference) in the intercept: for a one-unit change in `\(Z\)`, the predicted change in `\(Y\)` at `\(X = 0\)`
- `\(b_3\)`: the interaction of `\(X\)` and `\(Z\)`: the offset in the slope of `\(X\)` for a one-unit increase in `\(Z\)` (or vice versa)

---

# Interpretation

```r
lm(score~age_c*gender_trt, data=evals_interact) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 4.04 | 0.0408 | 99 | 9.18e-312 |
| age_c | -0.0175 | 0.00447 | -3.92 | 0.000103 |
| gender_trt | 0.208 | 0.0527 | 3.95 | 8.88e-05 |
| age_c:gender_trt | 0.0135 | 0.00553 | 2.45 | 0.0148 |


`\(b_0\)` =

`\(b_{age}\)` =

`\(b_{gender}\)` =

`\(b_{age}*{male}\)` =

???

- (Intercept): average score for females at average age
- age_c: slope of age for Z = 0 (females), from b0 + b1*X + b2(0) + b3(0)
- gender_trt: offset in the intercept, i.e., the difference between males and females at average age
- age_c:gender_trt: the offset in the age slope for males

---

# Moderation: Simple Slopes

- If the interaction is significant, then you usually ignore the other individual effects (age and gender)

- So what do I do if my interaction is significant? **A simple slope analysis**

---

# Main vs. Simple Effects (slopes)

- Main Effects
  - Coefficients that do not involve interaction terms
  - Comparison of marginal means

.pull-left[
<img src="marginalMean9.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[

`$$\hat{Y}= b_0 + b_1 X + b_2 Z$$`

- `\(b_0\)`: The intercept, or the predicted outcome when X and Z are 0
- `\(b_1\)`: The slope (or main effect) of X; for a one-unit change in X, the predicted change in Y
- `\(b_2\)`: The slope (or main effect) of Z; for a one-unit change in Z, the predicted change in Y

]

---

# Main vs. Simple effects (slopes)

- Simple Effects
  - Comparison of cell means

`$$\hat{Y}= b_0 + b_1 X + b_2 Z + b_3 X*Z$$`

- `\(b_0\)`: the intercept, or the predicted outcome when X = 0 and Z = 0
- `\(b_1\)`: the simple effect or slope of `\(X\)`: for a one-unit change in `\(X\)`, the predicted change in `\(Y\)` at `\(Z = 0\)`
- `\(b_2\)`: the simple effect or slope of `\(Z\)`: for a one-unit change in `\(Z\)`, the predicted change in `\(Y\)` at `\(X = 0\)`
- `\(b_3\)`: the interaction of `\(X\)` and `\(Z\)`: the change in the slope of `\(X\)` for a one-unit increase in `\(Z\)` (or vice versa)

---

# Steps for Moderation Analysis

- Obtain simple slopes
  - A simple slope is the slope of a continuous independent variable at a particular level of the moderating variable it interacts with
  - Test if slope `\(\neq\)` 0

```r
# hello to our friend emmeans
library(emmeans)

d=lm(score~age_c*gender_trt, data=evals_interact)

emtrends(d, ~ gender_trt, var="age_c") # simple slopes
```

```
##  gender_trt age_c.trend      SE  df lower.CL upper.CL
##           0    -0.01752 0.00447 459  -0.0263 -0.00874
##           1    -0.00399 0.00325 459  -0.0104  0.00240
##
## Confidence level used: 0.95
```

---

# Simple Slopes

```r
library(interactions) # provides sim_slopes()

sim_slopes(d, pred=age_c, modx=gender_trt)
```

```
## JOHNSON-NEYMAN INTERVAL
##
## When gender_trt is OUTSIDE the interval [0.87, 4.05], the slope of age_c is
## p < .05.
##
## Note: The range of observed values of gender_trt is [0.00, 1.00]
##
## SIMPLE SLOPES ANALYSIS
##
## Slope of age_c when gender_trt = 0.00 (0):
##
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.02   0.00    -3.92   0.00
##
## Slope of age_c when gender_trt = 1.00 (1):
##
##    Est.   S.E.   t val.      p
## ------- ------ -------- ------
##   -0.00   0.00    -1.23   0.22
```

---

# Steps for Moderation Analysis

- Difference in slopes
  - *Testing simple slopes is not the same thing as testing their difference*

```r
d=lm(score~age_c*gender_trt, data=evals_interact)

emtrends(d, pairwise ~ gender_trt, var="age_c")
```

```
## $emtrends
##  gender_trt age_c.trend      SE  df lower.CL upper.CL
##           0    -0.01752 0.00447 459  -0.0263 -0.00874
##           1    -0.00399 0.00325 459  -0.0104  0.00240
##
## Confidence level used: 0.95
##
## $contrasts
##  contrast                  estimate      SE  df t.ratio p.value
##  gender_trt0 - gender_trt1  -0.0135 0.00553 459  -2.446  0.0148
```

---

# Interactions

- You should only follow up an interaction if it is significant!

<img src="gelman.png" width="80%" style="display: block; margin: auto;" />

---

# Visualize

```r
ggplot(evals1, aes(x = age, y = score, color = gender)) +
  geom_point() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_smooth(method = "lm", se = FALSE)
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-24-1.png" width="90%" style="display: block; margin: auto;" />

---

# Parallel Slopes

- Parallel slopes models still allow for different intercepts but force all lines to have the same slope.

```r
library(moderndive) # provides geom_parallel_slopes()

ggplot(evals1, aes(x = age, y = score, color = gender)) +
  geom_point() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_parallel_slopes(se = FALSE)
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-25-1.png" width="80%" style="display: block; margin: auto;" />

---

# Parallel Slopes

```r
main<-lm(score~age_c + gender_trt, data=evals_interact)

inter<- lm(score~age_c*gender_trt, data=evals_interact)

anova(main, inter)
```


| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---:|---:|---:|---:|---:|---:|
| 460 | 131 | | | | |
| 459 | 130 | 1 | 1.69 | 5.98 | 0.0148 |

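
When the full model adds a single term, this model comparison is the same test as the t-test on that term's coefficient, since F = t² (here 2.446² ≈ 5.98, with the same p-value of 0.0148). A small sketch on simulated data, where the names and effect sizes are assumptions and not the evals data:

```r
# Simulated data for illustration only
set.seed(1)
n <- 120
x <- rnorm(n)
z <- rep(c(0, 1), times = n / 2)
y <- 1 + 0.3 * x + 0.5 * z + 0.25 * x * z + rnorm(n)

main  <- lm(y ~ x + z)   # parallel slopes (restricted model)
inter <- lm(y ~ x * z)   # separate slopes (full model)

# F for the one added term vs. t for the same coefficient
f_stat <- anova(main, inter)$F[2]
t_stat <- summary(inter)$coefficients["x:z", "t value"]

c(F = f_stat, t_squared = t_stat^2)
```
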
--- # Parallel Slopes <br> <br> .pull-left[ <img src="More_LM_interactions_files/figure-html/unnamed-chunk-27-1.png" width="100%" style="display: block; margin: auto;" /> ] .pull-right[ <img src="More_LM_interactions_files/figure-html/unnamed-chunk-28-1.png" width="100%" style="display: block; margin: auto;" /> ] --- # Practice In-Class Activity - Pick a new categorical and continuous variable to examine if they interact with teaching evals - Go through the steps outlined in the lecture - Be ready to talk as a group about what you ran and found ```r evals_agegend=read_csv("https://raw.githubusercontent.com/jgeller112/psy503-psych_stats/master/evals.csv") ``` --- # Today (11/14) - Weekly Q & A - More interactions! --- # Weekly Qs - Main Effects and Simple Effects - Main Effect: The effect of one factor on the dependent variable - Interaction: The effect of one factor depends on levels of an additional variable - Simple effects/simple slopes - Comparison of cell means - Follows up any statistically significant interaction - Helps break down interaction to tell us where the interaction is happening --- ```r d=lm(score~age_c*gender_trt, data=evals_interact) d ``` ``` ## ## Call: ## lm(formula = score ~ age_c * gender_trt, data = evals_interact) ## ## Coefficients: ## (Intercept) age_c gender_trt age_c:gender_trt ## 4.03547 -0.01752 0.20836 0.01353 ``` ```r emtrends(d, pairwise ~ gender_trt, var="age_c") ``` ``` ## $emtrends ## gender_trt age_c.trend SE df lower.CL upper.CL ## 0 -0.01752 0.00447 459 -0.0263 -0.00874 ## 1 -0.00399 0.00325 459 -0.0104 0.00240 ## ## Confidence level used: 0.95 ## ## $contrasts ## contrast estimate SE df t.ratio p.value ## gender_trt0 - gender_trt1 -0.0135 0.00553 459 -2.446 0.0148 ``` --- # Weekly Qs - Effect Sizes - *d* for the difference (t-test) - For specific effects `\(\eta_p^2\)` ```r library(effectsize) inter<- lm(score~age_c*gender_trt, data=evals_interact) effectsize::eta_squared(inter) # partial eta squared ```

| Parameter | Eta2_partial | CI | CI_low | CI_high |
|:---|---:|---:|---:|---:|
| age_c | 0.0119 | 0.95 | 0.00104 | 1 |
| gender_trt | 0.0282 | 0.95 | 0.00858 | 1 |
| age_c:gender_trt | 0.0129 | 0.95 | 0.00135 | 1 |


---

# Multicollinearity

```r
lm(`Life Exp`~Murder*Income, data=data_c) %>%
  check_collinearity()
```


| Term | VIF | VIF_CI_low | VIF_CI_high | SE_factor | Tolerance | Tolerance_CI_low | Tolerance_CI_high |
|:---|---:|---:|---:|---:|---:|---:|---:|
| Murder | 64.4 | 39.9 | 104 | 8.03 | 0.0155 | 0.00958 | 0.025 |
| Income | 9.03 | 5.8 | 14.4 | 3.01 | 0.111 | 0.0693 | 0.172 |
| Murder:Income | 62 | 38.4 | 100 | 7.88 | 0.0161 | 0.00995 | 0.026 |


---

```r
c %>% check_collinearity()
```


| Term | VIF | VIF_CI_low | VIF_CI_high | SE_factor | Tolerance | Tolerance_CI_low | Tolerance_CI_high |
|:---|---:|---:|---:|---:|---:|---:|---:|
| murder_c | 1.06 | 1 | 6.4 | 1.03 | 0.943 | 0.156 | 0.999 |
| income_c | 1.42 | 1.14 | 2.21 | 1.19 | 0.706 | 0.452 | 0.874 |
| murder_c:income_c | 1.38 | 1.13 | 2.17 | 1.18 | 0.722 | 0.46 | 0.888 |

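
A rough sketch of why centering lowers the VIFs. It assumes `data_c` was built from R's built-in `state.x77` data (the column names match, but that is a guess), and the object names below are made up for illustration. The raw product term is close to a rescaled copy of `Murder`, while the product of the centered variables is not:

```r
states <- as.data.frame(state.x77)   # built-in data; assumed source of data_c

# Raw predictors: the product term tracks Murder very closely
raw_prod <- states$Murder * states$Income
r_raw <- cor(states$Murder, raw_prod)

# Mean-centered predictors: the product is far less correlated with murder_c
murder_c <- states$Murder - mean(states$Murder)
income_c <- states$Income - mean(states$Income)
r_centered <- cor(murder_c, murder_c * income_c)

c(raw = r_raw, centered = r_centered)
```

Centering changes what the lower-order coefficients mean (effects at the mean instead of at zero) but leaves the interaction coefficient and its test unchanged.
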

---

class: middle

# When in doubt, center!

---

# Weekly Qs

- Standardizing Data vs. Standardizing Coefs

```r
z <- lm(`Life Exp`~scale(Murder)*scale(Income), data=data_c) %>%
  tidy()

z
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 70.9 | 0.121 | 586 | 9.62e-91 |
| scale(Murder) | -1 | 0.122 | -8.22 | 1.39e-10 |
| scale(Income) | 0.287 | 0.141 | 2.03 | 0.0478 |
| scale(Murder):scale(Income) | -0.109 | 0.132 | -0.829 | 0.412 |


---

# Weekly Qs

- Standardizing Data vs. Standardizing Coefs

```r
# by default standardize_parameters() also standardizes the response/outcome
z_coef<-lm(`Life Exp`~murder_c*income_c, data=data_c) %>%
  standardize_parameters(include_response=FALSE)

z_coef
```


| Parameter | Std_Coefficient | CI | CI_low | CI_high |
|:---|---:|---:|---:|---:|
| (Intercept) | 70.9 | 0.95 | 70.6 | 71.1 |
| murder_c | -1 | 0.95 | -1.25 | -0.757 |
| income_c | 0.287 | 0.95 | 0.00294 | 0.57 |
| murder_c:income_c | -0.109 | 0.95 | -0.374 | 0.156 |


---

# Weekly Qs

- Parallel Slopes
  - Way to see if interaction term is warranted

```r
main<-lm(score~age_c + gender_trt, data=evals_interact)

inter<- lm(score~age_c*gender_trt, data=evals_interact)

anova(main, inter)
```


| Res.Df | RSS | Df | Sum of Sq | F | Pr(>F) |
|---:|---:|---:|---:|---:|---:|
| 460 | 131 | | | | |
| 459 | 130 | 1 | 1.69 | 5.98 | 0.0148 |


---

# Weekly Qs

- Power Transformations

.pull-left[
- Box-Cox
  - Is there a transformation that will normalize my data?
  - What is the optimal value of the transformation parameter?

<img src="power.PNG" width="50%" style="display: block; margin: auto;" />
]

.pull-right[

```r
library(MASS)

# run test and keep the result so lambda can be extracted
b <- boxcox(lm(x ~ 1))

# Exact lambda
lambda <- b$x[which.max(b$y)]
lambda
```

]

---

# Weekly Q

- Checking heterogeneity assumption
  - Performing this test does not do anything to alpha
  - Differing degrees of heterogeneity can inflate the Type I error rate

---

class: middle

# Continuous x Continuous Interactions

---

# Continuous x Continuous Interactions

- Do violent video games make people aggressive?
  - DV: Aggression
  - IV:
    - Callous unemotional traits
    - Number of hours spent playing video games per week
    - Callous unemotional traits*Number of hours spent playing video games per week

- If callous-unemotional traits are a moderator, then we're saying that the strength or direction of the relationship between game playing and aggression depends on the level of callous-unemotional traits

```r
# grab dataset from link
moderation_vio=read_csv("https://raw.githubusercontent.com/jgeller112/psy503-psych_stats/master/moderation.csv")
```

---

# Continuous x Continuous Interactions

- Centering
  - Can reduce multicollinearity
  - This is because the product `\(X*Z\)` is a new predictor (XZ) that, with uncentered variables, correlates strongly with X and Z

```r
library(datawizard)

# centering vars
moderation_vio <- moderation_vio %>%
  mutate(vid_games_c=center(Vid_Games), caunts_c=center(CaUnTs))
```

---

# How does this look?

<img src="interexamp.png" width="100%" style="display: block; margin: auto;" />

---

# Continuous x Continuous Regression

```r
lm(Aggression~ vid_games_c*caunts_c, data=moderation_vio) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 40 | 0.475 | 84.1 | 1.72e-272 |
| vid_games_c | 0.17 | 0.0685 | 2.48 | 0.0136 |
| caunts_c | 0.76 | 0.0495 | 15.4 | 6.12e-43 |
| vid_games_c:caunts_c | 0.0271 | 0.00698 | 3.88 | 0.000122 |


???

- (Intercept): the intercept, or the predicted aggression when hours = 0 and traits = 0 (both are centered, so 0 is the mean)
- vid_games_c: the slope of hours: for a one-unit change in hours, the predicted change in aggression at traits = 0
- caunts_c: the slope of traits: for a one-unit change in traits, the predicted change in aggression at hours = 0
- vid_games_c:caunts_c: the interaction of hours and traits: the change in the slope of hours for every one-unit increase in callousness (or vice versa)

---

# Interpretation Continuous x Continuous Interactions

- `\(b_0\)`: the intercept, or the predicted outcome when X = 0 and Z = 0
- `\(b_1\)`: the simple effect or slope of `\(X\)`: for a one-unit change in `\(X\)`, the predicted change in `\(Y\)` at `\(Z = 0\)`
- `\(b_2\)`: the simple effect or slope of `\(Z\)`: for a one-unit change in `\(Z\)`, the predicted change in `\(Y\)` at `\(X = 0\)`
- `\(b_3\)`: the interaction of `\(X\)` and `\(Z\)`: the change in the slope of `\(X\)` for a one-unit increase in `\(Z\)` (or vice versa)


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 40 | 0.475 | 84.1 | 1.72e-272 |
| vid_games_c | 0.17 | 0.0685 | 2.48 | 0.0136 |
| caunts_c | 0.76 | 0.0495 | 15.4 | 6.12e-43 |
| vid_games_c:caunts_c | 0.0271 | 0.00698 | 3.88 | 0.000122 |

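
Because `\(b_1\)` is the slope of X only at Z = 0, the slope of X at any other moderator value z is `\(b_1 + b_3 z\)`. A worked example with the rounded estimates above, taking z = ±9.6 as illustrative values of `caunts_c` about one SD from its mean:

`$$\begin{eqnarray*} \text{slope at } Z=-9.6 & : & 0.17 + 0.0271*(-9.6) \approx -0.09\\ \text{slope at } Z=0 & : & 0.17\\ \text{slope at } Z=9.6 & : & 0.17 + 0.0271*(9.6) \approx 0.43 \end{eqnarray*}$$`
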

---

# Continuous X Continuous Interactions

- If Z (the moderator variable) were categorical, you would be checking whether separate groups (levels) have different slopes for the non-categorical variable

- However, we can't do that with continuous x continuous interactions

---

# Decomposing Continuous X Continuous Interactions: Spotlight Analysis

- For continuous moderator variables, you "create" low, average, and high groups
  - Low groups are people who are one SD below the mean
  - Average groups are people who are at the mean
  - High groups are people who are one SD above the mean

---

# Moderation: Simple Slopes

- We are examining the interaction between hours of video games and unemotional traits to predict aggression
- Think about which variable you want to know the differences in (i.e., low, average, high)
- So at different levels of callousness, we want to examine the relationship between hours of video game play and aggression

---

# Probing Interactions: Spotlight Analysis

- Low/below mean created by *SUBTRACTING* 1 SD
- High/above mean created by *ADDING* 1 SD
- The rule is that we have to bring them to the middle because we centered so that zero is the middle

```r
# create the low and high z score variables
a <- mean(moderation_vio$caunts_c) + sd(moderation_vio$caunts_c)

at <- mean(moderation_vio$caunts_c)

b <- mean(moderation_vio$caunts_c) - sd(moderation_vio$caunts_c)
```

---

# Spotlight Analysis

```r
# create a list for values at a, b, and mean and round them
mylist <- list(caunts_c=c(round(b, 1), round(at,1), round(a, 1)))

# run lm again
d=lm(Aggression~vid_games_c*caunts_c,data=moderation_vio)

# get simple slopes at each level: b, at, a
emtrends(d,~caunts_c, var="vid_games_c", at=mylist)
```

```
##  caunts_c vid_games_c.trend     SE  df lower.CL upper.CL
##      -9.6           -0.0902 0.0990 438   -0.285    0.104
##       0.0            0.1696 0.0685 438    0.035    0.304
##       9.6            0.4294 0.0925 438    0.248    0.611
##
## Confidence level used: 0.95
```

--

- At high levels of callousness, the strength of hours of video games predicting aggression is the strongest, b = 0.43 ...

---

# Graphing Continuous x Continuous Interactions

```r
moderation_vio$caunts_clow <- moderation_vio$caunts_c + sd(moderation_vio$caunts_c) # bring them up
moderation_vio$caunts_chigh <- moderation_vio$caunts_c - sd(moderation_vio$caunts_c) # bring them down

modmodellow <- lm(Aggression ~ vid_games_c*caunts_clow, data = moderation_vio)
modmodelhigh <- lm(Aggression ~ vid_games_c*caunts_chigh, data = moderation_vio)
```

```r
library(ggplot2)

cleanup <- theme(panel.grid.major = element_blank(),
                 panel.grid.minor = element_blank(),
                 panel.background = element_blank(),
                 axis.line.x = element_line(color = "black"),
                 axis.line.y = element_line(color = "black"),
                 legend.key = element_rect(fill = "white"),
                 text = element_text(size = 15))

modgraph <- ggplot(moderation_vio, aes(vid_games_c, Aggression))

## change Cal to the new moderator label
## change xlab for the new X label
modgraph +
  xlab("Centered Video Games") +
  geom_point(color = "gray") +
  geom_abline(aes(intercept = modmodellow$coefficients[1],
                  slope = modmodellow$coefficients[2],
                  linetype = "-1SD Cal"), size = 1) +
  geom_abline(aes(intercept = d$coefficients[1],
                  slope = d$coefficients[2],
                  linetype = "Average Cal"), size = 1) +
  geom_abline(aes(intercept = modmodelhigh$coefficients[1],
                  slope = modmodelhigh$coefficients[2],
                  linetype = "+1SD Cal"), size = 1) +
  scale_linetype_manual(values = c("dotted", "dashed", "solid"),
                        breaks = c("-1SD Cal", "Average Cal", "+1SD Cal"),
                        name = "Simple Slope") +
  cleanup
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-48-1.png" width="100%" style="display: block; margin: auto;" />

---

# Graphing Continuous x Continuous Interactions

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-49-1.png" width="100%" style="display: block; margin: auto;" />

---

# Probing Interactions: Simpler Way

- Use `interact_plot` from the `interactions` package

```r
library(interactions)

interact_plot(d, pred = vid_games_c, modx = caunts_c,
              interval = TRUE, plot.points = TRUE)
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-50-1.png" width="80%" style="display: block; margin: auto;" />

---

# Probing Interactions: Johnson-Neyman Plot

.pull-left[
- Is a floodlight analysis on the whole range of the moderator

- Provides an interval (2 points) where the slope of a predictor is not statistically significant across different values of the moderator

```r
library(interactions)

# control.fdr is important bc otherwise it does not correct for multiple comparisons
johnson_neyman(model = d, pred = vid_games_c, modx = caunts_c,
               control.fdr = TRUE)
```

]

<br>
<br>

.pull-right[
<img src="JNplot.png" width="100%" style="display: block; margin: auto;" />
]

---

# Moderation: MeMoBootR

- We can use the `MeMoBootR` package to complete the entire process, including data screening, for us!

- You would enter the raw variables, as the centering is completed for you

```r
#devtools::install_github("doomlab/MeMoBootR")
library(MeMoBootR)

mod_model <- moderation1(y = "Aggression",
                         x = "Vid_Games",
                         m = "CaUnTs",
                         df = moderation_vio)
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-53-1.png" width="50%" /><img src="More_LM_interactions_files/figure-html/unnamed-chunk-53-2.png" width="50%" /><img src="More_LM_interactions_files/figure-html/unnamed-chunk-53-3.png" width="50%" />

---

# Moderation: MeMoBootR

```r
#data screening
#mod_model$datascreening$fulldata

#models
#summary(mod_model$model1)
#mod_model$interpretation

#graphs
#mod_model$graphslopes
```

---

# In-Class Activity

> A simulated data set containing information on ten thousand customers. The aim here is to predict which customers will default on their credit card debt.

```r
library(ISLR)

data=ISLR::Credit
```

---

# In-Class Activity

- Have a look at variables (?ISLR::Credit)

- Pick two continuous variables to model

- Plot their interaction and test for significance using both the spotlight and floodlight analyses

---

class: middle

# Categorical x Categorical Interaction

---

# 2 x 2 Between Factorial Dataset

- LaPaglia, Miller, and Protexter (2022)
  - Looked at the impact of instructor fluency and gender on test performance (Quiz)
  - *N* = 72 (49 females, 23 males)
  - 2 (Fluency: fluent, disfluent) x 2 (Gender: male, female) design

```r
gen<- read_csv("https://raw.githubusercontent.com/jgeller112/psy503-psych_stats/master/static/slides/13_Interactions/in_gen_2x2.csv")

gen <- gen %>%
  mutate(Gender=as.factor(Gender), Fluency=as.factor(Fluency))
```

---

# Categorical x Categorical Interaction

- Factorial design (ANOVA)
  - Commonly used to refer to experiments where more than one factor is manipulated
  - 2-way (most common), 3-way factorial designs, 4-way...

---

# 2-way (Factorial) ANOVA

- In the example above we have two factors:
  - Factor A (e.g., Gender) with 2 levels (e.g., male vs. female)
  - Factor B (e.g., Fluency) with 2 levels (e.g., disfluent vs. fluent)
- Fully crossed design
  - Every level of factor A is tested with every level of factor B
  - Total # groups (cells) is a x b
- We will see how to formulate in terms of model comparisons:
  - Main effect of A
  - Main effect of B
  - Interaction effect A x B

---

# Main Effects and Interactions

.pull-left[
<img src="8diff.png" width="100%" style="display: block; margin: auto;" />
]

.pull-right[
<img src="mp4.png" width="100%" style="display: block; margin: auto;" />
]

---

# Model Comparisons

- Same approach as before

`$$F = \frac{(SS_{R}-SS_{F})/(df_{R}-df_{F})}{SS_{F}/df_F} = \frac{MS_{model}}{MS_{error}}$$`

- When a full model with p parameters is compared against an intercept-only model, `\(df_R - df_F = p-1\)` and `\(df_F = N-p\)`

1. Write the equation for the full and restricted models
2. Derive the equations for model error restricted and full
3. Derive the expressions for degrees of freedom df restricted and df full
4. End up with an equation for the F ratio

---

# Full Model

`$$Y_{ijk} = \mu + \alpha_j + \beta_k + (\alpha \beta)_{jk} + \epsilon_{ijk}$$`

- `\(Y_{ijk}\)`: is an individual score in the jth level of factor A and the kth level of factor B (i indexes subjects within each (j,k) cell)
- `\(\mu\)`: is the overall mean of all cells
- `\(\alpha_j\)`: is the effect of the jth level of factor A
- `\(\beta_k\)`: is the effect of the kth level of factor B
- `\((\alpha \beta)_{jk}\)`: is the interaction effect of level j of A and level k of B
- `\(\epsilon_{ijk}\)`: is the error for subject i in cell (j,k)

---

# Modeling Approach to Hypothesis Testing

- Two-Factor (A x B) design: 3 null hypotheses to be tested:
  - Main effect of A
  - Main effect of B
  - Interaction effect of A x B
- We will formulate a separate restricted model for each hypothesis test

`$$F = \frac{(SS_{R}-SS_{F})/(df_{R}-df_{F})}{SS_{F}/df_F} = \frac{MS_{model}}{MS_{error}}$$`

---

# Linear Modeling Approach: Treatment/Dummy Coding


| Gender | Fluency | D1 | D2 | Interaction |
|:---|:---|---:|---:|---:|
| Female=0 | Disfluent=0 | 0 | 0 | 0 |
| Female=0 | Fluent=1 | 0 | 1 | 0 |
| Male=1 | Disfluent=0 | 1 | 0 | 0 |
| Male=1 | Fluent=1 | 1 | 1 | 1 |


`$$\hat{Y}= b_0 + b_1(0) + b_2(0) + b_3(0)$$`

`$$\bar Y_{Female,Disfluent} = b_0$$`

---


| Gender | Fluency | D1 | D2 | Interaction |
|:---|:---|---:|---:|---:|
| Female=0 | Disfluent=0 | 0 | 0 | 0 |
| Female=0 | Fluent=1 | 0 | 1 | 0 |
| Male=1 | Disfluent=0 | 1 | 0 | 0 |
| Male=1 | Fluent=1 | 1 | 1 | 1 |


`$$\hat{Y}= b_0 + b_1(0) + b_2(1) + b_3(0)$$`

`$$\bar Y_{Female,Fluent}= b_0 + b_2$$`

---


| Gender | Fluency | D1 | D2 | Interaction |
|:---|:---|---:|---:|---:|
| Female=0 | Disfluent=0 | 0 | 0 | 0 |
| Female=0 | Fluent=1 | 0 | 1 | 0 |
| Male=1 | Disfluent=0 | 1 | 0 | 0 |
| Male=1 | Fluent=1 | 1 | 1 | 1 |


`$$\hat{Y}= b_0 + b_1(1) + b_2(0) + b_3(0)$$`

`$$\bar Y_{Male,Disfluent} = b_0 + b_1$$`

---


| Gender | Fluency | D1 | D2 | Interaction |
|:---|:---|---:|---:|---:|
| Female=0 | Disfluent=0 | 0 | 0 | 0 |
| Female=0 | Fluent=1 | 0 | 1 | 0 |
| Male=1 | Disfluent=0 | 1 | 0 | 0 |
| Male=1 | Fluent=1 | 1 | 1 | 1 |


`$$\hat{Y}= b_0 + b_1(1) + b_2(1) + b_3(1)$$`

`$$\bar Y_{Male,Fluent} = b_0 + b_1 + b_2 + b_3$$`

---

# Fitting the model

- Fit this as a linear model

```r
lm(Quiz~Gender*Fluency, data=gen) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 7.38 | 0.552 | 13.4 | 1.86e-17 |
| GenderM | -0.135 | 0.797 | -0.169 | 0.867 |
| FluencyFluent | -0.385 | 0.797 | -0.482 | 0.632 |
| GenderM:FluencyFluent | -0.481 | 1.13 | -0.426 | 0.672 |

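
With treatment coding, the four cell means can be rebuilt from these coefficients by switching the dummy codes on and off. A worked check using the rounded estimates in the table:

`$$\begin{eqnarray*} \bar Y_{Female,Disfluent} & = & 7.38\\ \bar Y_{Male,Disfluent} & = & 7.38 - 0.135 \approx 7.25\\ \bar Y_{Female,Fluent} & = & 7.38 - 0.385 \approx 7.00\\ \bar Y_{Male,Fluent} & = & 7.38 - 0.135 - 0.385 - 0.481 \approx 6.38 \end{eqnarray*}$$`
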

---

# Marginal Means: Using `Emmeans`

```r
lm(Quiz~Gender*Fluency, data=gen) %>%
  emmeans::emmeans(specs=~Gender|Fluency) %>%
  as.data.frame()
```

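
| Gender | Fluency | emmean | SE | df | lower.CL | upper.CL |
|:---|:---|---:|---:|---:|---:|---:|
| F | Disfluent | 7.38 | 0.552 | 46 | 6.27 | 8.5 |
| M | Disfluent | 7.25 | 0.575 | 46 | 6.09 | 8.41 |
| F | Fluent | 7 | 0.575 | 46 | 5.84 | 8.16 |
| M | Fluent | 6.38 | 0.552 | 46 | 5.27 | 7.5 |
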

---

# Marginal Means: Using `Easystats`

```r
library(modelbased) # load to use estimate_means

lm(Quiz~Gender*Fluency, data=gen) %>%
  estimate_means()
```

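
| Gender | Fluency | Mean | SE | CI_low | CI_high |
|:---|:---|---:|---:|---:|---:|
| F | Disfluent | 7.38 | 0.552 | 6.27 | 8.5 |
| M | Disfluent | 7.25 | 0.575 | 6.09 | 8.41 |
| F | Fluent | 7 | 0.575 | 5.84 | 8.16 |
| M | Fluent | 6.38 | 0.552 | 5.27 | 7.5 |
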

---

# Linear Modeling Approach: Sum-Coding

- Treatment coding tests simple effects in LM

- Sum coding tests main effects/interaction effects (2 x 2)

```r
contrasts(gen$Gender) <- c(0.5, -0.5)  # sum code gender
contrasts(gen$Fluency) <- c(0.5, -0.5) # sum code fluency

# fit linear model
lm(Quiz~Gender*Fluency, data=gen) %>%
  tidy()
```


| term | estimate | std.error | statistic | p.value |
|:---|---:|---:|---:|---:|
| (Intercept) | 7 | 0.282 | 24.9 | 2.64e-28 |
| Gender1 | 0.375 | 0.564 | 0.665 | 0.509 |
| Fluency1 | 0.625 | 0.564 | 1.11 | 0.273 |
| Gender1:Fluency1 | -0.481 | 1.13 | -0.426 | 0.672 |

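
With ±0.5 codes, the intercept is the unweighted mean of the four cell means and each lower-order coefficient is a difference between marginal means. In the output above, `Gender1` = (7.38 + 7)/2 − (7.25 + 6.38)/2 = 0.375. A sketch that checks the same identity on made-up 2 x 2 data (the factor names `A` and `B`, the cell sizes, and the outcome are hypothetical):

```r
set.seed(2)
# Hypothetical 2 x 2 data with unequal cell sizes
dat <- data.frame(
  A = factor(rep(c("a1", "a2"), times = c(14, 10))),
  B = factor(c(rep(c("b1", "b2"), times = c(8, 6)),
               rep(c("b1", "b2"), times = c(5, 5))))
)
dat$y <- rnorm(nrow(dat), mean = 7, sd = 2)

# Same +0.5 / -0.5 coding as on the slide
contrasts(dat$A) <- c(0.5, -0.5)
contrasts(dat$B) <- c(0.5, -0.5)
fit <- lm(y ~ A * B, data = dat)

# Cell means, then unweighted marginal means for A
cell <- tapply(dat$y, list(dat$A, dat$B), mean)
marg_A <- rowMeans(cell)

coef_A <- unname(coef(fit)["A1"])
diff_A <- unname(marg_A["a1"] - marg_A["a2"])
intercept <- unname(coef(fit)["(Intercept)"])

c(coef_A = coef_A, diff_A = diff_A,
  intercept = intercept, mean_of_cells = mean(cell))
```
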

---

# ANOVA

- Sum coded LM is the same as running an ANOVA!

```r
library(afex)

aov_ez(id="id", between=c("Gender", "Fluency"), dv="Quiz", data=gen) %>%
  summary() %>%
  tidy()
```

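
| term | num.Df | den.Df | MSE | statistic | ges | p.value |
|:---|---:|---:|---:|---:|---:|---:|
| Gender | 1 | 46 | 3.97 | 0.443 | 0.00953 | 0.509 |
| Fluency | 1 | 46 | 3.97 | 1.23 | 0.026 | 0.273 |
| Gender:Fluency | 1 | 46 | 3.97 | 0.182 | 0.00394 | 0.672 |
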

---

# Visualizing Categorical x Categorical Interactions

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-68-1.png" width="100%" style="display: block; margin: auto;" />

---

# Using the F Test: Sums of Squares

<img src="ss_2x2.jpg" width="50%" style="display: block; margin: auto;" />

---

# Main Effect of Gender `\(SS_A\)`

> Does Gender contribute significantly over and above an intercept-only model?

<table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Restricted/Null model: </td> <td style="text-align:left;"> `Quiz ~ 1` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Gender` </td> </tr> </tbody> </table>

---

# F-Statistic for Gender Main Effect: `\(SS_A\)`


| term | df.residual | rss | df | sumsq | statistic | p.value |
|:---|---:|---:|---:|---:|---:|---:|
| Quiz ~ 1 | 49 | 190 | | | | |
| Quiz ~ Gender | 48 | 188 | 1 | 2 | 0.511 | 0.478 |

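
The F statistic in this table follows from the restricted (`Quiz ~ 1`) and full (`Quiz ~ Gender`) models. A worked check with the rounded sums of squares shown above:

`$$F = \frac{(SS_R - SS_F)/(df_R - df_F)}{SS_F/df_F} = \frac{(190-188)/(49-48)}{188/48} = \frac{2}{3.92} \approx 0.51$$`
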
--- # Main Effect of *B* > Does Fluency contribute meaningfully to the model over and above Gender? <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Restricted/Null model: </td> <td style="text-align:left;"> `Quiz ~ Gender` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Gender + Fluency` </td> </tr> </tbody> </table> --- # F-Statistic for Fluency Main Effect: `\(SS_B\)` .pull-left[

| term | df.residual | rss | df | sumsq | statistic | p.value |
|:---|---:|---:|---:|---:|---:|---:|
| Quiz ~ Gender | 48 | 188 | | | | |
| Quiz ~ Gender + Fluency | 47 | 183 | 1 | 4.87 | 1.25 | 0.269 |

] --- # Interaction Effect of *AB* > Does the interaction between Fluency and Gender contribute meaningfully to the model over and above the main effects? <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Restricted/Null model: </td> <td style="text-align:left;"> `Quiz ~ Gender + Fluency` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Gender + Fluency + Gender:Fluency` </td> </tr> </tbody> </table> --- # F-Statistic for Gender*Fluency Effect: `\(SS_{AB}\)` <table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:right;"> Res.Df </th> <th style="text-align:right;"> RSS </th> <th style="text-align:right;"> Df </th> <th style="text-align:right;"> Sum of Sq </th> <th style="text-align:right;"> F </th> <th style="text-align:right;"> Pr(>F) </th> </tr> </thead> <tbody> <tr> <td style="text-align:right;"> 47 </td> <td style="text-align:right;"> 183.125 </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> <td style="text-align:right;"> NA </td> </tr> <tr> <td style="text-align:right;"> 46 </td> <td style="text-align:right;"> 182.404 </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0.721 </td> <td style="text-align:right;"> 0.182 </td> <td style="text-align:right;"> 0.672 </td> </tr> </tbody> </table> --- # Full ANOVA Table
| term | df | sumsq | meansq | statistic | p.value |
|:--|--:|--:|--:|--:|--:|
| Gender | 1 | 2 | 2 | 0.504 | 0.481 |
| Fluency | 1 | 4.87 | 4.87 | 1.23 | 0.273 |
| Gender:Fluency | 1 | 0.721 | 0.721 | 0.182 | 0.672 |
| Residuals | 46 | 182 | 3.97 | | |
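
The link between the model comparisons and the rows of this table can be checked in a few lines. The sketch below does not use the `gen` data: it simulates a hypothetical balanced 2 x 2 data set (`sim`, `quiz`, `gender`, and `fluency` are illustrative names) and confirms that each sequential sum of squares is just the drop in residual SS between two nested models.

```r
# Hypothetical balanced 2 x 2 data (12 per cell); not the `gen` data from the slides
set.seed(503)
sim <- expand.grid(id = 1:12,
                   gender = c("F", "M"),
                   fluency = c("Disfluent", "Fluent"))
sim$quiz <- rnorm(nrow(sim), mean = 7, sd = 2)

# Nested models, each adding one term in the order used by the ANOVA table
m0 <- lm(quiz ~ 1, data = sim)
m1 <- lm(quiz ~ gender, data = sim)
m2 <- lm(quiz ~ gender + fluency, data = sim)
m3 <- lm(quiz ~ gender * fluency, data = sim)

# deviance() of an lm is its residual SS; successive drops are the sequential SS
ss_seq <- -diff(c(deviance(m0), deviance(m1), deviance(m2), deviance(m3)))

# Sequential (Type I) SS reported by anova() for the full model
ss_anova <- anova(m3)[["Sum Sq"]][1:3]

all.equal(ss_seq, ss_anova)
```

Note that the *F* values in the full table use the residual mean square of the full model as the denominator, which is why they can differ slightly from the two-model comparisons.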
---

# Type I, Type II, and Type III SS

- By default, R's `aov` calculates Type I SS
  - Sequential (order listed in model)
  - First assign a maximum of variation to variable A
  - In the remaining variation, assign the maximum of variation to variable B
  - In the remaining variation, assign the maximum of variation to the interaction effect
  - Assign the rest to the residual SS
  - *Can change depending on the order in which terms are placed in the model*

---

# Type I, Type II, and Type III ANOVAs

- Type II
  - Hierarchical SS
  - Based on the marginality principle, which states that you should not omit a lower-order term from your model if there are any higher-order ones that depend on it
  - Tests main effects first
  - Ignores interactions

`$$SS(A \mid B) \text{ for main effect } A$$`

`$$SS(B \mid A) \text{ for main effect } B$$`

---

# Type II SS

- Main effect Gender

<table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Reduced/Null model: </td> <td style="text-align:left;"> `Quiz ~ Fluency` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Fluency + Gender` </td> </tr> </tbody> </table>

- Main effect Fluency

<table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Reduced/Null model: </td> <td style="text-align:left;"> `Quiz ~ Gender` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Gender + Fluency` </td> </tr> </tbody> </table>

---

# Type II SS

- `Anova` function in `car` package can handle these cases

```r
library(car)
Anova(lm(Quiz~Gender*Fluency, data=gen), type="II") %>%
  tidy()
```
| term | sumsq | df | statistic | p.value |
|:--|--:|--:|--:|--:|
| Gender | 1.75 | 1 | 0.443 | 0.509 |
| Fluency | 4.87 | 1 | 1.23 | 0.273 |
| Gender:Fluency | 0.721 | 1 | 0.182 | 0.672 |
| Residuals | 182 | 46 | | |
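
To see why the type of SS matters, it can help to look at an unbalanced design. The sketch below uses base R only and simulated data (the cell sizes and the names `cells`, `unbal`, and `quiz` are made up for illustration). It shows that the Type I SS for a main effect depends on where the term enters the model, while the Type II SS is the model comparison SS(A | B) regardless of order.

```r
# Hypothetical unbalanced 2 x 2 design; cell sizes chosen only for illustration
set.seed(503)
cells <- data.frame(gender  = c("F", "F", "M", "M"),
                    fluency = c("Disfluent", "Fluent", "Disfluent", "Fluent"),
                    n       = c(5, 15, 20, 10))
unbal <- cells[rep(seq_len(nrow(cells)), cells$n), c("gender", "fluency")]
unbal$quiz <- rnorm(nrow(unbal), mean = 7, sd = 2)

# Type I: the SS for gender depends on whether it enters first or second
ss_gender_first  <- anova(lm(quiz ~ gender + fluency, data = unbal))["gender", "Sum Sq"]
ss_gender_second <- anova(lm(quiz ~ fluency + gender, data = unbal))["gender", "Sum Sq"]

# Type II: SS(gender | fluency) = RSS(fluency only) - RSS(fluency + gender)
ss_gender_type2 <- deviance(lm(quiz ~ fluency, data = unbal)) -
  deviance(lm(quiz ~ fluency + gender, data = unbal))

c(first = ss_gender_first, second = ss_gender_second, type2 = ss_gender_type2)
```

In this sketch the "entered second" Type I value and the Type II value coincide, while the "entered first" value differs because the two factors are correlated when cell sizes are unequal.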
---

# Type I, Type II, and Type III SS

- Type III
  - Treats main effects and interactions simultaneously
  - Fit full model and remove effect of interest
  - How much of the variance is accounted for by X after taking into consideration all the other effects
  - *Preferable if unequal cell sizes*

<table class=" lightable-material-dark" style='font-family: "Source Sans Pro", helvetica, sans-serif; margin-left: auto; margin-right: auto;'> <thead> <tr> <th style="text-align:left;"> </th> <th style="text-align:left;"> </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> Reduced/Null model: </td> <td style="text-align:left;"> `Quiz ~ Fluency + Gender:Fluency` </td> </tr> <tr> <td style="text-align:left;"> Full/Alternative model: </td> <td style="text-align:left;"> `Quiz ~ Gender + Fluency + Gender:Fluency` </td> </tr> </tbody> </table>

---
| term | sumsq | df | statistic | p.value |
|:--|--:|--:|--:|--:|
| (Intercept) | 2.45e+03 | 1 | 618 | 2.64e-28 |
| Gender | 1.75 | 1 | 0.443 | 0.509 |
| Fluency | 4.87 | 1 | 1.23 | 0.273 |
| Gender:Fluency | 0.721 | 1 | 0.182 | 0.672 |
| Residuals | 182 | 46 | | |
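| | Df | Sum of Sq | RSS | AIC | F value | Pr(>F) |
|:--|--:|--:|--:|--:|--:|--:|
| `<none>` | | | 182 | 72.7 | | |
| Gender | 1 | 1.76 | 184 | 71.2 | 0.443 | 0.509 |
| Fluency | 1 | 4.87 | 187 | 72 | 1.23 | 0.273 |
| Gender:Fluency | 1 | 0.721 | 183 | 70.9 | 0.182 | 0.672 |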
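
One way to obtain Type III tests with base R alone is `drop1()` on a model fitted with sum-to-zero contrasts. This is a sketch under assumptions: the data are simulated (`sim`, `quiz`, `gender`, `fluency` are illustrative names, not the `gen` data), and it is not necessarily how the tables above were produced.

```r
# Hypothetical balanced 2 x 2 data (12 per cell)
set.seed(503)
sim <- expand.grid(id = 1:12,
                   gender = c("F", "M"),
                   fluency = c("Disfluent", "Fluent"))
sim$quiz <- rnorm(nrow(sim), mean = 7, sd = 2)

# Sum-to-zero contrasts make the main-effect tests meaningful
# in the presence of the interaction term
fit <- lm(quiz ~ gender * fluency, data = sim,
          contrasts = list(gender = "contr.sum", fluency = "contr.sum"))

# scope = . ~ . asks drop1() to test every term, not only the highest-order one
type3 <- drop1(fit, scope = . ~ ., test = "F")
type3
```

With the default treatment (dummy) contrasts the same call would test each main effect at the reference level of the other factor, so the contrast coding is the part to double-check before interpreting the output.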
---

# Testing Categorical x Categorical Interactions

- First look at the interaction effect
  - If the interaction effect is significant, perform mean comparisons (e.g., *F* or *t*)
  - DON’T bother looking at the involved main effects (not informative)
- If not, follow up with main effect comparisons

*For learning purposes we will examine the interaction*

---

# Simple Effects Analysis

- An examination of one factor at each level of another

Our example:

- Difference in quiz performance between videos with males and females in the Disfluent condition
- Difference in quiz performance between videos with males and females in the Fluent condition
- Get F test for these comparisons (2 one-way ANOVAs)

`$$F=\frac{MS_{gender_{oneway}}}{MS_{W_{omnibus}}}$$`

---

# Simple Effects Analysis

```r
lm(Quiz~Gender*Fluency, data=gen) %>%
  emmeans::emmeans(pairwise~Gender|Fluency) %>%
  joint_tests(by="Fluency")
```
| model term | Fluency | df1 | df2 | F.ratio | p.value |
|:--|:--|--:|--:|--:|--:|
| Gender | Disfluent | 1 | 46 | 0.029 | 0.867 |
| Gender | Fluent | 1 | 46 | 0.596 | 0.444 |
- Simple effect of Gender at Disfluent, *F*(1, 46) = 0.029, *p* = .867
- Simple effect of Gender at Fluent, *F*(1, 46) = 0.596, *p* = .444

---

# As *t*-tests

```r
lm(Quiz~Gender*Fluency, data=gen) %>%
  emmeans::emmeans(., ~Gender|Fluency) %>%
  pairs(adjust="bon") # in a 2x2 design no correction is needed, as there is only one test per family
```

```
## Fluency = Disfluent:
##  contrast estimate    SE df t.ratio p.value
##  F - M       0.135 0.797 46   0.169  0.8666
##
## Fluency = Fluent:
##  contrast estimate    SE df t.ratio p.value
##  F - M       0.615 0.797 46   0.772  0.4441
```

---

# Effect Sizes for Simple Effects

```r
# get eta
effectsize::t_to_eta2(
  t = c(0.169, 0.772),
  df_error = 46
)
```
| Eta2_partial | CI | CI_low | CI_high |
|--:|--:|--:|--:|
| 0.000621 | 0.95 | 0 | 1 |
| 0.0128 | 0.95 | 0 | 1 |
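
The conversion behind these values is simple enough to check by hand. The sketch below assumes the standard relation between a *t* statistic and partial eta squared, `t^2 / (t^2 + df_error)`, and uses the two *t* ratios from the simple-effects output; the variable names are illustrative.

```r
# t ratios and error df taken from the simple-effects t-tests above
t_vals   <- c(Disfluent = 0.169, Fluent = 0.772)
df_error <- 46

# Partial eta squared from t (assumed formula: t^2 / (t^2 + df_error))
eta2_p <- t_vals^2 / (t_vals^2 + df_error)
signif(eta2_p, 3)
```

Squaring the same *t* ratios also recovers the simple-effects *F* ratios (about 0.029 and 0.596), since *F* = *t*² when the numerator has one degree of freedom.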
```r
# get omega
#t_to_omega2(
#  t = c(0.169, 0.772),
#  df_error = 46
#)
```

---

# Effect Sizes: Main Effects and Interaction

- Report `\(\eta_p^2\)` or `\(\omega_p^2\)` for main effects and interactions

```r
lm(Quiz~Gender*Fluency, data=gen) %>%
  effectsize::omega_squared()
```
| Parameter | Omega2_partial | CI | CI_low | CI_high |
|:--|--:|--:|--:|--:|
| Gender | -0.01 | 0.95 | 0 | 1 |
| Fluency | 0.00457 | 0.95 | 0 | 1 |
| Gender:Fluency | -0.0166 | 0.95 | 0 | 1 |
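
These estimates can be approximated from the full ANOVA table. The sketch below assumes the common SS-based formula for partial omega squared, `(SS_effect - df_effect * MS_error) / (SS_effect + (N - df_effect) * MS_error)`, and takes N = 50 from the residual df (46) plus the four cell means; because the inputs are the rounded table values, the results should only match the output above to about three decimals.

```r
# Rounded values from the full ANOVA table shown earlier
ss_effect <- c(Gender = 2, Fluency = 4.87, `Gender:Fluency` = 0.721)
df_effect <- c(1, 1, 1)
ms_error  <- 3.97
n_total   <- 46 + 4  # assumption: residual df + number of cells in the 2 x 2 design

# Partial omega squared (assumed SS-based formula)
omega2_p <- (ss_effect - df_effect * ms_error) /
  (ss_effect + (n_total - df_effect) * ms_error)
round(omega2_p, 4)
```

Negative estimates arise whenever an effect's mean square is smaller than the error mean square (*F* < 1); they are usually read as an effect of essentially zero.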
---

# ANOVA Power

- `Superpower`

.pull-left[

```r
library(Superpower)
string <- "2b*2b"
n <- 100
# We are thinking of running 100 people in each condition
mu <- c(7.25, 6.38, 7.38, 7)
# Enter means in the order that matches the labels below:
# male/Disfluency, male/Fluency, female/Disfluency, female/Fluency
sd <- c(1.76, 1.80, 2.63, 1.54)
labelnames <- c("Gender", "male", "female", "Flu", "Disfluency", "Fluency")
# the label names should be in the order of the means specified above.
```

]

.pull-right[

```r
design_result <- ANOVA_design(design = string,
                              n = n,
                              mu = mu,
                              sd = sd,
                              labelnames = labelnames)
```

<img src="More_LM_interactions_files/figure-html/unnamed-chunk-87-1.png" width="100%" style="display: block; margin: auto;" />

]

---

# Run Simulation

```r
# 100 simulations is not enough for stable power estimates
simulation_result <- ANOVA_power(design_result,
                                 alpha_level = .05,
                                 nsims = 100,
                                 verbose = FALSE)
simulation_result
```

```
## Power and Effect sizes for ANOVA tests
##                  power effect_size
## anova_Gender        45    0.010939
## anova_Flu           88    0.026972
## anova_Gender:Flu    28    0.007015
##
## Power and Effect sizes for pairwise comparisons (t-tests)
##                                                           power effect_size
## p_Gender_male_Flu_Disfluency_Gender_male_Flu_Fluency         94    -0.49412
## p_Gender_male_Flu_Disfluency_Gender_female_Flu_Disfluency    10     0.05204
## p_Gender_male_Flu_Disfluency_Gender_female_Flu_Fluency       21    -0.15855
## p_Gender_male_Flu_Fluency_Gender_female_Flu_Disfluency       86     0.44305
## p_Gender_male_Flu_Fluency_Gender_female_Flu_Fluency          79     0.37062
## p_Gender_female_Flu_Disfluency_Gender_female_Flu_Fluency     26    -0.17665
```

---

# 2 x 2 ANOVA Assumptions

- Remember to check normality and homogeneity of variance before interpreting the results

```r
mod <- lm(Quiz~Gender*Fluency, data=gen)
check_normality(mod)   # normality of residuals
check_homogeneity(mod) # homogeneity of variance across cells
```

---

# Reporting Results

I would report the three effects from this model as follows:

- Main effect 1
- Main effect 2
- Interaction
- *M* and *SD* for main effects
- Significance tests (F, degrees of freedom, p, effect size)
- If interaction is significant, simple effects analysis
- Interpretation of simple effects
- Figure visualizing either the main effects (if interaction is not significant) or interaction

---

# Thoughts on Interactions

1. Avoid them if not theoretically motivated
  - Use contrasts when you can
  - Use model comparison approach to determine if interactions are warranted
2. Testing interactions requires lots of data
  - Power is worse in observational studies compared to experimental paradigms
3. Center continuous variables and sum code categorical predictors
4. For non-simple designs (larger than 2 x 2) use `afex::aov_ez` (uses Type III SS)

---

# Class Activity

Do dress attire and tattoos influence how long a person will interact with a stranger asking for directions?

- Time (in s): How long the person interacted
- Tattoos: 0 (no tat visible) 1 (tat visible)
- Dress: 0 (casual dress) 1 (professional dress)

```r
data=read_csv("https://raw.githubusercontent.com/jgeller112/psy503-psych_stats/master/static/slides/13_Interactions/data/tats.csv")
```

- Conduct 2 x 2 ANOVA (discuss main effects and interaction effects)
- Follow-up interaction with simple effects analysis
- Create a figure of results