Problem Set 2 grades posted
Problem Set 3 will be posted later today
Data for the final project needs to be approved by October 31st
NHST
We can only falsify a theory
One- and two-sided hypotheses
One sample tests
Two sample t-tests
Independent
Dependent (paired)
Non-parametric
Multiple Comparisons
Simple experiments:
For example: manipulation of the independent variable involves having an experimental condition and a control
This situation can be analyzed with a t-test
We can also use t-tests to analyze any binary independent variable
The t-test is a simple regression model with one categorical predictor
Don't make a continuous variable categorical just so you can do a t-test
People used to split variables into high versus low or simply split down the middle
You separate the people who are close together and lump them with people who are not really like them
Effect sizes get smaller
You will also lose power, which raises the risk of Type II errors
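A small simulation can make the cost of dichotomizing concrete. The sketch below uses made-up data (the sample size, the true correlation of about .50, and the variable names are all assumptions, not part of the lecture data) and compares the correlation with the outcome before and after a median split.

```r
# Hypothetical simulation (not the lecture data): how a median split
# weakens the relationship between a continuous predictor and an outcome.
set.seed(123)
n <- 10000
x <- rnorm(n)                                   # continuous predictor
y <- 0.5 * x + rnorm(n, sd = sqrt(1 - 0.5^2))   # outcome, true correlation about .50

# Dichotomize x at the median ("high" = 1 versus "low" = 0)
x_split <- as.numeric(x > median(x))

r_continuous <- cor(x, y)        # correlation using the full variable
r_split      <- cor(x_split, y)  # correlation after the median split

round(c(continuous = r_continuous, median_split = r_split), 2)
```

The split version should show a noticeably smaller correlation, which is the "effect sizes get smaller" point from the list above.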
Between subjects / Independent designs
Repeated measures / within subjects / dependent designs
Independent t-test:
Compares two means based on independent data
Used when different participants were assigned to each condition of the study
Dependent t-test:
Compares two means based on related data
Used when the same participants took part in both conditions of the study
William Sealy Gosset developed the t-test while working for Guinness, publishing under the pseudonym "Student"
Small samples: more conservative test
t-distribution has fatter tails
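Dividing by N - 1 gives an unbiased estimate of the population variance (we are not going to concern ourselves with the proof). We are estimating the population SD from the sample; if we divided by N instead, the estimate would be biased (too small).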
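Both points above can be checked numerically. This is a minimal sketch: the degrees of freedom, the sample size of 5, and the population variance of 4 are illustrative assumptions chosen for the demonstration.

```r
# 1) Fatter tails: two-sided 5% critical values for t versus the normal.
dfs    <- c(5, 11, 30, 100)
t_crit <- qt(0.975, dfs)   # shrinks toward the normal value as df grows
z_crit <- qnorm(0.975)
round(rbind(df = dfs, t_crit = t_crit), 3)
round(z_crit, 3)

# 2) N versus N - 1: draw many small samples from a population with a
#    known variance (hypothetical value) and average the two estimators.
set.seed(42)
n        <- 5
true_var <- 4
samples  <- replicate(20000, rnorm(n, mean = 0, sd = sqrt(true_var)))

var_n_minus_1 <- apply(samples, 2, var)        # var() divides by N - 1
var_n         <- var_n_minus_1 * (n - 1) / n   # same sums of squares, divided by N

c(mean_var_n_minus_1 = mean(var_n_minus_1), mean_var_n = mean(var_n))
```

The critical t values sit above the normal value for every df, and the average of the N - 1 estimator lands near the true variance while the N version comes out too small.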
Are invisible people mischievous?
Manipulation
Placed participants in an enclosed community riddled with hidden cameras
12 participants were given an invisibility cloak
12 participants were not given an invisibility cloak
Outcome measured how many mischievous acts participants performed in a week
library(rio)
library(tidyverse)
library(easystats)
library(kableExtra)
longdata <- read_csv("https://raw.githubusercontent.com/doomlab/statsofdoom-files/master/graduate/R%20Flip/11_ttests/data/invisible.csv")
head(longdata)
## # A tibble: 6 × 2
##   Cloak    Mischief
##   <chr>       <dbl>
## 1 No Cloak        3
## 2 No Cloak        1
## 3 No Cloak        5
## 4 No Cloak        4
## 5 No Cloak        6
## 6 No Cloak        4
H0: The no cloak and cloak groups would have the same mean
H1: The no cloak and cloak groups would have different means
M <- tapply(longdata$Mischief, longdata$Cloak, mean)
STDEV <- tapply(longdata$Mischief, longdata$Cloak, sd)
N <- tapply(longdata$Mischief, longdata$Cloak, length)
M; STDEV; N
##    Cloak No Cloak 
##     5.00     3.75
##    Cloak No Cloak 
## 1.651446 1.912875
##    Cloak No Cloak 
##       12       12
Our means appear slightly different. What might have caused those differences?
Variance created by our manipulation: The cloak (systematic variance)
Variance created by unknown factors (unsystematic variance)
If the samples come from the same population, then we expect their means to be roughly equal
Although it is possible for their means to differ by chance alone, here, we would expect large differences between sample means to occur very infrequently
We compare the difference between the sample means that we collected to the difference between the sample means that we would expect to obtain if there were no effect (i.e. if the null hypothesis were true)
We use the standard error as a gauge of the variability between sample means
If the difference between the samples we have collected is larger than what we would expect based on the standard error then we can assume one of two interpretations:
There is no effect: sample means in our population fluctuate a lot and we have, by chance, collected two samples that are atypical of the population from which they came (Type I error)
The two samples come from different populations but are typical of their respective parent population. In this scenario, the difference between samples represents a genuine difference between the populations (and so the null hypothesis is incorrect)
The larger the observed difference between the sample means, the more confident we become that the second explanation is correct (i.e., that the null hypothesis should be rejected)
If the null hypothesis is incorrect, then we gain confidence that the two sample means differ because of the different experimental manipulation imposed on each sample
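The logic above can be sketched by simulating a world in which the null hypothesis is true. The population mean and SD below are illustrative assumptions (they are not estimated from the lecture data); the point is only to see how often chance alone yields a difference as large as the observed 1.25.

```r
# Hypothetical null world: both groups of 12 come from the SAME population.
set.seed(2023)
n_per_group <- 12
pop_mean    <- 4.4   # assumed value for illustration
pop_sd      <- 1.8   # assumed value for illustration

null_diffs <- replicate(10000, {
  g1 <- rnorm(n_per_group, pop_mean, pop_sd)
  g2 <- rnorm(n_per_group, pop_mean, pop_sd)
  mean(g1) - mean(g2)
})

# Spread of the differences: an empirical standard error of the difference
se_empirical <- sd(null_diffs)
# Theoretical value for comparison: sqrt(sd^2 / n + sd^2 / n)
se_theory <- sqrt(pop_sd^2 / n_per_group + pop_sd^2 / n_per_group)

# How often does chance alone produce a difference at least as large as 1.25?
prop_extreme <- mean(abs(null_diffs) >= 1.25)

round(c(se_empirical = se_empirical, se_theory = se_theory,
        prop_extreme = prop_extreme), 3)
```

Under these assumptions the standard error acts as the yardstick: differences much larger than it are rare when both samples share one population.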
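$$t = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$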
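As a worked example, the formulas above can be applied to the group means, standard deviations, and sample sizes reported earlier for the cloak data. This is a sketch that substitutes the pooled variance for both group variances (Student's version); the variable names are my own.

```r
# Summary statistics reported above for the invisibility cloak data
m1 <- 5.00; s1 <- 1.651446; n1 <- 12   # Cloak
m2 <- 3.75; s2 <- 1.912875; n2 <- 12   # No Cloak

# Pooled variance: a weighted average of the two sample variances
sp2 <- ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)

# t statistic under H0 (mu1 - mu2 = 0), using the pooled variance
t_pooled <- (m1 - m2) / sqrt(sp2 / n1 + sp2 / n2)

# Two-sided p-value on n1 + n2 - 2 degrees of freedom
df_pooled <- n1 + n2 - 2
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)

round(c(t = t_pooled, df = df_pooled, p = p_pooled), 3)
```

Because the two groups are the same size, the pooled and unpooled standard errors coincide here, so either form of the denominator gives the same t.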
Assumptions:
longdata %>% drop_na()
## # A tibble: 24 × 2
##    Cloak    Mischief
##    <chr>       <dbl>
##  1 No Cloak        3
##  2 No Cloak        1
##  3 No Cloak        5
##  4 No Cloak        4
##  5 No Cloak        6
##  6 No Cloak        4
##  7 No Cloak        6
##  8 No Cloak        2
##  9 No Cloak        0
## 10 No Cloak        5
## # … with 14 more rows
## # ℹ Use `print(n = ...)` to see more rows
library(rstatix)
longdata %>%
  group_by(Cloak) %>%
  identify_outliers(Mischief)
## [1] Cloak      Mischief   is.outlier is.extreme
## <0 rows> (or 0-length row.names)
longdata %>% group_by(Cloak) %>% shapiro_test(Mischief)
## # A tibble: 2 × 4
##   Cloak    variable statistic     p
##   <chr>    <chr>        <dbl> <dbl>
## 1 Cloak    Mischief     0.973 0.936
## 2 No Cloak Mischief     0.913 0.231
Normality
library(ggpubr)
# Draw a QQ plot by group
g = ggqqplot(longdata, x = "Mischief", facet.by = "Cloak")
The most common problem is lack of homogeneity
longdata %>% levene_test(Mischief~Cloak)
## # A tibble: 1 × 4
##     df1   df2 statistic     p
##   <int> <int>     <dbl> <dbl>
## 1     1    22     0.270 0.609
Welch's t-test gives an answer equivalent to the traditional t-test when sample sizes and variances are equal, BUT it can also handle unequal sample sizes and variances
Same t-statistic calculation and t-distribution, just have to apply correction for degrees of freedom (df)
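$$df = \frac{\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)^2}{\frac{\left(\sigma_1^2 / n_1\right)^2}{n_1 - 1} + \frac{\left(\sigma_2^2 / n_2\right)^2}{n_2 - 1}}$$
$$A = \frac{s_1^2}{n_1} \quad \text{and} \quad B = \frac{s_2^2}{n_2}$$
$$df = \frac{(A + B)^2}{\frac{A^2}{n_1 - 1} + \frac{B^2}{n_2 - 1}}$$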
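To see the df correction at work, the sketch below plugs the standard deviations and sample sizes reported earlier for the cloak data into the A and B form of the formula (variable names are my own).

```r
# Sample SDs and sizes reported above for the invisibility cloak data
s1 <- 1.651446; n1 <- 12   # Cloak
s2 <- 1.912875; n2 <- 12   # No Cloak

A <- s1^2 / n1
B <- s2^2 / n2

# Welch-corrected degrees of freedom
df_welch <- (A + B)^2 / (A^2 / (n1 - 1) + B^2 / (n2 - 1))

# Student's (uncorrected) degrees of freedom for comparison
df_student <- n1 + n2 - 2

round(c(welch = df_welch, student = df_student), 2)
```

The corrected df is only slightly below 22 here because the two groups have equal sizes and similar variances; the correction bites harder as the groups become more unequal.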
library(report)
# Student's t-test
d_ind <- t.test(Mischief ~ Cloak,
                data = longdata,
                var.equal = TRUE,  # assume equal variances
                paired = FALSE)    # independent
# Welch's t-test (overwrites d_ind; this is the version reported below)
d_ind <- t.test(Mischief ~ Cloak,
                data = longdata,
                var.equal = FALSE, # assume unequal variances
                paired = FALSE)    # independent
No difference between groups was found: t(22) = 1.71, p = .101
The easystats package in R can help write this up for you :)
report(d_ind)
The Welch Two Sample t-test testing the difference of Mischief by Cloak (mean in group Cloak = 5.00, mean in group No Cloak = 3.75) suggests that the effect is positive, statistically not significant, and medium (difference = 1.25, 95% CI [-0.26, 2.76], t(21.54) = 1.71, p = 0.101; Cohen's d = 0.74, 95% CI [-0.14, 1.60])
$$\text{Mischief} = \beta_0 + \beta_1(\text{Cloak}_{\text{No Cloak}}) + \epsilon$$
Categorical variables are dummy coded or treatment coded
In R, the levels of a categorical variable are transformed to 0 and 1
By default, 0 is assigned to whichever level comes first alphabetically
β1 = difference between the two groups
library(tidyverse)
library(broom)
d = lm(Mischief ~ Cloak, data = longdata)
broom::tidy(d)
## # A tibble: 2 × 5
##   term          estimate std.error statistic       p.value
##   <chr>            <dbl>     <dbl>     <dbl>         <dbl>
## 1 (Intercept)       5        0.516      9.69 0.00000000212
## 2 CloakNo Cloak    -1.25     0.730     -1.71 0.101
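The equivalence between the regression and the t-test can be checked on any two-group data set. The sketch below uses a small made-up data frame (the scores and the names `toy`, `group`, and `score` are hypothetical, not the lecture data) and only base R.

```r
# Hypothetical scores (NOT the lecture data), two groups of six
toy <- data.frame(
  group = rep(c("Cloak", "No Cloak"), each = 6),
  score = c(6, 4, 5, 7, 5, 3,   3, 4, 2, 5, 1, 3)
)

# Dummy coding: "Cloak" (first alphabetically) is 0, "No Cloak" is 1
head(model.matrix(score ~ group, data = toy), 3)

fit <- lm(score ~ group, data = toy)
tt  <- t.test(score ~ group, data = toy, var.equal = TRUE)

# Intercept = mean of the reference group; slope = difference between means
coef(fit)
tapply(toy$score, toy$group, mean)

# The slope's t value matches Student's t in size. The sign flips because
# t.test() computes Cloak minus No Cloak, the slope No Cloak minus Cloak.
slope_t <- summary(fit)$coefficients["groupNo Cloak", "t value"]
c(lm_t = slope_t, t_test_t = unname(tt$statistic))
```

The match holds for the equal-variance (Student's) test; Welch's version uses a different standard error and df, so it will generally not agree exactly with `lm()`.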
library(ggplot2)
library(ggpubr)
df.summary <- longdata %>%
  group_by(Cloak) %>%
  summarise(
    sd = sd(Mischief, na.rm = TRUE),
    Mischief = mean(Mischief)
  )
d = ggplot(longdata, aes(Cloak, Mischief)) +
  geom_bar(stat = "identity", data = df.summary,
           fill = NA, color = "black") +
  geom_jitter(position = position_jitter(0.2), color = "black") +
  geom_errorbar(
    aes(ymin = Mischief - sd, ymax = Mischief + sd),
    data = df.summary, width = 0.2) +
  xlab("Invisible Cloak Group") +
  ylab("Average Mischief Acts")
ggstatsplot::ggbetweenstats( data = longdata, x = Cloak, y = Mischief)
An educator believes that new directed reading activities in the classroom will help elementary school students improve some aspects of their reading ability. She arranges for a third grade class of 21 students to take part in these activities for an 8-week period. A control classroom of 23 third graders follows the same curriculum without the activities. At the end of 8 weeks, all students are given a Degree of Reading Power (DRP) test, which measures the aspects of reading ability that the treatment is designed to improve.
treatment = c(24,43,58,71,43,49,61,44,67,49,53,56,59,52,62,54,57,33,46,43,57)
control = c(42,43,55,26,62,37,33,41,19,54,20,85,46,10,17,60,53,42,37,42,55,28,48)
State hypotheses
Check assumptions
Run t.test
Decision/conclusion
Visualize
A math test was given to 300 17-year-old students in 1978 and again to another 350 17-year-old students in 1992.
Group 1: X1 = 300.4, S1 = 34.9, n1 = 300
Group 2: X2 = 306.7, S2 = 30.1, n2 = 350
State hypotheses
Simulate data (rnorm)
Check assumptions
Calculate t and DF correction
Run t.test
Decision/conclusion
n1 = 300
n2 = 350
t.stat = (300.4 - 306.7) / sqrt(34.9^2 / 300 + 30.1^2 / 350)
# df correction (Welch)
# A = s1^2 / n1
# B = s2^2 / n2
A = 34.9^2 / 300
B = 30.1^2 / 350
df = (A + B)^2 / (A^2 / (n1 - 1) + B^2 / (n2 - 1))
df
## [1] 594.7025
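t4 = ((306.7 - 300.4) - (0 - 0)) / (34.9^2 / 300 + 30.1^2 / 350)^(1/2)
# Welch df: both groups contribute a term to the denominator
v4 = (34.9^2 / 300 + 30.1^2 / 350)^2 /
  (34.9^4 / (300^2 * (300 - 1)) + 30.1^4 / (350^2 * (350 - 1)))
alpha4 = 0.01
tcrit4 = qt(alpha4 / 2, v4)
pval4 = 2 * pt(-abs(t4), v4)
abs(t4)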
## [1] 2.443286
group1 <- rnorm(300, mean = 300.4, sd = 34.9)
group2 <- rnorm(350, mean = 306.7, sd = 30.1)
c = t.test(group1, group2, alternative = "two.sided")
Effect sizes were labelled following Cohen's (1988) recommendations.
The Welch Two Sample t-test testing the difference between group1 and group2 (mean of x = 303.72, mean of y = 304.89) suggests that the effect is negative, statistically not significant, and very small (difference = -1.17, 95% CI [-6.48, 4.14], t(566.72) = -0.43, p = 0.665; Cohen's d = -0.03, 95% CI [-0.19, 0.12])
Are invisible people mischievous?
Manipulation
Outcome: We measured how many mischievous acts participants performed in week 1 and week 2
Note: Same data, but instead the study is dependent. Let's see what happens to our t-test
$$t = \frac{\bar{D} - \mu_D}{S_D / \sqrt{N}}$$
$$S_D = \sqrt{\frac{(d_1 - \bar{d})^2 + (d_2 - \bar{d})^2 + \cdots + (d_n - \bar{d})^2}{n - 1}}$$
We are going to use the standard error of the differences rather than the standard error of the difference between two independent means
The standard error of the differences is calculated by subtracting the two sets of scores, calculating the standard deviation of those difference scores, and dividing it by √N
The data screening can be treated in the same fashion
However, homogeneity between groups is not examined, because you do not have separate groups!
The variance is calculated on one difference score, so there is not a homogeneity concern
The cloak and no cloak conditions were different: t(11)=3.80,p=.003
Why is this result different than independent t?
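# Note: newer R releases (4.4.0 and later) may reject `paired` in the
# formula interface; t.test(x, y, paired = TRUE) is the alternative there
d_pair <- t.test(Mischief ~ Cloak,
                 data = longdata,
                 var.equal = TRUE, # ignored in dependent t
                 paired = TRUE)    # dependent t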
The Paired t-test testing the difference of Mischief by Cloak (mean difference = 1.25) suggests that the effect is positive, statistically significant, and large (difference = 1.25, 95% CI [0.53, 1.97], t(11) = 3.80, p = 0.003; Cohen's d = 1.15, 95% CI [0.37, 1.89])
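One way to see why the paired result differs from the independent one is to run both tests on the same numbers. The sketch below uses simulated repeated-measures data (the baseline spread, the 1.25 shift, and the noise SD are assumptions, not the lecture data) in which people differ a lot from one another but each person changes by a similar amount.

```r
# Hypothetical repeated-measures data (NOT the lecture data): 12 people
# measured twice, with stable individual differences between people.
set.seed(1)
person   <- rnorm(12, mean = 4, sd = 2)          # each person's baseline level
no_cloak <- person + rnorm(12, sd = 0.5)
cloak    <- person + 1.25 + rnorm(12, sd = 0.5)  # same people, shifted by 1.25

t_ind  <- t.test(cloak, no_cloak, var.equal = TRUE)  # ignores the pairing
t_pair <- t.test(cloak, no_cloak, paired = TRUE)     # uses the pairing

# The paired test is a one-sample t-test on the difference scores
d <- cloak - no_cloak
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))

c(independent = unname(t_ind$statistic),
  paired      = unname(t_pair$statistic),
  by_hand     = t_by_hand)
```

Both tests share the same numerator (the mean difference). The paired test has the smaller denominator because differencing removes the person-to-person variability, so its t is larger.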
Use the lm to test for significance
Calculate the t value
library(lme4)
library(sjPlot)
# Participant ID: 12 people, each appearing once in each condition
longdata$id <- rep(1:12, times = 2)
d_reg <- lme4::lmer(Mischief ~ Cloak + (1 | id), data = longdata)
Dependent variable | Dependent variable | |||||||
---|---|---|---|---|---|---|---|---|
Predictors | Estimates | CI | p | df | Estimates | CI | p | df |
(Intercept) | 5.00 | 3.89 – 6.11 | <0.001 | 13.45 | ||||
Cloak [No Cloak] | -1.25 | -1.97 – -0.53 | 0.003 | 11.00 | ||||
Mischief | 1.25 | 0.53 – 1.97 | 0.003 | 11.00 | ||||
Random Effects | ||||||||
σ2 | 0.65 | |||||||
τ00 | 2.55 id | |||||||
ICC | 0.80 | |||||||
N | 12 id | |||||||
Observations | 24 | NA | ||||||
Marginal R2 / Conditional R2 | 0.113 / 0.820 | NA |
library(ggstatsplot)
## parametric t-test
p1 <- ggwithinstats(
  data = longdata,
  x = Cloak,
  y = Mischief,
  type = "p",
  effsize.type = "d",
  conf.level = 0.95,
  title = "Cloaks vs. No Cloaks",
  package = "ggsci",
  palette = "nrc_npg"
)
library(raincloudplots)
wide <- longdata %>%
  pivot_wider(names_from = "Cloak", values_from = "Mischief")
df_1x1 <- data_1x1(
  array_1 = wide$Cloak,
  array_2 = wide$`No Cloak`)
raincloud_2 <- raincloud_1x1_repmes(
  data = df_1x1,
  colors = (c('dodgerblue', 'darkorange')),
  fills = (c('dodgerblue', 'darkorange')),
  line_color = 'gray',
  line_alpha = .3,
  size = 1,
  alpha = .6,
  align_clouds = FALSE) +
  scale_x_continuous(breaks = c(1, 2), labels = c("Cloak", "No Cloak"), limits = c(0, 3)) +
  xlab("Invisibility") +
  ylab("Mischief") +
  theme_classic()
raincloud_2
A manufacturer claims it has developed an additive that increases gas mileage. But you are not sure whether the additive will increase or decrease performance. They recruit 10 drivers. Each driver drives a car on a well-conditioned track. They record the gas mileage without any additive, then with additive. Assume α = .05
data5a = c(22,25,17,24,16,29,20,23,19,20)
data5b = c(18,21,16,22,19,24,17,21,23,18)
State hypotheses
Check assumptions
run t.test
Decision/conclusion
Visualize the data
data5diff = data5a - data5b
t5 = (mean(data5diff) - 0) / (sd(data5diff) / (length(data5diff))^(1/2))
tcrit5 = qt(0.05 / 2, length(data5diff) - 1)
pval5 = 2 * pt(-abs(t5), length(data5diff) - 1)
abs(t5)
## [1] 1.714286
t.test(data5a, data5b,paired =TRUE,alternative ="two.sided")
## 
##  Paired t-test
## 
## data:  data5a and data5b
## t = 1.7143, df = 9, p-value = 0.1206
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.5113467  3.7113467
## sample estimates:
## mean difference 
##             1.6
We know the weight of 10 mice before and after a treatment
# Weight of the mice before treatment
before <- c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
# Weight of the mice after treatment
after <- c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
We want to know whether the mean weight differs significantly after treatment. Assume α = .05
State hypotheses
Check assumptions
run t.test
Decision/conclusion
Visualize the data
Sometimes data is non-normal (skewed, bimodal, etc.), or ordinal, so what do we do?
Use Shapiro-Wilk normality test
Can transform data (e.g., log, sqrt, etc.), but these also make assumptions
Robust methods
Mann-Whitney U test (indep)
Wilcoxon (paired)
$$U_1 = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1$$
$$U_2 = n_1 n_2 + \frac{n_2(n_2 + 1)}{2} - R_2$$
$$U = \min(U_1, U_2)$$
female = c(34, 36, 41, 43, 44, 37)
male = c(45, 33, 35, 39, 42)
wilcox.test(male, female, paired = FALSE)
## 
##  Wilcoxon rank sum exact test
## 
## data:  male and female
## W = 14, p-value = 0.9307
## alternative hypothesis: true location shift is not equal to 0
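The U formulas can be worked through by hand on the same male and female scores. In this sketch group 1 is taken to be `male` (matching the argument order passed to `wilcox.test()` above); the intermediate variable names are my own.

```r
female <- c(34, 36, 41, 43, 44, 37)
male   <- c(45, 33, 35, 39, 42)

n1 <- length(male)     # group 1
n2 <- length(female)   # group 2

# Rank all scores together, then sum the ranks within each group
ranks <- rank(c(male, female))
R1 <- sum(ranks[seq_len(n1)])        # rank sum for males
R2 <- sum(ranks[n1 + seq_len(n2)])   # rank sum for females

U1 <- n1 * n2 + n1 * (n1 + 1) / 2 - R1
U2 <- n1 * n2 + n2 * (n2 + 1) / 2 - R2
U  <- min(U1, U2)

c(R1 = R1, R2 = R2, U1 = U1, U2 = U2, U = U)
```

R's `wilcox.test()` reports W as the rank sum of the first sample minus n1(n1 + 1)/2, which corresponds to U2 in the notation used here, so the hand calculation and the printed W agree for these data.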
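$$W = \sum_{i=1}^{N_r} \left[\operatorname{sgn}(x_{2,i} - x_{1,i}) \cdot R_i\right]$$
Where sgn is an indicator variable equal to -1 if the difference is negative and +1 if the difference is positive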
R = Rank
W is then the sum of the positive signed ranks
Exclude pairs where difference equals zero, Nr is the reduced sample size
If W < Wcrit we reject the null (opposite from t-test; always reject if t > tcrit)
G1 = c(125,115,130,140,140,115,140,125,140,135)
G2 = c(110,122,125,120,140,124,123,137,135,145)
wilcox.test(G1, G2, paired = TRUE, alternative = "two.sided")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  G1 and G2
## V = 27, p-value = 0.6353
## alternative hypothesis: true location shift is not equal to 0
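The signed-rank steps listed earlier (drop zero differences, rank the absolute differences, attach signs) can be sketched by hand on G1 and G2. Differences are taken as G1 minus G2 here, which is an assumption about direction that matches the argument order given to `wilcox.test()`.

```r
G1 <- c(125, 115, 130, 140, 140, 115, 140, 125, 140, 135)
G2 <- c(110, 122, 125, 120, 140, 124, 123, 137, 135, 145)

d <- G1 - G2
d <- d[d != 0]       # exclude zero differences (reduced sample size Nr)
r <- rank(abs(d))    # rank the absolute differences; ties get average ranks

V_pos    <- sum(r[d > 0])      # sum of ranks for positive differences
V_neg    <- sum(r[d < 0])      # sum of ranks for negative differences
W_signed <- sum(sign(d) * r)   # signed-rank sum

c(Nr = length(d), V_pos = V_pos, V_neg = V_neg, W_signed = W_signed)
```

The sum of the positive ranks is the V that `wilcox.test()` prints for a paired test, and the signed-rank sum is simply the positive sum minus the negative sum.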
We want our tests to find true positives and true negatives
Multiple comparisons
Type I error (false positive)
α-inflation
Testing each new pairwise comparison is costly
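A quick calculation shows how fast α inflates. The sketch assumes the tests are independent (a simplifying assumption; correlated tests inflate more slowly) and uses an illustrative set of test counts.

```r
alpha <- 0.05
m     <- c(1, 3, 5, 10, 20)    # number of independent tests (illustrative)

# Familywise error rate: P(at least one false positive across m tests)
fwer <- 1 - (1 - alpha)^m

round(data.frame(tests = m, familywise_error = fwer), 3)
```

With only three comparisons the chance of at least one false positive is already close to three times the nominal α, which is the motivation for the corrections that follow.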
α/m
Controls for false positives (Type I errors)
Overly conservative
pvals = c(0.01,0.02,0.04)
p.adjust(pvals, method = "bonferroni", n = length(pvals))
## [1] 0.03 0.06 0.12
Sort p-values from smallest to largest
Test whether p(k) < α / (m + 1 − k), where k is the rank of the p-value and m is the number of tests
If so, reject and move to the next
Typically you report the adjusted p-value. Just multiply your p-value by the adjusted alpha’s denominator
pvals = c(0.01,0.02,0.04)
p.adjust(pvals, method = "holm", n = length(pvals))
## [1] 0.03 0.04 0.04
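The Holm steps can also be written out by hand and compared with `p.adjust()`. This is a minimal sketch using the same three p-values; α = .05 and the helper variable names are assumptions for illustration.

```r
pvals <- c(0.01, 0.02, 0.04)
alpha <- 0.05
m     <- length(pvals)

ord      <- order(pvals)    # smallest to largest
p_sorted <- pvals[ord]
k        <- seq_len(m)      # rank of each sorted p-value

# Step-down thresholds: alpha / (m + 1 - k)
thresholds <- alpha / (m + 1 - k)

# Reject in order until the first failure; everything after it is retained
reject_sorted <- cumsum(p_sorted >= thresholds) == 0

# Adjusted p-values: multiply by (m + 1 - k), keep non-decreasing, cap at 1
p_adj_sorted <- pmin(1, cummax((m + 1 - k) * p_sorted))
p_adj        <- p_adj_sorted[order(ord)]   # back to the original order

data.frame(p = pvals, p_holm = p_adj, reject = reject_sorted[order(ord)])
```

The running maximum is what keeps a later adjusted p-value from dropping below an earlier one, which is why the second and third adjusted values tie.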
In this lecture, you've learned:
Coming Up
Effect size and power
Regression