PSY 503: Foundations of Statistical Methods in Psychological Science

class: center, middle, inverse, title-slide

.title[
# PSY 503: Foundations of Statistical Methods in Psychological Science
]
.subtitle[
## Measurement
]
.author[
### Jason Geller, Ph.D. (he/him/his)
]
.institute[
### Princeton University
]
.date[
### 2022-09-19
]

---

# Knowledge Check

---

<img src="confused-imconfused.gif" width="36%" style="display: block; margin: auto;" />
---
# Today

- What is measurement?

- Scales of measurement

- Visualizing data on different scales
  - Bar graphs
  - Histograms
  - Raincloud plots

- Reliability and validity

- Measurement in practice: listening effort
---
class: inverse center middle

> "Whatever exists at all exists in some amount. To know it thoroughly involves knowing its quantity as well as its quality."
  -Edward L. Thorndike
  
???

Welcome back, gang. Today we are going to talk about measurement. Measurement is probably the most fundamental part of doing science. For instance, as a cognitive scientist I am interested in learning and memory. If I wanted to know how learning increases over time, or how many items are stored in memory and for how long, I would have to measure it. Knowing how to measure these things is critical to knowing about the things themselves.

---
# What is measurement?

> The assignment of scores so that the scores represent some characteristic of the individuals

???

A formal definition from your book is: 
---
# What things do we want to measure?

.pull-left[

- Depression
- Effort
- Intelligence
- Memory
- Social support
- Extroversion
- Eating behavior
- Parent child relationships
- Attention
- Burn out
- Hopelessness

]

.pull-right[
<img src="poppins.png" width="50%" style="display: block; margin: auto;" />
]

???

A lot of what we want to study in the cognitive sciences cannot be measured easily because it defies direct observation. We could use a measuring tape to get our heights, but we could not use it to measure personality, unless we are Mary Poppins. To examine constructs, psychologists do many things: they observe behavior, collect ratings, give people tasks to do, and ask them to fill out surveys.

---
# Constructs ≠ Variables

???

Constructs are big, broad mental abstractions.

We call these latent constructs.

To study them we need ways to operationalize them by turning them into numbers. For depression, we can base it on a score, a rating, or self-report. We care about the scores because we think they tell us about the construct.

This is super important to know: constructs are not variables. We use variables to get at the constructs. This is hard to keep in mind because the media talks about constructs all the time.
---
<img src="Slide1.png" width="100%" style="display: block; margin: auto;" />
---
# Scales of Measurement

- Variables are defined and categorized four ways:

  1. *N*ominal

  2. *O*rdinal

  3. *I*nterval

  4. *R*atio

---
# Nominal

- Nominal ≈ name ≈ categorical

- Arbitrary numbers assigned to categories

- No rank order

- Presented as frequency counts

- Gender/sex, eye color
---
# Ordinal
--

- Nominal plus:

  - Scores indicate rank
--

- Increase/decrease is meaningful

- Comparative information (1 > 2 > 3)

- No magnitude information (the gap between 1st and 2nd need not equal the gap between 2nd and 3rd)

- **Most statistical techniques inappropriate**

---
# Likert Scales
<img src="pickel.jpeg" width="70%" style="display: block; margin: auto;" />
---
# Olympic Example
<img src="gold.png" width="70%" style="display: block; margin: auto;" />
---
# Interval

- Differences in adjacent intervals are equal `\((2-1)=(4-3)=(22-21)\)`

- Numeric, but lacks true 0

- Lacks multiplicative properties

- Examples: 
  - Temperature (°C or °F)
  - Time of day (clock time)
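
A small sketch of why ratios are not meaningful on an interval scale, using temperature. The two readings are made-up values; the point is that Celsius has an arbitrary zero while Kelvin has a true zero.

```r
# Hypothetical temperature readings in degrees Celsius (interval scale)
temp_c <- c(cool = 10, warm = 20)

# Differences are meaningful on an interval scale
temp_diff <- unname(temp_c["warm"] - temp_c["cool"])   # 10 degrees

# The Celsius ratio suggests "twice as hot", but the zero point is arbitrary
ratio_c <- unname(temp_c["warm"] / temp_c["cool"])     # 2

# Kelvin has a true zero, so its ratio is the meaningful one
temp_k  <- temp_c + 273.15
ratio_k <- unname(temp_k["warm"] / temp_k["cool"])     # about 1.035

c(difference = temp_diff, ratio_celsius = ratio_c, ratio_kelvin = ratio_k)
```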
---
# Ratio
--

- All properties of interval plus:

  - True zero

  - Ratios are meaningful
  
--

- Examples: 
  - Reaction times
  - Test Performance
  - Other examples?

---
# Olympic Example
<img src="gold.png" width="70%" style="display: block; margin: auto;" />
---
# Considerations for Scaling

--
- Information content

- Nominal -> Ordinal -> Interval -> Ratio
  
  - Can always go backwards but never forward
  
  - Higher level = higher sensitivity = higher power
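
One way to see "backwards but never forward" is to downgrade a ratio variable step by step. This is only a sketch: the reaction times and the cut points (450 ms, 600 ms) are invented for illustration.

```r
# Hypothetical reaction times in milliseconds (ratio scale)
rt_ms <- c(420, 515, 610, 380, 705, 450)

# Ratio -> ordinal: keep only the rank-ordered speed band
rt_band <- cut(rt_ms,
               breaks = c(0, 450, 600, Inf),
               labels = c("fast", "medium", "slow"),
               ordered_result = TRUE)

# Ordinal -> nominal: keep only category membership, dropping the order
rt_group <- factor(rt_band, ordered = FALSE)

table(rt_band)
```

Once only `rt_band` is stored, the original millisecond values cannot be recovered, which is the information lost by moving down a level.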
---
# Visualizing Scales of Measurement

- Bar charts

- Nominal
  
<img src="04_Measurement_files/figure-html/unnamed-chunk-9-1.png" width="90%" />
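
A minimal base R sketch of a bar chart for nominal data. The eye-color responses are invented, and this is not necessarily the code that produced the figure above.

```r
# Hypothetical eye-color responses (nominal scale)
eye_color <- c("brown", "blue", "brown", "green", "brown", "blue", "hazel")

# Nominal data are summarized as frequency counts per category
eye_counts <- table(eye_color)

# One bar per category; the order of the bars carries no meaning
barplot(eye_counts, xlab = "Eye color", ylab = "Count")
```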
---
# Visualizing Data

- Bar charts

- Ordinal
  
<img src="04_Measurement_files/figure-html/unnamed-chunk-10-1.png" width="90%" style="display: block; margin: auto;" />
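
For ordinal data the bars should follow the rank order of the categories. A sketch with invented responses to a single agreement item, using an ordered factor in base R:

```r
# Hypothetical responses to a single agreement item (ordinal scale)
responses <- c("Agree", "Neutral", "Disagree", "Agree", "Agree", "Neutral")

# An ordered factor stores the rank order of the categories
agreement <- factor(responses,
                    levels = c("Disagree", "Neutral", "Agree"),
                    ordered = TRUE)

# table() follows the level order, so the bars run from low to high
agreement_counts <- table(agreement)
barplot(agreement_counts, xlab = "Response", ylab = "Count")

# Rank comparisons are allowed; distances between levels are not defined
agreement[1] > agreement[2]
```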
---
# Visualizing Data

- Histograms

- Continuous variables

  - Distribution of data in terms of frequency count

<img src="04_Measurement_files/figure-html/unnamed-chunk-11-1.png" width="100%" height="100%" style="display: block; margin: auto;" />
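
A sketch of how a histogram turns a continuous variable into frequency counts. The reaction times and the 50 ms bin width are assumptions chosen for illustration.

```r
# Hypothetical reaction times in milliseconds (continuous variable)
rt_ms <- c(312, 348, 355, 371, 389, 402, 410, 433, 467, 521)

# hist() bins the values and counts how many fall in each bin
h <- hist(rt_ms, breaks = seq(300, 550, by = 50),
          xlab = "Reaction time (ms)", main = "")

# The bar heights are the frequency counts per bin
h$counts
```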
---
# Visualizing Data

- Density plots

- Alternative to histogram

- Helps visualize distribution of data
  
<img src="04_Measurement_files/figure-html/unnamed-chunk-12-1.png" width="100%" height="100%" style="display: block; margin: auto;" />
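
A density plot can be sketched in base R with `density()`. The scores here are simulated, and the seed, mean, and standard deviation are arbitrary choices.

```r
set.seed(503)  # arbitrary seed so the simulated example is reproducible

# Simulated scores standing in for real data
scores <- rnorm(200, mean = 100, sd = 15)

# density() returns a smoothed estimate of the distribution
d <- density(scores)
plot(d, xlab = "Score", main = "")

# Unlike histogram counts, the area under the curve is approximately 1
area <- sum(d$y) * diff(d$x[1:2])
area
```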
---
# Visualizing Data

- Boxplot

<img src="boxplot-outliers.png" width="70%" style="display: block; margin: auto;" />
---
# Boxplot

<img src="04_Measurement_files/figure-html/unnamed-chunk-14-1.png" width="100%" height="100%" style="display: block; margin: auto;" />
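
The numbers behind a boxplot can be inspected with `boxplot.stats()`. This sketch uses invented scores with one extreme value and relies on R's default rule (`coef = 1.5`) for flagging outliers.

```r
# Hypothetical scores with one extreme value
scores <- c(2, 3, 3, 4, 4, 5, 5, 6, 20)

# boxplot.stats() returns the five numbers drawn in a boxplot:
# lower whisker, lower hinge, median, upper hinge, upper whisker
bs <- boxplot.stats(scores)
bs$stats

# By default, points more than 1.5 box lengths beyond the hinges
# are flagged and drawn individually
bs$out

boxplot(scores, horizontal = TRUE, xlab = "Score")
```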
---
<img src="rain.jpeg" width="50%" style="display: block; margin: auto;" />
---
# Visualizing Data

- Raincloud Plots

- Density plot, box plot, raw data points all in one
  - https://wellcomeopenresearch.org/articles/4-63
<img src="04_Measurement_files/figure-html/unnamed-chunk-16-1.png" width="100%" height="100%" style="display: block; margin: auto;" />
---
# Reliability

> How consistent or how precise a measure/method is

- Test-retest (over time)

- Internal (across items)

- Inter-rater (between different researchers)
  
---
# Reliability

> Consistency of a measure:

- Test-Retest (across time)
  
<img src="testretest.png" width="50%" style="display: block; margin: auto;" />
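
Test-retest reliability is commonly summarized as the correlation between scores from two occasions. A sketch with invented scores for five people:

```r
# Hypothetical scores for five people tested on two occasions
time1 <- c(10, 12, 14, 16, 18)
time2 <- c(11, 11, 14, 17, 17)

# A high positive correlation means people kept their relative standing
test_retest_r <- cor(time1, time2)
round(test_retest_r, 2)
```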
---
# Reliability

> Consistency of a measure:

- Test-retest (across time)

- Internal (across items)

  1. I love Halloween. **Agree**
  2. I feel happy when I decorate my house for Halloween. **Agree**
  3. I feel angry when the Halloween season is approaching. **Disagree**
  
- Go here: https://forms.gle/x53hpZWhUwDmRW3M8
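
Item 3 is worded in the opposite direction from items 1 and 2, so it would typically be reverse-scored before checking internal consistency. A sketch, assuming a 1-5 agreement scale and invented responses (the column names are also hypothetical):

```r
# Hypothetical responses on an assumed 1-5 agreement scale
halloween <- data.frame(
  love  = c(5, 4, 2, 5),   # "I love Halloween."
  happy = c(4, 5, 1, 4),   # "I feel happy when I decorate my house for Halloween."
  angry = c(1, 2, 5, 1)    # "I feel angry when the Halloween season is approaching."
)

# Reverse-score the negatively worded item: 1 <-> 5, 2 <-> 4, 3 stays 3
halloween$angry_rev <- 6 - halloween$angry

# After reversing, all three items point in the same direction
item_cors <- cor(halloween[, c("love", "happy", "angry_rev")])
item_cors
```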
---
# Cronbach's Alpha
<center>
`\(\alpha = \frac{N \bar{c}}{\bar{v} + (N-1) \bar{c}}\)`
Where:

`\(N\)` = the number of items.
`\(\bar{c}\)` = average covariance between item pairs.
`\(\bar{v}\)` = average variance.
</center>

```r
library(ltm)  # provides cronbach.alpha()

# Enter survey responses as a data frame:
# one row per respondent, one column per item
survey <- data.frame(Q1 = c(1, 2, 2, 3, 2, 2, 3, 3, 2, 3),
                     Q2 = c(1, 1, 1, 2, 3, 3, 2, 3, 3, 3),
                     Q3 = c(1, 1, 2, 1, 2, 3, 3, 3, 2, 3))

# Calculate Cronbach's alpha across the three items
cronbach.alpha(survey)
```
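
The formula can also be worked through by hand in base R. This sketch reuses the same responses; the names `items`, `v_bar`, and `c_bar` are introduced here only to mirror the symbols in the formula.

```r
# Same responses as in the example above
items <- data.frame(Q1 = c(1, 2, 2, 3, 2, 2, 3, 3, 2, 3),
                    Q2 = c(1, 1, 1, 2, 3, 3, 2, 3, 3, 3),
                    Q3 = c(1, 1, 2, 1, 2, 3, 3, 3, 2, 3))

cov_mat <- cov(items)                          # item variances on the diagonal
n_items <- ncol(items)                         # N
v_bar   <- mean(diag(cov_mat))                 # average item variance
c_bar   <- mean(cov_mat[upper.tri(cov_mat)])   # average inter-item covariance

alpha <- (n_items * c_bar) / (v_bar + (n_items - 1) * c_bar)
round(alpha, 3)   # 0.773
```

The result should agree with the unstandardized alpha that `cronbach.alpha()` reports by default.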
---
# Cronbach's Alpha

| Cronbach's alpha | Internal consistency |
|:-----------------|:---------------------|
| 0.9 ≤ α          | Excellent            |
| 0.8 ≤ α < 0.9    | Good                 |
| 0.7 ≤ α < 0.8    | Acceptable           |
| 0.6 ≤ α < 0.7    | Questionable         |
| 0.5 ≤ α < 0.6    | Poor                 |
| α < 0.5          | Unacceptable         |
---
# Reliability

> Consistency of a measure:
  
  - Inter-rater (across different researchers)
---
# Bank Robbery

- Eyewitness memory plays an important role in helping police solve crimes. However, how accurately people recall what they saw can substantially affect whether a criminal is convicted and, equally, whether an innocent person is wrongfully convicted. So, it’s important to get it right.

- First, let’s find out how well you can remember what happens during a bank robbery.
<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/1TkSy_e5WTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>

---
# Viewing #1

- You have now viewed a clip of a simulated bank robbery.

Take a few minutes to write down your description of the main offender.

---
# Work with partner

Compare your description of the main offender with your partner's, and count the details on which you agree and disagree.

- Inter-observer agreement = (# agreements × 100) / (# agreements + # disagreements)

For example, if there were 5 agreements and 3 disagreements:

1. [a] 5 × 100 = 500
2. [b] 5 + 3 = 8
3. [c] 500 / 8 = 62.5% inter-observer agreement
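
The same arithmetic as a small R helper. The function name `percent_agreement` is made up for this sketch.

```r
# Percent agreement between two observers
percent_agreement <- function(agreements, disagreements) {
  100 * agreements / (agreements + disagreements)
}

percent_agreement(agreements = 5, disagreements = 3)   # 62.5
```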

---
# Group Discussion

- What sort of percentage agreements did we get?
- What do you think these percentages tell us about the reliability of the instructions you were given?
---
# Viewing #2

- We will watch the video again. 
- Then you will get new instructions for describing the offender. 
- Then you will work with your partner to calculate inter-rater reliability again

<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/1TkSy_e5WTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>
---
# Viewing #2

Using the checklist below, describe the main offender:

- Were they male or female?
- What was their hair colour?
- What was their skin colour?
- What colour were their eyes?
- What was the colour of their shirt?
- Were they wearing a jacket? If so, what colour was it?
- Were they wearing long pants, jeans, or shorts?
- What colour were their pants/jeans/shorts?
- Were they wearing glasses?
- Were they wearing a hat?
- Were they wearing a balaclava?
- Did they have a gun?
- Did they have a knife?
- Were they carrying anything? If yes, what was it?

---
# Work with partner
Compare your checklist answers with your partner's, and count the items on which you agree and disagree.

- Inter-observer agreement = (# agreements × 100) / (# agreements + # disagreements)

- What percentage agreement did you end up with? What do you think it says about the reliability of the instructions you were given?

> For example, if there were 5 agreements and 3 disagreements:
>
> - [a] 5 × 100 = 500
> - [b] 5 + 3 = 8
> - [c] 500 / 8 = 62.5% inter-observer agreement
 
<div class="countdown" id="timer_a44f117c" data-update-every="1" tabindex="0" style="right:0;bottom:0;">
<div class="countdown-controls"><button class="countdown-bump-down">−</button><button class="countdown-bump-up">+</button></div>
<code class="countdown-time"><span class="countdown-digits minutes">05</span><span class="countdown-digits colon">:</span><span class="countdown-digits seconds">00</span></code>
</div>

---
# Viewing #3
<center>
<iframe width="560" height="315" src="https://www.youtube.com/embed/1TkSy_e5WTg" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
</center>
---
Using the checklist below, describe the main offender:

- Were they male or female?
- What was their hair colour?
- What was their skin colour?
- What colour were their eyes?
- What was the colour of their shirt?
- Were they wearing a jacket? If so, what colour was it?
- Were they wearing long pants, jeans, or shorts?
- What colour were their pants/jeans/shorts?
- Were they wearing glasses?
- Were they wearing a hat?
- Were they wearing a balaclava?
- Did they have a gun?
- Did they have a knife?
- Were they carrying anything? If yes, what was it?

---
# On your own this time

Now, using the same formula, calculate how well you agree with yourself (test-retest reliability). That is, what is the level of correspondence between your observations at Viewing #2 and your observations at Viewing #3?

- Test-retest agreement = `(# agreements × 100) / (# agreements + # disagreements)`
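
If the checklist answers are typed in, the agreements can be counted item by item. A sketch with invented answers for five of the checklist items; the vector names are hypothetical.

```r
# Hypothetical checklist answers from the same observer at two viewings
viewing2 <- c(sex = "male", hair = "brown", shirt = "blue", hat = "no", gun = "yes")
viewing3 <- c(sex = "male", hair = "black", shirt = "blue", hat = "no", gun = "yes")

# An item counts as an agreement when both answers match
n_agree    <- sum(viewing2 == viewing3)
n_disagree <- sum(viewing2 != viewing3)

retest_agreement <- 100 * n_agree / (n_agree + n_disagree)
retest_agreement   # 80
```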

---
# Group Discussion

How do these results compare with the results for the inter-rater reliability calculations, i.e., are they different? In what way? Any ideas why or why not?

---
# Validity

- Construct Validity

> Are you measuring what you want to measure?

- Teaching evaluations

---
# Construct Validity

- Face validity
  
>  The extent to which a measurement method appears “on its face” to measure the construct of interest

- Criterion or convergent validity
  
> The extent to which people’s scores on a measure are correlated with other variables (known as criteria) that one would expect them to be correlated with.

- Discriminant or divergent validity
  
> The extent to which scores on a measure of a construct are not correlated with measures of other, conceptually distinct, constructs and thus discriminate between them.
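
Both kinds of evidence come down to inspecting correlations. A simulated sketch, where the sample size, noise level, and variable names are all assumptions made for illustration:

```r
set.seed(503)  # arbitrary seed for a reproducible simulation
n <- 200

# Simulated latent construct and two noisy measures of it
construct <- rnorm(n)
measure_a <- construct + rnorm(n, sd = 0.5)
measure_b <- construct + rnorm(n, sd = 0.5)

# A measure of a conceptually distinct construct
unrelated <- rnorm(n)

# Convergent evidence: two measures of the same construct correlate highly
r_convergent <- cor(measure_a, measure_b)

# Discriminant evidence: correlation with the distinct measure is near zero
r_discriminant <- cor(measure_a, unrelated)

round(c(convergent = r_convergent, discriminant = r_discriminant), 2)
```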
---
background-image: url(listeningeffort.png)
background-position: center
background-size: cover
---
# Listening Effort

- Conceptual definition

> The deliberate allocation of mental resources to overcome obstacles in goal pursuit when carrying out a listening task

<img src="list.png" width="70%" style="display: block; margin: auto;" />
---
# Operationalization

- How do we measure "accuracy"?

<img src="acc.png" width="70%" style="display: block; margin: auto;" />
---
# Operationalization

- How do we measure "effort"?

- Self-report:

“How hard did you have to work mentally to accomplish your level of performance?”

---
# How do we measure "effort"?

.pull-left[

- Physiological measures

  - Pupil size
  - Skin conductance (GSR)
  - Heart rate
]

--
.pull-right[
<img src="pupil.png" width="70%" style="display: block; margin: auto;" />
]
---
# How do we measure "effort"?

.pull-left[

- Behavioral measures

  - Recall
]
.pull-right[
<img src="behaveeffort.png" width="100%" style="display: block; margin: auto;" />
]

---
# How do we measure "effort"?

.pull-left[

- Behavioral measures

  - Dual-task
]

.pull-right[
<img src="dual.png" width="100%" style="display: block; margin: auto;" />
]
---
# How many measures of listening effort are there in the literature?

- 27!
---
# Measuring Listening Effort (Strand et al., 2018)

<img src="corr1.png" width="60%" style="display: block; margin: auto;" />
--

- *r* ≈ .2
---
# Jingle and Jangle Fallacy (Thorndike, 1904; Kelley, 1927; Flake & Fried, 2020)

- Jingle

> Falsely assuming that two tasks measure the same construct because they have the same name

---
# Measuring Listening Effort (Strand et al., 2018)

---
# Jingle and Jangle Fallacy (Thorndike, 1904; Kelley, 1927; Flake & Fried, 2020)

- Jingle

> Falsely assuming that two tasks measure the same construct because they have the same name

- Jangle

> Falsely assuming that two tasks measure different constructs because they have different names
---

# Key Points

- Constructs ≠ variables

- Constructs can be measured in multiple ways, which may lead to different outcomes

- Thinking carefully about measurement is fundamental to research! 
---
<img src="timeline.jpeg" width="100%" style="display: block; margin: auto;" />