Authors: Patricia Chen, Dennis W. H. Teo, Daniel X. Y. Foo, Holly A. Derry, Benjamin T. Hayward, Kyle W. Schulz, Caitlin Hayward, Timothy A. McKay, & Desmond C. Ong
Accepted in principle at npj Science of Learning.
APA Citation:
Chen, P., Teo, D. W. H., Foo, D. X. Y., Derry, H. A., Hayward, B. T., Schulz, K. W., Hayward, C., McKay, T. A., & Ong, D. C. (accepted). Real-World Effectiveness of a Social-Psychological Intervention Translated from Controlled Trials to Classrooms. npj Science of Learning.
Analysis code written by Dennis Teo, Daniel Foo, and Desmond Ong. Please direct any questions to Desmond Ong.
Code dated 25 May 2022, and available at https://osf.io/6qej7
This file contains all the code we used to perform the analyses reported in the paper. This R Markdown file, run together with the data files, will produce the output HTML file.
To request access to the data files, please see the Data Availability Statement in the paper.
The remainder of this file follows the same flow as the Results section of the paper (and the Supplemental Information). Numbers in the text are piped in automatically from R variables (and are usually reported to more decimal places, so there may be slight rounding discrepancies with the numbers in the paper, which are formatted to APA standards).
library(car) # Anova
library(tidyverse)
library(lme4)
library(meta) # metagen
library(MatchIt) # matching analysis
library(lmtest) #coeftest
library(lmerTest)
library(sandwich) #vcovCL
library(gridExtra) # grid.arrange
select <- dplyr::select
recode <- dplyr::recode
mutate <- dplyr::mutate
set.seed(42) # for reproducibility if there are any stochastic functions.
source('ECoach Functions.R', echo=F)
# To request access to the data files, please see the Data Availability Statement.
exam.lvl = read.csv("ecoach-exam-lvl-full.csv")
user.lvl = read.csv("ecoach-user-lvl-full.csv")
new.labels <- c("Introductory Biology","General Chemistry","Introductory Economics",
"Elementary Programming","Introductory Programming (Engin)",
"General Physics", "Introduction to Statistics")
ORDERED.LABELS <- sort(new.labels)
COURSE_NUM_VECTOR = rep(1:7, each=2) + rep(c(-0.2, +0.2), 7)
#Sample breakdown at class level
class.breakdown <- user.lvl %>%
group_by(course, semester) %>%
summarize(n = n(),
num_playbook = sum(pb_condition == "playbook"),
num_non_playbook = sum(pb_condition == "non-playbook"),
playbook_use_percentage = round(num_playbook/n, digits=3) *100,
.groups = "drop_last")
# # total students (double counting across classes)
# cat("Total number of students:", nrow(user.lvl))
#
# # total unique students (no double count)
# cat("Total number of unique students:", length(unique(user.lvl$user_id)))
### Note that we did not specially account for students enrolled in multiple classes
We examined 12065 students’ use (versus non-use) of the Exam Playbook across 14 introductory STEM and Economics classes over 2 consecutive (Fall and Winter) semesters. The 7 courses included in each semester were: Introductory Statistics, Introductory Biology, General Chemistry, General Physics, Introductory Programming (for Engineers), Introductory Programming (for Programmers), and Introductory Economics. A breakdown of sample demographics is presented in Supplemental Table 1.
Across both semesters, on average, 43.63% (SD = 29.28%; range: 5.6% - 91.4%) of students in each class engaged with the Exam Playbook at least once. We operationalized a “use” of the Exam Playbook to mean accessing and completing the intervention, which included completing the resource checklist, explaining why each resource would be useful, and planning resource use. That is, students had to click through to the end of the intervention to be counted as having used it (Supplementary Note 1 contains further details about how we defined and operationalized “use”). Apart from varying across classes, Exam Playbook use also varied between exams, as a student might choose to use it on one exam but not another. Note that the original intervention was only offered before 2 exams (i.e., 2 doses maximum), but in this translational study, it was offered before all available exams in each class, the number of which differed by class (with the exception of Physics Exam 4, before which it was not offered). Table 1 gives a detailed breakdown of the number of times the Exam Playbook was offered and used on each exam across the different classes.
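As a minimal sketch (assuming the usage statistics quoted above were derived from the per-class breakdown computed earlier), the mean, SD, and range of per-class usage rates can be obtained from class.breakdown:
# Overall mean, SD, and range of the per-class Exam Playbook usage rate (in %)
class.breakdown %>%
ungroup() %>%
summarize(mean_use = mean(playbook_use_percentage),
sd_use = sd(playbook_use_percentage),
min_use = min(playbook_use_percentage),
max_use = max(playbook_use_percentage))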
LABELS_ORDERED_FOR_TABLE <-
c("Introduction to Statistics",
"Introductory Biology", "General Chemistry", "General Physics",
"Introductory Programming (Engin)", "Elementary Programming",
"Introductory Economics")
class.breakdown %>%
arrange(match(course, LABELS_ORDERED_FOR_TABLE), semester) %>%
select(-num_non_playbook) %>%
rename(
Course = course,
Semester = semester,
`Class Size` = n,
`Number of users on any exam` = num_playbook,
`(Percentage)` = playbook_use_percentage
)
Note. “Any Exam” gives the number (and percentage) of students who used the Exam Playbook at least once in the class. Numbers for individual exams indicate the percentage of students in the class who used the Exam Playbook on that exam. Classes had between 2 and 4 exams.
## Demographics
ALL_DEMO = user.lvl %>% group_by(semester) %>% summarize(n=n(), .groups="keep")
FALL_TOTAL = ALL_DEMO$n[ALL_DEMO$semester=="Fall"]
WINTER_TOTAL = ALL_DEMO$n[ALL_DEMO$semester=="Winter"]
gender.demo <- user.lvl %>%
mutate(gender = as.character(gender),
gender = ifelse(is.na(gender), "Not Indicated", gender),
gender = factor(gender, levels=c("Male", "Female", "Not Indicated"))) %>%
group_by(semester, gender) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = gender,
names_from = semester,
values_from = n) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
acad.lvl <- user.lvl %>%
mutate(academic.level = as.character(ACAD_LVL_BOT_SHORT_DES),
academic.level = ifelse(is.na(academic.level), "None", academic.level),
academic.level = factor(academic.level,
levels=c("Freshman", "Sophomore", "Junior",
"Senior", "USpec/NCFD", "None")),
academic.level = fct_collapse(academic.level,
`Not Indicated` = c("USpec/NCFD", "None"))) %>%
group_by(semester, academic.level) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = academic.level,
names_from = semester,
values_from = n) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
race.demo <- user.lvl %>%
mutate(race_table = as.character(race),
race_table = ifelse(is.na(race_table), "None", race_table),
race_table = as.factor(race_table),
race_table = fct_collapse(race_table,
Caucasian = c("White"),
`African-American` = c("Black"),
Others = c("Hawaiian", "Native Amr", "2 or More"),
`Not Indicated` = c("Not Indic", "None"))) %>%
group_by(semester, race_table) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = race_table,
names_from = semester,
values_from = n) %>%
arrange(match(race_table,
c("Caucasian", "African-American", "Hispanic",
"Asian", "Others", "Not Indicated") )) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
income <- user.lvl %>%
mutate(EST_GROSS_FAM_INC_CD =
recode(EST_GROSS_FAM_INC_CD, "Lower Income" = "Less than US$50,000",
"Middle Income" = "US$50,000 - US$99,999",
"Upper Income" = "More than US$100,000"),
income.level = as.character(EST_GROSS_FAM_INC_CD),
income.level = ifelse(is.na(income.level), "Not Indicated", income.level),
income.level = factor(income.level,
levels=c("Less than US$50,000",
"US$50,000 - US$99,999",
"More than US$100,000",
"Not Indicated"))) %>%
group_by(semester, income.level) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = income.level,
names_from = semester,
values_from = n) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
firstgen.demo <- user.lvl %>%
mutate(first.gen.status = as.character(firstgen),
first.gen.status = ifelse(is.na(first.gen.status), "Not Indicated", first.gen.status),
first.gen.status = factor(first.gen.status,
levels=c(1,0,"Not Indicated"),
labels=c("First-generation", "Non-First-generation",
"Not Indicated"))) %>%
group_by(semester, first.gen.status) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = first.gen.status,
names_from = semester,
values_from = n) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
multi.enrol <- user.lvl %>%
group_by(semester, multi_enroll) %>%
summarize(n = n(), .groups="keep") %>%
pivot_wider(id_cols = multi_enroll,
names_from = semester,
values_from = n) %>%
filter(multi_enroll == 1) %>%
mutate(Fall_percent = paste("(",
as.character(format(Fall/FALL_TOTAL*100, digits=3)),
"%)", sep=""),
Winter_percent = paste("(",
as.character(format(Winter/WINTER_TOTAL*100, digits=3)),
"%)", sep="")) %>%
unite(Fall, c("Fall", "Fall_percent"), sep = " ") %>%
unite(Winter, c("Winter", "Winter_percent"), sep = " ")
gender.demo
# initialising
pb_condition.effect <- data.frame(
estimate = rep(NA,length(unique(user.lvl$course_semester))),
se = rep(NA,length(unique(user.lvl$course_semester))),
course_semester = unique(user.lvl$course_semester),
standardized_estimate = rep(NA,length(unique(user.lvl$course_semester))),
standardized_se = rep(NA,length(unique(user.lvl$course_semester)))
)
for (i in 1:length(unique(user.lvl$course_semester))){ # regress by class
this_course_semester = pb_condition.effect$course_semester[i]
model.temp <- user.lvl %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg ~ pb_condition, data = .) %>%
summary()
pb_condition.effect$estimate[i] <-
model.temp$coefficients["pb_conditionplaybook", "Estimate"]
pb_condition.effect$se[i] <-
model.temp$coefficients["pb_conditionplaybook", "Std. Error"]
model.temp.standardized <- user.lvl %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg_standardized ~ pb_condition, data = .) %>%
summary()
pb_condition.effect$standardized_estimate[i] <-
model.temp.standardized$coefficients["pb_conditionplaybook", "Estimate"]
pb_condition.effect$standardized_se[i] <-
model.temp.standardized$coefficients["pb_conditionplaybook", "Std. Error"]
}
pb_condition.effect <- pb_condition.effect %>%
arrange(course_semester) %>% # arrange by course semester
mutate(
course = rep(ORDERED.LABELS, each=2),
semester = rep(c("Fall", "Winter"), 7 ),
course_num = COURSE_NUM_VECTOR
)
pb_condition.effect.summary <- metagen(pb_condition.effect$estimate,
pb_condition.effect$se)
pb_condition.effect.standardized.summary <- metagen(pb_condition.effect$standardized_estimate,
pb_condition.effect$standardized_se)
# look at "random effects model" of metagen output
semester_correlation = cor.test(
(pb_condition.effect %>% filter(semester == "Fall") %>% arrange(course))$estimate,
(pb_condition.effect %>% filter(semester == "Winter") %>% arrange(course))$estimate)
We tested the hypothesis that using the Exam Playbook benefits students’ exam performance by comparing the average exam scores of students who used the Exam Playbook at least once in the class with students who did not use the Exam Playbook at all. Following recent recommendations in statistics and psychological science to move toward a focus on effect-size estimation (Wasserstein & Lazar, 2016; Brady et al., 2016; Cumming, 2014), we ran a “mini meta-analysis” (Goh et al., 2016) across the 14 classes using a random-effects meta-analysis model (Borenstein et al., 2010), treating each class as a separate “experiment,” with a view toward analyzing heterogeneity across classes. This allowed us to estimate the generalizability of the effect across classes, as well as the variation due to inter-class differences, both of which are important for understanding how the Exam Playbook can benefit future students in various subjects.
Our meta-analysis, summarized in Figure 1, revealed that students who used the Exam Playbook in their class scored 2.17 percentage points ([95% CI: 1.13, 3.21], p < .001) higher than non-users on their average exam score (normalized to a 100-point scale). To put this effect size into context, a 2.17 percentage point difference translates to a standardized difference (Cohen’s d) of 0.18, a substantial effect for a free, highly scalable, and self-administered intervention. As mentioned earlier, a difference of 0.2 is considered a large difference in field research on factors that predict educational outcomes, especially for low-cost and scalable interventions (Hill et al., 2008; Kraft et al., 2018; Yeager et al., 2019). As Figure 1 shows, the effect was positive in 13 out of 14 classes, and there was a high correlation of r = 0.87 (p = 0.01) between the effect sizes for each class across the two semesters.
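The meta-analytic numbers quoted above can be read directly off the metagen objects computed earlier; a minimal sketch using the component names from the meta package (the same pattern applies to the other standardized summaries in this file, e.g. the covariate-adjusted and exam-level analyses below):
# Random-effects estimate (percentage points), its 95% CI, and p-value
c(estimate = pb_condition.effect.summary$TE.random,
ci.lower = pb_condition.effect.summary$lower.random,
ci.upper = pb_condition.effect.summary$upper.random,
p.value = pb_condition.effect.summary$pval.random)
# Standardized (Cohen's d) random-effects estimate, reported as 0.18 in the text
pb_condition.effect.standardized.summary$TE.random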
pb_condition.effect.plothelp <- data.frame(
"x.diamond" = c(
pb_condition.effect.summary$TE.random - 1.96*pb_condition.effect.summary$seTE.random,
pb_condition.effect.summary$TE.random,
pb_condition.effect.summary$TE.random + 1.96*pb_condition.effect.summary$seTE.random,
pb_condition.effect.summary$TE.random),
"y.diamond" = c(
8,
8 + 0.2,
8,
8 - 0.2)
)
## re-ordering based on effect size
pb_condition.effect.sum_by_class = pb_condition.effect %>%
group_by(course) %>%
summarise(es = mean(estimate), .groups="drop_last") %>%
arrange(es)
# Mean effect size (percentage points) by course, ascending:
# 1 General Physics 0.1234743
# 2 General Chemistry 0.7419212
# 3 Introductory Biology 1.3172378
# 4 Introductory Programming (Engin) 1.6031804
# 5 Introductory Economics 1.7309801
# 6 Elementary Programming 2.8814635
# 7 Introduction to Statistics 5.9574770
pb_condition.effect.reorder.by.es <- pb_condition.effect %>%
arrange(factor(course, levels =
c("General Physics", "General Chemistry",
"Introductory Biology", "Introductory Programming (Engin)",
"Introductory Economics", "Elementary Programming",
"Introduction to Statistics")), semester) %>%
mutate(course_num = COURSE_NUM_VECTOR)
reordered.labels.for.graph =
c("General Physics", "General Chemistry",
"Intro Biology", "Intro Programming (Engineers)",
"Intro Economics", "Intro Programming (Programmers)",
"Intro Statistics")
overall.meta.graph <- ggplot(pb_condition.effect.reorder.by.es) +
geom_point(aes(x = estimate, y = course_num, color = semester), size = 3) +
geom_errorbarh(aes(xmin=estimate - 1.96*se, xmax=estimate + 1.96*se, y = course_num, color = semester), height = .2) +
scale_colour_manual(values = c("black", "grey")) +
geom_vline(xintercept = 0, lty = 2) +
scale_y_reverse(breaks = c(1,2,3,4,5,6,7,8),
labels = c(reordered.labels.for.graph, "All Courses")) +
# plotting meta-analytic effect
geom_polygon(data = pb_condition.effect.plothelp,
aes(x = x.diamond, y = y.diamond), size = 0.1) +
scale_x_continuous(breaks = round(seq(
min(pb_condition.effect$estimate),
max(pb_condition.effect$estimate), by = 1),0)) +
labs(color = "Term") +
xlab("Difference in Average Exam Score") + ylab("Course") + theme_bw() +
theme(legend.position = "top",
legend.title = element_text(size=18),
legend.text = element_text(size=18),
axis.title = element_text(size=18),
axis.text.x = element_text(size=18),
axis.text.y = element_text(size=18))
#1200x800
overall.meta.graph
Note. Forest plot summarizing a meta-analysis of the effect of using the Exam Playbook on students’ averaged exam score. Data points represent the effect size for each class in each semester, with error bars representing 95% confidence intervals. The diamond in the last row represents the weighted meta-analytic effect size (Borenstein et al., 2010), and corresponds to a standardized effect size (Cohen’s d) of 0.18.
## 95%-CI %W(fixed) %W(random)
## 1 2.3954 [ 0.7904; 4.0005] 11.6 8.6
## 2 3.3675 [ 1.5931; 5.1419] 9.5 8.2
## 3 0.2999 [-1.7768; 2.3767] 6.9 7.5
## 4 1.1839 [-1.9213; 4.2891] 3.1 5.5
## 5 -0.4295 [-3.0960; 2.2370] 4.2 6.3
## 6 0.6764 [-1.7535; 3.1064] 5.1 6.8
## 7 5.1752 [ 3.2901; 7.0603] 8.4 7.9
## 8 6.7398 [ 4.7199; 8.7596] 7.3 7.6
## 9 0.7801 [-0.8758; 2.4361] 10.9 8.4
## 10 1.8544 [ 0.2344; 3.4743] 11.4 8.5
## 11 2.4872 [-0.0623; 5.0368] 4.6 6.5
## 12 0.9747 [-4.2239; 6.1733] 1.1 2.9
## 13 1.5722 [-0.1362; 3.2806] 10.3 8.3
## 14 1.6342 [-0.7188; 3.9872] 5.4 6.9
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model 2.2764 [1.7290; 2.8237] 8.15 < 0.0001
## Random effects model 2.1708 [1.1338; 3.2078] 4.10 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 2.6019 [0.6820; 8.6925]; tau = 1.6130 [0.8258; 2.9483]
## I^2 = 70.1% [48.3%; 82.7%]; H = 1.83 [1.39; 2.40]
##
## Test of heterogeneity:
## Q d.f. p-value
## 43.49 13 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
Two robustness checks further validated these results:
pb_condition.effect.act <- data.frame(
estimate = rep(NA, length(unique(user.lvl$course_semester))),
se = rep(NA, length(unique(user.lvl$course_semester))),
course_semester = unique(user.lvl$course_semester),
standardized_estimate = rep(NA, length(unique(user.lvl$course_semester))),
standardized_se = rep(NA, length(unique(user.lvl$course_semester)))
)
for (i in 1:length(unique(user.lvl$course_semester))){ # regress by class
this_course_semester = pb_condition.effect.act$course_semester[i] # use this data frame's own course_semester so estimates stay aligned with their rows
model.temp <- user.lvl %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg ~ pb_condition + act_convtd, data = .) %>%
summary()
pb_condition.effect.act$estimate[i] <-
model.temp$coefficients["pb_conditionplaybook", "Estimate"]
pb_condition.effect.act$se[i] <-
model.temp$coefficients["pb_conditionplaybook", "Std. Error"]
model.temp.standardized <- user.lvl %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg_standardized ~ pb_condition + act_convtd, data = .) %>%
summary()
pb_condition.effect.act$standardized_estimate[i] <-
model.temp.standardized$coefficients["pb_conditionplaybook", "Estimate"]
pb_condition.effect.act$standardized_se[i] <-
model.temp.standardized$coefficients["pb_conditionplaybook", "Std. Error"]
}
pb_condition.effect.act <- pb_condition.effect.act %>%
arrange(course_semester) %>% # arrange by course semester
mutate(
course = rep(ORDERED.LABELS, each=2),
semester = rep(c("Fall", "Winter"), 7 ),
course_num = COURSE_NUM_VECTOR
)
pb_condition.effect.act.summary <- metagen(pb_condition.effect.act$estimate,
pb_condition.effect.act$se)
pb_condition.effect.act.standardized.summary <-
metagen(pb_condition.effect.act$standardized_estimate,
pb_condition.effect.act$standardized_se)
# look at "random effects model" of metagen output, coefficient = 0.1382 = 0.14
pb_condition.effect.act.summary
## 95%-CI %W(fixed) %W(random)
## 1 0.3400 [-1.5101; 2.1901] 8.3 7.8
## 2 0.6637 [-0.8883; 2.2157] 11.7 8.4
## 3 2.4514 [-0.5726; 5.4754] 3.1 5.7
## 4 -1.9284 [-6.7849; 2.9281] 1.2 3.4
## 5 5.0941 [ 3.4231; 6.7650] 10.1 8.2
## 6 0.3960 [-1.3400; 2.1320] 9.4 8.0
## 7 3.5881 [ 1.8214; 5.3549] 9.1 8.0
## 8 5.4986 [ 3.5716; 7.4257] 7.6 7.7
## 9 -0.4430 [-3.2561; 2.3702] 3.6 6.1
## 10 1.7817 [-0.6987; 4.2620] 4.6 6.7
## 11 1.6350 [-0.0721; 3.3421] 9.7 8.1
## 12 2.1058 [ 0.5521; 3.6596] 11.7 8.4
## 13 -0.3822 [-2.8881; 2.1238] 4.5 6.6
## 14 -0.5081 [-2.8037; 1.7875] 5.4 7.0
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model 1.8837 [1.3518; 2.4156] 6.94 < 0.0001
## Random effects model 1.6536 [0.5546; 2.7526] 2.95 0.0032
##
## Quantifying heterogeneity:
## tau^2 = 3.1240 [1.0687; 10.3060]; tau = 1.7675 [1.0338; 3.2103]
## I^2 = 74.9% [57.6%; 85.1%]; H = 2.00 [1.54; 2.59]
##
## Test of heterogeneity:
## Q d.f. p-value
## 51.75 13 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
One, controlling for students’ college entrance exam scores as a covariate (students in our sample were mostly freshmen who did not yet have a college GPA), the overall meta-analytic trend remained consistent: Exam Playbook users scored an average of 1.65 percentage points ([95% CI: 0.55, 2.75], Cohen’s d = 0.14, p = 0.003) higher than non-users on their average exam score. We tested demographic factors (gender, race/ethnicity, and first-generation status) as potential moderators later in the Results.
meta_by_class <- NULL
for (i in 1:length(unique(exam.lvl$course_semester))){ # for each class
this_course_semester = pb_condition.effect$course_semester[i]
temp.class <- exam.lvl %>% filter(course_semester == this_course_semester)
# initialising
exam.lvl.effect <- data.frame(
estimate = rep(NA, length(unique(temp.class$exam_key))),
se = rep(NA, length(unique(temp.class$exam_key))),
standardized_estimate = rep(NA, length(unique(temp.class$exam_key))),
standardized_se = rep(NA, length(unique(temp.class$exam_key)))
)
for (j in 1:length(unique(temp.class$exam_key))){ # for each exam
this_exam_key = temp.class$exam_key[j]
temp.summary <- temp.class %>%
filter(exam_key == this_exam_key) %>%
lm(exam_score ~ pb_use, data = .) %>%
summary()
exam.lvl.effect$estimate[j] <- temp.summary$coefficients["pb_use", "Estimate"]
exam.lvl.effect$se[j] <- temp.summary$coefficients["pb_use", "Std. Error"]
temp.summary.standardized <- temp.class %>%
filter(exam_key == this_exam_key) %>%
lm(exam_score_standardized ~ pb_use, data = .) %>%
summary()
exam.lvl.effect$standardized_estimate[j] <-
temp.summary.standardized$coefficients["pb_use", "Estimate"]
exam.lvl.effect$standardized_se[j] <-
temp.summary.standardized$coefficients["pb_use", "Std. Error"]
}
exam.effect <- metagen(exam.lvl.effect$estimate, exam.lvl.effect$se)
exam.effect.standardized <- metagen(exam.lvl.effect$standardized_estimate,
exam.lvl.effect$standardized_se)
meta_by_class <- rbind(meta_by_class,
data.frame(
estimate = exam.effect$TE.fixed,
# fixed effects assumed within same class, but random across classes (later)
se = exam.effect$seTE.fixed,
course_semester = this_course_semester, # label rows with the class actually modelled in this iteration
standardized_estimate = exam.effect.standardized$TE.fixed,
standardized_se = exam.effect.standardized$seTE.fixed
))
}
meta_by_class <- meta_by_class %>%
arrange(course_semester) %>%
mutate(
course = rep(ORDERED.LABELS, each=2),
semester = rep(c("Fall", "Winter"), 7 ),
course_num = COURSE_NUM_VECTOR
)
# overall meta analysis effect
pb_exam.effect <- metagen(meta_by_class$estimate, meta_by_class$se)
pb_exam.effect.standardized <- metagen(meta_by_class$standardized_estimate,
meta_by_class$standardized_se)
pb_exam.effect
## 95%-CI %W(fixed) %W(random)
## 1 1.5121 [-0.4503; 3.4746] 3.3 7.0
## 2 1.4247 [-0.1663; 3.0156] 5.1 7.6
## 3 2.3944 [-0.3193; 5.1080] 1.7 5.8
## 4 3.3505 [-1.9505; 8.6515] 0.5 2.9
## 5 5.1616 [ 4.4124; 5.9107] 22.9 8.7
## 6 2.3550 [ 1.1187; 3.5913] 8.4 8.1
## 7 3.7110 [ 2.4598; 4.9622] 8.2 8.1
## 8 6.6469 [ 5.8268; 7.4669] 19.1 8.6
## 9 -0.5289 [-3.3710; 2.3132] 1.6 5.6
## 10 3.0059 [ 1.0460; 4.9659] 3.3 7.0
## 11 2.0888 [ 0.9656; 3.2120] 10.2 8.3
## 12 3.2683 [ 1.9651; 4.5714] 7.6 8.0
## 13 1.4636 [-0.7296; 3.6568] 2.7 6.6
## 14 3.0759 [ 1.5138; 4.6380] 5.3 7.6
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model 3.8930 [3.5343; 4.2517] 21.27 < 0.0001
## Random effects model 2.9095 [1.8130; 4.0060] 5.20 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 3.4622 [1.1343; 9.0824]; tau = 1.8607 [1.0650; 3.0137]
## I^2 = 87.4% [80.6%; 91.8%]; H = 2.82 [2.27; 3.50]
##
## Test of heterogeneity:
## Q d.f. p-value
## 103.12 13 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
Two, to supplement our class-level analyses, our results held when we examined the effect of Exam Playbook use on performance at the exam level within each class. A mixed-effects meta-analysis (with exam as a fixed effect and class as a random effect) across all 40 observed exams showed that students who used the Exam Playbook on a given exam scored an average of 2.91 percentage points ([95% CI: 1.81, 4.01], Cohen’s d = 0.22, p < .001) higher than students who did not use the Exam Playbook on that exam.
As shown in Figure 1, there was substantial heterogeneity in the estimated effect size of using the Exam Playbook across different classes. The average effect size was largest in the Introductory Statistics course (5.18 percentage points in Fall and 6.74 in Winter), which was the exact course for which the original intervention was designed and experimentally tested (Chen et al., 2017). Thus, this serves as an assessment of the effectiveness of the intervention when made freely available within the same class context (cf. RCT-based efficacy effect sizes of 3.64 and 4.21 percentage points in two studies in Chen et al., 2017).
### generalizability beyond intro stats
user.lvl.nostats <- user.lvl %>% filter(!(course %in% c("Introduction to Statistics")))
# initialising
pb_condition.effect.nostats <- data.frame(
estimate = rep(NA,length(unique(user.lvl.nostats$course_semester))),
se = rep(NA,length(unique(user.lvl.nostats$course_semester))),
course_semester = unique(user.lvl.nostats$course_semester),
standardized_estimate = rep(NA,length(unique(user.lvl.nostats$course_semester))),
standardized_se = rep(NA,length(unique(user.lvl.nostats$course_semester)))
)
pb_condition.effect.nostats.act = pb_condition.effect.nostats
for (i in 1:length(unique(user.lvl.nostats$course_semester))){ # regress by class
this_course_semester = pb_condition.effect.nostats$course_semester[i]
model.temp <- user.lvl.nostats %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg ~ pb_condition, data = .) %>%
summary()
pb_condition.effect.nostats$estimate[i] <- model.temp$coefficients[2, "Estimate"]
pb_condition.effect.nostats$se[i] <- model.temp$coefficients[2, "Std. Error"]
model.temp.standardized <- user.lvl.nostats %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg_standardized ~ pb_condition, data = .) %>%
summary()
pb_condition.effect.nostats$standardized_estimate[i] <-
model.temp.standardized$coefficients[2, "Estimate"]
pb_condition.effect.nostats$standardized_se[i] <-
model.temp.standardized$coefficients[2, "Std. Error"]
# controlling for covariates
model.temp.act <- user.lvl.nostats %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg ~ pb_condition + act_convtd, data = .) %>%
summary()
pb_condition.effect.nostats.act$estimate[i] <- model.temp.act$coefficients[2, "Estimate"]
pb_condition.effect.nostats.act$se[i] <- model.temp.act$coefficients[2, "Std. Error"]
model.temp.act.standardized <- user.lvl.nostats %>%
filter(course_semester == this_course_semester) %>%
lm(exam_score_avrg_standardized ~ pb_condition + act_convtd, data = .) %>%
summary()
pb_condition.effect.nostats.act$standardized_estimate[i] <-
model.temp.act.standardized$coefficients[2, "Estimate"]
pb_condition.effect.nostats.act$standardized_se[i] <-
model.temp.act.standardized$coefficients[2, "Std. Error"]
}
pb_condition.effect.nostats <- pb_condition.effect.nostats %>%
arrange(course_semester) # arrange by course semester
pb_condition.effect.nostats.summary <- metagen(pb_condition.effect.nostats$estimate,
pb_condition.effect.nostats$se)
pb_condition.effect.nostats.standardized.summary <-
metagen(pb_condition.effect.nostats$standardized_estimate,
pb_condition.effect.nostats$standardized_se)
# controlling for covariates
pb_condition.effect.nostats.act <- pb_condition.effect.nostats.act %>%
arrange(course_semester) # arrange by course semester
pb_condition.effect.nostats.act.summary <- metagen(pb_condition.effect.nostats.act$estimate,
pb_condition.effect.nostats.act$se)
pb_condition.effect.nostats.act.standardized.summary <-
metagen(pb_condition.effect.nostats.act$standardized_estimate,
pb_condition.effect.nostats.act$standardized_se)
The other courses allow us to examine the generalization of the Exam Playbook to different class contexts. As a conservative test of the generalizability of Exam Playbook use on exam performance beyond the Introductory Statistics course, we repeated our analyses using only the 6 other courses (12 classes total), excluding Introductory Statistics. On average, using the Exam Playbook still conferred benefits to students in these courses. The meta-analytic effect size was smaller but still significant: students who used the Exam Playbook scored an average of 1.6 percentage points ([95% CI: 1, 2.19], d = 0.13, p < .001) higher than non-users. When controlling for college entrance exam scores, we observed a 1.07 percentage point difference ([95% CI: 0.29, 1.85], d = 0.09, p = 0.007).
After Introductory Statistics, which had the highest use rates and effect sizes, students in the two Introductory Programming courses enjoyed the next-largest average benefits: 2.24 percentage points averaged across both semesters and both programming courses (we note that the Introductory Economics course had substantial differences in effect sizes and uptake across the Fall and Winter semesters). On the other end of the spectrum, the smallest average effect sizes from using the Exam Playbook were observed in the General Physics and General Chemistry courses (0.12 percentage points averaged across both semesters for General Physics; 0.74 percentage points for General Chemistry).
One plausible reason for such heterogeneity at the class level could be how much the climate of the course supported such strategic resource use, including use of the Exam Playbook. According to contemporary theorizing about psychological intervention effect heterogeneity, “change requires planting good seeds (more adaptive perspectives)… in fertile soil (a context with appropriate affordances)” (Walton & Yeager, 2020, emphasis ours). That is, perhaps the Exam Playbook was more useful to students who were in course climates more conducive to the psychology of the Exam Playbook.
Two possible operationalizations of this course climate (at the class level) are peers’ uptake of the Exam Playbook (Powers et al., 2016; Yeager et al., 2019) and teachers’ degree of support toward engaging with the Exam Playbook as a useful learning resource (Matz et al., 2021), both of which reflect powerful social norms that could influence students’ engagement with and degree of benefit from the Exam Playbook (Bierman et al., 2010; Walton & Yeager, 2020; Yeager et al., 2019).
We fit two separate linear models predicting the effect size for each class from (a) the average Exam Playbook usage in the class and (b) the presence or absence of extra course credit offered for engaging with the Exam Playbook. Instructors in 4 of the 7 courses (specifically Introductory Statistics, Introductory Biology, Introductory Programming (Programmers), and Introductory Programming (Engineers)) incentivized the use of the Exam Playbook by offering bonus credit toward students’ final course grade for using it. Importantly, however, these bonuses did not influence our main outcome measure: exam performance.
## course level data
course.lvl <- pb_condition.effect
course.lvl <- course.lvl %>% left_join(
(user.lvl %>% group_by(course_semester) %>%
summarize(pb_use_sum_gmc = mean(pb_use_sum), .groups="keep") %>%
ungroup), by = "course_semester"
)
lm.model.dosage.sum <- summary(lm(estimate ~ pb_use_sum_gmc, data=course.lvl))
lm.model.dosage.standardized.sum <- summary(lm(standardized_estimate ~ pb_use_sum_gmc, data=course.lvl))
lm.model.dosage.sum
##
## Call:
## lm(formula = estimate ~ pb_use_sum_gmc, data = course.lvl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.0992 -0.4311 -0.2228 0.5016 1.9914
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1464 0.3498 0.418 0.683
## pb_use_sum_gmc 2.4858 0.3418 7.272 9.85e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8678 on 12 degrees of freedom
## Multiple R-squared: 0.815, Adjusted R-squared: 0.7996
## F-statistic: 52.88 on 1 and 12 DF, p-value: 9.846e-06
Indeed, the average Exam Playbook usage in a class (the peer norm) was positively associated with the effect size of using the Exam Playbook (b = 2.49 [95% CI: 1.82, 3.16], d = 0.2, p < .001).
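The 95% CI reported here is not part of the printed lm() summary; a minimal sketch of how it can be reconstructed from the stored coefficient table, assuming a normal approximation (estimate ± 1.96 × SE), which reproduces the interval in the text:
# Approximate 95% CI for the peer-norm (usage) slope: estimate +/- 1.96*SE
b_use <- lm.model.dosage.sum$coefficients["pb_use_sum_gmc", "Estimate"]
se_use <- lm.model.dosage.sum$coefficients["pb_use_sum_gmc", "Std. Error"]
c(lower = b_use - 1.96 * se_use, upper = b_use + 1.96 * se_use)
The interval for the course-credit (bonus) model reported below can be obtained the same way from lm.model.bonus.sum.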
bonus.data <- data.frame(
course_semester = c("Elementary Programming Fall",
"Elementary Programming Winter",
"General Chemistry Fall",
"General Chemistry Winter",
"General Physics Fall",
"General Physics Winter",
"Introduction to Statistics Fall",
"Introduction to Statistics Winter",
"Introductory Biology Fall",
"Introductory Biology Winter",
"Introductory Economics Fall",
"Introductory Economics Winter",
"Introductory Programming (Engin) Fall",
"Introductory Programming (Engin) Winter"),
bonus = c(1,1,
0,0,
0,1,
1,1,
0,1,
0,0,
1,1)
)
course.lvl <- course.lvl %>% left_join(bonus.data, by = "course_semester")
lm.model.bonus.sum <- summary(lm(estimate~bonus, data = course.lvl))
lm.model.bonus.standardized.sum <- summary(lm(standardized_estimate~bonus, data = course.lvl))
lm.model.bonus.sum
##
## Call:
## lm(formula = estimate ~ bonus, data = course.lvl)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.2504 -1.2377 -0.3170 0.4057 3.8129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.8827 0.6926 1.275 0.2266
## bonus 2.0441 0.9162 2.231 0.0455 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.696 on 12 degrees of freedom
## Multiple R-squared: 0.2932, Adjusted R-squared: 0.2343
## F-statistic: 4.978 on 1 and 12 DF, p-value: 0.04552
Similarly, teacher support in the form of offering course credit incentives was related to a larger effect size than not offering them (b = 2.04 [95% CI: 0.25, 3.84], d = 0.17, p = 0.046).
Could differences in the extensiveness of resources provided, or in the kinds of resources most students selected to use (such as practice-based versus simple reading and memorization), have explained the variation in effect sizes across classes? Our data did not support either of these possibilities: the number of resources offered varied only slightly among classes (range: 11-15), and the types of resources that students selected most were generally similar across classes (see Supplementary Note 2). Hence, we ruled out the possibility that either of these factors strongly explained the class-level heterogeneity.
One difficulty of observational (effectiveness) studies, compared to experimental (efficacy) studies, is teasing apart the effects of confounding variables. Methods such as matching and difference-in-differences modelling try to control for these effects. We conducted two matching-based analyses that examined how intra-individual variation in Exam Playbook usage tracked changes in academic performance. We matched students using their background and behavior in the initial portion of the class, and then examined how subsequent behavior tracked exam performance. In these classes, there were natural variations in Exam Playbook usage. Some students started off not using the Exam Playbook and picked up (or “adopted”) the Exam Playbook on later exams, while others used the Exam Playbook early on but dropped it later in the class (see Supplemental Table 2 for descriptives). These natural covariations allowed us to assess the average effect of “adopting” and “dropping” the Exam Playbook within individuals. If Exam Playbook usage benefits students’ performance, we should expect students’ exam performance to covary with their Exam Playbook usage patterns, with “adopting” and “dropping” associated with increased and decreased exam performance, respectively.
Using stratified matching (Austin, 2011), we matched these students on their initial exam performance (the first exam in the class), gender, race, first-generation status, and college entrance scores, and estimated the average effect of adopting and dropping the Exam Playbook on their subsequent exams. Because most Exam Playbook usage within a class occurred on the first two exams (94%), we restricted this analysis to the first two exams of each class. The stratified matching analysis was performed separately for each class (13 classes), and we computed a meta-analytic estimate using a mixed-effects meta-analysis.
### extracting first two exams for intra-student analyses
exam.lvl.match <- exam.lvl %>%
filter(exam_key %in% c("Exam 1", "Exam 2")) %>%
mutate(time = ifelse(exam_key == "Exam 1", 0, 1))
playbk_use <- spread(exam.lvl[, c("user_source_id_sem", "pb_use", "exam_key")], key = exam_key, value = pb_use)
colnames(playbk_use) <- c("user_source_id_sem","exam1_pbuse","exam2_pbuse","exam3_pbuse","exam4_pbuse")
exam.lvl.match <- exam.lvl.match %>%
left_join(playbk_use, by="user_source_id_sem") %>%
mutate(
dropped_pb = ifelse(exam1_pbuse == 1 & exam2_pbuse == 0, 1,0),
picked_up = ifelse(exam1_pbuse == 0 & exam2_pbuse == 1, 1,0),
no.use = ifelse(exam1_pbuse == 0 & exam2_pbuse == 0, 1,0),
all.use = ifelse(exam1_pbuse == 1 & exam2_pbuse == 1, 1,0),
usage_pattern = factor(ifelse(dropped_pb == 1, "dropped",
ifelse(picked_up == 1, "adopted",
ifelse(no.use == 1, "never",
ifelse(all.use == 1, "consistent", NA)))))
)
exam.lvl.match.adopt <- exam.lvl.match %>%
filter(first_use_exam1==0) %>%
mutate(usage_pattern = factor(usage_pattern, levels = c("never", "adopted")))
match.adopt.df <- exam.lvl.match.adopt %>%
select(user_source_id_sem, exam_key, course_semester,
usage_pattern, exam_score, course, semester,
gender, act_convtd, race, firstgen) %>%
group_by(exam_key, course_semester) %>%
mutate(class_mean_for_exam = mean(exam_score, na.rm =T),
class_sd_for_exam = sd(exam_score, na.rm =T),
exam_key = recode(exam_key, `Exam 1` = "E1", `Exam 2` = "E2")) %>%
pivot_wider(id_cols = c(user_source_id_sem, course_semester, usage_pattern),
names_from = exam_key,
values_from = c(exam_score, class_mean_for_exam, class_sd_for_exam,
course, semester, gender, act_convtd, race, firstgen))
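Before running the per-class loop below, covariate balance for the stratified matching can be inspected for any single class; a minimal sketch (the choice of class here is purely illustrative; summary() on a matchit object reports balance statistics before and after subclassification):
# Illustrative balance check for one class using the same matchit specification as the loop below
one.class <- na.omit(match.adopt.df %>%
filter(course_semester == "Introduction to Statistics Fall"))
one.match <- matchit(factor(usage_pattern) ~ exam_score_E1 + gender_E1 +
act_convtd_E1 + race_E1 + firstgen_E1,
data = one.class, method = "subclass",
subclass = 5, estimand = "ATE")
summary(one.match) # covariate balance across the 5 subclasses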
match.adopt.meta.df = data.frame()
for(course_name in unique(match.adopt.df$course_semester)) {
course_data_full <- match.adopt.df %>% filter(course_semester == course_name)
course_data <- na.omit(course_data_full)
#print(paste(nrow(course_data_full) - nrow(course_data), "observations were omitted in", course_name))
if(course_name != "Introductory Economics Winter"){
propensity <- matchit(
factor(usage_pattern) ~ exam_score_E1 + gender_E1 +
act_convtd_E1 + race_E1 + firstgen_E1,
data = course_data,
method = "subclass",
subclass = 5,
estimand = "ATE"
) # stratified (subclassification) matching into 5 subclasses, estimating the ATE
matched_data <- match.data(propensity)
fit1 <- lm(exam_score_E2 ~ factor(usage_pattern) +
exam_score_E1 + gender_E1 + act_convtd_E1 +
race_E1 + firstgen_E1,
data = matched_data, weights = weights)
match_result <- coeftest(fit1, vcov. = vcovCL, cluster = ~subclass)
standardized_fit1 = lm(
I((exam_score_E2-class_mean_for_exam_E2)/class_sd_for_exam_E2) ~
factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 +
race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_standardized_result = coeftest(standardized_fit1,
vcov. = vcovCL, cluster = ~subclass)
}
# get class total
class_total <- user.lvl %>% filter(course_semester == course_name)
if(course_name != "Introductory Economics Winter"){
match.adopt.meta.df = rbind(match.adopt.meta.df,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = match_result[2,1],
se = match_result[2,2],
num = nrow(matched_data),
total = nrow(class_total),
standardized_estimate = match_standardized_result[2,1],
standardized_se = match_standardized_result[2,2]
))
} else {
match.adopt.meta.df = rbind(match.adopt.meta.df,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = NA,
se = NA,
num = NA,
total = NA,
standardized_estimate = NA,
standardized_se = NA
))
}
}
match.adopt.meta.df.summary <- metagen(match.adopt.meta.df$estimate,
match.adopt.meta.df$se)
match.adopt.meta.df.standardized.summary <-
metagen(match.adopt.meta.df$standardized_estimate,
match.adopt.meta.df$standardized_se)
match.adopt.meta.df.summary
## 95%-CI %W(fixed) %W(random)
## 1 3.3579 [-5.2049; 11.9207] 0.5 1.4
## 2 3.4906 [ 2.2136; 4.7675] 22.8 14.2
## 3 1.1054 [-2.3990; 4.6098] 3.0 6.1
## 4 0.7712 [-5.3556; 6.8980] 1.0 2.6
## 5 -0.4513 [-1.7195; 0.8169] 23.2 14.2
## 6 2.7423 [-1.3012; 6.7859] 2.3 5.0
## 7 4.3221 [-0.7391; 9.3833] 1.5 3.5
## 8 0.2929 [-2.0535; 2.6394] 6.8 9.5
## 9 NA 0.0 0.0
## 10 3.0995 [ 1.5058; 4.6932] 14.7 12.7
## 11 2.1940 [ 0.3793; 4.0088] 11.3 11.7
## 12 2.8098 [-1.8060; 7.4256] 1.7 4.1
## 13 -0.2757 [-4.9989; 4.4476] 1.7 3.9
## 14 0.9893 [-0.9845; 2.9631] 9.6 11.0
##
## Number of studies combined: k = 13
##
## 95%-CI z p-value
## Fixed effect model 1.7382 [1.1279; 2.3486] 5.58 < 0.0001
## Random effects model 1.7468 [0.6865; 2.8071] 3.23 0.0012
##
## Quantifying heterogeneity:
## tau^2 = 1.6369 [0.0000; 4.6298]; tau = 1.2794 [0.0000; 2.1517]
## I^2 = 54.3% [14.5%; 75.6%]; H = 1.48 [1.08; 2.02]
##
## Test of heterogeneity:
## Q d.f. p-value
## 26.24 12 0.0099
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
To estimate the average effect of adopting the Exam Playbook, we took the subset of students who did not use the Exam Playbook on their first exam. Of these, some students adopted the Exam Playbook on their second exam, while others did not. When matched on their first exam performance, college entrance scores, and demographics, students who adopted the Exam Playbook performed an average of 1.75 percentage points ([95% CI: 0.69, 2.81], d = 0.12, p = 0.001) better on the second exam, compared to those who never used it (Figure 2 left panel).
# keep the same ordering as the course-level forest plot graph
match.adopt.meta.df.reorder <- match.adopt.meta.df %>%
arrange(factor(course, levels= c("General Physics", "General Chemistry",
"Introductory Biology", "Introductory Programming (Engin)",
"Introductory Economics", "Elementary Programming",
"Introduction to Statistics")), semester) %>%
mutate(course_num = COURSE_NUM_VECTOR)
#reordered.labels.for.graph = c("General Physics", "General Chemistry", "Intro Biology", "Intro Programming (Engineers)", "Intro Economics", "Intro Programming (Programmers)", "Intro Statistics")
### --- %%%
match.adopt.meta.df.plothelp <- data.frame(
x.diamond = c(
match.adopt.meta.df.summary$TE.random - 1.96*match.adopt.meta.df.summary$seTE.random,
match.adopt.meta.df.summary$TE.random,
match.adopt.meta.df.summary$TE.random + 1.96*match.adopt.meta.df.summary$seTE.random,
match.adopt.meta.df.summary$TE.random),
y.diamond = c(8,
8 + 0.2, # adjust this 0.2 to make the diamond flatter or taller.
8,
8 - 0.2)
)
reordered.labels.with.n = c(
"General Physics",
"General Chemistry",
"Intro Biology",
"Intro Programming (Engineers)",
"Intro Economics",
"Intro Programming (Programmers)",
"Intro Statistics"
)
sample.labels.adopt = c(
paste(match.adopt.meta.df.reorder$num[1], " (",
format(match.adopt.meta.df.reorder$num[1]/match.adopt.meta.df.reorder$total[1]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[2], " (",
format(match.adopt.meta.df.reorder$num[2]/match.adopt.meta.df.reorder$total[2]*100, digits=3), "%) ",
sep=""),
paste(match.adopt.meta.df.reorder$num[3], " (",
format(match.adopt.meta.df.reorder$num[3]/match.adopt.meta.df.reorder$total[3]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[4], " (",
format(match.adopt.meta.df.reorder$num[4]/match.adopt.meta.df.reorder$total[4]*100, digits=3), "%) ",
sep=""),
paste(match.adopt.meta.df.reorder$num[5], " (",
format(match.adopt.meta.df.reorder$num[5]/match.adopt.meta.df.reorder$total[5]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[6], " (",
format(match.adopt.meta.df.reorder$num[6]/match.adopt.meta.df.reorder$total[6]*100, digits=3), "%) ",
sep=""),
paste(match.adopt.meta.df.reorder$num[7], " (",
format(match.adopt.meta.df.reorder$num[7]/match.adopt.meta.df.reorder$total[7]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[8], " (",
format(match.adopt.meta.df.reorder$num[8]/match.adopt.meta.df.reorder$total[8]*100, digits=3), "%) ",
sep=""),
paste(match.adopt.meta.df.reorder$num[9], " (",
format(match.adopt.meta.df.reorder$num[9]/match.adopt.meta.df.reorder$total[9]*100, digits=3), "%), ",
sep=""),
"NA",
paste(match.adopt.meta.df.reorder$num[11], " (",
format(match.adopt.meta.df.reorder$num[11]/match.adopt.meta.df.reorder$total[11]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[12], " (",
format(match.adopt.meta.df.reorder$num[12]/match.adopt.meta.df.reorder$total[12]*100, digits=3), "%) ",
sep=""),
paste(match.adopt.meta.df.reorder$num[13], " (",
format(match.adopt.meta.df.reorder$num[13]/match.adopt.meta.df.reorder$total[13]*100, digits=3), "%), ",
sep=""),
paste(match.adopt.meta.df.reorder$num[14], " (",
format(match.adopt.meta.df.reorder$num[14]/match.adopt.meta.df.reorder$total[14]*100, digits=3), "%) ",
sep="")
)
matching.plot.adopt <- ggplot(match.adopt.meta.df.reorder) +
geom_point(aes(x = estimate, y = course_num, color = semester), size=3.5) +
geom_errorbarh(aes(xmin=estimate - 1.96*se, xmax=estimate + 1.96*se,
y = course_num, color = semester), height=.2) +
scale_colour_manual(values = c("black", "grey")) +
geom_vline(xintercept=0, lty=2) +
scale_x_continuous(breaks = round(seq(-10, 10, by = 5),0)) +
scale_y_reverse(breaks=c(1,2,3,4,5,6,7,8), labels=c(reordered.labels.with.n, "All Courses")) +
geom_polygon(data = match.adopt.meta.df.plothelp, aes(x = x.diamond, y = y.diamond), size = 0.1) +
#scale_x_continuous(breaks = round(seq(min(pb_condition.effect$estimate), max(pb_condition.effect$estimate), by = 1),0)) +
labs(color = "Term") +
#ggtitle("Effect of adopting the Exam Playbook") +
xlab("Adoption effect size \n (percentage points on Exam 2)") +
ylab("Course") +
theme_bw() +
theme(legend.position = "top",
legend.title = element_text(size=18),
legend.text = element_text(size=18),
axis.title = element_text(size=18),
axis.text.x = element_text(size=18),
axis.text.y = element_text(size=18))+
annotate("text", x = -33, y = 1.3, label = sample.labels.adopt[1], size = 5, color ="black") +
annotate("text", x = -23, y = 1.3, label = sample.labels.adopt[2], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 2.3, label = sample.labels.adopt[3], size = 5, color ="black") +
annotate("text", x = -23, y = 2.3, label = sample.labels.adopt[4], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 3.3, label = sample.labels.adopt[5], size = 5, color ="black") +
annotate("text", x = -23, y = 3.3, label = sample.labels.adopt[6], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 4.3, label = sample.labels.adopt[7], size = 5, color ="black") +
annotate("text", x = -23, y = 4.3, label = sample.labels.adopt[8], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 5.3, label = sample.labels.adopt[9], size = 5, color ="black") +
annotate("text", x = -23, y = 5.3, label = sample.labels.adopt[10], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 6.3, label = sample.labels.adopt[11], size = 5, color ="black") +
annotate("text", x = -23, y = 6.3, label = sample.labels.adopt[12], size = 5, color = "dark grey") +
annotate("text", x = -33, y = 7.3, label = sample.labels.adopt[13], size = 5, color ="black") +
annotate("text", x = -23, y = 7.3, label = sample.labels.adopt[14], size = 5, color = "dark grey") +
coord_cartesian(xlim = c(-16, 16), clip = "off")
# theme(text = element_text(size=25))
# 8.5 by 7, pdf
# matching.plot.adopt
exam.lvl.match.drop <- exam.lvl.match %>% filter(exam1_pbuse == 1)
match.drop.df <- exam.lvl.match.drop %>%
select(user_source_id_sem, exam_key, course_semester,
usage_pattern, exam_score, course, semester,
gender, act_convtd, race, firstgen) %>%
group_by(exam_key, course_semester) %>%
mutate(class_mean_for_exam = mean(exam_score, na.rm =T),
class_sd_for_exam = sd(exam_score, na.rm =T),
exam_key = recode(exam_key, `Exam 1` = "E1", `Exam 2` = "E2")) %>%
pivot_wider(id_cols = c(user_source_id_sem, course_semester, usage_pattern),
names_from = exam_key,
values_from = c(exam_score, class_mean_for_exam, class_sd_for_exam, course, semester, gender, act_convtd, race, firstgen))
match.drop.meta.df = data.frame()
for(course_name in unique(match.drop.df$course_semester)) {
course_data_full <- match.drop.df %>% filter(course_semester == course_name)
course_data <- na.omit(course_data_full)
#print(paste(nrow(course_data_full) - nrow(course_data), "observations were omitted in", course_name))
if(course_name != "Introductory Economics Winter"){
propensity <- matchit(
factor(usage_pattern) ~ exam_score_E1 + gender_E1 +
act_convtd_E1 + race_E1 + firstgen_E1,
data = course_data,
method = "subclass",
subclass = 5,
estimand = "ATE"
) # stratified (subclassification) matching into 5 subclasses, estimating the ATE
matched_data <- match.data(propensity)
fit1 <- lm(exam_score_E2 ~ factor(usage_pattern) +
exam_score_E1 + gender_E1 + act_convtd_E1 +
race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_result <- coeftest(fit1, vcov. = vcovCL, cluster = ~subclass)
standardized_fit1 = lm(
I((exam_score_E2-class_mean_for_exam_E2)/class_sd_for_exam_E2) ~
factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 +
race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_standardized_result = coeftest(standardized_fit1,
vcov. = vcovCL, cluster = ~subclass)
}
# get class total
class_total <- user.lvl %>% filter(course_semester == course_name)
if(course_name != "Introductory Economics Winter"){
match.drop.meta.df = rbind(match.drop.meta.df,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = match_result[2,1],
se = match_result[2,2],
num = nrow(matched_data),
total = nrow(class_total),
standardized_estimate = match_standardized_result[2,1],
standardized_se = match_standardized_result[2,2]
))
}else{
match.drop.meta.df = rbind(match.drop.meta.df,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = NA,
se = NA,
num = NA,
total = NA,
standardized_estimate = NA,
standardized_se = NA
))
}
}
match.drop.meta.df.summary <- metagen(match.drop.meta.df$estimate,
match.drop.meta.df$se)
match.drop.meta.df.standardized.summary <-
metagen(match.drop.meta.df$standardized_estimate,
match.drop.meta.df$standardized_se)
match.drop.meta.df.summary
## 95%-CI %W(fixed) %W(random)
## 1 -2.2001 [ -3.6361; -0.7642] 33.9 20.8
## 2 1.3890 [ -5.1400; 7.9181] 1.6 3.2
## 3 -1.0785 [ -6.4041; 4.2470] 2.5 4.5
## 4 -2.9918 [ -4.6682; -1.3154] 24.9 18.9
## 5 -8.3627 [-14.9558; -1.7696] 1.6 3.1
## 6 2.2179 [ -1.5275; 5.9632] 5.0 7.9
## 7 -3.4175 [-15.1038; 8.2689] 0.5 1.1
## 8 -1.6616 [ -3.9388; 0.6156] 13.5 14.6
## 9 -3.2053 [ -6.5643; 0.1537] 6.2 9.2
## 10 -7.9290 [-14.8639; -0.9941] 1.5 2.9
## 11 5.0767 [ -3.9249; 14.0783] 0.9 1.8
## 12 -0.6983 [ -3.8643; 2.4677] 7.0 10.0
## 13 1.1305 [ -7.4606; 9.7216] 0.9 1.9
## 14 NA 0.0 0.0
##
## Number of studies combined: k = 13
##
## 95%-CI z p-value
## Fixed effect model -2.0695 [-2.9060; -1.2329] -4.85 < 0.0001
## Random effects model -1.8757 [-3.1120; -0.6394] -2.97 0.0029
##
## Quantifying heterogeneity:
## tau^2 = 1.3733 [0.0000; 18.2713]; tau = 1.1719 [0.0000; 4.2745]
## I^2 = 33.2% [0.0%; 65.5%]; H = 1.22 [1.00; 1.70]
##
## Test of heterogeneity:
## Q d.f. p-value
## 17.97 12 0.1166
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
To estimate the effect of dropping the Exam Playbook, we repeated this analysis on the subset of students who had used the Exam Playbook for their first exam. Of these students, some dropped the Exam Playbook on their second exam, while others continued using it. When matched on their first exam performance, college entrance scores, and demographics, students who dropped the Exam Playbook scored an average of 1.88 percentage points lower on the second exam than those who kept using it (b = -1.88 [95% CI: -3.11, -0.64], d = -0.14, p = 0.003; Figure 2 right panel).
# keep the same ordering as the course-level forest plot graph
match.drop.meta.df.reorder <- match.drop.meta.df %>%
arrange(factor(course, levels= c("General Physics", "General Chemistry", "Introductory Biology", "Introductory Programming (Engin)",
"Introductory Economics","Elementary Programming", "Introduction to Statistics")), semester) %>%
mutate(course_num = COURSE_NUM_VECTOR)
### --- %%%
match.drop.meta.df.plothelp <- data.frame(
x.diamond = c(match.drop.meta.df.summary$TE.random - 1.96*match.drop.meta.df.summary$seTE.random,
match.drop.meta.df.summary$TE.random,
match.drop.meta.df.summary$TE.random + 1.96*match.drop.meta.df.summary$seTE.random,
match.drop.meta.df.summary$TE.random),
y.diamond = c(8,
8 + 0.2, # can change this 0.2 to make the diamond less fat.
8,
8 - 0.2)
)
reordered.labels.with.n = c(
"General Physics",
"General Chemistry",
"Intro Biology",
"Intro Programming (Engineers)",
"Intro Economics",
"Intro Programming (Programmers)",
"Intro Statistics"
)
sample.labels.drop = c(
paste(match.drop.meta.df.reorder$num[1], " (",
format(match.drop.meta.df.reorder$num[1]/match.drop.meta.df.reorder$total[1]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[2], " (",
format(match.drop.meta.df.reorder$num[2]/match.drop.meta.df.reorder$total[2]*100, digits=3), "%) ",
sep=""),
paste(match.drop.meta.df.reorder$num[3], " (",
format(match.drop.meta.df.reorder$num[3]/match.drop.meta.df.reorder$total[3]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[4], " (",
format(match.drop.meta.df.reorder$num[4]/match.drop.meta.df.reorder$total[4]*100, digits=3), "%) ",
sep=""),
paste(match.drop.meta.df.reorder$num[5], " (",
format(match.drop.meta.df.reorder$num[5]/match.drop.meta.df.reorder$total[5]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[6], " (",
format(match.drop.meta.df.reorder$num[6]/match.drop.meta.df.reorder$total[6]*100, digits=3), "%) ",
sep=""),
paste(match.drop.meta.df.reorder$num[7], " (",
format(match.drop.meta.df.reorder$num[7]/match.drop.meta.df.reorder$total[7]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[8], " (",
format(match.drop.meta.df.reorder$num[8]/match.drop.meta.df.reorder$total[8]*100, digits=3), "%) ",
sep=""),
paste(match.drop.meta.df.reorder$num[9], " (",
format(match.drop.meta.df.reorder$num[9]/match.drop.meta.df.reorder$total[9]*100, digits=3), "%), ",
sep=""),
"NA",
paste(match.drop.meta.df.reorder$num[11], " (",
format(match.drop.meta.df.reorder$num[11]/match.drop.meta.df.reorder$total[11]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[12], " (",
format(match.drop.meta.df.reorder$num[12]/match.drop.meta.df.reorder$total[12]*100, digits=3), "%) ",
sep=""),
paste(match.drop.meta.df.reorder$num[13], " (",
format(match.drop.meta.df.reorder$num[13]/match.drop.meta.df.reorder$total[13]*100, digits=3), "%), ",
sep=""),
paste(match.drop.meta.df.reorder$num[14], " (",
format(match.drop.meta.df.reorder$num[14]/match.drop.meta.df.reorder$total[14]*100, digits=3), "%) ",
sep="")
)
matching.plot.drop.right <- ggplot(match.drop.meta.df.reorder) + #ggplot(pb_condition.effect) +
geom_point(aes(x=estimate, y = course_num, color = semester), size=3.5) +
geom_errorbarh(aes(xmin=estimate - 1.96*se, xmax=estimate + 1.96*se, y=course_num, color = semester), height=.2) +
scale_colour_manual(values = c("black", "grey")) +
geom_vline(xintercept=0, lty=2) +
scale_y_reverse(breaks=c(1,2,3,4,5,6,7,8), labels=c(reordered.labels.with.n, "All Courses"), position = "right") +
geom_polygon(data = match.drop.meta.df.plothelp, aes(x = x.diamond, y = y.diamond), size = 0.1) +
scale_x_continuous(breaks = round(seq(-10, 10, by = 5),0)) +
labs(color = "Term") +
#ggtitle("Effect of adopting the Exam Playbook") +
xlab("Dropping effect size \n (percentage points on Exam 2)") +
ylab("") +
theme_bw() +
theme(legend.position = "top",
legend.title = element_text(size=18),
legend.text = element_text(size=18),
axis.title = element_text(size=18),
axis.text.x = element_text(size=18),
axis.text.y = element_text(size=18))+
annotate("text", x = 23, y = 1.3, label = sample.labels.drop[1], size = 5, color ="black") +
annotate("text", x = 33, y = 1.3, label = sample.labels.drop[2], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 2.3, label = sample.labels.drop[3], size = 5, color ="black") +
annotate("text", x = 33, y = 2.3, label = sample.labels.drop[4], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 3.3, label = sample.labels.drop[5], size = 5, color ="black") +
annotate("text", x = 33, y = 3.3, label = sample.labels.drop[6], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 4.3, label = sample.labels.drop[7], size = 5, color ="black") +
annotate("text", x = 33, y = 4.3, label = sample.labels.drop[8], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 5.3, label = sample.labels.drop[9], size = 5, color ="black") +
annotate("text", x = 33, y = 5.3, label = sample.labels.drop[10], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 6.3, label = sample.labels.drop[11], size = 5, color ="black") +
annotate("text", x = 33, y = 6.3, label = sample.labels.drop[12], size = 5, color = "dark grey") +
annotate("text", x = 23, y = 7.3, label = sample.labels.drop[13], size = 5, color ="black") +
annotate("text", x = 33, y = 7.3, label = sample.labels.drop[14], size = 5, color = "dark grey") +
coord_cartesian(xlim = c(-16, 16), clip = "off")
# theme(text = element_text(size=25))
# 8.5 by 7, pdf
#matching.plot.drop.right
### extracting first two exams for intra-student analyses
exam.lvl.nostats <- exam.lvl %>%
filter(!(course %in% c("Introduction to Statistics")))
exam.lvl.match.nostats <- exam.lvl.nostats %>%
filter(exam_key %in% c("Exam 1", "Exam 2")) %>%
mutate(time = ifelse(exam_key == "Exam 1", 0, 1))
playbk_use.nostats <- spread(exam.lvl.nostats[, c("user_source_id_sem", "pb_use", "exam_key")], key = exam_key, value = pb_use)
colnames(playbk_use.nostats) <- c("user_source_id_sem","exam1_pbuse","exam2_pbuse","exam3_pbuse","exam4_pbuse")
exam.lvl.match.nostats <- exam.lvl.match.nostats %>%
left_join(playbk_use.nostats, by="user_source_id_sem") %>%
mutate(
dropped_pb = ifelse(exam1_pbuse == 1 & exam2_pbuse == 0, 1,0),
picked_up = ifelse(exam1_pbuse == 0 & exam2_pbuse == 1, 1,0),
no.use = ifelse(exam1_pbuse == 0 & exam2_pbuse == 0, 1,0),
all.use = ifelse(exam1_pbuse == 1 & exam2_pbuse == 1, 1,0),
usage_pattern = factor(ifelse(dropped_pb == 1, "dropped",
ifelse(picked_up == 1, "adopted",
ifelse(no.use == 1, "never",
ifelse(all.use == 1, "consistent", NA)))))
)
exam.lvl.match.adopt.nostats <- exam.lvl.match.nostats %>%
filter(first_use_exam1==0) %>%
mutate(usage_pattern = factor(usage_pattern, levels = c("never", "adopted")))
match.adopt.df.nostats <- exam.lvl.match.adopt.nostats %>%
select(user_source_id_sem, exam_key, course_semester,
usage_pattern, exam_score, course, semester, gender, act_convtd, race, firstgen) %>%
group_by(exam_key, course_semester) %>%
mutate(class_mean_for_exam = mean(exam_score, na.rm =T),
class_sd_for_exam = sd(exam_score, na.rm =T),
exam_key = recode(exam_key, `Exam 1` = "E1", `Exam 2` = "E2")) %>%
pivot_wider(id_cols = c(user_source_id_sem, course_semester, usage_pattern),
names_from = exam_key,
values_from = c(exam_score, class_mean_for_exam, class_sd_for_exam, course, semester, gender, act_convtd, race, firstgen))
match.adopt.meta.df.nostats = data.frame()
for(course_name in unique(match.adopt.df.nostats$course_semester)) {
course_data_full <- match.adopt.df.nostats %>% filter(course_semester == course_name)
course_data <- na.omit(course_data_full)
#print(paste(nrow(course_data_full) - nrow(course_data), "observations were omitted in", course_name))
if(course_name != "Introductory Economics Winter"){
propensity <- matchit(factor(usage_pattern) ~ exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1,
data=course_data,
method = "subclass",
subclass = 5,
estimand = "ATE"
) # subclassification (stratified) matching into 5 propensity-score subclasses
matched_data <- match.data(propensity)
fit1 <- lm(exam_score_E2 ~ factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_result <- coeftest(fit1, vcov. = vcovCL, cluster = ~subclass)
standardized_fit1 = lm( I((exam_score_E2-class_mean_for_exam_E2)/class_sd_for_exam_E2)
~ factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_standardized_result = coeftest(standardized_fit1, vcov. = vcovCL, cluster = ~subclass)
}
# get class total
class_total <- user.lvl.nostats %>% filter(course_semester == course_name)
if(course_name != "Introductory Economics Winter"){
match.adopt.meta.df.nostats = rbind(match.adopt.meta.df.nostats,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = match_result[2,1],
se = match_result[2,2],
num = nrow(matched_data),
total = nrow(class_total),
standardized_estimate = match_standardized_result[2,1],
standardized_se = match_standardized_result[2,2]
))
} else {
match.adopt.meta.df.nostats = rbind(match.adopt.meta.df.nostats,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = NA,
se = NA,
num = NA,
total = NA,
standardized_estimate = NA,
standardized_se = NA
))
}
}
match.adopt.meta.df.nostats.summary <- metagen(match.adopt.meta.df.nostats$estimate,
match.adopt.meta.df.nostats$se)
match.adopt.meta.df.nostats.standardized.summary <-
metagen(match.adopt.meta.df.nostats$standardized_estimate,
match.adopt.meta.df.nostats$standardized_se)
match.adopt.meta.df.nostats.summary
## 95%-CI %W(fixed) %W(random)
## 1 3.3579 [-5.2049; 11.9207] 0.7 1.5
## 2 1.1054 [-2.3990; 4.6098] 4.3 7.2
## 3 0.7712 [-5.3556; 6.8980] 1.4 2.8
## 4 -0.4513 [-1.7195; 0.8169] 32.9 20.7
## 5 2.7423 [-1.3012; 6.7859] 3.2 5.8
## 6 4.3221 [-0.7391; 9.3833] 2.1 4.0
## 7 NA 0.0 0.0
## 8 3.0995 [ 1.5058; 4.6932] 20.8 17.8
## 9 2.1940 [ 0.3793; 4.0088] 16.1 16.0
## 10 2.8098 [-1.8060; 7.4256] 2.5 4.7
## 11 -0.2757 [-4.9989; 4.4476] 2.4 4.5
## 12 0.9893 [-0.9845; 2.9631] 13.6 14.8
##
## Number of studies combined: k = 11
##
## 95%-CI z p-value
## Fixed effect model 1.3084 [0.5809; 2.0359] 3.53 0.0004
## Random effects model 1.5613 [0.4720; 2.6507] 2.81 0.0050
##
## Quantifying heterogeneity:
## tau^2 = 1.0705 [0.0000; 4.6600]; tau = 1.0346 [0.0000; 2.1587]
## I^2 = 38.3% [0.0%; 69.6%]; H = 1.27 [1.00; 1.82]
##
## Test of heterogeneity:
## Q d.f. p-value
## 16.21 10 0.0938
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
exam.lvl.match.drop.nostats <- exam.lvl.match.nostats %>% filter(exam1_pbuse == 1)
match.drop.df.nostats <- exam.lvl.match.drop.nostats %>%
select(user_source_id_sem, exam_key, course_semester,
usage_pattern, exam_score, course, semester, gender, act_convtd, race, firstgen) %>%
group_by(exam_key, course_semester) %>%
mutate(class_mean_for_exam = mean(exam_score, na.rm =T),
class_sd_for_exam = sd(exam_score, na.rm =T),
exam_key = recode(exam_key, `Exam 1` = "E1", `Exam 2` = "E2")) %>%
pivot_wider(id_cols = c(user_source_id_sem, course_semester, usage_pattern),
names_from = exam_key,
values_from = c(exam_score, class_mean_for_exam, class_sd_for_exam, course, semester, gender, act_convtd, race, firstgen))
match.drop.meta.df.nostats = data.frame()
for(course_name in unique(match.drop.df.nostats$course_semester)) {
course_data_full <- match.drop.df.nostats %>% filter(course_semester == course_name)
course_data <- na.omit(course_data_full)
#print(paste(nrow(course_data_full) - nrow(course_data), "observations were omitted in", course_name))
if(course_name != "Introductory Economics Winter"){
propensity <- matchit(factor(usage_pattern) ~ exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1,
data=course_data,
method = "subclass",
subclass = 5,
estimand = "ATE"
) # subclassification (stratified) matching into 5 propensity-score subclasses
matched_data <- match.data(propensity)
fit1 <- lm(exam_score_E2 ~ factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_result <- coeftest(fit1, vcov. = vcovCL, cluster = ~subclass)
standardized_fit1 = lm( I((exam_score_E2-class_mean_for_exam_E2)/class_sd_for_exam_E2)
~ factor(usage_pattern) + exam_score_E1 + gender_E1 + act_convtd_E1 + race_E1 + firstgen_E1, data = matched_data, weights = weights)
match_standardized_result = coeftest(standardized_fit1, vcov. = vcovCL, cluster = ~subclass)
}
# get class total
class_total <- user.lvl.nostats %>% filter(course_semester == course_name)
if(course_name != "Introductory Economics Winter"){
match.drop.meta.df.nostats = rbind(match.drop.meta.df.nostats,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = match_result[2,1],
se = match_result[2,2],
num = nrow(matched_data),
total = nrow(class_total),
standardized_estimate = match_standardized_result[2,1],
standardized_se = match_standardized_result[2,2]
))
}else{
match.drop.meta.df.nostats = rbind(match.drop.meta.df.nostats,
data.frame(course_semester = course_name,
course = course_data$course_E1[1],
semester = course_data$semester_E1[1],
estimate = NA,
se = NA,
num = NA,
total = NA,
standardized_estimate = NA,
standardized_se = NA
))
}
}
match.drop.meta.df.nostats.summary <- metagen(match.drop.meta.df.nostats$estimate,
match.drop.meta.df.nostats$se)
match.drop.meta.df.nostats.standardized.summary <-
metagen(match.drop.meta.df.nostats$standardized_estimate,
match.drop.meta.df.nostats$standardized_se)
match.drop.meta.df.nostats.summary
## 95%-CI %W(fixed) %W(random)
## 1 1.3890 [ -5.1400; 7.9181] 2.7 5.7
## 2 -1.0785 [ -6.4041; 4.2470] 4.1 7.8
## 3 -2.9918 [ -4.6682; -1.3154] 41.6 21.6
## 4 -8.3627 [-14.9558; -1.7696] 2.7 5.6
## 5 2.2179 [ -1.5275; 5.9632] 8.3 12.1
## 6 -3.4175 [-15.1038; 8.2689] 0.9 2.1
## 7 -1.6616 [ -3.9388; 0.6156] 22.5 18.5
## 8 -7.9290 [-14.8639; -0.9941] 2.4 5.2
## 9 5.0767 [ -3.9249; 14.0783] 1.4 3.3
## 10 -0.6983 [ -3.8643; 2.4677] 11.7 14.4
## 11 1.1305 [ -7.4606; 9.7216] 1.6 3.6
## 12 NA 0.0 0.0
##
## Number of studies combined: k = 11
##
## 95%-CI z p-value
## Fixed effect model -1.8777 [-2.9589; -0.7965] -3.40 0.0007
## Random effects model -1.5337 [-3.2917; 0.2243] -1.71 0.0873
##
## Quantifying heterogeneity:
## tau^2 = 2.9883 [0.0000; 29.9479]; tau = 1.7287 [0.0000; 5.4725]
## I^2 = 42.5% [0.0%; 71.6%]; H = 1.32 [1.00; 1.88]
##
## Test of heterogeneity:
## Q d.f. p-value
## 17.38 10 0.0664
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
Following our earlier conservative test of generalizability beyond Introductory Statistics, we repeated these stratified matching analyses with the 6 other courses (excluding Introductory Statistics) and still observed the effects of adopting and dropping the Exam Playbook, albeit with smaller effect sizes. When matched on their first exam performance, college entrance scores, and demographics, students who adopted the Exam Playbook performed an average of 1.56 percentage points ([95% CI: 0.47, 2.65], d = 0.1, p = 0.005) better on the second exam, compared to those who never used it. When matched on the same variables, students who dropped the Exam Playbook performed an average of 1.53 percentage points worse ([95% CI: -3.29, 0.22], d = -0.12, p = 0.087), compared to those who kept using it (although this smaller effect of dropping was not significant at the .05 level).
# table(match.drop.df$course_semester, match.drop.df$usage_pattern)
LABELS_ORDERED_FOR_TABLE_S2 <-
c("Introduction to Statistics Fall",
"Introduction to Statistics Winter",
"Introductory Biology Fall",
"Introductory Biology Winter",
"General Chemistry Fall",
"General Chemistry Winter",
"General Physics Fall",
"General Physics Winter",
"Introductory Programming (Engin) Fall",
"Introductory Programming (Engin) Winter",
"Elementary Programming Fall",
"Elementary Programming Winter",
"Introductory Economics Fall",
"Introductory Economics Winter")
tableS2_df =
bind_rows((match.adopt.df %>% select(course_semester, usage_pattern)),
(match.drop.df %>% select(course_semester, usage_pattern))) %>%
filter(!is.na(usage_pattern)) %>%
count(usage_pattern) %>%
pivot_wider(id_cols = course_semester,
names_from = usage_pattern,
values_from = n) %>%
arrange(match(course_semester, LABELS_ORDERED_FOR_TABLE_S2)) %>%
select(course_semester, adopted, never, dropped, consistent) # %>%
# rename(
# Course_Semester = course_semester,
# `Number of students who adopted the Exam Playbook` = adopted,
# `Number of students who never used the Exam Playbook` = never,
# `Number of students who dropped the Exam Playbook` = dropped,
# `Number of students who consistently used the Exam Playbook` = consistent)
tableS2_df
Descriptives of the total number and percentages of (i) students who adopted the Exam Playbook, compared to (ii) students who never used the Exam Playbook; and (iii) students who dropped the Exam Playbook, compared to (iv) students who consistently used the Exam Playbook. Note: for Intro Economics (Winter), N was too small for this analysis.
g_legend <- function(a.gplot){
tmp <- ggplot_gtable(ggplot_build(a.gplot))
leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
legend <- tmp$grobs[[leg]]
return(legend)}
mylegend <- g_legend(matching.plot.adopt)
#1600x800
# 17 by 8.5 pdf
grid.arrange(mylegend, arrangeGrob(matching.plot.adopt + theme(legend.position="none"),
matching.plot.drop.right + theme(legend.position="none"),
nrow=1),
nrow=2,heights=c(1, 10))
Note. Forest plot showing effect sizes from stratified matching analyses. Numbers below each course name indicate the number of students in that analysis (and as a percentage of the total class). Left: Effect of “adopting” the Exam Playbook. Neither group used the Exam Playbook on Exam 1; students who used it on Exam 2 outperformed students who did not. Right: Effect of “dropping” the Exam Playbook. Both groups used the Exam Playbook for Exam 1; students who dropped the Exam Playbook at Exam 2 did worse than students who consistently used it. Error bars reflect 95% confidence intervals.
Overall, these intra-individual data add further evidence to our meta-analyses suggesting that, on average, using the Exam Playbook predicts exam performance. We additionally describe in Supplementary Note 3 that these results also replicate using a difference-in-difference analytical method.
Next, we examined whether there were dosage and timing effects of using the Exam Playbook. Uptake of the Exam Playbook peaked between the first two exams, and then dropped thereafter if there were more than 2 exams in the course (see Table 1).
user.lvl$pb_use_sum_gmc <- NA
course_nExam <- c(2,2,4,4,3,3,3,3,4,3,3,3,2,1)
course_semester <- sort(unique(user.lvl$course_semester))
course.exam.data <- data.frame(course_semester= course_semester,
course_nExam = course_nExam)
user.lvl <- user.lvl %>% left_join(course.exam.data, by = "course_semester")
# store each class's mean number of Exam Playbook uses per course_semester
for (i in 1:length(course_semester)){
course_row <- which(user.lvl$course_semester == course_semester[i])
course_mean <- mean(user.lvl$pb_use_sum[course_row], na.rm = T)
user.lvl$pb_use_sum_gmc[course_row] <- course_mean
}
usage.df.estimate <- data.frame()
usage.df.se <- data.frame()
usage.df.standardized.estimate <- data.frame()
usage.df.standardized.se <- data.frame()
course_name <- unique(user.lvl$course_semester)
for (i in 1:length(course_name)){
temp.df <- user.lvl %>%
filter(course_semester == course_name[i]) %>%
filter(pb_condition == "playbook") # only include playbook users
lm.model.sum <- summary(lm(exam_score_avrg ~ pb_use_sum, data=temp.df))
usage.df.estimate <- bind_coef(usage.df.estimate, extract_coef(lm.model.sum$coefficients, "Estimate"), rownames(lm.model.sum$coefficients))
usage.df.se <- bind_coef(usage.df.se, extract_coef(lm.model.sum$coefficients, "Std. Error"), rownames(lm.model.sum$coefficients))
lm.model.standardized.sum <- summary(lm(exam_score_avrg_standardized ~ pb_use_sum, data=temp.df))
usage.df.standardized.estimate <-
bind_coef(usage.df.standardized.estimate,
extract_coef(lm.model.standardized.sum$coefficients, "Estimate"),
rownames(lm.model.standardized.sum$coefficients))
usage.df.standardized.se <-
bind_coef(usage.df.standardized.se,
extract_coef(lm.model.standardized.sum$coefficients, "Std. Error"),
rownames(lm.model.standardized.sum$coefficients))
}
rownames(usage.df.estimate) <- course_name
rownames(usage.df.se) <- course_name
# dosage effect
doses.df.summary = metagen(usage.df.estimate$pb_use_sum,
usage.df.se$pb_use_sum)
doses.df.standardized.summary =
metagen(usage.df.standardized.estimate$pb_use_sum,
usage.df.standardized.se$pb_use_sum)
doses.df.summary
## 95%-CI %W(fixed) %W(random)
## 1 0.2555 [-4.6370; 5.1480] 0.7 3.2
## 2 3.8151 [ 3.0840; 4.5462] 32.1 12.7
## 3 1.3786 [-0.5370; 3.2941] 4.7 9.1
## 4 2.7892 [-2.6005; 8.1790] 0.6 2.8
## 5 0.5003 [-1.1640; 2.1647] 6.2 9.9
## 6 1.1726 [-1.2834; 3.6287] 2.8 7.5
## 7 -0.0651 [-6.5771; 6.4469] 0.4 2.0
## 8 4.3172 [ 3.5679; 5.0665] 30.5 12.7
## 9 3.0897 [ 1.2848; 4.8946] 5.3 9.5
## 10 1.6115 [-6.8102; 10.0333] 0.2 1.3
## 11 1.0818 [-0.1722; 2.3358] 10.9 11.2
## 12 3.4569 [-0.0581; 6.9718] 1.4 5.1
## 13 -0.6488 [-3.6909; 2.3933] 1.9 6.1
## 14 3.3919 [ 0.6974; 6.0863] 2.4 6.9
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model 3.0882 [2.6742; 3.5022] 14.62 < 0.0001
## Random effects model 2.1848 [1.1792; 3.1905] 4.26 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 1.9332 [0.0000; 6.0690]; tau = 1.3904 [0.0000; 2.4635]
## I^2 = 72.3% [52.5%; 83.8%]; H = 1.90 [1.45; 2.48]
##
## Test of heterogeneity:
## Q d.f. p-value
## 46.86 13 < 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
Mixed-effects meta-analyses indicated that using the Exam Playbook on more occasions (i.e., higher dosages) related to better average exam performance (b = 2.18 percentage points, [95% CI: 1.18, 3.19], d = 0.18, p < .001) among students who used the Exam Playbook, consistent with findings from the original efficacy experiments (Chen et al., 2017).
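For a rough sense of scale (an illustrative calculation added here, not in the original file), the random-effects dosage slope implies the expected difference between, say, a student who used the Exam Playbook on three exams and one who used it only once:
# Illustration only: two additional uses at ~2.18 percentage points per use.
round(doses.df.summary$TE.random * 2, 2)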
#sum(exam.lvl$time_left > 10, na.rm=T)
# truncated "> 10 days" to "10"
exam.lvl$time_left_truncate <- ifelse(exam.lvl$time_left > 10, 10, exam.lvl$time_left)
#mean(exam.lvl$time_left_truncate, na.rm=T)
#sd(exam.lvl$time_left_truncate, na.rm=T)
timeleft.df.estimate <- data.frame()
timeleft.df.se <- data.frame()
timeleft.df.standardized.estimate <- data.frame()
timeleft.df.standardized.se <- data.frame()
exam_name <- unique(exam.lvl$course_semester_exam)
for (i in 1:length(exam_name)){
temp.df <- exam.lvl %>% filter(course_semester_exam == exam_name[i])
lm.model.sum <- summary(lm(exam_score ~ time_left_truncate , data=temp.df))
timeleft.df.estimate <- bind_coef(timeleft.df.estimate, extract_coef(lm.model.sum$coefficients, "Estimate"), rownames(lm.model.sum$coefficients))
timeleft.df.se <- bind_coef(timeleft.df.se, extract_coef(lm.model.sum$coefficients, "Std. Error"), rownames(lm.model.sum$coefficients))
lm.model.standardized.sum <- summary(
lm(exam_score_standardized ~ time_left_truncate , data=temp.df))
timeleft.df.standardized.estimate <-
bind_coef(timeleft.df.standardized.estimate,
extract_coef(lm.model.standardized.sum$coefficients, "Estimate"),
rownames(lm.model.standardized.sum$coefficients))
timeleft.df.standardized.se <-
bind_coef(timeleft.df.standardized.se,
extract_coef(lm.model.standardized.sum$coefficients, "Std. Error"),
rownames(lm.model.standardized.sum$coefficients))
}
rownames(timeleft.df.estimate) <- exam_name
rownames(timeleft.df.se) <- exam_name
# timing effect
timing.df.summary = metagen(timeleft.df.estimate$time_left,
timeleft.df.se$time_left)
timing.df.standardized.summary =
metagen(timeleft.df.standardized.estimate$time_left,
timeleft.df.standardized.se$time_left)
timing.df.summary
## 95%-CI %W(fixed) %W(random)
## 1 0.9596 [ 0.2572; 1.6620] 1.0 2.2
## 2 0.7031 [ 0.4141; 0.9921] 6.0 5.1
## 3 0.6368 [ 0.4110; 0.8626] 9.8 5.7
## 4 0.5028 [ 0.3000; 0.7056] 12.1 5.9
## 5 0.1337 [-0.1393; 0.4066] 6.7 5.2
## 6 0.4041 [ 0.0166; 0.7916] 3.3 4.2
## 7 -0.1786 [-1.1981; 0.8409] 0.5 1.2
## 8 0.3542 [-2.2472; 2.9557] 0.1 0.2
## 9 0.4802 [-0.0028; 0.9633] 2.1 3.4
## 10 0.6829 [ 0.0487; 1.3170] 1.2 2.5
## 11 NA 0.0 0.0
## 12 1.6171 [-0.0058; 3.2401] 0.2 0.5
## 13 -0.1069 [-0.6266; 0.4127] 1.8 3.2
## 14 0.2680 [-0.6484; 1.1843] 0.6 1.5
## 15 0.8406 [-0.2731; 1.9543] 0.4 1.1
## 16 -0.3042 [-1.8319; 1.2236] 0.2 0.6
## 17 -0.1642 [-0.5401; 0.2116] 3.5 4.3
## 18 0.0095 [-0.8381; 0.8570] 0.7 1.7
## 19 0.0249 [-0.8565; 0.9064] 0.6 1.6
## 20 0.3007 [-0.9502; 1.5517] 0.3 0.9
## 21 1.6518 [-2.5956; 5.8992] 0.0 0.1
## 22 0.7053 [ 0.4956; 0.9150] 11.3 5.8
## 23 0.9889 [ 0.7415; 1.2362] 8.1 5.5
## 24 0.7525 [ 0.5176; 0.9874] 9.0 5.6
## 25 0.4606 [ 0.0683; 0.8529] 3.2 4.1
## 26 0.1475 [-0.2689; 0.5640] 2.9 3.9
## 27 0.6968 [-1.1202; 2.5138] 0.2 0.4
## 28 0.4139 [-1.7009; 2.5288] 0.1 0.3
## 29 5.6044 [ 0.6067; 10.6021] 0.0 0.1
## 30 -0.0086 [-0.8374; 0.8203] 0.7 1.7
## 31 0.4205 [-0.0078; 0.8487] 2.7 3.8
## 32 0.2581 [-0.3048; 0.8211] 1.6 2.9
## 33 0.9589 [-0.3189; 2.2368] 0.3 0.8
## 34 -0.7208 [-2.3664; 0.9249] 0.2 0.5
## 35 0.2648 [-1.0279; 1.5575] 0.3 0.8
## 36 -0.4879 [-3.4504; 2.4747] 0.1 0.2
## 37 -2.1143 [-5.7587; 1.5301] 0.0 0.1
## 38 0.6195 [-0.6403; 1.8793] 0.3 0.9
## 39 0.1533 [-0.2819; 0.5885] 2.6 3.8
## 40 -0.3227 [-1.3085; 0.6631] 0.5 1.3
## 41 0.2277 [-0.1669; 0.6224] 3.2 4.1
## 42 0.4902 [-0.1208; 1.1013] 1.3 2.6
##
## Number of studies combined: k = 41
##
## 95%-CI z p-value
## Fixed effect model 0.5014 [0.4308; 0.5720] 13.92 < 0.0001
## Random effects model 0.4168 [0.2919; 0.5416] 6.54 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0585 [0.0000; 0.2418]; tau = 0.2419 [0.0000; 0.4917]
## I^2 = 51.2% [30.2%; 65.9%]; H = 1.43 [1.20; 1.71]
##
## Test of heterogeneity:
## Q d.f. p-value
## 82.03 40 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
The Exam Playbook was made available to students up to 10 days prior to their exams. The average student who used the Exam Playbook engaged with it about a week (M = 7.06 days, SD = 3.003 days) before their exams. We used time of usage (number of days before the exam) to predict exam performance at the exam level. Students who used the Exam Playbook benefited more from using it earlier (b = 0.42 percentage points per day, [95% CI: 0.29, 0.54], d = 0.03, p < .001). This suggests that early preparation is associated with better Exam Playbook effectiveness, although it could also reflect other motivation-relevant traits like better time-management and general self-regulatory ability (Steel, 2007). For example, students who used the Exam Playbook very close to the exam date might have procrastinated or crammed their exam preparation, reflecting lower self-regulation (Carvalho et al., 2020).
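As an illustration (added here, not in the original file), the timing slope implies the expected difference between completing the Exam Playbook a week before the exam versus the day before:
# Illustration only: six extra days at ~0.42 percentage points per day.
round(timing.df.summary$TE.random * 6, 2)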
To better understand which students naturally used the Exam Playbook as a learning resource, we ran a mixed-effects logistic regression using academic ability (college entrance exam score) and demographic variables (gender, race, first-generation status) as predictors of whether students used the Exam Playbook at least once in their classes.
user.lvl <- user.lvl %>% mutate(
race = factor(race, levels=c("White", "Asian", "Black",
"Hawaiian", "Hispanic", "Native Amr",
"Not Indic", "2 or More")))
pb.use.mlm.logit <- glmer(
factor(pb_condition) ~ scale(act_convtd, scale=F) + race + gender +
firstgen + (1|course_semester),
data=user.lvl, family = "binomial")
pb.use.mlm.logit.releveled.sum = user.lvl %>%
mutate(race = fct_relevel(race, "Asian")) %>%
glmer(
factor(pb_condition) ~ scale(act_convtd, scale=F) + race + gender +
firstgen + (1|course_semester),
data=., family = "binomial") %>%
summary()
#summary(pb.use.mlm.logit)
Anova(pb.use.mlm.logit)
Academic ability did not significantly predict Exam Playbook usage (χ2(1) = 0.24, p = 0.621), which suggests that natural adoption of this Exam Playbook resource may not have been restricted to higher performers or simply more motivated students.
However, there were demographic differences in natural uptake of the Exam Playbook. Gender significantly predicted Exam Playbook adoption (χ2(1) = 196.18, p < .001): the odds of female students using the Exam Playbook were 2.22 times those of male students.
Race also predicted Exam Playbook adoption (χ2(7) = 21.78, p = 0.003): in particular, Black and Hispanic students were less likely to use the Exam Playbook on their exams (Black students had 0.65 times the odds of using it compared to White students, p = 0.003, and 0.56 times the odds compared to Asian students, p < .001; Hispanic students had 0.79 times the odds of using it compared to White students, p = 0.026, and 0.68 times the odds of using it compared to Asian students, p = 0.001).
First-generation status did not predict Exam Playbook adoption (χ2(1) = 0.79, p = 0.373).
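A minimal sketch (added here, not part of the original script) of how odds ratios such as those quoted above can be recovered from the fitted models, by exponentiating the fixed-effect estimates; each ratio is relative to the reference level of its variable (e.g., White for race), and the releveled model gives the contrasts relative to Asian students.
# Sketch only: odds ratios from the mixed-effects logistic regressions.
round(exp(fixef(pb.use.mlm.logit)), 2)
round(exp(pb.use.mlm.logit.releveled.sum$coefficients[, "Estimate"]), 2)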
Could certain groups of students have benefitted more (or less) from using the Exam Playbook? We fitted separate mixed-effects linear models to test the moderation effect of gender, race, and first-generation status on the effectiveness of using the Exam Playbook.
mod.mlm.gender <- lmer(exam_score_avrg ~ pb_condition*gender +
(1+pb_condition|course_semester), data=user.lvl)
mod.mlm.gender.sum <- summary(mod.mlm.gender)
mod.mlm.gender.standardized.sum <- summary(lmer(
exam_score_avrg_standardized ~ pb_condition*gender +
(1+pb_condition|course_semester), data=user.lvl))
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: exam_score_avrg ~ pb_condition * gender + (1 + pb_condition |
## course_semester)
## Data: user.lvl
##
## REML criterion at convergence: 94620.8
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.0319 -0.5671 0.1633 0.7274 3.2699
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## course_semester (Intercept) 33.913 5.823
## pb_conditionplaybook 3.422 1.850 0.04
## Residual 149.136 12.212
## Number of obs: 12054, groups: course_semester, 14
##
## Fixed effects:
## Estimate Std. Error df t value
## (Intercept) 72.6701 1.5813 13.5266 45.955
## pb_conditionplaybook 3.8710 0.6370 18.6039 6.077
## genderMale 3.8323 0.3404 11421.3388 11.259
## pb_conditionplaybook:genderMale -2.3502 0.4618 10755.4427 -5.089
## Pr(>|t|)
## (Intercept) 2.99e-16 ***
## pb_conditionplaybook 8.34e-06 ***
## genderMale < 2e-16 ***
## pb_conditionplaybook:genderMale 3.66e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) pb_cnd gndrMl
## pb_cndtnply -0.052
## genderMale -0.134 0.335
## pb_cndtnp:M 0.098 -0.395 -0.733
Gender significantly moderated Exam Playbook effects: while females generally performed worse than males (b = 3.83, [95% CI: 3.17, 4.5], d = 0.3, p < .001), as is commonly observed in STEM classes, female users benefitted 2.35 percentage points (b = 2.35, [95% CI: 1.45, 3.26], d = 0.19, p < .001) more from using the Exam Playbook than male users, a substantial 61.33% reduction in the gender gap.
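The percentage reduction in the gender gap quoted above is simply the interaction estimate expressed as a share of the gender main effect; a sketch (added here) using the coefficients from the model summary above:
# 2.3502 / 3.8323 is approximately 0.6133, i.e., ~61.33% of the gender gap.
round(abs(mod.mlm.gender.sum$coefficients["pb_conditionplaybook:genderMale", "Estimate"]) /
        abs(mod.mlm.gender.sum$coefficients["genderMale", "Estimate"]) * 100, 2)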
mod.mlm.race <- lmer(exam_score_avrg ~ pb_condition*race +
(1+pb_condition|course_semester), data=user.lvl)
#summary(mod.mlm.race)
Anova(mod.mlm.race)
Race did not moderate Exam Playbook effects (χ2 (7) = 6.11, p= 0.527 ).
mod.mlm.firstgen.sum <- summary(lmer(exam_score_avrg ~ pb_condition*firstgen +
(1+pb_condition|course_semester), data=user.lvl))
mod.mlm.firstgen.standardized.sum <- summary(lmer(
exam_score_avrg_standardized ~ pb_condition*firstgen +
(1+pb_condition|course_semester), data=user.lvl))
## boundary (singular) fit: see ?isSingular
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: exam_score_avrg ~ pb_condition * firstgen + (1 + pb_condition |
## course_semester)
## Data: user.lvl
##
## REML criterion at convergence: 88666.5
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.1530 -0.5862 0.1630 0.7172 3.2758
##
## Random effects:
## Groups Name Variance Std.Dev. Corr
## course_semester (Intercept) 31.562 5.618
## pb_conditionplaybook 2.259 1.503 0.12
## Residual 147.769 12.156
## Number of obs: 11309, groups: course_semester, 14
##
## Fixed effects:
## Estimate Std. Error df t value Pr(>|t|)
## (Intercept) 76.1062 1.5146 13.1102 50.248 < 2e-16
## pb_conditionplaybook 1.6490 0.5187 13.9029 3.179 0.006745
## firstgen -7.0375 0.4670 11290.6318 -15.070 < 2e-16
## pb_conditionplaybook:firstgen 2.2524 0.6586 11288.9121 3.420 0.000629
##
## (Intercept) ***
## pb_conditionplaybook **
## firstgen ***
## pb_conditionplaybook:firstgen ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) pb_cnd frstgn
## pb_cndtnply 0.041
## firstgen -0.045 0.133
## pb_cndtnpl: 0.032 -0.183 -0.709
First-generation status significantly moderated Exam Playbook effects: while first-generation students generally performed worse than non-first-generation students (b = -7.04, [95% CI: -7.95, -6.12], d = -0.57, p < .001), using the Exam Playbook reduced this gap by an average of 2.25 percentage points ([95% CI: 0.96, 3.54], d = 0.18, p < .001), a 32.01% reduction in the first-generation achievement gap.
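The same arithmetic (a sketch added here) gives the first-generation figure from the model summary above:
# 2.2524 / 7.0375 is approximately 0.3201, i.e., ~32.01% of the first-generation gap.
round(abs(mod.mlm.firstgen.sum$coefficients["pb_conditionplaybook:firstgen", "Estimate"]) /
        abs(mod.mlm.firstgen.sum$coefficients["firstgen", "Estimate"]) * 100, 2)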
As described in the main text, we operationalized a “use” of the Exam Playbook to mean accessing and completing the intervention, including: completing the resource checklist, explaining why each resource would be useful, and planning resource use. Students had to click through to the end of the intervention to be counted as having used it. In the table below, we detail how many instances there were of students who started using the Exam Playbook, and how many of those students finished it. For some classes, such as both Intro Programming classes and Intro Statistics, over 83% of students who started the resource finished it. For other classes, the completion rates were lower, ranging from 30% to 65%. In this paper, we counted only instances where students completed the Exam Playbook as a “use.”
Descriptives of the number of instances where students started the Exam Playbook and the number (and percentage) of those who completed using it, categorized by course and semester
examlvl.drop <- exam.lvl %>%
mutate(
#adding in column to identify students that started the pb
pb_use_start = ifelse(is.na(created), 0, 1),
pb_use_status = as.character(pb_use_status)
) %>%
mutate(
# adding 1 more level to pb_use_status: "started".
pb_use_status = ifelse(
pb_use_start == 1 & is.na(pb_use_status),
"started",
pb_use_status
)
)
#computing sum of students that started the survey
examlvl.drop.sum <- examlvl.drop %>%
group_by(semester, course, pb_use_start) %>%
count() %>%
filter(pb_use_start == 1) %>%
rename(tot_no_students_start = n) %>%
ungroup() %>%
select(-pb_use_start)
#creating wide table of each row's status
examlvl.drop.status <- examlvl.drop %>%
group_by(semester, course, pb_use_status) %>%
count() %>%
pivot_wider(names_from = pb_use_status, values_from = n) %>%
select(course, semester, started, intro, strat, plan, `NA`)
examlvl.drop.status.table <- examlvl.drop.status %>%
arrange(match(course, LABELS_ORDERED_FOR_TABLE), semester) %>%
mutate(Started.Playbook = started + intro + strat + plan,
Completed.Playbook = plan) %>%
select(course, semester, Started.Playbook, Completed.Playbook)
examlvl.drop.status.table
exam.lvl.did <- exam.lvl %>%
filter(source != "EECS280") %>%
filter(exam_key == "Exam 1"| exam_key == "Exam 2")
# creating late variable
exam.lvl.did$time <- ifelse(exam.lvl.did$exam_key == "Exam 1", 0, 1)
playbk_use <- spread(exam.lvl[, c("user_source_id_sem", "pb_use", "exam_key")],
key = exam_key, value = pb_use)
colnames(playbk_use) <- c("user_source_id_sem", "exam1_pbuse",
"exam2_pbuse", "exam3_pbuse", "exam4_pbuse")
# drop courses that have fewer than 3 exams
# stats250 is dropped as well, as it has many pb users in exam 3
exam.lvl.did <- exam.lvl.did %>%
left_join(playbk_use, by = "user_source_id_sem")
exam.lvl.did$dropped_pb <-
ifelse(exam.lvl.did$exam1_pbuse == 1& exam.lvl.did$exam2_pbuse == 0, 1,0)
exam.lvl.did$picked_up <-
ifelse(exam.lvl.did$exam1_pbuse == 0& exam.lvl.did$exam2_pbuse == 1, 1,0)
exam.lvl.did$no.use <-
  ifelse(exam.lvl.did$exam1_pbuse == 0 & exam.lvl.did$exam2_pbuse == 0, 1,0)
exam.lvl.did$all.use <-
ifelse(exam.lvl.did$exam1_pbuse == 1 & exam.lvl.did$exam2_pbuse == 1, 1,0)
exam.lvl.did$usage_pattern <-
ifelse(exam.lvl.did$dropped_pb == 1, "dropped",
ifelse(exam.lvl.did$picked_up == 1, "adopted",
ifelse(exam.lvl.did$no.use == 1, "never",
ifelse(exam.lvl.did$all.use == 1, "consistent", NA)
)
)
)
An alternative method of assessing the effect of adopting or dropping the Exam Playbook is a difference-in-differences (DiD) regression model (Angrist & Pischke, 2008). Here, we report our results using this model and show that it replicates the stratified-matching results reported in the main text.
Similar to our analysis using stratified matching, we restricted our analyses to only the first two exams of each class. To estimate the effect of adopting the Exam Playbook, we took the subset of students who did not use the Exam Playbook on their first exam. For each class, we ran a separate DiD model, controlling for college entrance scores, gender, race, and first-generation status, and aggregated the regression estimates using a random-effects meta-analysis.
exam.lvl.did.adopt <- exam.lvl.did %>% filter(first_use_exam1==0)
course_name <- unique(exam.lvl.did.adopt$course_semester)
did.adopt.meta.df <- data.frame()
for (j in 1:length(course_name)){
course_data <- exam.lvl.did.adopt[exam.lvl.did.adopt$course_semester == course_name[j], ]
lm.did = lm(exam_score ~ first_use_exam2*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.sum <- summary(lm.did)
lm.did.standardized = lm(exam_score_standardized ~ first_use_exam2*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.standardized.sum <- summary(lm.did.standardized)
did.adopt.meta.df = rbind(
did.adopt.meta.df,
data.frame(course_semester = course_name[j],
course = course_data$course[1],
semester = course_data$semester[1],
estimate = lm.did.sum$coefficients["first_use_exam2:time",1],
se = lm.did.sum$coefficients["first_use_exam2:time",2],
standardized_estimate =
lm.did.standardized.sum$coefficients["first_use_exam2:time",1],
standardized_se =
lm.did.standardized.sum$coefficients["first_use_exam2:time",2],
num = length(unique(course_data$user_source_id))))
}
did.adopt.meta.summary <- metagen(did.adopt.meta.df$estimate,
did.adopt.meta.df$se)
did.adopt.meta.standardized.summary <-
metagen(did.adopt.meta.df$standardized_estimate,
did.adopt.meta.df$standardized_se)
did.adopt.meta.summary
## 95%-CI %W(fixed) %W(random)
## 1 3.3600 [ -8.3763; 15.0964] 1.1 1.1
## 2 3.5092 [ 0.9509; 6.0676] 22.8 22.8
## 3 0.6592 [ -3.4064; 4.7248] 9.0 9.0
## 4 5.2845 [ -1.4302; 11.9993] 3.3 3.3
## 5 -1.1632 [ -5.2776; 2.9512] 8.8 8.8
## 6 0.6803 [ -7.6546; 9.0153] 2.2 2.2
## 7 4.5707 [ -0.7871; 9.9285] 5.2 5.2
## 8 0.4575 [ -3.6937; 4.6087] 8.7 8.7
## 9 2.7865 [-20.7733; 26.3462] 0.3 0.3
## 10 1.8861 [ -1.3650; 5.1373] 14.1 14.1
## 11 2.8352 [ -1.2720; 6.9425] 8.9 8.9
## 12 3.4485 [ -1.2938; 8.1909] 6.6 6.6
## 13 0.9517 [ -5.2593; 7.1626] 3.9 3.9
## 14 -0.3368 [ -5.7809; 5.1073] 5.0 5.0
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model 2.0373 [0.8145; 3.2600] 3.27 0.0011
## Random effects model 2.0373 [0.8145; 3.2600] 3.27 0.0011
##
## Quantifying heterogeneity:
## tau^2 = 0 [0.0000; 1.2897]; tau = 0 [0.0000; 1.1357]
## I^2 = 0.0% [0.0%; 55.0%]; H = 1.00 [1.00; 1.49]
##
## Test of heterogeneity:
## Q d.f. p-value
## 7.85 13 0.8534
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
This was the model we ran on each class (for adopting):
lm(exam_score ~ adopted_playbook*time +
college_entrance_score + gender + race + first_gen,
data = subset(exam_lvl, did_not_use_playbook_on_exam1))
This analysis only includes the students who did not use the Exam Playbook in the first exam. “adopted_playbook” is a dummy-coded variable that indicates the students who started using the Exam Playbook on their second exam.
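For intuition, here is a sketch (added here, not part of the original script) of the raw 2x2 difference-in-differences behind this regression, ignoring covariates; the interaction term in the model above estimates this same contrast after adjusting for entrance scores and demographics.
# Sketch only: (adopters' change from Exam 1 to Exam 2) minus (non-adopters' change).
cell_means <- exam.lvl.did.adopt %>%
  group_by(adopter = first_use_exam2, time) %>%
  summarize(mean_score = mean(exam_score, na.rm = TRUE), .groups = "drop")
with(cell_means,
     (mean_score[adopter == 1 & time == 1] - mean_score[adopter == 1 & time == 0]) -
       (mean_score[adopter == 0 & time == 1] - mean_score[adopter == 0 & time == 0]))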
# remove students who never used the pb or used the pb only on the second exam
exam.lvl.did.drop <- exam.lvl.did %>% filter(exam1_pbuse == 1)
course_name <- unique(exam.lvl.did.drop$course_semester)
did.drop.meta.df <- data.frame()
for (j in 1:length(course_name)){
course_data <- exam.lvl.did.drop[exam.lvl.did.drop$course_semester == course_name[j], ]
lm.did = lm(exam_score ~ dropped_pb*time +
act_convtd + gender+ firstgen +race, data = course_data)
lm.did.sum <- summary(lm.did)
lm.did.standardized = lm(exam_score_standardized ~ dropped_pb*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.standardized.sum <- summary(lm.did.standardized)
did.drop.meta.df = rbind(
did.drop.meta.df,
data.frame(course_semester = course_name[j],
course = course_data$course[1],
semester = course_data$semester[1],
estimate = lm.did.sum$coefficients["dropped_pb:time",1],
se = lm.did.sum$coefficients["dropped_pb:time",2],
standardized_estimate =
lm.did.standardized.sum$coefficients["dropped_pb:time",1],
standardized_se =
lm.did.standardized.sum$coefficients["dropped_pb:time",2],
num = length(unique(course_data$user_source_id))))
}
did.drop.meta.summary <- metagen(did.drop.meta.df$estimate,
did.drop.meta.df$se)
did.drop.meta.standardized.summary <-
metagen(did.drop.meta.df$standardized_estimate,
did.drop.meta.df$standardized_se)
did.drop.meta.summary
## 95%-CI %W(fixed) %W(random)
## 1 -1.2187 [ -4.2814; 1.8440] 19.5 19.5
## 2 -1.1308 [-11.8952; 9.6336] 1.6 1.6
## 3 -0.1330 [ -4.9575; 4.6914] 7.9 7.9
## 4 -2.0013 [ -7.5283; 3.5258] 6.0 6.0
## 5 -11.1290 [-23.8177; 1.5596] 1.1 1.1
## 6 2.3105 [ -2.7318; 7.3528] 7.2 7.2
## 7 -1.1984 [-15.6557; 13.2589] 0.9 0.9
## 8 -1.5021 [ -5.8467; 2.8424] 9.7 9.7
## 9 -4.1178 [ -6.4535; -1.7821] 33.5 33.5
## 10 -5.3506 [-19.0900; 8.3887] 1.0 1.0
## 11 4.5009 [ -4.9547; 13.9565] 2.0 2.0
## 12 -0.0568 [ -4.8614; 4.7478] 7.9 7.9
## 13 2.5000 [ -8.3498; 13.3498] 1.6 1.6
## 14 6.6667 [-26.5745; 39.9078] 0.2 0.2
##
## Number of studies combined: k = 14
##
## 95%-CI z p-value
## Fixed effect model -1.7969 [-3.1493; -0.4445] -2.60 0.0092
## Random effects model -1.7969 [-3.1493; -0.4445] -2.60 0.0092
##
## Quantifying heterogeneity:
## tau^2 = 0 [0.0000; 12.2986]; tau = 0 [0.0000; 3.5069]
## I^2 = 0.0% [0.0%; 55.0%]; H = 1.00 [1.00; 1.49]
##
## Test of heterogeneity:
## Q d.f. p-value
## 12.37 13 0.4972
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
This was the model we ran on each class (for dropping):
lm(exam_score ~ dropped_playbook*time +
college_entrance_score + gender + race + first_gen,
data = subset(exam_lvl, used_playbook_on_exam1))
This analysis only includes the students who used the Exam Playbook in the first exam. “dropped_playbook” is a dummy-coded variable that indicates the students who dropped the Exam Playbook on their second exam.
After students adopted the Exam Playbook, they performed better on the subsequent exam by an average of 2.04 percentage points ([95% CI: 0.81 3.26], d = 0.16, p = 0.001).
We repeated this analysis to estimate the effect of dropping the Exam Playbook, by taking the subset of students who used the Exam Playbook on their first exam. Controlling for college entrance scores, gender, race, and first-generation status, we estimated that after dropping the Exam Playbook, students performed worse by 1.8 percentage points ([95% CI: 0.44 3.15], d = 0.12, p = 0.009).
These estimates were consistent in terms of general direction and magnitude with the estimates from our analyses using stratified matching (1.75 percentage points, d = 0.12, for adopting and -1.88 percentage points, d = -0.14, for dropping).
exam.lvl.did.adopt.nostats <- exam.lvl.did %>%
filter(first_use_exam1==0) %>% filter(!(course %in% c("Introduction to Statistics")))
course_name <- unique(exam.lvl.did.adopt.nostats$course_semester)
did.adopt.meta.df.nostats <- data.frame()
for (j in 1:length(course_name)){
course_data <- exam.lvl.did.adopt.nostats[exam.lvl.did.adopt.nostats$course_semester == course_name[j], ]
lm.did = lm(exam_score ~ first_use_exam2*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.sum <- summary(lm.did)
lm.did.standardized = lm(exam_score_standardized ~ first_use_exam2*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.standardized.sum <- summary(lm.did.standardized)
did.adopt.meta.df.nostats = rbind(
did.adopt.meta.df.nostats,
data.frame(course_semester = course_name[j],
course = course_data$course[1],
semester = course_data$semester[1],
estimate = lm.did.sum$coefficients["first_use_exam2:time",1],
se = lm.did.sum$coefficients["first_use_exam2:time",2],
standardized_estimate =
lm.did.standardized.sum$coefficients["first_use_exam2:time",1],
standardized_se =
lm.did.standardized.sum$coefficients["first_use_exam2:time",2],
num = length(unique(course_data$user_source_id))))
}
did.adopt.meta.nostats.summary <- metagen(did.adopt.meta.df.nostats$estimate,
did.adopt.meta.df.nostats$se)
did.adopt.meta.nostats.standardized.summary <-
metagen(did.adopt.meta.df.nostats$standardized_estimate,
did.adopt.meta.df.nostats$standardized_se)
did.adopt.meta.nostats.summary
## 95%-CI %W(fixed) %W(random)
## 1 3.3600 [ -8.3763; 15.0964] 1.6 1.6
## 2 0.6592 [ -3.4064; 4.7248] 13.2 13.2
## 3 5.2845 [ -1.4302; 11.9993] 4.8 4.8
## 4 -1.1632 [ -5.2776; 2.9512] 12.9 12.9
## 5 0.6803 [ -7.6546; 9.0153] 3.1 3.1
## 6 4.5707 [ -0.7871; 9.9285] 7.6 7.6
## 7 2.7865 [-20.7733; 26.3462] 0.4 0.4
## 8 1.8861 [ -1.3650; 5.1373] 20.7 20.7
## 9 2.8352 [ -1.2720; 6.9425] 12.9 12.9
## 10 3.4485 [ -1.2938; 8.1909] 9.7 9.7
## 11 0.9517 [ -5.2593; 7.1626] 5.7 5.7
## 12 -0.3368 [ -5.7809; 5.1073] 7.4 7.4
##
## Number of studies combined: k = 12
##
## 95%-CI z p-value
## Fixed effect model 1.7464 [0.2689; 3.2240] 2.32 0.0205
## Random effects model 1.7464 [0.2689; 3.2240] 2.32 0.0205
##
## Quantifying heterogeneity:
## tau^2 = 0 [0.0000; 1.6932]; tau = 0 [0.0000; 1.3012]
## I^2 = 0.0% [0.0%; 58.3%]; H = 1.00 [1.00; 1.55]
##
## Test of heterogeneity:
## Q d.f. p-value
## 5.87 11 0.8819
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
# remove students who never used the pb or used the pb only on the second exam
exam.lvl.did.drop.nostats <- exam.lvl.did %>%
filter(exam1_pbuse == 1) %>% filter(!(course %in% c("Introduction to Statistics")))
course_name <- unique(exam.lvl.did.drop.nostats$course_semester)
did.drop.meta.df.nostats <- data.frame()
for (j in 1:length(course_name)){
course_data <- exam.lvl.did.drop.nostats[exam.lvl.did.drop.nostats$course_semester == course_name[j], ]
lm.did = lm(exam_score ~ dropped_pb*time +
act_convtd + gender+ firstgen +race, data = course_data)
lm.did.sum <- summary(lm.did)
lm.did.standardized = lm(exam_score_standardized ~ dropped_pb*time +
act_convtd + gender+race + firstgen, data = course_data)
lm.did.standardized.sum <- summary(lm.did.standardized)
did.drop.meta.df.nostats = rbind(
did.drop.meta.df.nostats,
data.frame(course_semester = course_name[j],
course = course_data$course[1],
semester = course_data$semester[1],
estimate = lm.did.sum$coefficients["dropped_pb:time",1],
se = lm.did.sum$coefficients["dropped_pb:time",2],
standardized_estimate =
lm.did.standardized.sum$coefficients["dropped_pb:time",1],
standardized_se =
lm.did.standardized.sum$coefficients["dropped_pb:time",2],
num = length(unique(course_data$user_source_id))))
}
did.drop.meta.nostats.summary <- metagen(did.drop.meta.df.nostats$estimate,
did.drop.meta.df.nostats$se)
did.drop.meta.nostats.standardized.summary <-
metagen(did.drop.meta.df.nostats$standardized_estimate,
did.drop.meta.df.nostats$standardized_se)
did.drop.meta.nostats.summary
## 95%-CI %W(fixed) %W(random)
## 1 -1.1308 [-11.8952; 9.6336] 3.4 3.4
## 2 -0.1330 [ -4.9575; 4.6914] 16.7 16.7
## 3 -2.0013 [ -7.5283; 3.5258] 12.7 12.7
## 4 -11.1290 [-23.8177; 1.5596] 2.4 2.4
## 5 2.3105 [ -2.7318; 7.3528] 15.3 15.3
## 6 -1.1984 [-15.6557; 13.2589] 1.9 1.9
## 7 -1.5021 [ -5.8467; 2.8424] 20.6 20.6
## 8 -5.3506 [-19.0900; 8.3887] 2.1 2.1
## 9 4.5009 [ -4.9547; 13.9565] 4.4 4.4
## 10 -0.0568 [ -4.8614; 4.7478] 16.9 16.9
## 11 2.5000 [ -8.3498; 13.3498] 3.3 3.3
## 12 6.6667 [-26.5745; 39.9078] 0.4 0.4
##
## Number of studies combined: k = 12
##
## 95%-CI z p-value
## Fixed effect model -0.3806 [-2.3538; 1.5926] -0.38 0.7054
## Random effects model -0.3806 [-2.3538; 1.5926] -0.38 0.7054
##
## Quantifying heterogeneity:
## tau^2 = 0 [0.0000; 15.0829]; tau = 0 [0.0000; 3.8837]
## I^2 = 0.0% [0.0%; 58.3%]; H = 1.00 [1.00; 1.55]
##
## Test of heterogeneity:
## Q d.f. p-value
## 6.47 11 0.8406
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
If we exclude Introduction to Statistics to test the generalization of the Exam Playbook, we find that the difference-in-differences analysis still yields a significant positive effect of adoption. Controlling for college entrance scores, gender, race, and first-generation status, students who adopted the Exam Playbook performed better on the subsequent exam by an average of 1.75 percentage points ([95% CI: 0.27, 3.22], d = 0.14, p = 0.021). However, excluding Introductory Statistics, the difference-in-differences effect for dropping the playbook was not statistically significant at the 0.05 level (b = 0.38 percentage points, [95% CI: -1.59, 2.35], d = 0.03, p = 0.705).
timeleft.notrunc.df.estimate <- data.frame()
timeleft.notrunc.df.se <- data.frame()
exam_name <- unique(exam.lvl$course_semester_exam)
for (i in 1:length(exam_name)){
temp.df <- exam.lvl %>%
filter(course_semester_exam == exam_name[i])
lm.model <- lm(exam_score ~ time_left , data=temp.df)
lm.model.sum <- summary(lm.model)
timeleft.notrunc.df.estimate <- bind_coef(timeleft.notrunc.df.estimate, extract_coef(lm.model.sum$coefficients, "Estimate"), rownames(lm.model.sum$coefficients))
timeleft.notrunc.df.se <- bind_coef(timeleft.notrunc.df.se, extract_coef(lm.model.sum$coefficients, "Std. Error"), rownames(lm.model.sum$coefficients))
}
rownames(timeleft.notrunc.df.estimate) <- exam_name
rownames(timeleft.notrunc.df.se) <- exam_name
timeleft.notrunc.df.summary = metagen(timeleft.notrunc.df.estimate$time_left,
timeleft.notrunc.df.se$time_left)
Due to logistical errors in communication between the intervention administration team and instructors, 137 students (1.1% of 12,065) were accidentally given access to the Exam Playbook earlier than 10 days prior to their exams. Because the planned official release date was 10 days prior to the exam, and this was also the earliest timing at which the vast majority of students could access the Exam Playbook via ECoach, the analyses in the main paper use a truncated “time_left” variable that ensured values fell between 0 and 10 (i.e., any value above 10 was replaced with 10). Nevertheless, we also repeated this analysis without truncation (i.e., allowing values of up to 15 days before the exam, the maximum time at which any student had accessed the Exam Playbook). Consistent with the main findings, students who used the Exam Playbook benefitted more from using it earlier (b = 0.42 percentage points per day, [95% CI: 0.29, 0.54], p < .001, compared to b = 0.42 percentage points per day without truncation).
## 95%-CI %W(fixed) %W(random)
## 1 0.9596 [ 0.2572; 1.6620] 1.0 2.2
## 2 0.7031 [ 0.4141; 0.9921] 6.0 5.1
## 3 0.6368 [ 0.4110; 0.8626] 9.8 5.7
## 4 0.5028 [ 0.3000; 0.7056] 12.1 5.9
## 5 0.1337 [-0.1393; 0.4066] 6.7 5.2
## 6 0.4041 [ 0.0166; 0.7916] 3.3 4.2
## 7 -0.1786 [-1.1981; 0.8409] 0.5 1.2
## 8 0.3542 [-2.2472; 2.9557] 0.1 0.2
## 9 0.4802 [-0.0028; 0.9633] 2.1 3.4
## 10 0.6829 [ 0.0487; 1.3170] 1.2 2.5
## 11 NA 0.0 0.0
## 12 1.6171 [-0.0058; 3.2401] 0.2 0.5
## 13 -0.1069 [-0.6266; 0.4127] 1.8 3.2
## 14 0.2680 [-0.6484; 1.1843] 0.6 1.5
## 15 0.8406 [-0.2731; 1.9543] 0.4 1.1
## 16 -0.3042 [-1.8319; 1.2236] 0.2 0.6
## 17 -0.1642 [-0.5401; 0.2116] 3.5 4.3
## 18 0.0095 [-0.8381; 0.8570] 0.7 1.7
## 19 0.0249 [-0.8565; 0.9064] 0.6 1.6
## 20 0.3007 [-0.9502; 1.5517] 0.3 0.9
## 21 1.6518 [-2.5956; 5.8992] 0.0 0.1
## 22 0.7053 [ 0.4956; 0.9150] 11.3 5.8
## 23 0.9889 [ 0.7415; 1.2362] 8.1 5.5
## 24 0.7525 [ 0.5176; 0.9874] 9.0 5.6
## 25 0.4606 [ 0.0683; 0.8529] 3.2 4.1
## 26 0.1475 [-0.2689; 0.5640] 2.9 3.9
## 27 0.6968 [-1.1202; 2.5138] 0.2 0.4
## 28 0.4139 [-1.7009; 2.5288] 0.1 0.3
## 29 5.6044 [ 0.6067; 10.6021] 0.0 0.1
## 30 -0.0086 [-0.8374; 0.8203] 0.7 1.7
## 31 0.4205 [-0.0078; 0.8487] 2.7 3.8
## 32 0.2581 [-0.3048; 0.8211] 1.6 2.9
## 33 0.9589 [-0.3189; 2.2368] 0.3 0.8
## 34 -0.7208 [-2.3664; 0.9249] 0.2 0.5
## 35 0.2648 [-1.0279; 1.5575] 0.3 0.8
## 36 -0.4879 [-3.4504; 2.4747] 0.1 0.2
## 37 -2.1143 [-5.7587; 1.5301] 0.0 0.1
## 38 0.6195 [-0.6403; 1.8793] 0.3 0.9
## 39 0.1533 [-0.2819; 0.5885] 2.6 3.8
## 40 -0.3227 [-1.3085; 0.6631] 0.5 1.3
## 41 0.2277 [-0.1669; 0.6224] 3.2 4.1
## 42 0.4902 [-0.1208; 1.1013] 1.3 2.6
##
## Number of studies combined: k = 41
##
## 95%-CI z p-value
## Fixed effect model 0.5014 [0.4308; 0.5720] 13.92 < 0.0001
## Random effects model 0.4168 [0.2919; 0.5416] 6.54 < 0.0001
##
## Quantifying heterogeneity:
## tau^2 = 0.0585 [0.0000; 0.2418]; tau = 0.2419 [0.0000; 0.4917]
## I^2 = 51.2% [30.2%; 65.9%]; H = 1.43 [1.20; 1.71]
##
## Test of heterogeneity:
## Q d.f. p-value
## 82.03 40 0.0001
##
## Details on meta-analytical method:
## - Inverse variance method
## - DerSimonian-Laird estimator for tau^2
## - Jackson method for confidence interval of tau^2 and tau
In our analyses in the main text, we used a mixed-effects meta-analysis model to aggregate the effect size estimates across the different classes, treating each class as a separate “experiment”. We preferred this approach because we wanted to further examine heterogeneity across classes. An alternative estimation approach is mixed-effects hierarchical linear modelling, where we treat students as nested within course and semester. Here, we report our results using this alternative approach, using the lme4 package (v1.1-26; Bates et al., 2014), and show that we can draw similar conclusions.
mod.pb.effect.sum = summary(
lmer(exam_score_avrg ~ pb_condition + (1|course) + (1|semester), user.lvl))
mod.pb.effect.standardized.sum = summary(
lmer(exam_score_avrg_standardized ~ pb_condition + (1|course) + (1|semester), user.lvl))
## boundary (singular) fit: see ?isSingular
To estimate the effect of using the Exam Playbook, we used a dummy-coded variable indicating that a student used the Exam Playbook at least once throughout the semester (playbook_user) to predict their average exam score in the class. We added random effects by course and semester (Note: We tried fitting a model with course nested within semester, but the model reported a singular fit, suggesting that the random-effect structure is over-fitted.). Specifically, we ran the following model:
lmer(avg_exam_score ~ playbook_user + (1|course) + (1|semester), data= user_lvl)
Consistent with the meta-analysis model, we found that students who used the Exam Playbook outperformed students who did not (b = 2.07 percentage points ([95% CI: 1.51 2.64], d = 0.11, p < .001); compared to 2.17 percentage points, d = 0.18, estimated by meta-analysis).
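As a sketch (added here), the b and 95% CI reported in this paragraph can be recovered from the fixed-effect row of the mixed-model summary, with d taken from the corresponding standardized-outcome model:
# Sketch only: Wald-style 95% CI for the playbook-user coefficient.
b_hlm  <- mod.pb.effect.sum$coefficients["pb_conditionplaybook", "Estimate"]
se_hlm <- mod.pb.effect.sum$coefficients["pb_conditionplaybook", "Std. Error"]
round(c(b = b_hlm, lower = b_hlm - 1.96 * se_hlm, upper = b_hlm + 1.96 * se_hlm), 2)
round(mod.pb.effect.standardized.sum$coefficients["pb_conditionplaybook", "Estimate"], 2)  # d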
mod.pb.effect.examlvl.sum =
summary(lmer(exam_score ~ pb_use +
(1|exam_key:course) + (1|user_id:course) + (1|semester), exam.lvl))
mod.pb.effect.examlvl.standardized.sum =
summary(lmer(exam_score_standardized ~ pb_use +
(1|exam_key:course) + (1|user_id:course) + (1|semester), exam.lvl))
## boundary (singular) fit: see ?isSingular
We did a further robustness check to repeat this analysis at the exam level:
lmer(exam_score ~ used_playbook + (1|exam:course) + (1|student:course) + (1|semester), data= exam_lvl)
We found that students who used the Exam Playbook on a given exam performed better than students who did not (b = 2.94 percentage points ([95% CI: 2.6 3.28], d = 0.12, p < .001); compared to 2.91 percentage points, d = 0.22, estimated by meta-analysis).
# Dosage model (Playbook users only): predict average exam score from the number of times
# a student used the Exam Playbook across the semester
mod.mlm.dosage.sum = summary(lmer(exam_score_avrg ~ pb_use_sum +
                                    (1|semester) + (1|course),
                                  data = user.lvl %>% filter(pb_condition == "playbook")))
# Same model with standardized exam scores (coefficient in SD units)
mod.mlm.dosage.standardized.sum =
  summary(lmer(exam_score_avrg_standardized ~ pb_use_sum +
                 (1|semester) + (1|course),
               data = user.lvl %>% filter(pb_condition == "playbook")))
## boundary (singular) fit: see ?isSingular
Dosage and Timing. To estimate the dosage effect, we considered the subset of Exam Playbook users, and used the number of times they used the Exam Playbook to predict their average exam score in the class. We added random effects by course and semester.
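The dosage predictor, pb_use_sum, is the number of exams for which a student completed the Exam Playbook. The released user-level file already carries this count; purely as an illustration, such a count could be derived from the exam-level data along these lines (assuming pb_use is a 0/1 indicator):
# Illustrative sketch only: count Playbook completions per student within each class,
# assuming pb_use is a 0/1 indicator at the exam level (user.lvl already contains pb_use_sum).
usage_counts <- exam.lvl %>%
  group_by(user_id, course, semester) %>%
  summarize(pb_use_sum = sum(pb_use), .groups = "drop")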
lmer(avg_exam_score ~ sum_playbook_usage + (1|course) + (1|semester), data = playbook_users)
We found that among students who used the Exam Playbook, using it on more occasions was related to better average exam performance (b = 3.33 percentage points, 95% CI [2.90, 3.76], d = 0.26, p < .001; compared to b = 2.18, d = 0.18, estimated by meta-analysis).
# Truncate time_left at 10 days (values above 10 are set to 10)
exam.lvl$time_left_trunc <- ifelse(exam.lvl$time_left > 10, 10, exam.lvl$time_left)
# Timing model: predict exam score from the (truncated) number of days between Playbook use and the exam
mod.mlm.timeleft.sum = summary(lmer(exam_score ~ time_left_trunc +
                                      (1|exam_key:course:semester) + (1|course) + (1|semester),
                                    data = exam.lvl))
## boundary (singular) fit: see ?isSingular
# Same model with standardized exam scores (coefficient in SD units)
mod.mlm.timeleft.standardized.sum =
  summary(lmer(exam_score_standardized ~ time_left_trunc +
                 (1|exam_key:course:semester) + (1|course) + (1|semester),
               data = exam.lvl))
## boundary (singular) fit: see ?isSingular
To estimate how the timing of usage affects exam performance, we again considered the subset of Exam Playbook users, but now examined performance on each individual exam. We defined a variable, “time_left”, which counts the number of days between a student’s Exam Playbook use and the exam itself (truncated at 10 days in the code above), and used it to predict their score on that exam. Because this analysis was at the exam level (which is nested within course and semester), we used the following random-effect structure:
lmer(exam_score ~ time_left + (1|exam:course:semester) + (1|course) + (1|semester), data = playbook_users_exam_level)
We found that students who used the Exam Playbook benefited more from using it earlier (b = 0.53 percentage points per additional day between Playbook use and the exam, 95% CI [0.46, 0.61], d = 0.04, p < .001; compared to b = 0.42, d = 0.03, estimated by meta-analysis).
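For concreteness, a variable like time_left could be computed as the number of days between the Playbook completion date and the exam date. The sketch below is purely illustrative; exam_date and pb_completed_date are assumed column names and do not refer to columns in the released files:
# Illustrative sketch only: days between Playbook completion and the exam.
# exam_date and pb_completed_date are assumed (hypothetical) column names.
exam.lvl$time_left <- as.numeric(difftime(as.Date(exam.lvl$exam_date),
                                          as.Date(exam.lvl$pb_completed_date),
                                          units = "days"))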
## R version 4.0.1 (2020-06-06)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] GGally_2.1.1 gridExtra_2.3 sandwich_3.0-1 lmerTest_3.1-2
## [5] lmtest_0.9-37 zoo_1.8-8 MatchIt_4.2.0 meta_4.18-1
## [9] lme4_1.1-23 Matrix_1.2-18 forcats_0.5.0 stringr_1.4.0
## [13] dplyr_1.0.0 purrr_0.3.4 readr_1.3.1 tidyr_1.1.0
## [17] tibble_3.0.1 ggplot2_3.3.1 tidyverse_1.3.0 car_3.0-8
## [21] carData_3.0-4
##
## loaded via a namespace (and not attached):
## [1] nlme_3.1-148 fs_1.4.1 lubridate_1.7.9.2
## [4] RColorBrewer_1.1-2 httr_1.4.2 numDeriv_2016.8-1.1
## [7] tools_4.0.1 backports_1.2.1 R6_2.4.1
## [10] metafor_2.4-0 DBI_1.1.0 colorspace_1.4-1
## [13] withr_2.3.0 tidyselect_1.1.0 curl_4.3
## [16] compiler_4.0.1 cli_2.0.2 rvest_0.3.5
## [19] xml2_1.3.2 scales_1.1.1 digest_0.6.25
## [22] foreign_0.8-80 minqa_1.2.4 rmarkdown_2.2
## [25] rio_0.5.16 pkgconfig_2.0.3 htmltools_0.4.0
## [28] dbplyr_1.4.4 rlang_0.4.10 readxl_1.3.1
## [31] rstudioapi_0.11 farver_2.0.3 generics_0.0.2
## [34] jsonlite_1.7.2 zip_2.1.1 magrittr_1.5
## [37] Rcpp_1.0.6 munsell_0.5.0 fansi_0.4.1
## [40] abind_1.4-5 lifecycle_0.2.0 stringi_1.4.6
## [43] yaml_2.2.1 CompQuadForm_1.4.3 MASS_7.3-51.6
## [46] plyr_1.8.6 grid_4.0.1 blob_1.2.1
## [49] crayon_1.3.4 lattice_0.20-41 haven_2.3.1
## [52] splines_4.0.1 hms_0.5.3 knitr_1.28
## [55] pillar_1.4.4 boot_1.3-25 reprex_0.3.0
## [58] glue_1.4.1 evaluate_0.14 data.table_1.12.8
## [61] modelr_0.1.8 vctrs_0.3.1 nloptr_1.2.2.1
## [64] cellranger_1.1.0 gtable_0.3.0 reshape_0.8.8
## [67] assertthat_0.2.1 xfun_0.14 openxlsx_4.1.5
## [70] broom_0.5.6 statmod_1.4.34 ellipsis_0.3.1