Introduction

This R Markdown file reproduces all the analyses, tables and figures produced in:

Chen, P., Ong, D. C., Ng, J., & Coppola, B. (in press). Explore, Exploit, and Prune in the Classroom: Strategic Resource Management Behaviors Predict Performance. AERA Open.

Methods

Table 1: Numbers and Percentages of Students who Participated Across the 4 Consecutive Cohorts

Response rates Year1 Year2 Year3 Year4
Enrolled in class 1172 1336 1438 1392
Exam 1 1136 (96.93%) 992 (74.25%) 1265 (87.97%) 1064 (76.44%)
Exam 2 1119 (95.48%) 994 (74.4%) 1300 (90.4%) 1097 (78.81%)
Exam 3 1123 (95.82%) 907 (67.89%) 1287 (89.5%) 1105 (79.38%)
At least one exam survey 1170 (99.83%) 1071 (80.16%) 1347 (93.67%) 1201 (86.28%)
All three exam surveys 1057 (90.19%) 853 (63.85%) 1194 (83.03%) 940 (67.53%)

Note that Table 2 (Descriptive Frequencies of Resource Use on the Prior Exam (Exams 1 and 2) and Percentages of Resources Explored, Exploited, and Pruned Out of Those Possible (on the Subsequent Exams 2 and 3, Respectively), Aggregating Across Cohorts) is found later in the file due to the “flow” of the code.

Interleaved Methods and Results

Resource use over time.

To first analyze how students’ resource use changed over time, we fit a simple linear model predicting the number of resources used at a given time point, with a linear coefficient on time, treated as an integer-valued variable (t = 1, 2, 3). We applied the same model to analyze how students’ reported mean usefulness ratings changed over time. Both models were estimated with random intercepts for individual students nested within cohort (year) and for cohort (year):

\(\text{NumResourcesUsed}_{i,t,y} = b_0 + b_1 t + u_{i,y} + u_y + \epsilon_{i,t,y}\)

\(\text{MeanUsefulness}_{i,t,y} = b_0 + b_1 t + u_{i,y} + u_y + \epsilon_{i,t,y}\)

where \(u_{i,y}\) is the student-specific random intercept nested within year, \(u_y\) is a random intercept for year, and \(\epsilon_{i,t,y}\) is the residual error term. In R syntax, the models are:

lmer(numResUsed ~ examNum + (1|ID:Year) + (1|Year))

lmer(meanUsefulness ~ examNum + (1|ID:Year) + (1|Year))
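As a runnable illustration (a hedged sketch, not the authors’ exact chunk), these two models could be fit with lme4/lmerTest as follows; the data frame name dat and the assumption of one row per student per exam are ours:

# Hedged sketch: fit the two linear-trend models described above.
# Assumes a hypothetical long data frame `dat` with one row per student per exam
# and columns numResUsed, meanUsefulness, examNum (1-3), ID, and Year.
library(lme4)
library(lmerTest)  # adds Satterthwaite df and p values to lmer summaries

m_res  <- lmer(numResUsed     ~ examNum + (1 | ID:Year) + (1 | Year), data = dat)
m_usef <- lmer(meanUsefulness ~ examNum + (1 | ID:Year) + (1 | Year), data = dat)

summary(m_res)                   # fixed effect of examNum = linear trend in resource use
confint(m_res, method = "Wald")  # approximate 95% CIs for the fixed effects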

Results: Aggregate Patterns of Resource Use and Usefulness

We begin by describing how students in our study interacted with their resources. Students showed a decreasing linear trend in the number of resources that they used to study for each of their three exams (Equation 2a). Across all four cohorts, students started off using an average of 7.9 (standard deviation, SD = 1.7) resources to study for Exam 1, 7.4 (SD = 1.7) resources for Exam 2, and 7.0 (SD = 1.7) resources for Exam 3, linear trend b = -.44, 95% CI = [-.47, -.42], p < .001. We illustrate students’ use and usefulness ratings for the 12 kinds of resources in Figure 1, aggregated across cohorts and categorized by exam. On average, some resources (e.g., attending the lecture and using the coursepack) were used more than others (e.g., lecture podcasts, the Science Learning Center); not surprisingly, these tended to also be the resources that a large proportion of students rated as “extremely useful” for their learning.

While this decreasing trend in the number of resources used could be interpreted as decreasing motivation, our evidence suggests otherwise: Over the same period, students’ mean ratings of how useful their resources had been exhibited a positive linear trend, increasing from 4.09 (SD = 0.473) on Exam 1, to 4.17 (SD = 0.529) on Exam 2, and 4.21 (SD = 0.58) on Exam 3, linear trend b = .06, [.05, .07], p < .001 (Equation 2b). A score of 4 on our 5-point scale corresponds to “useful”. We inferred that, rather than necessarily being less invested, students were, on average, possibly becoming more focused and effective in their resource use over time. Evidence from our cognitive interviews with a randomly selected sample of students in the class who were not part of this study (see SOM “Survey Validation” for details) supports the idea that some students were strategically changing their resource use over the course of the class. For example, one student shared that, “By exam 3 I wasn’t using [textbook problems]. I used those for the first exam, but found them not as helpful, so I stopped using those.” Next, we turn to how students increased their resource-use effectiveness by managing their resource use wisely from exam to exam.

[Summary statistics and model estimates:]

  • Resource usage across all 4 years:
    • usage at examNum 1: M=7.86, SD=1.72,
    • usage at examNum 2: M=7.37, SD=1.69,
    • usage at examNum 3: M=6.99, SD=1.71,
    • b = -0.437 [-0.505, -0.37], t(3) = -12.7, p = 0.00112;
  • Mean usefulness across all 4 years
    • usefulness at examNum 1: M=4.09, SD=0.473,
    • usefulness at examNum 2: M=4.17, SD=0.529,
    • usefulness at examNum 3: M=4.21, SD=0.58,
    • b = 0.0599 [0.0507, 0.069], t(8842) = 12.8, p<.001;
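The per-exam means and standard deviations above could be tallied along the following lines (a hedged sketch reusing the hypothetical dat from the previous chunk):

library(dplyr)

# Mean and SD of resource use and usefulness ratings at each exam
dat %>%
  group_by(examNum) %>%
  summarise(M_used  = mean(numResUsed,     na.rm = TRUE),
            SD_used = sd(numResUsed,       na.rm = TRUE),
            M_usef  = mean(meanUsefulness, na.rm = TRUE),
            SD_usef = sd(meanUsefulness,   na.rm = TRUE))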

Predicting Exam Performance

Ultimately, we were interested in whether students’ resource management behaviors between exams were associated with their exam performance. In a mixed-effects linear model, we regressed students’ current exam performance on their reported exploration, exploitation, and pruning behaviors, controlling for performance on the prior exam; these predictors entered as fixed effects. We added random intercepts for student nested within year, for time point (exam) nested within year, and for year. Exam scores, the dependent variable, were converted into percentage scores out of 100 for all exams, so effect sizes (unstandardized b coefficients) can be interpreted in units of percentage points. Means and standard deviations of the three class exam scores are presented in Table S3. Thus, for student \(i\) at time \(t\) in year \(y\), we estimate the following model:

\(\text{Exam}_{i,t,y} = b_0 + b_1 \text{NumExplore}_{i,t,y} + b_2 \text{NumExploit}_{i,t,y} + b_3 \text{NumPrune}_{i,t,y} + b_4 \text{Exam}_{i,t-1,y} + u_{i,y} + u_{t,y} + u_y + \epsilon_{i,t,y}\)

where \(\text{Exam}_{i,t,y}\) denotes the exam score of student \(i\) at time \(t\) in year \(y\), \(u_{i,y}\) is the random intercept of student nested within year, \(u_{t,y}\) is the random intercept of exam nested within year, and \(u_y\) is the random intercept by year. Note that controlling for exam performance at the previous time point is conservative and allows us to test whether the resource regulatory behaviors (performed between exams) explain exam performance over and above prior performance. In R syntax, this model is:

lmer(currentScorePercent ~ sum_explore + sum_exploit + sum_prune + pastScorePercent + (1|ID:Year) + (1|examNum:Year) + (1|Year))

(Note: the figure produced by this chunk is not included in the paper, as the information already appears in Table 3.)
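As a hedged, runnable sketch (not the authors’ exact chunk), this model could be fit and summarized as follows, assuming the hypothetical dat also contains the per-student, per-exam counts sum_explore, sum_exploit, and sum_prune, plus currentScorePercent and pastScorePercent on a 0-100 scale:

library(lmerTest)

# Regress current exam score on the three resource management counts,
# controlling for prior exam score, with the three random intercepts above.
m_perf <- lmer(currentScorePercent ~ sum_explore + sum_exploit + sum_prune +
                 pastScorePercent +
                 (1 | ID:Year) + (1 | examNum:Year) + (1 | Year),
               data = dat)

summary(m_perf)                   # b coefficients are in percentage points
confint(m_perf, method = "Wald")  # approximate 95% CIs for the fixed effects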

Results: Predicting Exam Performance

We analyzed the effect of exploring, exploiting, and pruning behaviors on exam performance (Equation 3). Consistent with our hypotheses, the extent to which students engaged in each of these resource management behaviors positively predicted their exam performance. This was true for exploration (b = 0.85, [0.52, 1.18], p < .001), exploitation (b = 0.91, [0.74, 1.08], p < .001), and pruning (b = 0.75, [0.03, 1.48], p = .042). Table 3 presents the full statistical results. Following recent recommendations in psychological science and statistics to move away from Null Hypothesis Significance Testing (e.g., Wasserstein, Schirm, & Lazar, 2019) and towards an “estimation” framework (e.g., Cumming, 2014), we provide p values for completeness but focus on the effect sizes, which can be interpreted directly in terms of exam performance. Exploring one new resource was associated with an average increase of 0.85 percentage points in students’ performance on the current exam; exploiting one additional resource that was considered useful on the previous exam was associated with an average increase of 0.91 percentage points; and pruning one additional resource that was found to be useless on the previous exam was associated with an average increase of 0.75 percentage points.

Figure 2 visually illustrates how empirically observed combinations of exploration, exploitation, and pruning related to changes in students’ exam performance. We observed that greater resource management was associated with larger changes in students’ performance on subsequent exams: Starting from the origin and moving out along each of the three axes, as learners report practicing more exploration, exploitation, and pruning, we see that their exam performance improves. Our findings underscore the adaptive, strategic nature of learners’ decisions to explore new resources, exploit previously useful resources, and prune previously useless resources from one exam to the next.

Table 3

  • Explore: b = 0.848 [0.515, 1.18], t(8486) = 4.99, p<.001;
  • Exploit: b = 0.911 [0.741, 1.08], t(8486) = 10.5, p<.001;
  • Prune: b = 0.752 [0.0271, 1.48], t(8486) = 2.03, p = 0.042;
  • pastScorePercent: b = 0.824 [0.806, 0.843], t(8488) = 86.6, p<.001;
  • Conditional R^2 after Nakagawa & Schielzeth (2013): 0.6011472
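The conditional R^2 follows Nakagawa & Schielzeth (2013); with the MuMIn package (among the attached packages listed in the session information), it could be reproduced roughly as follows, reusing the hypothetical m_perf object from the sketch above:

library(MuMIn)

# Returns marginal R^2 (fixed effects only) and conditional R^2
# (fixed + random effects) for the fitted mixed model.
r.squaredGLMM(m_perf)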

Robustness Check 1: Controlling for initial sum of resources used, as a proxy for engagement

In addition, our results replicated when controlling for students’ engagement with the course resources (i.e., the total number of resources they used). Controlling for the number of resources students reported using at the beginning of the course (before Exam 1) as a proxy for course engagement, in addition to prior performance (i.e., adding the total number of resources initially used as an additional covariate to Equation 3), we find that greater exploration (b = 0.93 [0.59, 1.28], p < .001), exploitation (b = 0.85 [0.65, 1.05], p < .001), and pruning (b = 0.68 [-0.06, 1.43], p = .072) between exams still predicted students’ subsequent exam performance. The effect sizes were similar in magnitude to those in the main analysis, although the coefficient on pruning was no longer statistically significant at the .05 level. This suggests that the strategic resource management behaviors of exploring, exploiting, and pruning offer predictive value above and beyond a proxy for students’ sheer use of more course resources.

lmer(currentScorePercent ~ sum_explore + sum_exploit + sum_prune + exam1_sumres + pastScorePercent + (1|ID:Year) + (1|examNum:Year) + (1|Year))

(i.e., we added each student’s total number of resources used for Exam 1 as an additional covariate)

  • Explore: b = 0.934 [0.592, 1.28], t(8258) = 5.35, p<.001;
  • Exploit: b = 0.85 [0.652, 1.05], t(8258) = 8.42, p<.001;
  • Prune: b = 0.684 [-0.0599, 1.43], t(8258) = 1.8, p = 0.0716;
  • pastScorePercent: b = 0.82 [0.801, 0.838], t(8260) = 84.7, p<.001;
  • exam1_sumres: b = 0.158 [-0.0481, 0.364], t(8258) = 1.5, p = 0.133;
  • Conditional R^2 after Nakagawa & Schielzeth (2013): 0.5985226

Robustness Check 2: Analyses using strict explore

Finally, our definition of exploration only required that students had not used a resource on the immediately preceding exam, not that they had never used it on any previous exam. We repeated all our analyses using a stricter definition of exploration: trying a resource that students had not used on any previous exam, rather than just the preceding one. Our results replicated, and the effect size on explore was, in fact, stronger (b = 1.18 [0.78, 1.57], p < .001). However, we choose to retain our current (conservative) operationalization, using only the previous exam’s (non-)use, to be consistent with how we operationalized exploiting and pruning. We note that exploitation and pruning are theoretically well defined even when using only the previous exam.

lmer(currentScorePercent ~ sum_explore_STRICT + sum_exploit + sum_prune + pastScorePercent + (1|ID:Year) + (1|examNum:Year) + (1|Year))

  • Explore (strict): b = 1.18 [0.784, 1.57], t(8488) = 5.86, p<.001;
  • Exploit: b = 0.911 [0.742, 1.08], t(8486) = 10.6, p<.001;
  • Prune: b = 0.75 [0.0265, 1.47], t(8486) = 2.03, p = 0.0422;
  • pastScorePercent: b = 0.824 [0.806, 0.843], t(8488) = 86.9, p<.001;
  • Conditional R^2 after Nakagawa & Schielzeth (2013): 0.5977108
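How a strict explore count might be constructed is sketched below; the long data frame resLong (one row per student, year, resource, and exam, with a logical used column) is an assumed layout, not necessarily the authors’ actual data structure:

library(dplyr)

resLong %>%
  arrange(ID, Year, resource, examNum) %>%
  group_by(ID, Year, resource) %>%
  mutate(used_any_before = lag(cumsum(used), default = 0) > 0,  # used on ANY earlier exam?
         used_prev       = lag(used, default = FALSE)) %>%      # used on the preceding exam?
  ungroup() %>%
  filter(examNum > 1) %>%
  mutate(explore_strict  = used & !used_any_before,   # strict: never used on any prior exam
         explore_lenient = used & !used_prev) %>%     # lenient: not used on the prior exam
  group_by(ID, Year, examNum) %>%
  summarise(sum_explore_STRICT = sum(explore_strict),
            sum_explore        = sum(explore_lenient),
            .groups = "drop")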

Figures

Figure 1: Students’ Use and Usefulness Ratings for the 12 Kinds of Resources, Aggregated Across Cohorts and Categorized by Exam

Figure 2: Graphical Representation of the Frequency of Exploring, Exploiting and Pruning in Relation to Changes in Performance

Supplemental Figures and Tables

Table S1: Demographics of Students Who Participated in At Least One of Our Surveys Across the 4 Cohorts.

  Year1 Year2 Year3 Year4
Male 583 499 637 582
Female 587 572 708 616
Gender: Not Reported 0 0 2 3
Asian 220 212 258 236
Black 64 39 63 52
Hispanic 43 38 58 36
Native American 16 8 8 11
White 676 651 791 726
Race: Not Reported 151 123 169 140

Table S2: Breakdown of resource usefulness ratings, comparing past (prior-exam) usefulness and current usefulness. These numbers are collapsed across all 4 cohorts and across the two past-to-current exam transitions in the class.

Past rating Current: 5 Current: 4 Current: 3 Current: 2 Current: 1 Current: Didn’t use All
Past: Usefulness 5 18053 5172 706 123 339 1467 25860
Past: Usefulness 4 5834 12362 3032 432 195 3713 25568
Past: Usefulness 3 812 2828 2885 547 117 3217 10406
Past: Usefulness 2 122 343 474 327 97 803 2166
Past: Usefulness 1 255 139 83 67 106 291 941
Past: Didn’t use 1550 2367 1387 267 155 31319 37045
Past: All 26626 23211 8567 1763 1009 40810 101986
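A cross-tabulation like Table S2 could be produced along these lines (a hedged sketch; the column prev_usefulness, with 0 coded as “didn’t use”, is our assumption, not a variable documented in the paper):

# Counts of current usefulness ratings (columns) by prior-exam rating (rows),
# with row and column totals added.
addmargins(table(Past = resLong$prev_usefulness, Current = resLong$usefulness))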

Table 2: Descriptive frequencies of resource use on the prior exam (Exams 1 and 2) and the percentages of those resources explored, exploited, and pruned out of those possible on the subsequent Exams 2 and 3, respectively, collapsing across cohorts.

Note. The numbers of resources reflect the mean numbers per student, averaged across all students per exam.

  Exam 2 Exam 3 Both
Average number of resources that were not used on the prior exam 4.13 4.62 4.39
Of these, number explored on current exam 0.706 0.644 0.686
Percentage Explored 16.9 14 16
Average number of resources that were rated useful (>3) on prior exam 6.17 6 6.07
Of these, number exploited on current exam 5.54 5.41 5.46
Percentage Exploited 90.1 90.4 90.1
Average number of resources that were rated useless (<3) on prior exam 0.395 0.34 0.37
Of these, number pruned on current exam 0.15 0.108 0.129
Percentage Pruned 40.8 37.9 42.3
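The counts in Table 2 reflect the paper’s operational definitions of exploring (using a resource not used on the prior exam), exploiting (continuing to use a resource rated useful, > 3, on the prior exam), and pruning (dropping a resource rated useless, < 3, on the prior exam). A hedged sketch of how the three per-student counts might be derived, again assuming the hypothetical resLong layout with a usefulness column (1-5, NA when a resource was not used):

library(dplyr)

resLong %>%
  arrange(ID, Year, resource, examNum) %>%
  group_by(ID, Year, resource) %>%
  mutate(prev_used   = lag(used),
         prev_useful = lag(usefulness)) %>%
  ungroup() %>%
  filter(examNum > 1) %>%   # behaviors are defined relative to the prior exam
  mutate(explore = used  & !prev_used,                       # try a resource not used last exam
         exploit = used  & prev_used & prev_useful > 3,      # keep a previously useful resource
         prune   = !used & prev_used & prev_useful < 3) %>%  # drop a previously useless resource
  group_by(ID, Year, examNum) %>%
  summarise(sum_explore = sum(explore, na.rm = TRUE),
            sum_exploit = sum(exploit, na.rm = TRUE),
            sum_prune   = sum(prune,   na.rm = TRUE),
            .groups = "drop")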

Table S3: Means (and Standard Deviations) of Exam Scores

Exam Year1 Year2 Year3 Year4
Exam 1 65.4 (15.75) 67.3 (18.83) 67.14 (15.24) 76.76 (14.5)
Exam 2 70.6 (16.74) 73.06 (18.2) 74.71 (14.6) 68.49 (17.42)
Exam 3 62.36 (21.53) 55.41 (23.83) 59.38 (22.06) 51.58 (20.17)

Session Information

## R version 4.0.1 (2020-06-06)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] plyr_1.8.6     plotly_4.9.2.1 doBy_4.6.6     MuMIn_1.43.17  pander_0.6.3  
##  [6] lme4_1.1-23    Matrix_1.2-18  reshape2_1.4.4 corrplot_0.84  ggplot2_3.3.1 
## [11] tidyr_1.1.0    dplyr_1.0.0   
## 
## loaded via a namespace (and not attached):
##  [1] statmod_1.4.34      tidyselect_1.1.0    xfun_0.14          
##  [4] purrr_0.3.4         splines_4.0.1       lmerTest_3.1-2     
##  [7] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.1        
## [10] generics_0.0.2      htmltools_0.4.0     stats4_4.0.1       
## [13] viridisLite_0.3.0   yaml_2.2.1          rlang_0.4.6        
## [16] pillar_1.4.4        nloptr_1.2.2.1      glue_1.4.1         
## [19] withr_2.2.0         lifecycle_0.2.0     stringr_1.4.0      
## [22] munsell_0.5.0       gtable_0.3.0        htmlwidgets_1.5.1  
## [25] evaluate_0.14       labeling_0.3        knitr_1.28         
## [28] crosstalk_1.1.0.1   broom_0.5.6         Rcpp_1.0.4.6       
## [31] scales_1.1.1        backports_1.1.7     jsonlite_1.6.1     
## [34] farver_2.0.3        Deriv_4.0           digest_0.6.25      
## [37] stringi_1.4.6       numDeriv_2016.8-1.1 grid_4.0.1         
## [40] tools_4.0.1         magrittr_1.5        lazyeval_0.2.2     
## [43] tibble_3.0.1        crayon_1.3.4        pkgconfig_2.0.3    
## [46] ellipsis_0.3.1      MASS_7.3-51.6       data.table_1.12.8  
## [49] minqa_1.2.4         rmarkdown_2.2       httr_1.4.1         
## [52] R6_2.4.1            boot_1.3-25         nlme_3.1-148       
## [55] compiler_4.0.1