Open science resources, analysis code, and useful tools for behavioral data science.
I am an advocate of reproducible data analysis practices and support the publication of the data, analysis code, and research materials that contribute to a scientific finding.
A really handy tool is R Markdown, which allows me to do all my analyses in R and then format the output (including programmatically-generated tables and numbers!) via Markdown into a nice HTML or PDF document.
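For instance, inline R code lets the numbers in a write-up update automatically whenever the data change. A minimal sketch (the data frame d and its column rt here are hypothetical):

The mean RT was `r round(mean(d$rt), 1)` ms (SD = `r round(sd(d$rt), 1)`).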
Many journals now require uploading data and analysis code to an online repository like OSF or ICPSR. One issue is that these repositories are functionally simple: they offer (persistent, time-stamped) storage for data and code, but no interactivity. What I prefer to do is use GitHub Pages to render the output HTML files that accompany my projects (often written in R Markdown). This results in a visually appealing, and reproducible, presentation of the main results.
Interactive R Shiny application for exploring the meta-analytic data from Yeo & Ong (2024). View R Shiny App → GitHub →
Analysis code for Chen et al. (2022) on real-world effectiveness of social-psychological interventions. View formatted HTML analysis →
Analysis code and materials from Chen et al. (2021). View formatted HTML analysis → View on ICPSR →
Analysis code and materials from Chen et al. (2017) on resource management in learning. View formatted HTML analysis → View on OSF →
Here are some code utilities I've developed over the years that might be useful for behavioral researchers, along with some handy hacks I've collected from various places around the internet.
Python script to programmatically pay bonuses to workers on Amazon Mechanical Turk. Requires Amazon Command Line Tools. View on GitHub →
R function to programmatically download survey data from Qualtrics directly into your R environment. View on GitHub →
JavaScript implementation of a continuous IOS scale using Raphael.js for online experiments. View on GitHub →
Simple one-liner to replace identifiable information (e.g., MTurk worker IDs) with integer codes. (It has some cons: the codes follow the sorted order of the original IDs, so some information leaks, but it's fast.)
d0$workerid_random <- match(d0$workerid, unique(sort(d0$workerid)))
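For example, with some hypothetical IDs, you can see the sort-order property directly:

# hypothetical worker IDs: the codes follow the sorted order of the original values
match(c("A21", "A03", "A17"), unique(sort(c("A21", "A03", "A17"))))  # returns 3 1 2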
If you want to be even cooler and use a one-way hash function, here's a one-liner using the digest() function (from the digest package):
library(digest); d0$workerid_hash <- substr(sapply(as.character(d0$workerid), digest, algo="md5", serialize=F), 1, 6)
(The code above: (1) converts workerid into a character string, (2) uses sapply() to apply digest() element-wise, and (3) takes the first 6 characters of each resulting hash.)
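One caveat: truncating the hash to 6 characters makes collisions possible in principle, so it's worth checking that distinct IDs still map to distinct codes. A quick sanity check, assuming the d0 from above:

# distinct worker IDs should still map to distinct truncated hashes
stopifnot(length(unique(d0$workerid_hash)) == length(unique(d0$workerid)))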
A neat and simple trick to update R and re-install all your packages here.
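I won't reproduce the linked post, but the gist of such tricks is usually: record your installed packages before updating, then re-install whatever is missing afterwards. A rough sketch (the file name is arbitrary):

pkgs <- rownames(installed.packages())  # before updating: record installed packages
saveRDS(pkgs, "installed_packages.rds")
# ...update R, then restore whatever is missing...
pkgs <- readRDS("installed_packages.rds")
install.packages(setdiff(pkgs, rownames(installed.packages())))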
Sometimes you just want to concatenate all the data files in a directory into one big file. If they're in a format like .csv and you want to skip the first line (the header row) of each file, you can use the following command in Terminal:
awk 'FNR > 1' *.csv > combined_file.csv
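Note that this drops the header row from every file, so combined_file.csv ends up with no header at all. If you'd rather keep a single header and stay in R, here's a sketch assuming all the files share the same columns:

files <- list.files(pattern = "\\.csv$")  # all .csv files in the working directory
combined <- do.call(rbind, lapply(files, read.csv))  # read and stack; keeps one header
write.csv(combined, "combined_file.csv", row.names = FALSE)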
If you crop a PDF file in Preview, it doesn't destructively crop it. The parts you cropped out are still hidden in the file (i.e., so you can undo the cropping). I've found that this causes trouble with LaTeX, which doesn't recognize the bounding boxes. If you need to destructively crop the file, one way to do it is with Ghostscript. Say you want to crop "in.pdf" to "out.pdf" (note that you can't use the same filename, because of the way gs works); at the command line, type:
gs -sDEVICE=pdfwrite -dUseCropBox -sOutputFile=out.pdf - < in.pdf
Reference: http://electron.mit.edu/~gsteele/pdf/
PostScript Level 1 uses only ASCII-coded RGB values and is very wasteful, producing very large files. Level 2 includes support for JPEG-encoded images, which produces much smaller files. Level 3 includes support for Zlib compression, making it well suited for making EPS files from PNG files.
In general, Level 3 will produce the smallest files. Level 2 provides the best compatibility and works well with JPEG images.
If you decide to use Level 2 PostScript, I recommend converting first to a JPG file. The "convert" program included in ImageMagick uses a quality factor in "percent" that ranges from 0 to 100:
convert -quality 80 fig.png fig.jpg
I find a quality factor of 80 on high-resolution images gives good compression without too much loss in quality. You can then convert the image to EPS using "convert" with the eps2 setting:
convert fig.jpg eps2:fig.eps
If you can use Level 3 PostScript, you can convert directly from PNG to EPS:
convert fig.png eps3:fig.eps
Using Level 3 PostScript from a PNG image file for scientific figures will often produce a very small EPS file. Ghostscript is compatible with these Level 3 EPS files, so this is often a good way to go.
Here's a simple macro you can use so that every time Microsoft Word opens a new document, it does so at a specific zoom level. (Personally, I like 100% on my Retina Pro.)
1) In Word, go to Tools -> Macro -> Macros.
2) In the dropdown after "Macros in", select Normal (Global Template).
3) Create a new macro called AutoOpen. (This particular name seems to be required for it to run upon opening.)
4) Paste in the following macro, where 100 is the desired zoom percentage:
Sub AutoOpen()
ActiveWindow.ActivePane.View.Zoom.Percentage = 100
End Sub
(refs: various places like this and this.)
I love having really high sensitivity on my mouse/trackpad. Unfortunately, the maximum you can set in System Preferences isn't high enough for me. There is a way to increase the sensitivity further. In Terminal, typing:
defaults read -g com.apple.mouse.scaling
will give you the current value of your mouse scaling. You can modify it by changing read to write. For example, if you want to set your mouse scaling to 3.0 (the maximum in System Preferences), type:
defaults write -g com.apple.mouse.scaling 3.0
In addition, to change the trackpad, use com.apple.trackpad.scaling. You can also use .scrolling in place of .scaling to change the scrolling speed.