Statistics and Analytics for the Social and Computing Sciences
Draft Dated: 2020-06-17
This is a collection of notes on statistics and data analytics that I am compiling, with two goals:
To serve as a supplement to a course that I teach at the National University of Singapore (BT1101: Introduction to Business Analytics), which is a statistics course in R targeted at first-year undergraduate students who are aspiring data scientists. These sections will cover introductory material, and will be marked with BT1101.
To discuss material that would be useful for graduate-level researchers in the social and computing sciences. This material will build upon the introductory level, and will be marked with Advanced.
I hope to cover a broad range of important and useful statistical tools (my top priorities right now are multivariate regression and simulations), as well as to discuss issues like data visualization best practices. I also plan to write several chapters on applying statistics in the computing sciences (for example, proper statistics when analyzing machine learning models). And finally, if I have time, I would like to transition to teaching statistics in a more Bayesian tradition.
As some background, I am a computational cognitive psychologist, with a little bit of training in econometrics, so I tend to favor regression and simulation approaches, and my examples may default to examples common in the social sciences.
Disclaimer: For students taking BT1101, please refer to these notes only if you are taking this course under me. If you are taking the course under a different instructor, that instructor’s lecture notes take precedence as to whether something is in the syllabus or not (and hence, testable on assessments/exams). We are always making improvements to the syllabus, and so for different offerings of the course, instructors may cover slightly different material. So if you are taking it under a different instructor, do not assume that concepts covered here will show up on the exam, or that concepts not covered here will not show up on the exam. I’ve indicated the sections that were covered the last time I taught BT1101 with a BT1101 label.
This is a work in progress that is inspired by Russ Poldrack’s Psych10 book (http://statsthinking21.org/), which was written for another undergraduate Introduction to Statistics course. This set of notes is hosted on GitHub and built using Bookdown.
Feedback can be sent to dco (at) comp (dot) nus (dot) edu (dot) sg.
This material is shared under a Creative Commons Attribution-ShareAlike 4.0 International (CC-BY-SA-4.0) License. What this means is that you are free to copy, redistribute, and even adapt the material in this book, in any format or for any purpose, even commercial. Basically, this is a freely-available educational resource that you can share and use. The only conditions are (i) you must give appropriate credit and, if you made any changes, indicate so, but not in any way that suggests that I endorse your changes, and (ii) if you transform or build upon the material here, you must also distribute your contributions under the same license.