5.7 Standardized Coefficients

Let’s take a short digression to discuss standardized coefficients. In all the examples in this chapter, we’ve seen that it is very important to be clear about what the units of measurement are, as this affects how we interpret the numbers.

For example, in \(\text{Income} = b_0 + b_1 \text{YearsOfEducation}\), we can say that for a 1-unit increase in \(X_1\), that is, for one additional year of education, there is a \(b_1\)-unit increase in \(Y\): an increase of \(b_1\) dollars in income.

Unfortunately, sometimes we may have difficulty comparing \(X\)s. Perhaps I have a dataset of American students and their standardized test scores on the SATs, and a corresponding dataset of Singaporean students and their standardized test scores on the O-levels1. I want to use both datasets to predict how test scores (\(X\)) affect income (\(Y\)), but my \(X\)s here are on different scales. How should we compare them?

One way is by using standardized coefficients2.

To do so, we “standardize” each variable by subtracting its mean and dividing by its standard deviation, and then re-run the regression, as in the equation below. Notice that the \(b\)’s have been replaced with \(\beta\)’s, and these \(\beta\)’s are unitless; or, to put it another way, they are in “standardized” units. By convention (although not everyone follows this), \(b\)’s are used to refer to unstandardized regression coefficients while \(\beta\)’s are used to refer to standardized regression coefficients.

\[\left[ \frac{Y-\bar{Y}}{\sigma_{Y}} \right] = \beta_0 + \beta_1 \left[ \frac{X_1 - \bar{X_1}}{\sigma_{X_1}} \right] + \beta_2 \left[ \frac{X_2 - \bar{X_2}}{\sigma_{X_2}} \right] + \ldots\]

  • Note: We can choose to standardize only the IVs, or only some of the IVs. The usual convention is to standardize all the IVs and, sometimes, the DV as well.
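One consequence of standardizing both the DV and all the IVs: every standardized variable has mean zero, and the OLS fit always passes through the point of means, so the fitted intercept is exactly zero:

\[\hat{\beta}_0 = \overline{\left[ \frac{Y-\bar{Y}}{\sigma_{Y}} \right]} - \hat{\beta}_1 \overline{\left[ \frac{X_1 - \bar{X_1}}{\sigma_{X_1}} \right]} - \hat{\beta}_2 \overline{\left[ \frac{X_2 - \bar{X_2}}{\sigma_{X_2}} \right]} - \ldots = 0\]

This is why fully standardized regression output is often reported without an intercept.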

Now, the interpretation is:

When \(X_i\) increases by one standard deviation, there is a change in \(Y\) of \(\beta_i\) standard deviations.

Example: if we run \[\text{Income} = \beta_0 + \beta_1 \text{Education} + \beta_2 \text{Working Experience}\] as a standardized regression and find that \(\beta_1 = 0.5\), then the interpretation is: “For every increase in education level by one standard deviation, holding working experience constant, there is an average increase in income of 0.5 standard deviations.”
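A useful fact relating the two sets of coefficients (it follows from substituting the standardized variables back into the original regression): each standardized slope is just the unstandardized slope rescaled by a ratio of standard deviations,

\[\beta_i = b_i \times \frac{\sigma_{X_i}}{\sigma_Y}\]

so standardizing changes the units in which the coefficients are expressed, but not the fit of the model.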

With standardized coefficients, the interpretation changes: everything is now in standard deviations.

Sometimes standardized coefficients are easier to interpret when the underlying unit is itself difficult to interpret. For example, IQ, the intelligence quotient, is itself a standardized variable, scaled so that the population mean is 100 and the standard deviation is 15. It is hard to come up with units of “absolute” intelligence, so IQ is actually measured relative to others in the population.

Coming back to the example of comparing an American sample with the SAT and a Singaporean sample with the O-levels: even if I cannot compare 1 point on the SAT with 1 point on the O-levels, with standardized coefficients I can still ask: does a 1 standard deviation increase in SAT score have the same effect (on whatever \(Y\)) as a 1 standard deviation increase in O-level score?

Here is some R code to “manually” standardize variables and use them in a model:

# unstandardized
lm(Y ~ X, data = df1)

# standardized: scale() subtracts the mean and divides by the
# standard deviation (both are the defaults)
df1$X_standardized <- scale(df1$X, center = TRUE, scale = TRUE)
df1$Y_standardized <- scale(df1$Y, center = TRUE, scale = TRUE)

lm(Y_standardized ~ X_standardized, data = df1)
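As a quick sanity check, here is a sketch with simulated data (the variable names and numbers are made up for illustration) confirming that the standardized slope equals the unstandardized slope times \(\text{sd}(X)/\text{sd}(Y)\):

```r
set.seed(1)

# simulated data: income depends on years of education plus noise
df1 <- data.frame(X = rnorm(200, mean = 12, sd = 3))
df1$Y <- 5000 + 2000 * df1$X + rnorm(200, sd = 4000)

# unstandardized slope
b1 <- coef(lm(Y ~ X, data = df1))["X"]

# standardized slope (scale() returns a matrix, so flatten it)
df1$X_std <- as.numeric(scale(df1$X))
df1$Y_std <- as.numeric(scale(df1$Y))
beta1 <- coef(lm(Y_std ~ X_std, data = df1))["X_std"]

# beta1 should equal b1 * sd(X) / sd(Y)
all.equal(unname(beta1), unname(b1 * sd(df1$X) / sd(df1$Y)))
```

Note that `all.equal()` compares up to floating-point tolerance; the two quantities are identical in theory.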

  1. Ignore the fact that the SAT and O-levels are at different levels.↩︎

  2. Confusingly, when we use standardized coefficients, it is the variables that get standardized, not the coefficients.↩︎