A Primer on Interaction Effects
in Multiple Linear Regression
quantpsy.org
© 2010-2014,
Kristopher J. Preacher

Kristopher J. Preacher (Vanderbilt University)

This primer is divided into 6 sections:

  1. Two-way interaction effects in MLR
  2. Regions of significance
  3. Plotting and probing higher order interactions
  4. Centering variables
  5. Cautions regarding interactions in standardized regression
  6. References

Two-Way Interaction Effects in MLR

An interaction occurs when the magnitude of the effect of one independent variable (X) on a dependent variable (Y) varies as a function of a second independent variable (Z). This is also known as a moderation effect, although some authors apply stricter criteria to moderation effects than to interactions. Interactions can arise in univariate analysis of variance and covariance (ANOVA and ANCOVA), multivariate analysis of variance and covariance (MANOVA and MANCOVA), multiple linear regression (MLR), logistic regression, path analysis, and covariance structure modeling. This primer is concerned with interactions as they occur in MLR. ANOVA and ANCOVA models are special cases of MLR in which one or more predictors are nominal or ordinal "factors." It is straightforward to estimate such models in the MLR framework, but the accompanying web pages were designed only for interactions among two or three continuous and/or dichotomous predictor variables.

2-way Interactions

The regression equation used to assess the predictive effect of two independent variables (X and Z) on Y is:

  Y = b0 + b1(X) + b2(Z) + e

The regression equation used to analyze and interpret a 2-way interaction is:

  Y = b0 + b1(X) + b2(Z) + b3(XZ) + e

...where the last term (XZ) is simply the product of the first two. b3 can be interpreted as the amount of change in the slope of the regression of Y on X when Z changes by one unit.
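This interpretation of b3 can be checked numerically. The Python sketch below uses made-up coefficients (not from any study) and measures the slope of Y on X at several fixed values of Z; the function names are hypothetical.

```python
# Hypothetical coefficients, chosen only to illustrate the algebra.
b0, b1, b2, b3 = 2.0, 0.8, -0.5, 0.3


def predict(x: float, z: float) -> float:
    """Prediction equation with an interaction: b0 + b1*X + b2*Z + b3*X*Z."""
    return b0 + b1 * x + b2 * z + b3 * x * z


def slope_of_y_on_x(z: float) -> float:
    """Change in predicted Y per one-unit change in X, holding Z fixed.

    At any fixed Z the equation is linear in X, so a one-unit step
    gives the slope exactly: b1 + b3*Z.
    """
    return predict(1.0, z) - predict(0.0, z)


for z in (0.0, 1.0, 2.0, 3.0):
    print(f"Z = {z}: slope of Y on X = {slope_of_y_on_x(z):.2f}")

# Each one-unit increase in Z changes the slope of Y on X by b3.
change = slope_of_y_on_x(3.0) - slope_of_y_on_x(2.0)
print(f"change in slope per unit of Z: {change:.2f} (b3 = {b3})")
```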

Assuming a significant interaction effect has been obtained, examine the unstandardized regression coefficients and construct a prediction equation from them. For example, the coefficients obtained in an analysis conducted by Aiken and West (1991, p. 11, uncentered data) are:

  ^
  Y = 90.15 - 24.68(X) - 9.33(Z) + 2.58(XZ)

To make the interaction easier to interpret, it helps to rewrite the prediction equation in terms of one predictor, here X, with a slope and an intercept that both depend on Z:

  ^
  Y = (b1 + b3(Z))X + (b0 + b2(Z))
     = (-24.68 + 2.58(Z))X + (90.15 - 9.33(Z))

This arrangement makes it clear that we are interested in the regression of Y on X at particular values of Z. The (b0 + b2(Z)) term is called the simple intercept because it operates as the intercept for the equation describing Y as a linear function of X. The (b1 + b3(Z)) term is called the simple slope. To examine the interaction, we must choose particular values of Z at which to compute simple slopes. We are free to choose any values of Z, but it is usually sensible to stay within the observed range of Z. If Z is continuous, researchers commonly choose the mean of Z, one standard deviation below the mean, and one standard deviation above the mean; if Z is dichotomous, its two possible values should be used instead. We are not restricted to plus or minus one standard deviation about the mean of Z; we could use any values we wish, perhaps corresponding to meaningful points on the scale (such as 0 and 1 for female and male) or to clinical cutoff points (such as 16 and 10 for major depression and dysthymia, respectively, on the Beck Depression Inventory). Insert the chosen values of Z into the prediction equation to obtain one simple regression line per value (three lines in this example), and then plot the lines. In this example:

  MEAN of Z = 10.0
  STDEV of Z = 2.2
  Zlow = 7.8
  Zmid = 10.0
  Zhigh = 12.2

So...

  Zlow line:
    = (-24.68 + 2.58(7.8))X + (90.15 - 9.33(7.8))
    = -4.556(X) + 17.376

  Zmid line:
    = (-24.68 + 2.58(10.0))X + (90.15 - 9.33(10.0))
    = 1.12(X) - 3.15

  Zhigh line:
    = (-24.68 + 2.58(12.2))X + (90.15 - 9.33(12.2))
    = 6.796(X) - 23.676

We can choose any two meaningful values of X to anchor the lines (for example, the minimum and maximum observed values) and then plot them using any plotting software (e.g., Excel or SigmaPlot). We chose values of 4.05 and 5.95 for X to stay consistent with the example in Aiken and West (1991, p. 15); only two values are necessary because two points define a line. Each resulting line corresponds to a chosen level of Z.
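The arithmetic above is easy to script. The Python sketch below recomputes the three simple regression lines from the coefficients and the mean and standard deviation of Z quoted in the text, plus the (X, Y) endpoints at the two anchor values of X; those pairs can be handed to any plotting tool. The function names are hypothetical.

```python
# Coefficients from the Aiken and West (1991) uncentered example quoted above.
B0, B1, B2, B3 = 90.15, -24.68, -9.33, 2.58
Z_MEAN, Z_SD = 10.0, 2.2
X_ANCHORS = (4.05, 5.95)  # the two X values used in the text to draw each line


def simple_line(z: float) -> tuple[float, float]:
    """Return (simple slope, simple intercept) of Y on X at Z = z."""
    return B1 + B3 * z, B0 + B2 * z


def line_endpoints(z: float) -> list[tuple[float, float]]:
    """(X, predicted Y) pairs at the two anchor values of X, for plotting."""
    slope, intercept = simple_line(z)
    return [(x, intercept + slope * x) for x in X_ANCHORS]


levels = {"Zlow": Z_MEAN - Z_SD, "Zmid": Z_MEAN, "Zhigh": Z_MEAN + Z_SD}
for label, z in levels.items():
    slope, intercept = simple_line(z)
    ends = ", ".join(f"({x}, {y:.3f})" for x, y in line_endpoints(z))
    print(f"{label} (Z = {z:.1f}): Y = {slope:.3f}(X) {intercept:+.3f}; endpoints {ends}")
```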

Is the simple slope different from zero?

Recall that simple slopes are the regression slopes of Y on X at particular values of Z. In order to test the hypothesis that a simple slope differs from zero, we must first know the standard error of the simple slope, which is given by:

  sb = sqrt[s11 + 2(Z)(s13) + (Z)^2(s33)]

...where s11 is the variance of the X coefficient (i.e., the squared standard error of b1), s33 is the variance of the interaction coefficient (i.e., the squared standard error of b3), and s13 is the covariance of the two. These values can be obtained from the asymptotic covariance matrix of regression coefficients, which usually can be found in regression output (sometimes it must be specially requested). For the hypothetical group of people one standard deviation above the mean of Z in the example from Aiken and West (1991, p. 17),

  sb = sqrt[43.88 + 2(12.2)(-4.07) + (12.2)^2(.4)]
     = sqrt[43.88 - 99.308 + 59.536]
     = 2.03
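The standard-error formula can be wrapped in a small function. This Python sketch plugs in the variance and covariance values quoted from Aiken and West (1991); the function name is hypothetical.

```python
import math

# Values quoted in the text from Aiken and West (1991, p. 17), uncentered data.
S11 = 43.88   # variance of b1 (squared SE of the X coefficient)
S13 = -4.07   # covariance of b1 and b3
S33 = 0.40    # variance of b3 (squared SE of the interaction coefficient)


def simple_slope_se(z: float) -> float:
    """Standard error of the simple slope b1 + b3*Z at a given value of Z."""
    variance = S11 + 2 * z * S13 + z ** 2 * S33
    if variance < 0:
        # Can only happen with rounded or inconsistent inputs.
        raise ValueError("covariance inputs imply a negative variance")
    return math.sqrt(variance)


for z in (7.8, 10.0, 12.2):
    print(f"Z = {z:4.1f}: SE of simple slope = {simple_slope_se(z):.2f}")
```

At Z = 12.2 this reproduces the 2.03 computed by hand. Because the standard error depends on Z, it has to be recomputed for every value of Z at which a simple slope is tested (at Z = 7.8 it is about 2.17).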

The test of the simple slope is a t-test with t equal to the simple slope divided by its standard error, with (N - k - 1) degrees of freedom, where N is the sample size and k is the number of predictors including the interaction term. In our example, that would be:

  t = (-24.68 + 2.58(12.2)) / 2.03 = 3.35
  df = 400 - 3 - 1 = 396

...for the simple slope corresponding to Zhigh. In this case, the slope is significantly positive at alpha = .05. Aiken and West (1991) describe an easy way to do this test of a simple slope using any statistical software capable of MLR:

  1. Create a new variable Zs, which is Z minus the value of Z for which we want the simple slope of Y on X. For simple slopes at the mean of Z, this transformation is the same as centering Z.
  2. Form a new variable that is X times Zs.
  3. Regress Y on X, Zs, and the product term. The t-test for the X coefficient is the same simple-slope test conducted by hand above.
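The three steps can be checked numerically. The sketch below is pure Python: a small normal-equations OLS routine stands in for "any statistical software capable of MLR." It simulates data from a made-up model, computes the simple slope at Z = 12.2 and its standard error with the hand formula, and then refits after re-centering Z at 12.2. The data, coefficients, and helper names (invert, ols) are hypothetical.

```python
import math
import random


def invert(m):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    size = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(size)] for i, row in enumerate(m)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def ols(predictors, y):
    """OLS with an intercept; returns (coefficients, coefficient covariance matrix)."""
    X = [[1.0] + list(row) for row in predictors]
    n, p = len(X), len(X[0])
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    xtx_inv = invert(xtx)
    coef = [sum(xtx_inv[a][b] * xty[b] for b in range(p)) for a in range(p)]
    resid = [y[i] - sum(coef[a] * X[i][a] for a in range(p)) for i in range(n)]
    sigma2 = sum(e * e for e in resid) / (n - p)  # residual df = N - k - 1
    return coef, [[sigma2 * xtx_inv[a][b] for b in range(p)] for a in range(p)]


# Simulated data from a made-up model (not the Aiken and West data).
random.seed(1)
x = [random.gauss(5.0, 0.95) for _ in range(400)]
z = [random.gauss(10.0, 2.2) for _ in range(400)]
y = [90 - 25 * a - 9 * c + 2.5 * a * c + random.gauss(0, 10) for a, c in zip(x, z)]

# Usual fit of Y on X, Z, and XZ, then the hand formula at Z = z0.
coef, cov = ols([[a, c, a * c] for a, c in zip(x, z)], y)
z0 = 12.2
hand_slope = coef[1] + coef[3] * z0
hand_se = math.sqrt(cov[1][1] + 2 * z0 * cov[1][3] + z0 ** 2 * cov[3][3])

# Steps 1-3: Zs = Z - z0, form X*Zs, and refit.
zs = [c - z0 for c in z]
coef_s, cov_s = ols([[a, c, a * c] for a, c in zip(x, zs)], y)

print(f"by hand:  slope = {hand_slope:.4f}, t = {hand_slope / hand_se:.3f}")
print(f"refitted: slope = {coef_s[1]:.4f}, t = {coef_s[1] / math.sqrt(cov_s[1][1]):.3f}")
```

The two printed lines should agree up to floating-point error: shifting Z only reparameterizes the model, so fitted values and residuals are unchanged and the new X coefficient is exactly b1 + b3(z0).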

Are two simple slopes different from each other?

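This question can be answered by looking at the p-value of the interaction effect. If the interaction is significant, then any two simple slopes are significantly different from one another. This may seem strange, but the difference between the simple slopes at two values Z1 and Z2 is (b1 + b3(Z1)) - (b1 + b3(Z2)) = b3(Z1 - Z2), and its standard error is |Z1 - Z2| times the standard error of b3, so the t-test for the difference is identical to the t-test for b3 (apart from its sign). Remember also that the question we are trying to answer is "does the dependence of Y on X depend on the level of Z?" The answer is "yes" if the interaction is significant.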
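To make the algebra concrete, the sketch below computes the t-test for the difference between two simple slopes for several pairs of Z values, using b3 and its variance from the Aiken and West (1991) example quoted earlier. Because the difference is b3(Z1 - Z2) with standard error |Z1 - Z2| times SE(b3), every pair gives the same |t| as b3 itself. The function name is hypothetical.

```python
import math

# Values quoted earlier in the text (Aiken and West, 1991, uncentered example).
B3 = 2.58   # interaction coefficient
S33 = 0.40  # variance of b3 (its squared standard error)


def slope_difference_t(z1: float, z2: float) -> float:
    """t for the difference between the simple slopes at Z = z1 and Z = z2.

    The difference is b3*(z1 - z2) and its standard error is |z1 - z2|*SE(b3),
    so the ratio does not depend on which two values of Z are compared
    (only its sign does).
    """
    if z1 == z2:
        raise ValueError("z1 and z2 must differ")
    return B3 * (z1 - z2) / (abs(z1 - z2) * math.sqrt(S33))


t_b3 = B3 / math.sqrt(S33)
for z1, z2 in [(12.2, 7.8), (12.2, 10.0), (15.0, 3.0)]:
    print(f"slopes at Z = {z1} vs Z = {z2}: t = {slope_difference_t(z1, z2):.3f}")
print(f"t for b3 itself: {t_b3:.3f}")
```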

What about covariates?

If the regression equation involves continuous covariates not involved in interactions, then the recommended approach is to pick the mean value for each covariate and then follow the procedure above for plotting and probing interactions (West, personal communication, April 2001). For dichotomous covariates, the model is interpreted for the case when the dichotomous covariate equals zero (the reference group). Picking different values for covariates has the effect of sliding the existing plot up or down the y-axis.
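A minimal sketch of this rule, assuming a model with one continuous covariate C (Y = b0 + b1X + b2Z + b3XZ + bC(C)) and entirely hypothetical coefficients: fixing C at a value changes only the simple intercept, which is why different covariate values slide the plot up or down without changing any simple slope. For a dichotomous covariate, the reference group corresponds to c = 0.

```python
# Hypothetical coefficients for Y = b0 + b1 X + b2 Z + b3 XZ + bC C (illustration only).
B0, B1, B2, B3 = 3.0, 0.6, -0.4, 0.25
B_C = 1.5  # hypothetical coefficient for a continuous covariate C


def simple_line(z: float, c: float) -> tuple[float, float]:
    """(slope, intercept) of Y on X at Z = z with covariate C held at c."""
    slope = B1 + B3 * z                  # the covariate does not enter the slope
    intercept = B0 + B2 * z + B_C * c    # it only shifts the intercept
    return slope, intercept


C_MEAN = 20.0  # hypothetical mean of the covariate
for c in (C_MEAN - 5.0, C_MEAN, C_MEAN + 5.0):
    slope, intercept = simple_line(1.0, c)
    print(f"C = {c}: slope = {slope:.3f}, intercept = {intercept:.3f}")
```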

Regions of Significance

Regions of significance for 2-way interactions are values of Z for which the simple slope of Y on X is statistically significant. Computing regions of significance can be much more informative than picking arbitrary values of Z at which to examine the significance of simple slopes. One advantage associated with computing the region of significance is that knowing this region tells the user the results of all possible simple slopes tests: any value of Z falling inside the region corresponds to a significant simple slope of Y on X, and any value of Z falling outside the region corresponds to a nonsignificant simple slope.

Regions of significance were explored by Aiken and West (1991, pp. 134-137), but only for the case involving an interaction between one continuous and one categorical predictor. The idea was elaborated upon by Curran, Bauer, and Willoughby (2006), operating on the insight that regions of significance can be easily computed for both categorical and continuous predictors by reversing the simple-slope t-test: instead of computing t at a chosen value of Z, one solves for the values of Z at which t equals its critical value. The computational aspects of regions of significance are detailed in Curran et al. (2006).
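A minimal Python sketch of this reversal, using the Aiken and West (1991) values quoted earlier: setting the squared simple-slope t equal to the squared critical t gives a quadratic in Z whose roots are the boundaries of the region. The critical value is an assumed input (about 1.966 for a two-tailed .05 test with df = 396), and the function names are hypothetical.

```python
import math

# Values from the Aiken and West (1991) uncentered example quoted earlier.
B1, B3 = -24.68, 2.58
S11, S13, S33 = 43.88, -4.07, 0.40
T_CRIT = 1.966  # assumed: approx. two-tailed .05 critical t for df = 396


def simple_slope_t(z: float) -> float:
    """t for the simple slope of Y on X at Z = z."""
    return (B1 + B3 * z) / math.sqrt(S11 + 2 * z * S13 + z ** 2 * S33)


def region_boundaries(b1, b3, s11, s13, s33, t_crit):
    """Values of Z at which the simple-slope t equals +/- t_crit.

    Squaring t = (b1 + b3*Z) / sqrt(s11 + 2*Z*s13 + Z**2*s33), setting it equal
    to t_crit**2, and collecting terms gives a*Z**2 + b*Z + c = 0.
    Returns the real roots in increasing order (possibly none).
    """
    t2 = t_crit ** 2
    a = b3 ** 2 - t2 * s33
    b = 2 * (b1 * b3 - t2 * s13)
    c = b1 ** 2 - t2 * s11
    if a == 0:
        return [] if b == 0 else [-c / b]
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2 * a), (-b + root) / (2 * a)])


lower, upper = region_boundaries(B1, B3, S11, S13, S33, T_CRIT)
print(f"boundaries: Z = {lower:.2f} and Z = {upper:.2f}")
for z in (7.8, 10.0, 12.2):
    print(f"Z = {z}: t = {simple_slope_t(z):.2f}")
```

With these inputs a > 0, so the simple slope is significant for Z below the lower boundary or above the upper one (roughly 7.96 and 10.80 here) and nonsignificant between them. In practice the critical value would come from a t table or a function such as SciPy's scipy.stats.t.ppf.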

Plotting and Probing Higher Order Interactions

Treatment of 3-way and 4-way interactions proceeds in much the same way as for 2-way interactions. To form the 3-way interaction term, compute the product of all three IVs. To obtain the unique effect of a higher-order interaction term, all lower-order terms must be included in the model, either first or simultaneously, so that the interaction coefficient represents a unique effect. The regression equation used to analyze a 3-way interaction looks like this:

  ^
  Y = b0 + b1(X) + b2(Z) + b3(W) + b4(XZ) + b5(XW) + b6(ZW) + b7(XZW)

If the b7 coefficient is significant, then it is reasonable to explore further. Reframe the regression equation so that Y is a function of one of the IVs at particular values of the other two:

  ^
  Y = (b1 + b4(Z) + b5(W) + b7(ZW))X + (b0 + b2(Z) + b3(W) + b6(ZW))

The simple slope of Y on X (the slope of what Aiken and West, 1991, call a "simple regression equation") is now:

  (b1 + b4(Z) + b5(W) + b7(ZW))

The remainder of the equation now functions as a simple intercept term.

We can represent 3-way interactions graphically in the same way as 2-way interactions. Pick convenient or meaningful values for Z and W, such as one standard deviation above and below the mean on each, and use all combinations of these values in the equation to plot lines at meaningful levels of X. Any of the three variables can be placed on the x-axis; the choice does not change the results, although one choice may be easier to interpret than another.
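The rearranged 3-way equation can be scripted the same way as the 2-way case. In this Python sketch the coefficients b0 through b7 are made up (the text reports none), and Z and W are assumed to be centered with a standard deviation of 1, so -1 and +1 stand for one standard deviation below and above the mean. The function name is hypothetical.

```python
from itertools import product

# Hypothetical coefficients b0..b7 for
#   Y = b0 + b1 X + b2 Z + b3 W + b4 XZ + b5 XW + b6 ZW + b7 XZW
# (made-up values for illustration; the source reports none).
b = [2.0, 0.5, 0.3, -0.2, 0.4, 0.1, -0.15, 0.25]


def simple_line_3way(z: float, w: float) -> tuple[float, float]:
    """(simple slope, simple intercept) of Y on X at given values of Z and W."""
    slope = b[1] + b[4] * z + b[5] * w + b[7] * z * w
    intercept = b[0] + b[2] * z + b[3] * w + b[6] * z * w
    return slope, intercept


# Assumed: Z and W are centered with SD 1, so -1 and +1 are one SD below/above the mean.
levels = {"low": -1.0, "high": 1.0}
x_anchors = (-1.0, 1.0)  # two X values are enough to draw each line
for (z_label, z), (w_label, w) in product(levels.items(), repeat=2):
    slope, intercept = simple_line_3way(z, w)
    ends = [(x, round(intercept + slope * x, 3)) for x in x_anchors]
    print(f"Z {z_label}, W {w_label}: slope = {slope:.2f}, "
          f"intercept = {intercept:.2f}, endpoints {ends}")
```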

Just as we can test the significance of simple slopes in 2-way interactions, it is also possible to test the significance of simple slopes in 3-way interactions. To test whether the simple slope of Y on X at particular levels of Z and W differs from zero, divide that simple slope by its standard error, which is given by:

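  sb = sqrt[s11 + (Z)^2(s44) + (W)^2(s55) + (Z)^2(W)^2(s77) + 2(Z)(s14) + 2(W)(s15)
          + 2(Z)(W)(s17) + 2(Z)(W)(s45) + 2(W)(Z)^2(s47) + 2(Z)(W)^2(s57)]

...where sij is the covariance of bi and bj (the variance of bi when i = j).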
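One way to check the expanded formula: the simple slope is a weighted sum of b1, b4, b5, and b7 with weights 1, Z, W, and ZW, so its variance is the quadratic form g'Sg over those four coefficients. The Python sketch below compares the term-by-term formula with the quadratic form, using a made-up covariance matrix; in a real analysis the entries would come from the asymptotic covariance matrix of the regression coefficients. All names are hypothetical.

```python
import math

# Hypothetical covariances among b1, b4, b5, b7 (made-up values; a real analysis
# would take them from the asymptotic covariance matrix of the coefficients).
S = {
    (1, 1): 0.50, (4, 4): 0.08, (5, 5): 0.06, (7, 7): 0.02,
    (1, 4): -0.05, (1, 5): 0.01, (1, 7): 0.004,
    (4, 5): 0.002, (4, 7): -0.01, (5, 7): 0.003,
}


def s(i: int, j: int) -> float:
    """Covariance s_ij of coefficients b_i and b_j (symmetric lookup)."""
    return S[(min(i, j), max(i, j))]


def se_expanded(z: float, w: float) -> float:
    """Standard error of b1 + b4*Z + b5*W + b7*Z*W, written term by term."""
    var = (s(1, 1) + z**2 * s(4, 4) + w**2 * s(5, 5) + z**2 * w**2 * s(7, 7)
           + 2 * z * s(1, 4) + 2 * w * s(1, 5) + 2 * z * w * s(1, 7)
           + 2 * z * w * s(4, 5) + 2 * w * z**2 * s(4, 7) + 2 * z * w**2 * s(5, 7))
    return math.sqrt(var)


def se_quadratic_form(z: float, w: float) -> float:
    """Same quantity as sqrt(g' S g) with weights g = (1, Z, W, Z*W) on (b1, b4, b5, b7)."""
    g = {1: 1.0, 4: z, 5: w, 7: z * w}
    return math.sqrt(sum(g[i] * g[j] * s(i, j) for i in g for j in g))


for z in (-1.0, 1.0):
    for w in (-1.0, 1.0):
        print(f"Z = {z:+.0f}, W = {w:+.0f}: SE = {se_expanded(z, w):.4f} "
              f"(quadratic form: {se_quadratic_form(z, w):.4f})")
```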

Tests of simple slopes can be accomplished using any statistical software capable of MLR in a manner similar to testing simple slopes in 2-way interactions. Create new Z and W variables by subtracting from Z and W the values at which the researcher wants to examine the simple slope of Y on X. All 2-way and 3-way product terms are then formed from X and the new Z and W variables, and Y is regressed on X, Z, W, XZ, ZW, XW, and XZW. The t-test for the X coefficient is the test of the simple slope at the chosen values of Z and W.

Centering Variables

Centering means subtracting the mean from a variable, leaving deviation scores. Centering independent variables offers two advantages:

  1. Centering can make otherwise uninterpretable regression coefficients meaningful, and
  2. Centering reduces multicollinearity between the IVs and the product (or power) terms formed from them.

Centering to reduce multicollinearity is particularly useful when the regression involves squares or cubes of IVs. Centering has no effect at all on linear regression coefficients (except for the intercept) unless at least one interaction or power term is included. The more strongly the IVs are correlated, the larger the standard errors of their regression weights tend to be.

Regardless of the complexity of the regression equation, centering has no effect at all on the coefficients of the highest-order terms, but may drastically change those of the lower-order terms in the equation. The algebra is given in Aiken and West (1991), but centering unstandardized IVs usually does not affect anything of interest. Simple slopes will be the same in centered as in uncentered equations, their standard errors and t-tests will be the same, and interaction plots will look exactly the same, but with different values on the x-axis.

Cautions Regarding Interactions in Standardized Regression

Standardized regression weights are what would be obtained if the dependent variable and every independent variable in the regression equation were rescaled to have a mean of 0.0 and a standard deviation of 1.0 before running a regression analysis. The most common reason to use standardized coefficients is to have a common scale on which to evaluate the contribution of each of the independent variables (IVs). With one predictor, the standardized regression coefficient of the IV is simply the correlation between the dependent variable (Y) and the IV. With two IVs, this is no longer the case (unless the IVs are completely uncorrelated).

An advantage that standardized weights have over unstandardized weights is that their magnitudes can be informally compared. If one IV has a standardized coefficient larger in absolute value than another, it is probably more effective at predicting Y. However, this claim becomes less tenable as the IVs become more correlated. Caution should also be exercised when interactions are investigated using standardized coefficients. Three things to keep in mind are:

  1. The t-test for the interaction term will typically be the same for any combination of standardized, unstandardized, centered, or uncentered data.
  2. Whereas the regression coefficient for the interaction term will be the same for centered or uncentered IVs in unstandardized regression, it differs between centered and uncentered IVs in standardized regression.
  3. For standardized regression, the simple slopes differ depending on whether centered or uncentered data are used.

#2 and #3 above are troubling. They seem to imply that we should never interpret standardized regression weights when an interaction is present, because the effect size of the interaction changes when constants are added to the IVs. However, the problem lies not with standardization itself but with the way statistical software usually computes standardized regression weights: by first standardizing all predictors. Software does not differentiate between IVs and products of IVs; they are all treated as independent variables on an equal footing. In unstandardized regression (centered or uncentered) we manually compute the interaction term by multiplying X by Z to yield the product XZ. In standardized regression, then, we ought to compute it by multiplying zX by zZ to yield zXzZ. Software packages do not know that this term is supposed to be a product, so they simply standardize the product rather than obtaining the product of already-standardized variables. In general, the z-score of the product does not equal the product of z-scores, a point made very clear by Friedrich (1982). It follows that correct results for standardized regression with an interaction term require computing the standardized terms, and their product terms, manually. The regression can then be conducted on the standardized terms as with any other regression.

However, even though the correct standardized coefficients are obtained using this method, the standard error still will not be correct for the standardized coefficients. Standardized coefficients involve a stochastic scaling adjustment, which itself is subject to sampling error. This adjustment is not made by most statistics packages (see Bollen, 1989, p. 125). In other words, computation of standardized effects is correct if accomplished by using the procedure above, but significance should be assessed using unstandardized coefficients.
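Friedrich's point about products and z-scores is easy to see numerically. This Python sketch, on a small made-up data set, compares the product of z-scores (zXzZ, the term the manual procedure above builds) with the z-score of the raw product, which is what software obtains when it standardizes XZ like any other predictor. The data and function name are hypothetical.

```python
import statistics

# A small made-up data set (hypothetical values for illustration only).
x = [4.1, 4.8, 5.0, 5.6, 6.2, 4.4, 5.3]
z = [9.0, 12.5, 10.1, 7.9, 11.3, 10.8, 8.6]


def zscores(values):
    """Standardize using the sample mean and sample standard deviation."""
    m, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - m) / sd for v in values]


zx, zz = zscores(x), zscores(z)
product_of_z = [a * b for a, b in zip(zx, zz)]         # zX * zZ, computed manually
z_of_product = zscores([a * b for a, b in zip(x, z)])  # z(XZ), standardizing the product

for p, q in zip(product_of_z, z_of_product):
    print(f"zX*zZ = {p:7.3f}    z(XZ) = {q:7.3f}")
```

Note that zXzZ is generally not itself rescaled to a mean of 0 and a standard deviation of 1; it is entered as the product of the already-standardized variables.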

References

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and interpreting interactions. Thousand Oaks, CA: Sage.

Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40, 373-400.

Bollen, K. A. (1989). Structural equations with latent variables. New York: John Wiley & Sons.

Cohen, J., Cohen, P., West, S. G., & Aiken, L. S. (2003). Applied multiple regression/correlation analysis for the behavioral sciences (3rd ed.). Hillsdale: Erlbaum.

Curran, P. J., Bauer, D. J., & Willoughby, M. T. (2004). Testing main effects and interactions in latent curve analysis. Psychological Methods, 9, 220-237.

Curran, P. J., Bauer, D. J., & Willoughby, M. T. (2006). Testing and probing interactions in hierarchical linear growth models. In C. S. Bergeman & S. M. Boker (Eds.), The Notre Dame Series on Quantitative Methodology, Volume 1: Methodological issues in aging research (pp. 99-129). Mahwah, NJ: Lawrence Erlbaum Associates.

Friedrich, R. J. (1982). In defense of multiplicative terms in multiple regression equations. American Journal of Political Science, 26, 797-833.

Acknowledgments

Original version posted 2003. Thank you to my teachers.