Simple intercepts, simple slopes, and regions of significance in HLM 2-way interactions
Kristopher J. Preacher (Vanderbilt University)
Patrick J. Curran (University of North Carolina at Chapel Hill)
Daniel J. Bauer (University of North Carolina at Chapel Hill)
The code generated by this utility can be pasted directly into an R console window. R (a free, open-source statistical computing environment) may be obtained here: http://cran.r-project.org/.
This web page calculates simple intercepts, simple slopes, and the region of significance to facilitate the testing and probing of two-way interactions estimated in hierarchical linear regression models (HLMs). The interaction can be between two dichotomous variables, two continuous variables, or a dichotomous and a continuous variable. Further, the interaction can occur solely within level 1 (i.e., Case 1), solely within level 2 (i.e., Case 2), or result from a cross-level prediction of a level 1 random effect by a level 2 covariate (i.e., Case 3). Because the analytic methods are identical for probing interactions in all three cases, we use the general notation \hat{\omega}_{0} to define the simple intercept and \hat{\omega}_{1} to define the simple slope regardless of which case we are considering. We use the standard notation of Raudenbush and Bryk (2002) to define each of these cases, and we assume that the user is knowledgeable both in the general HLM and in the testing, probing, and interpretation of interactions in multiple linear regression (e.g., Aiken & West, 1991). The following material is intended to facilitate application of the methods presented in Bauer and Curran (2005) and Curran, Bauer, and Willoughby (2006), and we recommend consulting these papers for further details.
The first case we consider involves an interaction between two predictors within the level 1 equation but with no predictors of these effects at level 2. For the two-predictor case, the level 1 equation is
(1) y_{ij} = \beta_{0j} + \beta_{1j}x_{1ij} + \beta_{2j}x_{2ij} + \beta_{3j}x_{1ij}x_{2ij} + r_{ij}
where y_{ij} is the value of y for observation i in group j, x_{1ij} and x_{2ij} are the two level 1 covariates for observation i in group j, and x_{1ij}x_{2ij} is the interaction between the two level 1 covariates. Further, \beta_{0j} is the intercept of the regression equation for group j, \beta_{1j} and \beta_{2j} are the main effects of x_{1ij} and x_{2ij}, respectively, \beta_{3j} is the within-level interaction between x_{1ij} and x_{2ij}, and r_{ij} is the observation- and group-specific residual. Because the regression parameters are viewed as random variables, these can be expressed in the level 2 equations as
(2) \beta_{0j} = \gamma_{00} + u_{0j}
    \beta_{1j} = \gamma_{10} + u_{1j}
    \beta_{2j} = \gamma_{20} + u_{2j}
    \beta_{3j} = \gamma_{30} + u_{3j}
where the \gamma's represent the fixed regression coefficients and the u's represent the group-specific deviations from the fixed effects. This formulation is sometimes referred to as a random-effects regression model given that the level 1 regression coefficients vary over the level 2 units but are not conditioned on level 2 covariates. Substituting the level 2 equations into the level 1 equation yields the reduced-form equation
(3) y_{ij} = (\gamma_{00} + \gamma_{10}x_{1ij} + \gamma_{20}x_{2ij} + \gamma_{30}x_{1ij}x_{2ij}) + (u_{0j} + u_{1j}x_{1ij} + u_{2j}x_{2ij} + u_{3j}x_{1ij}x_{2ij} + r_{ij})
The first parenthetical term represents the fixed effects and the second parenthetical term represents the random effects. If the interaction term (i.e., \gamma_{30}) is found to be significant, it is necessary to further probe this effect to identify the precise nature of this conditional relation.
Following the methods described in Bauer and Curran (2005), we can define the conditional regression of y on x_{1} (denoted the focal predictor) as a function of x_{2} (denoted the moderator). Note that this distinction between focal predictor and moderator is arbitrary given the symmetry of the interaction. Rearrangement of the expected value of the reduced-form equation highlights the conditional relation between the dependent variable y and focal predictor x_{1} as a function of the moderator x_{2}:
(4) \mu_{y|x_{2}} = (\gamma_{00} + \gamma_{20}x_{2}) + (\gamma_{10} + \gamma_{30}x_{2})x_{1}
where \mu_{y|x_{2}} denotes the model-implied mean value of y as a function of x_{1} at a specific value of x_{2}. Note that Equation (4) has the form of a simple regression of y on x_{1} where the first parenthetical term is the intercept of the regression and the second parenthetical term is the slope of the regression. We will refer to the first parenthetical term as the simple intercept and the second term as the simple slope. It can be seen that the simple intercept and simple slope are compound coefficients that result from the linear combination of other regression parameters. To further explicate this, we can re-express Equation (4) in terms of sample estimates of population values such that
(5) \hat{\mu}_{y|x_{2}} = \hat{\omega}_{0} + \hat{\omega}_{1}x_{1}
where
(6) \hat{\omega}_{0} = \hat{\gamma}_{00} + \hat{\gamma}_{20}x_{2}
    \hat{\omega}_{1} = \hat{\gamma}_{10} + \hat{\gamma}_{30}x_{2}
The sample estimates of the simple intercept (\hat{\omega}_{0}) and simple slope (\hat{\omega}_{1}) define the conditional regression of y on x_{1} as a function of x_{2}. Because these are sample estimates, we must compute standard errors to conduct inferential tests of these effects. The computation of these standard errors is one of the key purposes of our calculators.
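The standard errors mentioned above follow from the usual variance of a linear combination of estimates. The following is a minimal sketch (not the calculator's own code) of how the Case 1 simple intercept, simple slope, and their standard errors could be computed at one moderator value, assuming the fixed-effect estimates and the relevant entries of their asymptotic covariance (ACOV) matrix are available. The function name, dictionary keys, and all numeric inputs are hypothetical.

```python
import math

def simple_effects(coef, acov, x2):
    """Simple intercept and slope of y on x1 at moderator value x2 (Case 1).

    coef: fixed-effect estimates under hypothetical keys g00, g10, g20, g30.
    acov: asymptotic variances and covariances of those estimates.
    """
    w0 = coef["g00"] + coef["g20"] * x2  # simple intercept
    w1 = coef["g10"] + coef["g30"] * x2  # simple slope
    # Variance of a + b*x2 is var(a) + 2*x2*cov(a, b) + x2^2 * var(b).
    var_w0 = acov["var_g00"] + 2 * x2 * acov["cov_g00_g20"] + x2**2 * acov["var_g20"]
    var_w1 = acov["var_g10"] + 2 * x2 * acov["cov_g10_g30"] + x2**2 * acov["var_g30"]
    return {"w0": w0, "se_w0": math.sqrt(var_w0),
            "w1": w1, "se_w1": math.sqrt(var_w1)}

# Made-up illustrative values, not taken from any real analysis.
coef = {"g00": 2.0, "g10": 0.5, "g20": 1.0, "g30": 0.25}
acov = {"var_g00": 0.04, "var_g20": 0.01, "cov_g00_g20": 0.0,
        "var_g10": 0.01, "var_g30": 0.0025, "cov_g10_g30": 0.001}
result = simple_effects(coef, acov, x2=2.0)
print(result)
```

Dividing each estimate by its standard error gives the ratio used for the inferential test. The same arithmetic applies to Cases 2 and 3 once the corresponding coefficients are substituted.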
The second case arises when there are no predictors at level 1 and there is a two-way interaction estimated within level 2. This is sometimes referred to as a means-as-outcomes model. Using the two-level notation system of Raudenbush and Bryk (2002), the level 1 equation is expressed as
(7) y_{ij} = \beta_{0j} + r_{ij}
where y_{ij} is the observed value of outcome y for observation i nested within group j, \beta_{0j} is the intercept for group j, and r_{ij} is the person- and group-specific residual. Because there are no predictors, the intercept represents the model-implied mean of y within group j. These group means can then be modeled as a function of two level 2 covariates (w_{1j} and w_{2j}) and their interaction (w_{1j}w_{2j}) such that
(8) \beta_{0j} = \gamma_{00} + \gamma_{01}w_{1j} + \gamma_{02}w_{2j} + \gamma_{03}w_{1j}w_{2j} + u_{0j}
where \gamma_{00} is the fixed intercept, \gamma_{01}, \gamma_{02}, and \gamma_{03} are the fixed regression coefficients for the two main effects and the interaction, respectively, and u_{0j} is the level 2 residual. Finally, the level 2 equation can be substituted into the level 1 equation to form the reduced-form equation such that
(9) y_{ij} = (\gamma_{00} + \gamma_{01}w_{1j} + \gamma_{02}w_{2j} + \gamma_{03}w_{1j}w_{2j}) + (u_{0j} + r_{ij})
As was described for Case 1, if the interaction term (i.e., \gamma_{03}) is found to be significant, it is necessary to further probe this effect. We can again (arbitrarily) define the conditional regression of y on w_{1} (the focal predictor) as a function of w_{2} (the moderator). Rearrangement of the expected value of the reduced-form equation results in
(10) \mu_{y|w_{2}} = (\gamma_{00} + \gamma_{02}w_{2}) + (\gamma_{01} + \gamma_{03}w_{2})w_{1}
where \mu_{y|w_{2}} represents the model-implied value of y as a function of w_{1} at a specific value of w_{2}. As before, the first term represents the simple intercept and the second the simple slope. The sample estimates of these compound effects can be explicitly defined as
(11) \hat{\mu}_{y|w_{2}} = \hat{\omega}_{0} + \hat{\omega}_{1}w_{1}
where
(12) \hat{\omega}_{0} = \hat{\gamma}_{00} + \hat{\gamma}_{02}w_{2}
     \hat{\omega}_{1} = \hat{\gamma}_{01} + \hat{\gamma}_{03}w_{2}
The sample estimates of the simple intercept (\hat{\omega}_{0}) and simple slope (\hat{\omega}_{1}) define the conditional regression of y on w_{1} as a function of w_{2}.
The third and final case arises when there is a single main effect predictor at level 1 and a single main effect predictor at level 2, which is manifested in the reduced-form equation as a cross-level interaction. This type of conditional relation may be the most commonly encountered in many HLM applications and is sometimes referred to as a slopes-as-outcomes model. The level 1 equation is
(13) y_{ij} = \beta_{0j} + \beta_{1j}x_{1ij} + r_{ij}
where x_{1ij} is the observed predictor for observation i nested within group j, \beta_{1j} is the regression slope of y on x_{1} within group j, and all else is defined as above. The level 2 equations are
(14) \beta_{0j} = \gamma_{00} + \gamma_{01}w_{1j} + u_{0j}
     \beta_{1j} = \gamma_{10} + \gamma_{11}w_{1j} + u_{1j}
where w_{1j} is the observed predictor for group j, \gamma_{00} and \gamma_{10} are the fixed intercepts, \gamma_{01} and \gamma_{11} are the fixed regression coefficients for w_{1j}, and u_{0j} and u_{1j} are the residual terms. Finally, substituting the level 2 equations into the level 1 equation results in the reduced-form equation such that
(15) y_{ij} = (\gamma_{00} + \gamma_{01}w_{1j} + \gamma_{10}x_{1ij} + \gamma_{11}w_{1j}x_{1ij}) + (u_{0j} + u_{1j}x_{1ij} + r_{ij})
It can be seen that the regression of the level 1 slope on the level 2 covariate results in a cross-level interaction between x_{1ij} and w_{1j} with regression coefficient \gamma_{11}. Rearrangement of the expected value of the reduced-form equation results in
(16) \mu_{y|w_{1}} = (\gamma_{00} + \gamma_{01}w_{1}) + (\gamma_{10} + \gamma_{11}w_{1})x_{1}
where the simple intercept and simple slope for the conditional regression of y on x_{1} as a function of w_{1} are given by the first and second parenthetical expressions, respectively. The sample estimates of these compound effects can be explicitly defined as
(17) \hat{\mu}_{y|w_{1}} = \hat{\omega}_{0} + \hat{\omega}_{1}x_{1}
where
(18) \hat{\omega}_{0} = \hat{\gamma}_{00} + \hat{\gamma}_{01}w_{1}
     \hat{\omega}_{1} = \hat{\gamma}_{10} + \hat{\gamma}_{11}w_{1}
The sample estimates of the simple intercept (\hat{\omega}_{0}) and simple slope (\hat{\omega}_{1}) define the conditional regression of y on x_{1} as a function of w_{1}. It is sometimes of interest to estimate a cross-level interaction in which the question of interest revolves around the simple slope of y on w_{1} as a function of x_{1}, but we do not address such situations here. The Case 3 table may be used for such situations, switching x_{1} and w_{1} (and parameters associated with them) where appropriate.
We are primarily interested in the estimation of the simple intercept (\hat{\omega}_{0}) and the simple slope (\hat{\omega}_{1}) of the conditional regression of the outcome on the focal predictor as a function of the moderator. When comparing the calculation of the simple intercepts and slopes across the three cases above, it is clear that these all share a common computational form, and this is why we have used the same notation to define the simple intercept and slope for each case. However, to simplify the use of our tables in practice, we have developed calculators separately for each of the three cases, although the underlying analytics are all identical (see Bauer & Curran, 2005, for details). We now turn to a brief description of the values that can be calculated using our tables below.
The first available output is the region of significance of the simple slope describing the relation between the outcome y and the focal predictor as a function of the moderator. We do not provide the region of significance for the simple intercept given that this is rarely of interest in practice. The region of significance defines the specific values of the moderator at which the slope of the regression of y on the focal predictor transitions from nonsignificance to significance. There are lower and upper bounds to the region. In many cases, the regression of y on the focal predictor is significant at values of the moderator that are less than the lower bound and greater than the upper bound, and the regression is nonsignificant at values of the moderator falling within the region. However, there are some cases in which the opposite holds (e.g., the significant slopes fall within the region). Consequently, the output will explicitly denote how the region should be defined in terms of the significance and nonsignificance of the simple slopes. There are also instances in which the region cannot be mathematically obtained, and an error is displayed if this occurs for a given application. By default, the region is calculated at \alpha = .05, but this may be changed by the user. Finally, the point estimates and standard errors of both the simple intercepts and the simple slopes are automatically calculated precisely at the lower and upper bounds of the region.
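One way to see where the two bounds come from is to set the squared ratio of the simple slope to its standard error equal to the squared critical value and solve the resulting quadratic in the moderator. The sketch below does this under stated assumptions: it uses a large-sample normal critical value rather than a t value with user-supplied degrees of freedom, the function and argument names are hypothetical, and the inputs are made up. Here g1 and g3 stand for the focal predictor's coefficient and the interaction coefficient.

```python
import math
from statistics import NormalDist

def region_of_significance(g1, g3, var_g1, var_g3, cov_g1_g3, alpha=0.05):
    """Moderator values at which the simple slope g1 + g3*z is exactly
    at the significance boundary (normal critical value assumed)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    c2 = crit ** 2
    # (g1 + g3*z)^2 = c2 * (var_g1 + 2*z*cov_g1_g3 + z^2*var_g3)
    # rearranged to a*z^2 + b*z + c = 0:
    a = g3 ** 2 - c2 * var_g3
    b = 2 * (g1 * g3 - c2 * cov_g1_g3)
    c = g1 ** 2 - c2 * var_g1
    disc = b ** 2 - 4 * a * c
    if a == 0 or disc < 0:
        raise ValueError("region of significance cannot be obtained")
    r1 = (-b - math.sqrt(disc)) / (2 * a)
    r2 = (-b + math.sqrt(disc)) / (2 * a)
    lower, upper = min(r1, r2), max(r1, r2)
    # a > 0: slopes are significant outside the bounds; a < 0: inside.
    return lower, upper, a > 0

# Made-up illustrative values.
lower, upper, significant_outside = region_of_significance(
    g1=0.5, g3=0.25, var_g1=0.01, var_g3=0.0025, cov_g1_g3=0.001)
print(lower, upper, significant_outside)
```

The sign of the leading coefficient distinguishes the two situations described above. A negative discriminant corresponds to the case in which no real bounds exist.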
Simple Intercepts and Simple Slopes
The second available output is the calculation of point estimates and standard errors for up to three simple intercepts and simple slopes of the regression of y on the focal predictor at specific levels of the moderator. In the table we refer to these specific values of the moderator as conditional values. There are a variety of potential conditional values of the moderator that may be chosen for the computation of the simple intercepts and slopes. If the moderator is dichotomous (e.g., 0 or 1 to denote gender), we could select the first and second conditional values to be equal to 0 and 1 to compute the regression of y on the focal predictor for males and for females (leaving the third conditional value blank). If the moderator is continuous, we might select values of the moderator that are one standard deviation above the mean, equal to the mean, and one standard deviation below the mean. Whatever the conditional values chosen, these specific values are entered in the section labeled "Conditional Values," and this will provide the corresponding simple intercepts and simple slopes of the regression of y on the focal predictor at those specific values of the moderator. The calculation of simple intercepts and slopes at specific values of the moderator is optional; the user may leave any or all of the conditional value fields blank.
Given the calculation of one or more simple slopes, it is common to plot these relations graphically to improve interpretability of effects. The final available output is the calculation of a lower and upper value associated with each of the simple slopes to aid in graphing them using any standard software package (e.g., Excel, SPSS). These values are provided simply to aid in the graphing of effects; no inferential tests apply here. For the regression of y on the focal predictor at specific levels of the moderator, the user enters any two values of the focal predictor in order to plot the regression line between y and the predictor at specific values of the moderator. Although any pair of focal predictor values can be used, we recommend using either the lower and upper observed values of the focal predictor, the lower and upper possible values of the focal predictor, or one standard deviation below and above the mean of the focal predictor. However, many other specific values can be chosen that may be more appropriate for a particular research application.
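Because each simple regression line is straight, two points per line are enough to draw it. The following sketch shows how such endpoints could be computed for Case 1 at several conditional values of the moderator. The function name, argument names, and numbers are hypothetical, and the result can be passed to any plotting tool.

```python
def line_endpoints(g00, g10, g20, g30, moderator_values, x_low, x_high):
    """Return, for each moderator value, the two (x1, predicted y) points
    needed to draw the simple regression line of y on x1."""
    lines = {}
    for x2 in moderator_values:
        w0 = g00 + g20 * x2  # simple intercept at this moderator value
        w1 = g10 + g30 * x2  # simple slope at this moderator value
        lines[x2] = ((x_low, w0 + w1 * x_low), (x_high, w0 + w1 * x_high))
    return lines

# Made-up estimates; moderator at -1, 0, and +1 (e.g., mean and +/- 1 SD
# of a standardized moderator), focal predictor plotted from -2 to 2.
lines = line_endpoints(2.0, 0.5, 1.0, 0.25, [-1.0, 0.0, 1.0], -2.0, 2.0)
for x2, (start, end) in lines.items():
    print(x2, start, end)
```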
Simple intercepts, simple slopes, and the region of significance can be obtained by following these eight steps. Use as many significant digits as possible for optimal precision.
Once all of the necessary information is entered into the table, simply click "Calculate." The status box will identify any errors that might have been encountered. If no errors are found, the results will be presented in the output window. The results in the output window can be pasted into any word processor for printing.
R Code for Creating Simple Slopes Plot
Below the output window are two additional windows. If conditional values of the focal predictor and the moderator are entered, clicking on "Calculate" will also generate R code for producing a plot of the interaction effect (R is a statistical computing language). This R code can be submitted to a remote Rweb server by clicking on "Submit above to Rweb." A new window will open containing a plot of the interaction effect. The user may make any desired changes to the generated code before submitting, but changes are not necessary to obtain a basic plot. Indeed, this window can be used as an all-purpose interface for R.
R Code for Creating Confidence Bands / Regions of Significance Plot
Assuming enough information is entered into the interactive table, the second output window below the table will include R syntax for generating confidence bands (continuously plotted confidence intervals for simple slopes corresponding to all conditional values of the moderator). The x-axis of the resulting plot represents conditional values of the moderator, and the y-axis represents values of the simple slope of y regressed on the focal predictor.
If the moderator is dichotomous, only two values along the x-axis (corresponding to the codes used for grouping) would be interpretable. Therefore, in cases where the focal predictor is continuous and the moderator is dichotomous, we suggest treating x_{2} (or w_{2}) as the moderator for the simple slopes plot (so that each line will represent the regression of y on x_{1} (or w_{1}) at conditional values of the moderator) and treating x_{1} (or w_{1}) as the moderator for the confidence bands / regions of significance plot (so that the x-axis will represent values of the focal predictor and the y-axis will represent the group difference in y at conditional values of the focal predictor). This will require switching the roles of the focal predictor and the moderator in the interactive table, requiring the entry of some new values from the ACOV matrix and re-entering old values in new places.
Regardless of what variable is treated as the moderator, the user is expected to supply lower and upper values for the moderator (-10 and +10 by default). As above, this R code can be submitted to a remote Rweb server by clicking on "Submit above to Rweb." A new window will open containing a plot of confidence bands.
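The confidence bands described above can be thought of as the simple slope plus and minus a critical value times its standard error, evaluated over a grid of moderator values. The sketch below illustrates that calculation only. It is written in Python rather than the R code the page generates, it assumes a normal critical value, and its function name, arguments, and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_band(g1, g3, var_g1, var_g3, cov_g1_g3,
                    z_low=-10.0, z_high=10.0, n_points=5, alpha=0.05):
    """Return (moderator value, lower limit, simple slope, upper limit)
    for n_points evenly spaced moderator values in [z_low, z_high]."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # large-sample assumption
    step = (z_high - z_low) / (n_points - 1)
    band = []
    for k in range(n_points):
        z = z_low + k * step
        slope = g1 + g3 * z
        se = math.sqrt(var_g1 + 2 * z * cov_g1_g3 + z * z * var_g3)
        band.append((z, slope - crit * se, slope, slope + crit * se))
    return band

# Made-up illustrative values.
band = confidence_band(0.5, 0.25, 0.01, 0.0025, 0.001)
for z, lo, slope, hi in band:
    print(z, lo, slope, hi)
```

Moderator values at which the band excludes zero are those at which the simple slope is significant, so the points where a band limit crosses zero coincide with the bounds of the region of significance.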
Case 1: x_{1}: focal predictor; x_{2}: moderator


* Optional degrees of freedom for simple intercepts and slopes. 
Case 2: w_{1}: focal predictor; w_{2}: moderator


* Optional degrees of freedom for simple intercepts and slopes. 
Case 3: x_{1}: focal predictor; w_{1}: moderator


* Optional degrees of freedom for simple intercepts and slopes. 
Bauer, D. J., & Curran, P. J. (2005). Probing interactions in fixed and multilevel regression: Inferential and graphical techniques. Multivariate Behavioral Research, 40, 373-400.
Curran, P. J., Bauer, D. J., & Willoughby, M. T. (2006). Testing and probing interactions in hierarchical linear growth models. In C. S. Bergeman & S. M. Boker (Eds.), The Notre Dame Series on Quantitative Methodology, Volume 1: Methodological issues in aging research (pp. 99-129). Mahwah, NJ: Lawrence Erlbaum Associates.
Preacher, K. J., Curran, P. J., & Bauer, D. J. (2006). Computational tools for probing interaction effects in multiple linear regression, multilevel modeling, and latent curve analysis. Journal of Educational and Behavioral Statistics, 31, 437-448.
Original version posted September, 2003. Free JavaScripts provided by The JavaScript Source and John C. Pezzullo.