Variable selection is the process of choosing a “best” subset of all available predictors.
Well, there is no single “best” subset.
We do want a model we can interpret or justify with respect to the questions of interest.
Nested models
Any two models with the same response
MSE (Mean Squared Error)
AIC (Akaike Information Criterion)
BIC (Bayesian Information Criterion)
Mallows’ \(C_p\) (has fallen out of favor)
Any two models with the same response (but possibly differently transformed)
CAUTION:
For AIC, BIC, and \(C_p\),
\(p\) = number of parameters (including the constant/intercept).
This is different from our usual use of the letter \(p\) (the number of predictors).
Both Akaike and Bayesian Information Criteria
reward small variance (\(SSE_p / n\) small) and penalize larger models (\(p\) large).
\[AIC = n \log_e(SSE_p / n) + 2p\]
\[BIC = n \log_e(SSE_p / n) + p\log_e(n)\]
Smaller is better for both criteria
Models with AIC difference \(\le 2\) should be treated as equally adequate
Similarly, models with BIC difference \(\le 2\) should be treated as equally adequate
BIC penalty for larger models is more severe:
\(p\log_e(n) > 2p\) whenever \(n \ge 8\) (since \(\log_e(n) > 2\) once \(n > e^2 \approx 7.4\))
This controls the tendency of AIC toward overfitting.
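For reference, here is a small helper (just a sketch, not from the textbook) that computes these textbook versions of AIC and BIC from a fitted lm object. They differ from R’s built-in AIC() and BIC() by a constant that depends only on \(n\), so rankings of models fit to the same data agree.

# Textbook AIC/BIC from a fitted lm object; p counts all estimated
# coefficients, including the intercept, as in the formulas above
ic_values <- function(fit) {
  n   <- length(residuals(fit))
  p   <- length(coef(fit))            # number of parameters (intercept included)
  sse <- sum(residuals(fit)^2)
  c(AIC = n * log(sse / n) + 2 * p,
    BIC = n * log(sse / n) + p * log(n))
}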
If there are \(p\) predictor variables, then there are \(2^p\) possible models.
If \(p=10\), that is more than 1,000 candidate models to choose from!
…and that’s not even including the possibility of interaction effects
So, how to choose models in an intelligent and efficient way?
Consider a world where there is a “correct” model with \(q\) predictors.
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_q x_{iq} + \epsilon_i\] with least squares estimate
\[\widehat{y}_i^*=\widehat{\beta}_{0}^*+\widehat{\beta}_{1}^* x_{i1} + ... + \widehat{\beta}_q^* x_{iq}\]
Let \(p < q\) and consider the model
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_p x_{ip} + \epsilon_i\] which excludes \(\beta_{p+1}, \beta_{p+2}, \ldots, \beta_q\) (all non-zero coefficients).
This model is estimated by
\[\widehat{y}_i=\widehat{\beta}_{0}+\widehat{\beta}_{1} x_{i1} + ... + \widehat{\beta}_px_{ip}\]
Decreased variance of coefficients and predictions
\(\mathrm{var}(\widehat{\beta}_j) \le \mathrm{var}(\widehat{\beta}_j^*)\)
\(\mathrm{var}(\widehat{y}_i) \le \mathrm{var}(\widehat{y}_i^*)\)
Bias of coefficients and predictions
The estimates tend to either over-estimate or under-estimate on average
coefficient bias = \(E(\widehat{\beta}_j) - \beta_j\)
prediction bias = \(E(\widehat{y}_i) - \mu_i\)
…and we don’t know which direction or how large the bias
However, the bias can be considered negligible if \(|\widehat{\beta}_j| < se(\widehat{\beta}_j)\).
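For example, assuming fit is some fitted lm object (such as the full supervisor-performance model fit later in these notes), a quick sketch of this check:

# Which estimated coefficients are smaller in magnitude than their standard
# errors? Dropping those variables should introduce only negligible bias.
# `fit` is assumed to be a fitted lm object.
est <- summary(fit)$coefficients
data.frame(term = rownames(est),
           negligible_bias = abs(est[, "Estimate"]) < est[, "Std. Error"])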
Consider a world where there is a “correct” model with \(p\) predictors.
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_p x_{ip} + \epsilon_i\] with least squares estimate
\[\widehat{y}_i=\widehat{\beta}_{0}+\widehat{\beta}_{1} x_{i1} + ... + \widehat{\beta}_px_{ip}\]
Let \(q > p\) and consider the model
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_q x_{iq} + \epsilon_i\] which includes the extra terms \(\beta_{p+1}, \beta_{p+2}, \ldots, \beta_q\) (but all of these coefficients equal 0).
This model is estimated by
\[\widehat{y}_i^*=\widehat{\beta}_{0}^*+\widehat{\beta}_{1}^*x_{i1} + ... + \widehat{\beta}_q^*x_{iq}\]
Increased variance of coefficients and predictions
Decreases residual degrees of freedom (the amount of information for estimating the error variance)
\(\mathrm{var}(\widehat{\beta}_j) \le \mathrm{var}(\widehat{\beta}_j^*)\)
\(\mathrm{var}(\widehat{y}_i) \le \mathrm{var}(\widehat{y}_i^*)\)
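Here is a purely illustrative simulation (made-up data, not from the textbook) of that variance inflation: the standard error for the one truly relevant predictor grows once irrelevant predictors are added.

# Simulate data where only x1 matters, then compare the SE of x1's
# coefficient in the correct model vs. a model with 5 irrelevant predictors
set.seed(1)
sims <- replicate(500, {
  n     <- 30
  x1    <- rnorm(n)
  noise <- matrix(rnorm(n * 5), n, 5)          # 5 irrelevant predictors
  y     <- 2 + 1 * x1 + rnorm(n)               # true model uses x1 only
  c(correct = summary(lm(y ~ x1))$coefficients["x1", "Std. Error"],
    overfit = summary(lm(y ~ x1 + noise))$coefficients["x1", "Std. Error"])
})
rowMeans(sims)   # average SE for x1 is larger in the overfit model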
An overarching consideration in model selection
relates to the purpose and intended use of the model.
Descriptive (exploratory, understanding relationships)
Try to account for as much response variability as possible, but keep the model simple.
Search for fundamental relationships
Usually start with a few essential variables
Then choose variables (and combinations of variables) to build forward
Useful with big data
The descriptive/exploratory model might not be the final model
Predictive (getting good predictions)
Minimize the MSE of prediction: \(MSE(\widehat{y}_i) = \mathrm{var}(\widehat{y}_i) + \text{bias}^2\)
Want realistic predictions and close to the data
Less concern about which particular variables are required or included
Explanatory (describe the process, interpretability)
Lots of thinking required about which variables are important
Parsimony important (smallest model that is “complete”)
Thoroughly address confounding and possible effect modification
Do not omit important confounders
Explore the need for interaction terms
Ideally, would like a little bit of all three properties.
Let’s look at an example: From textbook Table 3.3, the supervisor performance data.
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_2=\) Does not allow special privileges
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_5=\) Too critical of poor performance
\(x_6=\) Rate of advancing to better jobs
superData <- read.delim("http://statistics.uchicago.edu/~collins/data/RABE5/P060.txt")
glimpse(superData)
Observations: 30
Variables: 7
$ y <dbl> 43, 63, 71, 61, 81, 43, 58, 71, 72, 67, 64, 67, 69, 68, 77, 81, 74…
$ x1 <dbl> 51, 64, 70, 63, 78, 55, 67, 75, 82, 61, 53, 60, 62, 83, 77, 90, 85…
$ x2 <dbl> 30, 51, 68, 45, 56, 49, 42, 50, 72, 45, 53, 47, 57, 83, 54, 50, 64…
$ x3 <dbl> 39, 54, 69, 47, 66, 44, 56, 55, 67, 47, 58, 39, 42, 45, 72, 72, 69…
$ x4 <dbl> 61, 63, 76, 54, 71, 54, 66, 70, 71, 62, 58, 59, 55, 59, 79, 60, 79…
$ x5 <dbl> 92, 73, 86, 84, 83, 49, 68, 66, 83, 80, 67, 74, 63, 77, 77, 54, 79…
$ x6 <dbl> 45, 47, 48, 35, 47, 34, 35, 41, 31, 41, 34, 41, 25, 35, 46, 36, 63…
head(superData)
y | x1 | x2 | x3 | x4 | x5 | x6 |
---|---|---|---|---|---|---|
43 | 51 | 30 | 39 | 61 | 92 | 45 |
63 | 64 | 51 | 54 | 63 | 73 | 47 |
71 | 70 | 68 | 69 | 76 | 86 | 48 |
61 | 63 | 45 | 47 | 54 | 84 | 35 |
81 | 78 | 56 | 66 | 71 | 83 | 47 |
43 | 55 | 49 | 44 | 54 | 49 | 34 |
ggpairs(superData)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
library(car)
VIFvalues <- as.data.frame(vif(lmfit))
colnames(VIFvalues) <- "VIF"
VIFvalues
  | VIF |
---|---|
x1 | 2.667 |
x2 | 1.601 |
x3 | 2.271 |
x4 | 3.078 |
x5 | 1.228 |
x6 | 1.952 |
No strong evidence of multicollinearity
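As a sanity check on what vif() reports, here is a sketch computing the VIF for x1 by hand as \(1 / (1 - R_1^2)\), where \(R_1^2\) comes from regressing x1 on the other predictors.

# VIF for x1 by hand: regress x1 on the other predictors, then 1 / (1 - R^2)
r2.x1 <- summary(lm(x1 ~ x2 + x3 + x4 + x5 + x6, data = superData))$r.squared
1 / (1 - r2.x1)   # should match the vif() value for x1 (about 2.67)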
\[Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \epsilon\]
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
tidy(lmfit, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 10.7871 | 11.5893 | 0.9308 | 0.3616 | -13.1871 | 34.7613 |
x1 | 0.6132 | 0.1610 | 3.8090 | 0.0009 | 0.2802 | 0.9462 |
x2 | -0.0731 | 0.1357 | -0.5382 | 0.5956 | -0.3538 | 0.2077 |
x3 | 0.3203 | 0.1685 | 1.9009 | 0.0699 | -0.0283 | 0.6689 |
x4 | 0.0817 | 0.2215 | 0.3690 | 0.7155 | -0.3764 | 0.5399 |
x5 | 0.0384 | 0.1470 | 0.2611 | 0.7963 | -0.2657 | 0.3425 |
x6 | -0.2171 | 0.1782 | -1.2180 | 0.2356 | -0.5857 | 0.1516 |
glance(summary(lmfit))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.7326 | 0.6628 | 7.068 | 10.5 | 0 | 7 |
\[Y = \beta_0 + \beta_1 x_1 + \beta_2x_3 + \beta_3 x_4 + \beta_4 x_6 + \epsilon\]
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_6=\) Rate of advancing to better jobs
lmfit.positive <- lm(y ~ x1 + x3 + x4 + x6, data = superData)
tidy(lmfit.positive, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 11.9917 | 8.2411 | 1.4551 | 0.1581 | -4.9811 | 28.9644 |
x1 | 0.5811 | 0.1443 | 4.0274 | 0.0005 | 0.2839 | 0.8782 |
x3 | 0.2999 | 0.1583 | 1.8947 | 0.0698 | -0.0261 | 0.6258 |
x4 | 0.1062 | 0.2049 | 0.5185 | 0.6087 | -0.3158 | 0.5283 |
x6 | -0.2289 | 0.1677 | -1.3646 | 0.1845 | -0.5744 | 0.1166 |
glance(summary(lmfit.positive))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.7285 | 0.6851 | 6.831 | 16.77 | 0 | 5 |
\[Y = \beta_0 + \beta_1 x_2 + \beta_2 x_5 + \epsilon\]
\(Y=\) Overall rating of job being done by supervisor
\(x_2=\) Does not allow special privileges
\(x_5=\) Too critical of poor performance
lmfit.negative <- lm(y ~ x2 + x5, data = superData)
tidy(lmfit.negative, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 34.0448 | 17.4725 | 1.9485 | 0.0618 | -1.8057 | 69.8954 |
x2 | 0.4099 | 0.1742 | 2.3536 | 0.0261 | 0.0526 | 0.7672 |
x5 | 0.1178 | 0.2153 | 0.5471 | 0.5888 | -0.3240 | 0.5597 |
glance(summary(lmfit.negative))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.1905 | 0.1306 | 11.35 | 3.178 | 0.0576 | 3 |
Adjusted \(R^2\quad\) (larger is better)
MSE = \(\widehat{\sigma}^2\quad\) (smaller is better)
AIC = \(n\log_e(\mathrm{SSE}_p/n) + 2p\quad\) (smaller is better)
BIC = \(n\log_e(\mathrm{SSE}_p/n) + p\log_e(n)\quad\) (smaller is better)
\(Y = \beta_0 + \beta_1x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \epsilon\) (the full model)
\(Y = \beta_0 + \beta_1x_1 + \beta_2x_3 + \beta_3 x_4 + \beta_4 x_6 + \epsilon\) (positive stuff)
\(Y = \beta_0 + \beta_1 x_2 + \beta_2 x_5 + \epsilon\) (negative stuff)
The full model:
glance(lmfit)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.6628 | 7.068 | 210.5 | 221.7 |
A model based on advancement and raises (positive stuff)
glance(lmfit.positive)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.6851 | 6.831 | 207 | 215.4 |
A model focusing on the negative
glance(lmfit.negative)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.1306 | 11.35 | 235.7 | 241.3 |
The full model and the “positive” model are fairly close.
Both are clearly better than the “negative” model.
Perhaps choose the “positive” model for parsimony.
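Since the “positive” model is nested in the full model, we could also compare them with a partial F-test via anova(); a quick sketch (a large p-value says the dropped terms x2 and x5 add little):

# Partial F-test: does adding x2 and x5 to the "positive" model help?
anova(lmfit.positive, lmfit)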
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_6=\) Rate of advancing to better jobs
…and how shall we interpret this result with regard to rating supervisors?
Thinking through the problem carefully is always a strategy that should be employed!!!
But sometimes we may want to use one of these more objective approaches:
Forward Selection: begin with an “empty” model (no predictors) and add predictors one at a time.
Backward Elimination: begin with the “full” model (all predictors) and, at each step, remove the predictor with the smallest t-statistic (largest p-value).
Stepwise Selection (a little of both): combines forward and backward steps, usually beginning from the empty model (forward stepwise).
I found an R package (olsrr) that will do variable selection much as described in the textbook.
I’m sure there are other packages.
install.packages("olsrr")
library(olsrr)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
ols_step_forward_p(lmfit, penter = 0.99)
Using penter = 0.99, we add in all the predictors one by one; at each step, whichever variable has the lowest p-value enters next.
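To see what that first step does, here is a sketch of running it by hand: fit each one-predictor model and find the smallest p-value.

# Forward selection, step 1, by hand: fit each single-predictor model and
# record the t-test p-value for that predictor
step1.pvals <- sapply(paste0("x", 1:6), function(v) {
  fit <- lm(reformulate(v, response = "y"), data = superData)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
})
sort(step1.pvals)   # x1 has the smallest p-value, so it enters first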
library(olsrr)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
forward.sequence <- ols_step_forward_p(lmfit, penter = 0.99)
Forward Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered:
- x1
- x3
- x6
- x2
- x4
- x5
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.856 RMSE 7.068
R-Squared 0.733 Coef. Var 10.936
Adj. R-Squared 0.663 MSE 49.957
Pred R-Squared 0.547 MAE 5.179
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3147.966 6 524.661 10.502 0.0000
Residual 1149.000 23 49.957
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------
(Intercept) 10.787 11.589 0.931 0.362 -13.187 34.761
x1 0.613 0.161 0.671 3.809 0.001 0.280 0.946
x3 0.320 0.169 0.309 1.901 0.070 -0.028 0.669
x6 -0.217 0.178 -0.183 -1.218 0.236 -0.586 0.152
x2 -0.073 0.136 -0.073 -0.538 0.596 -0.354 0.208
x4 0.082 0.221 0.070 0.369 0.715 -0.376 0.540
x5 0.038 0.147 0.031 0.261 0.796 -0.266 0.342
-----------------------------------------------------------------------------------------
forward.sequence
Selection Summary
------------------------------------------------------------------------
Variable Adj.
Step Entered R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x1 0.6813 0.6699 1.4115 205.7638 6.9933
2 x3 0.7080 0.6864 1.1148 205.1387 6.8168
3 x6 0.7256 0.6939 1.6027 205.2758 6.7343
4 x2 0.7293 0.6860 3.2805 206.8634 6.8206
5 x4 0.7318 0.6759 5.0682 208.5886 6.9294
6 x5 0.7326 0.6628 7.0000 210.4998 7.0680
------------------------------------------------------------------------
The olsrr package includes a plot method for the output from ols_step_forward_p.
forward.sequence <- ols_step_forward_p(lmfit, penter = 0.99)
plot(forward.sequence)
You can see how the various model summaries change as variables are added one-by-one.
SBC is the Schwarz Bayesian Criterion, which is just another name for BIC.
SBIC is a slight modification of BIC.
C(p) is Mallows’ \(C_p\), discussed in the textbook but not in lecture.
plot(forward.sequence)
We could instead use a rule that only adds a variable if its p-value is less than a specified value, say p-value \(= 0.25\):
forward.output <- ols_step_forward_p(lmfit, penter = 0.25)
Forward Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered:
- x1
- x3
- x6
No more variables to be added.
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
forward.output
Selection Summary
------------------------------------------------------------------------
Variable Adj.
Step Entered R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x1 0.6813 0.6699 1.4115 205.7638 6.9933
2 x3 0.7080 0.6864 1.1148 205.1387 6.8168
3 x6 0.7256 0.6939 1.6027 205.2758 6.7343
------------------------------------------------------------------------
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_6=\) Rate of advancing to better jobs
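As a quick check (the name lmfit.forward is just for illustration), the selected model can be refit directly with lm() and summarized with tidy() and glance() as before:

# Refit the forward-selected model (x1, x3, x6) directly
lmfit.forward <- lm(y ~ x1 + x3 + x6, data = superData)
tidy(lmfit.forward, conf.int = TRUE)
glance(lmfit.forward)[c(2, 3, 8, 9)]   # adj.r.squared, sigma, AIC, BIC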
Alternatively, we can use a criterion like AIC to decide when to stop.
We will stop when AIC no longer decreases.
ols_step_forward_aic(lmfit)
ols_step_forward_aic(lmfit)
Forward Selection Method
------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Entered:
- x1
- x3
No more variables to be added.
Selection Summary
--------------------------------------------------------------------
Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
--------------------------------------------------------------------
x1 205.764 2927.584 1369.382 0.68131 0.66993
x3 205.139 3042.318 1254.649 0.70802 0.68639
--------------------------------------------------------------------
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
Start with all variables in the model and remove according to the p-value.
At each step, remove the predictor with the largest p-value.
ols_step_backward_p(lmfit, prem = 0.33)
If we set the p-value for removal = 0.33, that corresponds to a t-statistic \(\approx 1\).
2 * (1 - pt(1, df = 30 - 6 - 1))
[1] 0.3277
ols_step_backward_p(lmfit, prem = 0.33)
Backward Elimination Method
---------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
We are eliminating variables based on p value...
Variables Removed:
- x5
- x4
- x2
No more variables satisfy the condition of p value = 0.33
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
Elimination Summary
------------------------------------------------------------------------
Variable Adj.
Step Removed R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x5 0.7318 0.6759 5.0682 208.5886 6.9294
2 x4 0.7293 0.686 3.2805 206.8634 6.8206
3 x2 0.7256 0.6939 1.6027 205.2758 6.7343
------------------------------------------------------------------------
Alternatively, we can use a criterion like AIC to decide when to stop.
We will stop when AIC no longer decreases.
ols_step_backward_aic(lmfit)
ols_step_backward_aic(lmfit)
Backward Elimination Method
---------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Removed:
- x5
- x4
- x2
- x6
No more variables to be removed.
Backward Elimination Summary
---------------------------------------------------------------------
Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
---------------------------------------------------------------------
Full Model 210.500 1149.000 3147.966 0.73260 0.66285
x5 208.589 1152.406 3144.560 0.73181 0.67594
x4 206.863 1163.012 3133.955 0.72934 0.68604
x2 205.276 1179.109 3117.858 0.72559 0.69393
x6 205.139 1254.649 3042.318 0.70802 0.68639
---------------------------------------------------------------------
As variables come into the model,
you can also check whether any variable already entered could later be removed.
Both the entry and removal criteria can be p-values from the coefficient t-tests.
ols_step_both_p(lmfit, pent = 0.25, prem = 0.33)
ols_step_both_p(lmfit, pent = 0.25, prem = 0.33)
Stepwise Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered/Removed:
- x1 added
- x3 added
- x6 added
No more variables to be added/removed.
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
Stepwise Selection Summary
------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------------------
1 x1 addition 0.681 0.670 1.4110 205.7638 6.9933
2 x3 addition 0.708 0.686 1.1150 205.1387 6.8168
3 x6 addition 0.726 0.694 1.6030 205.2758 6.7343
------------------------------------------------------------------------------------
ols_step_both_aic(lmfit)
ols_step_both_aic(lmfit)
Stepwise Selection Method
-------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Entered/Removed:
- x1 added
- x3 added
No more variables to be added or removed.
Stepwise Summary
-------------------------------------------------------------------------------
Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
-------------------------------------------------------------------------------
x1 addition 205.764 1369.382 2927.584 0.68131 0.66993
x3 addition 205.139 1254.649 3042.318 0.70802 0.68639
-------------------------------------------------------------------------------
Start with the full model (all predictors).
Remove according to p-value (large)
…and possibly later enter a variable back in according to p-value (small).
In the olsrr package, the function ols_step_both_p does not appear to support backward stepwise selection, only forward (demonstrated above).
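If you do want stepwise selection that starts from the full model and considers both additions and removals, base R’s step() function can do that; a minimal sketch using AIC as the criterion:

# Stepwise selection (both directions) starting from the full model,
# using AIC as the criterion; trace = FALSE suppresses the step-by-step log
step(lmfit, direction = "both", trace = FALSE)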