Variable selection is the process of choosing a “best” subset of all available predictors.
Well, there is no single “best” subset.
We do want a model we can interpret or justify with respect to the questions of interest.
Nested models
Any two models with the same response
MSE (Mean Squared Error)
AIC (Akaike Information Criterion)
BIC (Bayesian Information Criterion)
Mallows’ \(C_p\) (has fallen out of favor)
Any two models with the same response (but possibly differently transformed)
CAUTION:
For AIC, BIC, and \(C_p\),
\(p\) = number of parameters (including the constant/intercept).
This is different from our usual use of the letter \(p\) (the number of predictors).
Both Akaike and Bayesian Information Criteria
reward small variance (\(SSE_p / n\) small) and penalize larger models (\(p\) large).
\[AIC = n \log_e(SSE_p / n) + 2p\]
\[BIC = n \log_e(SSE_p / n) + p\log_e(n)\]
Smaller is better for both criteria
Models with AIC difference \(\le 2\) should be treated as equally adequate
Similarly, models with BIC difference \(\le 2\) should be treated as equally adequate
BIC penalty for larger models is more severe:
\(p\log_e(n) > 2p\) whenever \(n \ge 8\) (since \(\log_e(n) > 2\) once \(n > e^2 \approx 7.4\))
This controls the tendency of AIC toward overfitting.
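For reference, here is a small helper (just a sketch, not from the textbook) that computes these textbook versions of AIC and BIC from a fitted lm object. They differ from R’s built-in AIC() and BIC() by a constant that depends only on \(n\), so rankings of models fit to the same data agree.

# Textbook AIC/BIC from a fitted lm object; p counts all estimated
# coefficients, including the intercept, as in the formulas above
ic_values <- function(fit) {
  n   <- length(residuals(fit))
  p   <- length(coef(fit))            # number of parameters (intercept included)
  sse <- sum(residuals(fit)^2)
  c(AIC = n * log(sse / n) + 2 * p,
    BIC = n * log(sse / n) + p * log(n))
}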
If there are \(p\) predictor variables, then there are \(2^p\) possible models.
If \(p=10\), that is more than 1,000 candidate models to choose from!
…and that’s not even including the possibility of interaction effects
So, how to choose models in an intelligent and efficient way?
Consider a world where there is a “correct” model with \(q\) predictors.
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_q x_{iq} + \epsilon_i\] with least squares estimate
\[\widehat{y}_i^*=\widehat{\beta}_{0}^*+\widehat{\beta}_{1}^* x_{i1} + ... + \widehat{\beta}_q^* x_{iq}\]
Let \(p < q\) and consider the model
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_p x_{ip} + \epsilon_i\] which excludes \(\beta_{p+1}, \beta_{p+2}, \ldots, \beta_q\) (all non-zero coefficients).
This model is estimated by
\[\widehat{y}_i=\widehat{\beta}_{0}+\widehat{\beta}_{1} x_{i1} + ... + \widehat{\beta}_px_{ip}\]
Decreased variance of coefficients and predictions
\(\mathrm{var}(\widehat{\beta}_j) \le \mathrm{var}(\widehat{\beta}_j^*)\)
\(\mathrm{var}(\widehat{y}_i) \le \mathrm{var}(\widehat{y}_i^*)\)
Bias of coefficients and predictions
The estimates tend to either over-estimate or under-estimate on average
coefficient bias = \(E(\widehat{\beta}_j) - \beta_j\)
prediction bias = \(E(\widehat{y}_i) - \mu_i\)
…and we don’t know which direction or how large the bias
However, the bias can be considered negligible if \(|\widehat{\beta}_j| < se(\widehat{\beta}_j)\).
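For example, assuming fit is some fitted lm object (such as the full supervisor-performance model fit later in these notes), a quick sketch of this check:

# Which estimated coefficients are smaller in magnitude than their standard
# errors? Dropping those variables should introduce only negligible bias.
# `fit` is assumed to be a fitted lm object.
est <- summary(fit)$coefficients
data.frame(term = rownames(est),
           negligible_bias = abs(est[, "Estimate"]) < est[, "Std. Error"])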
Consider a world where there is a “correct” model with \(p\) predictors.
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_p x_{ip} + \epsilon_i\] with least squares estimate
\[\widehat{y}_i=\widehat{\beta}_{0}+\widehat{\beta}_{1} x_{i1} + ... + \widehat{\beta}_px_{ip}\]
Let \(q > p\) and consider the model
\[y_i=\beta_0+\beta_1 x_{i1} + ... + \beta_q x_{iq} + \epsilon_i\] which includes the extra terms \(\beta_{p+1}, \beta_{p+2}, \ldots, \beta_q\) (but all of these coefficients equal 0).
This model is estimated by
\[\widehat{y}_i^*=\widehat{\beta}_{0}^*+\widehat{\beta}_{1}^*x_{i1} + ... + \widehat{\beta}_q^*x_{iq}\]
Increased variance of coefficients and predictions
Decreases residual degrees of freedom (the amount of information for estimating the error variance)
\(\mathrm{var}(\widehat{\beta}_j) \le \mathrm{var}(\widehat{\beta}_j^*)\)
\(\mathrm{var}(\widehat{y}_i) \le \mathrm{var}(\widehat{y}_i^*)\)
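Here is a purely illustrative simulation (made-up data, not from the textbook) of that variance inflation: the standard error for the one truly relevant predictor grows once irrelevant predictors are added.

# Simulate data where only x1 matters, then compare the SE of x1's
# coefficient in the correct model vs. a model with 5 irrelevant predictors
set.seed(1)
sims <- replicate(500, {
  n     <- 30
  x1    <- rnorm(n)
  noise <- matrix(rnorm(n * 5), n, 5)          # 5 irrelevant predictors
  y     <- 2 + 1 * x1 + rnorm(n)               # true model uses x1 only
  c(correct = summary(lm(y ~ x1))$coefficients["x1", "Std. Error"],
    overfit = summary(lm(y ~ x1 + noise))$coefficients["x1", "Std. Error"])
})
rowMeans(sims)   # average SE for x1 is larger in the overfit model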
An overarching consideration in model selection
relates to the purpose and intended use of the model.
Descriptive (exploratory, understanding relationships)
Try to account for as much response variability as possible, but keep the model simple.
Search for fundamental relationships
Usually start with a few essential variables
Then choose variables (and combinations of variables) to build forward
Useful with big data
The descriptive/exploratory model might not be the final model
Predictive (getting good predictions)
Minimize the MSE of prediction: \(MSE(\widehat{y}_i) = \mathrm{var}(\widehat{y}_i) + \text{bias}^2\)
Want realistic predictions and close to the data
Less concern about which particular variables are required or included
Explanatory (describe the process, interpretability)
Lots of thinking required about which variables are important
Parsimony important (smallest model that is “complete”)
Thoroughly address confounding and possible effect modification
Do not omit important confounders
Explore the need for interaction terms
Ideally, would like a little bit of all three properties.
Let’s look at an example: From textbook Table 3.3, the supervisor performance data.
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_2=\) Does not allow special privileges
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_5=\) Too critical of poor performance
\(x_6=\) Rate of advancing to better jobs
superData <- read.delim("http://statistics.uchicago.edu/~collins/data/RABE5/P060.txt")
glimpse(superData)
Observations: 30
Variables: 7
$ y <dbl> 43, 63, 71, 61, 81, 43, 58, 71, 72, 67, 64, 67, 69, 68, 77, 81, 74…
$ x1 <dbl> 51, 64, 70, 63, 78, 55, 67, 75, 82, 61, 53, 60, 62, 83, 77, 90, 85…
$ x2 <dbl> 30, 51, 68, 45, 56, 49, 42, 50, 72, 45, 53, 47, 57, 83, 54, 50, 64…
$ x3 <dbl> 39, 54, 69, 47, 66, 44, 56, 55, 67, 47, 58, 39, 42, 45, 72, 72, 69…
$ x4 <dbl> 61, 63, 76, 54, 71, 54, 66, 70, 71, 62, 58, 59, 55, 59, 79, 60, 79…
$ x5 <dbl> 92, 73, 86, 84, 83, 49, 68, 66, 83, 80, 67, 74, 63, 77, 77, 54, 79…
$ x6 <dbl> 45, 47, 48, 35, 47, 34, 35, 41, 31, 41, 34, 41, 25, 35, 46, 36, 63…
head(superData)
y | x1 | x2 | x3 | x4 | x5 | x6 |
---|---|---|---|---|---|---|
43 | 51 | 30 | 39 | 61 | 92 | 45 |
63 | 64 | 51 | 54 | 63 | 73 | 47 |
71 | 70 | 68 | 69 | 76 | 86 | 48 |
61 | 63 | 45 | 47 | 54 | 84 | 35 |
81 | 78 | 56 | 66 | 71 | 83 | 47 |
43 | 55 | 49 | 44 | 54 | 49 | 34 |
ggpairs(superData)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
library(car)
VIFvalues <- as.data.frame(vif(lmfit))
colnames(VIFvalues) <- "VIF"
VIFvalues
  | VIF |
---|---|
x1 | 2.667 |
x2 | 1.601 |
x3 | 2.271 |
x4 | 3.078 |
x5 | 1.228 |
x6 | 1.952 |
No strong evidence of multicollinearity
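As a sanity check on what vif() reports, here is a sketch computing the VIF for x1 by hand as \(1 / (1 - R_1^2)\), where \(R_1^2\) comes from regressing x1 on the other predictors.

# VIF for x1 by hand: regress x1 on the other predictors, then 1 / (1 - R^2)
r2.x1 <- summary(lm(x1 ~ x2 + x3 + x4 + x5 + x6, data = superData))$r.squared
1 / (1 - r2.x1)   # should match the vif() value for x1 (about 2.67)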
\[Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \epsilon\]
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
tidy(lmfit, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 10.7871 | 11.5893 | 0.9308 | 0.3616 | -13.1871 | 34.7613 |
x1 | 0.6132 | 0.1610 | 3.8090 | 0.0009 | 0.2802 | 0.9462 |
x2 | -0.0731 | 0.1357 | -0.5382 | 0.5956 | -0.3538 | 0.2077 |
x3 | 0.3203 | 0.1685 | 1.9009 | 0.0699 | -0.0283 | 0.6689 |
x4 | 0.0817 | 0.2215 | 0.3690 | 0.7155 | -0.3764 | 0.5399 |
x5 | 0.0384 | 0.1470 | 0.2611 | 0.7963 | -0.2657 | 0.3425 |
x6 | -0.2171 | 0.1782 | -1.2180 | 0.2356 | -0.5857 | 0.1516 |
glance(summary(lmfit))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.7326 | 0.6628 | 7.068 | 10.5 | 0 | 7 |
\[Y = \beta_0 + \beta_1 x_1 + \beta_2x_3 + \beta_3 x_4 + \beta_4 x_6 + \epsilon\]
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_6=\) Rate of advancing to better jobs
lmfit.positive <- lm(y ~ x1 + x3 + x4 + x6, data = superData)
tidy(lmfit.positive, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 11.9917 | 8.2411 | 1.4551 | 0.1581 | -4.9811 | 28.9644 |
x1 | 0.5811 | 0.1443 | 4.0274 | 0.0005 | 0.2839 | 0.8782 |
x3 | 0.2999 | 0.1583 | 1.8947 | 0.0698 | -0.0261 | 0.6258 |
x4 | 0.1062 | 0.2049 | 0.5185 | 0.6087 | -0.3158 | 0.5283 |
x6 | -0.2289 | 0.1677 | -1.3646 | 0.1845 | -0.5744 | 0.1166 |
glance(summary(lmfit.positive))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.7285 | 0.6851 | 6.831 | 16.77 | 0 | 5 |
\[Y = \beta_0 + \beta_1 x_2 + \beta_2 x_5 + \epsilon\]
\(Y=\) Overall rating of job being done by supervisor
\(x_2=\) Does not allow special privileges
\(x_5=\) Too critical of poor performance
lmfit.negative <- lm(y ~ x2 + x5, data = superData)
tidy(lmfit.negative, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | 34.0448 | 17.4725 | 1.9485 | 0.0618 | -1.8057 | 69.8954 |
x2 | 0.4099 | 0.1742 | 2.3536 | 0.0261 | 0.0526 | 0.7672 |
x5 | 0.1178 | 0.2153 | 0.5471 | 0.5888 | -0.3240 | 0.5597 |
glance(summary(lmfit.negative))
r.squared | adj.r.squared | sigma | statistic | p.value | df |
---|---|---|---|---|---|
0.1905 | 0.1306 | 11.35 | 3.178 | 0.0576 | 3 |
Adjusted \(R^2\quad\) (larger is better)
MSE = \(\widehat{\sigma}^2\quad\) (smaller is better)
AIC = \(n\log_e(\mathrm{SSE}_p/n) + 2p\quad\) (smaller is better)
BIC = \(n\log_e(\mathrm{SSE}_p/n) + p\log_e(n)\quad\) (smaller is better)
\(Y = \beta_0 + \beta_1x_1 + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6 + \epsilon\) (the full model)
\(Y = \beta_0 + \beta_1x_1 + \beta_2x_3 + \beta_3 x_4 + \beta_4 x_6 + \epsilon\) (positive stuff)
\(Y = \beta_0 + \beta_1 x_2 + \beta_2 x_5 + \epsilon\) (negative stuff)
The full model:
glance(lmfit)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.6628 | 7.068 | 210.5 | 221.7 |
A model based on advancement and raises (positive stuff)
glance(lmfit.positive)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.6851 | 6.831 | 207 | 215.4 |
A model focusing on the negative
glance(lmfit.negative)[c(2, 3, 8, 9)]
adj.r.squared | sigma | AIC | BIC |
---|---|---|---|
0.1306 | 11.35 | 235.7 | 241.3 |
The full model and the “positive” model are fairly close.
Both are clearly better than the “negative” model.
Perhaps choose the “positive” model for parsimony.
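Since the “positive” model is nested in the full model, we could also compare them with a partial F-test via anova(); a quick sketch (a large p-value says the dropped terms x2 and x5 add little):

# Partial F-test: does adding x2 and x5 to the "positive" model help?
anova(lmfit.positive, lmfit)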
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_4=\) Raises based on performance
\(x_6=\) Rate of advancing to better jobs
…and how shall we interpret this result with regard to rating supervisors?
Thinking through the problem carefully is always a strategy that should be employed!!!
But sometimes we may want to use one of these more objective approaches:
Forward Selection: begin with an “empty” model (no predictors) and add predictors one at a time.
Backward Elimination: begin with the “full” model (all predictors) and, at each step, remove the predictor with the smallest t-statistic (largest p-value).
Stepwise Selection (a little of both): combines forward and backward steps, usually beginning from the empty model (forward stepwise).
I found an R package (olsrr) that will do variable selection much as described in the textbook.
I’m sure there are other packages.
install.packages("olsrr")
library(olsrr)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
ols_step_forward_p(lmfit, penter = 0.99)
Using penter = 0.99, we add in all the predictors one by one; at each step, whichever variable has the lowest p-value enters next.
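To see what that first step does, here is a sketch of running it by hand: fit each one-predictor model and find the smallest p-value.

# Forward selection, step 1, by hand: fit each single-predictor model and
# record the t-test p-value for that predictor
step1.pvals <- sapply(paste0("x", 1:6), function(v) {
  fit <- lm(reformulate(v, response = "y"), data = superData)
  summary(fit)$coefficients[v, "Pr(>|t|)"]
})
sort(step1.pvals)   # x1 has the smallest p-value, so it enters first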
library(olsrr)
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
forward.sequence <- ols_step_forward_p(lmfit, penter = 0.99)
Forward Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered:
- x1
- x3
- x6
- x2
- x4
- x5
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.856 RMSE 7.068
R-Squared 0.733 Coef. Var 10.936
Adj. R-Squared 0.663 MSE 49.957
Pred R-Squared 0.547 MAE 5.179
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3147.966 6 524.661 10.502 0.0000
Residual 1149.000 23 49.957
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
-----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
-----------------------------------------------------------------------------------------
(Intercept) 10.787 11.589 0.931 0.362 -13.187 34.761
x1 0.613 0.161 0.671 3.809 0.001 0.280 0.946
x3 0.320 0.169 0.309 1.901 0.070 -0.028 0.669
x6 -0.217 0.178 -0.183 -1.218 0.236 -0.586 0.152
x2 -0.073 0.136 -0.073 -0.538 0.596 -0.354 0.208
x4 0.082 0.221 0.070 0.369 0.715 -0.376 0.540
x5 0.038 0.147 0.031 0.261 0.796 -0.266 0.342
-----------------------------------------------------------------------------------------
forward.sequence
Selection Summary
------------------------------------------------------------------------
Variable Adj.
Step Entered R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x1 0.6813 0.6699 1.4115 205.7638 6.9933
2 x3 0.7080 0.6864 1.1148 205.1387 6.8168
3 x6 0.7256 0.6939 1.6027 205.2758 6.7343
4 x2 0.7293 0.6860 3.2805 206.8634 6.8206
5 x4 0.7318 0.6759 5.0682 208.5886 6.9294
6 x5 0.7326 0.6628 7.0000 210.4998 7.0680
------------------------------------------------------------------------
The olsrr package includes a plot method for the output from ols_step_forward_p.
forward.sequence <- ols_step_forward_p(lmfit, penter = 0.99)
plot(forward.sequence)
You can see how the various model summaries change as variables are added one-by-one.
SBC is the Schwarz Bayesian Criterion, which is just another name for BIC.
SBIC is a slight modification of BIC.
C(p) is Mallows’ \(C_p\), discussed in the textbook but not in lecture.
plot(forward.sequence)
We could instead use a rule that only adds a variable if its p-value is less than a specified value, say p-value \(= 0.25\):
forward.output <- ols_step_forward_p(lmfit, penter = 0.25)
Forward Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered:
- x1
- x3
- x6
No more variables to be added.
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
forward.output
Selection Summary
------------------------------------------------------------------------
Variable Adj.
Step Entered R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x1 0.6813 0.6699 1.4115 205.7638 6.9933
2 x3 0.7080 0.6864 1.1148 205.1387 6.8168
3 x6 0.7256 0.6939 1.6027 205.2758 6.7343
------------------------------------------------------------------------
\(Y=\) Overall rating of job being done by supervisor
\(x_1=\) Handles employee complaints
\(x_3=\) Opportunity to learn new things
\(x_6=\) Rate of advancing to better jobs
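As a quick check (the name lmfit.forward is just for illustration), the selected model can be refit directly with lm() and summarized with tidy() and glance() as before:

# Refit the forward-selected model (x1, x3, x6) directly
lmfit.forward <- lm(y ~ x1 + x3 + x6, data = superData)
tidy(lmfit.forward, conf.int = TRUE)
glance(lmfit.forward)[c(2, 3, 8, 9)]   # adj.r.squared, sigma, AIC, BIC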
Alternatively, we can use a criterion like AIC to decide when to stop.
We will stop when AIC no longer decreases.
ols_step_forward_aic(lmfit)
ols_step_forward_aic(lmfit)
Forward Selection Method
------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Entered:
- x1
- x3
No more variables to be added.
Selection Summary
--------------------------------------------------------------------
Variable AIC Sum Sq RSS R-Sq Adj. R-Sq
--------------------------------------------------------------------
x1 205.764 2927.584 1369.382 0.68131 0.66993
x3 205.139 3042.318 1254.649 0.70802 0.68639
--------------------------------------------------------------------
lmfit <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6, data = superData)
Start with all variables in the model and remove according to the p-value.
At each step, remove the predictor with the largest p-value.
ols_step_backward_p(lmfit, prem = 0.33)
If we set the p-value for removal = 0.33, that corresponds to a t-statistic \(\approx 1\).
2 * (1 - pt(1, df = 30 - 6 - 1))
[1] 0.3277
ols_step_backward_p(lmfit, prem = 0.33)
Backward Elimination Method
---------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
We are eliminating variables based on p value...
Variables Removed:
- x5
- x4
- x2
No more variables satisfy the condition of p value = 0.33
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
Elimination Summary
------------------------------------------------------------------------
Variable Adj.
Step Removed R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------
1 x5 0.7318 0.6759 5.0682 208.5886 6.9294
2 x4 0.7293 0.686 3.2805 206.8634 6.8206
3 x2 0.7256 0.6939 1.6027 205.2758 6.7343
------------------------------------------------------------------------
Alternatively, we can use a criterion like AIC to decide when to stop.
We will stop when AIC no longer decreases.
ols_step_backward_aic(lmfit)
ols_step_backward_aic(lmfit)
Backward Elimination Method
---------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Removed:
- x5
- x4
- x2
- x6
No more variables to be removed.
Backward Elimination Summary
---------------------------------------------------------------------
Variable AIC RSS Sum Sq R-Sq Adj. R-Sq
---------------------------------------------------------------------
Full Model 210.500 1149.000 3147.966 0.73260 0.66285
x5 208.589 1152.406 3144.560 0.73181 0.67594
x4 206.863 1163.012 3133.955 0.72934 0.68604
x2 205.276 1179.109 3117.858 0.72559 0.69393
x6 205.139 1254.649 3042.318 0.70802 0.68639
---------------------------------------------------------------------
As variables come into the model,
you can also check whether any variable already entered could later be removed.
Both the entry and removal criteria can be p-values from the coefficient t-tests.
ols_step_both_p(lmfit, pent = 0.25, prem = 0.33)
ols_step_both_p(lmfit, pent = 0.25, prem = 0.33)
Stepwise Selection Method
---------------------------
Candidate Terms:
1. x1
2. x2
3. x3
4. x4
5. x5
6. x6
We are selecting variables based on p value...
Variables Entered/Removed:
- x1 added
- x3 added
- x6 added
No more variables to be added/removed.
Final Model Output
------------------
Model Summary
--------------------------------------------------------------
R 0.852 RMSE 6.734
R-Squared 0.726 Coef. Var 10.419
Adj. R-Squared 0.694 MSE 45.350
Pred R-Squared 0.642 MAE 5.317
--------------------------------------------------------------
RMSE: Root Mean Square Error
MSE: Mean Square Error
MAE: Mean Absolute Error
ANOVA
--------------------------------------------------------------------
Sum of
Squares DF Mean Square F Sig.
--------------------------------------------------------------------
Regression 3117.858 3 1039.286 22.917 0.0000
Residual 1179.109 26 45.350
Total 4296.967 29
--------------------------------------------------------------------
Parameter Estimates
----------------------------------------------------------------------------------------
model Beta Std. Error Std. Beta t Sig lower upper
----------------------------------------------------------------------------------------
(Intercept) 13.578 7.544 1.800 0.084 -1.929 29.084
x1 0.623 0.118 0.681 5.271 0.000 0.380 0.866
x3 0.312 0.154 0.301 2.026 0.053 -0.005 0.629
x6 -0.187 0.145 -0.158 -1.291 0.208 -0.485 0.111
----------------------------------------------------------------------------------------
Stepwise Selection Summary
------------------------------------------------------------------------------------
Added/ Adj.
Step Variable Removed R-Square R-Square C(p) AIC RMSE
------------------------------------------------------------------------------------
1 x1 addition 0.681 0.670 1.4110 205.7638 6.9933
2 x3 addition 0.708 0.686 1.1150 205.1387 6.8168
3 x6 addition 0.726 0.694 1.6030 205.2758 6.7343
------------------------------------------------------------------------------------
ols_step_both_aic(lmfit)
ols_step_both_aic(lmfit)
Stepwise Selection Method
-------------------------
Candidate Terms:
1 . x1
2 . x2
3 . x3
4 . x4
5 . x5
6 . x6
Variables Entered/Removed:
- x1 added
- x3 added
No more variables to be added or removed.
Stepwise Summary
-------------------------------------------------------------------------------
Variable Method AIC RSS Sum Sq R-Sq Adj. R-Sq
-------------------------------------------------------------------------------
x1 addition 205.764 1369.382 2927.584 0.68131 0.66993
x3 addition 205.139 1254.649 3042.318 0.70802 0.68639
-------------------------------------------------------------------------------
Start with the full model (all predictors).
Remove according to p-value (large)
…and possibly later enter a variable back in according to p-value (small).
In the olsrr package, the function ols_step_both_p does not appear to support backward stepwise selection, only forward (demonstrated above).
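If you do want stepwise selection that starts from the full model and considers both additions and removals, base R’s step() function can do that; a minimal sketch using AIC as the criterion:

# Stepwise selection (both directions) starting from the full model,
# using AIC as the criterion; trace = FALSE suppresses the step-by-step log
step(lmfit, direction = "both", trace = FALSE)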