---
title: "Regularization"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(ggformula)
library(broom)
library(car)
library(glmnet)
set.seed(1)
```
## The problem of overfitting
```{r}
N<-10
X<-runif(N)*10
err<-rnorm(N)*20
Y<-2+3*X-4*X^2+err
out<-data.frame(X,Y)
```
```{r}
gf_point(Y~X,data=out)
```
```{r,include=FALSE}
fit0<-lm(Y~X)
fit1<-lm(Y~X+I(X^2))
fit2<-lm(Y~X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2))
out<-data.frame(Y,X,predict(fit0),predict(fit1),predict(fit2))
```
First we can try a simple linear fit, using the command `fit0<-lm(Y~X)`
```{r, echo=FALSE}
gf_point(Y~X,data=out) %>%
gf_line(predict.fit0.~X,data=out)
```
That works OK, but we can do a better job if we include the additional predictor $X^2$:
`fit1<-lm(Y~X+I(X^2))`
```{r,echo=FALSE}
gf_point(Y~X,data=out) %>%
gf_line(predict.fit1.~X,data=out)
```
But what if we could get a *perfect* fit? We can! We only need to use even more predictors, like this:
`fit2<-lm(Y~X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2))`
```{r}
gf_point(Y~X,data=out) %>%
gf_line(predict.fit2.~X,data=out)
```
Which of these three is the best fit? Let's look at them individually:
```{r}
summary(fit0)
```
This one looks OK.
```{r}
summary(fit1)
```
Here the coefficients are very different from the true values, and one of them isn't even significant. Evidence of multicollinearity! (why?) Let's check:
```{r}
vif(fit1)
```
Now let's look at the last fit:
```{r}
summary(fit2)
vif(fit2)
```
We can use regularization instead. First we do ridge regression, which uses a $\beta^2$ penalty:
```{r}
library(glmnet)
mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1]
fit_ridge_cv = cv.glmnet(mm, Y, alpha = 0)
coef(fit_ridge_cv, s="lambda.1se")
```
Now we do LASSO regression, which uses a $|\beta|$ penalty:
```{r}
mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1]
fit_lasso_cv = cv.glmnet(mm, Y, alpha = 1)
coef(fit_lasso_cv,s="lambda.1se")
```
```{r}
N<-1000
X<-runif(N)*10
err<-rnorm(N)*2
Y<-2+3*X-4*X^2+err
out<-data.frame(X,Y)
mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1]
```
```{r}
fit_ridge_cv = cv.glmnet(mm, Y, alpha = 0)
coef(fit_ridge_cv, s = "lambda.1se")
```
```{r}
fit_lasso_cv = cv.glmnet(mm, Y, alpha = 1)
coef(fit_lasso_cv, s = "lambda.1se")
```