--- title: "Regularization" output: html_document --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(ggformula) library(broom) library(car) library(glmnet) set.seed(1) ``` ## The problem of overfitting ```{r} N<-10 X<-runif(N)*10 err<-rnorm(N)*20 Y<-2+3*X-4*X^2+err out<-data.frame(X,Y) ``` ```{r} gf_point(Y~X,data=out) ``` ```{r,include=FALSE} fit0<-lm(Y~X) fit1<-lm(Y~X+I(X^2)) fit2<-lm(Y~X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2)) out<-data.frame(Y,X,predict(fit0),predict(fit1),predict(fit2)) ``` First we can try a simple linear fit, using the command `fit0<-lm(Y~X)` ```{r, echo=FALSE} gf_point(Y~X,data=out) %>% gf_line(predict.fit0.~X,data=out) ``` That works OK, but we can do a better job if we include the additional predictor $X^2$: `fit1<-lm(Y~X+I(X^2))` ```{r,echo=FALSE} gf_point(Y~X,data=out) %>% gf_line(predict.fit1.~X,data=out) ``` But what if we could get a *perfect* fit? We can! We only need to use even more predictors, like this: `fit2<-lm(Y~X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2))` ```{r} gf_point(Y~X,data=out) %>% gf_line(predict.fit2.~X,data=out) ``` Which of these three is the best fit? Let's look at them individually: ```{r} summary(fit0) ``` This one looks OK. ```{r} summary(fit1) ``` Here the coefficients are very different from the true values, and one of them isn't even significant. Evidence of multicollinearity! (why?) Let's check: ```{r} vif(fit1) ``` Now let's look at the last fit: ```{r} summary(fit2) vif(fit2) ``` We can use regularization instead. First we do ridge regression, which uses a $\beta^2$ penalty: ```{r} library(glmnet) mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1] fit_ridge_cv = cv.glmnet(mm, Y, alpha = 0) coef(fit_ridge_cv, s="lambda.1se") ``` Now we do LASSO regression, which uses a $|\beta|$ penalty: ```{r} mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1] fit_lasso_cv = cv.glmnet(mm, Y, alpha = 1) coef(fit_lasso_cv,s="lambda.1se") ``` ```{r} N<-1000 X<-runif(N)*10 err<-rnorm(N)*2 Y<-2+3*X-4*X^2+err out<-data.frame(X,Y) mm = model.matrix(Y ~ X+I(X^2)+I(X^3)+I(X^4)+I(X^5)+I(X^6)+I(X^7)+I(X^8)+I(X^-2), out)[, -1] ``` ```{r} fit_ridge_cv = cv.glmnet(mm, Y, alpha = 0) coef(fit_ridge_cv, s = "lambda.1se") ``` ```{r} fit_lasso_cv = cv.glmnet(mm, Y, alpha = 1) coef(fit_lasso_cv, s = "lambda.1se") ```