## SLR Model: \[Y \;=\; \beta_0 + \beta_1X + \epsilon\]
Model assumption (linearity): \[\begin{align}
E(Y\; |\; X=x) &\;=\; E(\beta_0 +\beta_1 x + \epsilon ) \\ \\
& \;=\; \beta_0 +\beta_1 x + E(\epsilon) \\ \\
& \;=\; \beta_0 +\beta_1 x + 0 \\ \\
& \;=\; \beta_0 +\beta_1 x
\end{align}\]
Model for an individual observation: \[Y_i \;=\; \beta_0 +\beta_1 x_i + \epsilon_i\]
Meaning of \(\beta_0\): \[E(Y\; |\; X=0) \;=\; \beta_0 +\beta_1 (0) \;=\; \beta_0\]
Meaning of \(\beta_1\):
\[\begin{align}
E(Y\; |\; X=x+1) &\;=\; \beta_0 +\beta_1 (x+1) \\ \\
&\;=\; \beta_0 +\beta_1 x + \beta_1
\;=\; E(Y\; |\; X=x) + \beta_1
\end{align}\]
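As a quick check of these interpretations, here is a minimal R sketch on simulated data (the true values \(\beta_0 = 2\), \(\beta_1 = 0.5\), and everything else here are hypothetical choices):

```r
# Simulated data: true beta0 = 2 (mean of Y at X = 0), true beta1 = 0.5
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
coef(fit)
# (Intercept): estimate of E(Y | X = 0) = beta0, near 2
# x:           estimated change in E(Y) per one-unit increase in x, near 0.5
```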
Fitted value for an observation in the data set: \[{\widehat{y}}_i \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_i\]
Prediction for the average of \(y\) values given \(x=x_0\): \[\widehat{\mu}_0 \;=\;{\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]
Prediction for an individual \(y\) value given \(x=x_0\):
\[{\widehat{y}}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]
Residuals \[\begin{align} {\rm residual} \;=\; e_i &\;=\; {\rm observed - fitted} \\ &\;=\; {\rm observed - expected} \\ &\;=\; {\rm observed - predicted} \\ \\ &\;=\; y_i - {\widehat{y}}_i\\ \\ &\;=\; y_i - ({\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_i) \end{align}\]
Sum of Squared Errors \[{\mathrm{SSE}}\;=\; \sum_i e_i^2 \;=\; \sum_i (y_i - {\widehat{y}}_i)^2
\;=\; \sum_i (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2\]
Least squares estimates of \(\beta_0\) and \(\beta_1\): \[{\widehat{\beta}_{0}} \;=\; {\overline{y}}- {\widehat{\beta}_{1}} {\overline{x}}\]
\[{\widehat{\beta}_{1}} \;=\; r \frac{s_y}{s_x}\]
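A short R sketch (simulated data, hypothetical parameter values) verifying that these closed-form estimates match what `lm()` computes:

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
b1 <- cor(x, y) * sd(y) / sd(x)   # beta1-hat = r * s_y / s_x
b0 <- mean(y) - b1 * mean(x)      # beta0-hat = ybar - beta1-hat * xbar
c(b0, b1)
coef(lm(y ~ x))                   # same two numbers
```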
Estimators are unbiased:
\[E({\widehat{\beta}_{0}}) = {\beta_{0}} \qquad \text{ and } \qquad E({\widehat{\beta}_{1}}) = {\beta_{1}}\]
Writing \({\widehat{\beta}_{1}} = \sum k_i Y_i\) with \(k_i = (x_i - {\overline{x}})\big/\sum_j (x_j - {\overline{x}})^2\), so that \(\sum k_i = 0\) and \(\sum k_i x_i = 1\):
\[\begin{align} E({\widehat{\beta}_{1}}) &\;=\; E\left(\sum k_iY_i\right) \;=\; \sum E(k_i Y_i) \;=\; \sum k_i E(Y_i) \\ \\ &\;=\; \sum k_i (\beta_0 + \beta_1 x_i) \;=\; \beta_0 \sum k_i + \beta_1 \sum k_i x_i \\ \\ &\;=\; \beta_0 (0) + \beta_1 (1) \;=\; \beta_1 \end{align}\]
\[\begin{align} E({\widehat{\beta}_{0}}) &\;=\; E({\overline{Y}}- {\widehat{\beta}_{1}} {\overline{x}}) \;=\; E({\overline{Y}}) - {\overline{x}}E({\widehat{\beta}_{1}}) \\ \\ &\;=\; (\beta_0 + \beta_1 {\overline{x}}) - {\overline{x}}\beta_1 \;=\; \beta_0 \end{align}\]
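Unbiasedness can also be illustrated by simulation; a minimal R sketch (fixed design, 5000 hypothetical replications) in which the average of the \(\widehat{\beta}_1\) values lands very close to the true \(\beta_1 = 0.5\):

```r
set.seed(1)
x <- runif(50, 0, 10)             # design held fixed across replications
b1_hats <- replicate(5000, {
  y <- 2 + 0.5 * x + rnorm(50)    # new errors each replication
  coef(lm(y ~ x))[2]              # slope estimate
})
mean(b1_hats)                     # close to the true beta1 = 0.5
```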
Estimator of the population variance: \[\widehat\sigma^2 \;=\; \frac{{\mathrm{SSE}}}{{\mathrm{df}}} \;=\; \frac{\sum_i e_i^2}{n-2} \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{n-2}\]
(Note: \(\widehat\sigma\) is called “residual standard error” on some R output.)
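A short R check that \(\widehat\sigma\) computed from SSE/df matches the value R reports (`sigma()` is the base-R accessor for `lm` fits; the data are simulated):

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
n <- length(y)
sqrt(sum(resid(fit)^2) / (n - 2)) # sqrt(SSE / df) by hand
sigma(fit)                        # R's "residual standard error": same value
```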
Standard errors for the estimators:
\[{\mathrm{se}}({\widehat{\beta}_{1}}) \;=\; \frac{\widehat{\sigma}}{\sqrt{\sum(x_i-{\overline{x}})^2}}\]
\[{\mathrm{se}}({\widehat{\beta}_{0}}) \;=\; \widehat{\sigma} \;\sqrt{\frac{1}{n} + \frac{{\overline{x}}^2}{\sum(x_i-{\overline{x}})^2}}\]
T-statistic:
\[\frac{{\widehat{\beta}_{0}} - \beta_0}{{\mathrm{se}}({\widehat{\beta}_{0}})} \;\sim\; t_{n-2}\]
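A minimal R sketch computing \({\mathrm{se}}({\widehat{\beta}_{1}})\) and its t-statistic by hand and comparing against `summary()` (simulated data; the test shown is \(H_0:\beta_1 = 0\)):

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
se_b1 <- sigma(fit) / sqrt(sum((x - mean(x))^2))    # se(beta1-hat) by hand
t_b1  <- coef(fit)[2] / se_b1                       # t for H0: beta1 = 0
c(t_b1, summary(fit)$coefficients["x", "t value"])  # agree
2 * pt(-abs(t_b1), df = length(y) - 2)              # two-sided p-value on t_{n-2}
```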
Partitioning the sum of squares (the cross term \(2\sum (y_i - {\widehat{y}}_i)({\widehat{y}}_i - {\overline{y}})\) vanishes for the least squares fit):
\[\sum (y_i - {\overline{y}})^2 \;=\; \sum (y_i - {\widehat{y}}_i + {\widehat{y}}_i - {\overline{y}})^2
\;=\; \sum (y_i - {\widehat{y}}_i)^2 \;+\; \sum ({\widehat{y}}_i - {\overline{y}})^2\]
SST = \(\sum (y_i - {\overline{y}})^2\) = total sum of squares
SSE = \(\sum (y_i - {\widehat{y}}_i)^2\) = error sum of squares
SSR = \(\sum ({\widehat{y}}_i - {\overline{y}})^2\) = regression sum of squares
So, SST = SSE + SSR
\(R^2\) = proportion of variability in the response (\(Y\))
explained by regression on the predictor (\(x\))
\[R^2 \;=\; \frac{{\mathrm{SSR}}}{{\mathrm{SST}}} \;=\; 1 - \frac{{\mathrm{SSE}}}{{\mathrm{SST}}}
\;=\; 1 - \frac{\sum (y_i - {\widehat{y}}_i)^2}{\sum (y_i - {\overline{y}})^2}\]
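An R sketch verifying the partition and the three equivalent expressions for \(R^2\) on simulated data:

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
SST <- sum((y - mean(y))^2)
SSE <- sum(resid(fit)^2)
SSR <- sum((fitted(fit) - mean(y))^2)
all.equal(SST, SSE + SSR)                            # TRUE: SST = SSE + SSR
c(SSR / SST, 1 - SSE / SST, summary(fit)$r.squared)  # three equal values
```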
\[{\rm adjusted\;} R^2 = 1 - \frac{{\mathrm{SSE}}\;/\; {\mathrm{SSE}}({\mathrm{df}})}{{\mathrm{SST}}\;/\; {\mathrm{SST}}({\mathrm{df}})} = 1 - \frac{\sum (y_i - {\widehat{y}}_i)^2 / (n-2)}{\sum (y_i - {\overline{y}})^2 / (n-1)}\]
F-statistic:
\[F = \frac{[ {\mathrm{SSE}}({\mathrm{RM}}) - {\mathrm{SSE}}({\mathrm{FM}}) ] \;/\; [ {\mathrm{df}}({\mathrm{RM}}) - {\mathrm{df}}({\mathrm{FM}}) ]}{{\mathrm{SSE}}({\mathrm{FM}}) \;/\; {\mathrm{df}}({\mathrm{FM}})}\]
For example, taking the reduced model to be \(Y = \epsilon\) (no parameters estimated, so \({\mathrm{SSE}}({\mathrm{RM}}) = \sum y_i^2\) and \({\mathrm{df}}({\mathrm{RM}}) = n\)) tests \(H_0: \beta_0 = \beta_1 = 0\):
\[F \;=\; \frac{\left[ \sum y_i^2 - \sum (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2 \right] \;/\; [n - (n-2)]}{\sum (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2 \;/\; (n-2)} \;\sim\; F_{2, n-2}\]
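For the more common test \(H_0: \beta_1 = 0\), the reduced model is the intercept-only fit (so \({\mathrm{df}}({\mathrm{RM}}) = n-1\) and \(F \sim F_{1,\, n-2}\)); a minimal R sketch of that nested-model comparison using `anova()`:

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit_full    <- lm(y ~ x)      # FM: intercept and slope
fit_reduced <- lm(y ~ 1)      # RM: intercept only
anova(fit_reduced, fit_full)  # F on 1 and n - 2 df; matches summary(fit_full)
```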
Confidence interval for \(\beta_0\) (note: not \({\widehat{\beta}_{0}}\), why?): \[{\widehat{\beta}_{0}} \;\pm\; t^* {\mathrm{se}}({\widehat{\beta}_{0}})\]
Confidence interval for \(\beta_1\): \[{\widehat{\beta}_{1}} \;\pm\; t^* {\mathrm{se}}({\widehat{\beta}_{1}})\]
Confidence interval for the mean response at \(x = x_0\): \[\widehat{\mu}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]
\[\widehat{\mu}_0 \;\pm\; t^*\; \widehat{\sigma}\; \sqrt{\frac{1}{n} + \frac{(x_0 - {\overline{x}})^2}{\sum (x_i-{\overline{x}})^2}}\]
Prediction interval for an individual \(y\) value at \(x = x_0\): \[{\widehat{y}}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]
\[{\widehat{y}}_0 \;\pm\; t^*\; \widehat{\sigma}\; \sqrt{1 + \frac{1}{n} + \frac{(x_0 - {\overline{x}})^2}{\sum (x_i-{\overline{x}})^2}}\]
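A minimal R sketch of both intervals (simulated data; \(x_0 = 5\) is an arbitrary choice): `predict()` with `interval = "confidence"` gives the interval for the mean response, and `interval = "prediction"` gives the wider interval for an individual observation.

```r
set.seed(1)
x <- runif(100, 0, 10)
y <- 2 + 0.5 * x + rnorm(100)
fit <- lm(y ~ x)
new <- data.frame(x = 5)                    # x0 = 5, an arbitrary point
predict(fit, new, interval = "confidence")  # CI for the mean response mu0
predict(fit, new, interval = "prediction")  # wider PI for an individual y0
```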
## MLR Model: \[Y \;=\; \beta_0 + \beta_1x_1 + \beta_2x_2 + \cdots + \beta_px_p + \epsilon\]
SSE: \[{\mathrm{SSE}}\;=\; \sum_i e_i^2
\;=\; \sum_i (y_i - \widehat{\beta_0} - \widehat{\beta_1} x_{i1} - \widehat{\beta_2} x_{i2} - \cdots - {\widehat{\beta}_p}x_{ip})^2\]
Formula for \({\widehat{\beta}_{0}}\) (the others can be written with matrices; you don’t need to know those formulas): \[\widehat{\beta_0} \;=\; {\overline{y}}- \widehat{\beta_1} {\overline{x}}_1 - \widehat{\beta_2} {\overline{x}}_2 - \cdots - {\widehat{\beta}_p}{\overline{x}}_p
\;=\; {\overline{y}}- \sum_{j=1}^p \widehat\beta_j {\overline{x}}_j\]
\[\widehat\sigma^2 \;=\; \frac{{\mathrm{SSE}}}{{\mathrm{df}}} \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{{\mathrm{df}}}\]
degrees of freedom (df)
= sample size \(-\) number of parameters estimated for the mean (\(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\))
= \(n-(p+1) \;=\; (n - p - 1)\)
\[\widehat\sigma^2 \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{n - p - 1}\]
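A short R sketch with \(p = 3\) hypothetical predictors, checking the residual degrees of freedom and \(\widehat\sigma\):

```r
set.seed(1)
n <- 100
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)  # p = 3 predictors
y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
df.residual(fit)                                # n - p - 1 = 96
sqrt(sum(resid(fit)^2) / df.residual(fit))      # sigma-hat; equals sigma(fit)
```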
Recall: \(\beta_1\) is the effect of \(x_1\) on expected \(Y\),
but only after we have “adjusted for” the other predictors (\(x_2, \ldots, x_p\)) in the model.
\[ R^2 = [ {\mathrm{corr}}(y, {\widehat{y}}) ]^2 = \frac{{\mathrm{SSR}}}{{\mathrm{SST}}} = 1 - \frac{{\mathrm{SSE}}}{{\mathrm{SST}}}\]
\[{\rm Adjusted}\; R^2
\;=\; 1 - \frac{{\mathrm{SSE}}\;/\; {\mathrm{SSE}}(df)}{{\mathrm{SST}}\;/\; {\mathrm{SST}}(df)}
\;=\; 1 - \frac{{\mathrm{SSE}}\;/\; (n-p-1)}{{\mathrm{SST}}\;/\; (n-1)}\]
T-statistic for each coefficient: \[\frac{{\widehat{\beta}_j}- \beta_j}{{\mathrm{se}}({\widehat{\beta}_j})} \;\sim\; t_{n-p-1}\]
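An R sketch computing adjusted \(R^2\) by hand and reading the coefficient t-statistics off `summary()` (same hypothetical data-generating setup as above):

```r
set.seed(1)
n <- 100
x1 <- runif(n); x2 <- runif(n); x3 <- runif(n)
y  <- 1 + 2 * x1 - x2 + 0.5 * x3 + rnorm(n)
fit <- lm(y ~ x1 + x2 + x3)
SSE <- sum(resid(fit)^2)
SST <- sum((y - mean(y))^2)
1 - (SSE / (n - 3 - 1)) / (SST / (n - 1))  # adjusted R^2 by hand (p = 3)
summary(fit)$adj.r.squared                 # matches
summary(fit)$coefficients                  # each t = estimate / se, on n-p-1 df
```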
F-statistic for comparing a reduced model (RM) with the full model (FM): \[F \;=\; \frac{[ {\mathrm{SSE}}({\mathrm{RM}}) - {\mathrm{SSE}}({\mathrm{FM}}) ] \;/\; [ {\mathrm{df}}({\mathrm{RM}}) - {\mathrm{df}}({\mathrm{FM}}) ]}{{\mathrm{SSE}}({\mathrm{FM}}) \;/\; {\mathrm{df}}({\mathrm{FM}})}\]
Standardized residuals: \[r_i \;=\; \frac{e_i}{\widehat{\sigma}\sqrt{1 - h_{ii}}}\]
Hat (leverage) values for SLR: \[h_{ii} = \frac{1}{n} + \frac{(x_i -{\overline{x}})^2}{\sum_k(x_k - {\overline{x}})^2}\]
\[h_{ij} = \frac{1}{n} + \frac{(x_i -{\overline{x}})(x_j-{\overline{x}})}{\sum_k(x_k - {\overline{x}})^2}\]
The average of the hat values is exactly
\[\frac{(p+1)}{n}\]
so hat values much larger than this flag high-leverage points.
Cook’s distance: \[C_i = \frac{\sum_{j=1}^n ({\widehat{y}}_j - {\widehat{y}}_{j(i)})^2}{\widehat{\sigma}^2 (p+1)}\]
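These diagnostics are all built into R; a minimal sketch on simulated SLR data (so \(p = 1\) and the average hat value is \(2/n\)):

```r
set.seed(1)
n <- 100
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n)
fit <- lm(y ~ x)
r <- rstandard(fit)       # standardized residuals e_i / (sigma-hat * sqrt(1 - h_ii))
h <- hatvalues(fit)       # leverages h_ii
mean(h)                   # exactly (p + 1) / n = 0.02 here
cooks.distance(fit)[1:3]  # Cook's distance C_i for the first few observations
```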
Example model: \[S = \beta_0 + \beta_1 x + \beta_2 e_1 + \beta_3 e_2 + \beta_4 m + \epsilon\]
Example model with an interaction term: \[{\rm risk} = \beta_0 + \beta_1 {\rm smoke} + \beta_2 {\rm OC} + \beta_3 ({\rm OC}\times {\rm smoke}) + \epsilon\]
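A hedged R sketch of the interaction model (all data simulated and all coefficient values hypothetical; the coefficient on `smoke:OC` estimates \(\beta_3\)):

```r
set.seed(1)
n <- 200
smoke <- rbinom(n, 1, 0.4)    # hypothetical 0/1 indicators
OC    <- rbinom(n, 1, 0.3)
risk  <- 1 + 0.8 * smoke + 0.5 * OC + 1.2 * smoke * OC + rnorm(n)
fit <- lm(risk ~ smoke * OC)  # expands to smoke + OC + smoke:OC
coef(fit)                     # "smoke:OC" estimates the interaction beta3
```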