All lectures

1 Review

SLR model: \[Y \;=\; \beta_0 + \beta_1 X + \epsilon\]



Model assumption (linearity) \[\begin{align} E(Y\; |\; X=x) &\;=\; E(\beta_0 +\beta_1 x + \epsilon ) \\ \\ & \;=\; \beta_0 +\beta_1 x + E(\epsilon) \\ \\ & \;=\; \beta_0 +\beta_1 x + 0 \;=\; \beta_0 +\beta_1 x \end{align}\]



Model for an individual observation: \[Y_i \;=\; \beta_0 +\beta_1 x_i + \epsilon_i\]



Meaning of \(\beta_0\): \[E(Y\; |\; X=0) \;=\; \beta_0 +\beta_1 (0) \;=\; \beta_0\]



Meaning of \(\beta_1\):

\[\begin{align} E(Y\; |\; X=x+1) &\;=\; \beta_0 +\beta_1 (x+1) \\ \\ &\;=\; \beta_0 +\beta_1 x + \beta_1 \;=\; E(Y\; |\; X=x) + \beta_1 \end{align}\]



Fitted value for an observation in the data set: \[{\widehat{y}}_i \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_i\]



Prediction for the average of y values given \(x=x_0\): \[\widehat{\mu}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]



Prediction for an individual y value given \(x=x_0\):

\[{\widehat{y}}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]



Residuals \[\begin{align} {\rm residual} \;=\; e_i &\;=\; {\rm observed - fitted} \\ &\;=\; {\rm observed - expected} \\ &\;=\; {\rm observed - predicted} \\ \\ &\;=\; y_i - {\widehat{y}}_i\\ \\ &\;=\; y_i - ({\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_i) \end{align}\]





Sum of Squared Errors \[{\mathrm{SSE}}\;=\; \sum_i e_i^2 \;=\; \sum_i (y_i - {\widehat{y}}_i)^2 \;=\; \sum_i (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2\]



Least squares estimates of \(\beta_0\) and \(\beta_1\): \[{\widehat{\beta}_{0}} \;=\; {\overline{y}}- {\widehat{\beta}_{1}} {\overline{x}}\]



\[{\widehat{\beta}_{1}} \;=\; r \frac{s_y}{s_x}\]
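A minimal R sketch of these two formulas, using the built-in mtcars data as a stand-in (wt as \(x\), mpg as \(y\) are illustrative choices, not the course data):

```r
## Least squares estimates "by hand" vs. lm()
x <- mtcars$wt
y <- mtcars$mpg

b1 <- cor(x, y) * sd(y) / sd(x)   # beta1-hat = r * s_y / s_x
b0 <- mean(y) - b1 * mean(x)      # beta0-hat = ybar - beta1-hat * xbar

c(b0, b1)
coef(lm(y ~ x))                   # should match
```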



Estimators are unbiased:

\[E({\widehat{\beta}_{0}}) = {\beta_{0}} \qquad \text{ and } \qquad E({\widehat{\beta}_{1}}) = {\beta_{1}}\]

Writing \({\widehat{\beta}_{1}} \;=\; \sum k_i Y_i\) with \(k_i = (x_i - {\overline{x}})\big/\sum_j (x_j - {\overline{x}})^2\), so that \(\sum k_i = 0\) and \(\sum k_i x_i = 1\):

\[\begin{align} E({\widehat{\beta}_{1}}) &\;=\; E\left(\sum k_iY_i\right) \;=\; \sum E(k_i Y_i) \;=\; \sum k_i E(Y_i) \\ \\ &\;=\; \sum k_i (\beta_0 + \beta_1 x_i) \;=\; \beta_0 \sum k_i + \beta_1 \sum k_i x_i \\ \\ &\;=\; \beta_0 (0) + \beta_1 (1) \;=\; \beta_1 \end{align}\]





\[\begin{align} E({\widehat{\beta}_{0}}) &\;=\; E({\overline{Y}}- {\widehat{\beta}_{1}} {\overline{x}}) \;=\; E({\overline{Y}}) - {\overline{x}}E({\widehat{\beta}_{1}}) \\ \\ &\;=\; (\beta_0 + \beta_1 {\overline{x}}) - {\overline{x}}\beta_1 \;=\; \beta_0 \end{align}\]





Estimator of the population variance: \[\widehat\sigma^2 \;=\; \frac{{\mathrm{SSE}}}{{\mathrm{df}}} \;=\; \frac{\sum_i e_i^2}{n-2} \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{n-2}\]

(Note: \(\widehat\sigma\) is called “residual standard error” on some R output.)
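A short R sketch (mtcars stand-in) checking the formula against that output:

```r
## Residual standard error by hand vs. summary(fit)$sigma
fit <- lm(mpg ~ wt, data = mtcars)
e   <- residuals(fit)
n   <- nrow(mtcars)

sigma_hat <- sqrt(sum(e^2) / (n - 2))  # sqrt(SSE / df)
sigma_hat
summary(fit)$sigma                     # "Residual standard error" in the R output
```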





Standard errors for the estimators:

\[{\mathrm{se}}({\widehat{\beta}_{1}}) \;=\; \frac{\widehat{\sigma}}{\sqrt{\sum(x_i-{\overline{x}})^2}}\]



\[{\mathrm{se}}({\widehat{\beta}_{0}}) \;=\; \widehat{\sigma} \;\sqrt{\frac{1}{n} + \frac{{\overline{x}}^2}{\sum(x_i-{\overline{x}})^2}}\]
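A sketch of both standard-error formulas in R (mtcars stand-in), compared with the values R reports:

```r
## Standard errors of the estimates by the formulas above
fit <- lm(mpg ~ wt, data = mtcars)
x   <- mtcars$wt
n   <- length(x)
sigma_hat <- summary(fit)$sigma
Sxx <- sum((x - mean(x))^2)

se_b1 <- sigma_hat / sqrt(Sxx)
se_b0 <- sigma_hat * sqrt(1/n + mean(x)^2 / Sxx)

c(se_b0, se_b1)
summary(fit)$coefficients[, "Std. Error"]  # should match
```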



T-statistic:

\[\frac{{\widehat{\beta}_{0}} - \beta_0}{{\mathrm{se}}({\widehat{\beta}_{0}})} \;\sim\; t_{n-2}\]

(and similarly for \({\widehat{\beta}_{1}}\))
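A small R sketch (mtcars stand-in) of the t statistics and two-sided p-values for \(H_0: \beta_j = 0\), the default tests in R's coefficient table:

```r
## t statistics for H0: beta_j = 0, as reported by summary()
fit <- lm(mpg ~ wt, data = mtcars)
est <- coef(fit)
se  <- summary(fit)$coefficients[, "Std. Error"]
df  <- df.residual(fit)               # n - 2 for SLR

t_stat <- est / se                    # (beta_j-hat - 0) / se(beta_j-hat)
p_val  <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
cbind(t_stat, p_val)
summary(fit)$coefficients             # same t values and p-values
```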





Partitioning the sum of squares:

\[\sum (y_i - {\overline{y}})^2 \;=\; \sum (y_i - {\widehat{y}}_i + {\widehat{y}}_i - {\overline{y}})^2 \;=\; \sum (y_i - {\widehat{y}}_i)^2 \;+\; \sum ({\widehat{y}}_i - {\overline{y}})^2\]

(the cross term \(2\sum (y_i - {\widehat{y}}_i)({\widehat{y}}_i - {\overline{y}})\) is zero for a least squares fit)



SST = \(\sum (y_i - {\overline{y}})^2\) = total sum of squares



SSE = \(\sum (y_i - {\widehat{y}}_i)^2\) = error sum of squares

SSR = \(\sum ({\widehat{y}}_i - {\overline{y}})^2\) = regression sum of squares

So, SST = SSE + SSR





\(R^2\) = proportion of variability in the response (\(Y\))
explained by regression on the predictor (\(x\))



\[R^2 \;=\; \frac{{\mathrm{SSR}}}{{\mathrm{SST}}} \;=\; 1 - \frac{{\mathrm{SSE}}}{{\mathrm{SST}}} \;=\; 1 - \frac{\sum (y_i - {\widehat{y}}_i)^2}{\sum (y_i - {\overline{y}})^2}\]



\[{\rm adjusted\;} R^2 = 1 - \frac{{\mathrm{SSE}}/ {\mathrm{SSE}}({\mathrm{df}})}{{\mathrm{SST}}/ {\mathrm{SST}}({\mathrm{df}})} = 1 - \frac{\sum (y_i - {\widehat{y}}_i)^2 / (n-2)}{\sum (y_i - {\overline{y}})^2 / (n-1)}\]
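A sketch in R (mtcars stand-in) verifying the partition and both \(R^2\) quantities:

```r
## Sum-of-squares partition, R^2, and adjusted R^2
fit <- lm(mpg ~ wt, data = mtcars)
y   <- mtcars$mpg
n   <- length(y)

SST <- sum((y - mean(y))^2)
SSE <- sum(residuals(fit)^2)
SSR <- sum((fitted(fit) - mean(y))^2)
all.equal(SST, SSE + SSR)                    # partition check

R2     <- 1 - SSE / SST
adj_R2 <- 1 - (SSE / (n - 2)) / (SST / (n - 1))
c(R2, adj_R2)
c(summary(fit)$r.squared, summary(fit)$adj.r.squared)  # should match
```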





F-statistic

\[F = \frac{[ {\mathrm{SSE}}({\mathrm{RM}}) - {\mathrm{SSE}}({\mathrm{FM}}) ] \;/\; [ {\mathrm{df}}({\mathrm{RM}}) - {\mathrm{df}}({\mathrm{FM}}) ]}{{\mathrm{SSE}}({\mathrm{FM}}) \;/\; {\mathrm{df}}({\mathrm{FM}})}\]



\[F \;=\; \frac{\left[ \sum y_i^2 - \sum (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2 \right] \;/\; [n - (n-2)]}{\sum (y_i - {\widehat{\beta}_{0}} - {\widehat{\beta}_{1}} x_i)^2 \;/\; (n-2)} \;\sim\; F_{2, n-2}\]



Confidence interval for \(\beta_0\) (note: not \({\widehat{\beta}_{0}}\), why?): \[{\widehat{\beta}_{0}} \;\pm\; t^* {\mathrm{se}}({\widehat{\beta}_{0}})\]



\[{\widehat{\beta}_{1}} \;\pm\; t^* {\mathrm{se}}({\widehat{\beta}_{1}})\]
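A sketch of both coefficient intervals in R (mtcars stand-in), compared with confint():

```r
## Coefficient confidence intervals by hand vs. confint()
fit   <- lm(mpg ~ wt, data = mtcars)
est   <- coef(fit)
se    <- summary(fit)$coefficients[, "Std. Error"]
tstar <- qt(0.975, df.residual(fit))   # 95% intervals

cbind(est - tstar * se, est + tstar * se)
confint(fit)                           # should match
```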



Confidence interval for mean: \[\widehat{\mu}_0 \;\;=\;\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]



\[\widehat{\mu}_0 \;\pm\; t^*\; \widehat{\sigma}\; \sqrt{\frac{1}{n} + \frac{(x_0 - {\overline{x}})^2}{\sum (x_i-{\overline{x}})^2}}\]



Confidence interval for prediction: \[{\widehat{y}}_0 \;=\; {\widehat{\beta}_{0}} + {\widehat{\beta}_{1}} x_0\]



\[{\widehat{y}}_0 \;\pm\; t^*\; \widehat{\sigma}\; \sqrt{1 + \frac{1}{n} + \frac{(x_0 - {\overline{x}})^2}{\sum (x_i-{\overline{x}})^2}}\]
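In R, both intervals come from predict(); a sketch with mtcars as stand-in and an arbitrary \(x_0 = 3\):

```r
## Interval for the mean response vs. prediction interval at x0
fit <- lm(mpg ~ wt, data = mtcars)
new <- data.frame(wt = 3)

predict(fit, new, interval = "confidence")  # CI for E(Y | x = x0)
predict(fit, new, interval = "prediction")  # wider: interval for an individual Y at x0
```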



MLR model: \[Y \;=\; \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \epsilon\]



SSE: \[{\mathrm{SSE}}\;=\; \sum_i e_i^2 \;=\; \sum_i (y_i - \widehat{\beta_0} - \widehat{\beta_1} x_{i1} - \widehat{\beta_2} x_{i2} - \cdots - {\widehat{\beta}_p}x_{ip})^2\]



Formula for \({\widehat{\beta}_{0}}\) (the others can be written with matrices; you don’t need to know those formulas): \[\widehat{\beta_0} \;=\; {\overline{y}}- \widehat{\beta_1} {\overline{x}}_1 - \widehat{\beta_2} {\overline{x}}_2 - \cdots - {\widehat{\beta}_p}{\overline{x}}_p \;=\; {\overline{y}}- \sum_{j=1}^p \widehat\beta_j {\overline{x}}_j\]



\[\widehat\sigma^2 \;=\; \frac{{\mathrm{SSE}}}{{\mathrm{df}}} \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{{\mathrm{df}}}\]





degrees of freedom (df)

= sample size \(-\) number of parameters estimated for the mean (\(\beta_0, \beta_1, \beta_2, \ldots, \beta_p\))

= \(n-(p+1) \;=\; (n - p - 1)\)



\[\widehat\sigma^2 \;=\; \frac{\sum_i (y_i - {\widehat{y}}_i)^2}{n - p - 1}\]
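A short MLR sketch in R with \(p = 2\) predictors (mtcars stand-in; wt and hp are illustrative choices), checking the degrees of freedom and \(\widehat\sigma\):

```r
## MLR fit: df = n - p - 1 and the residual standard error formula
fit <- lm(mpg ~ wt + hp, data = mtcars)
n   <- nrow(mtcars); p <- 2

df.residual(fit)                          # n - p - 1 = 32 - 3 = 29
sqrt(sum(residuals(fit)^2) / (n - p - 1))
summary(fit)$sigma                        # should match
```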

1.0.1 Geometric interpretation of regression coefficients

1.0.1.1 One predictor (regression line)

  • The fitted values (\({\widehat{Y}}\)’s) will all lie on the regression line
  • \(\beta_0\) is the intercept of that line
  • \(\beta_1\) is the slope of that line
  • \(Y\) values may be above, below, or on the line within the (\(x,Y\)) plane of the data

1.0.1.2 Two predictors (regression plane)

  • The fitted values (\({\widehat{Y}}\)’s) will all lie on the regression plane
  • \(\beta_0\) is the expected \(Y\) value when both \(x\) variables equal 0
  • \(\beta_1\) is the slope of the plane along the \(x_1\) direction
  • \(\beta_2\) is the slope of the same plane along the \(x_2\) direction
  • \(Y\) values may be above, below, or on the plane within the full space of the data

Recall: \(\beta_1\) is the effect of \(x_1\) on expected \(Y\)
but only after we have “adjusted for” \(x_2\) in the model.



\[ R^2 = [ {\mathrm{corr}}(y, {\widehat{y}}) ]^2 = \frac{{\mathrm{SSR}}}{{\mathrm{SST}}} = 1 - \frac{{\mathrm{SSE}}}{{\mathrm{SST}}}\]



\[{\rm Adjusted}\; R^2 \;=\; 1 - \frac{{\mathrm{SSE}}\;/\; {\mathrm{SSE}}(df)}{{\mathrm{SST}}\;/\; {\mathrm{SST}}(df)} \;=\; 1 - \frac{{\mathrm{SSE}}\;/\; (n-p-1)}{{\mathrm{SST}}\;/\; (n-1)}\]



\[\frac{{\widehat{\beta}_j}- \beta_j}{{\mathrm{se}}({\widehat{\beta}_j})} \;\sim\; t_{n-p-1}\]



\[F \;=\; \frac{[ {\mathrm{SSE}}({\mathrm{RM}}) - {\mathrm{SSE}}({\mathrm{FM}}) ] \;/\; [ {\mathrm{df}}({\mathrm{RM}}) - {\mathrm{df}}({\mathrm{FM}}) ]}{{\mathrm{SSE}}({\mathrm{FM}}) \;/\; {\mathrm{df}}({\mathrm{FM}})}\]
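A sketch of the full-vs-reduced comparison in R (mtcars stand-in; the reduced model here drops hp, an illustrative choice):

```r
## Full-vs-reduced F statistic by hand vs. anova()
full    <- lm(mpg ~ wt + hp, data = mtcars)   # FM
reduced <- lm(mpg ~ wt, data = mtcars)        # RM: drops hp

SSE_fm <- sum(residuals(full)^2);    df_fm <- df.residual(full)
SSE_rm <- sum(residuals(reduced)^2); df_rm <- df.residual(reduced)

F_stat <- ((SSE_rm - SSE_fm) / (df_rm - df_fm)) / (SSE_fm / df_fm)
F_stat
anova(reduced, full)   # same F statistic and its p-value
```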

1.1 Diagnostics

1.1.1 The model assumptions

  1. Assumption about the form of the model
    1. Linearity: The mean of \(Y\) is a linear function of the \(x\)’s
  2. Assumptions about the errors (\(\epsilon\)’s)
    1. Normally distributed (and thus so are the \(Y\)’s)
    2. Mean zero: No systematic mis-prediction
    3. Constant variance \(\sigma^2\) over all values of the \(x\)’s
    4. Independent
    5. Uncorrelated with predictors (\(x\)’s)
  3. Assumptions about the predictors (\(x\)’s)
    1. Non-random, “fixed”
      • True for many designed experiments
      • For observational studies, inferences are conditional on \(x\)’s
    2. Measured without error
      • Probably not true, but we rarely have enough information to assess this
    3. Linearly independent
      • No predictor can be expressed as linear combination of others
      • Not collinear (predictor variables not interrelated)
      • Rarely true, but minor collinearity OK
    4. Uncorrelated with errors (\(\epsilon\)’s)
  4. Assumptions about the observations: \(y_i, x_{1i}, x_{2i}, \ldots, x_{pi}\)
    1. Independent (values of observation \(i\) not dependent on values of observation \(j\))
    2. Equally reliable and informative

\[r_i \;=\; \frac{e_i}{\widehat{\sigma}\sqrt{1 - h_{ii}}}\]



\[h_{ii} = \frac{1}{n} + \frac{(x_i -{\overline{x}})^2}{\sum(x_i - {\overline{x}})^2}\]



\[h_{ij} = \frac{1}{n} + \frac{(x_i -{\overline{x}})(x_j-{\overline{x}})}{\sum(x_i - {\overline{x}})^2}\]

The average of the hat values should be about

\[\frac{(p+1)}{n}\]



\[C_i = \frac{\sum_{j=1}^n ({\widehat{y}}_j - {\widehat{y}}_{j(i)})^2}{\widehat{\sigma}^2 (p+1)}\]
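R computes all three diagnostic quantities directly; a sketch with the mtcars stand-in fit:

```r
## Standardized residuals, leverages (hat values), and Cook's distance
fit <- lm(mpg ~ wt + hp, data = mtcars)
p   <- 2; n <- nrow(mtcars)

r <- rstandard(fit)       # e_i / (sigma-hat * sqrt(1 - h_ii))
h <- hatvalues(fit)       # leverages h_ii
d <- cooks.distance(fit)  # Cook's distance C_i

mean(h)                   # average hat value, should be (p + 1) / n
(p + 1) / n
head(cbind(r, h, d))
```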

1.2 Categorical variables

\[S = \beta_0 + \beta_1 x + \beta_2 e_1 + \beta_3 e_2 + \beta_4 m + \epsilon\]



\[{\rm risk} = \beta_0 + \beta_1 {\rm smoke} + \beta_2 {\rm OC} + \beta_3 ({\rm OC}\times {\rm smoke}) + \epsilon\]
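A small R sketch of how the dummy and interaction columns get built when a categorical predictor is stored as a factor; the data below are simulated placeholders (not the course data), only the model structure matters:

```r
## Dummy coding and an interaction term via factor predictors
set.seed(1)
dat <- data.frame(
  smoke = factor(sample(c("no", "yes"), 40, replace = TRUE)),
  OC    = factor(sample(c("no", "yes"), 40, replace = TRUE))
)
dat$risk <- rnorm(40)                       # placeholder response

fit <- lm(risk ~ smoke * OC, data = dat)    # main effects + interaction
coef(fit)                                   # smokeyes, OCyes, smokeyes:OCyes
head(model.matrix(fit))                     # the 0/1 columns actually used in the fit
```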