\documentclass[landscape]{seminar} \input epsf.sty \usepackage{epsfig,graphicx,epstopdf} \usepackage{fancyhdr} \usepackage{subfigure} \def\hb{\mbox{$\hat{\beta}$}} \def\P{\mbox{\textrm{P}}} \pagestyle{fancy} \lhead{}\chead{}\rhead{} \lfoot{Note: See Canvas for the examples completed with R} \cfoot{p.\ \thepage} \rfoot{} \renewcommand{\headrulewidth}{0pt} \slideframe{none} \slidesmag{4} % this will make the slides bigger or smaller. % For use with pdflatex -- as in TeXShop % Correct for \slidesmag{4} \AtBeginDocument{% \pdfpageheight=5in% \pdfpagewidth=6.5in% } % \slidesmag{2} % Correct for \slidesmag{2} % % A FIRST ATTEMPT -- NOT YET PERFECT! % \AtBeginDocument{% % \pdfpageheight=7.07in % \pdfpagewidth=9.19in % } % } \renewcommand{\familydefault}{\sfdefault} % Commands for slides \newcommand{\checkoff}{\hspace*{-.2in} {\color{red}$\surd$}\ } \newcommand{\brkslide}[1] { \end{slide} \begin{slide} \begin{center} {\bf #1} \end{center} } \newcommand{\brkslideitem}[1] { \end{itemize} \end{slide} \begin{slide} \begin{center} {\bf #1} \end{center} \begin{itemize} } \newcommand{\brkslidedescr}[1] { \end{description} \end{slide} \begin{slide} \begin{center} {\bf #1} \end{center} \begin{description} } %\input{preamb2ld} \input{preamb1} %\textheight 8.0in \textwidth 6.0in %\oddsidemargin 0.25in %\newcommand{\examp}{{\bf Example:} \ } \newcommand{\boxit}[1] { \mbox{ } \\*[-.5\baselineskip] \centerline{\fbox{ \parbox{4.25in}{#1} }} } \begin{document} \begin{slide} %\vspace{-1in} %\texttt{ } %\vspace{-1in} %\centerline{\includegraphics[height=4.1in]{graphs/regAll.pdf}} %\newpage \flushright{ PBHS 32400 / STAT 22400} % \medskip \begin{center} {\bf A Generalized Approach for Many Model Types} \end{center} \begin{itemize} \item Noting and taking advantage of commonalities among linear models for different response variable types, Nelder and Wedderburn and later McCullagh (UChicago) and Nelder developed {\bf Generalized Linear Models}. \item This approach generalizes many types of models into one framework, unifying theory and estimation methods. \item For each model relating $Y$ to predictors $X$, one specifies \begin{itemize} \item The {\sl link function} $h( \cdot)$, which indicates the relationship between the linear prediction equation and $\E(Y)$; \item The distribution for the error term $\epsilon$ of the model. \end{itemize} \item Then, a unified theory and single estimation approach subsumes a wide variety of models. \newpage \begin{center} {\bf A Generalized Approach for Many Model Types} \end{center} \medskip \item[-] A Few of the Several Types of GLMs:\\ \medskip %\begin{center} {\small \begin{tabular}{cccc} {\bf Response} & {\bf Link Function} & {\bf Error Term} & {\bf Model} \\ \hline Continuous ($\approx$ normal) & identity & normal & linear \\ \hline 0/1 discrete & logit & Binomial & logistic \\ %% \hline polychotomous discrete & logit & multinomial & multinomial logistic \\ \hline Integer counts & natural log & Poisson & Poisson regression \\ %% \hline real valued, non-negative & inverse & Gamma & survival\\ %% \hline \end{tabular}} \newpage \begin{center} {\bf Regression With Normal Response} \end{center} \item Linear regression \item Response: Continuous, $\approx$ Normal ($E(Y|X), \sigma^2$) \item Link = identity = $h(w) = w$ \item Error: Normal (does not depend on predictors) $$h[E(Y|X)] = E(Y|X) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$ \newpage \begin{center} {\bf Regression With Binary/Bernoulli Response} \end{center} \item Logistic regression \item Response: Binary (0, 1), Bernoulli($p$), \\ $\E(Y)=p$, $\var(Y) = p(1-p)$ \item Link = logit = $\displaystyle{ h(w) = \log\left(\frac{w}{1-w}\right) }$ \item Error: Bernoulli/binomial (and depends on predictors) $$h[E(Y|X)] = \log\left[\frac{E(Y|X)}{1 - E(Y|X)}\right] = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$ where $E(Y|X) = P(Y=1 | X) = p(X)$\\ ($p$ is a function of predictors) \newpage \begin{center} {\bf Regression With Poisson Response} \end{center} \item Poisson regression, log-linear regression \item Response: Poisson($\lambda$), $\E(Y)=\lambda$, $\var(Y)=\lambda$ \item Link = $\log_e\quad h(w) = \log_e(w)$ \item Error: Poisson (and depends on predictors) $$h[E(Y|X)] = \log_e(E(Y|X)) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$ where $E(Y|X) = \lambda(X)\quad$ ($\lambda$ is a function of the predictors) %\end{center} \newpage \begin{center} {\bf Poisson Regression} \end{center} \item {\bf Poisson regression} is used to model {\bf count variables} as outcome.\\ The outcome (i.e., the count variable) in a Poisson regression cannot take on negative values (can equal 0).\\ \medskip \item A Poisson regression model is sometimes known as a {\bf log-linear model} (link function is log) and the model takes the form: \[ \log \big(\E(Y|X) \big) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p. \] \textbf{CAUTION:} This is NOT the same as the OLS transformation \[ \E\big( \log(Y|X) \big) = \beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p \] In fact, $\log \big(\E(Y|X) \big) \geq \E\big( \log(Y|X) \big)$, \\ and the latter is OLS using log transformation on $Y$. \newpage \begin{center} {\bf Poisson Regression} \end{center} \item In fact, $\E\big( \log(Y|X) \big) \leq \log \big(\E(Y|X) \big)$ \item The predicted mean of Poisson model is $\E(Y|X) = \exp \big(\beta_0 + \beta_1 X_1 + \ldots + \beta_p X_p \big).$ \item For OLS $\exp\big(\E[\log(Y|X)]\big) \le E(Y|X)$ \\ (bias, we tend to underestestimate the mean) \item For Poisson model (a GLM), \\ we model $\log(E(Y|X))$ and hence $E(Y|X)$ directly. \newpage \item Recall that the mean and variance of Poisson distribution are the same. \item Therefore, a very strong assumption in Poisson regression is -- {\bf conditional on the predictors, the conditional mean and variance of outcome are equal. } \newpage \begin{center} {\bf Poisson Regression} \end{center} \item {\bf Examples of Poisson observations:} \medskip \begin{enumerate} \item The number of persons killed by mule or horse kicks in the Prussian army per year. Ladislaus Bortkiewicz collected data from 20 volumes of Preussischen Statistik. These data were collected on 10 corps of the Prussian army in the late 1800s over 20 years. \medskip \item The number of people in line in front of you at the grocery store. Predictors may include the number of items currently offered at a special discounted price and whether a special event (e.g., a holiday, a big sporting event) is three or fewer days away. Analyses involving queueing frequently involve the Poisson distribution. \medskip \item The number of awards earned by students at one high school. Predictors of the number of awards earned include the type of program in which the student was enrolled (e.g., vocational, general or academic) and the score on a mathematics exam. \end{enumerate} \newpage \begin{center} {\bf Poisson Regression} \end{center} We illustrate Poisson regression using Example 3 above:\\ \begin{itemize} \item \texttt{num}$\_$\texttt{awards} is the outcome variable and indicates the number of awards earned by students at a high school in a given year, \\ \item \texttt{math} is a continuous predictor variable and represents students' scores on their math final exam, and \\ \item \texttt{prog} is a categorical predictor variable with three levels indicating the type of program in which the students were enrolled. \\ \end{itemize} \bigskip For Poisson regression, we assume that the outcome variable number of awards, conditioned on the predictor variables, will have roughly equal mean and variance.\\ \newpage \begin{center} {\bf Poisson Regression - Assumptions} \end{center} Examining the mean numbers of awards by program type and seems to suggest that program type is a good candidate for predicting the number of awards. Additionally, the means and variances within each level of program -- the conditional means and variances -- are similar.\\ {\scriptsize \begin{verbatim} . use http://statistics.uchicago.edu/~collins/data/STAT224other/poisson_sim, clear . tabstat num_awards, by(prog) stats(mean sd n) Summary for variables: num_awards by categories of: prog (type of program) prog | mean sd N ---------+------------------------------ general | .2 .4045199 45 academic | 1 1.278521 105 vocation | .24 .5174506 50 ---------+------------------------------ Total | .63 1.052921 200 ---------------------------------------- . histogram num_awards, discrete freq . qnorm num_awards \end{verbatim}} \begin{figure} \subfigure[Histogram of num awards]{\includegraphics[width=2in]{graphs/poisson2.eps}} \subfigure[QQ plot]{\includegraphics[width=2in]{graphs/poisson.eps}} \end{figure} Can we use OLS here? Normality assumption is violated. Count outcome variables are sometimes log-transformed and analyzed using OLS regression. Some issues arise with this approach, for example, more than half of the data (124 students) have {\bf zero} awards (log is undefined) \newpage \begin{center} {\bf Poisson Regression - Model and Coefficients} \end{center} \item[-] In Stata, with categories for program (general is baseline or reference group) {\scriptsize \begin{verbatim} . poisson num_awards i.prog math Iteration 0: log likelihood = -182.75759 Iteration 1: log likelihood = -182.75225 Iteration 2: log likelihood = -182.75225 Poisson regression Number of obs = 200 LR chi2(3) = 98.22 Prob > chi2 = 0.0000 Log likelihood = -182.75225 Pseudo R2 = 0.2118 ------------------------------------------------------------------------------ num_awards | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- prog | academic | 1.083859 .358253 3.03 0.002 .3816962 1.786022 vocation | .3698092 .4410703 0.84 0.402 -.4946727 1.234291 | math | .0701524 .0105992 6.62 0.000 .0493783 .0909265 _cons | -5.247124 .6584531 -7.97 0.000 -6.537669 -3.95658 ------------------------------------------------------------------------------ \end{verbatim}} \begin{center} {\bf Poisson Regression - Model and Coefficients} \end{center} \item Results ($\beta$s) are increase/decrease in log(E(counts)) on an additive scale. \item To interpret the coefficients, one needs to take $\exp(\beta)$'s and interpret them as the expected relative change in counts per unit of X change.\\ \item To get relative increase in counts per unit of $X$ on a multiplicative scale, use ``irr" (stands for incidence-rate ratio, similar to risk ratio or relative risk): {\scriptsize \begin{verbatim} . poisson num_awards i.prog math, irr Iteration 0: log likelihood = -182.75759 Iteration 1: log likelihood = -182.75225 Iteration 2: log likelihood = -182.75225 Poisson regression Number of obs = 200 LR chi2(3) = 98.22 Prob > chi2 = 0.0000 Log likelihood = -182.75225 Pseudo R2 = 0.2118 ------------------------------------------------------------------------------ num_awards | IRR Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- prog | academic | 2.956065 1.059019 3.03 0.002 1.464767 5.965674 vocation | 1.447458 .6384309 0.84 0.402 .6097705 3.435942 | math | 1.072672 .0113695 6.62 0.000 1.050618 1.095188 _cons | .0052626 .0034652 -7.97 0.000 .0014479 .0191284 ------------------------------------------------------------------------------ Note: _cons estimates baseline incidence rate. \end{verbatim}} \newpage \begin{center} {\bf Poisson Regression - Model Fit} \end{center} To help assess the fit of the model, the ``\texttt{estat gof}" command can be used to obtain the {\bf goodness-of-fit} $\chi^2$ test. This is {\bf not} a test of the model coefficients, but rather \underline{a test of the model form}: Does the Poisson model form fit our data? Thus, a large goodness-of-fit p-value indicates the observed and the predicted data are not too different from each other, i.e., a good fit. {\scriptsize \begin{verbatim} . estat gof Deviance goodness-of-fit = 189.4496 Prob > chi2(196) = 0.6182 Pearson goodness-of-fit = 212.1437 Prob > chi2(196) = 0.2040 \end{verbatim}} A statistically significant (small p-value) here would indicate that the model does not fit the data well. In that situation, we may try to determine if there are omitted predictor variables, if our linearity assumption holds and/or if the conditional mean and variance of outcome are very different. \newpage %We plot the fitted model by program.\\ %{\scriptsize %\begin{verbatim} %. gen n=num_awards %. separate n, by(prog) %. predict c %. separate c, by(prog) %. twoway scatter n1 n2 n3 c1 c2 c3 math, connect(. . . l l l) sort % ytitle("Predicted Count") ylabel( ,nogrid) legend(rows(3)) % legend(ring(0) position(10)) scheme(s1mono) %\end{verbatim}} %\centerline{\includegraphics[width=2.8in]{graphs/poisson3.eps}} \newpage \begin{center} {\bf Fitting GLMs} \end{center} An alternative way to fit Poission regression is using the ``glm" function (Stata or R), specifying which ``family" to use. The default is linear regression and ``binomial" is logistic regression (for binary outcome -- Bernoulli is a special case of binomial). {\scriptsize \begin{verbatim} . glm num_awards math i.prog, family(poisson) Iteration 0: log likelihood = -187.46951 Iteration 1: log likelihood = -182.75816 Iteration 2: log likelihood = -182.75225 Iteration 3: log likelihood = -182.75225 Generalized linear models No. of obs = 200 Optimization : ML Residual df = 196 Scale parameter = 1 Deviance = 189.4496199 (1/df) Deviance = .9665797 Pearson = 212.1437315 (1/df) Pearson = 1.082366 Variance function: V(u) = u [Poisson] Link function : g(u) = ln(u) [Log] AIC = 1.867523 Log likelihood = -182.7522516 BIC = -849.0206 ------------------------------------------------------------------------------ | OIM num_awards | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- math | .0701524 .0105992 6.62 0.000 .0493783 .0909265 | prog | academic | 1.083859 .358253 3.03 0.002 .3816961 1.786022 vocation | .3698092 .4410703 0.84 0.402 -.4946727 1.234291 | _cons | -5.247124 .6584531 -7.97 0.000 -6.537669 -3.95658 ------------------------------------------------------------------------------ \end{verbatim}} \item[-] Estimates are same as earlier. Again, the $\beta$'s are in log(counts) on an additive scale. \item[-] In a GLM framework, separate computer modules for logistic, Poisson, etc. would not be needed. \newpage \begin{center} {\bf Summary -- Poisson Regression and GLMs} \end{center} \item Poisson is a useful model for many phenomena, but has a strong theoretical assumption, that conditional mean and variance of the outcome variable are equal. \item When there seems to be an issue of bad fit, we should first check if our model is appropriately specified, such as omitted variables and functional forms. \item The assumption that the conditional variance is equal to the conditional mean should be checked. If not reasonably equal, there are alternative variations on Poisson regression that may work. \newpage \begin{center} {\bf Linear mixed-effects models} \end{center} \item In certain data, a group of individuals are measured repeatedly over a time period. For example, we measure the blood pressures of ICU patients in the day of admission, then day 1, day 2, until dismissal or death. \item Samples can be naturally clustered. E.g., we took the income and education data of 357 individuals from 31 extended families. \item Other examples of clustered data: students in multiple schools/classes/sections, patients from multiple clinics/hospitals, samples processed in multiple experiments, workers at several work sites, people from different cities, etc. \item When analyzing $N$ samples from $k$ clusters, observations are not completely independent. Samples from the same cluster can be more similar to each other, and are correlated. \newpage \begin{center} {\bf Linear mixed-effects models}\end{center} \item Example: Consider a longitudinal dataset used in Ruppert, Wand, and Carroll (2003) and Diggle et al. (2002). We took weight measurements of 48 pigs in 9 successive weeks. Each pig has a unique id. We are interested in the rate of growth of pig weight over the 9 week period. Each pig experiences a linear trend in growth, but each pig has a different starting weight (i.e., intercept). \\ \newpage {\scriptsize \begin{verbatim} . use http://www.stata-press.com/data/r15/pig, clear . twoway scatter weight week, connect(L) \end{verbatim} } \centerline{\includegraphics[width=3in]{./graphs/pigs.png}} \newpage \item You may fit a simple linear regression with weight being the response and week being the predictor. However, the weight measurements for the same pig are correlated with each other, and the error terms are correlated -- violating regression assumptions. %Note the $s.e.(\hat\beta_1)_{SLR}=0.0818409$.\\ {\scriptsize \begin{verbatim} . regress weight week Source | SS df MS Number of obs = 432 -------------+---------------------------------- F(1, 430) = 5757.41 Model | 111060.882 1 111060.882 Prob > F = 0.0000 Residual | 8294.72677 430 19.2900622 R-squared = 0.9305 -------------+---------------------------------- Adj R-squared = 0.9303 Total | 119355.609 431 276.927167 Root MSE = 4.392 ------------------------------------------------------------------------------ weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- week | 6.209896 .0818409 75.88 0.000 6.049038 6.370754 _cons | 19.35561 .4605447 42.03 0.000 18.45041 20.26081 ------------------------------------------------------------------------------ \end{verbatim} } \newpage \item You may also fit a linear regression with each pig having a different intercept. This model implicitly says, other than each pig may start with different baseline weight, the weight growth rate for different pigs is the same. Note that the $\beta_{1}$ estimate is similar to OLS estimate, however, se$(\widehat{\beta}_1)=0.0390633$, much smaller than before. Note that the $\beta_1$ estimate is similar to overall OLS. {\scriptsize \begin{verbatim} . regress weight week i.id Source | SS df MS Number of obs = 432 -------------+---------------------------------- F(48, 383) = 557.83 Model | 117672.435 48 2451.50907 Prob > F = 0.0000 Residual | 1683.17352 383 4.39470894 R-squared = 0.9859 -------------+---------------------------------- Adj R-squared = 0.9841 Total | 119355.609 431 276.927167 Root MSE = 2.0964 ------------------------------------------------------------------------------ weight | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- week | 6.209896 .0390633 158.97 0.000 6.13309 6.286701 | id | 2 | 2.666667 .9882317 2.70 0.007 .7236279 4.609705 3 | -.2777778 .9882317 -0.28 0.779 -2.220816 1.665261 .............. 48 | 8.055556 .9882317 8.15 0.000 6.112517 9.998594 | _cons | 17.61719 .7255684 24.28 0.000 16.19059 19.04378 ------------------------------------------------------------------------------ \end{verbatim} } \item However, we are not really interested in estimating the baseline weights of these particular 48 pigs per se. \\ The intercepts consumed 47 degrees of freedom in the model estimation (47 indicator variables). \item Another idea is to consider those 48 pigs as a random sample from a larger population, and their baseline weights coming from the same distribution with a mean of $\beta_0$, and each pig may have a different baseline weight differing from the population average by $u_j$, and $u_j \sim N(0, \sigma_u^2)$. \item The weight measurements for the same pig are correlated with each other because they are from the same pig, sharing a same baseline. (This is a simplifying assumption that can be relaxed later.) \item In summary, we treat the pigs as a random sample from a larger population and model the between-pig variability as a random effect, and more specifically, as a random-intercept term for each pig. For each pig $j$, the beginning weight may differ from the population average $\beta_0$ by $u_j$, and $u_j \sim N(0, \sigma_u^2)$. \item In comparison with estimating the intercept for each of the 48 pigs, in the mixed-effects model here we only need to estimate one parameter $\sigma_u$, and the estimated $u_j$ can be derived given other parameter estimates. \newpage \begin{center} {\bf Linear mixed-effects models}\end{center} \item We thus wish to fit the model \[ \textrm{weight}_{ij} = \beta_0+\beta_1\textrm{week}_{ij}+u_j+\epsilon_{ij}, \] for $i=1,\ldots,9$ weeks and $j=1,\ldots,48$ pigs. \item Here $\beta_0+\beta_1\textrm{week}_{ij}$ is called the ``fixed-effects" term. The coefficients $\beta_0$ and $\beta_1$ are fixed for all individuals and represent the population average effects, but not individual-specific effects. \item In contrast, the $u_j$ here is a random effect (a random intercept here) that shifting the regression line up or down according to each pig. That is, the random effect occurs as an individual-specific effect, with $u_j\sim N(0, \sigma_u^2)$. \newpage \begin{center} {\bf Linear mixed-effects models} \end{center} \[ \textrm{weight}_{ij} = \beta_0+\beta_1\textrm{week}_{ij}+u_j+\epsilon_{ij}, \] \item The model above contains both fixed-effects (population effects common to all clusters/individuals) and random-effects (individual-specific or cluster-specific effects), and as such is called a linear mixed-effects model. \item Based on the linear mixed-effects model, \[ \textrm{weight}_{ij} \sim N( \beta_0+\beta_1\textrm{week}_{ij}, \sigma^2_u+\sigma_e^2). \] \item The variances add because the model considers the variability in intercepts is not related to the variability of observations around the population trend. \newpage \begin{center} {\bf Linear mixed-effects models} \end{center} \item In Stata, you may use the function ``mixed". In R, you may load the R library ``lme4". {\scriptsize \begin{verbatim} . mixed weight week || id: Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -1014.9268 Iteration 1: log likelihood = -1014.9268 Computing standard errors: Mixed-effects ML regression Number of obs = 432 Group variable: id Number of groups = 48 Obs per group: min = 9 avg = 9.0 max = 9 Wald chi2(1) = 25337.49 Log likelihood = -1014.9268 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ weight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- week | 6.209896 .0390124 159.18 0.000 6.133433 6.286359 _cons | 19.35561 .5974059 32.40 0.000 18.18472 20.52651 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Identity | var(_cons) | 14.81751 3.124226 9.801716 22.40002 -----------------------------+------------------------------------------------ var(Residual) | 4.383264 .3163348 3.805112 5.04926 ------------------------------------------------------------------------------ LR test vs. linear model: chibar2(01) = 472.65 Prob >= chibar2 = 0.0000 \end{verbatim} } \item In the mixed-effects model here, $s.e.(\hat\beta_1)_{mixed}=0.0390124$. \\ Very similar to the SE when fitting model with separate intercept for each pig. So, correlations among samples are accounted for. \newpage \item Now, look at the scatter plot again. \centerline{\includegraphics[width=3in]{./graphs/pigs.png}} \item It also seems like that there is a general average growth rate for the entire pig population ($\beta_1$) but individual pigs may have slightly faster or slower rates of growth. \item Instead of fitting 48 interaction effects and estimating 48 slopes ($\beta_{1j}$) for individual pigs. We may think that $\beta_{1j} = \beta_{1}+ u_{1j}$ with $u_{1j}\sim N(0, \sigma_{u_1}^2)$. We can fit the following model with both random intercepts and random-effects of week on weight: \[ \textrm{weight}_{ij} = \beta_0+u_{0j}+\beta_1\textrm{week}_{ij}+u_{1j}\textrm{week}_{ij}+\epsilon_{ij}, \] {\scriptsize \begin{verbatim} . mixed weight week || id: week Performing EM optimization: Performing gradient-based optimization: Iteration 0: log likelihood = -869.03825 Iteration 1: log likelihood = -869.03825 Computing standard errors: Mixed-effects ML regression Number of obs = 432 Group variable: id Number of groups = 48 Obs per group: min = 9 avg = 9.0 max = 9 Wald chi2(1) = 4689.51 Log likelihood = -869.03825 Prob > chi2 = 0.0000 ------------------------------------------------------------------------------ weight | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- week | 6.209896 .0906819 68.48 0.000 6.032163 6.387629 _cons | 19.35561 .3979159 48.64 0.000 18.57571 20.13551 ------------------------------------------------------------------------------ ------------------------------------------------------------------------------ Random-effects Parameters | Estimate Std. Err. [95% Conf. Interval] -----------------------------+------------------------------------------------ id: Independent | var(week) | .3680668 .0801181 .2402389 .5639103 var(_cons) | 6.756364 1.543503 4.317721 10.57235 -----------------------------+------------------------------------------------ var(Residual) | 1.598811 .1233988 1.374358 1.85992 ------------------------------------------------------------------------------ LR test vs. linear model: chi2(2) = 764.42 Prob > chi2 = 0.0000 Note: LR test is conservative and provided only for reference. \end{verbatim} } \item Both random effects (random intercepts and random slopes) are statistically significant in the above model. \newpage \begin{center} {\bf Summary -- Linear mixed-effects models}\end{center} \item When data are clustered and samples within each clusters are correlated, OLS would not provide the right s.e. estimates nor valid inferences. One may consider using linear mixed-effects models. \item The linear mixed-effects model estimates the population average ``fixed-effects" and the individual-specific ``random-effects." \item The random effects are the individual-specific effects that are uncorrelated with the predictors. The fixed effect are the population average effects of the predictor variables. \item In estimating the parameters, no closed form solutions exist and the Expectation-Maximization (EM) algorithm is commonly used. \newpage \begin{center} {\bf Summary -- Beyond Linear Regression} \end{center} \item There are additional models for other data types - studied in courses such as: Statistics Department: \begin{itemize} \item STAT 226: Categorical Data Analysis - all types of discrete outcome variables \item STAT 347: Generalized Linear Models - all GLMs \end{itemize} Public Health Sciences Department: \begin{itemize} \item STAT 227: Biostatistical Methods - logistic, Poisson, hazard (time to event data) \item STAT 356: Applied Survival Analysis - time to event data \item STAT 369: Applied Longitudinal Data Analysis - mixed effects model for data with repeated measures or clustered data \end{itemize} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \end{itemize} \end{slide} \end{document} %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% %--------------------------------------------------------------------------------------------------------------------------------------------------------------- %---------------------------------------------------------------------------------------------------------------------------------------------------------------- old junk For example, negative binomial nbreg num_awards math i.prog \newpage \begin{center}{\bf Generalized linear models} \end{center} More generally, we can say that the mean of $Y$, $\E(Y)$, depends linearly on $X$ through some function $f$:\\ \[ f(E(Y )) = \beta_0 + \beta_1X_1 + \ldots + \beta_pX_p \] The function $f$ is called the {\bf link function}.\\ Just specifying the mean is not enough, and we must also specify the probability distribution that $Y$ follows.\\ \bigskip There are several commonly used link functions:\\ \begin{itemize} \item $f(t) = t$. This is the {\bf identity} link. When $Y$ has a normal distribution, this leads to OLS regression.\\ \newpage \item $f(t) = \log(t)$. This is the {\bf log} link. When $Y$ following a Poisson distribution, this leads to Poisson regression. Recall that it is often used for counts (non-negative integer response). \medskip \item $f(t) = \log(t/(1-t))$ is called the {\bf logit} link. When $Y$ has a Bernoulli distribution (binary response), this leads to logistic regression.\\ \end{itemize} \bigskip Stata has good facilities for fitting generalized linear models. All basic ideas that we have developed for OLS regression can be applied when fitting generalized linear models. Check the command ``glm". newpage \begin{center} {\bf Example: The Challenger Disaster}\end{center} \bigskip The space shuttle Challenger was destroyed on January 28, 1986 during launch as a result of a catastrophic failure of O-rings intended to keep burning rocket file from escaping from joints in the rocket booster.\\ \medskip During the investigation, physicist Richard Feynman discovered the answer: the air temperature at the time of launch was near freezing, and the O-ring material did not expand well at low temperatures. Feynman illustrated the effect cold weather would have by dipping O-ring material into a beaker of ice water.\\ \centerline{\includegraphics[scale=0.5]{o_ring_va.jpg}} \centerline{\includegraphics[scale=0.5]{nasa472.jpg}} \newpage Could this have been predicted?\\ There are six O-rings on the shuttle that could fail, and after 23 previous shuttle flights, the O-rings had been inspected for damage. The results of these inspections are given. If the probability of an O-ring being damaged were constant, we could model this as a binomial random variable: \[Y_i= Bin(6, \pi)\] We could estimate this probability of ring failure by $\hat{\pi}= Y_i/n$ for each launch $i$. But we are interested in whether this probability is a function of temperature, so a simple model would be: \[ \hat{\pi} = \hb_0 + \hb_1 \textrm{temp} \] But this linear regression model has obvious shortcomings: $\hat{\pi}$ can go beyond 0 and 1 and its variance depends on true $\pi$. \newpage What's wrong with least squares method? \bigskip {\scriptsize \begin{verbatim} . gen total = 6 . gen failpct = damaged/total . regress failpct temp Source | SS df MS Number of obs = 23 -------------+------------------------------ F( 1, 21) = 7.43 Model | .068783265 1 .068783265 Prob > F = 0.0127 Residual | .194501775 21 .009261989 R-squared = 0.2613 -------------+------------------------------ Adj R-squared = 0.2261 Total | .26328504 22 .011967502 Root MSE = .09624 ------------------------------------------------------------------------------ failpct | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.0079233 .0029075 -2.73 0.013 -.0139697 -.0018769 _cons | .6164021 .2032521 3.03 0.006 .1937163 1.039088 ------------------------------------------------------------------------------ . scatter failpct yhat temp, s(o i) c(. l) \end{verbatim}} \centerline{\includegraphics[height=2in]{graphs/orings1.eps}} \bigskip Note that for temperature $> 78$ F, the fitted values are negative! \newpage Now, we introduce two ways to analyze the data using logistic regression:\\ \bigskip \item One way is using ordinary logistic regression, but recode the response variable ``damage" to a 0-1 variable, indicating if there is any damage happened to the launch $i$.\\ \bigskip In fact, you don't even need to recode a new variable. If your response have values other than zero, and you specify that you want to use logistic regression, Stata will automatically treat those other values as 1.\\ {\scriptsize \begin{verbatim} . logit damaged temp Iteration 0: log likelihood = -14.133576 Iteration 1: log likelihood = -10.302864 Iteration 2: log likelihood = -10.157825 Iteration 3: log likelihood = -10.157596 Iteration 4: log likelihood = -10.157596 Logistic regression Number of obs = 23 LR chi2(1) = 7.95 Prob > chi2 = 0.0048 Log likelihood = -10.157596 Pseudo R2 = 0.2813 ------------------------------------------------------------------------------ damaged | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.2321627 .1082365 -2.14 0.032 -.4443024 -.0200231 _cons | 15.0429 7.378636 2.04 0.041 .5810401 29.50476 ------------------------------------------------------------------------------ . gen newdam=1 . replace newdam=damaged if damaged ==0 (16 real changes made) . logit newdam temp Iteration 0: log likelihood = -14.133576 Iteration 1: log likelihood = -10.302864 Iteration 2: log likelihood = -10.157825 Iteration 3: log likelihood = -10.157596 Iteration 4: log likelihood = -10.157596 Logistic regression Number of obs = 23 LR chi2(1) = 7.95 Prob > chi2 = 0.0048 Log likelihood = -10.157596 Pseudo R2 = 0.2813 ------------------------------------------------------------------------------ newdam | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.2321627 .1082365 -2.14 0.032 -.4443024 -.0200231 _cons | 15.0429 7.378636 2.04 0.041 .5810401 29.50476 ------------------------------------------------------------------------------ . predict yhat . scatter failpct yhat temp, s(o i) c(. l) \end{verbatim}} \centerline{\includegraphics[height=1.5in]{graphs/orings2.eps}} \item You get the same result using \texttt{damage} or the 0-1 variable \texttt{newdam} as response variable. \item The fitted values are all between 0 and 1. \item The fitted value is not linear in the predictor! They are linear in the logit scale. \item Note that the predicted probability is the probability that getting at least one damage. \newpage \item Another way to analyze this data is using {\bf logistic regression for grouped data}:\\ The basic idea is that for launch $i$ that have \texttt{damaged=2}. In fact it is the data $Y_{ik}=\{1,1,0,0,0,0\}$, where $i$ is the index for launch and $k$ is the index for the 6 O-rings. I.e., there are two o-rings out of six damaged in this launch.\\ \bigskip Logistic regression for grouped data expands the data into a longer dataset that specifies the damage for each O-ring in each launch. \\ \bigskip Therefore, the number of observations in logistic regression for grouped data is (\# of launch)$\times$(\# of O-rings) = $23 \times 6 = 138$.\\ \bigskip The Stata command is ``blogit". \bigskip {\scriptsize \begin{verbatim} . blogit damaged total temp Logistic regression for grouped data Number of obs = 138 LR chi2(1) = 6.14 Prob > chi2 = 0.0132 Log likelihood = -30.19817 Pseudo R2 = 0.0923 ------------------------------------------------------------------------------ _outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.1156012 .0470238 -2.46 0.014 -.2077662 -.0234361 _cons | 5.084977 3.052486 1.67 0.096 -.8977845 11.06774 ------------------------------------------------------------------------------ . logit, or Logistic regression for grouped data Number of obs = 138 LR chi2(1) = 6.14 Prob > chi2 = 0.0132 Log likelihood = -30.19817 Pseudo R2 = 0.0923 ------------------------------------------------------------------------------ | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | .8908304 .0418903 -2.46 0.014 .8123969 .9768364 ------------------------------------------------------------------------------ . predict phat, p . sort temp . scatter failpct phat temp, s(o i) c(. l) \end{verbatim}} \centerline{\includegraphics[height=2in]{graphs/orings2.eps}} The fitted values are all between 0 and 1.\\ \newpage \item We can see how different the two analyses are.\\ \bigskip If the probability of damage to one O-ring is $\pi$ and if being damaged is independent across O-rings, then the probability that at least one of six O-rings will incur damage is: \[ \P(Y>0) = 1 -P(Y = 0)^6 = 1 -(1-\pi)^6 \] We calculate this probability and compare it with the estimated probability using ordinary logistic regression.\\ \bigskip {\scriptsize \begin{verbatim} . gen patleastone=1-(1-phat)^6 . scatter failpct phat yhat patleastone temp, s(o i i i) c(. l l l) legend(order(1 "failpct" 2 "P(damage) grouped") 3 "P(damage>1) logit" 4 "P(damage>1) grouped" ) \end{verbatim}} \centerline{\includegraphics[height=2in]{graphs/orings4.eps}} The difference in yellow and green fitted curve is due to the fact that O-rings are not completely independent. In that sense, if your goal is to estimate the probability of at least one damage, then better directly using ordinary logistic regression. Sure! \newpage \item Using your knowledge in the last lecture, you can also fit the logistic model using ``glm" function:\\ {\scriptsize \begin{verbatim} . glm newdam temp, family(binomial) Iteration 0: log likelihood = -10.290949 Iteration 1: log likelihood = -10.157988 Iteration 2: log likelihood = -10.157596 Iteration 3: log likelihood = -10.157596 Generalized linear models No. of obs = 23 Optimization : ML Residual df = 21 Scale parameter = 1 Deviance = 20.31519269 (1/df) Deviance = .9673901 Pearson = 23.16908325 (1/df) Pearson = 1.10329 Variance function: V(u) = u*(1-u) [Bernoulli] Link function : g(u) = ln(u/(1-u)) [Logit] AIC = 1.057182 Log likelihood = -10.15759634 BIC = -45.53019 ------------------------------------------------------------------------------ | OIM newdam | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.2321627 .1082365 -2.14 0.032 -.4443024 -.0200231 _cons | 15.0429 7.378636 2.04 0.041 .58104 29.50476 ------------------------------------------------------------------------------ . logit newdam temp Iteration 0: log likelihood = -14.133576 Iteration 1: log likelihood = -10.302864 Iteration 2: log likelihood = -10.157825 Iteration 3: log likelihood = -10.157596 Iteration 4: log likelihood = -10.157596 Logistic regression Number of obs = 23 LR chi2(1) = 7.95 Prob > chi2 = 0.0048 Log likelihood = -10.157596 Pseudo R2 = 0.2813 ------------------------------------------------------------------------------ newdam | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- temp | -.2321627 .1082365 -2.14 0.032 -.4443024 -.0200231 _cons | 15.0429 7.378636 2.04 0.041 .5810401 29.50476 ------------------------------------------------------------------------------ \end{verbatim}} Same results! Different fitting/estimation algorithm. \newpage \begin{center} {\bf Interpretation of Results from Logistic Regression} \end{center} \bigskip To discuss interpretation of coefficients in logistic regression, let us look at a simple logistic regression with one covariate only:\\ \[ \logit(\pi)=\log \left(\frac{\pi}{1-\pi}\right)= \log \left(\frac{\P(Y=1)}{1-\P(Y=1)} \right)= \beta_0 + \beta_1X_1. \] \bigskip The quantity \[ \frac{\pi}{1-\pi}= \frac{\P(Y=1)}{1-\P(Y=1)} \] is also called the odds, because it represents the odds of something happening versus not happening. Therefore, $\logit(\pi)$ is also often referred to as the log-odds.\\ \newpage Suppose that we have estimated the coefficients in the above model and that the fitted model is: \[ \widehat{\logit(\pi)}=\logit(\hat{\pi})= \hb_0+\hb_1X_1 \] \bigskip What is the interpretation of $\hb_0$ and $\hb_1$? Note that \[ \widehat{\textrm{odds}}=\frac{\hat{\pi}}{1-\hat{\pi}}= \exp(\hb_0+\hb_1X_1) \] That means that $e^{\hb_0}$ is the estimated odds of success for people with $X_{1i} = 0$.\\ On the other hand, $e^{\hb_1}$ measures the estimated number of times the odds increase when $X_1$ increases by 1 unit. In other words, if $X_1$ increases by one unit, we expect the odds to increase $e^{\hb_1}$ times, or $(1-e^{\hb_1})\times 100$ percent.\\ \newpage Observe that the odds ratio for a person with $X = x + 1$ versus a person with $X = x$ is simply then $e^{\hb_1}$, thus a common name for $\hb_1$ is the {\bf log odds ratio}.\\ \bigskip Note that if $\hb_1$ is small, then $e^{\hb_1}\approx 1+ \hb_1$ and thus we can say that the odds approximately increase (if $e^{\hb_1}$ is positive; decrease if negative) by $\hb_1\times 100$ percent.\\ \bigskip Note that if ${\hb_1}$ is negative, then we say that $X$ has a negative effect, because the increase in $X$ in fact lowers the expected odds and consequently the probability of $Y =1$ ($\hb_1 < 0 \Rightarrow e^{\hb_1}< 1)$; however, when $\hb_1> 0$, we have that $e^{\hb_1}>1$ and we say that $X$ has a positive effect. \newpage \begin{center} {\bf Multiple Logistic regression} \end{center} \bigskip Consider a multiple logistic regression case with $p$ variables now: \[ \logit(\hat{\pi})=\log \left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)= \hb_0 + \hb_1X_1+\ldots + \hb_pX_p. \] \bigskip How do we interpret $\hb_j$? Analogous to the multiple linear regression, this parameter is interpreted as the partial log-odds ratio. It is the log-odds-ratio for the group with $X_j = x_j+1$ versus group with $X_j = x_j$, holding all other $X_{j^\prime}$s constant. In other words, when comparing the odds of two groups that differ only by 1 unit of $X_j$, $\beta_j$ is the log of the odds ratio between those two groups. %\includegraphics[height=3in]{graphs/.pdf} \newpage \item How about if interactions are present: \[ \logit(\hat{\pi})=\log \left(\frac{\hat{\pi}}{1-\hat{\pi}}\right)= \hb_0 + \hb_1X_1+ \hb_2X_2 + \hb_3X_1X_2. \] Then we would say that the log-odds change by $\hb_1 + \hb_3X_2$ when $X_1$ increases by 1 unit (holding $X_2$ and other predictors constant), or alternatively that the odds change $\exp(\hb_1 + \hb_3X_2) = \exp(\hb_1)\exp(\hb_3X_2)$ times.\\ \medskip If $\hb_1 + \hb_3X_2$ is small, we can say that the odds change by $(\hb_1 + \hb_3X_2)\times 100$ percent with each 1-unit increase in $X_1$. \bigskip \item If you type in ``help logit" in Stata, you will see the help file for logistic regression. %%%%%%%%%%%%%%%%%%%%%%%%%%%%%% \newpage \begin{center} {\bf Logistic Regression - Example of further analyses} \end{center} \medskip \item Glioblastoma Multiforme (GBM, the data ``GBM$\_$logistic.dta" is on Canvas Files -$>$ data$\_$in$\_$lectures): A randomized clinical trial (RTOG 0825) was conducted comparing standard treatment (radiation + chemotherapy) to this treatment plus bevacizumab (Avastin) with the goal of improving two endpoints: \begin{itemize} \item progression-free survival (time remaining free of disease progression or death) and \item survival time from diagnosis \end{itemize} \item While prognosis is poor for most patients, important new tumor molecular markers that are strongly associated with prognosis and possibly treatment response are being discovered. One of these ( O(6)-methylguanine-DNA methyltransferase (MGMT) inactivation via methylation) was also examined (Gilbert et al {\it NEJM} 2014) %\item[-] Early Stage Breast Cancer: In a large randomized trial, women with early stage hormone responsive breast cancer received either tamoxifen or placebo after surgery. In addition to the treatment effect on recurrence and death, there was interest in whether factors such as body mass index (a measure of obesity) are associated with tamoxifen and tamoxifen efficacy \newpage \begin{center} {\bf Logistic Regression} \end{center} \item[-] GBM data: {\scriptsize \begin{verbatim} .* TRT effect = exposed here is trt = 1 (bev), use cohort with MGMT . cc survival trt if cohort==1 Proportion | Exposed Unexposed | Total Exposed -----------------+------------------------+------------------------ Cases | 210 193 | 403 0.5211 Controls | 97 108 | 205 0.4732 -----------------+------------------------+------------------------ Total | 307 301 | 608 0.5049 | | | Point estimate | [95% Conf. Interval] |------------------------+------------------------ Odds ratio | 1.211474 | .8530073 1.721077 (exact) Attr. frac. ex. | .1745591 | -.1723229 .4189684 (exact) Attr. frac. pop | .0909613 | +------------------------------------------------- chi2(1) = 1.25 Pr>chi2 = 0.2639 .* MGMT effect - exposed here is bad marker value (=1) . cc survival mgmt_grp if cohort==1 Proportion | Exposed Unexposed | Total Exposed -----------------+------------------------+------------------------ Cases | 321 82 | 403 0.7965 Controls | 115 90 | 205 0.5610 -----------------+------------------------+------------------------ Total | 436 172 | 608 0.7171 | | | Point estimate | [95% Conf. Interval] |------------------------+------------------------ Odds ratio | 3.063627 | 2.085235 4.496513 (exact) Attr. frac. ex. | .6735895 | .5204378 .7776055 (exact) +------------------------------------------------- chi2(1) = 37.16 Pr>chi2 = 0.0000 \end{verbatim}} \item[-] No treatment benefit for survival (trt/control OR = 1.21, indicating 21\% {\it greater odds} of failure on Avastin, although not significantly different from 1.0). MGMT-positive patients have 3-fold greater risk of death. In GBM, this marker represents significant heterogeneity in prognosis \newpage \begin{center} {\bf Logistic Regression} \end{center} \item Examine treatment, MGMT, and RPA class (a baseline prognostic class indicator) together via logistic regression {\scriptsize \begin{verbatim} . logit survival trt mgmt_grp rpa_class, or Iteration 0: log likelihood = -388.59785 Iteration 1: log likelihood = -361.36406 . . . Iteration 4: log likelihood = -361.10958 Logistic regression Number of obs = 608 LR chi2(3) = 54.98 Prob > chi2 = 0.0000 Log likelihood = -361.10958 Pseudo R2 = 0.0707 ------------------------------------------------------------------------------ survival | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- trt | 1.253899 .2259569 1.26 0.209 .8807916 1.785057 mgmt_grp | 3.354802 .6471911 6.27 0.000 2.298569 4.896394 rpa_class | 1.967377 .3278095 4.06 0.000 1.419248 2.727201 _cons | .0507661 .0358217 -4.22 0.000 .0127335 .2023957 ------------------------------------------------------------------------------ \end{verbatim}} \newpage \item[-] Predict probabilities and assess {\scriptsize \begin{verbatim} . predict p . list trt mgmt_grp rpa_class survival p in 20/40, clean trt mgmt_grp rpa_cl~s survival p 20. 0 0 4 0 .4319939 21. 0 1 4 1 .7184269 22. 1 0 5 0 .6523168 23. 0 0 4 0 .4319939 24. 1 1 5 1 .8629051 25. 0 1 4 1 .7184269 .... 34. 1 1 4 1 .7618647 35. 1 0 4 1 .4881366 36. 1 1 4 1 .7618647 37. 0 1 4 1 .7184269 38. 1 0 5 0 .6523168 39. 0 1 5 1 .8338792 40. 1 1 5 1 .8629051 \end{verbatim}} \newpage \item[-] A summary function is provided in Stata {\small \begin{verbatim} . estat classification Logistic model for survival -------- True -------- Classified | D ~D | Total -----------+--------------------------+----------- + | 340 131 | 471 - | 63 74 | 137 -----------+--------------------------+----------- Total | 403 205 | 608 Classified + if predicted Pr(D) >= .5 True D defined as survival != 0 -------------------------------------------------- Sensitivity Pr( +| D) 84.37% Specificity Pr( -|~D) 36.10% Positive predictive value Pr( D| +) 72.19% Negative predictive value Pr(~D| -) 54.01% -------------------------------------------------- False + rate for true ~D Pr( +|~D) 63.90% False - rate for true D Pr( -| D) 15.63% False + rate for classified + Pr(~D| +) 27.81% False - rate for classified - Pr( D| -) 45.99% -------------------------------------------------- Correctly classified 68.09% -------------------------------------------------- \end{verbatim}} \item[-] This rule uses 0.5 as a classification probability. What about other values? \newpage {\scriptsize \begin{verbatim} . tab p Pr(survival | ) | Freq. Percent Cum. ------------+----------------------------------- .2788 | 9 1.48 1.48 .3264767 | 11 1.81 3.29 .4319939 | 56 9.21 12.50 .4881366 | 61 10.03 22.53 ................. .7184269 | 141 23.19 61.84 .7618647 | 161 26.48 88.32 .8338792 | 40 6.58 94.90 .8629051 | 31 5.10 100.00 ------------+----------------------------------- Total | 608 100.00 . estat classification, cutoff(.667) Logistic model for survival -------- True -------- Classified | D ~D | Total -----------+--------------------------+----------- + | 284 89 | 373 - | 119 116 | 235 -----------+--------------------------+----------- Total | 403 205 | 608 Classified + if predicted Pr(D) >= .667 True D defined as survival != 0 -------------------------------------------------- Sensitivity Pr( +| D) 70.47% Specificity Pr( -|~D) 56.59% Positive predictive value Pr( D| +) 76.14% Negative predictive value Pr(~D| -) 49.36% -------------------------------------------------- False + rate for true ~D Pr( +|~D) 43.41% False - rate for true D Pr( -| D) 29.53% False + rate for classified + Pr(~D| +) 23.86% False - rate for classified - Pr( D| -) 50.64% -------------------------------------------------- Correctly classified 65.79% -------------------------------------------------- \end{verbatim}} \item[-] This rule is better on some measures, worse on others. To look at all probability cutoff values at once, use ROC curve \newpage \begin{center} {\bf Logistic Regression - ROC Curve for Prediction} \end{center} \item[-] Receiver Operating Characteristic (ROC) curve: plot of Sensitivity ($\Pr(\mbox{predict +} | \mbox{case is +})$ vs. False Positive Rate ($\Pr(\mbox{predict +} | \mbox{case is -})$) for all possible probability cut points.\\ {\scriptsize \begin{verbatim} . lroc Logistic model for survival number of observations = 608 area under ROC curve = 0.6756 \end{verbatim}} \centerline{\includegraphics[height=2.25in]{graphs/ROC_GBM.pdf}} \newpage \begin{center} {\bf Logistic Regression - ROC Curve for Prediction} \end{center} \item[-] Area under curve is the metric of interest. Perfect prediction has area = 1.0. Area under `Random Guessing' line = 0.50. Area can be thought of as equalling, if given two cases (event and non-event), the probability of correctly classifying one as event and other and non-event \item[-] Point closest to upper left corner is the best classifier (best trade-off between sensitivity and specificity) \item[-] This model has area = 0.68, best probability cutpoint near sensitivity = 75\%, specificity = 50\%. Can examine ordered list of predicted probabilities to find (\texttt{lroc} in Stata will not list plot values) \item A related option \texttt{lsens} in Stata will provide the sensitivity/specificity values at each model predicted probability. \newpage \begin{center} {\bf Logistic Regression - ROC Curve for Prediction} \end{center} {\scriptsize \begin{verbatim} . lsens, gense(sensval) gensp(specval) . list p sensval specval in 1/12, clean p sensval specval 1. .8629051 0.000000 1.000000 2. .8338792 0.071960 0.990244 3. .7618647 0.156328 0.960976 4. .7184269 0.459057 0.770732 5. .6523168 0.704715 0.565854 6. .6192176 0.732010 0.531707 7. .5994036 0.766749 0.478049 8. .5646291 0.786600 0.434146 9. .4881366 0.843672 0.360976 10. .4319939 0.918114 0.209756 11. .3264767 0.985112 0.068293 12. .2788 0.995037 0.034146 \end{verbatim}} \item[-] These are the points on the ROC curve. A plot is also produced \centerline{\includegraphics[height=2.25in]{graphs/sens_spec.pdf}} \item[-] IMPORTANT NOTE: Optimizing cut point on a given dataset does not establish that classifier will work on novel data - requires validation in independent samples \newpage \begin{center} {\bf Logistic Regression - Goodness of Fit Test} \end{center} \item The HL test can be obtained as follows: {\scriptsize \begin{verbatim} . estat gof, group(10) table Logistic model for survival, goodness-of-fit test (Table collapsed on quantiles of estimated probabilities) (There are only 7 distinct quantiles because of ties) +--------------------------------------------------------+ | Group | Prob | Obs_1 | Exp_1 | Obs_0 | Exp_0 | Total | |-------+--------+-------+-------+-------+-------+-------| | 1 | 0.4320 | 33 | 30.3 | 43 | 45.7 | 76 | | 2 | 0.4881 | 30 | 29.8 | 31 | 31.2 | 61 | | 3 | 0.5994 | 31 | 31.6 | 24 | 23.4 | 55 | | 6 | 0.7184 | 124 | 128.5 | 60 | 55.5 | 184 | | 8 | 0.7619 | 122 | 122.7 | 39 | 38.3 | 161 | |-------+--------+-------+-------+-------+-------+-------| | 9 | 0.8339 | 34 | 33.4 | 6 | 6.6 | 40 | | 10 | 0.8629 | 29 | 26.8 | 2 | 4.2 | 31 | +--------------------------------------------------------+ number of observations = 608 number of groups = 7 Hosmer-Lemeshow chi2(5) = 2.43 Prob > chi2 = 0.7863 \end{verbatim}}