1 Chapter 2: September 26

2 Chapter 2: September 28

3 Chapter 2: October 3

4 Chapter 3: October 5

5 Chapter 3: October 10

6 Chapter 4: October 12

7 Chapter 4: October 17

8 Chapter 5: October 19

9 Confounding: October 24

10 Midterm Exam: October 26

11 Chapter 6: October 31

12 Chapter 6: November 2

13 Chapter 7: November 2

14 Chapter 8: November 7

15 Chapter 8: November 9

16 Chapter 9: November 9

17 Chapter 11: November 14

18 Chapter 12: November 16

First, I loaded some R packages needed for the code in this document. Click the Code button to see my R code.

library(GGally)
library(gridExtra)
library(knitr)
library(mosaic)
library(tidyverse)
library(ggformula)
library(broom)
# library(modelr)
# library(car)

18.1 See external slides and handout first

See Canvas for the introductory slides and a handout of handwritten mathematical work.

Examples there are shown in Stata.

This document continues from there, working the same examples in R.

18.2 Logistic Regression

18.2.1 Example: Hospital Admissions and ICU Survival

First few observations:

icuData <- read.csv("http://statistics.uchicago.edu/~collins/data/STAT224other/icu.csv")
glimpse(icuData)
head(icuData)
icuData <- read.csv("./data/icu.csv")
glimpse(icuData)
Observations: 200
Variables: 5
$ sta <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ age <int> 27, 59, 77, 54, 87, 69, 63, 30, 35, 70, 55, 48, 66, 61, 66, 52,...
$ inf <int> 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, ...
$ hra <int> 88, 80, 70, 103, 154, 132, 66, 110, 60, 103, 86, 100, 80, 99, 9...
$ typ <int> 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, ...
head(icuData)
sta age inf hra typ
0 27 1 88 1
0 59 0 80 1
0 77 0 70 0
0 54 1 103 1
0 87 1 154 1
0 69 1 132 1

The 2x2 table of icu survival status (sta) vs. predictor (hospital admission type, typ)

tally(sta ~ typ, data=icuData, margins=TRUE)
       typ
sta       0   1
  0      51 109
  1       2  38
  Total  53 147

18.2.2 Logistic regression model fit

This is a generalized linear model (glm).

The “link function” is the logit.
The errors have a Bernoulli = binomial(\(n=1\)) distribution.

R syntax:
glmfit <- glm(y ~ x, data = mydata, family = "binomial") tidy(glmfit, conf.int = TRUE)

glmfit <- glm(sta ~ typ, data = icuData, family = "binomial")
tidy(glmfit, conf.int = TRUE)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) -3.239 0.7206 -4.495 0.0000 -5.0492 -2.071
typ 2.185 0.7448 2.934 0.0034 0.9487 4.025

Transform to the odds ratio scale: \(e^{{\widehat{\beta}_1}}\)

b1 <- coef(glmfit)[2]
exp(b1)
 typ 
8.89 

This should match the odds ratio right from the 2x2 table.

theTable <- tally(sta ~ typ, data = icuData)
oddsRatio(theTable)
[1] 0.1125

Hmmmm…. The odds ratio does not match.
After looking at the documentation for oddsRatio in the mosaic package,
I can see it calculates OR = (odds row 1) / (odds row 2).

That’s odds of elective admission among those who survied
divided by odds of elective admission among those who died.

Instead, we wanted odds of death among those with emergency admission
divided by odds of death among those with elective admission.

A little care shows that what we want is the reciprocal of the OR that R provided.

theTable <- tally(sta ~ typ, data = icuData)
1 / oddsRatio(theTable)
[1] 8.89

Now, the OR from the table matches the OR from the model, as expected.

Another way to do this is to swap the order of the response and predictor variables and get the “successes” (deaths) into the firs column as suggested in the documentation.

theTable <- tally(typ ~ -sta, data = icuData)
theTable
   -sta
typ  -1   0
  0   2  51
  1  38 109
oddsRatio(theTable)
[1] 8.89
summary(oddsRatio(theTable))
Odds Ratio

Proportions
       Prop. 1:  0.03774 
       Prop. 2:  0.2585 
     Rel. Risk:  6.85 

Odds
        Odds 1:  0.03922 
        Odds 2:  0.3486 
    Odds Ratio:  8.89 

95 percent confidence interval:
     1.712 < RR < 27.42 
     2.064 < OR < 38.29 

18.2.3 Example continued: Add another predictor

glmfit2 <- glm(sta ~ age + typ, data = icuData, family = "binomial")
tidy(glmfit2, conf.int=TRUE)
term estimate std.error statistic p.value conf.low conf.high
(Intercept) -5.509 1.0335 -5.330 0.0000 -7.7942 -3.6590
age 0.034 0.0107 3.181 0.0015 0.0141 0.0564
typ 2.454 0.7526 3.260 0.0011 1.1984 4.3029

The coefficients on the odds ratio scale:

coef(glmfit2)
(Intercept)         age         typ 
   -5.50876     0.03402     2.45354 
exp( coef(glmfit2) )
(Intercept)         age         typ 
   0.004051    1.034601   11.629386