First, I loaded some R packages needed for the code in this document. Click the Code
button to see my R code.
library(GGally)
library(gridExtra)
library(knitr)
library(mosaic)
library(tidyverse)
library(ggformula)
library(broom)
# library(modelr)
# library(car)
See Canvas for the introductory slides and a handout of handwritten mathematical work.
Examples there are shown in Stata.
This document continues from there, working the same examples in R.
First few observations:
icuData <- read.csv("http://statistics.uchicago.edu/~collins/data/STAT224other/icu.csv")
glimpse(icuData)
head(icuData)
icuData <- read.csv("./data/icu.csv")
glimpse(icuData)
Observations: 200
Variables: 5
$ sta <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
$ age <int> 27, 59, 77, 54, 87, 69, 63, 30, 35, 70, 55, 48, 66, 61, 66, 52,...
$ inf <int> 1, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, ...
$ hra <int> 88, 80, 70, 103, 154, 132, 66, 110, 60, 103, 86, 100, 80, 99, 9...
$ typ <int> 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, ...
head(icuData)
sta | age | inf | hra | typ |
---|---|---|---|---|
0 | 27 | 1 | 88 | 1 |
0 | 59 | 0 | 80 | 1 |
0 | 77 | 0 | 70 | 0 |
0 | 54 | 1 | 103 | 1 |
0 | 87 | 1 | 154 | 1 |
0 | 69 | 1 | 132 | 1 |
The 2x2 table of icu survival status (sta
) vs. predictor (hospital admission type, typ
)
tally(sta ~ typ, data=icuData, margins=TRUE)
typ
sta 0 1
0 51 109
1 2 38
Total 53 147
This is a generalized linear model (glm).
The “link function” is the logit.
The errors have a Bernoulli = binomial(\(n=1\)) distribution.
R syntax:
glmfit <- glm(y ~ x, data = mydata, family = "binomial")
tidy(glmfit, conf.int = TRUE)
glmfit <- glm(sta ~ typ, data = icuData, family = "binomial")
tidy(glmfit, conf.int = TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | -3.239 | 0.7206 | -4.495 | 0.0000 | -5.0492 | -2.071 |
typ | 2.185 | 0.7448 | 2.934 | 0.0034 | 0.9487 | 4.025 |
Transform to the odds ratio scale: \(e^{{\widehat{\beta}_1}}\)
b1 <- coef(glmfit)[2]
exp(b1)
typ
8.89
This should match the odds ratio right from the 2x2 table.
theTable <- tally(sta ~ typ, data = icuData)
oddsRatio(theTable)
[1] 0.1125
Hmmmm…. The odds ratio does not match.
After looking at the documentation for oddsRatio
in the mosaic
package,
I can see it calculates OR = (odds row 1) / (odds row 2).
That’s odds of elective admission among those who survied
divided by odds of elective admission among those who died.
Instead, we wanted odds of death among those with emergency admission
divided by odds of death among those with elective admission.
A little care shows that what we want is the reciprocal of the OR that R provided.
theTable <- tally(sta ~ typ, data = icuData)
1 / oddsRatio(theTable)
[1] 8.89
Now, the OR from the table matches the OR from the model, as expected.
Another way to do this is to swap the order of the response and predictor variables and get the “successes” (deaths) into the firs column as suggested in the documentation.
theTable <- tally(typ ~ -sta, data = icuData)
theTable
-sta
typ -1 0
0 2 51
1 38 109
oddsRatio(theTable)
[1] 8.89
summary(oddsRatio(theTable))
Odds Ratio
Proportions
Prop. 1: 0.03774
Prop. 2: 0.2585
Rel. Risk: 6.85
Odds
Odds 1: 0.03922
Odds 2: 0.3486
Odds Ratio: 8.89
95 percent confidence interval:
1.712 < RR < 27.42
2.064 < OR < 38.29
glmfit2 <- glm(sta ~ age + typ, data = icuData, family = "binomial")
tidy(glmfit2, conf.int=TRUE)
term | estimate | std.error | statistic | p.value | conf.low | conf.high |
---|---|---|---|---|---|---|
(Intercept) | -5.509 | 1.0335 | -5.330 | 0.0000 | -7.7942 | -3.6590 |
age | 0.034 | 0.0107 | 3.181 | 0.0015 | 0.0141 | 0.0564 |
typ | 2.454 | 0.7526 | 3.260 | 0.0011 | 1.1984 | 4.3029 |
The coefficients on the odds ratio scale:
coef(glmfit2)
(Intercept) age typ
-5.50876 0.03402 2.45354
exp( coef(glmfit2) )
(Intercept) age typ
0.004051 1.034601 11.629386