
1 Chapter 8: Correlated Errors

1.0.1 Diagnostics

1.0.2 Another Problem to Solve: Autocorrelation

  • Regression assumption:
    error terms \(\epsilon_i\) and \(\epsilon_j\) for observations \(i\) and \(j\) are uncorrelated.

What does this mean?

We know that observed residuals are always somewhat correlated (a by-product of least-squares fitting), but we still expect them to look random, with no systematic patterns, if the regression model is OK.

  • If observations follow a natural sequential order (in terms of how they were measured, collected, etc.),
    then residuals may show distinct serial patterns across time (order).
    • This suggests a lack of independence.
    • This is called autocorrelation.

Sketch some examples.
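Since these notes can't include a hand-drawn sketch, here is a minimal simulated one (not from the textbook; the AR coefficient 0.8 and all variable names are illustrative choices). It fits the same straight-line model to data with independent errors and to data with positively autocorrelated errors, then plots the residuals in time order using base R graphics:

## Minimal sketch (illustrative only): residual patterns with independent
## vs. positively autocorrelated (AR(1), rho = 0.8) errors
set.seed(1)
n <- 60
x <- 1:n
e_indep <- rnorm(n)                                  # independent errors
e_ar1   <- as.numeric(arima.sim(list(ar = 0.8), n))  # autocorrelated errors
y1 <- 2 + 0.5 * x + e_indep
y2 <- 2 + 0.5 * x + e_ar1

par(mfrow = c(1, 2))
plot(residuals(lm(y1 ~ x)), type = "b", xlab = "time order",
     ylab = "residual", main = "independent errors")
abline(h = 0, lty = 2)
plot(residuals(lm(y2 ~ x)), type = "b", xlab = "time order",
     ylab = "residual", main = "autocorrelated errors")
abline(h = 0, lty = 2)

In the autocorrelated panel the residuals tend to drift in runs of the same sign rather than bouncing randomly around zero.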

1.0.3 Causes of autocorrelation

1.0.3.1 Time and space

Autocorrelation most often occurs when

  • The same observations are measured in adjacent time periods
    • stock market volume, weather parameters, etc, today and tomorrow.
  • Observations are spatially close,
    • temperature measured in Illinois, Indiana, and Michigan
  • Experiments are run on “batches” of animals, cells, at specific times, etc

  • Tall trees compete for sunlight, thus are spaced apart with small plants in between
    • spatially, tall plants spaced apart, short plants closer together

Autocorrelation occurs since information/phenomena change slowly through time/space,
so factors influencing a variable are likely to be similar in adjacent periods/places.

1.0.3.3 So, what’s the problem with autocorrelation?

Autocorrelation can affect OLS analysis in several ways:

  • OLS estimates are still unbiased, but they are not efficient (no longer minimum variance).
  • \(\sigma^2\) and the \(\mathrm{se}(\beta)\)s may be seriously under- (or over-) estimated.
    • positive correlation: often under-estimation of \(\sigma^2\) and the \(\mathrm{se}(\beta)\)s
    • negative correlation (less common): often over-estimation

Underestimating the variability leads to more false-positive decisions:
rejecting the null hypothesis \(\beta = 0\) when we should not.

Whether positive or negative correlation,
confidence intervals for \(\beta\)s may no longer be valid.
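To make the warning concrete, here is a hedged simulation sketch (invented numbers; the AR coefficient 0.8 and the trending predictor are assumptions, not from the textbook) comparing the OLS-reported \(\mathrm{se}(\widehat{\beta}_1)\) to the actual sampling variability of \(\widehat{\beta}_1\) when the errors are positively autocorrelated:

## Illustrative sketch: with AR(1) errors (rho = 0.8, arbitrary choice),
## compare the average OLS-reported se(beta1) to the actual SD of beta1-hat
## across many simulated datasets.
set.seed(2)
n    <- 50
x    <- as.numeric(scale(1:n))   # a trending predictor, as in time-ordered data
nsim <- 2000
b1  <- numeric(nsim)             # slope estimates
se1 <- numeric(nsim)             # OLS-reported standard errors
for (i in seq_len(nsim)) {
  e      <- as.numeric(arima.sim(list(ar = 0.8), n))
  y      <- 1 + 2 * x + e
  fit    <- lm(y ~ x)
  b1[i]  <- coef(fit)[2]
  se1[i] <- summary(fit)$coefficients[2, 2]
}
c(true.sd.of.b1 = sd(b1), mean.reported.se = mean(se1))
## With positive autocorrelation the reported se is typically much smaller than
## the true SD, so tests reject H0: beta1 = 0 too often.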

1.0.3.4 Detecting autocorrelation

Plots:

  • Standardized residuals vs. time (or “index” = order of data collection)
  • Examine the sequence plot: signs of residuals by time/index:
    • \(+++++---++++-----+++\)

Statistical tests:

  • The runs test: use a statistical test based on the sequence plot
  • Durbin-Watson statistic: Measure the correlation in the residuals

1.0.4 Example: Consumer expenditure vs. money stock

Textbook Table 8.1 shows the data

Quarterly data from 1952 to 1956 on
\(y\) = U.S. consumer expenditure
\(x\) = U.S. money stock

Units: billions of current dollars

We regress the expenditure on money stock and plot vs. time
(these data are naturally collected in time order)

## Packages assumed already loaded in these notes: dplyr, broom (tidy, glance),
## ggformula (gf_*), mosaic, gridExtra (grid.arrange), car, lmtest
# expendData <- read.delim("http://statistics.uchicago.edu/~collins/data/RABE5/P211.txt")
expendData <- read.delim("./data/P211.txt")
# order data by time (year and quarter)
expendData <- arrange(expendData, Year, Quarter) 
head(expendData, 6)
Year Quarter Expenditure Stock
1952 1 214.6 159.3
1952 2 217.7 161.2
1952 3 219.6 162.8
1952 4 227.2 164.6
1953 1 230.9 165.9
1953 2 233.3 167.9
## fit linear regression: expenditures on money stock
lmfit <- lm(Expenditure ~ Stock, data = expendData)

tidy(lmfit)
term estimate std.error statistic p.value
(Intercept) -154.7 19.8500 -7.794 0
Stock 2.3 0.1146 20.080 0
glance(lmfit)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9573 0.9549 3.983 403.2 0 2 18

1.0.5 Plots and statistical tests for autocorrelation

1.0.5.1 Index plot

For time series data (data collected where time order is relevant),
a useful plot for analysis is the index plot

  • plot of the standardized residual versus
    • time or
    • some index of the collection order,
    • whichever makes more sense.

If errors are correlated,
residuals of the same sign (\(+\) or \(-\)) will occur in clusters or bunches called runs.

## add residuals, standardized residuals, and an index variable to dataset
n <- nrow(expendData) ## sample size
expendData <- mutate(expendData, e = residuals(lmfit), r = rstandard(lmfit), index = 1:n)
## index plot
gf_point(r ~ index, data = expendData, size = 2) %>% 
  gf_line() %>%
  gf_labs(x = "Quarter", y = "standardized residuals") %>%
  gf_lims(y = c(min(min(~r, data = expendData), -2.1), max(max(~r, data = expendData), 2.1))) %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

We connected the points from left to right to make it easier to see autocorrelation.

Here, the sequential residuals seem to follow an ordered pattern
(except for a few points) with respect to magnitude and sign

1.0.5.2 One statistic = number of runs

Number of runs:

The visual impression can be formally confirmed by counting
the number of sequences (batches, clusters) of positive or negative residuals, called runs.

Statistic = number of runs

Sequence Plot:

One-dimensional plot of the signs of the residuals, taken in the order of the observations.

For the expenditure vs. money stock example, the sequence plot is

expendData <- mutate(expendData, positive = 1 * (r > 0), sign = factor(positive, labels = c("-", "+")))

\(++-++++-------++++++\)


If you are curious, here is what I have stored in the dataset up to this point:

head(expendData, 6)
Year Quarter Expenditure Stock e r index positive sign
1952 1 214.6 159.3 2.8701 0.8095 1 1 +
1952 2 217.7 161.2 1.5994 0.4401 2 1 +
1952 3 219.6 162.8 -0.1812 -0.0490 3 0 -
1952 4 227.2 164.6 3.2782 0.8724 4 1 +
1953 1 230.9 165.9 3.9877 1.0513 5 1 +
1953 2 233.3 167.9 1.7869 0.4658 6 1 +

Now, back to looking at the sequence plot:

\(++-++++-------++++++\)

## rle = run length encoding (in base R)
rle(as.vector(expendData$positive)) ## runs and their lengths
Run Length Encoding
  lengths: int [1:5] 2 1 4 7 6
  values : num [1:5] 1 0 1 0 1
nruns <- length(rle(as.vector(expendData$sign))$lengths) ## number of runs

n <- nrow(expendData) ## sample size
n1 <- sum(expendData$positive) ## number of positive residuals
n2 <- n - n1 ## number of negative residuals

The sequence plot indicates that there are \(5\) runs.

  • There are \(n_1 = 12\) positive residuals and \(n_2 = 8\) negative residuals.
  • Under the null hypothesis of no autocorrelation,
    the expected number of runs \(\mu\) and its variance \(\sigma^2\) can be shown to be:

\(\displaystyle \mu=\frac{2n_1n_2}{n_1+n_2}+1\)

and

\(\displaystyle \sigma^2=\frac{2n_1n_2(2n_1n_2-n_1-n_2)}{(n_1+n_2)^2(n_1+n_2-1)}\)

mu <- 1 + 2*n1*n2 / (n1 + n2)

sigmasqr <- 2*n1*n2 * (2*n1*n2 - n1 - n2) / ( (n1 + n2)^2 * (n1 + n2 -1) )

In our example, \(n_1=12\) and \(n_2=8\), so
the expected number of runs is \(\mu=10.6\) and
the standard deviation is \(\sigma = 2.085\)

We observed only \(5\) runs, which is about 2.7 SDs below \(\mu\).

Is this unusual under the null hypothesis of no autocorrelation?
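Before the formal test, a quick normal-approximation check using the quantities computed above (nruns, mu, sigmasqr):

## Normal approximation: how unusual are 5 runs when we expect mu = 10.6
## with standard deviation sqrt(sigmasqr) = 2.085?
z <- (nruns - mu) / sqrt(sigmasqr)
z                    ## about -2.7
2 * pnorm(-abs(z))   ## two-sided p-value, about 0.007

This is essentially the calculation the runs test in the next section formalizes.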

1.0.5.3 The runs test

A well-known nonparametric test is called a “runs test”

In R, load the tseries (time series) package,
then use the runs.test function

The runs.test function expects you to give it a factor with two and only two levels.
For example,

library(tseries)
runs.test( as.factor( rstandard(lmfit) > 0) )

library(tseries, warn.conflicts = FALSE)
data.frame(sample.size=n, number.of.runs=nruns, positive=n1, negative=n2)
sample.size number.of.runs positive negative
20 5 12 8
runs.fit <- runs.test(expendData$sign)
runs.test.statistic <- runs.fit$statistic
runs.test.pvalue <- runs.fit$p.value

When I ran this code, I got a p-value of \(0.0072\).
This indicates we should reject the null hypothesis of no autocorrelation.

We kind of expected this given the appearance of the index plot.

1.0.5.4 The Durbin-Watson (DW) approach

A common approach to detecting serial correlation is known as the Durbin-Watson test.

The test is based on assuming that successive errors are correlated.

The model is written this way:

\[ Y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_p x_{pt} + \epsilon_t\]

where \(t\) is the time index and \(\epsilon_t\) is the \(t^{th}\) error in time.

If the errors are correlated, perhaps each error (\(\epsilon_t\)):

  • depends linearly on the size and sign of the error before it in time (\(\epsilon_{t-1}\)),
  • has correlation \(\rho = \mathrm{cor}(\epsilon_t, \epsilon_{t-1})\), and
  • is not correlated with any earlier residuals (\(\epsilon_{t-2}, \epsilon_{t-3}, \ldots\)).

If this model is reasonable, then \[ \epsilon_t=\gamma_0 + \gamma_1\epsilon_{t-1}+\omega_t = \rho\epsilon_{t-1}+\omega_t, \qquad |\rho|<1, \] where \(\rho\) is the correlation coefficient between successive errors
and \(\omega_t\) are independent, normally distributed, with constant variance across the errors (\(\epsilon\)s).

Why is the intercept (\(\gamma_0\)) equal to zero?

  • By definition, intercept = (mean of response) \(-\) slope \(\times\) (mean of predictor).
  • mean of response = \(E(\epsilon_t) = 0\)
  • mean of predictor = \(E(\epsilon_{t-1}) = 0\)
  • \(\gamma_0 = 0 - \gamma_1(0) = 0\)

Why is the slope (\(\gamma_1\)) equal to \(\rho\)?

  • By definition, slope = correlation(response, predictor) \(\times \displaystyle{\frac{\text{SD(response)}}{\text{SD(predictor)}}}\)
    • correlation(response, predictor) = \(\mathrm{cor}(\epsilon_t, \epsilon_{t-1}) = \rho\)
    • SD(response) = SD(\(\epsilon_t\)) \(= \sigma\)
    • SD(predictor) = SD(\(\epsilon_{t-1}\)) \(= \sigma\)
  • \(\gamma_1 = \rho \times (\sigma\,/\,\sigma) = \rho\)

Thus, if \(\epsilon_t\) depends linearly on \(\epsilon_{t-1}\) and
is not correlated with earlier errors, this model is reasonable:

\[ \epsilon_t=\rho\epsilon_{t-1}+\omega_t, \qquad |\rho|<1, \]

The relationship is likely more complex than this, but it’s a nice, simple starting point.
We call this a “first-order autoregressive model”

1.0.5.5 The Durbin-Watson statistic

The DW statistic (often labeled \(d\)) is defined as: \[ d\;=\; \frac{\sum_{t=2}^n (e_t-e_{t-1})^2}{\sum_{t=1}^n e_t^2} \;=\; \frac{\sum_{t=2}^n (e_t-e_{t-1})^2}{\mathrm{SSE}}, \] where \(e_t\) is the \(t^{th}\) observed residual with the data arranged in time order.

NOTE: Yes, the denominator of \(d\) sums from \(t=1\), but the numerator cannot.
The numerator has \((n-1)\) terms: \(\;e_t\; (e_2, e_3, \ldots, e_n)\;\) and \(\;e_{t-1}\; (e_1, e_2, \ldots, e_{n-1})\)
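As a sketch, \(d\) is easy to compute directly from the OLS residuals; the value should match the Durbin-Watson output shown later for these data:

## Compute the DW statistic "by hand" from the OLS residuals (in time order)
e <- residuals(lmfit)
d <- sum(diff(e)^2) / sum(e^2)   # numerator has n-1 terms, denominator has n
d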

The DW statistic is used to test \(H_0:\rho=0\) vs \(H_1:\rho >0\).
When \(\rho=0\) cannot be rejected, we conclude that the errors are uncorrelated.

NOTE: \(H_1: \rho < 0\) is not common to test or observe,
but we can test it by using the statistic \((4-d)\) instead of \(d\) and proceeding as below.

1.0.5.6 Properties of the DW statistic

We can show that the \(d\) statistic can be approximated using \(\widehat{\rho}\):

\[d \;\approx\; 2(1-\widehat{\rho}), \qquad\textrm{ where }\qquad \widehat{\rho} \;=\; \frac{\sum_{t=2}^n e_te_{t-1}}{\sum_{t=1}^n e_t^2} \;=\; \frac{\sum_{t=2}^n e_te_{t-1}}{\mathrm{SSE}}. \]

NOTE: Yes, the denominator of \(\widehat{\rho}\) sums from \(t=1\), but the numerator cannot.
The numerator has \((n-1)\) terms: \(\;e_t\; (e_2, e_3, \ldots, e_n)\;\) and \(\;e_{t-1}\; (e_1, e_2, \ldots, e_{n-1})\)
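A quick numerical check of the approximation on the expenditure-data residuals (the two values agree only roughly, since the approximation ignores the end terms \(e_1^2\) and \(e_n^2\), which matter for small \(n\)):

## Check d ~ 2 * (1 - rhohat) on the OLS residuals
e      <- residuals(lmfit)
d      <- sum(diff(e)^2) / sum(e^2)
rhohat <- sum(e[-1] * e[-length(e)]) / sum(e^2)
c(d = d, approx = 2 * (1 - rhohat))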

When \(\rho = 0\) (no autocorrelation), \(d \approx 2\).
When \(\rho = 1\) (perfect positive correlation), \(d \approx 0\).
When \(\rho = -1\) (perfect negative correlation), \(d \approx 4\).

The range of \(d\) is approximately \(0 \le d \le 4\).

So, when \(d\) is close to 2, there is not much autocorrelation.
When \(d\) is small, it suggests evidence of positive autocorrelation (\(\rho > 0\)).
When \((4-d)\) is small, it suggests evidence of negative autocorrelation (\(\rho < 0\)).

If your alternative hypothesis is \(H_1: \rho < 0\), use \((4-d)\) as your statistic.
Usually, the one-sided hypothesis is proposed prior to looking at the data (based on context).

1.0.5.7 Using the DW statistic

How close is considered “close to 2”?
For positive autocorrelation (where a \(d\) well below 2 leads us to reject the null), use the following process:

If \(d<d_L\), we reject the null hypothesis that there is no positive autocorrelation.

If \(d>d_U\), we fail to reject the null hypothesis. There is not much evidence of positive autocorrelation.

If \(d_L < d < d_U\), the test is inconclusive.

Like many nonparametric tests, values are tabled for reference.
See the textbook Appendix: Tables A.6 and A.7.
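The decision rule is simple enough to encode. Here is a tiny hypothetical helper (the function name is ours; \(d_L\) and \(d_U\) still have to be looked up in the tables for the given \(n\), number of predictors, and \(\alpha\)):

## Hypothetical helper encoding the one-sided DW decision rule;
## dL and dU come from Tables A.6/A.7 (values below are made up)
dw_decision <- function(d, dL, dU) {
  if (d < dL)      "reject H0: evidence of positive autocorrelation"
  else if (d > dU) "fail to reject H0: little evidence of positive autocorrelation"
  else             "inconclusive"
}
## Example with made-up bounds: dw_decision(d = 0.33, dL = 1.20, dU = 1.41)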

In R, there are a couple choices.

lmfit <- lm(y ~ x, data = mydata)

library(car)
durbinWatsonTest(lmfit)

library(lmtest)
dwtest(lmfit)


library(car, warn.conflicts = FALSE)
durbinWatsonTest(lmfit, alternative = "positive")
 lag Autocorrelation D-W Statistic p-value
   1          0.7506        0.3282       0
 Alternative hypothesis: rho > 0
library(lmtest, warn.conflicts = FALSE)
dwtest(lmfit, alternative = "greater")

    Durbin-Watson test

data:  lmfit
DW = 0.33, p-value = 0.00000002
alternative hypothesis: true autocorrelation is greater than 0

Not surprisingly (given our earlier look at the residuals by time and the runs test),
the DW statistic also indicates strong autocorrelation.

1.0.6 Transformation

1.0.7 Removal of autocorrelation: transformation

When there is autocorrelation present in the errors,
one way to remove it is to transform the variables.

1.0.7.1 Cochrane-Orcutt transformation

The Cochrane-Orcutt transformation (textbook Section 8.4) is one method.

To see how it works, we can write out the errors for two adjacent observations:

\[\begin{align} \epsilon_t & \;=\; y_t-\beta_0-\beta_1 x_t \\ \epsilon_{t-1} & \;=\; y_{t-1}-\beta_0-\beta_1 x_{t-1} \end{align}\]

When errors are first-order autocorrelated with correlation coefficient \(\rho\), we have

\(\epsilon_t=\rho\epsilon_{t-1}+\omega_t\)

where the errors \(\omega_t\) are independent, normally distributed, with constant variance across \(\epsilon_t\)s

Plugging in the above expressions for two adjacent residuals gives

\[\begin{align} \epsilon_t & \;=\; \rho\epsilon_{t-1}+\omega_t\\ (y_t-\beta_0-\beta_1 x_t) & \;=\; \rho \left(y_{t-1}-\beta_0-\beta_1 x_{t-1} \right)+\omega_t. \end{align}\]

Rearranging the above equation we have \[ (y_t-\rho y_{t-1}) \;=\; \beta_0(1-\rho) + \beta_1( x_t-\rho x_{t-1}) + \omega_t. \]

This is equivalent to the linear model \[ y_t^\star=\beta_0^\star+\beta_1^\star x_t^\star +\omega_t \]

with
\(y_t^\star \;=\; (y_t-\rho y_{t-1})\),
\(x_t^\star \;=\; (x_t-\rho x_{t-1})\),
\(\beta_0^\star \;=\; \beta_0(1-\rho)\), and
\(\beta_1^\star \;=\; \beta_1\)

This linear model (with error terms \(\omega_t\), which are uncorrelated by definition) is then fit to the data.

1.0.7.2 Implementing Cochrane-Orcutt transformation

  1. Compute \(\widehat{\beta}_{0}\) and \(\widehat{\beta}_{1}\) by OLS as usual: \(Y_t = \beta_0 + \beta_1 x_t + \epsilon_t\)

  2. Calculate the residuals from Step 1 and estimate \(\rho\): \(\displaystyle \widehat{\rho} \;=\; \frac{\sum_{t=2}^n e_te_{t-1}}{\sum_{t=1}^n e_t^2}\)

  3. Compute OLS with response \((y_t - \widehat{\rho} y_{t-1})\) and predictor \((x_t-\widehat{\rho} x_{t-1})\)

Let \(\widehat{\beta}_{0} = \displaystyle{\frac{\widehat{\beta}_{0}^*}{1-\widehat{\rho}}}\) and \(\widehat{\beta}_{1} = \widehat{\beta}_{1}^*\).

\(\mathrm{se}(\widehat{\beta}_{1})\) can be obtained from the OLS estimate: \(\mathrm{se}(\widehat{\beta}_{1}^*)\).
\(\mathrm{se}(\widehat{\beta}_{0})\) can be estimated from \(\widehat{\rho}\) and the OLS estimate: \(\mathrm{se}(\widehat{\beta}_{0}^*)\).

That is, \(\displaystyle \mathrm{se}(\widehat{\beta}_{0}) \approx \frac{\mathrm{se}(\widehat{\beta}_{0}^*)}{1-\widehat{\rho}}\)

  4. If the new residuals from Step 3 still show autocorrelation, go back to Step 2 using \(\widehat{\beta}_{0}\) and \(\widehat{\beta}_{1}\) from Step 3.

Repeat until residuals after Step 3 show no evidence of autocorrelation.
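As a compact sketch of Steps 1-4 for a simple regression with one predictor (the function and argument names are ours, not from any package, and the loop stops when \(\widehat{\rho}\) stops changing rather than on a formal DW test), the whole procedure might look like the function below. The orcutt package used later automates the same idea.

## Minimal sketch of iterated Cochrane-Orcutt for y ~ x (one predictor).
## All names are ours; the orcutt package (used later) does this for you.
cochrane_orcutt_simple <- function(y, x, tol = 1e-4, max_iter = 25) {
  n <- length(y)
  b <- coef(lm(y ~ x))                       # Step 1: ordinary OLS fit
  rho_old <- 0
  for (i in seq_len(max_iter)) {
    e     <- y - (b[1] + b[2] * x)           # Step 2: residuals on the original scale
    rho   <- sum(e[-1] * e[-n]) / sum(e^2)   #         estimate rho
    ystar <- y[-1] - rho * y[-n]             # Step 3: transform y and x
    xstar <- x[-1] - rho * x[-n]
    fit   <- lm(ystar ~ xstar)
    b     <- c(coef(fit)[1] / (1 - rho),     #         back-transform the intercept
               coef(fit)[2])                 #         slope carries over directly
    if (abs(rho - rho_old) < tol) break      # Step 4: stop when rho settles down
    rho_old <- rho
  }
  list(rho = rho, b0 = unname(b[1]), b1 = unname(b[2]),
       iterations = i, transformed_fit = fit)
}
## Usage sketch with the expenditure data:
## co <- cochrane_orcutt_simple(expendData$Expenditure, expendData$Stock)
## co$rho; co$b0; co$b1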

1.0.7.3 Example: Expenditures vs. money stock

Step 1:

lmfit <- lm(Expenditure ~ Stock, data = expendData)

head(expendData, 6)
Year Quarter Expenditure Stock e r index positive sign
1952 1 214.6 159.3 2.8701 0.8095 1 1 +
1952 2 217.7 161.2 1.5994 0.4401 2 1 +
1952 3 219.6 162.8 -0.1812 -0.0490 3 0 -
1952 4 227.2 164.6 3.2782 0.8724 4 1 +
1953 1 230.9 165.9 3.9877 1.0513 5 1 +
1953 2 233.3 167.9 1.7869 0.4658 6 1 +

Step 2:

# estimate rho (correlation coefficient)
# "by hand"
e <- residuals(lmfit)
et <- e[2:n]
et1 <- e[1:(n-1)]
rhohat <- sum(et * et1) / sum(e^2)
rhohat
[1] 0.7506
## or grab rhohat from the Durbin-Watson test
durbinWatsonTest(lmfit, alternative = "positive")
 lag Autocorrelation D-W Statistic p-value
   1          0.7506        0.3282       0
 Alternative hypothesis: rho > 0
objects(durbinWatsonTest(lmfit, alternative = "positive"))
[1] "alternative" "dw"          "p"           "r"          
rhohat <- durbinWatsonTest(lmfit, alternative = "positive")$r
rhohat
[1] 0.7506

\(\widehat{\rho} = 0.7506\)

Step 3:

## dplyr::lag() shifts each series back one quarter;
## the first element becomes NA and is dropped by lm()
y <- with(expendData, Expenditure - rhohat * lag(Expenditure))
x <- with(expendData, Stock - rhohat * lag(Stock))

lmfit.CO1 <- lm(y ~ x)
tidy(lmfit.CO1)
term estimate std.error statistic p.value
(Intercept) -53.696 13.6164 -3.943 0.001
x 2.643 0.3069 8.614 0.000

b0star <- tidy(lmfit.CO1)$estimate[1]
b1star <- tidy(lmfit.CO1)$estimate[2]

\(\widehat{\beta}_{0}^* = -53.7\)
\(\widehat{\beta}_{1}^* = 2.643\)

b0hat <- b0star / (1 - rhohat)
b1hat <- b1star
data.frame(b0hat, b0star, b1hat, b1star)
b0hat b0star b1hat b1star
-215.3 -53.7 2.643 2.643

Step 4:


dwtest(lmfit.CO1, alternative = "greater")

    Durbin-Watson test

data:  lmfit.CO1
DW = 1.4, p-value = 0.06
alternative hypothesis: true autocorrelation is greater than 0

## grab the p-value from the Durbin-Watson test
objects(dwtest(lmfit.CO1, alternative = "greater"))
[1] "alternative" "data.name"   "method"      "p.value"     "statistic"  
dw.CO1.stat <- dwtest(lmfit.CO1, alternative = "greater")$statistic
dw.CO1.pvalue <- dwtest(lmfit.CO1, alternative = "greater")$p.value

Durbin-Watson statistic is now closer to 2: \(\; d = 1.426\)

But the p-value is still pretty small: p-value = \(0.05786\).
I would like to make one more iteration.

So, we go back to Step 2 with \(\widehat{\beta}_{0} = -215.3\) and \(\widehat{\beta}_{1} = 2.643\) calculated in Step 3 above.

Step 2: Estimate \(\rho\)

The residuals are now \(\quad e_t = y_t - (\widehat{\beta}_{0} + \widehat{\beta}_{1} x_t) = y_t - (-215.3 + 2.643 x_t)\)

## Calculate rhohat "manually"
e <- with(expendData, Expenditure - (b0hat + b1hat * Stock))
et <- e[2:n]
et1 <- e[1:(n-1)]
rhohat1 <- sum(et * et1) / sum(e^2)
rhohat1
[1] 0.79

I found \(\widehat{\rho} = 0.79\).

Step 3: Cochrane-Orcutt-transform \(y\) and \(x\) and re-estimate \(\widehat{\beta}_{0}\) and \(\widehat{\beta}_{1}\)

y <- with(expendData, Expenditure - rhohat1 * lag(Expenditure))
x <- with(expendData, Stock - rhohat1 * lag(Stock))

lmfit.CO2 <- lm(y ~ x)
tidy(lmfit.CO2)
term estimate std.error statistic p.value
(Intercept) -47.38 13.6885 -3.462 0.003
x 2.70 0.3643 7.411 0.000

b0star <- tidy(lmfit.CO2)$estimate[1]
b1star <- tidy(lmfit.CO2)$estimate[2]

\(\widehat{\beta}_{0}^* = -47.38\)
\(\widehat{\beta}_{1}^* = 2.7\)

b0hat <- b0star / (1 - rhohat1)
b1hat <- b1star
data.frame(b0hat, b0star, b1hat, b1star)
b0hat b0star b1hat b1star
-225.6 -47.38 2.7 2.7

Step 4:


dwtest(lmfit.CO2, alternative = "greater")

    Durbin-Watson test

data:  lmfit.CO2
DW = 1.5, p-value = 0.09
alternative hypothesis: true autocorrelation is greater than 0

## grab the p-value from the Durbin-Watson test
objects(dwtest(lmfit.CO2, alternative = "greater"))
[1] "alternative" "data.name"   "method"      "p.value"     "statistic"  
dw.CO2.stat <- dwtest(lmfit.CO2, alternative = "greater")$statistic
dw.CO2.pvalue <- dwtest(lmfit.CO2, alternative = "greater")$p.value

Durbin-Watson statistic is even closer to 2: \(\; d = 1.525\)

But the p-value is still pretty small: p-value = \(0.09133\).
I would like to make one more iteration.

So, we go back to Step 2 with \(\widehat{\beta}_{0} = -225.6\) and \(\widehat{\beta}_{1} = 2.7\) calculated in Step 3 above.

Step 2: Estimate \(\rho\)

The residuals are now \(\quad e_t = y_t - (\widehat{\beta}_{0} + \widehat{\beta}_{1} x_t) = y_t - (-225.6 + 2.7 x_t)\)

## Calculate rhohat "manually"
e <- with(expendData, Expenditure - (b0hat + b1hat * Stock))
et <- e[2:n]
et1 <- e[1:(n-1)]
rhohat2 <- sum(et * et1) / sum(e^2)
rhohat2
[1] 0.7977

I found \(\widehat{\rho} = 0.7977\).

Step 3: Cochrane-Orcutt-transform \(y\) and \(x\) and re-estimate \(\widehat{\beta}_{0}\) and \(\widehat{\beta}_{1}\)

y <- with(expendData, Expenditure - rhohat2 * lag(Expenditure))
x <- with(expendData, Stock - rhohat2 * lag(Stock))

lmfit.CO3 <- lm(y ~ x)
tidy(lmfit.CO3)
term estimate std.error statistic p.value
(Intercept) -46.081 13.7151 -3.360 0.0037
x 2.712 0.3785 7.165 0.0000

b0star <- tidy(lmfit.CO3)$estimate[1]
b1star <- tidy(lmfit.CO3)$estimate[2]

\(\widehat{\beta}_{0}^* = -46.08\)
\(\widehat{\beta}_{1}^* = 2.712\)

b0hat <- b0star / (1 - rhohat2)
b1hat <- b1star
data.frame(b0hat, b0star, b1hat, b1star)
b0hat b0star b1hat b1star
-227.8 -46.08 2.712 2.712

Step 4:


dwtest(lmfit.CO3, alternative = "greater")

    Durbin-Watson test

data:  lmfit.CO3
DW = 1.5, p-value = 0.1
alternative hypothesis: true autocorrelation is greater than 0

## grab the p-value from the Durbin-Watson test
objects(dwtest(lmfit.CO3, alternative = "greater"))
[1] "alternative" "data.name"   "method"      "p.value"     "statistic"  
dw.CO3.stat <- dwtest(lmfit.CO3, alternative = "greater")$statistic
dw.CO3.pvalue <- dwtest(lmfit.CO3, alternative = "greater")$p.value

Durbin-Watson statistic is even closer to 2: \(\; d = 1.543\)

The p-value (from the output above) is now about \(0.10\).

We fail to reject the null of no autocorrelation, so we stop here after three iterations.

NOTE: I’ll show you below how we can let the computer do iterations automatically.

\(\widehat{\beta}_{0} = \displaystyle{\frac{\widehat{\beta}_{0}^*}{1-\widehat{\rho}}} = -227.8\)

\(\widehat{\beta}_{1} = \widehat{\beta}_{1}^* = 2.712\)

Thus, our estimated regression equation is

\(y_t = -227.8 + 2.712 x_t\)

1.0.7.4 How does C-O compare to OLS regression?

Regression coefficient estimates

tidy(lmfit)
term estimate std.error statistic p.value
(Intercept) -154.7 19.8500 -7.794 0
Stock 2.3 0.1146 20.080 0
tidy(lmfit.CO3)
term estimate std.error statistic p.value
(Intercept) -46.081 13.7151 -3.360 0.0037
x 2.712 0.3785 7.165 0.0000

The slopes are similar and \(\mathrm{se}(\widehat{\beta}_{1})\) is larger after transformation.

The intercept and its SE do not get appropriately transformed in the standard regression output above.
We can transform the output ourselves:

se.b0hat <- tidy(lmfit.CO3)$std.error[1] / (1 - rhohat2)
b0hat <- b0star / (1 - rhohat2)
data.frame(b0hat, se.b0hat)
b0hat se.b0hat
-227.8 67.81

\(\widehat{\beta}_{0} = \displaystyle{\frac{\widehat{\beta}_{0}^*}{1-\widehat{\rho}}} = -227.8\)

\(\mathrm{se}(\widehat{\beta}_{0}) \approx \displaystyle{\frac{\mathrm{se}(\widehat{\beta}_{0}^*)}{1-\widehat{\rho}}} = 67.8\)

t-statistic \(\; = \displaystyle{\frac{\widehat{\beta}_{0} - 0}{\mathrm{se}(\widehat{\beta}_{0})}} = -3.36\) as above.

Regression goodness-of-fit statistics

glance(lmfit)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9573 0.9549 3.983 403.2 0 2 18
glance(lmfit.CO3)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.7512 0.7366 2.238 51.34 0 2 17

Durbin-Watson statistic


durbinWatsonTest(lmfit, alternative = "positive")
 lag Autocorrelation D-W Statistic p-value
   1          0.7506        0.3282       0
 Alternative hypothesis: rho > 0


durbinWatsonTest(lmfit.CO3, alternative = "positive")
 lag Autocorrelation D-W Statistic p-value
   1          0.1852         1.543   0.094
 Alternative hypothesis: rho > 0

After the model is corrected for autocorrelation by the Cochrane-Orcutt transformation:

  • DW statistic much closer to 2 (weaker evidence of autocorrelation)

  • Autocorrelation is much smaller.

  • Slope coefficients are similar, but \(\mathrm{se}(\widehat{\beta}_{1})\) is more than 3 times larger!

  • \(R^2\) is smaller.

WARNING:
Ignoring positive autocorrelation often leads to

  • underestimated errors,
  • too many false positives,
  • too narrow and invalid confidence intervals.

Index plots of residuals:

## index plot of regular OLS residuals
p1 <- gf_point(r ~ index, data = expendData, size = 2) %>% 
  gf_line() %>%
  gf_lims(y = c(-2.1, 2.1)) %>%
  gf_labs(x = "Quarter", y = "OLS residuals") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

## index plot of Cochrane-Orcutt residuals after three iterations
p2 <- gf_point(rstandard(lmfit.CO3) ~ 2:n, size = 2) %>%
  gf_line() %>%
  gf_lims(y = c(-2.1, 2.1)) %>%
  gf_labs(x = "Quarter", y = "Cochrane-Orcutt residuals (3 iterations)") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")  

grid.arrange(p1, p2, ncol=2)

1.0.7.5 Using R to iterate for Cochrane-Orcutt transformation

library(orcutt)
cochrane.orcutt(lmfit)

library(orcutt, warn.conflicts = FALSE)
cochrane.orcutt.fit <- cochrane.orcutt(lmfit)

OLS

tidy(lmfit)
term estimate std.error statistic p.value
(Intercept) -154.7 19.8500 -7.794 0
Stock 2.3 0.1146 20.080 0

Several iterations of Cochrane-Orcutt

tidy(cochrane.orcutt.fit)
term estimate std.error statistic p.value
(Intercept) -235.488 78.6122 -2.996 0.0081
Stock 2.753 0.4365 6.308 0.0000

The slope coefficients are similar, but \(\mathrm{se}(\widehat{\beta}_{1})\) is almost 4 times larger!

Comparison of goodness-of-fit statistics:

glance(lmfit)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9573 0.9549 3.983 403.2 0 2 18
glance(cochrane.orcutt.fit)
r.squared adj.r.squared rho number.interaction dw.original p.value.original dw.transformed p.value.transformed
0.7006 0.683 0.8241 13 0.3282 0 1.601 0.1261

Index plots of residuals:

## index plot of regular OLS residuals
p1 <- gf_point(r ~ index, data = expendData, size = 2) %>% 
  gf_line() %>%
  gf_lims(y = c(-2.1, 2.1)) %>%
  gf_labs(x = "Quarter", y = "OLS residuals") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

## index plot of Cochrane-Orcutt residuals after three iterations

p2 <- gf_point(rstandard(lmfit.CO3) ~ 2:n, size = 2) %>%
  gf_line() %>%
  gf_lims(y = c(-2.1, 2.1)) %>%
  gf_labs(x = "Quarter", y = "Cochrane-Orcutt residuals (3 iterations)") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")  

## index plot of Cochrane-Orcutt "standardized" residuals after several iterations
rhohat.orcutt <- cochrane.orcutt.fit$rho
y <- with(expendData, Expenditure - rhohat.orcutt * lag(Expenditure))[-1]
x <- with(expendData, Stock - rhohat.orcutt * lag(Stock))[-1]
b0hat <- cochrane.orcutt.fit$coeff[1] * (1 - rhohat.orcutt)
b1hat <- cochrane.orcutt.fit$coeff[2]
e <- y - b0hat - b1hat * x 
## simple standardization of residuals (z-score)
z <- (e - mean(e)) / sd(e)

p3 <- gf_point(z ~ 2:n, size = 2) %>%
  gf_line() %>%
  gf_lims(y = c(-2.1, 2.1)) %>%
  gf_labs(x = "Quarter", y = paste("Cochrane-Orcutt residuals (", cochrane.orcutt.fit$number.interaction, " iterations)", sep="")) %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")  

grid.arrange(p1, p2, p3, ncol=3)

1.0.8 Omitted Variables

1.0.9 Correlated Errors due to Omitted Variables

Correlated errors can occur when a variable not in the model is related
to the response and related to the time or order in which the data were collected
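A small made-up simulation can illustrate the mechanism (all numbers are invented): when a predictor that varies smoothly over time is left out of the model, the residuals of the model that omits it inherit that smooth variation and look autocorrelated.

## Illustrative sketch (made-up data): an omitted, slowly varying predictor z
## induces autocorrelated residuals in the model that leaves it out.
set.seed(3)
n    <- 40
time <- 1:n
x <- rnorm(n)                    # predictor included in the model
z <- sin(2 * pi * time / 20)     # omitted predictor that varies smoothly in time
y <- 1 + 2 * x + 3 * z + rnorm(n, sd = 0.5)

fit_without_z <- lm(y ~ x)       # z omitted: residuals track z over time
fit_with_z    <- lm(y ~ x + z)   # z included: residuals look random

library(lmtest)
dwtest(fit_without_z)            # typically a small d, strong autocorrelation
dwtest(fit_with_z)               # typically d near 2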

1.0.9.1 Example: Relationship between housing starts and population of potential buyers

From textbook Table 8.4:
A midwestern construction industry association wants to know the relationship between housing starts and population growth. They obtain annual data on regional housing starts and try to relate it to number of potential home buyers in the region.

# houseData <- read.delim("http://statistics.uchicago.edu/~collins/data/RABE5/P219.txt")
houseData <- read.delim("./data/P219.txt")
names(houseData) <- c("houseStarts", "popnSize", "mortgageMoney")

head(houseData)
houseStarts popnSize mortgageMoney
0.0909 2.200 0.0364
0.0894 2.222 0.0334
0.0976 2.244 0.0387
0.0955 2.267 0.0374
0.0968 2.280 0.0406
0.1033 2.289 0.0424
gf_point(houseStarts ~ popnSize, data = houseData, size = 2) %>% 
  gf_labs(y = "Housing Starts") %>%
  gf_labs(x="Population")

lmfit <- lm(houseStarts ~ popnSize, data=houseData)
#plot(fit$residuals)
tidy(lmfit)
term estimate std.error statistic p.value
(Intercept) -0.0609 0.0104 -5.845 0
popnSize 0.0714 0.0042 16.867 0
glance(lmfit)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9252 0.922 0.0041 284.5 0 2 23
## add residuals, standardized residuals, and an index variable to dataset
n <- nrow(houseData) ## sample size
houseData <- mutate(houseData, e = residuals(lmfit), r = rstandard(lmfit), index = 1:n)
## index plot
p1 <- gf_point(r ~ index, data = houseData, size = 2) %>% 
  gf_line() %>%
    gf_labs(title='Fit with population only') %>%
  gf_labs(y = "standardized residuals") %>%
  gf_lims(y = c(min(min(~r, data = houseData), -2.1), max(max(~r, data = houseData), 2.1))) %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")
p1

This residual plot shows clear evidence of autocorrelation!

We can confirm this using a runs test:

houseData <- mutate(houseData, positive = 1 * (r > 0), sign = factor(positive, labels = c("-", "+")))

Signs of the residuals in time order:

\(-----+++++++++++-----+-++\)

runs.test(factor(sign(houseData$r)))

    Runs Test

data:  factor(sign(houseData$r))
Standard Normal = -3, p-value = 0.002
alternative hypothesis: two.sided
runs.fit <- runs.test(factor(sign(houseData$r)))
runs.test.statistic <- runs.fit$statistic
runs.test.pvalue <- runs.fit$p.value

p-value = \(0.0024\).
This indicates we should reject the null hypothesis of no autocorrelation.

We can also calculate the Durbin-Watson statistic:

durbinWatsonTest(lmfit)
 lag Autocorrelation D-W Statistic p-value
   1          0.6511        0.6208       0
 Alternative hypothesis: rho != 0

The residual plot, runs test, and Durbin-Watson statistic
all indicate the presence of autocorrelation.

The researchers further conjectured that the availability of mortgage money for the region might also be a predictor for housing starts, and decided to include it in the model.

Available mortgage money also varies by time.

fit.mortgage <- lm(houseStarts ~ popnSize + mortgageMoney, data = houseData)
tidy(fit.mortgage)
term estimate std.error statistic p.value
(Intercept) -0.0104 0.0103 -1.013 0.322
popnSize 0.0347 0.0064 5.394 0.000
mortgageMoney 0.7605 0.1216 6.255 0.000
glance(fit.mortgage)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9731 0.9706 0.0025 397.6 0 3 22
# add residuals and standardized residuals to dataset
n <- nrow(houseData) # sample size
houseData <- mutate(houseData, e.mortgage = residuals(fit.mortgage), r.mortgage = rstandard(fit.mortgage))
## index plot
p2 <- gf_point(r.mortgage ~ index, data = houseData, size = 2) %>% 
  gf_line() %>%
  gf_labs(title='Fit with population and mortgage money') %>%
  gf_labs(y = "standardized residuals") %>%
  gf_lims(y = c(min(min(~ r.mortgage, data = houseData), -2.1), max(max(~ r.mortgage, data = houseData), 2.1))) %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")
p2

This looks as though the autocorrelation has been fixed.

Let’s check first with a runs test:

houseData <- mutate(houseData, positive.mortgage = 1 * (r.mortgage > 0), sign.mortgage = factor(positive.mortgage, labels = c("-", "+")))

Signs of the residuals in time order:

\(--+--++++-++++-+--++-+++-\)

runs.fit <- runs.test(factor(sign(houseData$r.mortgage)))
runs.test.statistic <- runs.fit$statistic
runs.test.pvalue <- runs.fit$p.value

p-value = \(1\).
This indicates we should not reject the null hypothesis of no autocorrelation.

We can also calculate the Durbin-Watson statistic:

durbinWatsonTest(fit.mortgage)
 lag Autocorrelation D-W Statistic p-value
   1         0.03957         1.852   0.412
 Alternative hypothesis: rho != 0
grid.arrange(p1, p2, ncol=2)

Can we understand what happened?

First, make an added-variable plot
to see how mortgage money affects housing starts
after adjusting both variables for population:

avPlots(fit.mortgage, terms = ~ mortgageMoney, pch=16, xlab="Residuals of mortgage money after regressing on population", ylab="Resid. of housing after regressing on population")

We see that there is a positive relationship:
more mortgage money means more housing, all else being equal.

We can look at how residuals of mortgage money after regressing on population change over time:

fit.m <- lm(mortgageMoney ~ popnSize, data=houseData) # Regress mortgage money on population
houseData <- mutate(houseData, m.on.h.resids = fit.m$residuals)
p3 <- gf_point(m.on.h.resids ~ index, data = houseData, size = 2) %>% 
  gf_line() %>%
  gf_labs(y = "Residuals of mortgage money after regressing on population")%>%
 # gf_lims(y = c(min(min(~r.mortgage, data = house), -2.1), max(max(~r.mortgage, data = house), 2.1))) %>%
  gf_hline(yintercept = c(0), linetype = "dashed")
grid.arrange(p3 , p1, ncol=2)

It seems that in the original fit, with mortgage money omitted,
the model underestimated housing starts at those times when
mortgage money was higher than would have been predicted from population alone.

1.0.10 Seasonality

1.0.11 Autocorrelation: One More Example

The analytic methods examined thus far (DW statistic, runs test) deal with
first order autocorrelation,
meaning serial correlation of adjacent values.

There are cases where autocorrelation may indeed be present,
but with other, perhaps more complex, patterns of time-dependence.

From textbook Table 8.8,
we consider the exports of a company that produces and markets ski equipment in the US.

We hope to summarize a relationship of quarterly sales to some leading economic indicators,
like personal disposable income (PDI) in billions of current dollars.

The initial model is

\[ S_t = \beta_{0} + \beta_{1} PDI_t + \epsilon_t \]

where \(S_t\) is the ski Sales in period \(t\).

The sales period naturally follows time order, and cycles through seasons.

# skiData <- read.delim("http://statistics.uchicago.edu/~collins/data/RABE5/P224.txt")
skiData <- read.delim("./data/P224.txt")
head(select(skiData, Sales, PDI), 8)
Sales PDI
37.0 109
33.5 115
30.8 113
37.9 116
37.4 118
31.6 120
34.0 122
38.1 124

Let’s look at the scatterplot of Sales vs PDI:

gf_point(Sales ~ PDI, data = skiData) %>% 
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "Sales", x = "PDI")

This looks pretty good. A clear linear relationship.

Since the data were collected over time,
we should look for autocorrelation.

Start with an index plot of the Sales variable:

n <- nrow(skiData)
skiData <- mutate(skiData, index = 1:n)
gf_point(Sales ~ index, data = skiData, size = 2) %>% 
  gf_line(color='black') %>%
  gf_labs(title='Ski Sales') %>%
  gf_labs(y = "Sales", x="Index")

The Sales values go up-and-down and up-and-down over time.

We go ahead and fit the simple linear regression model for now to check further for autocorrelation.

ski.fit <- lm(Sales ~ PDI, data = skiData)
skiData <- mutate(skiData, r = rstandard(ski.fit))

tidy(ski.fit)
term estimate std.error statistic p.value
(Intercept) 12.3921 2.539 4.88 0
PDI 0.1979 0.016 12.35 0
glance(ski.fit)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.8006 0.7953 3.019 152.5 0 2 38

Look at an index plot of the residuals:

gf_point(r ~ index , data = skiData) %>% 
  gf_line(color = "black") %>%
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "r", x = "index") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

This makes clear the up-and-down and up-and-down pattern.

What do we see from the Durbin-Watson statistic?

durbinWatsonTest(ski.fit)
 lag Autocorrelation D-W Statistic p-value
   1      -0.0008867         1.968    0.79
 Alternative hypothesis: rho != 0

What?
The autocorrelation is basically zero.
The Durbin-Watson statistic is near 2.

There is no evidence of first-order autocorrelation. There must be something else that explains the pattern.

Were any other variables collected over time?

skiData <- mutate(skiData, theSeason = factor(Season, labels = c('Spring/Summer', 'Fall/Winter')))

head(skiData, 8)
Quarter Sales PDI Season index r theSeason
Q1/64 37.0 109 1 1 1.0516 Fall/Winter
Q2/64 33.5 115 0 2 -0.5679 Spring/Summer
Q3/64 30.8 113 0 3 -1.3633 Spring/Summer
Q4/64 37.9 116 1 4 0.8753 Fall/Winter
Q1/65 37.4 118 1 5 0.5665 Fall/Winter
Q2/65 31.6 120 0 6 -1.5522 Spring/Summer
Q3/65 34.0 122 0 7 -0.8654 Spring/Summer
Q4/65 38.1 124 1 8 0.3970 Fall/Winter

What is the pattern in the data by season?

gf_point(Sales ~ PDI, color = ~ theSeason, data = skiData) %>% 
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "Sales", x = "PDI") %>%
  gf_theme(legend.position = "bottom")

Perhaps two parallel lines are needed (same slope, different intercept by season).

n <- nrow(skiData)
skiData <- mutate(skiData, index = 1:n)
gf_point(Sales ~ index, color = ~ theSeason, data = skiData, size = 2) %>% 
  gf_line(color='black') %>%
  gf_labs(title='Ski Sales') %>%
  gf_labs(y = "Sales", x="Index")

It’s clear that Sales are higher in Fall/Winter and lower in Spring/Summer,
and this information is not accounted for in our model.

Here is the residual plot with seasons indicated:

gf_point(r ~ index , color = ~ theSeason, data = skiData) %>% 
  gf_line(color = "black") %>%
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "r", x = "index") %>%
  gf_theme(legend.position = "bottom")%>%
#  gf_labs(legend=c('Fall and Winter','Spring and summer')) %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

The higher Sales in Fall/Winter lead to consistently higher residuals in these seasons.

This is a form of autocorrelation:
high Sales in Fall are paired with high Sales in the next Winter,
while low Sales in Spring are adjacent to low Sales in Summer.

Reminder: this autocorrelation was NOT detected by the Durbin-Watson test:

durbinWatsonTest(ski.fit)
 lag Autocorrelation D-W Statistic p-value
   1      -0.0008867         1.968   0.818
 Alternative hypothesis: rho != 0

The positive correlation within adjacent Fall/Winter and Spring/Summer quarters
is balanced by a negative correlation between adjacent Winter/Spring and Summer/Fall seasons,
leading to a \(\widehat\rho\) that is close to zero.
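One hedged follow-up: car::durbinWatsonTest() can examine several lags at once via its max.lag argument, which may pick up structure that lag 1 misses (whether it does for these data depends on the data themselves):

## Check autocorrelation at lags 1 through 4 (quarterly data), not just lag 1
durbinWatsonTest(ski.fit, max.lag = 4)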

Also, the linear relationship between Sales and PDI
is different by season (same slope, different intercept).

Let’s try a different approach.
We’ll include Season as an indicator variable, with value 1 in the Fall/Winter.

ski.fit2 <- lm(Sales ~ PDI + Season, data = skiData)
skiData <- mutate(skiData, r2 = rstandard(ski.fit2))

tidy(ski.fit2)
term estimate std.error statistic p.value
(Intercept) 9.5402 0.9748 9.787 0
PDI 0.1987 0.0060 32.915 0
Season 5.4643 0.3597 15.192 0
glance(ski.fit2)[-c(7:10)]
r.squared adj.r.squared sigma statistic p.value df df.residual
0.9724 0.971 1.137 652.9 0 3 37

p1 <- gf_point(r2 ~ index, color = ~theSeason, data = skiData) %>% 
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "r", x="index") %>%
  gf_theme(legend.position="bottom") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

p2 <- gf_point(r2 ~ index, color = ~theSeason, data = skiData) %>% 
  gf_line(color = "black") %>%
  gf_labs(title = 'Ski Sales') %>%
  gf_labs(y = "r", x="index") %>%
  gf_theme(legend.position="bottom") %>%
  gf_hline(yintercept = c(-2, 0, 2), linetype = "dashed")

grid.arrange(p1, p2, ncol=2)

This looks much better.

I’m a little concerned that there may be some other autocorrelation going on:
that residuals are consistently higher in observations 15-30 or so.

But this autocorrelation, if it exists, would be first-order,
and I should be able to capture it with the Durbin-Watson test:

durbinWatsonTest(ski.fit2)
 lag Autocorrelation D-W Statistic p-value
   1         0.09692         1.772   0.326
 Alternative hypothesis: rho != 0

OK, this looks good! No evidence of autocorrelation.

Last, I will consider including an interaction term,
in case the slope differs for Fall/Winter versus Spring/Summer:

ski.fit3 <- lm(Sales ~ PDI * Season, data=skiData)

tidy(ski.fit3)
term estimate std.error statistic p.value
(Intercept) 9.5300 1.3787 6.9122 0.0000
PDI 0.1987 0.0087 22.8773 0.0000
Season 5.4846 1.9397 2.8276 0.0076
PDI:Season -0.0001 0.0122 -0.0106 0.9916

The interaction term isn’t significant, so I can stick with the simpler model.
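An equivalent way to judge the interaction is a nested-model F test comparing the fits already created; with a single added term it reproduces the t test above.

## Nested-model comparison: does adding the PDI:Season interaction help?
anova(ski.fit2, ski.fit3)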