Statistics 220

Homework 8

Due: Wednesday, March 7, 2001, in class

Reading Assignment

Chapters 9, 10, and 11.

Chapter 6 of Statistics with Stata 5 describes linear regression analysis in Stata. Some of the material goes beyond what is required in Statistics 220. Chapter 7 also concerns linear regression, but the topics in this chapter would be the subject of a more advanced course.

Written and Computer Assignment

9.12, page 648.

Read exercise 9.17 on page 649 and the answer in the back of the book.

9.22, page 651.

10.8, 10.9, and 10.10, pages 697-8. If you use the computer, explain how you would get the least-squares line from the summary information, and the confidence interval from the coefficient estimate, the standard error of that estimate, and the t-table.

pisa.dct

. display 6+1.96*7

. display 6-1.96*7

. graph lean year, saving(pisa)
. regr lean year
. summ lean year
. corr lean year
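. display invt(11, .95)
This command displays the 97.5th percentile of a t distribution with 11 degrees of freedom, which is the critical value for a two-sided 95% confidence interval. (In Stata 5, invt(df, p) returns the two-sided critical value whose central probability is p, so p = .95 gives the 97.5th percentile.)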
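
The confidence interval for a coefficient is the estimate plus or minus t* times its standard error, where t* is the 97.5th percentile of t with 11 degrees of freedom (2.201 in a t-table). A minimal Python sketch of that arithmetic; the function name and the coefficient and standard error values are hypothetical, not taken from the Pisa output.

```python
def coefficient_ci(estimate, std_error, t_star):
    """Return (lower, upper) for estimate +/- t_star * std_error."""
    margin = t_star * std_error
    return estimate - margin, estimate + margin


# Hypothetical coefficient and standard error, for illustration only.
b, se_b = 2.0, 0.5
t_star = 2.201  # 97.5th percentile of t with 11 df, read from a t-table
low, high = coefficient_ci(b, se_b, t_star)
print(f"95% CI for the slope: ({low:.4f}, {high:.4f})")
```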

For 10.10, here are some useful commands, assuming you have already run the regression:
. edit
Enter the (coded) value for 1997 in the year column of a new row.
Then type
. list
and the new value should appear in the listings.
. predict yhat
This command makes predictions and stores them in variable yhat. Of course, you could do the prediction by hand.
. list
once again should show you the predictions.
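
Doing the prediction by hand means plugging the new x value into the fitted line, y_hat = a + b * x_new, which is what predict does for each row. A minimal Python sketch; the coefficients and the coded x value are hypothetical, not the Pisa regression output.

```python
def predict(intercept, slope, x_new):
    """Fitted value y_hat = a + b * x_new from a simple linear regression."""
    return intercept + slope * x_new


# Hypothetical coefficients and coded x value, for illustration only.
a, b = 0.09, 1.97
x_new = 7
y_hat = predict(a, b, x_new)
print(f"predicted value at x = {x_new}: {y_hat:.2f}")
```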

10.16, page 700, plus part (d): What is a 95% confidence interval for the mean city reading when the rural reading is 43?

particles.dct

. clear

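. summ if rural!=. & city!=.
The if condition restricts the summary to observations where both readings are present, which are the same observations that regr uses.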

. regr city rural
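
For part (d), the interval for the mean response at x* = 43 is y_hat ± t* · s · sqrt(1/n + (x* - xbar)^2 / Sxx), where s is the root MSE from regr, Sxx = (n - 1) * s_x^2 comes from summ, and t* has n - 2 degrees of freedom. The standard error inside the interval is the quantity Stata's predict ..., stdp reports. A Python sketch under those assumptions; the function name and all inputs are hypothetical, not the particles data.

```python
from math import sqrt


def mean_response_ci(a, b, x_star, n, x_bar, s_x, root_mse, t_star):
    """CI for the mean response at x_star in simple linear regression.

    a, b: intercept and slope; n, x_bar, s_x: count, mean and SD of x;
    root_mse: residual standard deviation s; t_star: t quantile with n - 2 df.
    """
    y_hat = a + b * x_star
    sxx = (n - 1) * s_x ** 2  # sum of (x_i - x_bar)^2
    se_mean = root_mse * sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
    margin = t_star * se_mean
    return y_hat - margin, y_hat + margin


# Hypothetical inputs; 2.069 is the t-table 97.5th percentile for 23 df (n = 25).
low, high = mean_response_ci(a=5.0, b=1.5, x_star=43, n=25, x_bar=40.0,
                             s_x=5.0, root_mse=10.0, t_star=2.069)
print(f"95% CI for the mean response at x = 43: ({low:.3f}, {high:.3f})")
```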

10.24 and 10.25, page 703.

cornsoy.dct

. clear

. summ

. regr y x
is the regression command; replace y and x with the names of the variables you need.
. predict yhat
makes predictions for y based on the previous regression command.
. predict resid, residuals
calculates residuals from the regression.
. plot resid yhat, saving(resid)
plots residuals versus predicted values and saves the graph.
. predict seCI, stdp
calculates standard errors for use in confidence intervals for a mean predicted value and stores them in variable seCI, whereas
. predict sePI, stdf
calculates standard errors for use in prediction intervals for a new value and stores them in variable sePI.
The last row of the data set has a corn value of 100. Type
. list in 41
to see this last row.
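
The two standard errors differ only by the residual variance: the prediction-interval standard error satisfies sePI^2 = seCI^2 + s^2, where s is the root MSE, so a prediction interval is always wider than the confidence interval at the same x. Residuals are observed minus fitted values. A Python sketch of both ideas; the function name ci_and_pi, the values, and the placeholder t* are hypothetical.

```python
from math import sqrt


def ci_and_pi(y_hat, se_ci, root_mse, t_star):
    """Return (CI for the mean response, PI for a new value) at one x.

    se_ci plays the role of stdp; the prediction standard error adds the
    residual variance: se_pi**2 = se_ci**2 + root_mse**2 (the role of stdf).
    """
    se_pi = sqrt(se_ci ** 2 + root_mse ** 2)
    ci = (y_hat - t_star * se_ci, y_hat + t_star * se_ci)
    pi = (y_hat - t_star * se_pi, y_hat + t_star * se_pi)
    return ci, pi


# Residuals: observed minus fitted values (hypothetical numbers).
y = [10.0, 12.5, 9.0]
y_fit = [10.5, 12.0, 9.25]
resid = [obs - fit for obs, fit in zip(y, y_fit)]

# Hypothetical inputs; t_star is a placeholder for the t-table value with n - 2 df.
ci, pi = ci_and_pi(y_hat=100.0, se_ci=3.0, root_mse=4.0, t_star=2.0)
print("residuals:", resid)
print("95% CI for the mean:", ci)
print("95% PI for a new value:", pi)
```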

In the previous problem, we used the corn and soybean data. It may be the case that year and soybean yield together are useful in a model for predicting corn yield. Run this multiple regression using Stata.

. regr corn year soybean

(a) What is the equation for this multiple regression?

(b) What percent of the variation in corn yield is explained by these two variables?

(c) Compute the regression of corn on soybean (. regr corn soybean). Using this output and the output from the multiple regression, determine the increase in the percent of variation in corn yield explained by the regression on both variables over the percent explained by the regression on soybean alone. Do you think that including both year and soybean in the model for predicting corn is significantly better than using soybean alone?

(d) What is the 95% confidence interval for the coefficient for year (in the multiple regression model)? What does this tell you about the contribution of year to the model when soybean is already included?
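
One way to judge "significantly better" in (c) is the partial F statistic, F = (R^2_full - R^2_reduced) / ((1 - R^2_full) / (n - 3)) when one variable (year) is added to a model that already has soybean. With a single added variable, F equals the square of year's t statistic in the multiple regression, so this is the same test as asking whether the 95% interval in (d) excludes 0. A Python sketch; the function name, the R^2 values, and n are hypothetical, not the corn/soybean output.

```python
def r2_increase_and_partial_f(r2_full, r2_reduced, n, k_full, q=1):
    """Increase in R^2 (percentage points) and the partial F statistic.

    r2_full: R^2 of the model with all k_full explanatory variables.
    r2_reduced: R^2 of the model with q of them dropped.
    F has q and n - k_full - 1 degrees of freedom.
    """
    increase = 100 * (r2_full - r2_reduced)
    df_error = n - k_full - 1
    f_stat = ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_error)
    return increase, f_stat


# Hypothetical R^2 values and sample size, for illustration only.
increase, f_stat = r2_increase_and_partial_f(r2_full=0.80, r2_reduced=0.70,
                                             n=23, k_full=2)
print(f"R^2 increase: {increase:.1f} percentage points, partial F = {f_stat:.2f}")
```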

11.4, page 734, using only the first 39 observations in Table 1.6. We view GPA as the response variable. IQ and self-concept are the explanatory variables.

gpa2.dct

. clear