Statistics 220

Homework 8

Due: Wednesday, March 7, 2001 in class


Reading Assignment

Chapters 9, 10, and 11.

Chapter 6 of Statistics with Stata 5 describes linear regression analysis in Stata. Some of the material goes beyond what is required in Statistics 220. Chapter 7 also concerns linear regression, but the topics in this chapter would be the subject of a more advanced course.


Written and Computer Assignment

  1. 9.12, page 648.

  2.  
  3. Read exercise 9.17 on page 649 and the answer in the back of the book.

  4.  
  5. 9.22, page 651.

  6.  
  7. 10.8, 10.9, and 10.10, pages 697-698. If you use the computer, explain how you would get the least-squares line from the summary information, and how you would get the confidence interval from the coefficient estimate, its standard error, and the t-table.

  8.  
      After you get the data into Stata (they are on the web page, in the data set named pisa.dct), start a log file and use some of the commands below.
      To show how you would get the confidence interval from the summary information, you could use the display command to make basic calculations. For example,
      . display 6+1.96*7
      produces the value 19.72, and
      . display 6-1.96*7
      produces the value -7.72.
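
      For reference, the calculations the problem asks you to explain rest on the standard simple-regression identities sketched below. The notation is an assumption made for this sketch and may differ from the textbook's: y is lean, x is year, r is the correlation from corr, the means and standard deviations come from summ, and t* is the t critical value with n - 2 degrees of freedom.

      ```latex
      % Least-squares line from summary information
      \[ b_1 = r\,\frac{s_y}{s_x}, \qquad b_0 = \bar{y} - b_1\,\bar{x}, \qquad \hat{y} = b_0 + b_1 x \]
      % Confidence interval for the slope
      \[ b_1 \pm t^{*}\,\mathrm{SE}_{b_1} \]
      ```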

      . graph lean year, saving(pisa)
      . regr lean year
      . summ lean year
      . corr lean year
      . display invt(11, .975)
      The above command produces the value of the 97.5th percentile from a t distribution with 11 degrees of freedom.
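
      The interval calculation described above can be sketched with display right after the regression. This is only an illustration: it assumes regr lean year has just been run, that Stata's stored results _b[year] (the slope estimate) and _se[year] (its standard error) are available, and that 11 degrees of freedom is correct, as in the invt call above.

      ```stata
      * Sketch only: assumes "regr lean year" has just been run
      * _b[year] is the estimated slope, _se[year] its standard error
      display _b[year] - invt(11, .975)*_se[year]
      display _b[year] + invt(11, .975)*_se[year]
      ```

      The two values can be compared against the interval that regr prints for year.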

      For 10.10, the following commands are useful. Assuming you have already run the regression, type
      . edit
      Enter the (coded) value for 1997 under year.
      Then type
      . list
      and the new value should appear in the listings.
      . predict yhat
      This command makes predictions and stores them in variable yhat. Of course, you could do the prediction by hand.
      . list
      once again should show you the predictions.
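
      Doing the prediction by hand, as mentioned above, might look like the sketch below. It assumes the regression of lean on year has been run, that predict yhat was run after the new row was added, and that the coded 1997 value was entered as the last observation, so that year[_N] refers to it.

      ```stata
      * Sketch only: intercept plus slope times the last (newly entered) year
      display _b[_cons] + _b[year]*year[_N]
      * Compare with the value that predict stored for that observation
      list year yhat in -1
      ```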
       

  9. 10.16, page 700, plus (d) What is a 95% confidence interval for the average in the city when the rural reading is 43?

  10.  
      The data are available as data set particles.dct on the web site. If you are continuing from the previous part, type . clear in Stata to remove the previous data set. After you open the data set (if you have not stopped Stata, your log file should still be open), the following commands might be useful. Notes: (1) Stata represents a missing value by a period ("."). (2) By default, Stata drops observations with missing values from calculations.
      . summ if rural!=. & city!=.
      . regr city rural
      You should compute the CI and PI using the regression output and summary statistics.
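
      For reference, the standard simple-regression formulas for these two intervals are sketched below. The notation is an assumption made for this sketch: x* is the rural reading of interest, s is the root mean squared error from the regr output, n, the mean of x, and s_x come from the summ command above (restricted to observations with both readings present), and t* is the t critical value with n - 2 degrees of freedom.

      ```latex
      % Predicted value at x*
      \[ \hat{y}^{*} = b_0 + b_1 x^{*} \]
      % Confidence interval for the mean response at x*
      \[ \hat{y}^{*} \pm t^{*}\, s \sqrt{\frac{1}{n} + \frac{(x^{*} - \bar{x})^{2}}{(n-1)\,s_x^{2}}} \]
      % Prediction interval for a single new observation at x*
      \[ \hat{y}^{*} \pm t^{*}\, s \sqrt{1 + \frac{1}{n} + \frac{(x^{*} - \bar{x})^{2}}{(n-1)\,s_x^{2}}} \]
      ```

      Here (n-1) s_x^2 equals the sum of squared deviations of x about its mean, which is why the summary statistics are enough.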
       
  11. 10.24 and 10.25, page 703.

  12.  
      The data are in the Homework folder, dataset cornsoy.dct. If you are continuing from the previous problem, type . clear to remove the previous data. If you have not stopped Stata, your log file should still be open. Type . summ to learn the names of the variables. Some general commands are given below that you can modify for this problem.

      . regr y x
      is the regression command; replace y and x with the variable names you need to use.
      . predict yhat
      makes predictions for y based on the previous regression command.
      . predict resid, residuals
      calculates residuals from the regression.
      . plot resid yhat, saving(resid)
      plots residuals versus predicted values and saves the graph.
      . predict seCI, stdp
      calculates standard errors for use in confidence intervals for a mean predicted value and stores them in variable seCI, whereas
      . predict sePI, stdf
      calculates standard errors for use in prediction intervals for a new value and stores them in variable sePI.
      The last row of the data set has a corn value of 100. Type
      . list in 41
      to see this last row.
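
      One way to turn the stored standard errors into intervals for this last row is sketched below. It assumes the regr and predict commands above have been run, and that e(df_r), Stata's stored residual degrees of freedom, is available in your version; tcrit is a name introduced here only for illustration.

      ```stata
      * Sketch only: t critical value for 95% intervals
      scalar tcrit = invt(e(df_r), .975)
      * Confidence interval for the mean response in row 41
      display yhat[41] - tcrit*seCI[41]
      display yhat[41] + tcrit*seCI[41]
      * Prediction interval for a new value in row 41
      display yhat[41] - tcrit*sePI[41]
      display yhat[41] + tcrit*sePI[41]
      ```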
       

  13. In the previous problem, we used the corn-soy data. It may be the case that year and soybean yield together are useful in a model for predicting corn yield. Run this multiple regression using Stata:

      . regr corn year soybean

    (a) What is the equation for this multiple regression?

    (b) What percent of the variation in corn yield is explained by these two variables?

    (c) Compute the regression of corn on soybean (. regr corn soybean). Using this output and the output from the multiple regression, determine the increase in the percent of variation in corn yield explained by the regression on both variables over the percent explained by the regression on soybean alone. Do you think that including year and soybean in the model for predicting corn is significantly better than using soybean alone?

    (d) What is the 95% confidence interval for the coefficient for year (in the multiple regression model)? What does this tell you about the contribution of year to the model when soybean is already included?
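
      The comparison in part (c) can also be read directly from the two regression outputs. A minimal sketch of doing the arithmetic in Stata follows, assuming e(r2) holds the R-squared of the most recent regression in your version; the scalar names r2_soy and r2_both are introduced here only for illustration.

      ```stata
      * Sketch only: store R-squared from each model, then compare
      regr corn soybean
      scalar r2_soy = e(r2)
      regr corn year soybean
      scalar r2_both = e(r2)
      * Increase in the percent of variation explained
      display 100*(r2_both - r2_soy)
      ```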
     

  15. 11.4, page 734, using only the first 39 observations in Table 1.6. We view GPA as the response variable. IQ and self-concept are the explanatory variables.

  16.  
      The data are in the Data folder, dataset gpa2.dct. If you are continuing from the previous problem, type . clear to remove the previous data.