Chapter 6 of Statistics with Stata 5 describes linear regression analysis in Stata. Some of the material goes beyond what is required in Statistics 220. Chapter 7 also concerns linear regression, but the topics in that chapter would be the subject of a more advanced course.
. graph lean year, saving(pisa)
. regr lean year
. summ lean year
. corr lean year
. display invt(11, .975)
The above command produces the value of the 97.5th percentile from
a t distribution with 11 degrees of freedom.
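As a rough sketch of how this value is used, assuming the 11 degrees of
freedom are the residual degrees of freedom from the regression of lean
on year run above, and that invt() returns the 97.5th percentile as
described, the endpoints of a 95% confidence interval for the slope
could be displayed with
. display _b[year] - invt(11, .975)*_se[year]
. display _b[year] + invt(11, .975)*_se[year]
Here _b[year] and _se[year] refer to the estimated coefficient of year
and its standard error from the most recent regression. The results
should agree with the confidence interval printed in the regr output.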
For 10.10, the following commands are useful. Assuming you have already run
the regression, type
. edit
Enter the (coded) value for 1997 under year.
Then type
. list
and the new value should appear in the listings.
. predict yhat
This command makes predictions and stores them in variable yhat.
Of course, you could do the prediction by hand.
. list
once again should show you the predictions.
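As a sketch of the by-hand calculation mentioned above, the fitted line
can be evaluated directly from the stored coefficients. The value 97
below is only a placeholder and an assumption about how year is coded;
replace it with the coded value you entered for 1997.
. display _b[_cons] + _b[year]*97
Here _b[_cons] is the estimated intercept and _b[year] the estimated
slope from the most recent regression, so the result should match the
value of yhat in the new row.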
. regr y x
is the regression command; replace y and x with the names of the
variables you need to use.
. predict yhat
makes predictions for y based on the previous regression statement.
. predict resid, residuals
calculates residuals from the regression.
. graph resid yhat, saving(resid)
plots residuals versus predicted values and saves the graph.
. predict seCI, stdp
calculates standard errors for use in confidence intervals for a mean
predicted value and stores them in variable seCI, whereas
. predict sePI, stdf
calculates standard errors for use in prediction intervals for a new
value and stores them in variable sePI.
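As a sketch of how these standard errors might be used, assuming the
regression has 11 residual degrees of freedom (replace 11 with n - 2
for a simple regression on n observations), 95% intervals could be
built as follows. The variable names lowCI, highCI, lowPI, and highPI
are made up for this illustration.
. generate lowCI = yhat - invt(11, .975)*seCI
. generate highCI = yhat + invt(11, .975)*seCI
gives the endpoints of a 95% confidence interval for the mean response
at each value of x, and
. generate lowPI = yhat - invt(11, .975)*sePI
. generate highPI = yhat + invt(11, .975)*sePI
gives the endpoints of a 95% prediction interval for a new observation.
Because sePI is never smaller than seCI, the prediction interval is at
least as wide as the confidence interval.
. list yhat lowCI highCI lowPI highPI
shows the results.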
The last row of the data set has a corn value of 100. Type
. list in 41
to see this last row.
. regr corn year soybean
(a) What is the equation for this multiple regression?
(b) What percent of the variation in corn yield is explained by these two variables?
(c) Compute the regression of corn on soybean (. regr corn soybean). Using this output and the output from the multiple regression, determine how much the percent of variation in corn yield explained by the regression on both variables exceeds the percent explained by the regression on soybean alone. Do you think that including both year and soybean in the model for predicting corn is significantly better than using soybean alone?
(d) What is the 95% confidence interval for the coefficient for year
(in the multiple regression model)? What does this tell you about
the contribution of year to the model when soybean is already
included?