Instructor: Ronald Thisted
Office: Eckhart 126
Phone: 702-8332 (voice mail)
Email: firstname.lastname@example.org
Weisberg, Sanford (1985). Applied Linear Regression, Second Edition. New York: Wiley. Abbreviation: ALR. [Ordered at Seminary Coop Bookstore]
Chambers, John M., and Hastie, Trevor J., eds. (1992). Statistical Models in S. Pacific Grove: Wadsworth & Brooks/Cole. Abbreviation: SMS.
McCullagh, Peter, and Nelder, John (1989). Generalized Linear Models, Second Edition. London: Chapman & Hall.
Mosteller, Frederick and Tukey, John W. (1977). Data Analysis and Regression: A Second Course in Statistics. Reading: Addison-Wesley.
Rao, C. R. (1973). Linear Statistical Inference and its Applications, Second Edition. New York: Wiley.
Thisted, Ronald A. (1988). Elements of Statistical Computing: Numerical Computation. New York: Chapman & Hall.
1. We shall be using S-Plus for most purposes in this course. This program is
available on the Statistics Department computers, as well as on the Sun
Cluster. If you are not a member of the
Statistics Department, you should use the Sun Cluster
for access to S-Plus and to the data sets used in class. Important: If
you are not using the Statistics Department computers, you are responsible for
obtaining an account on the Sun Cluster (or on another computer system) and
for learning the computing environment there.
2. All of the data sets from Weisberg's book, Applied Linear Regression, are available in the directory /ga/thisted/343 on galton. The file names are of the form ALRnnn, where nnn denotes the three-digit page number on which the data set appears. Note the UPPER CASE letters in the file names. These data are also available on the Sun Cluster in the directory /nfs/quads/q2/rats/343.
3. If you will be working on the Sun Cluster instead of galton, read the document entitled "Using S+ on the Sun Cluster."
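The file-naming convention in item 2 can be sketched in a few lines of code. This is a minimal illustration written in Python rather than S. The helper name alr_path and its on_sun_cluster flag are hypothetical; only the two directories and the ALRnnn pattern come from the notes above.

```python
# Hypothetical helper (not part of S-Plus or the course files): builds the
# path to an ALR data set from the page number on which it appears.
GALTON_DIR = "/ga/thisted/343"              # directory on galton (item 2)
SUN_CLUSTER_DIR = "/nfs/quads/q2/rats/343"  # directory on the Sun Cluster (item 2)


def alr_path(page, on_sun_cluster=False):
    """Return the data-set path for a given ALR page number."""
    if not 1 <= page <= 999:
        raise ValueError("page must be between 1 and 999")
    directory = SUN_CLUSTER_DIR if on_sun_cluster else GALTON_DIR
    # Upper-case "ALR" followed by the page number, zero-padded to 3 digits.
    return "%s/ALR%03d" % (directory, page)


print(alr_path(28))  # prints /ga/thisted/343/ALR028
```

The zero padding matters: a data set on page 28 is stored as ALR028, not ALR28.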
Example. Here are some computations in S to help you get started. The data are from exercise 1.2 in ALR. The command "S UCINIT" should be run once and only once in each directory in which you plan to use S. It sets up a file called .Data, and turns off an invisible file .Audit that would otherwise grow without bound.
galton% mkdir Stat343
galton% cd Stat343
galton% S UCINIT     # This sets up your Stat343 directory (one time only)
/ga/thisted/Stat343/.Data has been created and new S can be run in
... directory /ga/thisted/Stat343
/ga/thisted/Stat343/.Data/.Audit is now null and locked against use.
.. If you need Audit, see S notes on Audit.
galton% Splus -e     # On the Sun Cluster, omit "-e"
Warning: Cannot open audit file     # This is entirely normal if you have set up S correctly
> x <- read.table("/ga/thisted/343/ALR028")
> # On the Sun Cluster, the file name should be "/nfs/quads/q2/~thisted/343/ALR028"
> x
      V1     V2
 1 210.8 29.211
 2 210.2 28.559
 3 208.4 27.972
 ... etc ...
30 181.0 15.919
31 180.6 15.376
> reg.out <- lm(x[,2] ~ x[,1])
> summary(reg.out)

Call: lm(formula = x[, 2] ~ x[, 1])
Residuals:
     Min      1Q   Median     3Q    Max
 -0.6138 -0.2497 -0.09921 0.2636 0.8123

Coefficients:
               Value Std. Error  t value Pr(>|t|)
(Intercept) -64.4128     1.4292 -45.0702   0.0000
     x[, 1]   0.4403     0.0074  59.1431   0.0000

Residual standard error: 0.3563 on 29 degrees of freedom
Multiple R-Squared: 0.9918
F-statistic: 3498 on 1 and 29 degrees of freedom, the p-value is 0

Correlation of Coefficients:
       (Intercept)
x[, 1]      -0.999

> reg.out
Call:
lm(formula = x[, 2] ~ x[, 1])

Coefficients:
 (Intercept)    x[, 1]
   -64.41275 0.4402819

Degrees of freedom: 31 total; 29 residual
Residual standard error: 0.356344
> q()
galton%
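As a reading aid for the regression summary above, the following sketch checks how the printed quantities relate to one another. It is written in Python rather than S and uses only numbers copied from the output; the variable names are illustrative. It relies on three standard facts for a regression with a single predictor: each t value is the coefficient divided by its standard error, the F-statistic is the square of the slope's t value, and R-squared equals F / (F + residual degrees of freedom).

```python
# Numbers copied from the S-Plus output above; variable names are illustrative.
intercept, slope = -64.41275, 0.4402819
t_intercept, t_slope = -45.0702, 59.1431
df_resid = 29

# t value = coefficient / standard error, so the standard errors can be
# recovered from the printed coefficients and t values.
se_intercept = intercept / t_intercept
se_slope = slope / t_slope

# With one predictor, the overall F-statistic is the squared slope t value.
f_stat = t_slope ** 2

# R-squared expressed through F and the residual degrees of freedom.
r_squared = f_stat / (f_stat + df_resid)

print("Std. Error (Intercept): %.4f" % se_intercept)
print("Std. Error x[, 1]:      %.4f" % se_slope)
print("F-statistic:            %.0f" % f_stat)
print("Multiple R-Squared:     %.4f" % r_squared)
```

Up to rounding in the printed output, these values should agree with the Std. Error column, the F-statistic of 3498, and the Multiple R-Squared of 0.9918 reported by summary(reg.out).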