Fall 1995
Instructor: George G. Woodworth
Office: 225D MLH
Telephone: 335-0816 (office)
337-2000 (home)
Office hours: to be announced
E-mail: George-Woodworth@uiowa.edu
Multivariate statistical methods are used to analyze
data in which 1) several variables are observed for each subject
(or case) and 2) the distribution of those variables cannot be
reduced to a univariate distribution. Multiple linear regression,
which has a single response variable, is not multivariate in this sense,
although it is included in some multivariate statistics textbooks. This course is appropriate for students
in statistics and in fields in which statistics is used as a research
tool. Since multivariate analysis requires the use of computers,
students are expected to have some experience with computers,
although instruction will be provided in the statistical packages
used in this course.
This is an applications course; the goal is to provide
students with tools for analyzing and displaying multivariate
data. Students will learn some matrix algebra and will study some
multivariate distributions in order to understand each technique
and to know when it is appropriate to use each particular method,
but there will be no formal mathematical proofs. Students are
expected to have completed courses in basic statistical methods,
experimental design and regression.
Upon successful completion of this course, students
will be able to use statistical packages for multivariate analysis,
including SAS, MINITAB 10, BMDP, and XLISP-STAT. Students will
be able to make appropriate use of multivariate graphics and descriptive
statistics, multivariate general linear models (MANOVA, MGLM,
MGLH), multivariate components of variance (growth models, longitudinal
models), systems of linear equations (path analysis), latent structure
methods (factor analysis, structural equation systems, cluster
analysis), linear and nonlinear classification and discrimination
(discriminant analysis, CART, multinomial logistic regression),
scaling and ordination methods (principal components, canonical
variables, nonmetric scaling), and multivariate smoothing. Real
data and applications from biomedical sciences, geology, law and
justice, engineering, business, and other fields will be used.
Students are expected to purchase or have easy access to the SAS Statistics manuals and the SAS Changes and Enhancements manual. Lecture notes and handouts will be distributed at most lectures; other readings will be placed on reserve in the Mathematics Library, and supplemental reading materials may be made available for downloading via FTP. Brief descriptions of the underlying theory for many methods can be found in the SAS manuals. The instructor will hold office hours and can be reached by e-mail, voice mail, or telephone at home or at the office.
There will be short homework problems and short data analysis assignments; the data analysis assignments will require the use of a computer. Students may submit homework as teams. Grades will be based on homework and on two substantial data analysis projects, each of which will require an individually written report. Students who wish to propose their own projects instead of analyzing the assigned data should contact the instructor for approval.
SAS for Windows is available from the Weeg computing
center under a one-year license at an attractive price with a
modest annual renewal fee. It works best with a CD-ROM drive,
a large hard drive, at least 8 megabytes of RAM, and a fast CPU
(486/50 or faster).
SAS is also available at some ITCs (Nursing and
Business, but not Mathematics) and on VAXA. Registered
students will receive a VAX ID and can access VAXA via TELNET
from any ITC or from home via modem, using communications software
such as Kermit (free) or one of many commercial products. Modem speeds
above 2400 bps are recommended for satisfactory performance.
XLISP-STAT has been installed in the computer lab in B5 MLH; in addition, versions for Macintosh, UNIX, and Windows are available free of charge and can be downloaded via anonymous FTP from STAT.UMN.EDU. If you don't know how to do this, please contact the instructor.
The instructor will be available during posted office
hours or by appointment. He will send hints, tips, and announcements
to students by e-mail. Students may telephone the instructor at
his home (337-2000) or office (335-0816).
Students will find it helpful to obtain an e-mail
address and to check it on a daily basis for announcements, notes,
and hints. Please send the instructor a short message containing
your e-mail address.
Instruction will be provided in the computer packages
used in this course. Students unfamiliar with computing at the
University of Iowa are urged to visit the Weeg computing center
and sign up for appropriate short courses.
The instructor aims to match the pace and contents
of the course to the backgrounds and needs of the students. Therefore,
if you have any problems with the conduct of the course or your
treatment by the instructor, you are urged to bring your concern
to the instructor first. If that is not satisfactory, you may
contact Dr. James Broffitt, the chairman of the Department of
Statistics and Actuarial Science, (14 MLH, 335-0712).
Topics to be Covered
(Each unit will take 1 to 2 weeks; readings and homework assignments will be announced.)
1. Multivariate Graphics
Scatterplot Matrices (XLISP-STAT)
Scatterplot Brushing (XLISP-STAT)
Projections
Spinning (XLISP-STAT)
Principal Axes
Canonical Axes
2. Extensions of Univariate Statistical Concepts
Random Vector
Mean Vector
Variation
Covariance Matrix
Slices and Shadows
Directional Variance
Principal Axes (PRINCOMP)
Canonical Axes (CANDISC)
Concentration and Confidence Ellipsoids
Partial Covariance Matrix
Standardized scores (Mahalanobis distance)
Hotelling's T²
3. Matrix Algebra
Matrix Operations
Inverse and Generalized Inverses
Trace (sum of variances)
Determinant
Singular Value Decomposition and Spectral Decomposition
The SWEEP operator
Kronecker products and rollouts
4. Multivariate Normal and Related Distributions
Linear Forms
Quadratic Forms
Wishart Distribution (Multivariate Chi-Square)
Maximum Root (GCR) Distribution
Matrix Normal Distribution
5. The Multivariate General Linear Model (GLM)
The General Linear Hypothesis
Marginal / Joint / Simultaneous tests and confidence sets
Structured Covariance (MIXED)
6. Multivariate Prediction and Classification
Discriminant Analysis (DISCRIM, CANDISC, STEPDISC)
Multinomial Logistic Regression (IML, CATMOD, BMDPPR)
Classification and Regression Trees (CART)
Neural Nets
7. Multidimensional Scaling
Factor Analysis (FACTOR)
Nonmetric Scaling (MDS)
8. Uncovering Latent Structure
Systems of Linear Equations (CALIS)
Cluster Analysis (ACECLUS, FASTCLUS, CLUSTER)
Correspondence Analysis (CORRESP)
Factor Analysis
9. Coping with Missing Data (BMDPAM, BMDP8D)
Missing at Random
EM Algorithm (IML)
10. Smoothers and Splines (TRANSREG)
11. Discrete Multivariate Distributions
Product of Poisson
Product of Multinomial
Multivariate Hypergeometric
12. Multivariate Categorical Data Analysis (BMDP4F, HILOGLINEAR)