22S:161 Applications of Multivariate Statistical Analysis

Fall 1995

Instructor: George G. Woodworth
Office: 225D MLH
Telephone: 335-0816 (office)
337-2000 (home)
Office hours: to be announced

E-mail: George-Woodworth@uiowa.edu

Introduction

Multivariate statistical methods are used to analyze data in which (1) several variables are observed for each subject (or case) and (2) the distribution of those variables cannot be reduced to a univariate distribution. By this definition, multiple linear regression is not multivariate, although it is included in some multivariate statistics textbooks. This course is appropriate for students in statistics and in fields in which statistics is used as a research tool. Since multivariate analysis requires the use of computers, students are expected to have some experience with computers, although instruction will be provided in the statistical packages used in this course.

Course Philosophy

This is an applications course; the goal is to provide students with tools for analyzing and displaying multivariate data. Students will learn some matrix algebra and will study some multivariate distributions in order to understand each technique and to know when each method is appropriate, but there will be no formal mathematical proofs. Students are expected to have completed courses in basic statistical methods, experimental design, and regression.

Course Objectives

Upon successful completion of this course, students will be able to use statistical packages for multivariate analysis, including SAS, MINITAB 10, BMDP, and XLISP-STAT. Students will be able to make appropriate use of multivariate graphics and descriptive statistics, multivariate general linear models (MANOVA, MGLM, MGLH), multivariate components of variance (growth models, longitudinal models), systems of linear equations (path analysis), latent structure methods (factor analysis, structural equation systems, cluster analysis), linear and nonlinear classification and discrimination (discriminant analysis, CART, multinomial logistic regression), scaling and ordination methods (principal components, canonical variables, nonmetric scaling), and multivariate smoothing. Real data and applications from biomedical sciences, geology, law and justice, engineering, business, and other fields will be used.

Sources of Information

Students are expected to purchase or have easy access to SAS Statistics manuals and the SAS Changes and Enhancements manual. Lecture notes and handouts will be distributed at most lectures; other readings will be placed on reserve in the Mathematics library, and supplemental reading materials may be made available for downloading via FTP. Brief descriptions of the underlying theory for many methods can be found in SAS manuals. The instructor will have office hours and can be reached by e-mail, voice-mail, or telephone at home or office.

Grading

There will be short homework problems and short data analysis assignments. Data analysis assignments will require the use of a computer. Students may submit homework as teams. Grades will be based on homework and two substantial data analysis projects; each project will require an individual written report. Students who wish to propose their own projects instead of analyzing the assigned data should contact the instructor for approval.

Computing

SAS for Windows is available from the Weeg computing center under a one-year license at an attractive price with a modest annual renewal fee. It works best with a CD-ROM drive, a large hard drive, at least 8 megabytes of RAM, and a fast CPU (a 50 MHz 486 or faster).

SAS is also available at some ITCs (Nursing and Business, but not the Math ITC) and on VAXA. Registered students will receive a VAX ID and can access VAXA via TELNET from any ITC, or from home via modem using communications software such as Kermit (free) or one of many commercial products. Modem speeds above 2400 bps are recommended for satisfactory performance.

XLISP-STAT has been installed in the computer lab in B5 MLH. Versions for Macintosh, UNIX, and Windows are also available free and can be downloaded via anonymous FTP from STAT.UMN.EDU. If you don't know how to do this, please contact the instructor.

How to get help

The instructor will be available during posted office hours or by appointment. He will send hints, tips and announcements to students by e-mail. Students may telephone the instructor at his home (337-2000) or office (335-0816).

Students will find it helpful to obtain an e-mail address and to check it daily for announcements, notes, and hints. Please send the instructor a short message containing your e-mail address.

Instruction will be provided in the computer packages used in this course. Students unfamiliar with computing at the University of Iowa are urged to visit the Weeg computing center and sign up for appropriate short courses.

How to resolve conflicts

The instructor aims to match the pace and contents of the course to the backgrounds and needs of the students. Therefore, if you have any problems with the conduct of the course or your treatment by the instructor, you are urged to bring your concern to the instructor first. If that is not satisfactory, you may contact Dr. James Broffitt, the chairman of the Department of Statistics and Actuarial Science (14 MLH, 335-0712).

Topics to be Covered

(Each unit will take 1 to 2 weeks; readings and homework assignments will be announced.)

1. Multivariate Graphics

Scatterplot Matrices (XLISP-STAT)
Scatterplot Brushing (XLISP-STAT)
Projections
Spinning (XLISP-STAT)
Principal Axes
Canonical Axes

2. Extensions of Univariate Statistical Concepts

Random Vector
Mean Vector
Variation
Covariance Matrix
Slices and Shadows
Directional Variance
Principal Axes (PRINCOMP)
Canonical Axes (CANDISC)
Concentration and Confidence Ellipsoids
Partial Covariance Matrix
Standardized Scores (Mahalanobis Distance)
Hotelling's T^2

3. Mathematics of Multivariate Analysis

Matrix Algebra
Matrix Operations
Inverse and Generalized Inverses
Trace (sum of variances)
Determinant
Singular Value Decomposition and Spectral Decomposition
The SWEEP Operator
Kronecker Products and Rollouts

4. Multivariate Normal and Related Distributions

Linear Forms
Quadratic Forms
Wishart Distribution (Multivariate Chi-Square)
Maximum Root (GCR) Distribution
Matrix Normal Distribution

5. The Multivariate General Linear Model (GLM)

The General Linear Hypothesis
Marginal / Joint / Simultaneous Tests and Confidence Sets
Structured Covariance (MIXED)

6. Multivariate Prediction and Classification

Discriminant Analysis (DISCRIM, CANDISC, STEPDISC)
Multinomial Logistic Regression (IML, CATMOD, BMDPPR)
Classification and Regression Trees (CART)
Neural Nets

7. Multidimensional Scaling

Factor Analysis (FACTOR)
Nonmetric Scaling (MDS)

8. Uncovering Latent Structure

Systems of Linear Equations (CALIS)
Cluster Analysis (ACECLUS, FASTCLUS, CLUSTER)
Correspondence Analysis (CORRESP)
Factor Analysis

9. Coping with Missing Data (BMDPAM, BMDP8D)

Missing at Random
EM Algorithm (IML)

10. Smoothers and Splines (TRANSREG)

11. Discrete Multivariate Distributions

Product of Poisson
Product of Multinomial
Multivariate Hypergeometric

12. Multivariate Categorical Data Analysis (BMDP4F, HILOGLINEAR)