A similar experimental design will be used: test specimens will be mixed using different combinations of Aggregate type and Worker, in a completely randomized design. However, the purpose of the experiment has now changed somewhat. As in the earlier example, we want to compare the Aggregate-type means; but beyond that, we want to learn something about how much variation in tensile strength is attributable to worker variations, and also how much variation is attributable to interactions between workers and the aggregate type in use (some workers may be better at mixing one aggregate than the other). Note that we now have the option of using any number of workers, since we are no longer locked into considering just four particular kneading methods.
Making columns random is equivalent to considering an experiment with 4 randomly chosen workers. We are also changing the inference space for the experiment: instead of making conclusions about four particular workers, we now have a random sample of workers drawn from some population, and we want the conclusions to apply to that whole population of workers.
Expected mean squares

    rows <fixed>           2 levels, 1 df
        EMS = 12*Var{rows} + 3*Var{rows*columns} + Var{Within}
        Denom = MS{rows*columns}
    columns <random>       4 levels, 3 df
        EMS = 6*Var{columns} + 3*Var{rows*columns} + Var{Within}
        Denom = MS{rows*columns}
    rows*columns <random>  3 df
        EMS = 3*Var{rows*columns} + Var{Within}
        Denom = MS{Within}
    Within <random>        3 levels, 16 df
        EMS = Var{Within}
        Denom =

These expected mean squares show the entry ``Denom = MS{rows*columns}'' in the ``rows'' section. What this says is that the F test for rows uses the mean square for rows*columns in the denominator, compared to the fixed-effects case where the Within MS is used as the denominator.
(You can check this if you like: Make columns fixed and get a new display of
the expected mean squares.)
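Written out for this layout (2 aggregate types, 4 workers, 3 specimens per cell), the two versions of the Aggregate-type test are

    columns random:  F = MS{rows} / MS{rows*columns},  df = (1, 3)
    columns fixed:   F = MS{rows} / MS{Within},        df = (1, 16)

so treating workers as random leaves far fewer denominator degrees of freedom in this small layout--one more reason to consider using additional workers.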
The intuition is as follows: if we want to generalize our comparison of Aggregate types (rows) to all workers, then we have to understand that we have data on only 4 workers who (we presume) have been randomly sampled from the population of all workers; the 3 observations in each cell of the design are thus not true replicates; they are dependent subsamples.
Since columns is a random effect, we should enter an error SD for columns that reflects the kind of worker-to-worker variations that we would want to be able to detect. Try changing this value; you'll notice that the power of the test of columns changes, but nothing else does.
The interaction is now a mixed effect--an interaction between a fixed and a random effect. Mixed effects are really not that different from random effects, in that they are sources of random variation. Since the interaction is that of workers and aggregate type, its effect SD should quantify how much variation is due to the possibility that the aggregate-type comparison may vary from one worker to the next. Try varying this SD, and notice that these changes affect the powers of all three tests. That's because the interaction also serves as the error term for the tests of the main effects (recall the EMS table).
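In terms of the EMS table, with 3 observations per cell the denominator EMS for both main-effect tests is 3*Var{rows*columns} + Var{Within}. Raising the interaction SD from 0 to 3.5, for example, adds 3*(3.5)^2 = 36.75 to that denominator; and since Var{rows*columns} also appears in the numerator of its own test, all three powers are affected.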
Suppose, for instance, that we want to be able to detect a situation where the worker-to-worker SD is 3.5 psi (roughly the same as the error SD). Setting the interaction SD to zero, and using 3 specimens per worker per aggregate with 8 workers, we obtain the following results (output of the ``Show Report'' menu item):
Factor          levels
rows             2      fixed
columns          8      random
Within           3      random

Term            df    StdDev    Power
rows             1     5.660    1.000
columns          7     3.500    0.775
rows*columns     7     0        0.0500
Within          32     3.540

Alpha = 0.05

This combination gives us much more than adequate power for our Aggregate-type goals (rows), and reasonable power to detect an SD of 3.5 among workers (columns).
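As a cross-check on these figures, here is a minimal sketch in Python of the power computations that appear to underlie the report. It is not piface itself, and the conventions assumed (the fixed-effect SD is the square root of the sum of squared effects divided by levels minus 1; main-effect tests use MS{rows*columns} as the denominator; tests of random effects are F ratio tests) are inferred from the EMS display above.

    # Sketch of mixed-model power calculations; conventions are assumptions, not piface's code.
    from scipy import stats

    a, b, n = 2, 8, 3                              # rows (fixed), columns (random), reps per cell
    sd_rows, sd_cols, sd_int, sd_e = 5.660, 3.5, 0.0, 3.540
    alpha = 0.05

    var_e, var_int, var_col = sd_e**2, sd_int**2, sd_cols**2
    df_rows, df_cols = a - 1, b - 1
    df_int, df_e = (a - 1) * (b - 1), a * b * (n - 1)

    # rows (fixed): F = MS{rows}/MS{rows*columns}, noncentral F under the alternative
    denom = var_e + n * var_int                    # EMS of rows*columns
    ncp = b * n * (a - 1) * sd_rows**2 / denom
    power_rows = stats.ncf.sf(stats.f.isf(alpha, df_rows, df_int), df_rows, df_int, ncp)

    # columns (random): F = MS{columns}/MS{rows*columns}, a scaled central F under the alternative
    ems_cols = denom + a * n * var_col
    power_cols = stats.f.sf(stats.f.isf(alpha, df_cols, df_int) * denom / ems_cols, df_cols, df_int)

    # rows*columns (random): F = MS{rows*columns}/MS{Within}
    power_int = stats.f.sf(stats.f.isf(alpha, df_int, df_e) * var_e / denom, df_int, df_e)

    print(round(power_rows, 3), round(power_cols, 3), round(power_int, 4))

Run with these settings, it should give values very close to the 1.000, 0.775, and 0.0500 shown in the report.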
Now, dragging the mouse on the ``rows*columns'' bar, we can find the SD that is detectable with 80% power:
Term            df    StdDev    Power
rows             1     5.660    0.895
columns          7     3.500    0.273
rows*columns     7     3.732    0.802
Within          32     3.540

We find that the experiment will be good enough to detect an interaction variation of roughly the same size as the target of 3.5 that we had set for the variation among workers. Further mouse work yields
Term            df    StdDev    Power
rows             1     5.660    0.806
columns          7     3.500    0.216
rows*columns     7     4.405    0.889
Within          32     3.540

showing that even if the interaction SD is as large as 4.4, we still have a big enough experiment to detect the desired Aggregate-type effect.
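(Setting sd_int to 3.732 or 4.405 in the sketch above should reproduce these last two reports, at least approximately.)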
In conclusion, using 8 workers and 3 reps gives us a pretty good experiment for the goals we had in mind. It enables us to detect cases where one of the two sources of variation associated with workers is as large as the experimental error; and it provides the necessary power for comparing aggregate types except when there is very large interaction variation--which in itself would be an important result from the standpoint of process robustness.
You can try other possibilities; for example, using 12 workers and 2 reps (still a total of 48 observations) will increase the power of the main-effects tests but decrease the power of the interaction test. The choice depends on the relative importance of the various objectives.
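Such trade-offs can also be explored in the sketch above: setting b, n = 12, 2 changes the interaction and Within df to 11 and 24, and rerunning with the same SDs illustrates the shift in power described here.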