A similar experimental design will be used: test specimens will be mixed using different combinations of Aggregate type and Worker, in a completely randomized design. However, the purpose of the experiment has now changed somewhat. As in the earlier example, we want to compare the Aggregate-type means; but beyond that, we want to learn something about how much variation in tensile strength is attributable to worker variations, and also how much variation is attributable to interactions between workers and the aggregate type in use (some workers may be better at mixing one aggregate than the other). Note that we now have the option of using any number of workers, since we are no longer locked into considering just four particular kneading methods.
Making columns random is equivalent to considering an experiment with 4 randomly chosen workers. We are also changing the inference space for the experiment: instead of making conclusions about four particular workers, we now have a random sample of workers drawn from some population, and we want the conclusions to apply to that whole population of workers.
Expected mean squares

    rows <fixed>           2 levels, 1 df
        EMS = 12*Var{rows} + 3*Var{rows*columns} + Var{Within}
        Denom = MS{rows*columns}
    columns <random>       4 levels, 3 df
        EMS = 6*Var{columns} + 3*Var{rows*columns} + Var{Within}
        Denom = MS{rows*columns}
    rows*columns <random>  3 df
        EMS = 3*Var{rows*columns} + Var{Within}
        Denom = MS{Within}
    Within <random>        3 levels, 16 df
        EMS = Var{Within}
        Denom =

These expected mean squares show the entry ``Denom = MS{rows*columns}'' in the ``rows'' section. What this says is that the F test for rows uses the mean square for rows*columns in the denominator, compared to the fixed-effects case where the Within MS is used as the denominator.
(You can check this if you like: Make columns fixed and get a new display of
the expected mean squares.)
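Written out for this layout (2 aggregate types, 4 workers, 3 specimens per cell), the two versions of the Aggregate-type test are

    columns random:  F = MS{rows} / MS{rows*columns},  df = (1, 3)
    columns fixed:   F = MS{rows} / MS{Within},        df = (1, 16)

so treating workers as random leaves far fewer denominator degrees of freedom in this small layout--one more reason to consider using additional workers.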
The intuition is as follows: if we want to generalize our comparison of Aggregate types (rows) to all workers, then we have to understand that we have data on only 4 workers who (we presume) have been randomly sampled from the population of all workers; the 3 observations in each cell of the design are thus not true replicates; they are dependent subsamples.
Since columns is a random effect, we should enter an error SD for columns that reflects the kind of worker-to-worker variations that we would want to be able to detect. Try changing this value; you'll notice that the power of the test of columns changes, but nothing else does.
The interaction is now a mixed effect--an interaction between a fixed and a random effect. Mixed effects are really not that different from random effects, in that they are sources of random variation. Since the interaction is that of workers and aggregate type, its effect SD should quantify how much variation is due to the possibility that the aggregate-type comparison may vary from one worker to the next. Try varying this SD, and notice that these changes affect the powers of all three tests. That's because the interaction also serves as the error term for the tests of the main effects (recall the EMS table).
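In terms of the EMS table, with 3 observations per cell the denominator EMS for both main-effect tests is 3*Var{rows*columns} + Var{Within}. Raising the interaction SD from 0 to 3.5, for example, adds 3*(3.5)^2 = 36.75 to that denominator; and since Var{rows*columns} also appears in the numerator of its own test, all three powers are affected.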
Suppose, for instance, that we want to be able to detect a situation where the worker-to-worker SD is 3.5 psi (roughly the same as the error SD). Setting the interaction SD to zero, and using 3 specimens per worker per aggregate with 8 workers, we obtain the following results (output of the ``Show Report'' menu item):
Factor          levels
rows             2      fixed
columns          8      random
Within           3      random

Term            df    StdDev    Power
rows             1     5.660    1.000
columns          7     3.500    0.775
rows*columns     7     0        0.0500
Within          32     3.540

Alpha = 0.05

This combination gives us much more than adequate power for our Aggregate-type goals (rows), and reasonable power to detect an SD of 3.5 among workers (columns).
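As a cross-check on these figures, here is a minimal sketch in Python of the power computations that appear to underlie the report. It is not piface itself, and the conventions assumed (the fixed-effect SD is the square root of the sum of squared effects divided by levels minus 1; main-effect tests use MS{rows*columns} as the denominator; tests of random effects are F ratio tests) are inferred from the EMS display above.

    # Sketch of mixed-model power calculations; conventions are assumptions, not piface's code.
    from scipy import stats

    a, b, n = 2, 8, 3                              # rows (fixed), columns (random), reps per cell
    sd_rows, sd_cols, sd_int, sd_e = 5.660, 3.5, 0.0, 3.540
    alpha = 0.05

    var_e, var_int, var_col = sd_e**2, sd_int**2, sd_cols**2
    df_rows, df_cols = a - 1, b - 1
    df_int, df_e = (a - 1) * (b - 1), a * b * (n - 1)

    # rows (fixed): F = MS{rows}/MS{rows*columns}, noncentral F under the alternative
    denom = var_e + n * var_int                    # EMS of rows*columns
    ncp = b * n * (a - 1) * sd_rows**2 / denom
    power_rows = stats.ncf.sf(stats.f.isf(alpha, df_rows, df_int), df_rows, df_int, ncp)

    # columns (random): F = MS{columns}/MS{rows*columns}, a scaled central F under the alternative
    ems_cols = denom + a * n * var_col
    power_cols = stats.f.sf(stats.f.isf(alpha, df_cols, df_int) * denom / ems_cols, df_cols, df_int)

    # rows*columns (random): F = MS{rows*columns}/MS{Within}
    power_int = stats.f.sf(stats.f.isf(alpha, df_int, df_e) * var_e / denom, df_int, df_e)

    print(round(power_rows, 3), round(power_cols, 3), round(power_int, 4))

Run with these settings, it should give values very close to the 1.000, 0.775, and 0.0500 shown in the report.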
Now, dragging the mouse on the ``rows*columns'' bar, we can find the SD that is detectable with 80% power:
Term            df    StdDev    Power
rows             1     5.660    0.895
columns          7     3.500    0.273
rows*columns     7     3.732    0.802
Within          32     3.540

We find that the experiment will be good enough to detect an interaction variation of roughly the same size as the target of 3.5 that we had set for the variation among workers. Further mouse work yields
Term            df    StdDev    Power
rows             1     5.660    0.806
columns          7     3.500    0.216
rows*columns     7     4.405    0.889
Within          32     3.540

showing that even if the interaction SD is as large as 4.4, we still have a big enough experiment to detect the desired Aggregate-type effect.
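(Setting sd_int to 3.732 or 4.405 in the sketch above should reproduce these last two reports, at least approximately.)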
In conclusion, using 8 workers and 3 reps gives us a pretty good experiment for the goals we had in mind. It enables us to detect cases where one of the two sources of variation associated with workers is as large as the experimental error; and it provides the necessary power for comparing aggregate types except when there is very large interaction variation--which in itself would be an important result from the standpoint of process robustness.
You can try other possibilities; for example, using 12 workers and 2 reps (still a total of 48 observations) will increase the power of the main-effects tests but decrease the power of the interaction test. The choice depends on the relative importance of the various objectives.
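Such trade-offs can also be explored in the sketch above: setting b, n = 12, 2 changes the interaction and Within df to 11 and 24, and rerunning with the same SDs illustrates the shift in power described here.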