Copyright 1997, 1998, Jeff Miller.
This program and documentation may be duplicated and used without charge for any educational or noncommercial purposes. Use AnoGen at your own risk. I believe it to be correct, but cannot guarantee the accuracy of the calculations. If you do use this program in your teaching, please send me an acknowledgement letter, once a year or so, saying how you use it (see sample at end of this documentation). If I get enough such letters, I'll put this piece of software on my vita and maybe get some credit for it with the university. Besides, both my kids collect stamps.
For commercial use, please contact the author.
This program was designed for use in teaching the statistical procedure known as Analysis of Variance (ANOVA). In brief, it generates appropriate data sets for use as examples or practice problems, and it computes the correct ANOVA for each data set. It handles between-subjects, within-subjects, and mixed designs, and can go up to six factors, with some restrictions on the numbers of levels, subjects, and model terms.
The program can be run in either of two modes: one designed for use by students; the other, by teachers.
The student mode is simpler: The student simply specifies the experimental design, and AnoGen generates an appropriate random data set. The student can then view the data set and answers, and save them to a file. Thus, students can fairly simply get as much computational practice as they want.
The teacher mode is more complicated: The teacher not only specifies the experimental design, but also controls the cell means and error variance to obtain whatever F values are desired for the example. Considerable familiarity with ANOVA is needed to use this mode.
Note that you can set these numbers in any order, and you can change each one as often as you like. After you have the settings you want, type ctrl-Q to move on to the next step.
The table below shows an example of a problem display. There is one line per subject, and the different groups correspond to the different levels of the between-subjects factor(s). For this example, the problem display fits on a single screen; with larger designs (i.e., more groups or more subjects per group), the problem display may be split across several screens.
Group A1B1:
   Sub 1:    95
   Sub 2:    78
   Sub 3:    97
Group A2B1:
   Sub 1:   -19
   Sub 2:   -37
   Sub 3:   -10
Group A1B2:
   Sub 1:    55
   Sub 2:    64
   Sub 3:    73
Group A2B2:
   Sub 1:    58
   Sub 2:    63
   Sub 3:    71
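To check the answers for a display like this one by hand, here is a minimal Python sketch of the standard equal-n two-factor between-subjects ANOVA, applied to the scores above (A and B levels read from the group labels). The function name and the printed layout are illustrative assumptions, not AnoGen's own code.

```python
from statistics import mean

# Scores from the example problem display, keyed by (A level, B level).
data = {
    (1, 1): [95, 78, 97],
    (2, 1): [-19, -37, -10],
    (1, 2): [55, 64, 73],
    (2, 2): [58, 63, 71],
}

def two_way_between_anova(data):
    """Two-factor between-subjects ANOVA, equal n per cell.

    Returns {source: (df, SS, MS, F)}; F is None for the error term.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    grand = mean(y for ys in data.values() for y in ys)
    cell = {k: mean(ys) for k, ys in data.items()}
    a_mean = {a: mean(cell[(a, b)] for b in b_levels) for a in a_levels}
    b_mean = {b: mean(cell[(a, b)] for a in a_levels) for b in b_levels}

    # Each SS is (scores per level) times the squared deviations of the level means.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_mean.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_mean.values())
    ss_ab = n * sum(
        (cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels for b in b_levels
    )
    # Error SS: deviations of individual scores from their own cell mean.
    ss_err = sum((y - cell[k]) ** 2 for k, ys in data.items() for y in ys)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_err = len(data) * (n - 1)
    ms_err = ss_err / df_err
    table = {}
    for name, ss, df in [("A", ss_a, df_a), ("B", ss_b, df_b), ("AB", ss_ab, df_a * df_b)]:
        table[name] = (df, ss, ss / df, (ss / df) / ms_err)
    table["S(AB)"] = (df_err, ss_err, ms_err, None)
    return table

for source, (df, ss, ms, f) in two_way_between_anova(data).items():
    f_text = "" if f is None else f"{f:8.2f}"
    print(f"{source:6} {df:3d} {ss:9.1f} {ms:9.1f} {f_text}")
```

Every source here is tested against S(AB), the single between-subjects error term.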
The table below shows an example of a problem display for a more complex experimental design, in which factor A is tested within subjects and factors B and C between subjects. Note that the different conditions tested within subjects are listed across the line, and the different subjects and groups are organized as in the between-subjects design.
Group B1C1:
              A1     A2
   Sub 1:     77     53
   Sub 2:     84     56
   Sub 3:    103     41
Group B2C1:
              A1     A2
   Sub 1:     77     65
   Sub 2:     54     64
   Sub 3:     73     69
Group B1C2:
              A1     A2
   Sub 1:    103     75
   Sub 2:    100     78
   Sub 3:     97     57
Group B2C2:
              A1     A2
   Sub 1:     72     10
   Sub 2:     74     18
   Sub 3:     58      2
The solution display has several components, as described below. Some of these components may be omitted if they do not fit well with the way your instructor teaches the material.
Cell:      Mean
u            65
A1           81
A2           49
B1           77
B2           53
A1 B1        94
A1 B2        68
A2 B1        60
A2 B2        38
C1           68
C2           62
A1 C1        78
...
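Each entry in the cell means table is simply the average of every score at the named levels, averaging over all other factors. A minimal Python sketch, using the mixed-design scores shown earlier (A within subjects; B and C between subjects); the `marginal_mean` helper is hypothetical and written only for illustration:

```python
from statistics import mean

# Mixed-design example: each row is one subject's (A1, A2) scores.
# Keys are (B level, C level) for the between-subjects groups.
groups = {
    (1, 1): [(77, 53), (84, 56), (103, 41)],
    (2, 1): [(77, 65), (54, 64), (73, 69)],
    (1, 2): [(103, 75), (100, 78), (97, 57)],
    (2, 2): [(72, 10), (74, 18), (58, 2)],
}

def marginal_mean(a=None, b=None, c=None):
    """Mean of all scores at the given levels; None means 'average over this factor'."""
    scores = [
        row[a_idx]
        for (b_lvl, c_lvl), rows in groups.items()
        if b in (None, b_lvl) and c in (None, c_lvl)
        for row in rows
        for a_idx in range(2)
        if a in (None, a_idx + 1)
    ]
    return mean(scores)

print("u    ", marginal_mean())
print("A1   ", marginal_mean(a=1))
print("A1 B1", marginal_mean(a=1, b=1))
print("A1 C1", marginal_mean(a=1, c=1))
```

Running it for each row reproduces the values in the table above, e.g. u = 65 and A1 B1 = 94.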
This section provides a brief review of the general linear model for ANOVA, intended for teachers who already have some background in this area. Besides refreshing the relevant concepts, this section is also intended to give some hints on how to generate desired patterns of data and to clarify the notation used in the program.
The model underlying ANOVA assumes that each data value (Y) is a sum of additive components due to main effects, interactions, and error. For example, the model for a two-factor between-subjects design is often written as
$$Y_{ijk} = \mu + A_i + B_j + AB_{ij} + S(AB)_{ijk}$$
From a given set of data, ANOVA (implicitly or explicitly) estimates the numerical values of all the terms in the model. For example, these cell means for a 2×2 design:
                    Factor A
Factor B      Level 1    Level 2
Level 1          33         21
Level 2          35         31
yield these estimates for the model's parameters (notation: estimates are written with "hats" above them).
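$$\hat{\mu} = 30 \qquad \hat{A}_1 = 4 \qquad \hat{A}_2 = -4 \qquad \hat{B}_1 = -3 \qquad \hat{B}_2 = 3$$
$$\widehat{AB}_{11} = 2 \qquad \widehat{AB}_{12} = -2 \qquad \widehat{AB}_{21} = -2 \qquad \widehat{AB}_{22} = 2$$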
Each cell mean is the sum of the estimates for that cell:
                              Factor A
Factor B      Level 1                   Level 2
Level 1       33 = 30 + 4 - 3 + 2       21 = 30 - 4 - 3 - 2
Level 2       35 = 30 + 4 + 3 - 2       31 = 30 - 4 + 3 + 2
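The estimates follow from the cell means by simple averaging: the overall mean, then each marginal mean minus the overall mean, then whatever is left over in each cell for the interaction. A minimal Python sketch of that decomposition for the 2×2 example (the `decompose` name is illustrative only):

```python
# Cell means from the 2x2 example, keyed by (A level, B level).
cell = {(1, 1): 33, (2, 1): 21, (1, 2): 35, (2, 2): 31}

def decompose(cell):
    """Estimate mu, A_i, B_j and AB_ij from a complete two-way table of cell means."""
    a_levels = sorted({a for a, _ in cell})
    b_levels = sorted({b for _, b in cell})
    mu = sum(cell.values()) / len(cell)
    # Main effect = marginal mean minus the overall mean.
    a_hat = {a: sum(cell[(a, b)] for b in b_levels) / len(b_levels) - mu for a in a_levels}
    b_hat = {b: sum(cell[(a, b)] for a in a_levels) / len(a_levels) - mu for b in b_levels}
    # Interaction = what remains of each cell mean after mu and both main effects.
    ab_hat = {(a, b): cell[(a, b)] - mu - a_hat[a] - b_hat[b] for (a, b) in cell}
    return mu, a_hat, b_hat, ab_hat

mu, a_hat, b_hat, ab_hat = decompose(cell)
print("mu:", mu)
print("A:", a_hat)
print("B:", b_hat)
print("AB:", ab_hat)
```

Note that each set of estimates sums to zero across its levels, which is why one value per degree of freedom is enough to pin it down.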
Individual observations are formed by these same sums, plus the random error term.
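Going the other way, observations can be built directly from the model terms. A minimal Python sketch for the 2×2 example, with n = 3 subjects per cell; the error values are made up for illustration, and are chosen to sum to zero within each cell so that the cell means stay exactly at their model values:

```python
# Model terms from the 2x2 example.
mu = 30
a_eff = {1: 4, 2: -4}
b_eff = {1: -3, 2: 3}
ab_eff = {(1, 1): 2, (2, 1): -2, (1, 2): -2, (2, 2): 2}

def observations(errors):
    """Y_ijk = mu + A_i + B_j + AB_ij + S(AB)_ijk, for each cell's list of errors."""
    return {
        (a, b): [mu + a_eff[a] + b_eff[b] + ab_eff[(a, b)] + e for e in errs]
        for (a, b), errs in errors.items()
    }

# Hypothetical S(AB) error values (3 subjects per cell, each cell summing to zero).
errors = {(1, 1): [3, -1, -2], (2, 1): [0, 5, -5], (1, 2): [-4, 4, 0], (2, 2): [1, 1, -2]}
ys = observations(errors)
for (a, b), scores in ys.items():
    print(f"A{a}B{b}: {scores}  mean = {sum(scores) / len(scores)}")
```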
It takes a bit of trial and error to generate a desired pattern of cell means by specifying the terms in the model, but this gets easier with a little practice. The basic strategy is to think not in terms of individual cell means but rather in terms of the parameters of the model. Looking at the four cell means for the example above, you must think in terms like these:
The notation for error terms always involves the letter S (for subject). By convention, the program lists the between-subjects factors in parentheses after the S as part of the subjects term. For example, the error term for the example two-factor design would be $S(AB)_{ijk}$, because factors A and B are both between-subjects factors. In repeated-measures designs, there are also various "subject by treatment" interaction error terms, and these are denoted as interactions with the treatment factor(s) listed before the S. For example, an $AS_{ij}$ term would be a subject by treatment (A) interaction term for a one-factor repeated-measures design. In mixed designs the between-subjects factors are always carried along as part of the S term. For example, with factor A a within-subjects factor and factor B a between-subjects factor, the error term for factor A is the $AS(B)_{ijk}$ interaction term.
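The naming convention above is regular enough to state as a tiny function. This is a hypothetical helper written for illustration, not part of AnoGen; it assumes single-letter factor names and applies the rule as described: the effect's within-subjects factors go before the S, and all between-subjects factors go in parentheses after it.

```python
def error_term(effect, within, between):
    """Label of the error term for `effect` ("" for the grand mean u, "A", "AB", ...).

    within / between: strings of single-letter factor names, e.g. "A" and "BC".
    """
    front = "".join(f for f in effect if f in within)   # within factors of the effect
    back = f"({between})" if between else ""            # all between factors, in parentheses
    return front + "S" + back

print(error_term("A", within="", between="AB"))   # two-factor between design
print(error_term("A", within="A", between=""))    # one-factor repeated measures
print(error_term("A", within="A", between="B"))   # mixed design, within factor
print(error_term("B", within="A", between="B"))   # mixed design, between factor
```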
Teacher mode is more complicated to use, and these instructions are not step-by-step. It is assumed that you have already used and understand the Student mode, and that you have some familiarity with the general linear model approach to ANOVA. (Section 5 provides a brief review for those who wish a refresher on this approach to ANOVA.)
To begin, start the program by typing AnoGen, and type T to enter Teacher mode. Next specify the design and the number of levels of each factor just as in the student mode.
Specification of Terms in Linear Model:                  Page 1 of 1

 Num  Source        MS          F      ET     Estimates
 ---- ------  --------  -------     -----  ------------------------->
  0:  u            0.0  1000.00***  S      0
  1:  A            0.0  1000.00***  AS     0
  2:  B            0.0  1000.00***  BS     0  0
  3:  AB           0.0  1000.00***  ABS    0  0
  4:  S            0.0                     random
  5:  AS           0.0                     random
  6:  BS           0.0                     random
  7:  ABS          0.0                     random

Type green number to change estimate(s) for corresponding source, or
type C, N, or P to view Cell means, Next page, or Previous page, or
type ^Q to proceed:
Note: The numbers in the "Num" column at the far left should appear in green.
Each line numbered 0-7 corresponds to one ANOVA source (i.e., one term in the linear model), as identified in the column labelled "Source." The current numerical value(s) corresponding to that term in the model are listed in the "Estimates" column, and the current mean square and F for that term are listed in the MS and F columns. The F value is marked with one asterisk if it is significant at the level of p < .05 and two asterisks if significant at p < .01. The error term used in computing each F is listed in the ET column.
To change the numerical value of a model term, you type the green one-digit number next to it (in the "Num" column). For example, if you want to change the overall mean (u), type 0, and you can then enter a new value for u at the bottom of the screen. The new value you enter will then appear in the "Estimates" column, and the MS and F for that source will be updated using the new value.
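The MS for a term can be worked out directly from its estimates: its SS is the number of observations at each level of the term times the sum of the squared estimates (over all levels, not just the ones shown), and MS = SS/df. A minimal Python sketch for the 2×2 between-subjects example with estimates A = ±4, B = ±3, AB = ±2, assuming n = 3 subjects per cell; the error values are hypothetical, and the helper name is illustrative rather than AnoGen's:

```python
def mean_square(estimates, obs_per_level, df):
    """MS for one model term from its full set of estimates.

    SS = (observations at each level of the term) * (sum of squared estimates),
    and MS = SS / df.
    """
    ss = obs_per_level * sum(e ** 2 for e in estimates)
    return ss / df

# 2x2 between-subjects design with an assumed n = 3 per cell (12 scores in all).
ms_a = mean_square([4, -4], obs_per_level=6, df=1)           # each A level: 2 cells x 3 scores
ms_b = mean_square([-3, 3], obs_per_level=6, df=1)
ms_ab = mean_square([2, -2, -2, 2], obs_per_level=3, df=1)   # each cell: 3 scores
# Hypothetical S(AB) error values, one per score (each cell's three sum to zero).
errors = [3, -1, -2, 0, 5, -5, -4, 4, 0, 1, 1, -2]
ms_err = mean_square(errors, obs_per_level=1, df=8)          # df = 4 cells x (3 - 1)
print(f"F(A) = {ms_a / ms_err:.2f}, F(B) = {ms_b / ms_err:.2f}, F(AB) = {ms_ab / ms_err:.2f}")
```

Doubling every estimate of a term multiplies its MS, and therefore its F, by four, which is a handy rule of thumb when adjusting F values by trial and error.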
Note that there are two numbers listed in the "Estimates" column for the sources B and AB. This is because factor B has three levels and therefore 2 degrees of freedom (and so does the AB interaction, since A has two levels); the first estimate is the value for B1, and the second is the value for B2.
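Because ANOVA effects sum to zero across the levels of each factor, one value per degree of freedom determines the rest: B3 must equal -(B1 + B2), and similarly for the interaction. A minimal Python sketch of that completion step; it assumes (not confirmed here) that the two AB values shown correspond to AB11 and AB12, with rows as A levels and columns as B levels, and the function names are hypothetical:

```python
def complete_main_effect(free):
    """Append the last level's effect so that all levels sum to zero."""
    return list(free) + [-sum(free)]

def complete_interaction(free):
    """Extend an (a-1) x (b-1) table of free AB effects to a full a x b table
    whose rows and columns all sum to zero."""
    rows = [list(r) + [-sum(r)] for r in free]        # close each row
    last_row = [-sum(col) for col in zip(*rows)]      # close each column
    return rows + [last_row]

print(complete_main_effect([5, -2]))     # B1, B2 given; B3 implied
print(complete_interaction([[2, -1]]))   # 2 x 3 design: AB11, AB12 given
```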
Estimates are handled slightly differently for the error terms (indicated by the word "random" under "Estimates"). When you type its "Num" to change one of these, the program will ask you for a "Maximum", and then it will generate random numbers from -Maximum to +Maximum to use as values for these random terms (one per df). The MS is recomputed using the randomly generated terms, and F's are updated as appropriate.
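"One per df" suggests that within each group only n - 1 error values are free. The Python sketch below is a guess at such a scheme, not AnoGen's actual generator: it draws n - 1 integers uniformly from -Maximum to +Maximum and sets the last so that the group's errors sum to zero (which leaves the cell means exactly as specified). Note that this last value can fall outside the ±Maximum range. The group labels, the Maximum of 10, and the seed are arbitrary choices for the example.

```python
import random

def random_errors(n_subjects, maximum, rng):
    """Errors for one group: n-1 free draws in [-maximum, maximum], plus a last
    value chosen so that the group's errors sum to zero."""
    free = [rng.randint(-maximum, maximum) for _ in range(n_subjects - 1)]
    return free + [-sum(free)]

rng = random.Random(1998)   # fixed seed so the example is reproducible
groups = {cell: random_errors(3, 10, rng) for cell in ["A1B1", "A2B1", "A1B2", "A2B2"]}
ss = sum(e * e for errs in groups.values() for e in errs)
df = sum(len(errs) - 1 for errs in groups.values())
print(groups)
print("MS error =", ss / df)
```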
Note that you can inspect the cell means that would be obtained with the current estimates by typing C at any point in this process.
"Next page" and "Previous page" come into play when the model has more than 10 sources, so the terms are listed across multiple screens.
When you are happy with the cell means and F's, type ctrl-Q to continue on to the next phase.
After the data have been generated, you can look at the results and save them to disk. (As in student mode, printing is accomplished by saving output to a file and then printing that file from outside AnoGen.) Here, you have more flexibility than the student. The following display lists your possible actions at each point:
which of the following actions do you want?
-------------------------------------------
write Raw data
write Cell means
write Anova table
write Factor list
write Model
write Estimation equations
write Decomposition matrix
write effecTs codes
write Notepad line
write Input file
set Options

to select an action, type its capitalized letter, or ^Q to proceed :
Select the desired action by typing the capital letter in its description (e.g., "F" to display the factor list).
Seven of the write actions (i.e., Raw data, Cell means, Anova table, Factor list, Model, Estimation equations, and Decomposition matrix) simply display the same sorts of information described in connection with the student mode.
Three rarely used actions are:
ftp://garbo.uwasa.fi/pc/stat/Mrf1_0.zip
http://www.simtel.net/pub/simtelnet/msdos/statstcs/Mrf1_0.zip
ftp://ftp.simtel.net/pub/simtelnet/msdos/statstcs/Mrf1_0.zip
In teacher mode, you can also set several program options, as listed here:
OUTPUT OPTION:                   CURRENT SETTING:
--------------                   ----------------
P = decimal Places             : 0
F = disk File                  : Closed
D = output Destination         : Screen
W = Widths of terms            : variable
O = Output format              : ASCII

Type capital letter to select option to be changed, or ^Q to proceed:
When you choose the P option, you can type in a number 0, 1, 2, 3, ... to indicate how many decimal places your data values should have. Then, the decimal point is shifted that many places to the left in all of the data values previously generated. For example, if you specified a mean of m = 150 when you were constructing the data and then use the P option to indicate that you want two decimal places, then the new mean after decimal point adjustment will be m = 1.50. The sums of squares will be adjusted accordingly.
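The P option amounts to dividing every data value by 10 to the power P. Every sum of squares then shrinks by a factor of 10 to the power 2P; because numerator and denominator mean squares shrink by the same factor, the F values are unchanged. A minimal Python sketch with made-up scores (exact fractions are used only to avoid rounding noise):

```python
from fractions import Fraction
from statistics import mean

def shift_decimal(values, places):
    """Move the decimal point `places` positions to the left in every value."""
    return [Fraction(v, 10 ** places) for v in values]

def sum_of_squares(values):
    """Sum of squared deviations from the mean."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

raw = [150, 162, 138, 150]       # hypothetical scores with mean 150
scaled = shift_decimal(raw, 2)   # P = 2
print(float(mean(scaled)))       # 1.5
print(sum_of_squares(raw), float(sum_of_squares(scaled)))
```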
When you are done viewing the generated data, you have several options described in this screen:
Restart from:
   1: Numbers of factors and subjects.
   2: Numbers of levels of each factor.
   3: Estimates of variance sources.
   4: More output of same data.
or ^Q to exit program :
You can actually quit the program with control-Q, or you can restart it from various points if you want to do some more work, as follows:
There are a few AnoGen options that I expect any given user will always want to set the same way. To facilitate this, these options are set on the command line or in an environment variable. Once you determine how you would like the options to be set, you can set them and forget them in either of two ways, described in detail below.
C:\STATS> AnoGen CORRECTED-
To simplify the output for your students, you can turn off any combination of three types of solution output (the model, the estimation equations, and the decomposition matrix) by specifying the desired combination of these three parameters on the command line:
C:\STATS> AnoGen MODEL- ESTEQ- DECOMP-
These parameters can be specified in any combination and any order, with or without the option CORRECTED-, and they are not case sensitive.
AnoGen corrected- esteq- decomp-
Then, at your command prompt you would type MyAnoGen instead of AnoGen, and AnoGen would be started with the appropriate settings.
Alternatively, you can use an environment variable to invoke your preferred options. Inside your autoexec.bat file, include a line like
SET ANOGEN= corrected- esteq- decomp-
to select the options you like. Then you can just invoke AnoGen at the command line without any parameters, and these parameters will be taken as if they had been typed at the command line.
If both environment variables and command-line parameters are used, the latter take precedence.
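The option handling described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not AnoGen's code: tokens are case-insensitive and may come in any order; a trailing "-" switches an option off; unrecognized tokens are silently ignored; and "the latter take precedence" is read as "if any command-line parameters are given, the ANOGEN variable is not consulted at all".

```python
import os
import sys

# Option names from the examples above; True = default setting, False = switched off.
DEFAULTS = {"CORRECTED": True, "MODEL": True, "ESTEQ": True, "DECOMP": True}

def parse_options(tokens):
    """Apply tokens like 'esteq-' to the defaults (case-insensitive, any order)."""
    settings = dict(DEFAULTS)
    for tok in tokens:
        name = tok.upper()
        if name.endswith("-") and name[:-1] in settings:
            settings[name[:-1]] = False
    return settings

def effective_options(argv, environ):
    """Use command-line parameters if any were given; otherwise fall back to ANOGEN."""
    tokens = argv if argv else environ.get("ANOGEN", "").split()
    return parse_options(tokens)

print(effective_options(sys.argv[1:], os.environ))
```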
I don't want much, just some feedback on who is using AnoGen and what they are using it for. Something like the letter shown below would be fabulous. I'd prefer a real signed letter on paper, but acknowledgements by email would be better than nothing (email address: miller@otago.ac.nz). Of course, I would also welcome bug reports and suggestions for improvement, although I can't promise any fast action on those. Don't forget what you paid for this!
Prof Jeff Miller
Department of Psychology
Univ of Otago
Dunedin, New Zealand
Dear Prof Miller,
This is to acknowledge that I have used the computer program AnoGen for teaching/studying the analysis of variance during the past year.
Include all of the following that apply, and any other uses that I haven't thought of: I have used it for generating practice problems, homework assignments, exam questions, for myself, for my class of 50 upper-division Psychology students at the University of ...
Sincerely,
etc