Friday, May 12, 2017

Two-group mean difference (t-test) and ANOVA

A t-test may examine gender differences in average salary or racial (white versus black) differences in average annual income.
Dependent variable: an interval or ratio variable (e.g., a Likert-scale score treated as interval).
Independent variable: a binary categorical variable that classifies observations into exactly two groups (e.g., male vs. female), coded 0 or 1.

The numbers of observations in the two groups do not need to be equal.

While the t-test is limited to comparing means of two groups, one-way ANOVA can compare more than two groups. Therefore, the t-test is considered a special case of one-way ANOVA.
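
The "special case" relationship can be checked numerically: with exactly two groups, the one-way ANOVA F statistic equals the square of the pooled-variance (equal variances assumed) t statistic. The sketch below uses only the Python standard library; the data and the function names `pooled_t` and `anova_f` are made up for illustration, and the identity does not hold for the unequal-variance (Welch) version of the t-test.

```python
from statistics import mean

def pooled_t(a, b):
    """Student's t statistic for two independent groups (equal variances assumed)."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def anova_f(groups):
    """One-way ANOVA F statistic for any number of groups."""
    values = [x for g in groups for x in g]
    grand_mean = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical data; the group sizes differ on purpose.
group_a = [52.0, 48.5, 61.0, 55.2, 47.3]
group_b = [44.1, 50.0, 42.7, 46.9, 41.5, 45.8, 43.2]

t = pooled_t(group_a, group_b)
f = anova_f([group_a, group_b])
print(t ** 2, f)  # the two values agree up to floating-point rounding
```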

The one-way analysis of variance (ANOVA) is used to determine whether there are any statistically significant differences between the means of two or more independent (unrelated) groups (although you tend to only see it used when there are a minimum of three, rather than two groups).

For example, you could use a one-way ANOVA to understand whether exam performance differed based on test anxiety levels amongst students, dividing students into three independent groups (e.g., low-, medium- and high-stressed students). Also, it is important to realize that the one-way ANOVA is an omnibus test and cannot tell you which specific groups were statistically significantly different from each other; it only tells you that at least two groups were different. Since you may have three, four, five or more groups in your study design, determining which of these groups differ from each other is important. You can do this using a post hoc test.

The t-test assumes that samples are randomly drawn from normally distributed populations with unknown population means. Otherwise, their means are no longer the best measures of central tendency and the t-test will not be valid. The Central Limit Theorem says, however, that the distributions of the sample means ȳ1 and ȳ2 are approximately normal when N is large. In practice, when n1 + n2 ≥ 30, you do not need to worry too much about the normality assumption.

You may numerically test the normality assumption using the Shapiro-Wilk W (N<=2000), Shapiro-Francia W (N<=5000), Kolmogorov-Smirnov D (N>2000), and Jarque-Bera tests. If N is small and the null hypothesis of normality is rejected, you may try such nonparametric methods as the Kolmogorov-Smirnov test, Kruskal-Wallis test, Wilcoxon Rank-Sum Test, or Log-Rank Test, depending on the circumstances.
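
Of the tests listed, the Jarque-Bera test is simple enough to sketch by hand: it combines the sample skewness S and kurtosis K into JB = n/6 * (S^2 + (K - 3)^2 / 4) and compares JB with a chi-square distribution with 2 degrees of freedom, whose upper-tail probability is exp(-JB/2). The function name `jarque_bera` and the sample are illustrative only. The chi-square approximation is asymptotic, so the p-value is rough for a sample this small.

```python
import math

def jarque_bera(data):
    """Return the Jarque-Bera statistic and its asymptotic p-value."""
    n = len(data)
    m = sum(data) / n
    # Central moments (divided by n, not n - 1).
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    p_value = math.exp(-jb / 2)  # upper tail of chi-square with 2 df
    return jb, p_value

# Hypothetical sample, used only to show the call.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
jb, p = jarque_bera(sample)
print(jb, p)
```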

Analyze > Compare Means > One-Way ANOVA
Example: 3 courses (beginner, intermediate and advanced).
Course is the independent variable: move it to Factor in SPSS.
Code the beginner course as "1", the intermediate course as "2" and the advanced course as "3".

Time to complete the set problem is the dependent variable: move it to Dependent List in SPSS.

Post Hoc: Tukey

Options: Descriptive
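
For readers without SPSS, the same setup can be sketched in plain Python: one column holds the course code (the factor), one holds the completion time (the dependent variable), and the descriptives and ANOVA table quantities are computed from them. The data below are hypothetical and are not the data behind the output discussed in these notes. The Tukey post hoc step is left out because it needs the studentized range distribution.

```python
from statistics import mean, stdev

# Hypothetical long-format data: (course code, minutes to complete the problem).
# Codes follow the notes: 1 = beginner, 2 = intermediate, 3 = advanced.
rows = [
    (1, 28.0), (1, 25.5), (1, 30.1), (1, 26.4),
    (2, 24.2), (2, 22.8), (2, 25.0), (2, 21.9),
    (3, 23.5), (3, 22.0), (3, 24.8), (3, 21.6),
]

# Split the dependent variable by the factor.
groups = {}
for code, minutes in rows:
    groups.setdefault(code, []).append(minutes)

# Descriptives: N, mean and standard deviation per course.
for code, values in sorted(groups.items()):
    print(code, len(values), round(mean(values), 2), round(stdev(values), 2))

# ANOVA table quantities.
all_values = [minutes for _, minutes in rows]
grand_mean = mean(all_values)
ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
df_between = len(groups) - 1
df_within = len(all_values) - len(groups)
f_stat = (ss_between / df_between) / (ss_within / df_within)
print("between:", ss_between, df_between)
print("within:", ss_within, df_within)
print("F:", f_stat)
```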

ANOVA Table
This table shows the output of the ANOVA analysis and whether there is a statistically significant difference between our group means. We can see that the significance value is 0.021 (i.e., p = .021), which is below 0.05. Therefore, there is a statistically significant difference in the mean length of time to complete the spreadsheet problem between the different courses taken.
In other words, we know that there are statistically significant differences between the groups as a whole.
This is great to know, but we do not know which of the specific groups differed. Luckily, we can find this out in the Multiple Comparisons table which contains the results of the Tukey post hoc test.

Multiple Comparisons Table

We can see from the Multiple Comparisons table that there is a statistically significant difference in time to complete the problem between the group that took the beginner course and the group that took the intermediate course (p = 0.046), as well as between the beginner course and the advanced course (p = 0.034). However, there was no statistically significant difference between the groups that took the intermediate and advanced courses (p = 0.989).

How to report
There was a statistically significant difference between groups as determined by one-way ANOVA (F(2,27) = 4.467, p = .021). A Tukey post hoc test revealed that the time to complete the problem was statistically significantly lower after taking the intermediate (23.6 ± 3.3 min, p = .046) and advanced (23.4 ± 3.2 min, p = .034) courses compared to the beginner course (27.2 ± 3.0 min). There was no statistically significant difference between the intermediate and advanced groups (p = .989).
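
The reported p-value can be checked from the F statistic alone. When the numerator degrees of freedom equal 2, as here, the upper-tail probability of the F distribution has a closed form: P(F > f) = (1 + 2f/df2)^(-df2/2). A minimal sketch follows; the function name `f_upper_tail_df1_2` is made up for this example, and the shortcut only applies when df1 = 2.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 and df2 degrees of freedom."""
    return (1 + 2 * f_value / df2) ** (-df2 / 2)

# Values taken from the reported result F(2,27) = 4.467.
p = f_upper_tail_df1_2(4.467, 27)
print(round(p, 3))  # 0.021, matching the reported p = .021
```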