Statistical hypothesis testing is an essential process in data analysis that helps in making decisions based on data. One such non-parametric method used for performing hypothesis testing on independent samples is the Kruskal-Wallis test. This article is a comprehensive guide on how to execute a Kruskal-Wallis test in R.
Understanding the Kruskal-Wallis Test
The Kruskal-Wallis test, named after William Kruskal and W. Allen Wallis, is a rank-based non-parametric method for testing whether samples originate from the same distribution. It can be used to determine whether there are statistically significant differences between two or more groups of an independent variable on a continuous or ordinal dependent variable.
It extends the Mann-Whitney U test to more than two groups. Unlike one-way ANOVA, the Kruskal-Wallis test doesn't assume normally distributed residuals. When it is interpreted as a comparison of medians, it instead assumes that all groups have similarly shaped distributions.
The Kruskal-Wallis test works by ranking the data from all groups together, then comparing the sum of ranks for each group. If the sums of ranks between groups are significantly different, then it suggests that the groups are different.
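The ranking procedure described above can be sketched by hand in a few lines of R. This is a minimal illustration rather than the internal implementation of any function: the values and the names `x`, `g`, `h_raw`, and `h_corrected` are chosen for the example, and the statistic follows the standard textbook formula with the usual correction for ties.

```r
# Illustrative data: three groups of five observations each
x <- c(2.9, 3.0, 2.5, 2.6, 3.2,   # group 1
       3.1, 3.3, 3.4, 2.8, 3.5,   # group 2
       3.6, 3.8, 3.4, 3.7, 3.6)   # group 3
g <- factor(rep(c("Group1", "Group2", "Group3"), each = 5))

# Rank all observations together; tied values receive the average rank
r <- rank(x)
n <- length(x)

# Sum of ranks and number of observations per group
rank_sums   <- tapply(r, g, sum)
group_sizes <- tapply(r, g, length)

# Kruskal-Wallis H statistic before the tie correction
h_raw <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n),
# where t is the size of each set of tied values
tie_counts  <- table(x)
h_corrected <- h_raw / (1 - sum(tie_counts^3 - tie_counts) / (n^3 - n))

# Approximate p-value from the chi-squared distribution with k - 1 df
p_value <- pchisq(h_corrected, df = nlevels(g) - 1, lower.tail = FALSE)

print(rank_sums)
print(h_corrected)
print(p_value)
```

The corrected statistic should agree with the one reported by `kruskal.test(x, g)`, which is a convenient way to check the hand calculation.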
Kruskal-Wallis Test in R
R provides the kruskal.test() function as a built-in function to perform the Kruskal-Wallis test.
The basic syntax to perform the Kruskal-Wallis test in R is as follows:
kruskal.test(formula, data)
Here, formula is a formula object, with the response on the left of the ~ operator and the grouping variable on the right, and data is a data frame containing the variables specified in the formula.
Let’s illustrate this with an example:
# Defining the data
data1 <- c(2.9, 3.0, 2.5, 2.6, 3.2) # group 1 data
data2 <- c(3.1, 3.3, 3.4, 2.8, 3.5) # group 2 data
data3 <- c(3.6, 3.8, 3.4, 3.7, 3.6) # group 3 data

# Combine all the data to a data frame
df <- data.frame(
  values = c(data1, data2, data3),
  group = factor(rep(c("Group1", "Group2", "Group3"), each = 5))
)

# Perform the Kruskal-Wallis test
result <- kruskal.test(values ~ group, data = df)

# Print the result
print(result)
In this code, values is the response variable and group is the grouping variable. The ~ symbol indicates that values is modeled as a function of group. The data frame df contains the variables specified in the formula. The kruskal.test() function calculates the Kruskal-Wallis rank sum statistic, the degrees of freedom, and the p-value of the test.
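The formula interface is not the only way to call kruskal.test(). As a brief sketch, the same test can also be run by passing a numeric vector together with a grouping factor, or by passing a list of numeric vectors, one per group. The object names `by_formula`, `by_vectors`, and `by_list` are illustrative.

```r
# Same example data as above
data1 <- c(2.9, 3.0, 2.5, 2.6, 3.2)
data2 <- c(3.1, 3.3, 3.4, 2.8, 3.5)
data3 <- c(3.6, 3.8, 3.4, 3.7, 3.6)

df <- data.frame(
  values = c(data1, data2, data3),
  group  = factor(rep(c("Group1", "Group2", "Group3"), each = 5))
)

# 1. Formula interface: response ~ group
by_formula <- kruskal.test(values ~ group, data = df)

# 2. Data vector plus grouping factor
by_vectors <- kruskal.test(df$values, df$group)

# 3. List of numeric vectors, one element per group
by_list <- kruskal.test(list(data1, data2, data3))

# All three calls describe the same test, so the statistics should match
print(c(formula = unname(by_formula$statistic),
        vectors = unname(by_vectors$statistic),
        list    = unname(by_list$statistic)))
```

The formula interface is usually the most readable when the data already sit in a data frame; the list form is handy when each group is stored as its own vector.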
The result of the kruskal.test() function is an object of class "htest" that contains the following components:
statistic: the value of the Kruskal-Wallis rank sum statistic.
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic.
p.value: the p-value of the test.
method: a character string indicating the name of the test.
data.name: a character string giving the names of the data.
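Because the result is a list-like object, each of the components listed above can be extracted with the `$` operator. The sketch below rebuilds the example data so it runs on its own; the names `h_stat`, `deg_free`, and `p_val` are illustrative.

```r
# Rebuild the example data and run the test
df <- data.frame(
  values = c(2.9, 3.0, 2.5, 2.6, 3.2,
             3.1, 3.3, 3.4, 2.8, 3.5,
             3.6, 3.8, 3.4, 3.7, 3.6),
  group  = factor(rep(c("Group1", "Group2", "Group3"), each = 5))
)
result <- kruskal.test(values ~ group, data = df)

# Extract individual components from the "htest" object
h_stat   <- unname(result$statistic)  # Kruskal-Wallis rank sum statistic
deg_free <- unname(result$parameter)  # degrees of freedom (groups - 1)
p_val    <- result$p.value            # p-value of the test

print(class(result))
print(h_stat)
print(deg_free)
print(p_val)
print(result$method)
print(result$data.name)
```

Extracting the values this way is useful when the results need to go into a table or a report instead of being read off the printed output.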
The p-value is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed. If the p-value is less than the chosen significance level (commonly 0.05), you reject the null hypothesis that all groups come from the same distribution.
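The decision rule can be written out explicitly. The following is a minimal sketch using the example data; the threshold `alpha` and the variable `reject_null` are illustrative names, and 0.05 is only the conventional choice of significance level.

```r
# Rebuild the example data and run the test
df <- data.frame(
  values = c(2.9, 3.0, 2.5, 2.6, 3.2,
             3.1, 3.3, 3.4, 2.8, 3.5,
             3.6, 3.8, 3.4, 3.7, 3.6),
  group  = factor(rep(c("Group1", "Group2", "Group3"), each = 5))
)
result <- kruskal.test(values ~ group, data = df)

# Significance level chosen before looking at the data
alpha <- 0.05

# Compare the p-value with the significance level
reject_null <- result$p.value < alpha

if (reject_null) {
  message("p = ", signif(result$p.value, 3),
          ": reject the null hypothesis at alpha = ", alpha)
} else {
  message("p = ", signif(result$p.value, 3),
          ": fail to reject the null hypothesis at alpha = ", alpha)
}
```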
If you find a significant result with the Kruskal-Wallis test, you might want to explore further and find out which groups differ. You can do this using post-hoc tests.
A common choice for post-hoc analysis after the Kruskal-Wallis test is Dunn's test. It performs pairwise comparisons between groups, comparing the observed difference in mean ranks for each pair to the difference expected under the null hypothesis.
The dunn.test() function in R performs the Dunn post-hoc test. It is part of the dunn.test package. You need to install and load the package to use the function.
# Install the package
install.packages("dunn.test")

# Load the package
library(dunn.test)

# Perform the Dunn post-hoc test
result_posthoc <- dunn.test(df$values, df$group, method = "bonferroni")

# Print the result
print(result_posthoc)
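If installing an additional package is not an option, base R offers pairwise.wilcox.test() in the stats package. Note that this is a different technique from Dunn's test: it runs a separate Wilcoxon rank sum test for each pair of groups and then adjusts the p-values, so its results will not match dunn.test() exactly. The sketch below uses the same Bonferroni adjustment; `pw` is an illustrative name, and `exact = FALSE` is passed because the example data contain ties.

```r
# Rebuild the example data
df <- data.frame(
  values = c(2.9, 3.0, 2.5, 2.6, 3.2,
             3.1, 3.3, 3.4, 2.8, 3.5,
             3.6, 3.8, 3.4, 3.7, 3.6),
  group  = factor(rep(c("Group1", "Group2", "Group3"), each = 5))
)

# Pairwise Wilcoxon rank sum tests with Bonferroni-adjusted p-values
# (an alternative to Dunn's test, not an implementation of it)
pw <- pairwise.wilcox.test(
  df$values, df$group,
  p.adjust.method = "bonferroni",
  exact = FALSE  # use the normal approximation because of tied values
)

# Matrix of adjusted p-values, one entry per pair of groups
print(pw$p.value)
```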
The Kruskal-Wallis test is a non-parametric method used to determine whether there are statistically significant differences between two or more groups of an independent variable. Its primary advantage over one-way ANOVA is that it doesn't require normally distributed data, although interpreting it as a comparison of medians still relies on the groups having similarly shaped distributions.
R provides the kruskal.test() function to perform the Kruskal-Wallis test, which is simple and easy to use. Always remember to interpret the results appropriately and consider performing post-hoc analysis if needed to further analyze your data.