
Problem –
You want to calculate basic statistics – mean, median, standard deviation, variance, correlation, or covariance.
Solution –
Let’s create some vectors.
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)
Now, let’s calculate the mean, median, standard deviation, and variance.
> # calculate mean
> mean(x)
[1] 5.375
>
> # calculate median
> median(x)
[1] 4.5
>
> # calculate standard deviation
> sd(x)
[1] 3.20435
>
> # calculate variance
> var(x)
[1] 10.26786
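It may help to see how these numbers relate to one another. The sketch below recomputes the variance and standard deviation by hand, using the fact that var and sd are sample statistics with an n - 1 denominator. The names n, manual_var, and manual_sd are illustrative and not part of the recipe.

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
n <- length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Sample standard deviation: square root of the sample variance
manual_sd <- sqrt(manual_var)

# Both comparisons should report TRUE
all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
```

Because the denominator is n - 1 rather than n, these are estimates for a sample, not population values.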
The cor and cov functions calculate the correlation and covariance, respectively, between two vectors.
> # calculate correlation
> cor(x, y)
[1] 0.9869806
>
> # calculate covariance
> cov(x, y)
[1] 1.65884
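The two results are closely related: with the default (Pearson) method, the correlation is the covariance divided by the product of the two standard deviations. A minimal sketch of that check, where manual_cor is an illustrative name:

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)

# Pearson correlation is the covariance rescaled by both standard deviations
manual_cor <- cov(x, y) / (sd(x) * sd(y))

# Should report TRUE
all.equal(manual_cor, cor(x, y))
```

This rescaling is why the correlation always falls between -1 and 1, while the covariance depends on the units of x and y.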
If a vector contains missing values (NA), these functions return NA by default, and some other R functions stop with an error.
> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x)
[1] NA
You can avoid this by setting na.rm = TRUE, which tells R to ignore the NA values.
> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x, na.rm = TRUE)
[1] 5.375
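The same argument works for median, sd, and var. The cor and cov functions have no na.rm argument; they take a use argument instead. The sketch below assumes you want to drop every pair in which either value is missing, which is what use = "complete.obs" does.

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
y <- log(x + 1)  # y inherits the NA from x

# na.rm = TRUE drops the NA before computing each statistic
median(x, na.rm = TRUE)
sd(x, na.rm = TRUE)
var(x, na.rm = TRUE)

# cor and cov have no na.rm argument; use = "complete.obs" keeps
# only the pairs in which both values are present
cor(x, y, use = "complete.obs")
cov(x, y, use = "complete.obs")
```

Each call should reproduce the value computed earlier from the vector without the NA.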
The mean, median, and sd functions do not work column by column on a data frame, so you need a helper function to apply them to each column. The tidyverse helper functions for this sort of thing are in the purrr package. As with the other core tidyverse packages, purrr is loaded when you run library(tidyverse). The function we’ll use to apply a function to each column of a data frame is map_dbl.
> library(tidyverse)
> data(cars)
>
> # calculate mean
> map_dbl(cars, mean)
speed  dist
15.40 42.98
>
> # calculate median
> map_dbl(cars, median)
speed  dist
   15    36
>
> # calculate standard deviation
> map_dbl(cars, sd)
    speed      dist
 5.287644 25.769377
Notice that in this example each operation returns two values, one for each column in the data frame.
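If you would rather not load the tidyverse, roughly the same result can be sketched in base R. This is an alternative to the recipe, not part of it: vapply applies a function to each column and, like map_dbl, insists that every call return a single number. The name col_means is illustrative.

```r
data(cars)

# Apply mean to each column; FUN.VALUE = numeric(1) requires that
# each call return exactly one number
col_means <- vapply(cars, mean, FUN.VALUE = numeric(1))
col_means

# Extra arguments are passed on to the function, e.g. to ignore NAs
col_medians <- vapply(cars, median, FUN.VALUE = numeric(1), na.rm = TRUE)
col_medians
```

Both calls return a named numeric vector with one entry per column, matching the shape of the map_dbl output.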
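The var, cor, and cov functions understand data frames without the help of a mapping function, so we can apply them directly. Each returns a matrix with one row and one column per variable. For a data frame, var returns the covariance matrix, so its output matches that of cov, and the variances appear on the diagonal.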
> data(cars)
>
> # calculate variance
> var(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
>
> # calculate correlation
> cor(cars)
          speed      dist
speed 1.0000000 0.8068949
dist  0.8068949 1.0000000
>
> # calculate covariance
> cov(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
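The var and cov outputs above are identical because, for a data frame, var computes the full covariance matrix. A short sketch of how to confirm this and pull out the per-column variances, with cov_matrix and col_vars as illustrative names:

```r
data(cars)

cov_matrix <- var(cars)

# For a data frame, var() and cov() return the same matrix; should report TRUE
all.equal(cov_matrix, cov(cars))

# The per-column variances sit on the diagonal
col_vars <- diag(cov_matrix)
col_vars

# The same values computed one column at a time
vapply(cars, var, FUN.VALUE = numeric(1))
```

The off-diagonal entries are the covariance between speed and dist, which is why the matrix is symmetric.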