How to Compute Basic Statistics in R?

Problem –

You want to calculate basic statistics – mean, median, standard deviation, variance, correlation, or covariance.

Solution –

Let’s create some vectors.

x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)

Now, let’s calculate mean, median, standard deviation and variance.

> # calculate mean
> mean(x)
[1] 5.375
> 
> # calculate median
> median(x)
[1] 4.5
> 
> # calculate standard deviation
> sd(x)
[1] 3.20435
> 
> # calculate variance
> var(x)
[1] 10.26786
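
To see what sd and var actually compute, here is a minimal sketch that rebuilds both by hand. The names n, manual_var and manual_sd are made up for illustration. The point to notice is that R uses the sample formula, dividing by n - 1 rather than n.

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2)

# Sample variance by hand: sum of squared deviations from the mean,
# divided by n - 1 (not n)
n <- length(x)
manual_var <- sum((x - mean(x))^2) / (n - 1)

# Standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

# Both should agree with the built-in functions
all.equal(manual_var, var(x))
all.equal(manual_sd, sd(x))
```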

The cor and cov functions can calculate the correlation and covariance respectively between two vectors.

> # calculate correlation
> cor(x, y)
[1] 0.9869806
> 
> # calculate covariance
> cov(x, y)
[1] 1.65884
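
Correlation and covariance are closely related, and a short sketch may help make that concrete. The variable names manual_cor and spearman_cor below are only for illustration. By default cor computes the Pearson correlation, which is the covariance scaled by both standard deviations. The method argument switches to a rank-based measure.

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)

# Pearson correlation = covariance / (sd of x * sd of y)
manual_cor <- cov(x, y) / (sd(x) * sd(y))
all.equal(manual_cor, cor(x, y))

# Spearman correlation works on ranks instead of raw values.
# log() is strictly increasing, so x and y have identical ranks
# and the result is 1 (up to floating-point rounding).
spearman_cor <- cor(x, y, method = "spearman")
```

The Pearson value is slightly below 1 because y is a curved (logarithmic) function of x, not a straight line.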

If a vector contains NA (missing) values, R returns NA by default for all of the functions above.

> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x)
[1] NA

You can avoid this by setting na.rm = TRUE, which tells R to ignore the NA values.

> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x, na.rm = TRUE)
[1] 5.375
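
The same argument works for median, sd and var. One thing to watch: cor and cov do not take na.rm and use the use argument instead. The sketch below shows both cases. The result names (med, s, v, r, cv) are just placeholders.

```r
x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
y <- log(x + 1)  # the NA carries over, so the last element of y is NA too

# Count the missing values before deciding how to handle them
sum(is.na(x))  # 1

# na.rm = TRUE works for median, sd and var as well
med <- median(x, na.rm = TRUE)  # 4.5
s <- sd(x, na.rm = TRUE)
v <- var(x, na.rm = TRUE)

# cor() and cov() have no na.rm argument;
# use = "complete.obs" keeps only the pairs where both values are present
r <- cor(x, y, use = "complete.obs")
cv <- cov(x, y, use = "complete.obs")
```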

If you are working with a dataframe, you need a helper function to calculate the mean, median, and standard deviation of each column. In the tidyverse, the helpers for this sort of thing live in the purrr package. As with other tidyverse packages, it gets loaded when you run library(tidyverse). The function we’ll use to apply a function to each column of a dataframe is map_dbl, which returns the results as a numeric vector.

> library(tidyverse)
> data(cars)
> 
> # calculate mean
> map_dbl(cars, mean)
speed  dist 
15.40 42.98 
> 
> # calculate median
> map_dbl(cars, median)
speed  dist 
   15    36 
> 
> # calculate standard deviation
> map_dbl(cars, sd)
    speed      dist 
 5.287644 25.769377 

Notice that in this example each operation returns two values, one for each column in the dataframe.
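
If you would rather not load the tidyverse, base R can do a similar job. This is an alternative to map_dbl, not a replacement for the approach above. It is a small sketch using vapply, and the result names (col_means and so on) are made up for illustration.

```r
data(cars)

# vapply applies a function to each column of the dataframe.
# numeric(1) says every result must be a single number,
# so a mistake raises an error instead of silently changing the output type.
col_means <- vapply(cars, mean, numeric(1))
col_medians <- vapply(cars, median, numeric(1))
col_sds <- vapply(cars, sd, numeric(1))

# Extra arguments after numeric(1) are passed on to the function
col_means_no_na <- vapply(cars, mean, numeric(1), na.rm = TRUE)
```

Like map_dbl, each call returns a named numeric vector with one entry per column.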

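The var, cor, and cov functions understand dataframes without the help of a mapping function, so we can apply them directly. Each returns a matrix with one row and one column per variable. On a dataframe, var returns the covariance matrix, which is why its output matches that of cov.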

> data(cars)
> 
> # calculate variance
> var(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
> 
> # calculate correlation
> cor(cars)
          speed      dist
speed 1.0000000 0.8068949
dist  0.8068949 1.0000000
> 
> # calculate covariance
> cov(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
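
Often you only want one number out of these matrices. Here is a short sketch of how to pull values out by name. The variable names col_vars and speed_dist_cor are only for illustration.

```r
data(cars)

# The per-column variances sit on the diagonal of the covariance matrix
col_vars <- diag(var(cars))

# A single pairwise value can be picked out by row and column name
speed_dist_cor <- cor(cars)["speed", "dist"]

# This matches calling cor on the two columns directly
all.equal(speed_dist_cor, cor(cars$speed, cars$dist))
```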
