## Problem –

You want to calculate basic statistics – mean, median, standard deviation, variance, correlation, or covariance.

## Solution –

Let’s create some vectors.

```
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)
```

Now, let’s calculate the mean, median, standard deviation and variance of x.

```
> # calculate mean
> mean(x)
[1] 5.375
>
> # calculate median
> median(x)
[1] 4.5
>
> # calculate standard deviation
> sd(x)
[1] 3.20435
>
> # calculate variance
> var(x)
[1] 10.26786
```
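As a quick check on what these functions compute, the sketch below recomputes the median, variance and standard deviation of x by hand. It assumes the standard definitions R uses: **var** is the sample variance (dividing by n - 1, not n), **sd** is its square root, and the median of an even-length vector is the average of the two middle values. The variable names (`sorted`, `manual_median`, `manual_var`, `manual_sd`) are illustrative only.

```
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
n <- length(x)

# median of an even-length vector: average the two middle sorted values
# (this shortcut only applies when n is even, as it is here)
sorted <- sort(x)
manual_median <- (sorted[n / 2] + sorted[n / 2 + 1]) / 2

# sample variance: squared deviations from the mean, divided by n - 1
manual_var <- sum((x - mean(x))^2) / (n - 1)

# standard deviation is the square root of the variance
manual_sd <- sqrt(manual_var)

manual_median == median(x)      # TRUE
all.equal(manual_var, var(x))   # TRUE
all.equal(manual_sd, sd(x))     # TRUE
```

Using all.equal() rather than == for the variance and standard deviation avoids spurious mismatches caused by floating-point rounding.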

The **cor** and **cov** functions calculate the correlation and the covariance, respectively, between two vectors.

```
> # calculate correlation
> cor(x, y)
[1] 0.9869806
>
> # calculate covariance
> cov(x, y)
[1] 1.65884
```
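The same kind of check works for **cov** and **cor**. In this sketch the sample covariance is the sum of the products of paired deviations divided by n - 1, and the correlation (Pearson, which is what **cor** computes by default) is that covariance divided by the product of the two standard deviations. The names `manual_cov` and `manual_cor` are illustrative only.

```
x <- c(5, 8, 2, 4, 9, 10, 3, 2)
y <- log(x + 1)

# sample covariance: sum of products of paired deviations, divided by n - 1
manual_cov <- sum((x - mean(x)) * (y - mean(y))) / (length(x) - 1)

# Pearson correlation: the covariance rescaled by both standard deviations
manual_cor <- manual_cov / (sd(x) * sd(y))

all.equal(manual_cov, cov(x, y))  # TRUE
all.equal(manual_cor, cor(x, y))  # TRUE
```

Because the correlation divides out the scale of each vector, it always lies between -1 and 1, while the covariance depends on the units of x and y.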

If a vector contains missing values (NA), these functions return NA instead of a number, and some other functions throw an error.

```
> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x)
[1] NA
```

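You can avoid this by setting na.rm = TRUE, which tells R to drop the NA values before calculating. This argument works for mean, median, sd and var; cor and cov do not have it and use a use argument instead.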

```
> x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
> mean(x, na.rm = TRUE)
[1] 5.375
```
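Here is a minimal sketch of handling the same NA in the other functions. **median**, **sd** and **var** take na.rm just like **mean**; for **cor** and **cov**, setting use = "complete.obs" drops every pair in which either value is missing. Since y is computed from x, it carries its NA in the same position, so the results match the NA-free values calculated earlier.

```
x <- c(5, 8, 2, 4, 9, 10, 3, 2, NA)
y <- log(x + 1)   # log(NA + 1) is NA, so y has an NA in the same position

# median, sd and var accept the same na.rm argument as mean
median(x, na.rm = TRUE)   # 4.5
sd(x, na.rm = TRUE)
var(x, na.rm = TRUE)

# cor and cov have no na.rm; "complete.obs" keeps only pairs with no NA
cor(x, y, use = "complete.obs")
cov(x, y, use = "complete.obs")
```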

Functions such as mean, median and sd work on a single vector, so if you are working with a dataframe you need a helper function that applies them to each column. The **tidyverse** helper functions for this sort of thing live in the **purrr** package. Like the other core tidyverse packages, purrr is loaded when you run library(tidyverse). The function we’ll use to apply a function to each column of a dataframe is **map_dbl**, which returns the results as a named numeric vector.

```
> library(tidyverse)
> data(cars)
>
> # calculate mean
> map_dbl(cars, mean)
speed  dist
15.40 42.98
>
> # calculate median
> map_dbl(cars, median)
speed  dist
   15    36
>
> # calculate standard deviation
> map_dbl(cars, sd)
    speed      dist
 5.287644 25.769377
```

Notice that in this example each operation returns two values, one for each column of the dataframe.
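If you would rather not load the tidyverse, base R can also apply a function to every column. This is a sketch using **sapply**, **vapply** and **colMeans**, all part of base R, offered as an alternative rather than the approach this post uses. The comparison of vapply to map_dbl is loose: both insist on one number per column.

```
data(cars)

# sapply() applies a function to each column and simplifies to a named vector
sapply(cars, mean)
sapply(cars, median)

# extra arguments are passed on to the function, e.g. na.rm for missing values
sapply(cars, mean, na.rm = TRUE)

# vapply() is stricter: each result must be a single number (numeric(1))
vapply(cars, sd, numeric(1))

# colMeans() is a built-in shortcut for the mean of each numeric column
colMeans(cars)
```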

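The **var**, **cor** and **cov** functions accept a dataframe directly, without a mapping function, as long as every column is numeric. Instead of one value per column, they return a matrix comparing every pair of columns. For a dataframe, **var** returns the covariance matrix, which is why var(cars) and cov(cars) print the same result; the diagonal holds each column's variance.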

```
> data(cars)
>
> # calculate variance
> var(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
>
> # calculate correlation
> cor(cars)
          speed      dist
speed 1.0000000 0.8068949
dist  0.8068949 1.0000000
>
> # calculate covariance
> cov(cars)
          speed     dist
speed  27.95918 109.9469
dist  109.94694 664.0608
```
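These matrices are related to each other and to the column-wise results. The sketch below, using only base R (**cov2cor** comes from the stats package, which is loaded by default), checks those relationships and shows how to pull a single value out of a matrix by row and column name.

```
data(cars)

# for a dataframe, var() returns the same covariance matrix as cov()
all.equal(var(cars), cov(cars))              # TRUE

# the diagonal of the covariance matrix holds each column's variance
diag(cov(cars))
sapply(cars, var)

# rescaling the covariance matrix by the standard deviations gives cor()
all.equal(cov2cor(cov(cars)), cor(cars))     # TRUE

# extract the speed/dist correlation as a single number
cor(cars)["speed", "dist"]
```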