You want to calculate the running total of values in a column. Suppose, you have a employees table and you want to calculate the running total of all the employees salaries. A running total is the summation of a sequence of numbers which is updated each time a new number is added to the sequence, by adding the value of the new number to the previous running total.
To calculate a running total in SQL, we use the window function SUM OVER.
SELECT name, gender, salary, SUM(salary) OVER(ORDER BY salary, id) as running_total FROM employees
One thing to note, if you look at the above query, you can see that in the ORDER BY clause, I have also included the Primary key id to sort the data. The reason for this is that when we have duplicate values in a column, using only the salary column in the ORDER By clause will cause wrong calculation of running total. Sorting the data by both salary and the primary key id make sure that we get the current values for running total as any combination of salary and id is unique. Let’s look at one example to clear things.
SELECT name, gender, salary, SUM(salary) OVER(ORDER BY salary, id) as running_total1, SUM(salary) OVER(ORDER BY salary) as running_total2 FROM employees
If you look at the running total 2 calculation, you can see that for Tom and Brad the calculations are wrong as both have same salaries of $4000. But if you look at the running total 1 calculations, you can see the values are correct. The second row should have the values of 7000 ( 3000(Jerry) + 4000(Tom)) but for this row in the running total 2 the values is $11000, which is not correct. We can see this problem again in the running total 2 column for Jake and Angelina as they both have the same salary.
Now , if you want to calculate the running total separately for both Male and Female then all you have to do is add the PARTITION BY clause in OVER to partition the data.
SELECT name, gender, salary, SUM(salary) OVER(PARTITION BY gender ORDER BY salary, id) as running_total FROM employees
When you partition the data by gender, first the running total of females will be calculated then the running total of Males will be calculated from the beginning as we are partitioning the data first, then applying the window function to calculate the running total.
If you like this post then please share it with others and subscribe to our blog below to learn data science.