Google SQL interview Questions – Leetcode 569 – Median Employee Salary

Problem Description –

The Employee table holds all employees. The employee table has three columns: Employee Id, Company Name, and Salary.

Write a SQL query to find the median salary of each company. Bonus points if you can solve it without using any built-in SQL functions.

Difficulty Level – Hard

Solution –

WITH cte AS (
SELECT 
	id,
	company,
	Salary,
	ROW_NUMBER() OVER(PARTITION BY company ORDER BY Salary) as rn,
	COUNT(*) OVER(PARTITION BY company ) as rc 
FROM Employee
)

SELECT Id, company, salary 
FROM cte 
WHERE rn IN ( (rc+1 ) DIV 2 , (rc+2) DIV 2 )

Most of the time when the interviewers asked you to find the median, they want you to calculate it without any functions like percentile_cont , so that is what we are going to do.

Let’s first look at the common table expression. If you run the following query –

SELECT  id,
	company,
	Salary,
	ROW_NUMBER() OVER(PARTITION BY company ORDER BY Salary) as rn,
	COUNT(*) OVER(PARTITION BY company ) as rc 
FROM Employee

The output will be –

For Company A, we have six rows of data, So ROW_NUMBER() function will assign values from 1 to 6, and the COUNT() window function will produce the output of 6 for all the rows. For company B we have 6 rows and for company C we have 5 rows.

Now let’s look at the other part of the query –

SELECT Id, company, salary 
FROM cte 
WHERE rn IN ( (rc+1 ) DIV 2 , (rc+2) DIV 2 )

In the WHERE clause, we are using DIV for division instead of ‘/’ because we want an integer division. When we have an even number of rows like 6 in the case of companies A and B. This expression will evaluate to –

SELECT (6+1) DIV 2 first_div,
	   (6+2) DIV 2 second_div

The first division will result in 3 and the second one is 4, which is what we need. The median will be the average of the 3rd and the 4th-row value.

And when we will have odd numbers, then the expression will evaluate to –

SELECT (5+1) DIV 2 first_div,
	   (5+2) DIV 2 second_div

For both the division we will get 3. If we have 5 numbers then the median will be the number in the 3rd row given the data is sorted from lowest to highest.

So, this expression takes care of both the scenario of even and odd rows of data.

Rating: 1 out of 5.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s