Amazon SQL interview questions – Pivot Table with Text data.

Problem Description –

This question is from Leetcode 618 – Students Report By Geography.

A U.S graduate school has students from Asia, Europe, and America. The students’ location information is stored in the table student below.

Pivot the continent column in this table so that each name is sorted alphabetically and displayed underneath its corresponding continent. The output headers should be America, Asia, and Europe respectively. It is guaranteed that the student number from America is no less than either Asia or Europe.

For the sample input, the output is:

Difficulty Level – Hard

Problem Link – https://leetcode.com/problems/students-report-by-geography/

Solution –

SELECT
    MAX(CASE WHEN continent = 'America' THEN name END) AS America,
    MAX(CASE WHEN continent = 'Asia' THEN name END) AS Asia,
    MAX(CASE WHEN continent = 'Europe' THEN name END) AS Europe
FROM (
    SELECT 
        continent,
        name,
        ROW_NUMBER() OVER(PARTITION BY continent ORDER BY name) as rn
    FROM student
) x
GROUP BY rn

Let’s first look at the output of the CASE statement logic without the MAX and ROW_NUMBER function.

SELECT
    CASE WHEN continent = 'America' THEN name END AS America,
    CASE WHEN continent = 'Asia' THEN name END AS Asia,
    CASE WHEN continent = 'Europe' THEN name END AS Europe
FROM student

We almost have the desired result that we want. But we have to somehow remove those null values without losing the data. It’s much easier to deal with when we have numerical data, we can use various aggregate functions like SUM, MIN, MAX, COUNT, etc to get the desired result based on our situation. But it is a little difficult with the TEXT data.

Let’s first try the MAX function as we have used this function in our solution.

SELECT
    MAX(CASE WHEN continent = 'America' THEN name END) AS America,
    MAX(CASE WHEN continent = 'Asia' THEN name END) AS Asia,
    MAX(CASE WHEN continent = 'Europe' THEN name END) AS Europe
FROM student

We can see that it only return a single row of data. The data of jack is deleted from the result.

And if we use the SUM, then it will return 0 as the data is text. So, other aggregate functions are also not going to work.

SELECT
    SUM(CASE WHEN continent = 'America' THEN name END) AS America,
    SUM(CASE WHEN continent = 'Asia' THEN name END) AS Asia,
    SUM(CASE WHEN continent = 'Europe' THEN name END) AS Europe
FROM student
ORDER BY 1 DESC

To get the desired result, we have to use the ROW_NUMBER function and GROUP the data by the row numbers. The row number function will create unique values for all the rows and when we use the MAX function, we will not lose any value and get the right result.

SELECT 
	continent,
	name,
	ROW_NUMBER() OVER(PARTITION BY continent ORDER BY name) as rn
FROM student

We can see that for America, we have two unique row numbers, and now when we group the data by these unique numbers and apply the max function, for the first group where rn=1, the result will be Jack, and for the second group where rn=2, the max value will be jane. Similarly, we will get the correct result from Asia and Europe also.

Rating: 1 out of 5.

Leave a Reply