Problem Description –
Write a SQL query to delete all duplicate email entries in a table named
Person, keeping only unique emails based on its smallest Id.
For example, after running your query, the above
Person table should have the following rows:
Note: Your output is the whole
Person table after executing your sql. Use
Difficulty Level – Easy
Problem Link – https://leetcode.com/problems/delete-duplicate-emails/
WITH cte AS ( SELECT Id, EMail, ROW_NUMBER() OVER(PARTITION BY Email ORDER BY Id) as row_num FROM Person ) DELETE FROM Person WHERE Id IN (SELECT id FROM cte WHERE row_num > 1 )
Let’s start with the common table expression. If you run the following query.
WITH cte AS ( SELECT Id, EMail, ROW_NUMBER() OVER(PARTITION BY Email ORDER BY Id) as row_num FROM Person ) SELECT * FROM cte
The output will be this –
Here, we are first partitioning the data by Email, so that we get two partitions, one for bob and one for john and also ordering the data in ascending order by the Id column because we are instructed to keep only unique emails based on its smallest Id . And As bob’s email is unique, we will only have one row and a row_num of 1. But for John we have duplicate emails, so ROW_NUMBER() will assign two values 1 and 2.
Now when sql evaluates the inner query
SELECT id FROM cte WHERE row_num > 1, it will select all the id’s whose row_num is greater than 1, In this case it will only select the id 3 of john. And then the Outer delete query will delete this record from the table.