You want to delete duplicate records from a table. For example, here we have a students table, and you can see that there are some duplicate data for Martha, and jack. You goal is to only keep a single record for Martha and jack and delete the duplicate data from the table. And you don’t care which record you delete, you just care about having unique students data, and delete the duplicate data.
To solve this problem, first we need to group the data by name and select the MIN id for each student. If you want you can select the max id, it does not matter, but here I will use the min id.
SELECT name, MIN(id) FROM students GROUP BY name
To make things little bit more clear, I have also added the name column above but we do not need that in our final query. Now, if you look at the above result, you can see that for each student we have a single id. Our goal is to keep the records for all the these ids and delete the records for all the other ids that is not present here, which we can easily do by using a WHERE id NOT IN in the outer subquery and to delete the records using the DELETE statement.
DELETE FROM students WHERE id NOT IN ( SELECT MIN(id) FROM students GROUP BY name )
If you are using MySQL, and run the above query this will give you an error – You can’t specify target table ‘students’ for update in FROM clause
To solve this problem, you have to apply a little variation to the above query because MySQL does not allow you to reference the same table twice in a DELETE statement.
DELETE FROM students WHERE id NOT IN ( SELECT MIN(id) FROM (SELECT id, name FROM students) tmp GROUP BY name)
After running the query, all the duplicate records has been deleted from the table and we got the result that we want.
If you like this post then please share it with others and subscribe to our blog below to learn more about SQL.