By Seibar

2008-08-20 21:51:29 8 Comments

What is the best way to remove duplicate rows from a fairly large SQL Server table (i.e. 300,000+ rows)?

The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field.


RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null


@Ankit Jindal 2020-08-06 04:01:39

For the table structure


RowID int not null identity(1,1) primary key,
Col1 varchar(20) not null,
Col2 varchar(2048) not null,
Col3 tinyint not null

The query for removing duplicates:

DELETE t1
FROM MyTable t1, MyTable t2
WHERE t1.RowID > t2.RowID
  AND t1.Col1 = t2.Col1
  AND t1.Col2 = t2.Col2
  AND t1.Col3 = t2.Col3;

I am assuming that RowID is auto-incrementing and the rest of the columns can have duplicate values.
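This DELETE … FROM self-join form is T-SQL-specific, but the keep-the-lowest-RowID rule itself is portable. As a sanity check, here is a minimal sketch of the same logic using a correlated EXISTS subquery against an in-memory SQLite database via Python's sqlite3 (the table contents are invented for illustration):

```python
import sqlite3

# Toy stand-in for MyTable; RowID plays the identity role.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MyTable (RowID INTEGER PRIMARY KEY, Col1 TEXT, Col2 TEXT, Col3 INTEGER)")
con.executemany("INSERT INTO MyTable (Col1, Col2, Col3) VALUES (?, ?, ?)",
                [("a", "x", 1), ("a", "x", 1), ("a", "x", 1), ("b", "y", 2)])

# Delete every row for which an identical row with a smaller RowID exists,
# i.e. keep only the first copy of each (Col1, Col2, Col3) group.
con.execute("""
    DELETE FROM MyTable
    WHERE EXISTS (SELECT 1 FROM MyTable t2
                  WHERE t2.RowID < MyTable.RowID
                    AND t2.Col1 = MyTable.Col1
                    AND t2.Col2 = MyTable.Col2
                    AND t2.Col3 = MyTable.Col3)
""")
rows = con.execute("SELECT RowID, Col1 FROM MyTable ORDER BY RowID").fetchall()
print(rows)  # the three 'a' duplicates collapse to the row with RowID 1
```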

@Martin Smith 2010-09-29 14:52:32

Another possible way of doing this is


--Ensure that any immediately preceding statement is terminated with a semicolon above
WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2, Col3
                                       ORDER BY ( SELECT 0)) RN
         FROM   #MyTable)
DELETE FROM cte
WHERE  RN > 1;

I am using ORDER BY (SELECT 0) above as it is arbitrary which row to preserve in the event of a tie.

To preserve the latest one in RowID order, for example, you could use ORDER BY RowID DESC.
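For readers unfamiliar with window functions, what ROW_NUMBER() OVER (PARTITION BY … ORDER BY …) computes, and why deleting RN > 1 keeps exactly one row per group, can be sketched in plain Python (a toy model of the semantics, not of SQL Server's execution):

```python
from collections import defaultdict

def dedupe_keep_first(rows, key_cols, order_col=None, descending=False):
    """Mimic ROW_NUMBER() OVER (PARTITION BY key_cols ORDER BY order_col):
    number each row within its key group and keep only row number 1."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in key_cols)].append(row)
    kept = []
    for members in groups.values():
        if order_col is not None:
            members = sorted(members, key=lambda r: r[order_col], reverse=descending)
        kept.append(members[0])  # RN = 1; the rest match "RN > 1" and would be deleted
    return kept

rows = [{"RowID": 1, "Col1": "a"},
        {"RowID": 2, "Col1": "a"},
        {"RowID": 3, "Col1": "b"}]
# Ordering by RowID descending keeps the latest duplicate, as suggested above.
print(dedupe_keep_first(rows, ["Col1"], order_col="RowID", descending=True))
```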

Execution Plans

The execution plan for this is often simpler and more efficient than that in the accepted answer as it does not require the self join.


This is not always the case however. One place where the GROUP BY solution might be preferred is situations where a hash aggregate would be chosen in preference to a stream aggregate.

The ROW_NUMBER solution will always give pretty much the same plan whereas the GROUP BY strategy is more flexible.


Factors which might favour the hash aggregate approach would be

  • No useful index on the partitioning columns
  • relatively fewer groups with relatively more duplicates in each group

In extreme versions of this second case (if there are very few groups with many duplicates in each) one could also consider simply inserting the rows to keep into a new table then TRUNCATE-ing the original and copying them back to minimise logging compared to deleting a very high proportion of the rows.
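That copy-out/empty/copy-back strategy can be sketched with SQLite through Python's sqlite3 (SQLite has no TRUNCATE, so an unqualified DELETE stands in; table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, grp TEXT)")
# Few groups, many duplicates each -- the extreme case described above.
con.executemany("INSERT INTO big (grp) VALUES (?)", [("a",)] * 5 + [("b",)] * 5)

# 1. Copy one representative per group into a scratch table.
con.execute("CREATE TABLE keepers AS SELECT MIN(id) AS id, grp FROM big GROUP BY grp")
# 2. Empty the original (this step would be TRUNCATE TABLE in SQL Server).
con.execute("DELETE FROM big")
# 3. Copy the keepers back.
con.execute("INSERT INTO big SELECT id, grp FROM keepers")

print(con.execute("SELECT id, grp FROM big ORDER BY id").fetchall())  # [(1, 'a'), (6, 'b')]
```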

@BrunoLM 2010-11-16 15:50:44

If I may add: the accepted answer doesn't work with tables that use a uniqueidentifier. This one is much simpler and works perfectly on any table. Thanks Martin.

@Julia Hayward 2011-07-12 11:11:09

This is the only solution that is workable on my large table (30M rows). Wish I could give it more than +1

@Mikael Eliasson 2011-07-19 19:55:29

This is such an awesome answer! It worked even when I had removed the old PK before I realised there were duplicates. +100

@Nick Chammas 2012-06-05 23:49:04

I suggest asking and then answering this question (with this answer) on DBA.SE. Then we can add it to our list of canonical answers.

@DDiVita 2012-09-03 13:06:48

Does anyone know how I could return the number of duplicate records in this same query while also deleting them? I believe, while using the WITH statement, you can only reference the temporary CTE once, correct?

@Martin Smith 2012-09-03 13:12:05

@DDiVita - If you just want to know how many rows were deleted, look in the messages tab in SSMS for the rows-affected message; for more complicated needs look at the OUTPUT clause.

@vossad01 2013-03-04 17:39:32

Unlike the accepted answer, this also worked on a table that had no key (RowId) to compare on.

@sqladmin 2013-07-10 03:57:55

An equivalent syntax is: delete t from (select ROW_NUMBER() OVER (PARTITION BY name ORDER BY (SELECT 0)) as rn from @table) t where rn > 1

@Whelkaholism 2013-08-27 12:12:55

Great solution as can also be used on tables with a compound primary key.

@Joe 2014-03-02 02:49:49

Just FYI, this CodeProject article ("Removing Duplicate Rows from a Table in SQL Server") works as well.

@Martin Smith 2014-03-02 11:53:32

@Joe Already mentioned in this answer. Unless you are stuck on SQL Server 2000 that seems unnecessarily cumbersome and inefficient compared with ROW_NUMBER though.

@David 2015-03-24 15:31:48

This one doesn't work on all SQL Server versions, on the other hand.

@adam 2015-07-14 10:42:01

Can someone explain how a DELETE statement against a common table expression (CTE) is able to delete rows in the temporary table #MyTable?

@Martin Smith 2015-07-14 12:45:24

@adam - The same way that deleting rows from a view works. The CTE needs to meet the criteria for updatable views: the Database Engine must be able to unambiguously trace modifications from the view definition to one base table.

@Chloe 2015-11-07 23:56:35

Here is @MartinSmith's answer for Postgres: with cte as (select id, row_number() over (partition by Col1 order by id) as rn from MyTable) delete from MyTable where id in (select id from cte where rn > 1); This will leave the lowest primary key and delete the rest (order by id).

@Martin Smith 2020-02-15 12:32:14

@David it works on versions 2005, 2008, 2008 R2, 2012, 2014, 2016, 2017, 2019. You need to go back to 2000 to find a version this does not work on.

@Suraj Kumar 2018-10-29 14:20:14

If all the columns in the duplicate rows are the same, then the query below can be used to delete the duplicate records.

SELECT DISTINCT * INTO #TemNewTable FROM #OriginalTable
TRUNCATE TABLE #OriginalTable
INSERT INTO #OriginalTable SELECT * FROM #TemNewTable

@Shaini Sinha 2016-12-29 10:31:23

To Fetch Duplicate Rows:

SELECT name, email, COUNT(*)
FROM users
GROUP BY name, email
HAVING COUNT(*) > 1

To Delete the Duplicate Rows:

DELETE users
WHERE rowid NOT IN
(SELECT MIN(rowid)
 FROM users
 GROUP BY name, email);

@Íhor Mé 2019-08-09 09:59:26

For MySQL users, note that first of all it has to be DELETE FROM; second, it won't work anyway, because you can't SELECT from the same table you're DELETEing from. In MySQL this raises error 1093.

@Shamseer K 2015-05-19 14:35:26

I would prefer a CTE for deleting duplicate rows from a SQL Server table.

I strongly recommend following this article:

Keeping the original:

WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 ORDER BY col1, col2, col3) AS RN
FROM MyTable
)
DELETE FROM CTE WHERE RN > 1


Without keeping the original:

WITH CTE AS
(SELECT *, R = RANK() OVER (ORDER BY col1, col2, col3)
 FROM MyTable)
DELETE FROM CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*) > 1)

@Selim Reza 2018-06-10 09:27:52

I think this would be helpful. Here, ROW_NUMBER() OVER(PARTITION BY res1.Title ORDER BY res1.Id) as num has been used to differentiate the duplicate rows.

DELETE res2 FROM
(SELECT res1.*, ROW_NUMBER() OVER(PARTITION BY res1.Title ORDER BY res1.Id) AS num
 FROM (SELECT * FROM [dbo].[tbl_countries]) AS res1
) AS res2
WHERE res2.num > 1

@Michael 2018-06-22 14:14:10

Can you describe what makes your answer different from this one?

@Ostati 2014-11-13 16:20:35

Using CTE. The idea is to join on one or more columns that form a duplicate record and then remove whichever you like:

;with cte as (
    select
        min(PrimaryKey) as PrimaryKey,
        UniqueColumn1,
        UniqueColumn2
    from dbo.DuplicatesTable 
    group by
        UniqueColumn1, UniqueColumn2
    having count(*) > 1
)
delete d
from dbo.DuplicatesTable d 
inner join cte on 
    d.PrimaryKey > cte.PrimaryKey and
    d.UniqueColumn1 = cte.UniqueColumn1 and 
    d.UniqueColumn2 = cte.UniqueColumn2;

@Justin R. 2016-03-16 20:32:50

I think you're missing an AND in your JOIN.

@Harikesh Yadav 2016-09-28 05:26:54

This is the easiest way to delete duplicate records:

DELETE FROM tblemp WHERE id IN
(
    SELECT MIN(id) FROM tblemp
    GROUP BY title HAVING COUNT(id) > 1
)

@crellee 2018-01-18 10:39:15

Why is anyone upvoting this? If you have more than two of the same id this WON'T work. Instead write: delete from tblemp where id not in (select min(id) from tblemp group by title)
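crellee's point is easy to verify: the NOT IN form removes every extra copy in one pass, even when a group has three or more duplicates. A quick check with SQLite via Python's sqlite3 (sample data invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tblemp (id INTEGER PRIMARY KEY, title TEXT)")
con.executemany("INSERT INTO tblemp (title) VALUES (?)",
                [("dev",), ("dev",), ("dev",), ("qa",)])  # three copies of 'dev'

# Keep the MIN(id) per title; delete everything else in a single statement.
con.execute("""
    DELETE FROM tblemp
    WHERE id NOT IN (SELECT MIN(id) FROM tblemp GROUP BY title)
""")
print(con.execute("SELECT id, title FROM tblemp ORDER BY id").fetchall())  # [(1, 'dev'), (4, 'qa')]
```

Note that SQLite, unlike MySQL, allows the target table to appear in the DELETE subquery, so no nested-subquery workaround is needed here.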

@Jon Galloway 2008-08-20 21:53:14

There's a good article on removing duplicates on the Microsoft Support site. It's pretty conservative - they have you do everything in separate steps - but it should work well against large tables.

I've used self-joins to do this in the past, although it could probably be prettied up with a HAVING clause:

DELETE dupes
FROM MyTable dupes, MyTable fullTable
WHERE dupes.dupField = fullTable.dupField 
AND dupes.secondDupField = fullTable.secondDupField 
AND dupes.uniqueField > fullTable.uniqueField

@Drunken M 2020-02-11 22:47:16

Perfect! I found this to be the most efficient way to remove duplicate rows on my old MariaDB version 10.1.xx. Thank you!

@Marc 2020-02-21 19:12:44

Much simpler and easier to understand!

@Mark Brackett 2008-08-20 22:00:00

Assuming no nulls, you GROUP BY the unique columns, and SELECT the MIN (or MAX) RowId as the row to keep. Then, just delete everything that didn't have a row id:

DELETE MyTable
FROM MyTable
LEFT OUTER JOIN (
   SELECT MIN(RowId) as RowId, Col1, Col2, Col3 
   FROM MyTable 
   GROUP BY Col1, Col2, Col3
) as KeepRows ON
   MyTable.RowId = KeepRows.RowId
WHERE
   KeepRows.RowId IS NULL

In case you have a GUID instead of an integer, you can replace

MIN(RowId)

with

CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))

@Georg Schölly 2010-09-23 11:13:47

Would this work as well? DELETE FROM MyTable WHERE RowId NOT IN (SELECT MIN(RowId) FROM MyTable GROUP BY Col1, Col2, Col3);

@max 2010-12-25 10:06:38

Awesome solution! It seems that for PostgreSQL you need one more subquery.

@Andriy M 2011-01-12 13:34:22

@Georg: I think it would. Your solution is shorter and clearer. Not so sure about performance, maybe it is equivalent to Mark's, but with really big tables I would probably stick to LEFT JOIN.

@Georg Schölly 2011-01-12 19:26:27

@Andriy: Isn't SQL supposed to choose the fastest possible algorithm no matter how the SQL query is structured?

@Andriy M 2011-01-13 05:52:53

@Georg: If you say so, sir. :) Honestly, it's only here on SO that I've started to take notice about such issues, like differently structured queries resulting in the same actual algorithm or quite the other way when the queries seemingly differ very slightly. From what I've learned so far, I would rather agree with you. It's just that the LEFT JOIN version seems (no more than that) to me more optimisable.

@Martin Smith 2011-01-14 09:17:50

@Andriy - In SQL Server LEFT JOIN is less efficient than NOT EXISTS… The same site also compares NOT IN vs NOT EXISTS. Out of the 3 I think NOT EXISTS performs best. All three will generate a plan with a self join though that can be avoided.

@Andriy M 2011-01-14 09:46:46

@Martin: Very interesting, thanks. And I'm going to make some tests similar to those described, only with self-derived tables, as more applicable to this here question.

@Andriy M 2011-01-14 10:23:10

@Martin, @Georg: So, I've made a small test. A big table was created and populated as described here:… Two SELECTs then were produced, one using the LEFT JOIN + WHERE IS NULL technique, the other using the NOT IN one. Then I proceeded with the execution plans, and guess what? The query costs were 18% for LEFT JOIN against 82% for NOT IN, a big surprise to me. I might have done something I shouldn't have or vice versa, which, if true, I would really like to know.

@GilaMonster 2011-07-06 07:54:40

Coming very late I know, but this matters: if the columns were nullable, NOT IN behaves differently and performs terribly. That's why I recommend NOT EXISTS.

@Will Marcouiller 2011-11-26 22:09:52

I have searched for such a simple solution for more than half an hour. I came across solutions with DELETE TOP(n) using cursors, and no solution comes close to yours. Your solution is lean and swift, and does exactly what is expected of it. Thanks for sharing such great knowledge! =)

@Stefan Steiger 2012-08-10 11:56:17

Use CONVERT(uniqueidentifier, MAX(CONVERT(char(36), MyGuidColumn))) if you have a GUID instead of an integer.

@Steve Chambers 2013-04-17 19:23:10

Amazing how complicated this is given how common this problem must be - have worked on several projects where this kind of thing was needed. Core SQL is really crying out for a simpler way of doing this, especially given the ratings and number of comments this question and others like it have.

@Philip Kearns 2013-05-28 17:02:13

@GeorgSchölly has provided an elegant answer. I've used it on a table where a PHP bug of mine created duplicate rows.

@SysDragon 2013-09-12 11:17:14

As far as I know, RowId does not exist in SQL Server; it's an Oracle feature, and the question is tagged sql-server. Am I right?

@Martin Smith 2013-10-06 14:12:31

@SysDragon - In this case RowId is just the name of a column. It has no special meaning. There is no direct equivalent to Oracle's RowId in SQL Server.

@levininja 2013-11-06 20:28:03

Sorry but why is DELETE MyTable FROM MyTable correct syntax? I don't see putting the table name right after the DELETE as an option in the documentation here. Sorry if this is obvious to others; I'm a newbie to SQL just trying to learn. More importantly than why does it work: what is the difference between including the name of the table there or not?

@Simon East 2014-02-19 10:34:44

@GeorgSchölly - your suggestion does not appear to work in MySQL unfortunately. I think it complains about it being a cyclical query.

@jftuga 2014-03-28 19:59:46

@GeorgSchölly: this statement also works in SQLite. Thank you!

@mOna 2014-06-24 14:21:36

@MarkBrackett: Thanks, your query helped me too, but it just doesn't work for one of my big tables (about 2,000,000 rows). I added an index, but it still takes a long time and in the end nothing happens. I don't know if the problem is my join, a badly chosen index, or something else.

@stef 2014-10-20 17:32:30

@Georg's solution errors: You can't specify target table 'products' for update in FROM clause

@Awena 2015-06-13 19:06:24

@GeorgSchölly, your query returns this error: "#1093 - You can't specify target table MyTable for update in FROM clause"

@maulik13 2015-06-26 14:24:20

One thing to keep in mind: if your table is active (i.e. new rows are being inserted all the time), it is better to run this query over a restricted time period that ends just before the current time. The self-join might mismatch if new rows are read by the outer query but not the subquery, in which case non-duplicate rows could be deleted.

@bcody 2015-08-12 15:08:13

@Georg: for a table with very many rows, where only very few are duplicates that ought to be deleted, inverting the query to reduce the number of IN parameters can make the query much faster: DELETE FROM myTable WHERE id IN ( SELECT id FROM myTable EXCEPT (SELECT MIN(id) id FROM myTable GROUP BY col1, col2, col3));
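bcody's EXCEPT inversion also runs as-is on SQLite, which makes it easy to sanity-check through Python's sqlite3 (data invented for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myTable (id INTEGER PRIMARY KEY, col1 TEXT, col2 TEXT, col3 INTEGER)")
con.executemany("INSERT INTO myTable (col1, col2, col3) VALUES (?, ?, ?)",
                [("a", "x", 1), ("a", "x", 1), ("a", "x", 1), ("b", "y", 2)])

# Enumerate the ids to DROP: every id EXCEPT each group's minimum. When duplicates
# are rare relative to table size, this keeps the IN list small.
con.execute("""
    DELETE FROM myTable
    WHERE id IN (SELECT id FROM myTable
                 EXCEPT
                 SELECT MIN(id) FROM myTable GROUP BY col1, col2, col3)
""")
print(con.execute("SELECT id FROM myTable ORDER BY id").fetchall())  # [(1,), (4,)]
```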

@Mark Brackett 2015-09-20 10:58:29

@levininja - see FROM table_source (the T-SQL extension which allows FROM and JOIN in a DELETE) and FROM table_alias (the FROM is optional); the first MyTable is table_alias, the second is table_source.

@Chloe 2015-11-07 05:08:00

Here is @MarkBrackett's answer for Postgres: delete from MyTable where not exists (select 1 from (select min(RowId) as RowId from MyTable group by Col1, Col2, Col3) as KeepRows where MyTable.RowId = KeepRows.RowId); It is much too slow for 7M rows.

@Noumenon 2016-07-22 22:13:46

@MarkBrackett Is your 9/20/15 comment saying that your answer only works in T-SQL? That would be good for skimmers to know.

@ericyoung 2018-02-11 04:50:08

@StefanSteiger what is uniqueidentifier in your comment? the second CONVERT is understandable, but the first one I do not understand. could you explain? CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn)))

@ericyoung 2018-02-11 20:35:58

@MarkBrackett what is exactly uniqueidentifier in CONVERT(uniqueidentifier, MIN(CONVERT(char(36), MyGuidColumn))) ? is it a column name or a data type, or something else?

@Stefan Steiger 2018-02-12 01:40:27

@ericyoung: uniqueidentifier is a data type (a v4 UUID). The first CONVERT is needed because MIN cannot be applied to the uniqueidentifier data type.

@Chris 2018-02-27 13:44:05

Using the suggestion from @GeorgSchölly I got an error about specifying the table in the FROM clause that I'm trying to update. To fix it, I had to modify the query with a subselect in the subselect: DELETE FROM MyTable WHERE RowId NOT IN (SELECT MIN(RowId) FROM (SELECT * FROM MyTable) AS MyTableSubselect GROUP BY Col1, Col2, Col3);

@MarJer 2020-06-30 12:11:26

If the goal is to get only one row of each set of duplicated rows in a table, why not just write part of the syntax above, like this: select col1, col2, col3 into MyNewTable from MyTable group by col1, col2, col3? Is this causing some problem I don't see?

@Mark Brackett 2020-07-01 20:15:44

@MarJer - you'd lose the id then (from OP: "The rows, of course, will not be perfect duplicates because of the existence of the RowID identity field")

@Jakub Ojmucianski 2017-04-13 08:49:17

I know that this question has already been answered, but I've created a pretty useful stored procedure which generates a dynamic delete statement for a table's duplicates:

CREATE PROCEDURE sp_DeleteDuplicate @tableName varchar(100), @DebugMode int = 1
AS
BEGIN

IF(OBJECT_ID('tempdb..#tableMatrix') is not null) DROP TABLE #tableMatrix;

SELECT ROW_NUMBER() OVER(ORDER BY name) as rn, name into #tableMatrix
FROM sys.columns WHERE [object_id] = object_id(@tableName) ORDER BY name

DECLARE @MaxRow int = (SELECT MAX(rn) from #tableMatrix)
IF(@MaxRow is null)
    RAISERROR ('I wasn''t able to find any columns for this table!',16,1)
ELSE
BEGIN
    DECLARE @i int = 1
    DECLARE @Columns varchar(max) = '';

    WHILE (@i <= @MaxRow)
    BEGIN
        SET @Columns = @Columns + (SELECT '['+name+'],' from #tableMatrix where rn = @i)
        SET @i = @i + 1;
    END

    ---DELETE LAST comma
    SET @Columns = LEFT(@Columns, LEN(@Columns)-1)

    DECLARE @Sql nvarchar(max) = '
    WITH cteRowsToDelte
         AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY ' + @Columns + ' ORDER BY ( SELECT 0)) as rowNumber, * FROM ' + @tableName + '
    )
    DELETE FROM cteRowsToDelte
    WHERE  rowNumber > 1;'

    IF(@DebugMode = 1)
        SELECT @Sql
    ELSE
        EXEC sp_executesql @Sql
END
END

So if you create a table like this:

IF(OBJECT_ID('MyLitleTable') is not null)
    DROP TABLE MyLitleTable 

CREATE TABLE MyLitleTable
(
    A Varchar(10),
    B money,
    C int
)

INSERT INTO MyLitleTable VALUES
    ('ABC',100,1),
    ('ABC',100,1), -- only this row should be deleted
    ('ABC',101,1),
    ('DEF',100,1)

exec sp_DeleteDuplicate 'MyLitleTable',0

It will delete all duplicates from your table. If you run it without the second parameter it will return a SQL statement to run.

If you need to exclude any of the column just run it in the debug mode get the code and modify it whatever you like.

@Syed Mohamed 2013-09-10 13:07:06

This will delete duplicate rows, except the first row

DELETE FROM MyTable
WHERE RowID NOT IN (
    SELECT MIN(RowID)
    FROM MyTable
    GROUP BY Col1, Col2, Col3
)

@Ritesh 2016-02-29 18:33:30

For mysql it will give error: Error Code: 1093. You can't specify target table 'Mytable' for update in FROM clause. but this small change will work for mysql: DELETE FROM Mytable WHERE RowID NOT IN ( SELECT ID FROM (SELECT MIN(RowID) AS ID FROM Mytable GROUP BY Col1,Col2,Col3) AS TEMP)

@Teena 2013-10-03 06:18:06

DELETE FROM
    table_name T1
WHERE
    rowid > (
        SELECT MIN(rowid)
        FROM table_name T2
        WHERE T1.column_name = T2.column_name
    )

@Nagaraj M 2017-08-01 11:21:34

Hi Teena, you have missed the table alias name T1 after the DELETE keyword; otherwise it will throw a syntax exception.

@Draško 2014-12-10 19:36:34

This query showed very good performance for me:

DELETE tbl FROM
    MyTable tbl
WHERE EXISTS (
    SELECT *
    FROM MyTable tbl2
    WHERE tbl2.SameValue = tbl.SameValue
      AND tbl.IdUniqueValue < tbl2.IdUniqueValue
)
It deleted 1M rows in a little more than 30 sec from a table of 2M (50% duplicates).

@Brett Ryan 2016-06-07 01:29:07

Sometimes a soft delete mechanism is used where a date is recorded to indicate the deleted date. In this case an UPDATE statement may be used to update this field based on duplicate entries.

UPDATE x
   SET x.DELETED = getDate()
  FROM MY_TABLE x
  JOIN (SELECT min(TABLE_ID) id, COL_1, COL_2, COL_3
          FROM MY_TABLE d
         GROUP BY d.COL_1, d.COL_2, d.COL_3
        HAVING count(*) > 1) AS d ON d.COL_1 = x.COL_1
                                 AND d.COL_2 = x.COL_2
                                 AND d.COL_3 = x.COL_3
                                 AND d.id <> x.TABLE_ID
         /*WHERE x.COL_4 <> 'D' -- Additional filter*/

This method has served me well for fairly moderate tables containing ~30 million rows with high and low amounts of duplications.
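A minimal sketch of this soft-delete variant, using SQLite via Python's sqlite3 and a simplified one-column grouping (the column names and timestamp format are illustrative, not from the answer above):

```python
import sqlite3
from datetime import datetime, timezone

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE MY_TABLE (TABLE_ID INTEGER PRIMARY KEY, COL_1 TEXT, DELETED TEXT)")
con.executemany("INSERT INTO MY_TABLE (COL_1) VALUES (?)", [("a",), ("a",), ("b",)])

# Soft delete: stamp every duplicate except the first of each group, instead of removing it.
stamp = datetime.now(timezone.utc).isoformat()
con.execute("""
    UPDATE MY_TABLE
    SET DELETED = ?
    WHERE TABLE_ID NOT IN (SELECT MIN(TABLE_ID) FROM MY_TABLE GROUP BY COL_1)
""", (stamp,))
flags = con.execute("SELECT TABLE_ID, DELETED IS NOT NULL FROM MY_TABLE ORDER BY TABLE_ID").fetchall()
print(flags)  # only the second 'a' row is flagged
```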

@yuvi 2016-02-02 06:59:27

Another way of doing this:

DELETE A
FROM   TABLE A
       INNER JOIN TABLE B
       ON  A.COL1 = B.COL1
       AND A.COL2 = B.COL2
       AND A.ROWID > B.ROWID

@Lankymart 2016-03-08 14:43:58

What's different to this existing answer from Aug 20 2008?

@Hamit YILDIRIM 2016-01-11 20:31:40

Now let's look at an elasticalsearch table which has duplicated rows, where Id is the unique field. If an Id exists within a group defined by some criteria, we can delete the other rows outside the scope of that group. My query below shows this criteria.

Many cases in this thread are in a similar state to mine. Just change the target group criteria according to your case for deleting the repeated (duplicated) rows.

DELETE
FROM elasticalsearch
WHERE Id NOT IN
           (SELECT min(Id)
                 FROM elasticalsearch
                 GROUP BY FirmId, FilterSearchString
           )

@Wai Ha Lee 2016-01-18 23:51:45

Could you explain how/why your code works? That'll enable the OP and others to understand and apply your methods (where applicable) elsewhere. Code-only answers are discouraged and liable to be deleted. — During review

@Hamit YILDIRIM 2016-01-19 07:15:47

OK, I explained my answer, Wai Ha Lee, even though the code shows all the details.

@Chanukya 2015-12-16 06:11:51

alter table MyTable add sno int identity(1,1)

delete from MyTable where sno in
(
    select sno from (
        select *,
        RANK() OVER (PARTITION BY Col1, Col2, Col3 ORDER BY sno) as [rank]
        From MyTable
    ) T
    where [rank] > 1
)

alter table MyTable 
drop column sno

@SoftwareGeek 2010-09-30 02:35:06

delete t1
from table t1, table t2
where t1.columnA = t2.columnA
and t1.rowid > t2.rowid

The Postgres version:

delete
from table t1
using table t2
where t1.columnA = t2.columnA
and t1.rowid > t2.rowid

@Lankymart 2016-03-08 14:34:07

Why post a Postgres solution on a SQL Server question?

@Gabriel 2018-01-15 15:38:19

@Lankymart Because postgres users are coming here too. Look at the score of this answer.

@Gabriel 2018-01-15 19:39:13

I've seen this in some popular SQL questions, as in here, here and here. The OP got his answer and everyone else got some help too. No problem IMHO.

@Kamil 2009-05-08 13:06:42

  1. Create new blank table with the same structure

  2. Execute query like this

    INSERT INTO tc_category1
    SELECT *
    FROM tc_category
    GROUP BY category_id, application_id
    HAVING count(*) > 1
  3. Then execute this query

    INSERT INTO tc_category1
    SELECT *
    FROM tc_category
    GROUP BY category_id, application_id
    HAVING count(*) = 1

@Jayron Soares 2014-01-02 15:27:26

DELETE
FROM MyTable
WHERE RowID NOT IN (
              SELECT min(RowID)
              FROM Mytable
              WHERE RowID IN (SELECT RowID
                     FROM Mytable
                     GROUP BY Col1, Col2, Col3)
)

@Haris 2015-07-23 11:42:20

Use this

WITH tblTemp as
(
   SELECT ROW_NUMBER() Over(PARTITION BY Col1, Col2, Col3 ORDER BY Col1)
   As RowNumber, * FROM <table_name>
)
DELETE FROM tblTemp where RowNumber > 1

@Lauri Lubi 2015-01-01 15:32:03

If you want to preview the rows you are about to remove and keep control over which of the duplicate rows to keep, see below:

with MYCTE as (
  SELECT ROW_NUMBER() OVER (
    PARTITION BY DuplicateKey1
                ,DuplicateKey2 -- optional
    ORDER BY CreatedAt -- the first row among duplicates will be kept, other rows will be removed
  ) RN
  FROM MyTable
)
SELECT * FROM MYCTE WHERE RN > 1 -- preview the rows to be removed
-- DELETE FROM MYCTE WHERE RN > 1

@Jithin Shaji 2014-05-21 07:54:47

DELETE LU
FROM   (SELECT *,
               Row_number()
                 OVER (
                   partition BY col1, col2, col3
                   ORDER BY rowid DESC) [Row]
        FROM   mytable) LU
WHERE  [row] > 1

@Amit 2016-10-22 06:54:20

I get this message on Azure SQL DW: A FROM clause is currently not supported in a DELETE statement.

@DrPizza 2008-08-20 22:53:12

From the application level (unfortunately). I agree that the proper way to prevent duplication is at the database level through the use of a unique index, but in SQL Server 2005 an index key is limited to 900 bytes, and my varchar(2048) field blows that away.

I dunno how well it would perform, but I think you could write a trigger to enforce this, even if you couldn't do it directly with an index. Something like:

-- given a table stories(story_id int not null primary key, story varchar(max) not null)
CREATE TRIGGER prevent_plagiarism 
ON stories 
AFTER INSERT, UPDATE
AS
BEGIN
    DECLARE @cnt AS INT 

    SELECT @cnt = Count(*) 
    FROM   stories 
           INNER JOIN inserted 
                   ON ( stories.story = inserted.story 
                        AND stories.story_id != inserted.story_id ) 

    IF @cnt > 0 
      BEGIN
          RAISERROR('plagiarism detected',16,1) 
          ROLLBACK TRANSACTION
      END
END

Also, varchar(2048) sounds fishy to me (some things in life are 2048 bytes, but it's pretty uncommon); should it really not be varchar(max)?
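The trigger idea ports to other engines. Here is a sketch of the same guard in SQLite via Python's sqlite3, where RAISE(ABORT, …) plays the role of RAISERROR plus the rollback; this is an illustration of the concept, not the T-SQL above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE stories (story_id INTEGER PRIMARY KEY, story TEXT NOT NULL)")

# BEFORE INSERT guard: reject any new row whose story text already exists,
# without needing an index on the wide column.
con.execute("""
    CREATE TRIGGER prevent_plagiarism BEFORE INSERT ON stories
    WHEN EXISTS (SELECT 1 FROM stories WHERE story = NEW.story)
    BEGIN
        SELECT RAISE(ABORT, 'plagiarism detected');
    END
""")

con.execute("INSERT INTO stories (story) VALUES ('once upon a time')")
blocked = False
try:
    con.execute("INSERT INTO stories (story) VALUES ('once upon a time')")
except sqlite3.IntegrityError:
    blocked = True  # the duplicate insert was rejected by the trigger
print(blocked)  # True
```

The second insert fails with 'plagiarism detected', surfacing in Python as sqlite3.IntegrityError.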

@James Errico 2014-03-01 07:40:18

I prefer the subquery / HAVING count(*) > 1 solution to the inner join because I found it easier to read, and it is very easy to turn into a SELECT statement to verify what would be deleted before you run it.

--DELETE FROM table1 
--WHERE id IN ( 
     SELECT MIN(id) FROM table1 
     GROUP BY col1, col2, col3 
     -- could add a WHERE clause here to further filter
     HAVING count(*) > 1
--)

@Sandy 2015-05-15 20:05:21

Doesn't it delete all the records that show up in the inner query? We need to remove only duplicates and preserve the original.

@James Errico 2015-05-18 00:24:51

You're only returning the one with the lowest id, based on the min(id) in the select clause.

@Sandy 2015-05-18 13:36:35

Yes, but the question was not asking about how to return the rows that are to be deleted, but it is asking about how to delete the rows that are duplicates. Can you elaborate on how I can delete the rows that the query has returned?

@James Errico 2015-05-18 16:25:25

Uncomment the first, second, and last lines of the query.

@Chloe 2015-11-06 23:48:21

This won't clean up all duplicates. If you have 3 rows that are duplicates, it will only select the row with the MIN(id), and delete that one, leaving two rows left that are duplicates.

@Chloe 2015-11-07 19:44:25

Nevertheless, I ended up using this statement repeated over & over again, so that it would actually make progress instead of having the connection timing out or the computer go to sleep. I changed it to MAX(id) to eliminate the latter duplicates, and added LIMIT 1000000 to the inner query so it wouldn't have to scan the whole table. This showed progress much quicker than the other answers, which would seem to hang for hours. After the table was pruned to a manageable size, then you can finish with the other queries. Tip: make sure col1/col2/col3 has indices for group by.

@Ismail Yavuz 2013-09-18 06:36:12

The other way is to create a new table with the same fields and with a unique index, then move all data from the old table to the new table. SQL Server automatically ignores duplicate values (there is also an option for what to do when there is a duplicate value: ignore, interrupt, etc.), so we end up with the same table without duplicate rows. If you don't want the unique index, you can drop it after transferring the data.

Especially for larger tables you may use DTS (an SSIS package to import/export data) in order to transfer all data rapidly to your new uniquely indexed table. For 7 million rows it takes just a few minutes.
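This transfer-into-a-uniquely-indexed-table idea can be sketched in SQLite through Python's sqlite3; INSERT OR IGNORE plays the role of inserting through a unique index that ignores duplicate keys (table and column names are illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE old_table (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
con.executemany("INSERT INTO old_table (col1, col2) VALUES (?, ?)",
                [("a", 1), ("a", 1), ("b", 2)])

# New table with a unique index over the duplicated columns; duplicate rows
# are silently skipped during the transfer.
con.execute("CREATE TABLE new_table (id INTEGER PRIMARY KEY, col1 TEXT, col2 INTEGER)")
con.execute("CREATE UNIQUE INDEX ux_new ON new_table (col1, col2)")
con.execute("INSERT OR IGNORE INTO new_table (col1, col2) SELECT col1, col2 FROM old_table")

print(con.execute("SELECT col1, col2 FROM new_table ORDER BY id").fetchall())  # [('a', 1), ('b', 2)]
```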

@Ruben Verschueren 2014-01-27 12:20:09

I thought I'd share my solution since it works under special circumstances. In my case the table with duplicate values did not have a foreign key (because the values were duplicated from another db).

begin transaction
-- create temp table with identical structure as source table
Select * Into #temp From tableName Where 1 = 2

-- insert distinct values into temp
insert into #temp 
select distinct * 
from  tableName

-- delete from source
delete from tableName 

-- insert into source from temp
insert into tableName 
select * 
from #temp

rollback transaction
-- if this works, change rollback to commit and execute again to keep you changes!!

PS: when working on things like this I always use a transaction; this not only ensures everything is executed as a whole, but also allows me to test without risking anything. But of course you should take a backup anyway, just to be sure...
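The same pattern (destructive steps wrapped so they can be dry-run and rolled back) can be sketched with SQLite via Python's sqlite3; names are illustrative:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tableName (a TEXT, b INTEGER)")
con.executemany("INSERT INTO tableName VALUES (?, ?)", [("x", 1), ("x", 1), ("y", 2)])
con.commit()

try:
    # Copy distinct rows aside, empty the source, copy back.
    con.execute("CREATE TEMP TABLE t_dedup AS SELECT DISTINCT a, b FROM tableName")
    con.execute("DELETE FROM tableName")
    con.execute("INSERT INTO tableName SELECT a, b FROM t_dedup")
    con.commit()  # swap for con.rollback() to dry-run the destructive steps first
except Exception:
    con.rollback()
    raise

print(con.execute("SELECT a, b FROM tableName ORDER BY a").fetchall())  # [('x', 1), ('y', 2)]
```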

@Nitish Pareek 2013-08-06 17:14:34

Yet another easy solution can be found at the link pasted here. It is easy to grasp and seems to be effective for most similar problems. It is for SQL Server, but the concept used is more than acceptable.

Here are the relevant portions from the linked page:

Consider this data:

A001    2011-01-01
A001    2011-01-01
A002    2011-01-01
A002    2011-01-01
A002    2011-01-01
A003    2011-01-01

So how can we delete those duplicate data?

First, insert an identity column in that table by using the following code:


Use the following code to resolve it:


@Andriy M 2013-08-07 12:42:07

"Easy to grasp", "seems to be effective", but not a word about what the method consists in. Just imagine that the link becomes invalid, what use would then be to know that the method was easy to grasp and effective? Please consider adding essential parts of the method's description into your post, otherwise this is not an answer.

@Jeff Davis 2013-11-05 21:25:39

This method is useful for tables where you don't yet have an identity defined. Often you need to get rid of duplicates in order to define the primary key!

@Martin Smith 2014-03-02 11:51:27

@JeffDavis - The ROW_NUMBER version works fine for that case without needing to go to the lengths of adding a new column before you begin.

@Jacob Proffitt 2008-08-20 22:27:53

Oh sure. Use a temp table. If you want a single, not-very-performant statement that "works" you can go with:

DELETE FROM MyTable
WHERE RowID NOT IN
    (SELECT 
        (SELECT TOP 1 RowID FROM MyTable mt2 
        WHERE mt2.Col1 = mt.Col1 
        AND mt2.Col2 = mt.Col2 
        AND mt2.Col3 = mt.Col3) 
    FROM MyTable mt)

Basically, for each row in the table, the sub-select finds the top RowID of all rows that are exactly like the row under consideration. So you end up with a list of RowIDs that represent the "original" non-duplicated rows.
