By Fearghal


2013-08-22 20:51:14 8 Comments

How can I delete duplicate rows where no unique row id exists?

My table is

col1  col2 col3 col4 col5 col6 col7
john  1    1    1    1    1    1 
john  1    1    1    1    1    1
sally 2    2    2    2    2    2
sally 2    2    2    2    2    2

I want to be left with the following after the duplicate removal:

john  1    1    1    1    1    1
sally 2    2    2    2    2    2

I've tried a few queries but I think they depend on having a row id as I don't get the desired result. For example:

DELETE
FROM table
WHERE col1 IN (
    SELECT id
    FROM table
    GROUP BY id
    HAVING (COUNT(col1) > 1)
)

20 comments

@Prajwal W 2019-06-03 10:32:46

     SELECT DISTINCT * FROM TABLE;

This will filter out all the duplicate rows and return only the distinct rows.

This solution might be useful in cases where the user just wants to display the non-duplicate rows instead of deleting the duplicates from the database.

@Adeptus 2019-07-09 06:54:21

It will only show you distinct rows, but the duplicates still exist in the table

@Prajwal W 2019-07-09 07:09:21

@Adeptus I would be happy if you provided a better solution for this answer rather than downvoting it!

@Adeptus 2019-07-09 07:12:07

Yours is not a bad solution; it is a non-solution for what was asked. Any actual solution (e.g., most other answers here) is therefore better.

@Emmanuel Bull 2019-08-15 10:47:03

Deleting duplicates from a huge table (several million records) might take a long time. I suggest that you bulk insert the selected rows into a temp table rather than deleting.

-- Rewriting your code (note the final SELECT ... INTO)
WITH CTE AS (
    SELECT NAME, ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID
    FROM @TB
)
SELECT * INTO #unique_records FROM CTE WHERE ID = 1;

@Surinder Singh 2019-08-05 12:29:28

DECLARE @TB TABLE(NAME VARCHAR(100));
INSERT INTO @TB VALUES ('Red'),('Red'),('Green'),('Blue'),('White'),('White')
--**Delete by Rank**
;WITH CTE AS(SELECT NAME,DENSE_RANK() OVER (PARTITION BY NAME ORDER BY NEWID()) ID FROM @TB)
DELETE FROM CTE WHERE ID>1
SELECT NAME FROM @TB;
--**Delete by Row Number** 
;WITH CTE AS(SELECT NAME,ROW_NUMBER() OVER (PARTITION BY NAME ORDER BY NAME) ID FROM @TB)
DELETE FROM CTE WHERE ID>1;
SELECT NAME FROM @TB;

@Rhys 2016-01-26 18:54:34

If you have no references, like foreign keys, you can do this. I do it a lot when testing proofs of concept and the test data gets duplicated.

SELECT DISTINCT [col1],[col2],[col3],[col4],[col5],[col6],[col7]
INTO [newTable]
FROM [oldTable] -- placeholder name for the table that contains the duplicates

Go into the object explorer and delete the old table.

Rename the new table with the old table's name.
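
The two manual steps can also be scripted. This is only a sketch: dbo.oldTable is a placeholder for the original table (the answer does not name it) and dbo.newTable is the de-duplicated copy created by the SELECT ... INTO above.

```sql
-- Assumed names: dbo.oldTable (table with duplicates), dbo.newTable (de-duplicated copy).
BEGIN TRANSACTION;

-- Remove the original table, duplicates and all.
DROP TABLE dbo.oldTable;

-- Give the copy the original table's name.
EXEC sp_rename 'dbo.newTable', 'oldTable';

COMMIT TRANSACTION;
```

Keep in mind that SELECT ... INTO copies the columns and data but not indexes, constraints, or triggers, so those would need to be re-created on the renamed table.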

@Md Masududzaman Khan 2019-07-19 16:25:17

There are many ways to do this in SQL Server. The simplest is to insert the distinct rows from the table with duplicates into a new temporary table, delete all the data from the original table, and then insert the de-duplicated rows back from the temporary table, as shown below.

select distinct * into #tmp from table
delete from table
insert into table
select * from #tmp
drop table #tmp

select * from table

Delete duplicate rows using a Common Table Expression (CTE):

with CTE_Duplicates as
(
    select id, name,
           row_number() over (partition by id, name order by id, name) rownumber
    from table
)
delete from CTE_Duplicates where rownumber != 1

@Hadi Salehy 2019-01-23 04:47:47

Group the duplicate records by the field(s) that define a duplicate, then keep one record from each group and delete the rest. For example:

DELETE prg.Person WHERE Id IN (
    SELECT duplicateRow.Id
    FROM (
        -- one row per duplicated NationalCode, with the Id to keep
        SELECT MIN(Id) MinId, NationalCode
        FROM prg.Person
        GROUP BY NationalCode
        HAVING COUNT(NationalCode) > 1
    ) GroupSelect
    JOIN prg.Person duplicateRow ON duplicateRow.NationalCode = GroupSelect.NationalCode
    WHERE duplicateRow.Id <> GroupSelect.MinId)

@Fezal halai 2019-03-01 11:11:41

Try to Use:

SELECT linkorder
    ,Row_Number() OVER (
        PARTITION BY linkorder ORDER BY linkorder DESC
        ) AS RowNum
FROM u_links

@Moshe Taieb 2018-08-15 12:40:23

After trying the solutions suggested above, which work for small and medium tables, I can suggest this solution for very large tables, since it runs in iterations.

  1. Drop all views that depend on LargeSourceTable. You can find the dependencies in SQL Server Management Studio: right-click the table and click "View Dependencies".
  2. Rename the table:

    EXEC sp_rename 'LargeSourceTable', 'LargeSourceTable_Temp';
    GO

  3. Create LargeSourceTable again, but now add a primary key on all the columns that define a duplicate, and add WITH (IGNORE_DUP_KEY = ON). For example:

    CREATE TABLE [dbo].[LargeSourceTable]
    (
        ID int IDENTITY(1,1),
        [CreateDate] DATETIME CONSTRAINT [DF_LargeSourceTable_CreateDate] DEFAULT (getdate()) NOT NULL,
        [Column1] CHAR (36) NOT NULL,
        [Column2] NVARCHAR (100) NOT NULL,
        [Column3] CHAR (36) NOT NULL,
        PRIMARY KEY (Column1, Column2) WITH (IGNORE_DUP_KEY = ON)
    );
    GO

  4. Re-create the views that you dropped in step 1 against the newly created table.

  5. Now run the following SQL script. It reports progress after every page of 1,000,000 rows; you can reduce the number of rows per page to see progress more often.

  6. Note that I set IDENTITY_INSERT on and off because one of the columns contains an auto-incrementing ID, which I'm also copying.

SET IDENTITY_INSERT LargeSourceTable ON

DECLARE @PageNumber AS INT, @RowspPage AS INT
DECLARE @TotalRows AS INT
DECLARE @dt VARCHAR(19)

SET @PageNumber = 0
SET @RowspPage = 1000000
SELECT @TotalRows = COUNT(*) FROM LargeSourceTable_TEMP

-- @PageNumber starts at 0, so stop once the offset reaches the total row count
While (@PageNumber * @RowspPage < @TotalRows)
Begin
    begin transaction tran_inner
        ; with cte as
        (
            SELECT * FROM LargeSourceTable_TEMP ORDER BY ID
            OFFSET ((@PageNumber) * @RowspPage) ROWS
            FETCH NEXT @RowspPage ROWS ONLY
        )

        INSERT INTO LargeSourceTable 
        (
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        )       
        select 
             ID                     
            ,[CreateDate]       
            ,[Column1]   
            ,[Column2] 
            ,[Column3]       
        from cte

    commit transaction tran_inner

    PRINT 'Page: ' + convert(varchar(10), @PageNumber)
    PRINT 'Transferred: ' + convert(varchar(20), @PageNumber * @RowspPage)
    PRINT 'Of: ' + convert(varchar(20), @TotalRows)

    SELECT @dt = convert(varchar(19), getdate(), 121)
    RAISERROR('Inserted on: %s', 0, 1, @dt) WITH NOWAIT
    SET @PageNumber = @PageNumber + 1
End

SET IDENTITY_INSERT LargeSourceTable OFF

@Shamseer K 2016-08-14 02:46:00

I prefer a CTE for deleting duplicate rows from a SQL Server table.

I strongly recommend following this article: http://codaffection.com/sql-server-article/delete-duplicate-rows-in-sql-server/

Keeping the original:

WITH CTE AS
(
SELECT *,ROW_NUMBER() OVER (PARTITION BY col1,col2,col3 ORDER BY col1,col2,col3) AS RN
FROM MyTable
)

DELETE FROM CTE WHERE RN<>1

Without keeping the original:

WITH CTE AS
(SELECT *,R=RANK() OVER (ORDER BY col1,col2,col3)
FROM MyTable)
 
DELETE CTE
WHERE R IN (SELECT R FROM CTE GROUP BY R HAVING COUNT(*)>1)

@Robert Casey 2016-08-25 16:08:20

Windowing function is a great solution.

@Bigeyes 2019-04-19 20:29:46

I am a little confused. You deleted from the CTE, not the original table. So how does it work?

@Shamseer K 2019-04-20 11:26:58

@Bigeyes deleting records from the CTE removes the corresponding records from the actual physical table (because the CTE contains references to the actual records).

@Zakk Diaz 2019-08-19 22:41:17

I had no idea this was the case until this post... Thank you

@Rich 2019-08-26 03:23:01

Why would you want to delete both the original and its duplicate? I'm not understanding why you wouldn't want to just remove the duplicate and keep the other.

@Tim Schmelter 2013-08-22 20:55:50

I like CTEs and ROW_NUMBER, as the two combined allow us to see which rows are deleted (or updated); to preview them, just change the DELETE FROM CTE... to SELECT * FROM CTE:

WITH CTE AS(
   SELECT [col1], [col2], [col3], [col4], [col5], [col6], [col7],
       RN = ROW_NUMBER()OVER(PARTITION BY col1 ORDER BY col1)
   FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1

DEMO (result is different; I assume that it's due to a typo on your part)

COL1    COL2    COL3    COL4    COL5    COL6    COL7
john    1        1       1       1       1       1
sally   2        2       2       2       2       2

This example determines duplicates by a single column col1 because of the PARTITION BY col1. If you want to include multiple columns simply add them to the PARTITION BY:

ROW_NUMBER()OVER(PARTITION BY Col1, Col2, ... ORDER BY OrderColumn)
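
As a sketch of the multi-column case for the table in the question, assuming it is named dbo.Table1 as in the query above: the first statement only previews the extra copies, and the second deletes them. ORDER BY (SELECT NULL) is an assumption of this sketch; when every column is in the PARTITION BY, the rows within a partition are identical, so it does not matter which one gets RN = 1.

```sql
-- Preview: rows with RN > 1 are the extra copies that would be deleted.
WITH CTE AS (
    SELECT col1, col2, col3, col4, col5, col6, col7,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY (SELECT NULL))
    FROM dbo.Table1
)
SELECT * FROM CTE WHERE RN > 1;

-- Delete: same CTE, but remove the extra copies from dbo.Table1.
WITH CTE AS (
    SELECT col1, col2, col3, col4, col5, col6, col7,
           RN = ROW_NUMBER() OVER (
                    PARTITION BY col1, col2, col3, col4, col5, col6, col7
                    ORDER BY (SELECT NULL))
    FROM dbo.Table1
)
DELETE FROM CTE WHERE RN > 1;
```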

@Barka 2014-06-05 05:43:52

Thank you for a great answer. MSFT in contrast has a very complicated answer here: stackoverflow.com/questions/18390574/…

@CodeEngine 2015-02-11 19:07:47

@Tim Great answer. What if you only wanted to delete the duplicates for john? Where would the WHERE clause go? Just curious.

@Tim Schmelter 2015-02-11 21:18:30

@omachu23: in this case it doesn't matter, although i think that it is more efficient in the CTE than outside(AND COl1='John'). Normally you should apply the filter in the CTE.

@CodeEngine 2015-02-11 22:12:56

@TimSchmelter I put the (AND Col1='Jhon') after WHERE RN > 1 and it worked! I didn't know where to put it inside the CTE; it kept giving me errors. Anyway, thanks for the answer! Awesome work!

@Tim Schmelter 2015-02-11 22:23:07

@omachu23: you can use any SQL in the CTE(apart from ordering), so if you want to filter by Johns: ...FROM dbo.Table1 WHERE Col1='John'. Here is the fiddle: sqlfiddle.com/#!6/fae73/744/0

@Zorgarath 2015-04-29 16:23:43

The easiest solution may just be set rowcount 1 delete from t1 where col1=1 and col2=1 as seen here

@beercohol 2015-06-15 18:20:01

That is beautiful... such a simple solution!

@MAK 2015-08-18 10:49:16

@TimSchmelter, Can please help me for this stackoverflow.com/questions/32065340/…

@Hoang Huynh 2015-10-24 04:08:13

This is the equivalent query for PostgreSQL sqlfiddle.com/#!15/c0ffa/1

@rlee 2016-03-16 11:26:00

This answer will only delete the rows that have duplicates in col1. Add the columns in the "select" to the "partition by", for example using the select in the answer: RN = ROW_NUMBER()OVER(PARTITION BY col1,col2,col3,col4,col5,col6,col7 ORDER BY col1)

@Guilherme Silva 2016-07-06 08:39:28

This helped more than 5 times, always coming back for this CTE hahahaha Hope I have that in my mind now Thanks for your help!

@Whitecat 2016-08-05 22:10:04

What does CTE mean? I get SQL errors when I put that in.

@Thomas 2017-06-21 09:37:50

Please make a backup before running this query ... Or you might regret it.

@MrPk 2019-01-30 10:56:23

saved my day Upvote!

@messed-up 2018-07-17 12:52:45

Oh wow, I feel so stupid reading all these answers; they are like experts' answers, with CTEs and temp tables and so on.

And all I did to get it working was simply aggregate the ID column using MAX.

DELETE FROM table WHERE col1 IN (
    SELECT MAX(id) FROM table GROUP BY id HAVING ( COUNT(col1) > 1 )
)

NOTE: you might need to run it multiple times to remove duplicates, as this will only delete one set of duplicate rows at a time.

@0xdd 2018-07-17 12:58:41

This will not work since it'll remove all duplicates without leaving the originals. OP is asking to preserve the original records.

@messed-up 2018-07-17 13:08:12

Not true, max will give you max ID that satisfy having condition. If that is not true, prove your case for down vote.

@j.hull 2018-03-23 12:52:03

If you have the ability to add a column to the table temporarily, this was a solution that worked for me:

ALTER TABLE dbo.DUPPEDTABLE ADD RowID INT NOT NULL IDENTITY(1,1)

Then perform a DELETE using a combination of MIN and GROUP BY

DELETE b
FROM dbo.DUPPEDTABLE b
WHERE b.RowID NOT IN (
                     SELECT MIN(RowID) AS RowID
                     FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
                     GROUP BY a.ITEM_NUMBER,
                              a.CHARACTERISTIC,
                              a.INTVALUE,
                              a.FLOATVALUE,
                              a.STRINGVALUE
                 );

Verify that the DELETE performed correctly:

SELECT a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE, COUNT(*)--MIN(RowID) AS RowID
FROM dbo.DUPPEDTABLE a WITH (NOLOCK)
GROUP BY a.ITEM_NUMBER,
    a.CHARACTERISTIC,
    a.INTVALUE,
    a.FLOATVALUE,
    a.STRINGVALUE
ORDER BY COUNT(*) DESC 

The result should have no rows with a count greater than 1. Finally, remove the rowid column:

ALTER TABLE dbo.DUPPEDTABLE DROP COLUMN RowID;

@rajibdotnet 2017-09-19 19:01:49

With reference to https://support.microsoft.com/en-us/help/139444/how-to-remove-duplicate-rows-from-a-table-in-sql-server

The idea of removing duplicates involves

  • a) Protecting the rows that are not duplicates.
  • b) Retaining one of the rows that together qualify as duplicates.

Step-by-step

  • 1) First identify the rows that satisfy the definition of a duplicate and insert them into a temp table, say #tableAll.
  • 2) Select the non-duplicate (single) or distinct rows into a temp table, say #tableUnique.
  • 3) Delete from the source table, joining to #tableAll, to delete the duplicates.
  • 4) Insert all the rows from #tableUnique into the source table.
  • 5) Drop #tableAll and #tableUnique.
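
The steps above can be sketched roughly as follows for the table in the question. This is an illustration under assumptions, not the script from the Microsoft article: the source table is assumed to be dbo.Table1, all seven columns together define a duplicate, and none of them contain NULLs (the equality join in step 3 would not match NULL values).

```sql
-- Assumed names: dbo.Table1 is the source table; col1..col7 define a duplicate.
BEGIN TRANSACTION;

-- 1) One row for every group of identical rows that occurs more than once.
SELECT col1, col2, col3, col4, col5, col6, col7
INTO #tableAll
FROM dbo.Table1
GROUP BY col1, col2, col3, col4, col5, col6, col7
HAVING COUNT(*) > 1;

-- 2) The rows to put back. Because the whole row is the duplicate key here,
--    the distinct rows of #tableAll are exactly the copies to keep.
SELECT DISTINCT col1, col2, col3, col4, col5, col6, col7
INTO #tableUnique
FROM #tableAll;

-- 3) Delete every copy of the duplicated rows from the source table.
--    Rows that were never duplicated have no match in #tableAll and are untouched.
DELETE t
FROM dbo.Table1 AS t
JOIN #tableAll AS a
  ON  t.col1 = a.col1 AND t.col2 = a.col2 AND t.col3 = a.col3
  AND t.col4 = a.col4 AND t.col5 = a.col5 AND t.col6 = a.col6
  AND t.col7 = a.col7;

-- 4) Insert a single copy of each duplicated row back.
INSERT INTO dbo.Table1 (col1, col2, col3, col4, col5, col6, col7)
SELECT col1, col2, col3, col4, col5, col6, col7
FROM #tableUnique;

COMMIT TRANSACTION;

-- 5) Clean up the temp tables.
DROP TABLE #tableAll;
DROP TABLE #tableUnique;
```

Wrapping steps 1 to 4 in a transaction is a choice made for this sketch so that a failure between the delete and the re-insert does not lose rows.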

@Aamir 2016-10-30 07:22:57

Without using a CTE and ROW_NUMBER(), you can delete the records just by using GROUP BY with the MAX function. Here is an example:

DELETE
FROM MyDuplicateTable
WHERE ID NOT IN
(
SELECT MAX(ID)
FROM MyDuplicateTable
GROUP BY DuplicateColumn1, DuplicateColumn2, DuplicateColumn3)

@marsze 2017-11-06 13:15:18

Works only if there is an id/unique field.

@Derek Smalls 2017-11-30 16:01:29

This query will delete non-duplicate records.

@monteirobrena 2017-12-07 13:08:11

This works fine, thank you. @DerekSmalls this did not remove my non-duplicate records.

@Hasan Shouman 2016-10-25 08:16:21

-- This query will keep only one instance of a duplicate record.
;WITH cte
     AS (SELECT ROW_NUMBER() OVER (PARTITION BY col1, col2, col3 -- the columns that define a duplicate; can be multiple columns
                                       ORDER BY ( SELECT 0)) RN
         FROM   Mytable)
DELETE FROM cte
WHERE  RN > 1

@Debendra Dash 2016-10-09 18:41:20

with myCTE
as

(
select productName,ROW_NUMBER() over(PARTITION BY productName order by slno) as Duplicate from productDetails
)
Delete from myCTE where Duplicate>1

@Jithin Shaji 2016-09-28 10:11:31

Please see the below way of deletion too.

Declare @table table
(col1 varchar(10),col2 int,col3 int, col4 int, col5 int, col6 int, col7 int)
Insert into @table values 
('john',1,1,1,1,1,1),
('john',1,1,1,1,1,1),
('sally',2,2,2,2,2,2),
('sally',2,2,2,2,2,2)

Created a sample table named @table and loaded it with given data.

Delete  aliasName from (
Select  *,
        ROW_NUMBER() over (Partition by col1,col2,col3,col4,col5,col6,col7 order by col1) as rowNumber
From    @table) aliasName 
Where   rowNumber > 1

Select * from @table

Note: If you list all columns in the PARTITION BY clause, then the ORDER BY does not have much significance.

I know the question was asked three years ago and my answer is another version of what Tim has posted, but I'm posting it just in case it is helpful to anyone.

@Tolga Gölelçin 2014-06-30 08:41:52

Another way of removing duplicated rows in one step without losing information is the following:

-- A joined DELETE needs the alias form (DELETE t1 FROM ...);
-- NOLOCK is not allowed on the table being deleted from.
delete t1
from dublicated_table t1
join (
    select t2.dublicated_field
    , min(len(t2.field_kept)) as min_field_kept
    from dublicated_table t2 (nolock)
    group by t2.dublicated_field having COUNT(*)>1
) t3 
on t1.dublicated_field=t3.dublicated_field 
    and len(t1.field_kept)=t3.min_field_kept

@Shoja Hamid 2014-08-11 14:55:13

DELETE from search
where id not in (
   select min(id) from search
   group by url
   having count(*)=1

   union

   SELECT min(id) FROM search
   group by url
   having count(*) > 1
)

@Brent 2016-02-10 16:01:01

Couldn't you re-write to: where id in (select max(id) ... having count(*) > 1) ?

@Christopher Yang 2016-03-07 20:14:05

I don't believe there's any need to use having or union, this will suffice: delete from search where id not in (select min(id) from search group by url)

@oabarca 2014-06-05 14:41:35

Microsoft has a very neat guide on how to remove duplicates. Check out http://support.microsoft.com/kb/139444

In brief, here is the easiest way to delete duplicates when you have just a few rows to delete:

SET rowcount 1;
DELETE FROM t1 WHERE myprimarykey=1;

myprimarykey is the identifier for the row.

I set rowcount to 1 because I only had two rows that were duplicated. If I had had 3 rows duplicated then I would have set rowcount to 2 so that it deletes the first two that it sees and only leaves one in table t1.

Hope it helps anyone

@Fearghal 2014-06-06 09:20:45

How do i know how many rows i have duplicated if i have 10k rows?

@oabarca 2014-06-07 15:15:13

@Fearghal try "select primaryKey, count(*) from myTable group by primaryKey;"

@thermite 2014-11-04 16:16:30

But what if there are varying numbers of duplicate rows? ie row a has 2 records and row b has 5 records and row c has no duplicate records

@oabarca 2014-11-04 16:59:50

@thermite I dont think I understand your point

@thermite 2014-11-04 17:20:19

@user2070775 What if only a subset of all the rows have duplicates, and of those duplicates some are duplicated twice and some three or four times?

@thermite 2014-11-04 17:27:19

@user2070775 I missed the part where you said "just a few rows to delete". Also, there is a warning on the page about SET ROWCOUNT: in future versions of SQL Server it won't affect UPDATE or DELETE statements.

@oabarca 2014-11-04 17:27:20

@thermite in that case, I guess it would be more convenient to write a script rather than trying to do it manually in SQL. The script would select the duplicated rows, count the number of duplicates and execute "SET rowcount $numer_of_dulicated-1; DELETE FROM t1 WHERE myprimarykey=$duplicated_id_value;"
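
For what it's worth, the same "delete all but one copy" idea can be written with DELETE TOP (n) instead of SET ROWCOUNT, which sidesteps the deprecation warning mentioned in the comments above. A minimal sketch, assuming the t1 table and myprimarykey column from the answer and a hypothetical duplicated key value of 1:

```sql
-- Assumed: t1 and myprimarykey as in the answer; the value 1 is just an example.
DECLARE @copies INT;

-- Count how many rows share this key value.
SELECT @copies = COUNT(*) FROM t1 WHERE myprimarykey = 1;

-- Delete all but one of them; do nothing if the value is not duplicated.
IF @copies > 1
    DELETE TOP (@copies - 1) FROM t1 WHERE myprimarykey = 1;
```

This still handles one key value per run, so for varying numbers of duplicates across many keys a set-based approach such as the ROW_NUMBER CTE is probably less work.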
