By Anthony Greco


2011-11-28 02:29:30

I need to delete about 2 million rows from my PG database. I have a list of IDs that I need to delete. However, any way I try to do this is taking days.

I tried putting them in a table and doing it in batches of 100. Four days later, this is still running, with only 297,268 rows deleted. (I had to select 100 IDs from the IDs table, delete WHERE IN that list, then delete those 100 from the IDs table.)
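Each batch boiled down to roughly this (simplified):

CREATE TEMP TABLE batch AS
SELECT id FROM ids LIMIT 100;   -- grab 100 IDs to delete

DELETE FROM tbl WHERE id IN (SELECT id FROM batch);
DELETE FROM ids WHERE id IN (SELECT id FROM batch);

DROP TABLE batch;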

I tried:

DELETE FROM tbl WHERE id IN (SELECT id FROM ids);

That's taking forever, too. It's hard to gauge how long, since I can't see its progress until it's done, but the query was still running after 2 days.

Just kind of looking for the most effective way to delete from a table when I know the specific IDs to delete, and there are millions of them.

7 Answers

@Erwin Brandstetter 2011-11-28 02:42:46

It all depends ...

  • Delete all indexes (except the one on the ID which you need for the delete).
    Recreate them afterwards (= much faster than incremental updates to indexes); a sketch follows at the end of this answer.

  • Check if you have triggers that can safely be deleted / disabled temporarily

  • Do foreign keys reference your table? Can they be dropped? Dropped temporarily?

  • Depending on your autovacuum settings it may help to run VACUUM ANALYZE before the operation.

  • This assumes no concurrent write access to the involved tables. Otherwise you have to lock tables exclusively, or this route may not be for you at all.

  • Some of the points listed in the related chapter of the manual Populating a Database may also be of use, depending on your setup.

  • If you delete large portions of the table and the rest fits into RAM, the fastest and easiest way would be this:

SET temp_buffers = '1000MB'; -- or whatever you can spare temporarily

CREATE TEMP TABLE tmp AS
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;      -- copy surviving rows into temporary table

TRUNCATE tbl;             -- empty table - truncate is very fast for big tables

INSERT INTO tbl
SELECT * FROM tmp;        -- insert back surviving rows.

This way you don't have to recreate views, foreign keys or other depending objects. Read about the temp_buffers setting in the manual. This method is fast as long as the table fits into memory, or at least most of it. Be aware that you can lose data if your server crashes in the middle of this operation. You can wrap all of it into a transaction to make it safer.
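A minimal sketch of that transactional variant, with the same names as above (SET LOCAL confines the setting to the transaction; temp_buffers must be set before the session's first use of temporary tables):

BEGIN;
SET LOCAL temp_buffers = '1000MB';

CREATE TEMP TABLE tmp AS
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;

TRUNCATE tbl;             -- transactional in PostgreSQL: rolls back on error

INSERT INTO tbl
SELECT * FROM tmp;

COMMIT;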

Run ANALYZE afterwards, or VACUUM ANALYZE if you did not go the truncate route, or VACUUM FULL ANALYZE if you want to bring the table to minimum size. For big tables, consider the alternatives CLUSTER / pg_repack.
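For reference, the three variants side by side (VACUUM FULL rewrites the whole table and takes an exclusive lock):

ANALYZE tbl;              -- after the TRUNCATE route: refresh planner statistics
VACUUM ANALYZE tbl;       -- after an in-place DELETE: mark dead rows reusable + stats
VACUUM FULL ANALYZE tbl;  -- rewrite the table to minimum size (exclusive lock)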

For small tables, a simple DELETE instead of TRUNCATE is often faster:

DELETE FROM tbl t
USING  del_list d
WHERE  t.id = d.id;

Read the Notes section for TRUNCATE in the manual. In particular (as Pedro also pointed out in his comment):

TRUNCATE cannot be used on a table that has foreign-key references from other tables, unless all such tables are also truncated in the same command. [...]

And:

TRUNCATE will not fire any ON DELETE triggers that might exist for the tables.
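As a sketch of the first bullet above (drop indexes, delete, recreate), with a hypothetical index name; list your real ones with \di. Keep the index on id, which the delete itself needs:

DROP INDEX tbl_created_at_idx;       -- hypothetical name

DELETE FROM tbl t
USING  del_list d
WHERE  t.id = d.id;

CREATE INDEX tbl_created_at_idx ON tbl (created_at);   -- recreate what you still need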

@Anthony Greco 2011-11-28 06:12:31

Unfortunately I do have a few foreign keys; however, I may do what you suggested: killing all keys, deleting, and recreating. It is taking more time not doing this than to just do it. Thanks!

@Erwin Brandstetter 2011-11-28 06:43:30

@AnthonyGreco: You could delete the foreign keys and recreate afterwards. Of course you'll have to take care of references to deleted rows, too. And referential integrity is not guaranteed during this window.

@Anthony Greco 2011-11-28 06:45:02

Surely was not what I wanted to do, but deleting the indexes made my deletes fly... Now I just gotta do this on all linked tables to delete the linked rows, but hell, it beats all the time I spent trying to get it to work without.

@Erwin Brandstetter 2011-11-28 14:19:58

@AnthonyGreco: Cool! Don't forget to recreate those indexes afterwards that you still need.

@Simon Lepkin 2016-02-03 17:19:25

Is it safe to do such a huge DELETE in one transaction, if the other rows (the ones not tagged for delete) are accessed frequently?

@Erwin Brandstetter 2016-02-03 18:42:50

@SimonLepkin: Please ask a new question (providing all relevant details). Comments are not the place. You can always link to this one for context.

@Simon Lepkin 2016-02-03 22:39:24

@ErwinBrandstetter Done: stackoverflow.com/questions/35188911/…

@Pencilcheck 2018-02-13 21:03:59

TRUNCATE helps! Thanks a lot!

@Pedro Borges 2018-04-08 15:00:48

This is a great solution; I would just add that it ignores ON DELETE cascades, in case that's not obvious to someone.

@Erwin Brandstetter 2018-04-09 22:40:30

@PedroBorges: I clarified some more, to point out caveats.

@FunctorSalad 2017-11-10 17:53:19

If the table you're deleting from is referenced by some_other_table (and you don't want to drop the foreign keys even temporarily), make sure you have an index on the referencing column in some_other_table!

I had a similar problem and used auto_explain with auto_explain.log_nested_statements = true, which revealed that the delete was actually doing seq_scans on some_other_table:

    Query Text: SELECT 1 FROM ONLY "public"."some_other_table" x WHERE $1 OPERATOR(pg_catalog.=) "id" FOR KEY SHARE OF x    
    LockRows  (cost=[...])  
      ->  Seq Scan on some_other_table x  (cost=[...])  
            Filter: ($1 = id)

Apparently it's trying to lock the referencing rows in the other table (which shouldn't exist, or the delete will fail). After I created indexes on the referencing tables, the delete was orders of magnitude faster.
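The fix itself is a single index. The column name tbl_id is an assumption here; use whichever column in some_other_table references your table. CREATE INDEX CONCURRENTLY avoids blocking writes while it builds (but cannot run inside a transaction block):

CREATE INDEX CONCURRENTLY some_other_table_tbl_id_idx
    ON some_other_table (tbl_id);    -- hypothetical column name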

@francs 2011-11-28 06:37:16

We know the update/delete performance of PostgreSQL is not as powerful as Oracle's. When we need to delete millions or tens of millions of rows, it's really difficult and takes a long time.

However, we can still do this in production dbs. The following is my idea:

First, we should create a log table with 2 columns: id and flag (id refers to the ID you want to delete; flag can be Y or null, with Y signifying the record was successfully deleted).

Then we create a function that does the delete task in batches of 10,000 rows; a sketch of such a function follows below. You can see more details on my blog. Though it's in Chinese, you can still get the information you want from the SQL code there.

Make sure the id columns of both tables are indexed, as that will make it run faster.
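A sketch of such a function, assuming the log table is named del_log and the target table tbl (the data-modifying CTE requires PostgreSQL 9.1+). It flags one batch, deletes exactly those rows, and returns how many rows it deleted, so you can call it in a loop until it returns 0:

CREATE OR REPLACE FUNCTION delete_batch(p_limit int DEFAULT 10000)
  RETURNS integer
  LANGUAGE plpgsql AS
$$
DECLARE
    v_deleted integer;
BEGIN
    WITH batch AS (
        UPDATE del_log
        SET    flag = 'Y'
        WHERE  id IN (SELECT id FROM del_log
                      WHERE  flag IS NULL
                      LIMIT  p_limit)
        RETURNING id
    )
    DELETE FROM tbl t
    USING  batch b
    WHERE  t.id = b.id;

    GET DIAGNOSTICS v_deleted = ROW_COUNT;   -- rows actually deleted from tbl
    RETURN v_deleted;
END
$$;

SELECT delete_batch(10000);   -- repeat until it reports 0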

@Anthony Greco 2011-11-28 07:03:40

Well, I was basically doing that kind of logic to do it in batches; however, it was taking far too long because of my indexes. I finally dropped all my indexes (something I did not want to do) and the rows got purged quick as hell. Now I'm building all my indexes back up. Thanks though!

@Mark Ransom 2011-11-28 02:42:10

First make sure you have an index on the ID fields, both in the table you want to delete from and the table you are using for deletion IDs.

100 at a time seems too small. Try 1000 or 10000.

There's no need to delete anything from the deletion ID table. Add a new column for a batch number, fill it so the first 1000 rows get batch 1, the next 1000 batch 2, and so on, and make sure the deletion query includes the batch number.
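A sketch of that approach, assuming the deletion-ID table is named del_list; the window function assigns every 1000 IDs to one batch:

ALTER TABLE del_list ADD COLUMN batch int;

UPDATE del_list d
SET    batch = b.batch
FROM  (SELECT id, (row_number() OVER (ORDER BY id) - 1) / 1000 + 1 AS batch
       FROM   del_list) b
WHERE  d.id = b.id;

CREATE INDEX del_list_batch_idx ON del_list (batch);

DELETE FROM tbl                  -- one batch per statement
WHERE  id IN (SELECT id FROM del_list WHERE batch = 1);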

@Anthony Greco 2011-11-28 07:04:17

Turned out no matter what I tried, it was the keys that were killing me. Even just 15 was taking a minute or so; that's why I only did 100. Once I killed the index, it flew. Thanks though!

@Zaldy Baguinon 2011-11-28 02:40:51

Two possible answers:

  1. Your table may have lots of constraints or triggers attached to it; when you try to delete a record, each delete incurs processor cycles and checks against other tables.

  2. You may need to put this statement inside a transaction.

@Anthony Greco 2011-11-28 06:09:05

1. I do have constraints (foreign keys) that auto-delete referencing rows when a row in the table is deleted

@Saulius Žemaitaitis 2011-11-28 02:34:45

You may try copying all the data in the table except the IDs you want to delete into a new table, then renaming and swapping the tables (provided you have enough resources to do it).

This is not expert advice.
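A sketch of that idea, with hypothetical names (del_list holds the IDs to drop). Note that LIKE does not copy foreign-key constraints, so any would have to be recreated on the new table:

CREATE TABLE tbl_new (LIKE tbl INCLUDING ALL);   -- columns, defaults, indexes, check constraints

INSERT INTO tbl_new
SELECT t.*
FROM   tbl t
LEFT   JOIN del_list d USING (id)
WHERE  d.id IS NULL;                             -- keep only surviving rows

BEGIN;
ALTER TABLE tbl     RENAME TO tbl_old;
ALTER TABLE tbl_new RENAME TO tbl;
COMMIT;

DROP TABLE tbl_old;                              -- once you have verified the copy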

@Anthony Greco 2011-11-28 06:08:14

Based off Erwin's answer, I may just do this. Thanks

@nclu 2017-11-20 17:09:07

Depending on the number of rows to be kept and how tricky other foreign keys are, this can work. You can also copy the good rows to a temp table, truncate the current table, then copy back from temp.

@Vincent Agnello 2011-11-28 02:34:15

The easiest way to do this would be to drop all your constraints and then do the delete.
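A sketch with hypothetical constraint and table names (find the real ones with \d tbl or in pg_constraint); be aware that re-adding a foreign key scans the child table to re-validate it:

ALTER TABLE child_tbl DROP CONSTRAINT child_tbl_tbl_id_fkey;

DELETE FROM tbl
WHERE  id IN (SELECT id FROM del_list);

ALTER TABLE child_tbl
  ADD CONSTRAINT child_tbl_tbl_id_fkey
  FOREIGN KEY (tbl_id) REFERENCES tbl (id);      -- re-validates child_tbl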

@Anthony Greco 2011-11-28 06:07:46

I am really trying to avoid this, because then I will just have to redo the process on all of its foreign keys, but I very well may have to. Thanks
