I've written an API for a client that accepts orders from clients and converts them into print jobs that go to an HTTP bridge into the company's RIP software. On the backend, the bridge is reading and writing job data to an SQLite database. It stores the job in SQLite and then sends it to the RIP. The RIP communicates back to the bridge when the status changes. It updates the record in SQLite, and through a series of subscriptions and webhooks sends the status update back to the API.
Neither the bridge nor the RIP automatically cleans up after itself, and the bridge imposes a limit of 5,000 tracked jobs at a time. So if we try to send job #5001 while it's storing 5,000 additional jobs, it will fail.
So the solution, at least in theory, would be to clean up jobs as they complete. Which is what we're doing. When jobs complete my API enqueues a cleanup task, which is run from a separate process and requests the bridge remove that job.
The problem arises here. When we request cleanups the bridge starts failing job submissions due to DB lock errors. We're not deleting active jobs so there should be zero conflict between what is currently being INSERTed or UPDATed and what is being DELETEd as a result of our removal request. I can only assume SQLite is locking the entire database for the duration of the delete instead of locking only the rows in use. Would this make sense?
From the outside of the black box that is this software, is there anything I can do to affect this locking behavior in SQLite?
If not I essentially have to work around it by putting a buffer between everything our API does and the bridge that switches between a print mode and a cleanup mode but never does both synchronously. This will work but will slow production, and ultimately requires me to build a hack around a limitation in a $200,000 software product that just can't handle our volume.