Solid Queue & understanding UPDATE SKIP LOCKED

January 23, 2024

What is solid queue?

Recently, 37signals open sourced Solid Queue.

Solid Queue is database based queuing backend for Active Job. In contrast Sidekiq and Resque are Redis based queuing backends.

In her blog Rosa Gutiérrez mentioned following lines which captured our attention.

In our case, one feature that PostgreSQL has had for quite some time and that was finally introduced in MySQL 8 has been crucial to our implementation:

SELECT ... FOR UPDATE SKIP LOCKED

This allows Solid Queue’s workers to fetch and lock jobs without locking other workers.

As per her this feature had been in PostgreSQL for a while and now this feature has landed in MySQL making it possible to build Solid Queue.

We had never heard of UPDATE SKIP LOCKED feature either in PostgreSQL or in MySQL. We were wondering what is this UPDATE SKIP LOCKED without which it was not possible to build Solid Queue. So we decided to look into it.

Processing jobs from a queue

Consider a case where we need to build a system where a bunch of jobs need to be processed in the background.

There are a bunch of workers waiting to grab a job and start processing the moment a job becomes available. The challenge is when multiple workers attempt to claim the same job simultaneously how do we ensure that only one of the workers claims the job for processing. At any point of time a worker should claim only the "unclaimed" job and an "unclaimed" job should be claimed by one and only one worker.

Here is how one might go about it implementing it.

START TRANSACTION
SELECT * FROM JOBS WHERE processed='no' LIMIT 1;
-- Process the job
COMMIT;

With the above code it's possible that two workers might claim the same job.

One way to resolve this issue is by marking a particular row as locked for update.

START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' FOR UPDATE LIMIT 1;
-- Process the job
COMMIT

SELECT ... FOR UPDATE locks a particular row and hence no one else can lock that record.

As soon as a new job comes in multiple workers will execute the above query and will try to take a lock on that record. The database will ensure that only one of the workers gets the lock.

The first worker will take the lock on the record using FOR UPDATE. When other workers come to that record and they see that there is a lock FOR UPDATE, they will wait for the lock to be lifted. Yes, these workers will wait until the lock is released.

The lock will only be released when the transaction is committed. When the transaction is committed and the lock is released then other workers will get hold of the record only to find that the job has already been processed. As you can see this is a highly inefficient process.

That is where FOR UPDATE SKIP LOCKED comes in.

SKIP LOCKED skips locked rows

START TRANSACTION;
SELECT * FROM jobs_table FOR UPDATE SKIP LOCKED LIMIT 1;
-- Process the job
COMMIT;

Imagine the same scenario here. A job comes in. Multiple workers compete to claim the job. The database ensures that only one worker gets the lock. However in this case the other workers will move on to the next record. They will not wait. That's what SKIP LOCKED does.

MySQL has a detailed documentation on how SKIP LOCKED works if you want to read in more details.

Solid Queue uses FOR UPDATE SKIP LOCKED feature to ensure that a job is claimed by only one worker.

How GoodJob manages job processing without SKIP LOCKED

GoodJob burst into the scene around July, 2020. GoodJobs supports only PostgreSQL database because it uses advisory locks to guarantee that no two workers claim the same job.

PostgreSQL folks understand that the lock mechanism provided by the database would not satisfy all the variety of cases that might arise in an application. Advisory locks are a mechanism that allows applications to establish a communication channel to coordinate actions between different sessions or transactions. Unlike regular row-level locks enforced by the database system, advisory locks are implemented as a set of low-level functions that applications can use to acquire and release locks based on their requirements. We can read more about it here.

pg_advisory_lock function will lock the given resource. However if another session already holds a lock on the same resource then this function will wait. This is similar to the FOR UPDATE case we saw above.

However pg_try_advisory_lock function will either obtain the lock immediately and return true, or return false if the lock cannot be acquired immediately. As you can see the name has the word try. This function will try to get a lock. If it can't get the lock then it won't wait. Now this function can be utilized to build a queuing system.

Any usage of advisory lock means the application needs to coordinate action. It gives more power to the application but it also means more work by the application. In contrast FOR UPDATE SKIP RECORD is natively supported by PostgreSQL.

Based on the discussions here and here, it seems GoodJob is evaluating the possibility of migrating from advisory locks to using FOR UPDATE SKIP LOCKED for better performance. Going through these issues was quite revealing and I got to learn a lot about things I was unaware of.

Delayed job implementation

DelayedJob has been there since 2009, long before Sidekiq. It doesn't use SKIP LOCK. Instead it uses a row level locking system by updating a field in the job record to indicate that the job is being processed. In short DelayedJob ensures that no two workers take the same job at the application level without taking any help in this direction from the database.

What about SQLite

So far we discussed PostregSQL and MySQL. What about SQLite? Does it support SKIP LOCK. No it doesn't support it but it's ok. As per the documentation it supports only writer at any instant in time.

High Concurrency

SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.

NOWAIT

For the sake of completeness let's discuss NOWAIT feature. We saw earlier that if we take a lock on a row using FOR UPDATE then other workers will wait until the lock is released.

START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' FOR UPDATE NOWAIT LIMIT 1;
-- Process the job
COMMIT

NOWAIT feature allows other transactions to not wait for the lock to be released. In this case if a transaction is not able to get a lock on the given row then it will raise an error and application needs to handle the error.

In contrast SKIP LOCKED will allow the transaction to move on to the next row if a lock is already taken.

Redis backed queue vs database backed queue

Now that we looked at how FOR UPDATE SKIP LOCK helps build queuing system using database itself, let's see some pros and cons of each type of queuing system.

Simplicity and familiarity

Database-backed queues are often simpler to set up and manage, especially if your application is already using a relational database. There's no need for an additional dependency like Redis.

No Additional Infrastructure

Since the job information is stored in the same database as your application data, you don't need to set up and maintain a separate infrastructure like a Redis server.

Transactionality

Database-backed queues can leverage database transactions, ensuring that both the job creation and any related database operations are committed or rolled back together. This can be important in scenarios where data consistency is critical.

Modifiability

It is easy to modify the jobs stored in database than Redis, but doing so requires caution and it's generally not recommended. In Redis, jobs are often stored as serialized data, and modifying them directly is not as straightforward or common. Redis provides commands to interact with data, but modifying job data directly is not a standard practice and could result in data corruption.

If this blog was helpful, check out our full blog archive.