Recently, 37signals open sourced Solid Queue.
Solid Queue is database based queuing backend for Active Job. In contrast Sidekiq and Resque are Redis based queuing backends.
In her blog Rosa Gutiérrez mentioned following lines which captured our attention.
In our case, one feature that PostgreSQL has had for quite some time and that was finally introduced in MySQL 8 has been crucial to our implementation:
SELECT ... FOR UPDATE SKIP LOCKED
This allows Solid Queue’s workers to fetch and lock jobs without locking other workers.
As per her this feature had been in PostgreSQL for a while and now this feature has landed in MySQL making it possible to build Solid Queue.
We had never heard of UPDATE SKIP LOCKED
feature either in PostgreSQL or in
MySQL. We were wondering what is this UPDATE SKIP LOCKED
without which it was
not possible to build Solid Queue. So we decided to look into it.
Consider a case where we need to build a system where a bunch of jobs need to be processed in the background.
There are a bunch of workers waiting to grab a job and start processing the moment a job becomes available. The challenge is when multiple workers attempt to claim the same job simultaneously how do we ensure that only one of the workers claims the job for processing. At any point of time a worker should claim only the "unclaimed" job and an "unclaimed" job should be claimed by one and only one worker.
Here is how one might go about it implementing it.
START TRANSACTION
SELECT * FROM JOBS WHERE processed='no' LIMIT 1;
-- Process the job
COMMIT;
With the above code it's possible that two workers might claim the same job.
One way to resolve this issue is by marking a particular row as locked for update.
START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' FOR UPDATE LIMIT 1;
-- Process the job
COMMIT
SELECT ... FOR UPDATE
locks a particular row and hence no one else can lock
that record.
As soon as a new job comes in multiple workers will execute the above query and will try to take a lock on that record. The database will ensure that only one of the workers gets the lock.
The first worker will take the lock on the record using FOR UPDATE
. When other
workers come to that record and they see that there is a lock FOR UPDATE
, they
will wait for the lock to be lifted. Yes, these workers will wait until the lock
is released.
The lock will only be released when the transaction is committed. When the transaction is committed and the lock is released then other workers will get hold of the record only to find that the job has already been processed. As you can see this is a highly inefficient process.
That is where FOR UPDATE SKIP LOCKED
comes in.
START TRANSACTION;
SELECT * FROM jobs_table FOR UPDATE SKIP LOCKED LIMIT 1;
-- Process the job
COMMIT;
Imagine the same scenrio here. A job comes in. Multiple workers compete to claim
the job. The database ensures that only one worker gets the lock. However in
this case the other workers will move on to the next record. They will not wait.
That's what SKIP LOCKED
does.
MySQL has a detailed documentation on how SKIP LOCKED works if you want to read in more details.
Solid Queue uses FOR UPDATE SKIP LOCKED
feature to ensure that a job is
claimed by only one worker.
GoodJob burst into the scene around July, 2020. GoodJobs supports only PostgreSQL database because it uses advisory locks to guarantee that no two workers claim the same job.
PostgreSQL folks understand that the lock mechanism provided by the database would not satisfy all the variety of cases that might arise in an application. Advisory locks are a mechanism that allows applications to establish a communication channel to coordinate actions between different sessions or transactions. Unlike regular row-level locks enforced by the database system, advisory locks are implemented as a set of low-level functions that applications can use to acquire and release locks based on their requirements. We can read more about it here.
pg_advisory_lock function
will lock the given resource. However if another session already holds a lock on
the same resource then this function will wait. This is similar to the
FOR UPDATE
case we saw above.
However pg_try_advisory_lock function
will either obtain the lock immediately
and return true, or return false if the lock cannot be acquired immediately. As
you can see the name has the word try
. This function will try to get a lock.
If it can't get the lock then it won't wait. Now this function can be utilized
to build a queuing system.
Any usage of advisory lock means the application needs to coordinate action. It
gives more power to the application but it also means more work by the
application. In contrast FOR UPDATE SKIP RECORD
is natively supported by
PostgreSQL.
Based on the discussions
here and
here,
it seems GoodJob is evaluating the possibility of migrating from advisory locks
to using FOR UPDATE SKIP LOCKED
for better performance. Going through these
issues was quite revealing and I got to learn a lot about things I was unaware
of.
DelayedJob has been there since
2009, long before Sidekiq. It doesn't use SKIP LOCK
. Instead it uses a row
level locking system by
updating a field in the job record
to indicate that the job is being processed. In short DelayedJob ensures that no
two workers take the same job at the application level without taking any help
in this direction from the database.
So far we discussed PostregSQL and MySQL. What about SQLite? Does it support
SKIP LOCK
. No it doesn't support it but it's ok. As per the
documentation it supports
only writer at any instant in time
.
High Concurrency
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
For the sake of completeness let's discuss NOWAIT
feature. We saw earlier that
if we take a lock on a row using FOR UPDATE
then other workers will wait until
the lock is released.
START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' FOR UPDATE NOWAIT LIMIT 1;
-- Process the job
COMMIT
NOWAIT
feature allows other transactions to not wait for the lock to be
released. In this case if a transaction is not able to get a lock on the given
row then it will raise an error and application needs to handle the error.
In contrast SKIP LOCKED
will allow the transaction to move on to the next row
if a lock is already taken.
Now that we looked at how FOR UPDATE SKIP LOCK
helps build queuing system
using database itself, let's see some pros and cons of each type of queuing
system.
Database-backed queues are often simpler to set up and manage, especially if your application is already using a relational database. There's no need for an additional dependency like Redis.
Since the job information is stored in the same database as your application data, you don't need to set up and maintain a separate infrastructure like a Redis server.
Database-backed queues can leverage database transactions, ensuring that both the job creation and any related database operations are committed or rolled back together. This can be important in scenarios where data consistency is critical.
It is easy to modify the jobs stored in database than Redis, but doing so requires caution and it's generally not recommended. In Redis, jobs are often stored as serialized data, and modifying them directly is not as straightforward or common. Redis provides commands to interact with data, but modifying job data directly is not a standard practice and could result in data corruption.
If this blog was helpful, check out our full blog archive.