This is Part 4 of our blog series on scaling Rails applications.
When a Rails application needs to interact with a database, it establishes a connection: a dedicated communication channel between the application and the database server. When a new request comes to Rails, the operation can be handled like this: establish a new connection, authenticate, execute the queries, and close the connection. When the next request comes, the whole process is repeated.
Creating new database connections is an expensive operation: it takes time to establish the connection, authenticate, and set up the communication channel. This means that every single time a request comes in, we spend time setting up a connection.
Wouldn't it be better to store established connections somewhere and, when a new request comes, grab a pre-established connection from this pool? This expedites the process since we don't need to create and close a connection for every single request. With this approach, each request checks out a connection from the pool, uses it to run its queries, and returns it to the pool when done.
Database connection pooling is a performance optimization technique that maintains a set of reusable database connections.
Active Record manages a pool of database connections for each web and background process. Each process will have a connection pool of its own, which means a Rails application running with multiple processes (like Puma processes, Sidekiq processes) will have multiple independent connection pools. The pool is a set of database connections that are shared among threads in the same process.
Note that the pooling happens at the process level. A thread from process A can't get a connection from process B's pool.
When a connection is needed, the thread checks out a connection from the pool, performs its operations, and then returns the connection to the pool. As of Rails 7.2, this happens at the query level: for each individual query, a connection is leased, used, and then returned to the pool.
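A quick illustration of this checkout/checkin cycle, using Active Record's `with_connection` helper:

```ruby
# Checks out a connection from the current process's pool, yields it,
# and returns it to the pool when the block exits.
ActiveRecord::Base.connection_pool.with_connection do |connection|
  connection.execute("SELECT 1")
end
```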
Before Rails 7.2, the connection was leased and held until the end of the request for web requests, and until the job finished for background jobs. This was a problem for applications that spend a lot of time doing I/O: a thread would hog its connection for the entire duration of the I/O operation, limiting the number of queries that could be executed concurrently. To facilitate this change and keep query caching working, the query cache has been updated to be owned by the pool.
This means that the query cache is now shared across all connections in the pool. Previously, each connection had its own query cache, which was fine when the whole request used the same connection. But now that a connection is leased for each query, the query cache needs to be shared across all connections in the pool.
Active Record's connection pool behavior can be customized through several configuration options in the database.yml file:
- `pool`: The maximum number of connections the pool can hold. The `database.yml` generated by Rails sets it to `RAILS_MAX_THREADS`, but you can set it to any value. There is a small problem when you set it to `RAILS_MAX_THREADS`, which we'll discuss later.
- `checkout_timeout`: The number of seconds a thread waits for a connection to become available before giving up. The default is 5 seconds. If no connection becomes available within this window, an `ActiveRecord::ConnectionTimeoutError` exception will be raised.
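For example, both settings can be made explicit in `database.yml` (the values here are illustrative):

```yaml
production:
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  checkout_timeout: 5 # seconds to wait for a free connection
```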
Database connections can sometimes become "dead" due to issues like database restarts, network problems, etc. Active Record provides the Reaper to handle this.
The Reaper periodically checks the connections in the pool and removes dead connections, as well as idle connections that have been checked out for a long time. It acts somewhat like a garbage collector for database connections. The Reaper uses the `idle_timeout` setting to determine how long a connection can remain idle before being removed, tracking idle time based on when connections were last used.
There is another configuration option called `reaping_frequency` that controls how often the Reaper runs to remove dead or idle connections from the pool. By default, this is set to 60 seconds, which means the Reaper wakes up once every minute to perform its maintenance tasks.
If your application is spiky and receives traffic in surges, set the reaping frequency and idle timeout to lower values, as shown below. This ensures the Reaper runs more frequently and removes idle connections more quickly, helping to keep the connection pool healthy and responsive.
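For example, in `database.yml` (the values here are illustrative, not recommendations):

```yaml
production:
  adapter: postgresql
  pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
  idle_timeout: 60       # remove connections that have been idle for over a minute
  reaping_frequency: 15  # run the Reaper every 15 seconds
```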
Idle database connections can significantly impact database performance for several interconnected reasons:
Memory Consumption: Each database connection, even when idle, maintains its own memory allocation. The database must reserve memory for session state, buffers, user context, and transaction workspace. This memory remains allocated even when the connection isn't doing any work. For example, if each connection uses 10 MB of memory, 100 idle connections would unnecessarily consume 1 GB of your database's memory that could otherwise be used for active queries, caching, or other productive work.
CPU overhead: While "idle" suggests no activity, the database still performs regular maintenance work for each connection: it must monitor connection health via keepalive checks, manage process tables, and so on.
The crucial issue is that the overhead of idle connections scales non-linearly. As we add more idle connections, the database spends an increasing proportion of its CPU time just managing these connections rather than processing actual queries. Thankfully, the Reaper handles this for us.
As we learned, the connection pool is managed at the process level. Each Rails process maintains its own pool.
Each Puma process is a separate process with its own connection pool. Within a process, each thread can check out one connection. Therefore, the maximum number of connections needed per process equals the `max_threads` setting in Puma.
Sidekiq runs as a separate process with its own connection pool. The Sidekiq `concurrency` setting determines the number of threads, so the maximum number of connections needed equals the `concurrency` value.
Note: If you're using Sidekiq Swarm and running multiple Sidekiq processes, take that into account.
We can calculate the total potential connections for a typical application as shown below.
Web connections = Number of web dynos * Number of Puma processes * `max_threads` value

Background connections = Number of worker dynos * Number of Sidekiq processes * Threads per process

Total connections = Web connections + Background connections
The key thing to note here is that the database needs to support at least this many simultaneous connections.
Note that if preboot is enabled, then the maximum number of connections will be double the above value. This is because during the release phase, there is a small window in which both the old dynos and new dynos are running.
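As a concrete (hypothetical) example, take 2 web dynos each running 2 Puma processes with `max_threads` set to 5, and 1 worker dyno running a single Sidekiq process with a concurrency of 10:

Web connections = 2 * 2 * 5 = 20

Background connections = 1 * 1 * 10 = 10

Total connections = 20 + 10 = 30 (or up to 60 during a deploy if preboot is enabled)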
In Rails 7, `load_async` was introduced, which allows us to run database queries asynchronously in background threads. When `load_async` is in use, the calculation for the maximum number of connections needed changes a bit. First, let's understand how `load_async` works.
`load_async` allows Rails to execute database queries asynchronously in background threads. Unlike regular Active Record queries, which are lazily loaded, `load_async` queries are always executed immediately in background threads and joined to the main thread when the results are needed.
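For example (`Post` and `Comment` are hypothetical models):

```ruby
# Both queries start executing immediately on background threads.
posts    = Post.where(published: true).load_async
comments = Comment.where(approved: true).load_async

# Accessing the results joins the background query to the main
# thread, blocking only if the query hasn't finished yet.
puts posts.size
puts comments.size
```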
The async executor is configured through the `config.active_record.async_query_executor` setting. There are three possible configurations:

- `nil` (default): Async queries are disabled, and `load_async` will execute queries synchronously.
- `:global_thread_pool`: Uses a single thread pool for all database connections.
- `:multi_thread_pool`: Uses separate thread pools for each database connection.
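Enabling the global thread pool might look like this (a minimal sketch for `config/application.rb`):

```ruby
# Run load_async queries on a single shared thread pool.
config.active_record.async_query_executor = :global_thread_pool

# Cap the number of concurrent async queries per process (default: 4).
config.active_record.global_executor_concurrency = 4
```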
Rails provides a configuration option named `global_executor_concurrency` (default: 4) that controls how many concurrent async queries can run per process. So, the maximum number of connections needed per process when `load_async` is used is:
Maximum connections per process = Process-level concurrency + `global_executor_concurrency` + 1
Here, process-level concurrency means `max_threads` for a Puma process and the `concurrency` setting for a Sidekiq process.
The "+1" accounts for the main control thread, which may occasionally need a connection (e.g., during model introspection at class load time).
There is a nice calculator created by the folks at Judoscale which can be used to calculate the maximum number of connections needed for your application.
Our `database.yml` file has the following line.

```yaml
pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>
```
We know that a thread doesn't take more than one DB connection, so the maximum number of connections needed per pool is equal to the total number of threads. The above configuration therefore looks fine.
However, this doesn't take into account whether we use `load_async`. If we do, the number of connections needed per process will be `RAILS_MAX_THREADS` + `global_executor_concurrency` + 1.
Do we really need to go into this much detail to figure out the right pool size? It turns out there is a much easier answer.
Most hosting providers offer databases with a maximum number of connections. We can simply set the pool config to the maximum number of connections supported by our database plan. Let us say we have a Standard-0 database on Heroku, which supports up to 120 connections. We can set the pool config to 120.
```yaml
pool: 120
```
We can do this because the database connections are lazily initialized in the pool. The application doesn't create more database connections than it needs. So we needn't be conservative here.
The only thing we need to ensure is that the maximum connection utilization doesn't exceed the database plan limit. If that happens then we have another solution - PgBouncer.
PgBouncer is a lightweight connection pooler for PostgreSQL. It sits between our application(s) and our PostgreSQL database and manages a pool of database connections.
While both PgBouncer and Active Record provide connection pooling, they operate at different levels and serve different purposes.
Active Record connection pool operates within a single Ruby process and manages
connections for threads within the process whereas PgBouncer is an external
connection pooler that sits between the application and the database and manages
connections across all the application processes.
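A minimal PgBouncer configuration might look something like this (an illustrative sketch, not a tuned setup):

```ini
; pgbouncer.ini
[databases]
myapp_production = host=127.0.0.1 port=5432 dbname=myapp_production

[pgbouncer]
listen_port = 6432
pool_mode = transaction   ; a server connection is shared per transaction
max_client_conn = 1000    ; client connections PgBouncer will accept
default_pool_size = 20    ; actual connections opened to Postgres
```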
`ActiveRecord::ConnectionTimeoutError` comes up when a thread waits more than `checkout_timeout` seconds to acquire a connection. This usually happens when the pool size is set to a value less than the concurrency.
For example, let's say we have set the Sidekiq concurrency to 10 and the pool size to 5, as sketched below. If more than 5 threads want a connection at any point in time, the remaining threads have to wait, and any thread that waits longer than `checkout_timeout` seconds raises the error.
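The mismatch would look like this (hypothetical config values):

```yaml
# config/sidekiq.yml
:concurrency: 10

# config/database.yml
pool: 5  # only 5 connections available for 10 Sidekiq threads
```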
What's the solution? As we discussed earlier, setting the `pool` to a really high value should fix the error in most cases.
Even after setting the config correctly, `ActiveRecord::ConnectionTimeoutError` can still happen, and it could be puzzling. Let us discuss a few scenarios where this can happen.
```ruby
class SomeService
  def process
    # Each thread checks out its own connection from the pool
    # belonging to the current process.
    threads = 5.times.map do
      Thread.new do
        ActiveRecord::Base.connection.execute("select pg_sleep(5);")
      end
    end
    threads.each(&:join)
  end
end
```
Here 5 threads are spun up. Note that these threads also take up connections from the same pool allotted to the process.
Even if our application code is not spinning up new threads, Rails itself can sometimes spin up additional threads. For example, Active Storage configured in proxy mode: Active Storage's proxy controllers generate responses as streams, which require dedicated threads for processing.
This means that when serving an Active Storage file through one of these proxy controllers, Rails actually utilizes two separate threads - one for the main request and another for the streaming process. Each of these threads requires its own separate database connection from the ActiveRecord connection pool.
rack-timeout is commonly used across Rails applications to automatically terminate long-running requests. While it helps prevent server resources from being tied up by slow requests, it can also cause a few issues.
rack-timeout uses Ruby's `Thread#raise` API to terminate requests that exceed the configured timeout. When a timeout occurs, rack-timeout raises a `Rack::Timeout::RequestTimeoutException` from another thread. If this exception is raised while a thread is in the middle of database operations, it can prevent proper cleanup of database connections.
If we still frequently see `ActiveRecord::ConnectionTimeoutError` exceptions in our application, we can get additional context by logging the connection pool info to our error monitoring service. This can help identify which threads were holding onto connections when the error occurred.
```ruby
# In the error monitoring service's configuration, attach pool info
# to ActiveRecord::ConnectionTimeoutError reports.
config.before_notify do |notice|
  if notice.error_class == "ActiveRecord::ConnectionTimeoutError"
    notice.context = { connection_pool_info: detailed_connection_pool_info }
  end
end

# Builds a hash describing which thread owns each connection in the pool.
def detailed_connection_pool_info
  connection_info = {}
  ActiveRecord::Base.connection_pool.connections.each_with_index do |conn, index|
    connection_info["connection_#{index + 1}"] = conn.owner ? conn.owner.inspect : "[UNUSED]"
  end
  connection_info["current_thread"] = Thread.current.inspect
  connection_info
end
```
`<thread_obj>.inspect` gives us the name, id and status of the thread. For example, if one entry in the hash looks like `#<Thread:0x00006a42eca73ba0@puma srv tp 002 /app/.../gems/puma-6.2.2/lib/puma/thread_pool.rb:106 sleep_forever>`, then it means that the connection is taken up by a Puma thread.
If we want to monitor Active Record connection pool stats, we need to periodically send the stats to a service that can display the data graphically. For the periodic check we are using the rufus-scheduler gem. For collecting and displaying the data we are using New Relic, but you can use any APM of your choice. We have configured it to send the pool stats every 15 seconds.
Here is the gist which collects and sends data.
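The approach is roughly along these lines (a minimal sketch; the metric names are ours, adapt them to your APM):

```ruby
require "rufus-scheduler"

scheduler = Rufus::Scheduler.new

scheduler.every "15s" do
  # connection_pool.stat returns a hash along the lines of:
  # { size: 120, connections: 6, busy: 2, dead: 0, idle: 4, waiting: 0, checkout_timeout: 5 }
  ActiveRecord::Base.connection_pool.stat.each do |name, value|
    NewRelic::Agent.record_metric("Custom/ConnectionPool/#{name}", value)
  end
end
```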
If any part of the blog is not clear to you, please write to us on LinkedIn, Twitter or our website. Our goal is to write in an easy to understand manner so that we all can understand how to scale a Rails application.
If this blog was helpful, check out our full blog archive.