Caching result sets and collection in Rails 5

Mohit Natoo

February 2, 2016

This blog is part of our Rails 5 series.

While developing a Rails application, you will often reach for one of Rails' caching techniques to boost performance. In addition to these, Rails 5 provides a way to cache a collection of records, thanks to the introduction of the following method:

    ActiveRecord::Relation#cache_key

What is collection caching?

Consider the following example, where we fetch a collection of all users belonging to the city of Miami.

    @users = User.where(city: 'Miami')

Here @users is a collection of records and is an object of class ActiveRecord::Relation.

Whether the result of the above query stays the same depends on the following conditions.

  • The query statement doesn't change. If we change the city name from "Miami" to "Boston", the result might change.
  • No record is deleted. The count of records in the collection should stay the same.
  • No record is added. The count of records in the collection should stay the same.

The Rails community implemented caching for a collection of records. The cache_key method was added to ActiveRecord::Relation; it takes into account several factors, including the query statement, the updated_at column values and the number of records in the collection.
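
To make these factors concrete, here is a minimal plain-Ruby sketch. It does not use Rails: SketchUser and sketch_collection_key are hypothetical names invented for this illustration, and the helper only imitates the idea of combining the query text, the record count and the newest updated_at into one string.

```ruby
require "digest"

# Hypothetical stand-in for a user row; only the fields the key depends on.
SketchUser = Struct.new(:id, :updated_at)

# Hypothetical helper (not the Rails implementation): combines the query
# text, the record count and the newest updated_at into one string.
def sketch_collection_key(sql, records)
  newest = records.map(&:updated_at).max
  stamp  = newest.utc.strftime("%Y%m%d%H%M%S%6N")
  "users/query-#{Digest::MD5.hexdigest(sql)}-#{records.size}-#{stamp}"
end

miami_sql  = %q(SELECT "users".* FROM "users" WHERE "users"."city" = 'Miami')
boston_sql = %q(SELECT "users".* FROM "users" WHERE "users"."city" = 'Boston')

users = [
  SketchUser.new(1, Time.utc(2016, 1, 14)),
  SketchUser.new(2, Time.utc(2016, 1, 15)),
  SketchUser.new(3, Time.utc(2016, 1, 16))
]

base           = sketch_collection_key(miami_sql, users)
other_query    = sketch_collection_key(boston_sql, users)
record_added   = sketch_collection_key(miami_sql, users + [SketchUser.new(4, Time.utc(2016, 1, 10))])
record_removed = sketch_collection_key(miami_sql, users.first(2))

# Prints 4: changing the query, adding a record or removing a record
# each produces a different key.
puts [base, other_query, record_added, record_removed].uniq.size
```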

Understanding ActiveRecord::Relation#cache_key

We have the object @users of class ActiveRecord::Relation. Now let's call the cache_key method on it.

    @users.cache_key
    => "users/query-67ed32b36805c4b1ec1948b4eef8d58f-3-20160116111659084027"

Let's try to understand each piece of the output.

users identifies the kind of records in the collection. In this example the collection holds records of class User, so the key starts with users.

query- is a hardcoded value and is the same in all cases.

67ed32b36805c4b1ec1948b4eef8d58f is a digest of the query statement that will be executed. In our example it is MD5("SELECT "users".* FROM "users" WHERE "users"."city" = 'Miami'").

3 is the size of the collection.

20160116111659084027 is the timestamp of the most recently updated record in the collection. By default the timestamp column is updated_at, so the value is the most recent updated_at in the collection.
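
The same breakdown can be expressed as a small plain-Ruby sketch that splits a key into its parts. The pattern and the sketch_parse_cache_key helper are hypothetical names written for this post's example key; they are not part of Rails.

```ruby
# Hypothetical pattern for the key layout described above:
# <collection>/query-<32 hex digest>-<size>-<20 digit timestamp>
SKETCH_KEY_PATTERN = %r{\A(?<collection>\w+)/query-(?<digest>\h{32})-(?<size>\d+)-(?<timestamp>\d{20})\z}

# Splits a collection cache key into its named parts, or returns nil
# when the string does not follow the layout.
def sketch_parse_cache_key(key)
  match = SKETCH_KEY_PATTERN.match(key)
  return nil unless match

  {
    collection: match[:collection],
    digest:     match[:digest],
    size:       match[:size].to_i,
    timestamp:  match[:timestamp]
  }
end

parts = sketch_parse_cache_key("users/query-67ed32b36805c4b1ec1948b4eef8d58f-3-20160116111659084027")
puts parts[:collection] # users
puts parts[:size]       # 3
puts parts[:timestamp]  # 20160116111659084027
```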

Using ActiveRecord::Relation#cache_key

Let's see how to use cache_key to actually cache data.

In our Rails application, if we want to cache the users belonging to "Miami", we can take the following approach.

    # app/controllers/users_controller.rb

    class UsersController < ApplicationController
      def index
        @users = User.where(city: 'Miami')
      end
    end

    # users/index.html.erb

    <% cache(@users) do %>
      <% @users.each do |user| %>
        <p> <%= user.city %> </p>
      <% end %>
    <% end %>

    # 1st Hit
    Processing by UsersController#index as HTML
      Rendering users/index.html.erb within layouts/application
       (0.2ms)  SELECT COUNT(*) AS "size", MAX("users"."updated_at") AS timestamp FROM "users" WHERE "users"."city" = ?  [["city", "Miami"]]
    Read fragment views/users/query-37a3d8c65b3f0f9ece7f66edcdcb10ab-4-20160704131424063322/30033e62b28c83f26351dc4ccd6c8451 (0.0ms)
      User Load (0.1ms)  SELECT "users".* FROM "users" WHERE "users"."city" = ?  [["city", "Miami"]]
    Write fragment views/users/query-37a3d8c65b3f0f9ece7f66edcdcb10ab-4-20160704131424063322/30033e62b28c83f26351dc4ccd6c8451 (0.0ms)
    Rendered users/index.html.erb within layouts/application (3.7ms)

    # 2nd Hit
    Processing by UsersController#index as HTML
      Rendering users/index.html.erb within layouts/application
       (0.2ms)  SELECT COUNT(*) AS "size", MAX("users"."updated_at") AS timestamp FROM "users" WHERE "users"."city" = ?  [["city", "Miami"]]
    Read fragment views/users/query-37a3d8c65b3f0f9ece7f66edcdcb10ab-4-20160704131424063322/30033e62b28c83f26351dc4ccd6c8451 (0.0ms)
      Rendered users/index.html.erb within layouts/application (3.0ms)

From the log above, we can see that on the first hit a count query is fired to get the size of the users collection and its latest updated_at.

Rails then looks for a fragment under the cache key generated from the result of that query. Nothing is cached yet, so it loads the users, renders the block and writes a new cache entry under that key.

On the second hit, it fires the count query again and checks whether an entry for the resulting cache key exists.

If the entry is found, Rails uses the cached fragment without firing the SQL query that loads the users.
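
The read-then-write flow in the log can be imitated with a small in-memory sketch. SketchFragmentStore is a hypothetical class written for this illustration, not the Rails cache store; the block passed to fetch stands in for the "User Load" query plus rendering.

```ruby
# Hypothetical in-memory stand-in for the fragment store.
class SketchFragmentStore
  attr_reader :events

  def initialize
    @fragments = {}
    @events = []
  end

  # Returns the stored fragment for key; on a miss, runs the block
  # (the expensive load and render) and stores its result.
  def fetch(key)
    if @fragments.key?(key)
      @events << :read
      @fragments[key]
    else
      @events << :write
      @fragments[key] = yield
    end
  end
end

store = SketchFragmentStore.new
key = "views/users/query-37a3d8c65b3f0f9ece7f66edcdcb10ab-4-20160704131424063322"

2.times do
  store.fetch(key) { "<p> Miami </p>\n" * 4 } # block stands in for User Load + render
end

p store.events # [:write, :read]
```

The block runs only on the first call. If the size or timestamp in the key changed, the lookup would miss and the block would run again.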

What if your table doesn't have an updated_at column?

We mentioned earlier that the cache_key method uses the updated_at column. cache_key also accepts a custom column as a parameter; the highest value of that column among the records in the collection is then used instead.

For example, if your business logic treats a column named last_bought_at in the products table as the factor that decides caching, you can use the following code.

    products = Product.where(category: 'cars')
    products.cache_key(:last_bought_at)
    => "products/query-211ae6b96ec456b8d7a24ad5fa2f8ad4-4-20160118080134697603"
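
As a rough illustration of "the highest value of that column", the plain-Ruby sketch below picks the newest value of a chosen column and formats it like the timestamps in the keys above. SketchProduct, sketch_key_timestamp and the sample dates are all made up for this example.

```ruby
# Hypothetical stand-in for a product row.
SketchProduct = Struct.new(:name, :updated_at, :last_bought_at)

# Hypothetical helper: the newest value of the chosen column, formatted
# like the timestamps in the keys above.
def sketch_key_timestamp(records, column = :updated_at)
  newest = records.map { |record| record.public_send(column) }.max
  newest.utc.strftime("%Y%m%d%H%M%S%6N")
end

cars = [
  SketchProduct.new("hatchback", Time.utc(2016, 1, 10), Time.utc(2016, 1, 18, 8, 1, 34)),
  SketchProduct.new("sedan",     Time.utc(2016, 1, 12), Time.utc(2016, 1, 17, 9, 0, 0))
]

puts sketch_key_timestamp(cars)                  # 20160112000000000000
puts sketch_key_timestamp(cars, :last_bought_at) # 20160118080134000000
```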

Edge cases to watch out for

Before you start using cache_key there are some edge cases to watch out for.

Suppose you have an application with 5 entries in the users table whose city is Miami.

Using limit puts an incorrect size in the cache key if the collection is not loaded

If you want to fetch three users belonging to the city "Miami", you would execute the following query.

    users = User.where(city: 'Miami').limit(3)
    users.cache_key
    => "users/query-67ed32b36805c4b1ec1948b4eef8d58f-3-20160116144936949365"

Here the records have been loaded, so users contains only three records and the cache_key has 3 as the size of the collection.

Now let's execute the same query without fetching the records first.

    User.where(city: 'Miami').limit(3).cache_key
    => "users/query-8dc512b1408302d7a51cf1177e478463-5-20160116144936949365"

You can see that the count in the cache key is 5 this time, even though we set a limit of 3. This is because the implementation of ActiveRecord::Base#collection_cache_key executes the count query without the limit to get the size of the collection.
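
The difference between the two cases can be modelled with a tiny plain-Ruby sketch. SketchRelation is a hypothetical class that only imitates the behaviour described above (loaded: size of the records in memory; unloaded: a count that ignores the limit); it is not how Rails is implemented.

```ruby
# Hypothetical stand-in for a relation with an optional limit.
class SketchRelation
  def initialize(rows, limit: nil)
    @rows = rows
    @limit = limit
    @records = nil
  end

  # Fetches the records, applying the limit, and returns self.
  def load
    @records = @limit ? @rows.first(@limit) : @rows.dup
    self
  end

  def loaded?
    !@records.nil?
  end

  # Size that would go into the key under the behaviour described above:
  # loaded   -> number of records in memory (limit applied)
  # unloaded -> a count over the matching rows that ignores the limit
  def size_for_key
    loaded? ? @records.size : @rows.size
  end
end

miami_rows = (1..5).map { |id| { id: id, city: "Miami" } }

puts SketchRelation.new(miami_rows, limit: 3).size_for_key      # 5
puts SketchRelation.new(miami_rows, limit: 3).load.size_for_key # 3
```

Following the same logic, loading the relation before asking for its key is one way to get a size that respects the limit.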

Cache key doesn't change when an existing record from a collection is replaced

Suppose we want three users in descending order of id.

    users1 = User.where(city: 'Miami').order('id desc').limit(3)
    users1.cache_key
    => "users/query-57ee9977bb0b04c84711702600aaa24b-3-20160116144936949365"

The above statement gives us the users with ids [5, 4, 3].

Now let's remove the user with id = 3.

    User.find(3).destroy

    users2 = User.where(city: 'Miami').order('id desc').limit(3)
    users2.cache_key
    => "users/query-57ee9977bb0b04c84711702600aaa24b-3-20160116144936949365"

Note that the cache_key of users1 and users2 is exactly the same, even though the user with id 3 has been replaced by another record. This is because none of the parameters that affect the cache key has changed: neither the number of records, nor the query statement, nor the timestamp of the latest record.

There is an ongoing discussion about adding the ids of the collection records to the cache key. This might help solve the problems discussed above.
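
To show why including ids would help, here is a hypothetical plain-Ruby sketch that appends a digest of the record ids to the key from the example. sketch_key_with_ids is an invented helper and this layout is only an assumption for illustration; it is not an existing Rails API or the format proposed in that discussion.

```ruby
require "digest"

# Hypothetical helper: appends a digest of the record ids to an existing key.
def sketch_key_with_ids(base_key, ids)
  "#{base_key}-#{Digest::MD5.hexdigest(ids.join('-'))}"
end

base_key = "users/query-57ee9977bb0b04c84711702600aaa24b-3-20160116144936949365"

before_destroy = sketch_key_with_ids(base_key, [5, 4, 3])
after_destroy  = sketch_key_with_ids(base_key, [5, 4, 2]) # assumes id 2 takes the freed slot

puts before_destroy == after_destroy # false
```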

Using a group query gives an incorrect size in the cache key

Just like the limit case discussed above, cache_key behaves differently depending on whether the data is loaded in memory.

Let's say that we have two users with first_name "Sam".

First let's look at the case where the collection is not loaded in memory.

    User.select(:first_name).group(:first_name).cache_key
    => "users/query-92270644d1ec90f5962523ed8dd7a795-1-20160118080134697603"

In the above case, the size in the cache key is 1. For the system described above, the size you get will be either 1 or 5. That is, it is the size of an arbitrary group.

Now let's see what happens when the collection is loaded first.

    users = User.select(:first_name).group(:first_name)
    users.cache_key
    => "users/query-92270644d1ec90f5962523ed8dd7a795-2-20160118080134697603"

In the above case, the size in the cache key is 2. The count in the cache key differs from the unloaded case, even though the query output is exactly the same in both cases.

When the collection is loaded, the size is equal to the total number of groups. So regardless of which records are in each group, we may end up with the same cache key value.
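
The two sizes can be reproduced with a small plain-Ruby sketch over made-up data. The helper names are hypothetical, "the first group returned" is an assumption standing in for the arbitrary group mentioned above, and Enumerable#tally needs Ruby 2.7 or newer.

```ruby
# Hypothetical helper: number of rows per first_name, like COUNT(*) ... GROUP BY.
def sketch_group_counts(first_names)
  first_names.tally
end

# Unloaded case as described above: the size comes from one group's count
# (here simply the first group returned).
def sketch_unloaded_size(first_names)
  sketch_group_counts(first_names).values.first
end

# Loaded case: one record per group, so the size is the number of groups.
def sketch_loaded_size(first_names)
  sketch_group_counts(first_names).size
end

names = %w[Ann Sam Sam] # made-up data: one Ann, two Sams

puts sketch_unloaded_size(names)             # 1
puts sketch_loaded_size(names)               # 2
puts sketch_loaded_size(%w[Ann Ann Ann Sam]) # 2
```

The last line shows the point made above: a different distribution of records across the same number of groups yields the same loaded size.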
