July 30, 2019
Rails 6 has added support to provide optimizer hints.
Many relational database management systems (RDBMS) have a query optimizer. The job of the query optimizer is to determine the most efficient and fast plan to execute a given SQL query. Query optimizer has to consider all possible query execution plans before it can determine which plan is the optimal plan for executing the given SQL query and then compile and execute that query.
An optimal plan is chosen by the query optimizer by calculating the cost of each possible plans. Typically, when the number of tables referenced in a join query increases, then the time spent in query optimization grows exponentially which often affects the system's performance. The fewer the execution plans the query optimizer needs to evaluate, the lesser time is spent in compiling and executing the query.
As an application designer, we might have more context about the data stored in our database. With the contextual knowledge about our database, we might be able to choose a more efficient execution plan than the query optimizer.
This is where the optimizer hints or optimizer guidelines come into picture.
Optimizer hints allow us to control the query optimizer to choose a certain query execution plan based on the specific criteria. In other words, we can hint the optimizer to use or ignore certain optimization plans using optimizer hints.
Usually, optimizer hints should be provided only when executing a complex query involving multiple table joins.
Note that the optimizer hints only affect an individual SQL statement. To alter the optimization strategies at the global level, there are different mechanisms supported by different databases. Optimizer hints provide finer control over other mechanisms which allow altering optimization plans by other means.
Optimizer hints are supported by many databases such as
MySQL,
PostgreSQL with the help of pg_hint_plan
extension,
Oracle,
MS SQL,
IBM DB2,
etc. with varying syntax and options.
Before Rails 6, we have to execute a raw SQL query to use the optimizer hints.
query = "SELECT
/*+ JOIN_ORDER(articles, users) MAX_EXECUTION_TIME(60000) */
articles.*
FROM articles
INNER JOIN users
ON users.id = articles.user_id
WHERE (published_at > '2019-02-17 13:15:44')
".squish
ActiveRecord::Base.connection.execute(query)
In the above query, we provided two optimizer hints to MySQL .
/*+ HINT_HERE ANOTHER_HINT_HERE ... */
Another approach to use optimizer hints prior to Rails 6 is to use a monkey patch like this.
In Rails 6, using optimizer hints is easier.
The same example looks like this in Rails 6.
Article
.joins(:user)
.where("published_at > ?", 2.months.ago)
.optimizer_hints(
"JOIN_ORDER(articles, users)",
"MAX_EXECUTION_TIME(60000)"
)
This produces the same SQL query as above but the result is of type
ActiveRecord::Relation
.
In PostgreSQL (using the pg_hint_plan
extension), the optimizer hints have a
different syntax.
Article
.joins(:user)
.where("published_at > ?", 2.months.ago)
.optimizer_hints("Leading(articles users)", "SeqScan(articles)")
Please checkout the documentation of each database separately to learn the support and syntax of optimizer hints.
To learn more, please
checkout this PR which introduced
the #optimization_hints
method to Rails 6.
Consider that we have articles
table with some indexes.
class CreateArticles < ActiveRecord::Migration[6.0]
def change
create_table :articles do |t|
t.string :title, null: false
t.string :slug, null: false
t.references :user
t.datetime :published_at
t.text :description
t.timestamps
t.index :slug, unique: true
t.index [:published_at]
t.index [:slug, :user_id]
t.index [:published_at, :user_id]
t.index [:title, :slug]
end
end
end
Let's try to fetch all the articles which have been published in the last 2 months.
>> Article.joins(:user).where("published_at > ?", 2.months.ago)
# Article Load (10.5ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:38:18.647296')
=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>
Let's use EXPLAIN
to investigate why it is taking 10.5ms to execute this
query.
>> Article.joins(:user).where("published_at > ?", 2.months.ago).explain
# Article Load (13.9ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
=> # EXPLAIN for: SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
# +--------+----------+----------------+-----------+------+----------+-------+
# | select | table | possible_keys | key | rows | filtered | Extra |
# | _type | | | | | | |
# +--------+----------+----------------+-----------+------+----------+-------+
# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
# | | | | | | | index |
# +--------+----------+----------------+-----------+------+----------+-------+
# | SIMPLE | articles | index | index | 9866 | 10.0 | Using |
# | | | _articles | _articles | | | where |
# | | | _on_user_id, | _on | | | |
# | | | index | _user_id | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at, | | | | |
# | | | index | | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at | | | | |
# | | | _and_user_id | | | | |
# +--------+----------+----------------+-----------+------+----------+-------+
According to the above table, it appears that the query optimizer is considering
users
table first and then the articles
table.
The rows
column indicates the estimated number of rows the query optimizer
must examine to execute the query.
The
filtered
column indicates an estimated percentage of table rows that will be filtered by
the table condition.
The formula rows x filtered
gives the number of rows that will be joined with
the following table.
Also,
users
table, the number of rows to be joined with the following table is
2 x 100% = 2
,articles
table, the number of rows to be joined with the following table
is 500 * 7.79 = 38.95
.Since the articles
tables contain more records which references very few
records from the users
table, it would be better to consider the articles
table first and then the users
table.
We can hint MySQL to consider the articles
table first as follows.
>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)")
# Article Load (2.2ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:54:06.230651')
=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>
Note that it took 2.2ms now to fetch the same records by providing
JOIN_ORDER(articles, users)
optimization hint.
Let's try to EXPLAIN
what changed by using this JOIN_ORDER(articles, users)
optimization hint.
>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)").explain
# Article Load (4.1ms) SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
=> # EXPLAIN for: SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
# +--------+----------+----------------+-----------+------+----------+--------+
# | select | table | possible_keys | key | rows | filtered | Extra |
# | _type | | | | | | |
# +--------+----------+----------------+-----------+------+----------+--------+
# | SIMPLE | articles | index | index | 769 | 100.0 | Using |
# | | | _articles | _articles | | | index |
# | | | _on_user_id, | _on | | | condi |
# | | | index | _publish | | | tion; |
# | | | _articles | ed_at, | | | Using |
# | | | _on | | | | where |
# | | | _published_at, | | | | |
# | | | index | | | | |
# | | | _articles | | | | |
# | | | _on | | | | |
# | | | _published_at | | | | |
# | | | _and_user_id | | | | |
# +--------+----------+----------------+-----------+------+----------+--------+
# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
# | | | | | | | index |
# +--------+----------+----------------+-----------+------+----------+--------+
The result of the EXPLAIN
query shows that the articles
table was considered
first and then the users
table as expected. We can also see that the
index_articles_on_published_at
index key was considered from the possible keys
to execute the given query. The filtered
column for both tables shows that the
number of filtered rows was 100% which means no filtering of rows occurred.
We hope this example helps in understanding how to use #explain
and
#optimization_hints
methods in order to investigate and debug the performance
issues and then fixing it.
If this blog was helpful, check out our full blog archive.