We write about Ruby on Rails, React.js, React Native, remote work, open source, engineering and design.
Rails 6 has added support to provide optimizer hints.
Many relational database management systems (RDBMS) have a query optimizer. The job of the query optimizer is to determine the most efficient and fast plan to execute a given SQL query. Query optimizer has to consider all possible query execution plans before it can determine which plan is the optimal plan for executing the given SQL query and then compile and execute that query.
An optimal plan is chosen by the query optimizer by calculating the cost of each possible plans. Typically, when the number of tables referenced in a join query increases, then the time spent in query optimization grows exponentially which often affects the system's performance. The fewer the execution plans the query optimizer needs to evaluate, the lesser time is spent in compiling and executing the query.
As an application designer, we might have more context about the data stored in our database. With the contextual knowledge about our database, we might be able to choose a more efficient execution plan than the query optimizer.
This is where the optimizer hints or optimizer guidelines come into picture.
Optimizer hints allow us to control the query optimizer to choose a certain query execution plan based on the specific criteria. In other words, we can hint the optimizer to use or ignore certain optimization plans using optimizer hints.
Usually, optimizer hints should be provided only when executing a complex query involving multiple table joins.
Note that the optimizer hints only affect an individual SQL statement. To alter the optimization strategies at the global level, there are different mechanisms supported by different databases. Optimizer hints provide finer control over other mechanisms which allow altering optimization plans by other means.
Optimizer hints are supported by many databases
such as
MySQL,
PostgreSQL with the help of pg_hint_plan
extension,
Oracle,
MS SQL,
IBM DB2,
etc.
with varying syntax and options.
Before Rails 6, we have to execute a raw SQL query to use the optimizer hints.
1query = "SELECT
2 /*+ JOIN_ORDER(articles, users) MAX_EXECUTION_TIME(60000) */
3 articles.*
4 FROM articles
5 INNER JOIN users
6 ON users.id = articles.user_id
7 WHERE (published_at > '2019-02-17 13:15:44')
8 ".squish
9
10ActiveRecord::Base.connection.execute(query)
In the above query, we provided two optimizer hints to MySQL .
1/*+ HINT_HERE ANOTHER_HINT_HERE ... */
Another approach to use optimizer hints prior to Rails 6 is to use a monkey patch like this.
In Rails 6, using optimizer hints is easier.
The same example looks like this in Rails 6.
1Article
2 .joins(:user)
3 .where("published_at > ?", 2.months.ago)
4 .optimizer_hints(
5 "JOIN_ORDER(articles, users)",
6 "MAX_EXECUTION_TIME(60000)"
7 )
This produces the same SQL query as above
but
the result is of type ActiveRecord::Relation
.
In PostgreSQL (using the pg_hint_plan
extension),
the optimizer hints have a different syntax.
1Article
2 .joins(:user)
3 .where("published_at > ?", 2.months.ago)
4 .optimizer_hints("Leading(articles users)", "SeqScan(articles)")
Please checkout the documentation of each database separately to learn the support and syntax of optimizer hints.
To learn more,
please
checkout this PR
which introduced the #optimization_hints
method
to Rails 6.
Consider that we have articles
table
with some indexes.
1class CreateArticles < ActiveRecord::Migration[6.0]
2 def change
3 create_table :articles do |t|
4 t.string :title, null: false
5 t.string :slug, null: false
6 t.references :user
7 t.datetime :published_at
8 t.text :description
9
10 t.timestamps
11
12 t.index :slug, unique: true
13 t.index [:published_at]
14 t.index [:slug, :user_id]
15 t.index [:published_at, :user_id]
16 t.index [:title, :slug]
17 end
18 end
19end
Let's try to fetch all the articles which have been published in the last 2 months.
1>> Article.joins(:user).where("published_at > ?", 2.months.ago)
2# Article Load (10.5ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:38:18.647296')
3=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>
Let's use EXPLAIN
to investigate
why it is taking 10.5ms
to execute this query.
1>> Article.joins(:user).where("published_at > ?", 2.months.ago).explain
2# Article Load (13.9ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
3=> # EXPLAIN for: SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:39:05.380577')
4# +--------+----------+----------------+-----------+------+----------+-------+
5# | select | table | possible_keys | key | rows | filtered | Extra |
6# | _type | | | | | | |
7# +--------+----------+----------------+-----------+------+----------+-------+
8# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
9# | | | | | | | index |
10# +--------+----------+----------------+-----------+------+----------+-------+
11# | SIMPLE | articles | index | index | 9866 | 10.0 | Using |
12# | | | _articles | _articles | | | where |
13# | | | _on_user_id, | _on | | | |
14# | | | index | _user_id | | | |
15# | | | _articles | | | | |
16# | | | _on | | | | |
17# | | | _published_at, | | | | |
18# | | | index | | | | |
19# | | | _articles | | | | |
20# | | | _on | | | | |
21# | | | _published_at | | | | |
22# | | | _and_user_id | | | | |
23# +--------+----------+----------------+-----------+------+----------+-------+
According to the above table,
it appears that the query optimizer
is considering users
table first
and
then the articles
table.
The rows
column indicates
the estimated number of rows the query optimizer must examine
to execute the query.
The
filtered
column indicates
an estimated percentage of table rows
that will be filtered by the table condition.
The formula rows x filtered
gives the number of rows
that will be joined with the following table.
Also,
users
table,
the number of rows to be joined
with the following table is 2 x 100% = 2
,articles
table,
the number of rows to be joined
with the following table is 500 * 7.79 = 38.95
.Since the articles
tables contain more records
which references very few records from the users
table,
it would be better to consider the articles
table first
and
then the users
table.
We can hint MySQL to consider the articles
table first as follows.
1>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)")
2# Article Load (2.2ms) SELECT `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:54:06.230651')
3=> #<ActiveRecord::Relation [#<Article id: 20, title: "Article 20", slug: "article-20", user_id: 1, ...]>
Note that it took 2.2ms now
to
fetch the same records
by providing JOIN_ORDER(articles, users)
optimization hint.
Let's try to EXPLAIN
what changed
by using this JOIN_ORDER(articles, users)
optimization hint.
1>> Article.joins(:user).where("published_at > ?", 2.months.ago).optimizer_hints("JOIN_ORDER(articles, users)").explain
2# Article Load (4.1ms) SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
3=> # EXPLAIN for: SELECT /*+ JOIN_ORDER(articles, users) */ `articles`.* FROM `articles` INNER JOIN `users` ON `users`.`id` = `articles`.`user_id` WHERE (published_at > '2019-02-17 11:55:24.335152')
4# +--------+----------+----------------+-----------+------+----------+--------+
5# | select | table | possible_keys | key | rows | filtered | Extra |
6# | _type | | | | | | |
7# +--------+----------+----------------+-----------+------+----------+--------+
8# | SIMPLE | articles | index | index | 769 | 100.0 | Using |
9# | | | _articles | _articles | | | index |
10# | | | _on_user_id, | _on | | | condi |
11# | | | index | _publish | | | tion; |
12# | | | _articles | ed_at, | | | Using |
13# | | | _on | | | | where |
14# | | | _published_at, | | | | |
15# | | | index | | | | |
16# | | | _articles | | | | |
17# | | | _on | | | | |
18# | | | _published_at | | | | |
19# | | | _and_user_id | | | | |
20# +--------+----------+----------------+-----------+------+----------+--------+
21# | SIMPLE | users | PRIMARY | PRIMARY | 2 | 100.0 | Using |
22# | | | | | | | index |
23# +--------+----------+----------------+-----------+------+----------+--------+
The result of the EXPLAIN
query shows that
the articles
table was considered first
and
then the users
table as expected.
We can also see that
the index_articles_on_published_at
index key
was considered from the possible keys
to execute the given query.
The filtered
column for both tables shows that
the number of filtered rows was 100%
which means no filtering of rows occurred.
We hope this example helps
in understanding
how to use #explain
and
#optimization_hints
methods
in order to investigate and debug the performance issues
and
then fixing it.