BigBinary Blog
We write about Ruby on Rails, React.js, React Native, remote work, open source, engineering and design.
Rails 6.0 was recently released.
Before Rails 6, we have to provide a custom block to perform custom logging and monitoring around retries and discards of the jobs defined using Active Job framework.
class Container::DeleteJob < ActiveJob::Base
retry_on Timeout::Error, wait: 2.seconds, attempts: 3 do |job, error|
message = "Stopped retrying #{job.class} (JID #{job.job_id})
with #{job.arguments.join(', ')} due to
'#{error.class} - #{error.message}'.
This job was retried for #{job.executions} times.".squish
BackgroundJob::ErrorLogger.log(message)
end
discard_on Container::NotFoundError do |job, error|
message = "Discarded #{job.class} (JID #{job.job_id})
with #{job.arguments.join(', ')} due to
'#{error.class} - #{error.message}' error.".squish
BackgroundJob::ErrorLogger.log(message)
end
def perform(container_id)
Container::DeleteService(container_id).process
# Will raise Container::NotFoundError
# if no container is found with 'container_id'.
# Might raise Timeout::Error when the remote API is not responding.
end
end
Notice the custom blocks provided
to retry_on
and discard_on
methods
to an individual job
in the above example.
Extracting such custom logic to a base class or to a 3rd-party gem is possible but it will be non-standard and will be a bit difficult task.
An alternative approach is
to subscribe
to the hooks instrumented using Active Support Instrumentation API
which is a standard and recommended way.
Prior versions of Rails 6 already instruments
some hooks
such as
enqueue_at.active_job
, enqueue.active_job
, perform_start.active_job
,
and perform.active_job
.
Unfortunately no hook is instrumented
around retries and discards of an Active Job
prior to Rails 6.
Rails 6 has introduced hooks to Active Job around retries and discards to which one can easily subscribe using Active Support Instrumentation API to perform custom logging and monitoring or to collect any custom information.
The newly introduced hooks are
enqueue_retry.active_job
, retry_stopped.active_job
and discard.active_job
.
Let's discuss each of these hooks in detail.
Note that whenever we say a job
,
it means a job of type ActiveJob
.
enqueue_retry.active_job
The enqueue_retry.active_job
hook
is
instrumented when a job
is
enqueued to retry again
due to occurrence of an exception which
is configured using the retry_on
method
in the job's definition.
This hook is triggered
only when above condition is satisfied
and
the number of executions of the job
is
less than the number of attempts
defined using the retry_on
method.
The number of attempts
is by default set to 5
if not defined explicitly.
This is how we would subscribe to this hook and perform custom logging in our Rails application.
ActiveSupport::Notifications.subscribe "enqueue_retry.active_job" do |*args|
event = ActiveSupport::Notifications::Event.new *args
payload = event.payload
job = payload[:job]
error = payload[:error]
message = "#{job.class} (JID #{job.job_id})
with arguments #{job.arguments.join(', ')}
will be retried again in #{payload[:wait]}s
due to error '#{error.class} - #{error.message}'.
It is executed #{job.executions} times so far.".squish
BackgroundJob::Logger.log(message)
end
Note that the BackgroundJob::Logger
above is our custom logger.
If we want, we can add any other logic instead.
We will change the definition of Container::DeleteJob
job as below.
class Container::DeleteJob < ActiveJob::Base
retry_on Timeout::Error, wait: 2.seconds, attempts: 3
def perform(container_id)
Container::DeleteService(container_id).process
# Will raise Timeout::Error when the remote API is not responding.
end
end
Let's enqueue this job.
Container::DeleteJob.perform_now("container-1234")
Assume that this job keeps throwing Timeout::Error
exception
due to a network issue.
The job will be retried twice
since it is configured to retry
when a Timeout::Error
exception occurs
up to maximum 3 attempts.
While retrying this job,
Active Job will instrument enqueue_retry.active_job
hook
along with the necessary job payload.
Since we have already subscribed to this hook,
our subscriber would log something like this
with the help of BackgroundJob::Logger.log
.
Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 1 times so far.
Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 2 times so far.
retry_stopped.active_job
The retry_stopped.active_job
hook is triggered
when a job is retried
up to the available number of attempts
.
Let's see how this hook is triggered.
Along with the subscription for the enqueue_retry.active_job
hook,
let's subscribe to the retry_stopped.active_job
hook, too.
ActiveSupport::Notifications.subscribe "retry_stopped.active_job" do |*args|
event = ActiveSupport::Notifications::Event.new *args
payload = event.payload
job = payload[:job]
error = payload[:error]
message = "Stopped processing #{job.class} (JID #{job.job_id})
further with arguments #{job.arguments.join(', ')}
since it failed due to '#{error.class} - #{error.message}' error
which reoccurred #{job.executions} times.".squish
BackgroundJob::Logger.log(message)
end
Let's keep the Container::DeleteJob
job's definition unchanged
and enqueue it again.
Container::DeleteJob.perform_now("container-1234")
We will assume that the job will keep throwing Timeout::Error
exception
due to a network issue.
In the logs recorded using BackgroundJob::Logger.log
,
we should see something like this.
Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 1 times so far.
Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) with arguments container-1234 will be retried again in 2s due to error 'Timeout::Error - Couldn't establish connection to cluster within 10s'. It is executed 2 times so far.
Stopped processing Container::DeleteJob (JID 770f4f2a-daa7-4c1e-be51-24adc04b4cb2) further with arguments container-1234 since it failed due to 'Timeout::Error - Couldn't establish connection to cluster within 10s' error which reoccurred 3 times.
Notice the last entry in the logs above and its order.
discard.active_job
The discard.active_job
hook is triggered
when a job's further execution is discarded
due to occurrence of an exception
which is configured using discard_on
method.
To see how this hook is triggered,
we will subscribe to the discard.active_job
hook.
ActiveSupport::Notifications.subscribe "discard.active_job" do |*args|
event = ActiveSupport::Notifications::Event.new *args
payload = event.payload
job = payload[:job]
error = payload[:error]
message = "Discarded #{job.class} (JID #{job.job_id})
with arguments #{job.arguments.join(', ')}
due to '#{error.class} - #{error.message}' error.".squish
BackgroundJob::Logger.log(message)
end
We will configure our Container::DeleteJob
job
to discard
when Container::NotFoundError
exception occurs
while executing the job.
class Container::DeleteJob < ActiveJob::Base
discard_on Container::NotFoundError
retry_on Timeout::Error, wait: 2.seconds, attempts: 3
def perform(container_id)
Container::DeleteService(container_id).process
# Will raise Container::NotFoundError
# if no container is found with 'container_id'.
# Will raise Timeout::Error when the remote API is not responding.
end
end
Let's enqueue this job
and
assume that it would throw Container::NotFoundError
exception.
Container::DeleteJob.perform_now("unknown-container-9876")
We should see following in the logs recorded by BackgroundJob::Logger.log
.
Discarded Container::DeleteJob (JID e9b1cb5c-6d2d-49ae-b1d7-fef44f09ab8d) with arguments unknown-container-9876 due to 'Container::NotFoundError - Container 'unknown-container-9876' was not found' error.
These new hooks are also instrumented
for the jobs
which are enqueued using perform_later
method
since both perform_now
and perform_later
calls perform
method
under the hood.
Active Job already subscribes to the these hooks and writes them using the Rails' default logger.
If a block is provided to retry_on
or discard_on
methods
then an applicable hook is instrumented first
and
then the given block is yielded.