September 12, 2022
In this blog, we discuss three separate case studies of Redis running out of memory. Each case study has a video demonstrating how the debugging was done.
All three videos were prepared for my team members to show how to go about debugging. The videos are presented as they were recorded.
When a job fails in Sidekiq, Sidekiq puts that job in the RetrySet and retries it until the job succeeds or reaches the maximum number of retries. By default, the maximum number of retries is 25. If a job fails 25 times, it is moved to the DeadSet. By default, Sidekiq stores up to 10,000 jobs in the DeadSet.
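The retry count can also be tuned per worker. Here is a minimal sketch, assuming Sidekiq 6.3 or newer (the HardJob class name is hypothetical):

class HardJob
  include Sidekiq::Job
  # Retry up to 5 times instead of the default 25;
  # once retries are exhausted, the job moves to the DeadSet.
  sidekiq_options retry: 5

  def perform(record_id)
    # ...
  end
end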
We had a situation where Redis was running out of memory. Here is how the debugging was done.
# Iterate over all jobs currently in the DeadSet
ds = Sidekiq::DeadSet.new
ds.each do |job|
  puts "Job #{job['jid']}: #{job['class']} failed at #{job['failed_at']}"
end
We ran the following to view the latest entry in the DeadSet and to count its entries:
ds.first
ds.count
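To gauge how big an individual dead job's payload is, the raw JSON of an entry can be inspected. This is a sketch assuming Sidekiq 6.x, where a DeadSet entry exposes its serialized payload via the #value reader:

# Size, in bytes, of the serialized payload of the first dead job
ds.first.value.bytesize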
To see the memory usage, the following commands were executed in the Redis console.
> memory usage dead
30042467 (30MB)
> type dead
zset
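Since dead is a sorted set, the number of entries it holds can be checked directly from the Redis console. zcard returns the cardinality of a sorted set:

> zcard dead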
As discussed in the video, a large payload was being sent to the worker. This is not the right way to send data to a worker. Ideally, some sort of id should be sent to the worker, and the worker should fetch the necessary data from the database based on the received id.
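Here is a minimal sketch of that pattern, assuming a hypothetical ReportWorker and a Report model:

# Avoid: serializing the whole record into the job payload
ReportWorker.perform_async(report.attributes.to_json)

# Prefer: pass only the id; the worker loads what it needs
ReportWorker.perform_async(report.id)

class ReportWorker
  include Sidekiq::Job

  def perform(report_id)
    report = Report.find(report_id)
    # process the report...
  end
end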
In this case, the Redis instance of neetochat was running out of memory. The Redis instance had a capacity of 50MB, but we were getting the following error.
ERROR: heartbeat: OOM command not allowed when used memory > 'maxmemory'.
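This error means Redis has hit its configured maxmemory limit and is rejecting commands that would allocate more memory. The configured limit can be confirmed from the Redis console:

> config get maxmemory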
We were pushing too many geo info records to Redis, and that caused the memory to fill up. Here is the video capturing the debugging session.
Following are the commands that were executed while debugging.
> ping
PONG
> info
> info memory
> info keyspace
> keys *failed*
> keys *process*
> keys *geocoder*
> get geocoder:http://ipinfo.io/41.174.30.55/geo?
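It is also worth checking whether these cached geocoder entries ever expire. ttl returns the remaining time to live of a key in seconds; -1 means the key never expires:

> ttl geocoder:http://ipinfo.io/41.174.30.55/geo?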
In this case, the authentication service of neeto was failing because of memory exhaustion.
Here the number of keys was limited, but the payloads were huge, and all that payload data was hogging the memory. Here is the video capturing the debugging session.
Following are the commands that were executed while debugging.
> ping
> info keyspace
db0:keys=106,expires=86,avg_ttl=1233332728573
> keys * (to see all the keys)
The last command listed all 106 keys. Next, we needed to find out how much memory each of these keys was using. For that, the following commands were executed.
> memory usage organizations/subdomains/bigbinary/neeto_app_links
736 bytes
> memory usage failed
10316224 (10MB)
> memory usage dead
29871174 (29MB)
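Assuming the dead jobs are safe to discard, the memory held by the dead key can be reclaimed from a Rails console by clearing the DeadSet:

Sidekiq::DeadSet.new.clear

As a side note, redis-cli --bigkeys scans the whole keyspace and reports the biggest key of each type, which is a quicker way to hunt down keys like these.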
If this blog was helpful, check out our full blog archive.