Resque is a Ruby job queue. Everything you need to know about writing code for it is in the Resque readme.
Important note: the discussion below covers how to run Resque on Engine Yard, while the examples regarding custom Chef are tailored to the v4 stack. The content is still valid for stack v5, but custom Chef examples and specifics for Resque on v5 can be found here.
Anatomy of our Resque configuration
The files used to configure Resque in your application are:
File | Description
---|---
/data/app_name/shared/config/resque.yml | Points to the Redis database; symlinked into your $RAILS_ROOT/config directory
/data/app_name/shared/config/resque_?.conf | One per worker, listing the queues serviced by that worker
/engineyard/bin/resque | Monit wrapper script provided by Engine Yard
/etc/monit.d/resque_app_name.monitrc | Monit configuration file for Resque
Quick start guide
If you want to get going and work the rest out later, here’s the quick version.
To configure and deploy Resque for an Engine Yard Cloud environment
- Be familiar with the Engine Yard docs about:
  - The "redis" recipe, which installs Redis on the "redis" utility instance.
  - The "redis-yml" recipe, which adds a redis.yml file to your app instances.
  - The "resque" recipe, which installs the Resque gem and the resque_x.conf config files.

For example:
include_recipe "redis"
include_recipe "redis-yml"
include_recipe "resque"
if node[:instance_role] == 'util'
worker_count = 3
worker_count.times do |count|
run "sudo monit restart resque_APPNAME_{count}"
end
end
Then add a config/initializers/resque.rb:

    redis_config = YAML.load_file("#{Rails.root}/config/redis.yml")
    Resque.redis = Redis.new(redis_config[Rails.env])
You can now queue up Resque jobs in your application and they will be processed on your utility instance.
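For instance, a minimal job class and enqueue call might look like this (the class, queue, and argument names here are hypothetical, for illustration only):

    # app/jobs/thumbnail_job.rb -- example only; names are made up
    class ThumbnailJob
      # the queue this job is placed on
      @queue = :thumbnails

      # Resque workers call perform with the enqueued arguments
      def self.perform(photo_id)
        # do the slow work here; this runs in a forked child of the worker
      end
    end

    # anywhere in your application code:
    Resque.enqueue(ThumbnailJob, 42)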
For more information on configuring Resque, see the Readme.
Thinking through the configuration
Redis configuration
All our environments run Redis on the master database instance (or on the solo instance, if a single slice is used without a separate database) as part of our cloud application infrastructure. Typically, Resque uses this Redis instance. However, you can use a custom Chef recipe to put a Redis instance on a utility instance, or any slice you want, and use that instead.
Queues vs workers
First decide how many workers you need and how you will allocate those workers.
When writing your Resque code, you assign jobs to queues. It is important to note that queues and workers aren't synonymous: a worker can service many queues, and each queue can be serviced by many workers.
You can have as many workers as your resources allow. In our default setup, each worker is monitored by monit, so there is one stanza in our monit configuration per worker.
Each intended worker has a conf file in /data/app_name/shared/config called resque_conf_name.conf.
So, for three workers, in an application called myapp, you might have:
    /data/myapp/shared/config/resque_0.conf
    /data/myapp/shared/config/resque_1.conf
    /data/myapp/shared/config/resque_2.conf
Each of these has a QUEUE statement, as described in the Resque readme. The default is QUEUE=*. However, you may customize it to list the queues you'd like handled by that worker. By choosing how you allocate your queues to your workers, you essentially prioritize the queues.
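For example, here is one hypothetical way to allocate queues across the three workers above (the queue names are invented for illustration):

    # /data/myapp/shared/config/resque_0.conf -- dedicated to the mailers queue
    QUEUE=mailers

    # /data/myapp/shared/config/resque_1.conf -- works imports before mailers
    QUEUE=imports,mailers

    # /data/myapp/shared/config/resque_2.conf -- catch-all for everything else
    QUEUE=*

Here the mailers queue is effectively the highest priority: it has a dedicated worker plus two others that can pick it up.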
Each worker, when run, has a memory footprint approximately the size of one of your Unicorn or Passenger workers at start up. Every time it gets a job, it forks a child which is also about that size and grows as big as it needs to.
Stopping jobs
At different times, you need to stop or restart your workers: perhaps a job has exceeded its allowed memory, you need to deploy new code, or any number of other reasons.
Workers can be asked to stop in one of two ways: with either a SIGTERM or a SIGQUIT (kill -15 or kill -3).
If they receive a SIGQUIT, they allow an already running job to finish before quitting. If they receive a SIGTERM, then any running job is killed immediately, along with the worker.
So, the two things that need consideration are how long your job will run and what the consequences are of a job being terminated during processing.
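To see the two behaviours by hand, you can send the signals directly to a worker process (the PID here is hypothetical):

    kill -QUIT 12345   # graceful: let the current job finish, then exit
    kill -TERM 12345   # immediate: kill the worker, and any running job dies with it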
To TERM or to QUIT
If terminating your job mid-process leaves your databases in a consistent state and doesn't result in half-drawn thumbnails or other embarrassing mishaps, then SIGTERM is the way forward.
This involves a line in the monit configuration like:

    stop program "/engineyard/bin/resque myapp term production resque_0.conf"
If for any reason the worker doesn't stop, the script checks for and kills its child, and then the worker itself, with kill -9.
If, however, your job can’t be interrupted, you need to ask it to stop with QUIT. This involves a line in the monit configuration like:

    stop program "/engineyard/bin/resque myapp quit production resque_0.conf"
This allows your script 60 seconds to finish its job before the wrapper script ensures that it has, in fact, died. Note that, for the sake of following the conventions used in other monit wrapper scripts, quit and stop are synonyms.
Time to die
However, we have customers with jobs that run for 5, 10, and 30 minutes, and even up to 12 hours.
To cater for this, you can set a GRACE_TIME environment variable:

    stop program "/bin/env GRACE_TIME=300 /engineyard/bin/resque myapp stop production resque_0.conf"
This causes the wrapper script to wait 300 seconds before forcing the death of the worker.
Deploy time considerations
It is important that Resque gets restarted when you deploy. Firstly, if you don’t, your Resque jobs are carried out with stale code, possibly against the wrong database schema. Secondly, because only three releases are kept by default, after the third deploy the jobs are running on code that has been deleted from the disk. This is likely the case if you are intermittently seeing NameError: uninitialized constant.
The correct way to have Resque restarted on each deploy is to have a line like:

    run "monit restart all -g app_name_resque"

in your after_symlink deploy hook (where app_name is the name of your application).
However, it is also likely that you don’t want your deploy to run while there are jobs still in action, or for Resque to start a new job while the deploy is underway. So, in either your before_symlink or before_migrate deploy hook, code like this is in order:
Case 1. We have monit configured to use SIGQUIT and want the workers to stop when they’ve finished the current job. We also don’t want the deploy to proceed if jobs are running.

    run "sudo monit stop all -g myapp_resque"
    if %x[ps axo command|grep resque[-]|grep -c Forked].to_i > 0
      raise "Resque Workers Working!!"
    end
Case 2. Monit is configured to use SIGTERM, but we want the workers to stop when they’ve finished the current job, and we don’t want the deploy to proceed if jobs are running. However, if they’re not running, we want the workers stopped.

    if %x[ps axo command|grep resque[-]|grep -c Forked].to_i > 0
      raise "Resque Workers Working!!"
    else
      run "sudo monit stop all -g myapp_resque"
    end
In both cases, make sure to explicitly start Resque after your deploy has finished. Add a before_restart deploy hook similar to this:

    run "sudo monit start all -g myapp_resque"
These are suggested starting points; you need to consider what needs to happen in your own situation.
Debugging
Resque logs its activity to /data/app_name/shared/log/resque_?.log. So, for the worker associated with resque_0.conf, its activity can be seen in /data/app_name/shared/log/resque_0.log.
Resque increases the verbosity of its logging when the VERBOSE or VVERBOSE environment variables are set. To set these, your monit config's start line will look like:

    start program "/bin/env VERBOSE=1 /engineyard/bin/resque my_app start production resque_0.conf"
On top of that, the monit resque script logs its handling of Resque to /var/log/syslog.
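So, when debugging a worker, it can help to watch its log and the syslog side by side (assuming an app called myapp and the worker for resque_0.conf):

    tail -f /data/myapp/shared/log/resque_0.log /var/log/syslog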
Frequent small jobs
If you have a queue that services frequent small jobs, there is a bottleneck we’ve experienced that you may need to grapple with.
Class caching
So, you’re in production, you’ve followed the Resque readme and loaded the environment in your rake file, and you’ve got config.cache_classes = true (which is the default).
In case you’re not aware, this setting in config/environments/production.rb is why you don’t see all those pesky SHOW FIELDS statements (assuming MySQL) in your production.log like you do while you’re developing. It’s also why you need to restart your application server when you deploy code, unlike in development. In development, the appropriate models are loaded on each request (complete with changes); in production, they’re loaded on demand, the first time they’re called.
So why is that a problem? Because in Resque the worker doesn’t do the work; the child it forks does. For all useful purposes, the child has a copy of the worker, including your Rails application. At this stage no models have been accessed, and this is what the forked child inherits.
After the child starts processing your job, as each model pertinent to that job is touched, the class code defining that model is run. This involves issuing a SHOW FIELDS query to the database for each model involved, which has locking implications for your database. Further, some fat models may also carry a substantial time cost in Ruby itself. In fact, for a quick job, most of the time could be spent instantiating your models.
A simple solution is to modify your Rakefile, or wherever you set your environment up, to change this line:

    task "resque:setup" => :environment

to something like this:

    task "resque:setup" => :environment do
      User.columns
      Post.columns
    end
Or, perhaps, as a way to hit all your models at once:

    task "resque:setup" => :environment do
      ActiveRecord::Base.send(:subclasses).each { |klass| klass.columns }
    end
If you have feedback or questions about this page, add a comment below. If you need help, submit a ticket with Engine Yard Support.
The Frequent small jobs section has not been needed since Resque 1.18; that version started automatically preloading classes. I do, however, recommend using >= 1.19, as you can see from the history that it was patched multiple times.
https://github.com/defunkt/resque/blob/master/HISTORY.md
Hi Petteri,
Great catch! I'll notify our documentation team of that!
Best,
John
John: Further investigation shows that one part of the frequent small jobs chapter is still relevant. The information that follows is based on investigating Rails 3.1. Rails' eager loading of the class files does not load the columns, so the part about preloading column information is still accurate. The current solution for it does have a couple of shortcomings, though. The first is that calling ActiveRecord::Base.send(:subclasses) is likely to return an empty array unless your initializers are loading your models; Resque's call to Rails.application.eager_load! comes after running resque:setup. The second problem is that klass.columns is not the only method going to the database to query schema information; the other I saw when looking into what happens during a single worker is .primary_key. In the end, here's the code I put in an initializer to load schema information when Resque calls eager_load!:
    class ActiveRecord::Base
      module SchemaPreload
        def inherited(subclass)
          super subclass
          subclass.primary_key
          subclass.columns
        end
      end
      extend SchemaPreload
    end
John, can you please confirm if Petteri's comment is correct or if we can safely ignore the section on Frequent Small Jobs?
Hi Dave,
Newer versions of Resque should behave better with small jobs; however, as Petteri pointed out, it is still relevant.
If you experience issues with class loading, trying the initializer Petteri mentioned may work for you. Another alternative you can try for lots of small jobs is resque-multi-job-forks. We've had good luck with that, but it's not for every app.
Thanks!
When I restart my server I occasionally get the following error:
    Redis::InheritedError: Tried to use a connection from a child process without reconnecting. You need to reconnect to Redis after forking.
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:285:in `ensure_connected'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:177:in `block in process'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:256:in `logging'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:176:in `process'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:84:in `call'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:1159:in `block in sadd'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:36:in `block in synchronize'
    /usr/lib64/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:36:in `synchronize'
    [GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:1158:in `sadd'
    [GEM_ROOT]/gems/redis-namespace-1.2.1/lib/redis/namespace.rb:257:in `method_missing'
    [GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:227:in `watch_queue'
    [GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:172:in `push'
    [GEM_ROOT]/gems/resque-1.23.1/lib/resque/job.rb:51:in `create'
    [GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:271:in `enqueue_to'
    [GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:252:in `enqueue'
    lib/later.rb:37:in `later'
    app/services/metrics_service.rb:11:in `async_record'
    app/services/places_service.rb:8:in `find_places'
    app/controllers/api/general/places_controller.rb:11:in `index'
Any ideas as to why this is happening? It seems to have started after I installed resque-scheduler and occurs when I try to add a background job from my web app to Redis. The failure happens a few times and then there are no more exceptions. This makes me think that there is some kind of issue with the Passenger workers when they get forked.
Thanks!
Hi Ilya,
Please open a support ticket so that we can take a closer look into this--it definitely sounds like an interesting issue.
-Don
I noticed that the stop commands are either TERM or QUIT, but that the latest recipes have this entered:
    stop program = "/engineyard/bin/resque <%= @app_name %> stop <%= @rails_env %> resque_<%= num %>.conf" with timeout 90 seconds
Is the TERM/QUIT issue being handled automatically or something now?
Hey Brian - According to the docs, STOP is an alias for QUIT.
Can I set up Resque without a Redis utility instance? It seems that redis-server is no longer installed by default on a typical solo/app_master server (it was before).
Hello Pawel,
You can run Redis via a recipe on the solo instance. We removed Redis from the solo instance because, by default, its database wasn't being backed up; it was removed to discourage customers from using the default instances and suffering data loss. Our cloud recipe can be modified to install Redis on the solo instance.
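A minimal sketch of that modification, reusing the "redis" recipe from this article and the node[:instance_role] check shown in the quick start, might be:

    # in your main cookbook's default recipe -- a sketch, not the stock cookbook
    include_recipe "redis" if node[:instance_role] == 'solo'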
For anyone trying to set this up using the v4 stack and a utility server, I had to make the following changes, which are not documented above, to the cookbooks:
- redis-yml cookbook: change line 1 to include the util server: if ['app_master', 'app', 'util'].include?(node[:instance_role])
- redis cookbook: follow the instructions at the top of the readme file
Guess it would be nice if both of these were set up as defaults in the cookbooks too.
Any EY specific recommendations for standing up resque-web in Cloud? Some clients like to have the dashboard, which at least in one possible configuration requires setting up resque-web itself and the nginx rules (say on app-master) to securely access it.
Christopher,
Users who want to configure either resque-web or sidekiq-web typically do so by mounting the app within the Rails application. That way, there doesn't really need to be any Nginx configuration and it should be possible to just hit the URL.
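For example, a sketch of mounting it in the router (assuming Rails 3.1+, the resque gem, and a hypothetical app called MyApp):

    # config/routes.rb
    require 'resque/server'

    MyApp::Application.routes.draw do
      # Resque::Server is a Rack (Sinatra) app; consider wrapping this in
      # authentication before exposing the dashboard publicly
      mount Resque::Server.new, :at => '/resque'
    end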
Evan
There's a typo that caught me out and meant the resque workers weren't getting restarted when I deployed.

    run "sudo monit restart resque_APPNAME_{count}"

is missing a # in front of the count variable. It should be

    run "sudo monit restart resque_APPNAME_#{count}"
What's the proper parameter order for restarting jobs with monit? I see this line for resque (above):

    monit restart all -g app_name_resque

and I see this line in the delayed jobs cookbook README:

    monit -g dj_<app_name> restart all

Does parameter order matter, and why is it app_name_resque in one case and dj_app_name in the other? Should that be consistent?