Configure and Deploy Resque

Resque is a Redis-backed Ruby job queue. Everything you need to know about writing code for it is in the Resque readme.

Important note: the discussion below covers how to run Resque on Engine Yard, while the examples regarding custom Chef are tailored to the v4 stack. The content is still valid for stack v5, but custom Chef examples and specifics for Resque on v5 can be found here.

Anatomy of our Resque configuration

The files used to configure Resque in your application are:

File                                          Description
/data/app_name/shared/config/resque.yml       Points to the Redis database; symlinked into your $RAILS_ROOT/config directory
/data/app_name/shared/config/resque_?.conf    One per worker on each instance, listing the queues for that worker
/engineyard/bin/resque                        Monit wrapper script provided by Engine Yard
/etc/monit.d/resque_app_name.monitrc          Monit configuration file for Resque

Quick start guide

If you want to get going and work the rest out later, here’s the quick version.

To configure and deploy Resque for an Engine Yard Cloud environment

1. Be familiar with the Engine Yard docs on custom Chef recipes and utility instances.
2. Boot a utility instance and give it the name "redis". Then enable the following recipes in your custom Chef cookbooks:
   • The "redis" recipe, to install Redis on the "redis" utility instance.
   • The "redis-yml" recipe, to add a redis.yml file to your app instances.
   • The "resque" recipe, to install the Resque gem and the resque_x.conf config files.

   For example:

   include_recipe "redis"
   include_recipe "redis-yml"
   include_recipe "resque"

3. Modify the redis-yml recipe to point at the correct instance (see the comments at the top of cookbooks/redis-yml/recipes/default.rb).
4. Add the following to your after_symlink deploy hook to ensure that Resque restarts when you deploy:
   if node[:instance_role] == 'util'
     worker_count = 3
     worker_count.times do |count|
       run "sudo monit restart resque_APPNAME_#{count}"
     end
   end
5. Connect Resque to Redis by adding the following to config/initializers/resque.rb:

   redis_config = YAML.load_file("#{Rails.root}/config/redis.yml")
   Resque.redis = Redis.new(redis_config[Rails.env])
6. You can now queue Resque jobs in your application and they will be processed on your utility instance.
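For instance, here is a minimal sketch of a job class and how to enqueue it; the ArchiveJob name, its :archive queue, and its argument are hypothetical:

# A worker whose QUEUE setting includes "archive" will pick this job up.
class ArchiveJob
  @queue = :archive

  def self.perform(record_id)
    # Do the actual work here, e.g. look up the record and archive it.
  end
end

# Anywhere in your application code:
Resque.enqueue(ArchiveJob, 42)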

    For more information on configuring Resque, see the Readme.

    Thinking through the configuration

    Redis configuration

All our environments run Redis on the master database instance (or on the solo instance if a single slice is used without a separate database) as part of our cloud application infrastructure. Typically for Resque, you use this Redis instance. However, you can use a custom Chef recipe to put a Redis instance on a utility instance, or any slice you want, and use that.
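For illustration, the redis.yml that the redis-yml recipe generates might look something like this sketch (the hostname is a placeholder and the exact contents depend on the recipe):

# Hypothetical config/redis.yml; the redis-yml recipe generates the real file.
production:
  host: hostname-of-your-redis-instance
  port: 6379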

    Queues vs workers

    First decide how many workers you need and how you will allocate those workers.

When you write your Resque code, you assign jobs to queues. It is important to note that queues and workers aren't synonymous: a worker can service many queues, and each queue can be serviced by many workers.

You can have as many workers as your resources allow. In our default setup, each worker is monitored by monit and so has one stanza in the monit configuration.

Each intended worker has a conf file in /data/app_name/shared/config named resque_<conf_name>.conf.

    So, for three workers, in an application called myapp, you might have:

    /data/myapp/shared/config/resque_0.conf 
    /data/myapp/shared/config/resque_1.conf
    /data/myapp/shared/config/resque_2.conf

Each of these has a QUEUE statement as described in the Resque readme. The default is QUEUE=*. However, you can customize it to list the queues you'd like handled by that worker. By choosing how you allocate queues to workers, you effectively prioritize the queues.
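For example, here is a hypothetical allocation across the three workers above, assuming your application defines queues named critical, mail, and images (Resque checks queues in the order listed, so putting critical first gives it priority):

# resque_0.conf: works only the critical queue
QUEUE=critical

# resque_1.conf: prefers critical, falls back to mail and images
QUEUE=critical,mail,images

# resque_2.conf: works every queue
QUEUE=*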

Each worker, when run, has a memory footprint approximately the size of one of your Unicorn or Passenger workers at startup. Every time it gets a job, it forks a child which starts at about that size and grows as big as it needs to.

    Stopping jobs

At different times, you need to stop or restart your workers: perhaps a job has exceeded its allowed memory, you need to deploy new code, or any number of other reasons.

Workers can be asked to stop in one of two ways: with either a SIGTERM or a SIGQUIT (kill -15 or kill -3, respectively).

If they receive a SIGQUIT, they allow an already-running job to finish before quitting. If they receive a SIGTERM, then any running job is killed immediately, along with the worker.
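In other words, assuming a worker with PID 1234 (in practice, monit sends these signals for you via the wrapper script):

kill -QUIT 1234   # let the current job finish, then exit
kill -TERM 1234   # kill any running job immediately, then exit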

So, the two things that need consideration are how long your jobs run and what the consequences are of a job being terminated mid-process.

    To TERM or to QUIT

If terminating your job mid-process leaves your databases in a consistent state and doesn't result in half-drawn thumbnails or other embarrassing mishaps, then SIGTERM is the way forward.

    This involves a line in the monit configuration like:

    stop program "/engineyard/bin/resque myapp term production resque_0.conf" 

If for any reason the worker doesn't stop, the script checks for and kills its child, and then the worker itself, with kill -9.

    If, however, your job can’t be interrupted, you need to ask it to stop with QUIT. This involves a line in the monit configuration like:

    stop program "/engineyard/bin/resque myapp quit production resque_0.conf" 

    This allows your script 60 seconds to finish its job before the wrapper script ensures that it has, in fact, died. Note that for the sake of following conventions used in other monit wrapper scripts, quit and stop are synonyms.
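Putting that together, a stanza for one worker might look something like this sketch (the pidfile path is an assumption; the real monitrc is generated for you by the resque recipe):

check process resque_myapp_0
  with pidfile /var/run/engineyard/resque_myapp_0.pid
  start program = "/engineyard/bin/resque myapp start production resque_0.conf"
  stop program = "/engineyard/bin/resque myapp quit production resque_0.conf"
  group myapp_resque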

    Time to die

However, we have customers with jobs that run for 5, 10, and 30 minutes, and even up to 12 hours.

    To cater for this, you can set a GRACE_TIME environment variable:

    stop program "/bin/env GRACE_TIME=300 /engineyard/bin/resque myapp stop production resque_0.conf" 

    This causes the wrapper script to wait 300 seconds before forcing the death of the worker.

    Deploy time considerations

It is important that Resque gets restarted when you deploy. Firstly, if you don't, your Resque jobs are carried out with stale code, possibly against the wrong database schema. Secondly, because only three releases are kept (by default), after the third deploy the jobs are running on code that has been deleted from the disk. This is the likely cause if you are intermittently seeing NameError: uninitialized constant.

    The correct way to have Resque restarted on each deploy is to have a line like:

    run "monit restart all -g app_name_resque" 

    in your after_symlink deploy hook (where app_name is the name of your application).

    However, it is also likely that you don’t want your deploy to run while there are jobs still in action or for Resque to start a new job while the deploy is underway. So, in either your before_symlink or before_migrate deploy hook, code like this is in order:

    Case 1. We have monit configured to use SIGQUIT and want the workers to stop when they’ve finished the current job. We also don’t want the deploy to proceed if jobs are running.

    run "sudo monit stop all -g fractalresque_resque" 
    if %x[ps axo command|grep resque[-]|grep -c Forked].to_i > 0
    raise "Resque Workers Working!!"
    end

Case 2. Monit is configured using SIGTERM, but we want the workers to stop when they've finished the current job, and we don't want the deploy to proceed if jobs are running. However, if no jobs are running, we want the workers stopped.

if %x[ps axo command|grep resque[-]|grep -c Forked].to_i > 0
  raise "Resque Workers Working!!"
else
  run "sudo monit stop all -g app_name_resque"
end

    In both cases, make sure to explicitly start Resque after your deploy has finished. Add a before_restart deploy hook similar to this:

    run "sudo monit start all -g fractalresque_resque" 

These are suggested starting points; you need to consider what needs to happen in your own situation.

    Debugging

    Resque logs its activity to: /data/app_name/shared/log/resque_?.log

So, for the worker associated with resque_0.conf, activity can be seen in /data/app_name/shared/log/resque_0.log.

Resque changes the verbosity of its logging when the VERBOSE or VVERBOSE environment variables are set. To set these, your monit config's start line will look like:

start program "/bin/env VERBOSE=1 /engineyard/bin/resque my_app start production resque_0.conf"


On top of that, the monit resque script logs its handling of Resque to /var/log/syslog.

    Frequent small jobs

If you have a queue servicing frequent small jobs, there is a bottleneck we've experienced that you may need to grapple with.

    Class caching

    So, you’re in production, you’ve followed the Resque readme and loaded the environment in your rake file, and you’ve got config.cache_classes = true (which is the default).

In case you're not aware, this setting in config/environments/production.rb is why you don't see all those pesky SHOW FIELDS statements (assuming MySQL) in your production.log like you do while developing. It's also why you need to restart your application server when you deploy code, unlike in development. In development, the appropriate models are loaded on each request (complete with changes); in production, they're loaded on demand, the first time they're called.

So why is that a problem? Because in Resque the worker doesn't do the work; the child it forks does. For all practical purposes, the child is a copy of the worker, complete with your Rails application. At this stage no models have been accessed, and this is what the forked child inherits.

After the child starts processing your job, as each model pertinent to that job is touched, the class code defining that model is run. This involves issuing a SHOW FIELDS query to the database for each model involved, which has locking implications for your database. Further, some fat models may also carry a substantial time cost in Ruby itself. In fact, for a quick job, most of the time could be spent instantiating your models.

A simple solution is to modify your Rakefile, or wherever you set your environment up, to change this line:

    task "resque:setup" => :environment 

    to something like this:

    task "resque:setup" => :environment do 
    User.columns
    Post.columns
    end

    Or perhaps as a way to hit all your models at once:

    task "resque:setup" => :environment do 
    ActiveRecord::Base.send(:subclasses).each { |klass| klass.columns }
    end

    If you have feedback or questions about this page, add a comment below. If you need help, submit a ticket with Engine Yard Support.

    Comments

• Petteri Räty

The Frequent small jobs section has not been needed since Resque 1.18. That version started automatically preloading classes. I do, however, recommend using >= 1.19, as you can see from the history that it was patched multiple times.

      https://github.com/defunkt/resque/blob/master/HISTORY.md

• Permanently deleted user

      Hi Petteri,

      Great catch!  I'll notify our documentation team of that!

      Best,

      John

• Petteri Räty

John: Further investigation shows that one part of the frequent small jobs chapter is still relevant. The information that follows is based on investigating Rails 3.1. Rails' eager loading of the class files does not load the columns, so the part about preloading column information is still accurate. The current solution does have a couple of shortcomings, though. The first is that calling ActiveRecord::Base.send(:subclasses) is likely to come out with an empty array unless your initializers are loading your models; the Resque call to Rails.application.eager_load! comes after running resque:setup. The second problem is that klass.columns is not the only method going to the database to query schema information; the other one I saw when looking into what happens during a single worker run is .primary_key. In the end, here's the code I put in an initializer to load schema information when Resque calls eager_load!

      <code>

      class ActiveRecord::Base

        module SchemaPreload

          def inherited(subclass)

            super subclass

            subclass.primary_key

            subclass.columns

          end

        end

        extend SchemaPreload

      end

      </code>

• Permanently deleted user

John, can you please confirm if Petteri's comment is correct, or if we can safely ignore the section on Frequent Small Jobs?

• Permanently deleted user

      Hi Dave,

Newer versions of Resque should behave better with smaller jobs; however, as Petteri pointed out, it is still relevant.

If you experience issues with class loading, trying the initializer Petteri mentioned may work for you. Another alternative you can try for lots of small jobs is resque-multi-job-forks. We've had good luck with that, but it's not for every app.

      Thanks!

• Permanently deleted user

      When I restart my server I occasionally get the following error:

Redis::InheritedError: Tried to use a connection from a child process without reconnecting. You need to reconnect to Redis after forking.

[GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:285:in `ensure_connected'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:177:in `block in process'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:256:in `logging'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:176:in `process'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis/client.rb:84:in `call'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:1159:in `block in sadd'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:36:in `block in synchronize'
/usr/lib64/ruby/1.9.1/monitor.rb:211:in `mon_synchronize'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:36:in `synchronize'
[GEM_ROOT]/gems/redis-3.0.3/lib/redis.rb:1158:in `sadd'
[GEM_ROOT]/gems/redis-namespace-1.2.1/lib/redis/namespace.rb:257:in `method_missing'
[GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:227:in `watch_queue'
[GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:172:in `push'
[GEM_ROOT]/gems/resque-1.23.1/lib/resque/job.rb:51:in `create'
[GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:271:in `enqueue_to'
[GEM_ROOT]/gems/resque-1.23.1/lib/resque.rb:252:in `enqueue'
lib/later.rb:37:in `later'
app/services/metrics_service.rb:11:in `async_record'
app/services/places_service.rb:8:in `find_places'
app/controllers/api/general/places_controller.rb:11:in `index'

Any ideas as to why this is happening? It seems to have started after I installed resque-scheduler and occurs when I try to add a background job from my web app to Redis. The failure happens a few times and then there are no more exceptions. This makes me think that there is some kind of issue with the Passenger workers when they get forked.

Thanks!

Ilya
• Permanently deleted user

      Hi Ilya, 

Please open a support ticket so that we can take a closer look into this; it definitely sounds like an interesting issue.

      -Don

• Permanently deleted user

I noticed that the stop commands are either TERM or QUIT, but the latest recipes have this entered:

stop program = "/engineyard/bin/resque <%= @app_name %> stop <%= @rails_env %> resque_<%= num %>.conf" with timeout 90 seconds

      Is the TERM/QUIT issue being handled automatically or something now?

• Tom Hoen

      Hey Brian - According to the docs, STOP is an alias for QUIT.

• Permanently deleted user

Can I set up Resque without a Redis utility instance? It seems that redis-server is no longer installed by default on a typical solo/app_master server (it was before).

• Permanently deleted user

Hello Pawel,

You can run Redis via a recipe on the solo instance. We removed Redis from the solo instance because, by default, that database wasn't being backed up; removing it discourages customers from using the default instances and suffering data loss. Our cloud recipe can be modified to install Redis on the solo instance.

• Scott Sherwood

For anyone trying to set this up using the v4 stack and a utility server, I had to make the following changes, which are not documented above, to the cookbooks:

1. redis-yml cookbook: change line 1 to include the util server, i.e. 'if ['app_master', 'app', 'util'].include?(node[:instance_role])'

      2. redis cookbook: follow instructions at the top of the readme file

Guess it would be nice if both of these were set up as defaults in the cookbooks too.

• Christopher Haupt

Any EY-specific recommendations for standing up resque-web in Cloud? Some clients like to have the dashboard, which, at least in one possible configuration, requires setting up resque-web itself and the Nginx rules (say, on app_master) to access it securely.

• Permanently deleted user

      Christopher,

Users who want to configure either resque-web or sidekiq-web typically do so by mounting the app within the Rails application. That way, there doesn't need to be any Nginx configuration, and it should be possible to just hit the URL.

      Evan

• Dan Moore

There's a typo that caught me out and meant the Resque workers weren't getting restarted when I deployed.

      run "sudo monit restart resque_APPNAME_{count}"

      is missing a # in front of the count variable. It should be 

      run "sudo monit restart resque_APPNAME_#{count}"

       

       

      0
      Comment actions Permalink
• Permanently deleted user

What's the proper parameter order for restarting jobs with monit? I see this line for Resque (above):

      "monit restart all -g app_name_resque" 

      and I see this line on the delayed jobs cookbook README

      "monit -g dj_<app\_name> restart all"

Does parameter order matter? And why is it app_name_resque in one place and dj_app_name in the other? Should that be consistent?
