Martian Chronicles
Evil Martians’ team blog

An annoying Capistrano, Unicorn and Bundler issue

In short: always use an absolute path to Unicorn in Capistrano scripts; otherwise your application server will fail to restart after a number of deploys because of Capistrano release cleanups.

Last month, we came across an interesting problem: sometimes, after a successful deploy of one of our Ruby on Rails 4 applications, the Unicorn server could not be restarted using the zero-downtime deployment signal. That affected every environment we run the application in, unfortunately including production. As a result, we got annoying exceptions like ActionView::MissingTemplate in the production environment. Only a hard restart of Unicorn would fix that kind of exception.

We had no idea what the source of the issue could be, so we decided to track it with our monitoring system (Zabbix), to catch such incidents immediately and act on them.

Here is a simple Rails code snippet we’ve used to detect if Unicorn was properly restarted:

# app/controllers/monitoring_controller.rb
class MonitoringController < ActionController::Metal
  include ActionController::Head

  def unicorn
    if unicorn_revision.eql?(current_revision)
      head 200
    else
      head 500
    end
  end

  private

  def unicorn_revision
    File.read(Rails.root.join('REVISION')).chomp
  end

  def current_revision
    File.read(Rails.root.join('../../current/REVISION')).chomp
  end
end
# config/routes.rb
Rails.application.routes.draw do
  # ...
  get '/monitoring/unicorn' => 'monitoring#unicorn'
end

Note that the REVISION file is written by Capistrano on every deploy, so we can simply compare the contents of the file with the current application revision we have in memory.

And here is a snippet of Zabbix configuration we have used:

UserParameter=unicorn.revision.check,/usr/bin/curl -H 'Host: example.com' -sL -w '%{http_code}' http://<unicorn ip address>/monitoring/unicorn

Next, we created a Zabbix agent item (“Revision status check”) and used it for a trigger:

( {TRIGGER.VALUE}=0 and
  {Unicorn revision template:unicorn.revision.check.count(#3,200,"ne")}=3 )
  or
( {TRIGGER.VALUE}=1 and
  {Unicorn revision template:unicorn.revision.check.count(#2,200,"ne")}=2 )

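Now, how does this trigger work, exactly? It looks for non-200 HTTP responses and fires only after three consecutive failed checks; once it has fired, it stays in the problem state while the last two checks have both failed. Requiring the failure to persist prevents false alerts during deploys and flapping. Google “Hysteresis in Zabbix” for more details, if you’re interested.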
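To make the trigger logic easier to follow, here is a small plain-Ruby simulation of it. This is only a sketch of the expression above, not Zabbix code: the method names (last_n_not_ok?, next_trigger_state, replay) and the :ok/:problem states are made up for illustration.

```ruby
# Simulates the trigger expression above. Not Zabbix code; all names are hypothetical.
# `codes` is the history of HTTP status codes returned by the check, oldest first.

# Mirrors count(#N,200,"ne")=N: true when the last `n` values are all non-200.
def last_n_not_ok?(codes, n)
  codes.length >= n && codes.last(n).all? { |code| code != 200 }
end

# Returns the next trigger state, given the current state and the history so far.
def next_trigger_state(state, codes)
  if state == :ok
    # {TRIGGER.VALUE}=0: fire only after three consecutive non-200 responses.
    last_n_not_ok?(codes, 3) ? :problem : :ok
  else
    # {TRIGGER.VALUE}=1: stay in PROBLEM while the last two responses are non-200.
    last_n_not_ok?(codes, 2) ? :problem : :ok
  end
end

# Replays a sequence of status codes and returns the trigger state after each check.
def replay(codes)
  state = :ok
  history = []
  codes.map do |code|
    history << code
    state = next_trigger_state(state, history)
  end
end
```

For example, replay([200, 500, 500, 200]) stays :ok throughout, because the two failed checks in the middle (think of a deploy in progress) never reach three in a row.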

The first alert we got led us to the following output in the Unicorn log file:

I, [2015-03-13T11:41:29.862602 #1285]  INFO -- : executing ["/home/project/releases/20150312084505/vendor/bundle/ruby/2.1.0/bin/unicorn_rails", "-c", "config/unicorn.rb", "-E", "production", "-D", {12=>#<Kgio::UNIXServer:fd 12>}] (in /home/project/releases/20150313084000)
/home/project/current/vendor/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:475:in `exec': No such file or directory - /home/project/releases/20150312084505/vendor/bundle/ruby/2.1.0/bin/unicorn_rails (Errno::ENOENT)

At that moment, I started to get a clear understanding of what was going on.

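Here is how a zero-downtime Unicorn deploy works: the master process is started once, and on every later deploy it receives a signal (USR2), forks, and re-executes the same binary path it was originally started with. The new master then takes over, and the old one is shut down.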

On the first deploy, the bundle exec unicorn_rails command was mapped to /home/project/releases/1/vendor/bundle/ruby/2.1.0/bin/unicorn_rails, which started properly. But later, let’s say on the 100th deploy, the original Unicorn still pointed to the /home/project/releases/1 directory from the first release. By default, Capistrano 3 keeps only the last five releases and cleans up older ones (the deploy:cleanup task runs automatically), so /home/project/releases/1 no longer existed.
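The failure can be reproduced in miniature with plain Ruby. The sketch below is an assumption-laden simulation, not Capistrano or Unicorn code: cleanup_releases, simulate_deploys and try_reexec are hypothetical helpers, releases are numbered directories in a temporary folder, and the "binstub" is a dummy shell script.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: keeps only the newest `keep` release directories,
# in the spirit of Capistrano's release cleanup (not its actual code).
def cleanup_releases(releases_dir, keep: 5)
  names = Dir.children(releases_dir).sort_by(&:to_i)
  stale = names.reverse.drop(keep)
  stale.each { |name| FileUtils.rm_rf(File.join(releases_dir, name)) }
end

# Creates `count` numbered releases under `root`, cleaning up after each one.
# Returns the binstub path from the very first release, i.e. the path a
# long-lived master process would keep re-executing.
def simulate_deploys(root, count, keep: 5)
  releases_dir = File.join(root, "releases")
  first_binstub = nil
  1.upto(count) do |n|
    bin_dir = File.join(releases_dir, n.to_s, "vendor/bundle/bin")
    FileUtils.mkdir_p(bin_dir)
    binstub = File.join(bin_dir, "unicorn_rails")
    File.write(binstub, "#!/bin/sh\nexit 0\n") # dummy stand-in for the real binstub
    File.chmod(0o755, binstub)
    first_binstub ||= binstub
    cleanup_releases(releases_dir, keep: keep)
  end
  first_binstub
end

# Tries to start the remembered path again, as a re-exec would.
def try_reexec(binstub)
  pid = Process.spawn(binstub, "-c", "config/unicorn.rb", "-E", "production")
  Process.wait(pid)
  :started
rescue Errno::ENOENT
  :enoent
end

Dir.mktmpdir do |root|
  stale_path = simulate_deploys(root, 100)
  puts File.exist?(stale_path) # the release 1 binstub was cleaned up long ago
  puts try_reexec(stale_path)  # so starting it again fails with Errno::ENOENT
end
```

After five simulated deploys the first binstub is still there; from the sixth deploy on it is gone, which matches the "fails after a number of deploys" symptom.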

Since /home/project/releases/1/vendor/bundle is just a symlink to /home/project/shared/vendor/bundle, we had to configure Bundler to map commands to the shared directory, which survives release cleanups.

$ bundle install --deployment
$ cat .bundle/config
#> BUNDLE_FROZEN: '1'
#> BUNDLE_PATH: "vendor/bundle"

Here is the problem: BUNDLE_PATH is relative, so it resolves to /home/project/releases/1/vendor/bundle, which will be removed a few deploys later.

And the fix:

$ bundle install --deployment --path /home/project/shared/vendor/bundle
$ cat .bundle/config
#> BUNDLE_FROZEN: '1'
#> BUNDLE_PATH: "/home/project/shared/vendor/bundle"
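The difference between the two settings comes down to plain path resolution. The sketch below uses File.expand_path to mimic it; this is an illustration only, not Bundler's actual resolution code, and resolve_bundle_path and inside_releases? are hypothetical helpers.

```ruby
# Illustration only: mimics how a BUNDLE_PATH value ends up as a full path.
# Not Bundler's actual code; both helper names are hypothetical.

# A relative value is resolved against the application root (the release
# directory); an absolute value is returned as it is.
def resolve_bundle_path(bundle_path, app_root)
  File.expand_path(bundle_path, app_root)
end

# True when the resolved path lives inside the releases directory and will
# therefore disappear once that release is cleaned up.
def inside_releases?(path, releases_dir)
  path.start_with?(File.join(releases_dir, ""))
end

release_root = "/home/project/releases/1"
releases_dir = "/home/project/releases"

relative = resolve_bundle_path("vendor/bundle", release_root)
absolute = resolve_bundle_path("/home/project/shared/vendor/bundle", release_root)

puts relative                                 # /home/project/releases/1/vendor/bundle
puts inside_releases?(relative, releases_dir) # true
puts absolute                                 # /home/project/shared/vendor/bundle
puts inside_releases?(absolute, releases_dir) # false
```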

In Capistrano 3, this fix can be applied by setting the bundle_path option:

set :bundle_path, -> { shared_path.join('vendor/bundle') }

Done.


Despite my history of contributions to Capistrano and Bundler, this was one of the most interesting bugs I’ve found recently.
We’ve decided to keep monitoring Unicorn revisions to catch any failed restarts in the future.