An annoying Capistrano, Unicorn and Bundler issue
In short: always use an absolute path to Unicorn in Capistrano scripts; otherwise your application server will fail to restart after a number of deploys because of Capistrano release cleanups.
Last month, we came across an interesting problem: sometimes, after a successful deploy of one of our Ruby on Rails 4 applications, the Unicorn server could not be restarted with the zero-downtime deployment signal. That affected every environment we run the application in, unfortunately including production. As a result, we got annoying exceptions like ActionView::MissingTemplate in the production environment. Only a hard restart of Unicorn would fix that kind of exception.
We had no idea what the source of the issue could be, so we decided to use our monitoring system (Zabbix) to catch such incidents immediately and act on them.
Here is a simple Rails code snippet we used to detect whether Unicorn was properly restarted:
# app/controllers/monitoring_controller.rb
class MonitoringController < ActionController::Metal
  include ActionController::Head

  def unicorn
    if unicorn_revision.eql?(current_revision)
      head 200
    else
      head 500
    end
  end

  private

  def unicorn_revision
    File.read(Rails.root.join('REVISION')).chomp
  end

  def current_revision
    File.read(Rails.root.join('../../current/REVISION')).chomp
  end
end

# config/routes.rb
Rails.application.routes.draw do
  # ...
  get '/monitoring/unicorn' => 'monitoring#unicorn'
end
Note that the REVISION file is written by Capistrano on every deploy. Rails.root points at the release directory the running Unicorn was booted from, so comparing its REVISION with the one under current tells us whether Unicorn is actually serving the latest release.
And here is a snippet of the Zabbix configuration we used:
UserParameter=unicorn.revision.check,/usr/bin/curl -H 'Host: example.com' -sL -w '%{http_code}' http://<unicorn ip address>/monitoring/unicorn
Next, we created a Zabbix agent item (“Revision status check”) and used it for a trigger:
( {TRIGGER.VALUE}=0 and
{Unicorn revision template:unicorn.revision.check.count(#3,200,"ne")}=3 )
or
( {TRIGGER.VALUE}=1 and
{Unicorn revision template:unicorn.revision.check.count(#2,200,"ne")}=2 )
Now, how does this trigger work, exactly? While the trigger is off, it fires only after three consecutive checks return something other than 200. Once it is on, it stays on as long as the last two checks are both non-200, and clears as soon as one of them is 200. This prevents false alerts during deploys and flapping. Google “Hysteresis in Zabbix” for more details, if you’re interested.
The first alert we received led us to the following output in the Unicorn log file:
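To make the hysteresis easier to follow, here is a minimal Ruby sketch that models the trigger expression above. It is not Zabbix code: the `trigger_fires?` and `simulate` helpers and the sample list of HTTP codes are made up for illustration, and the model assumes the checks arrive one at a time in order.

```ruby
# Hypothetical model of the trigger expression; not part of Zabbix.
# `history` holds the HTTP codes seen so far, newest last.
def trigger_fires?(currently_firing, history)
  # While OK, look at the last 3 checks; while in PROBLEM, only the last 2.
  window = currently_firing ? 2 : 3
  recent = history.last(window)
  recent.size == window && recent.none? { |code| code == 200 }
end

# Feed the codes in one by one and record the trigger state after each check.
def simulate(codes)
  firing = false
  history = []
  codes.map do |code|
    history << code
    firing = trigger_fires?(firing, history)
  end
end

# Two failures during a deploy do not alert; a persistent failure does.
states = simulate([200, 500, 500, 200, 500, 500, 500, 500, 200])
```

In this sample run the short two-check outage never raises the trigger, the third consecutive failure does, and the first successful check clears it again.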
I, [2015-03-13T11:41:29.862602 #1285] INFO -- : executing ["/home/project/releases/20150312084505/vendor/bundle/ruby/2.1.0/bin/unicorn_rails", "-c", "config/unicorn.rb", "-E", "production", "-D", {12=>#<Kgio::UNIXServer:fd 12>}] (in /home/project/releases/20150313084000)
/home/project/current/vendor/bundle/ruby/2.1.0/gems/unicorn-4.8.3/lib/unicorn/http_server.rb:475:in `exec': No such file or directory - /home/project/releases/20150312084505/vendor/bundle/ruby/2.1.0/bin/unicorn_rails (Errno::ENOENT)
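And at that moment I started to get a clear picture of what was going on.
Here is how a zero-downtime Unicorn deploy works: on the restart signal (USR2), the running master re-executes the command line it was originally started with, which brings up a new master, and the old master is then shut down. The "executing [...]" line in the log above is exactly that re-exec step.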
On the first deploy, the bundle exec unicorn_rails command was mapped to /home/project/releases/1/vendor/bundle/ruby/2.1.0/bin/unicorn_rails, which started properly. But later, say on the 100th deploy, the original Unicorn still pointed to the /home/project/releases/1 directory from the first release. By default, Capistrano 3 keeps only the last five releases and removes older ones (the deploy:cleanup task runs automatically), so /home/project/releases/1 no longer existed.
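The failure mode can be reproduced in miniature. The following Ruby sketch is only a simulation under assumptions: it builds a throwaway releases directory in a temp dir, mimics the release cleanup by keeping the newest five numbered releases, and checks whether the binary path remembered from release 1 still exists. The `binary_survives_cleanup?` helper and the directory layout are hypothetical, and no Capistrano or Unicorn code is involved.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical simulation: does the path remembered from release 1
# still exist after `deploy_count` deploys with cleanup?
def binary_survives_cleanup?(deploy_count, keep_releases: 5)
  Dir.mktmpdir do |root|
    releases = File.join(root, 'releases')
    # The path a long-lived master would re-exec on restart.
    remembered = File.join(releases, '1', 'bin', 'unicorn_rails')

    (1..deploy_count).each do |n|
      bin_dir = File.join(releases, n.to_s, 'bin')
      FileUtils.mkdir_p(bin_dir)
      FileUtils.touch(File.join(bin_dir, 'unicorn_rails'))

      # Mimic the cleanup step: keep only the newest `keep_releases` releases.
      all = Dir.children(releases).sort_by(&:to_i)
      stale = all.first([all.size - keep_releases, 0].max)
      stale.each { |name| FileUtils.rm_rf(File.join(releases, name)) }
    end

    File.exist?(remembered)
  end
end
```

With the default of five kept releases, `binary_survives_cleanup?(5)` is true, while the sixth deploy is the first one after which the remembered path is gone.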
Since /home/project/releases/1/vendor/bundle is just a symlink to /home/project/shared/vendor/bundle, we had to configure Bundler to map commands to the shared directory, which is never cleaned up.
$ bundle install --deployment
$ cat .bundle/config
#> BUNDLE_FROZEN: '1'
#> BUNDLE_PATH: "vendor/bundle"
Here is the problem: BUNDLE_PATH is relative, so it resolves to /home/project/releases/1/vendor/bundle, which is removed after a few more deploys.
And the fix:
$ bundle install --deployment --path /home/project/shared/vendor/bundle
$ cat .bundle/config
#> BUNDLE_FROZEN: '1'
#> BUNDLE_PATH: "/home/project/shared/vendor/bundle"
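The difference between the two BUNDLE_PATH values can be illustrated with plain path expansion. This is a rough sketch, not Bundler's actual resolution logic: the `resolved_bundle_path` and `inside?` helpers are hypothetical, and it assumes POSIX-style paths and that a relative path is resolved against the release directory.

```ruby
# Hypothetical helper: resolve a BUNDLE_PATH value against a release directory.
# An absolute path is returned unchanged; a relative one is anchored to the release.
def resolved_bundle_path(bundle_path, release_dir)
  File.expand_path(bundle_path, release_dir)
end

# Hypothetical helper: is `path` located under `dir`?
def inside?(path, dir)
  path.start_with?(dir + '/')
end

release = '/home/project/releases/1'

relative = resolved_bundle_path('vendor/bundle', release)
absolute = resolved_bundle_path('/home/project/shared/vendor/bundle', release)

# The relative value ends up inside the release that will later be removed;
# the absolute value stays under shared regardless of the release.
relative_is_at_risk = inside?(relative, release)
absolute_is_at_risk = inside?(absolute, release)
```

Only the relative value ties the Unicorn binary to a directory that the release cleanup will eventually delete.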
In Capistrano 3, this fix can be applied by setting the bundle_path option:
set :bundle_path, -> { shared_path.join('vendor/bundle') }
Done.
Despite my history of contributions to Capistrano and Bundler, this was one of the most interesting bugs I’ve found recently.
We’ve decided to keep monitoring Unicorn revisions to catch any failed restarts in the future.