Martian Chronicles
Evil Martians’ team blog

Lefthook, Crystalball, and git magic for smooth development experience

In this step-by-step tutorial, you will learn how to set up the Lefthook git hooks manager and the Crystalball test selection library, and how to automatically install missing gems and migrate your database when you switch to a feature branch and back to master.

Every project should have CI set up. But CI builds can queue up, and you have to wait for the notification. Is there a way to shorten the “implement-test-fix” cycle and save keystrokes and mouse clicks without letting broken code into the project repository? Yes, it has existed for a long time and is well known: git hooks, plain executable scripts in your local .git/hooks/ directory. But they are bothersome to set up, to update, and to sync with your collaborators, because you can’t commit them to the repository itself.
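
To make the pain concrete, here is a minimal sketch of what installing a hook by hand involves. The helper name install_hook and the hook body are hypothetical and only serve as an illustration; the point is that the file ends up inside .git/, which git does not track, so every collaborator would have to repeat this step.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: writes a hook script into a repository's .git/hooks
# directory and sets the executable bit, which git requires to run a hook.
def install_hook(repo_dir, hook_name, script_body)
  hooks_dir = File.join(repo_dir, ".git", "hooks")
  FileUtils.mkdir_p(hooks_dir)
  hook_path = File.join(hooks_dir, hook_name)
  File.write(hook_path, script_body)
  FileUtils.chmod(0o755, hook_path)
  hook_path
end

# Usage against a throwaway directory standing in for a repository.
Dir.mktmpdir do |repo_dir|
  path = install_hook(repo_dir, "pre-push", "#!/bin/sh\nbundle exec rubocop\n")
  mode = File.stat(path).mode & 0o777
  puts "installed #{path} with mode #{format('%o', mode)}"
end
```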

Setting up Lefthook

Lefthook is our own git hook manager written in Go. Single dependency-free binary. Fast, reliable, feature-rich, language-agnostic.

  1. Install it via gem, npm, or from source or your OS package manager.

  2. Define your hooks in the config file lefthook.yml:

    pre-push:
      parallel: true
      commands:
        rubocop:
          tags: backend
          run: bundle exec rubocop
        rspec:
          tags: rspec backend
          run: bundle exec rspec --fail-fast
    
  3. And run lefthook install.

And voilà, from now on RSpec and Rubocop will run in parallel on every git push, and the push will be aborted if they find any issues.
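
Conceptually, a parallel pre-push hook boils down to "start every command at once, then fail if any of them failed". The sketch below illustrates that idea in plain Ruby; it is not how Lefthook is implemented, the helper name run_in_parallel is hypothetical, and two tiny Ruby processes stand in for the real rubocop and rspec commands.

```ruby
require "rbconfig"

# Runs every command in its own thread and returns the names of the ones
# that failed. `commands` maps a label to an argv array for Kernel#system.
def run_in_parallel(commands)
  threads = commands.map do |name, argv|
    Thread.new { [name, system(*argv) == true] }
  end
  results = threads.map(&:value).to_h
  results.reject { |_name, ok| ok }.keys
end

# Stand-ins for `bundle exec rubocop` and `bundle exec rspec --fail-fast`:
# one process that succeeds and one that fails.
ruby = RbConfig.ruby
failed = run_in_parallel(
  "rubocop" => [ruby, "-e", "exit 0"],
  "rspec"   => [ruby, "-e", "exit 1"]
)

# A real hook would exit with a non-zero status here so that git aborts the push.
puts failed.empty? ? "push allowed" : "push aborted, failed: #{failed.join(', ')}"
```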

But why choose pre-push over pre-commit?

First of all, sometimes, during refactorings, you want to make a lot of small commits locally. Most of them won’t pass linters. Why execute all this machinery if you know that it won’t pass? Afterward, you will squash and reorder them with git rebase --interactive and push clean code to the repo.

More importantly, some things just don’t run fast. Waiting a minute on every git commit is meh. You lose speed. So why not move long operations to the much rarer push event?

However, running the whole test suite may take too long (and we have CI exactly for that anyway), so we need a way to run only the specs that our changes might have broken.

Setting up Crystalball

Crystalball is a Ruby library by Toptal that implements a Regression Test Selection mechanism. Its main purpose is to select a minimal subset of your test suite that should be run to ensure your changes didn’t break anything. This is a tricky problem in Ruby applications in general and in Rails applications especially, because of the Ruby on Rails constant autoloading mechanism.

Crystalball solves this problem by tracing code execution and tracking dependencies between files while your test suite is running. Using this profiling data, it can tell which files affect which. Take a look at the slides about Crystalball from the talk at RubyKaigi 2019.
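
The selection idea itself can be sketched in a few lines. The map below is a toy data structure invented for this illustration (it is not Crystalball’s actual map format), and select_specs is a hypothetical helper: given the files a spec touched during a profiled run, pick the specs affected by a set of changed files.

```ruby
require "set"

# Hypothetical execution map: for every spec file, the source files it
# executed during a profiled run.
EXECUTION_MAP = {
  "spec/models/user_spec.rb"  => ["app/models/user.rb"],
  "spec/models/order_spec.rb" => ["app/models/order.rb", "app/models/user.rb"],
  "spec/lib/money_spec.rb"    => ["lib/money.rb"]
}.freeze

# Returns the specs worth running: every spec that executed a changed file,
# plus every changed file that is itself a known spec.
def select_specs(execution_map, changed_files)
  changed = changed_files.to_set
  affected = execution_map.select do |_spec, sources|
    sources.any? { |file| changed.include?(file) }
  end.keys
  changed_specs = changed_files.select { |file| execution_map.key?(file) }
  (affected + changed_specs).uniq.sort
end

puts select_specs(EXECUTION_MAP, ["app/models/user.rb"])
```

A change to app/models/user.rb selects both the user and the order specs here, while the money spec is skipped.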

  • Install it
# Gemfile
gem "crystalball"
  • Configure for our pre-push case (by default crystalball is configured to be used in pre-commit hooks)
# config/crystalball.yml
---
map_expiration_period: 604800 # 1 week
diff_from: origin/master
  • Set up your test suite to collect code coverage information:
# spec/spec_helper.rb
if ENV['CRYSTALBALL'] == 'true'
  require 'crystalball'
  require 'crystalball/rails'

  Crystalball::MapGenerator.start! do |config|
    config.register Crystalball::MapGenerator::CoverageStrategy.new
    config.register Crystalball::Rails::MapGenerator::I18nStrategy.new
    config.register Crystalball::MapGenerator::DescribedClassStrategy.new
  end
end
  • Generate code execution maps:
CRYSTALBALL=true bundle exec rspec
  • Replace RSpec with crystalball in lefthook.yml:
-      run: bundle exec rspec --fail-fast
+      run: bundle exec crystalball --fail-fast

From now on, every push will be dramatically faster if your changes are small.

But Crystalball needs up-to-date code execution maps to work correctly. Can we automate refreshing these maps, too? Sure, we can!

Keeping Crystalball up-to-date

Git’s post-checkout hook fits this purpose very well: it lets us run the logic that updates Crystalball data. “Logic” implies complexity, as there is no single command for that. To cover such cases, Lefthook allows separate executable script files. We can put our logic into the .lefthook/post-checkout/crystalball-update file, make it executable, and declare it in the Lefthook configuration like this:

# lefthook.yml

post-checkout:
  scripts:
    crystalball-update:
      tags: rspec backend

And there, in the crystalball-update script, we need a bit of magic.

First of all, we don’t need to do anything when a developer uses the git checkout -- path command to discard changes in the working tree, because this is not switching between commits and the repository may be “dirty.” Yes, the Git CLI can feel weird sometimes, as the checkout command is used for two different tasks.

Per the git docs, git always passes three arguments to the post-checkout hook: the previous HEAD commit identifier (SHA-1 hash), the current HEAD commit identifier, and a flag indicating whether it was a checkout between branches (1) or a file checkout to the state of another commit (0). Lefthook catches these arguments and carefully passes them to every managed script.

#!/usr/bin/env ruby

_prev_head, _curr_head, branch_change, * = ARGV

exit if branch_change == "0" # Don't run on file checkouts
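
If several scripts need the same three arguments, they can be wrapped in a small value object. This is only a sketch: CheckoutEvent and parse_checkout_args are hypothetical names, and the sample arguments stand in for the ARGV that git would pass.

```ruby
# Hypothetical value object for the three post-checkout arguments.
CheckoutEvent = Struct.new(:prev_head, :curr_head, :flag) do
  # git passes "1" for a branch checkout and "0" for a file checkout.
  def branch_checkout?
    flag == "1"
  end

  # True when HEAD actually moved to a different commit.
  def head_moved?
    prev_head != curr_head
  end
end

# Ignores any extra arguments, like the destructuring in the script above.
def parse_checkout_args(argv)
  prev_head, curr_head, flag, * = argv
  CheckoutEvent.new(prev_head, curr_head, flag)
end

# In a real hook this would be parse_checkout_args(ARGV).
event = parse_checkout_args(%w[1111111 2222222 1])
puts "branch checkout: #{event.branch_checkout?}, HEAD moved: #{event.head_moved?}"
```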

Next, we want to update crystalball profiling data only on the master branch as recommended in the Crystalball docs. To do so, we need to ask git what branch we’ve checked out:

# Rails.root if we look from .lefthook/post-checkout dir
app_dir = File.expand_path("../..", __dir__)
ENV["BUNDLE_GEMFILE"] ||= File.join(app_dir, "Gemfile")
require "bundler/setup"

require "git"
exit unless Git.open(app_dir).current_branch == "master"

The git gem is a dependency of Crystalball, so we don’t have to install it.

And finally, we need to do the heaviest part: ask Crystalball, “Is your profiling data up-to-date?”

require "crystalball"
config = Crystalball::RSpec::Runner.config
prediction_builder = Crystalball::RSpec::Runner.prediction_builder

exit if File.exist?(config["execution_map_path"]) && !prediction_builder.expired_map?
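
The idea behind an expiration check can be illustrated without Crystalball at all. The sketch below is an assumption-laden stand-in, not Crystalball’s own logic (that lives behind expired_map?): it treats the map file’s modification time as its age and compares it with the one-week map_expiration_period from the config. The helper name map_stale? is hypothetical.

```ruby
require "tempfile"

WEEK_IN_SECONDS = 604_800 # same value as map_expiration_period in the config

# Returns true when the map file is missing or older than `period` seconds.
# Assumption: file mtime is a good enough proxy for the age of the map.
def map_stale?(map_path, period: WEEK_IN_SECONDS, now: Time.now)
  return true unless File.exist?(map_path)

  now - File.mtime(map_path) > period
end

# Usage with a freshly created temporary file standing in for the map.
Tempfile.create("execution_map") do |file|
  puts "stale right now:  #{map_stale?(file.path)}"
  puts "stale in 8 days:  #{map_stale?(file.path, now: Time.now + 8 * 86_400)}"
end
```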

And if it is not fresh, we need to run the whole test suite with a special environment variable set:

puts "Crystalball Ruby code execution maps are out of date. Performing full test suite to update them…"

ENV["CRYSTALBALL"] = "true"
RSpec::Core::Runner.run([app_dir])

And we’re done. Are we?

Automate other routine tasks

But running specs requires that we have:

  1. installed gems, and
  2. an up-to-date database state.

In an actively developed application, gems are frequently updated, added, and removed, and the database schema can change several times a day in different branches. It is quite typical to pull fresh master in the morning and get updated gems and new database migrations. In that case, RSpec would fail, and the Crystalball execution maps wouldn’t be complete. So we need to ensure beforehand that our specs can always run.

Install missing gems on a git checkout

This task is quite simple and can be achieved with a short bash script. Most of it consists of checks to avoid calling Bundler when it’s not needed, since Bundler is quite heavy and takes a noticeable time to run.

The first two checks are the same as before, just rewritten in shell: is this a branch checkout? Did we actually move between commits?

#!/bin/bash

BRANCH_CHANGE=$3
[[ $BRANCH_CHANGE -eq 0 ]] && exit

PREV_HEAD=$1
CURR_HEAD=$2
[[ "$PREV_HEAD" == "$CURR_HEAD" ]] && exit

The next one is trickier:

# Don't run bundler if there were no changes in gems
git diff --quiet --exit-code $PREV_HEAD $CURR_HEAD -- Gemfile.lock && exit;

We’re asking here, “Did the set of required gems change between commits?” If Gemfile.lock has changed, we need to check whether we have all of the gems installed by invoking Bundler.

bundle check || bundle install

Again, if you have up-to-date gems (and in most checkouts, it will be so), only bundle check will be executed.
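
For comparison, the same chain of guards can be written in Ruby. This is a sketch of the decision logic only: bundle_check_needed? is a hypothetical helper, and the git call is injected as a lambda so the example can run outside a repository.

```ruby
# Default check shells out to git. `git diff --quiet` exits non-zero when
# Gemfile.lock differs; a git error also counts as "changed", which errs on
# the side of running `bundle check`, just like the shell version.
GIT_LOCKFILE_CHANGED = lambda do |prev_head, curr_head|
  !system("git", "diff", "--quiet", prev_head, curr_head, "--", "Gemfile.lock")
end

# Hypothetical helper mirroring the three guards of the shell script.
def bundle_check_needed?(prev_head, curr_head, branch_change,
                         lockfile_changed: GIT_LOCKFILE_CHANGED)
  return false if branch_change == "0"   # file checkout, not a branch switch
  return false if prev_head == curr_head # HEAD did not move

  lockfile_changed.call(prev_head, curr_head)
end

# Usage with a stubbed diff so that no repository is required.
always_changed = ->(_prev, _curr) { true }
puts bundle_check_needed?("aaa", "bbb", "1", lockfile_changed: always_changed)
puts bundle_check_needed?("aaa", "aaa", "1", lockfile_changed: always_changed)
```

A real hook would follow a true result with system("bundle check") || system("bundle install").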

Automatically rollback and apply database migrations

The next task is much more interesting.

When we’re switching from branch A to branch B, we need to ensure that the database schema is actually compatible with our specs. To do so, we must roll back every migration that exists in branch A but not in branch B, and then apply every migration that exists in B and hasn’t been applied yet. The rollback part is required because migrations that remove or rename columns and tables are not backward compatible.
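
The rule can be expressed as plain set arithmetic on migration versions. The sketch below is a simplification with hypothetical helper names (migration_version, migration_plan) and made-up file names; unlike the hook script, it also takes the list of already applied versions as an explicit input.

```ruby
# Extracts the numeric version prefix from a migration filename,
# e.g. "20190801000000_create_users.rb" => "20190801000000".
def migration_version(filename)
  File.basename(filename)[/\A(\d+)_/, 1]
end

# Given the migration files of the branch we left (A), the branch we entered
# (B) and the versions already applied to the database, returns the versions
# to roll back (newest first) and the versions to apply (oldest first).
def migration_plan(prev_files, curr_files, applied_versions)
  prev = prev_files.map { |file| migration_version(file) }.compact
  curr = curr_files.map { |file| migration_version(file) }.compact
  {
    down: ((prev - curr) & applied_versions).sort_by(&:to_i).reverse,
    up: (curr - applied_versions).sort_by(&:to_i)
  }
end

plan = migration_plan(
  ["db/migrate/20190801000000_create_users.rb", "db/migrate/20190805000000_add_age_to_users.rb"],
  ["db/migrate/20190801000000_create_users.rb", "db/migrate/20190803000000_create_orders.rb"],
  %w[20190801000000 20190805000000]
)
puts "roll back: #{plan[:down].inspect}"
puts "apply:     #{plan[:up].inspect}"
```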

The problem here is that there is no pre-checkout hook in git, only a post-checkout one. And after checkout, the migration files that existed only in the branch we switched from are gone. How do we roll them back?

But this is git! The files are out there. Why not just take them from git itself?

To do so programmatically, let’s use the git gem to access our git repository. Crystalball already uses it under the hood, so there will be no new dependency, but it is a good idea to add it to the Gemfile explicitly.

Let’s start with a check that we really have migrations to run (either up or down):

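require "git"

# git passes the previous HEAD, the current HEAD and the branch flag
prev_head, curr_head, * = ARGV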

# Rails.root if we look from .lefthook/post-checkout dir
app_dir = File.expand_path("../..", __dir__)

git = Git.open(app_dir)

# Don't run if there were no database changes between revisions
diff = git.diff(prev_head, curr_head).path("db/migrate")
exit if diff.size.zero?

Then, to be able to use migrations, we need to load our rails application and connect to the database:

require File.expand_path("config/boot", app_dir)
require File.expand_path("config/application", app_dir)
require "rake"
Rails.application.load_tasks

Rake::Task["db:load_config"].invoke

Then we can take the migration files from git and save them to a temporary directory:

# migrations added in prev_head (deleted in curr_head) that we need to rollback
rollback_migration_files = diff.select { |file| file.type == "deleted" }

if rollback_migration_files.any?
  require "tmpdir"
  MigrationFilenameRegexp = ActiveRecord::Migration::MigrationFilenameRegexp
  versions = []

  Dir.mktmpdir do |directory|
    rollback_migration_files.each do |diff_file|
      filename = File.basename(diff_file.path)
      contents = git.gblob("#{prev_head}:#{diff_file.path}").contents
      File.write(File.join(directory, filename), contents)
      version = filename.scan(MigrationFilenameRegexp).first&.first
      versions.push(version) if version
    end

    # Now, when we have files for migrations that need to be rolled back we can rollback them
    begin
      # dup is needed: push below mutates the same array in place
      old_migration_paths = ActiveRecord::Migrator.migrations_paths.dup
      ActiveRecord::Migrator.migrations_paths.push(directory)

      versions.sort.reverse_each do |version|
        ENV["VERSION"] = version
        Rake::Task["db:migrate:down"].execute
      end
    ensure
      ENV.delete("VERSION")
      ActiveRecord::Migrator.migrations_paths = old_migration_paths
    end
  end
end

Here we’re adding our temporary directory with the other branch’s migrations to ActiveRecord’s migrations_paths. This setting has been available since Rails 5 but is not widely known. Now ActiveRecord can see our ghost migration files, and we can simply invoke rake db:migrate:down VERSION=number for every migration to roll it back.

And after that, we can simply run the migrations that haven’t been applied yet:
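
The "extend a global list of paths temporarily, then put it back" pattern can be isolated into a small helper. In this sketch a plain array stands in for ActiveRecord’s migrations_paths, and with_extra_path is a hypothetical name. Note that the original contents are copied before the push, because push mutates the array in place.

```ruby
# Hypothetical helper: yields with `extra` appended to `paths`, then restores
# the original contents even if the block raises.
def with_extra_path(paths, extra)
  original = paths.dup # copy first: push below mutates `paths` in place
  paths.push(extra)
  yield paths
ensure
  paths.replace(original) if original
end

# A plain array standing in for ActiveRecord's migrations_paths.
migrations_paths = ["db/migrate"]
with_extra_path(migrations_paths, "/tmp/ghost_migrations") do |paths|
  puts "inside:  #{paths.inspect}"
end
puts "outside: #{migrations_paths.inspect}"
```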

Rake::Task["db:migrate"].invoke

And that’s it!

Composing it together

Now we only need to invoke these scripts in the right order: install gems, run migrations, and run specs (if required). To do so, we name the files so that they sort alphabetically in that order, place them in the .lefthook/post-checkout directory, and declare them in lefthook.yml:

post-checkout:
  piped: true
  scripts:
    01-bundle-checkinstall:
      tags: backend
    02-db-migrate:
      tags: backend
    03-crystalball-update:
      tags: rspec backend

The piped option will abort the rest of the commands if the preceding command fails. For example, if you forget to launch a database server, the second step will fail, and lefthook will skip the third step altogether.
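
The behaviour described above, "run in name order, stop at the first failure", can be sketched as follows. This only illustrates the semantics and says nothing about how Lefthook implements them; run_piped is a hypothetical helper and the lambdas stand in for the real scripts.

```ruby
# Runs steps in name order and stops at the first failure. Returns the names
# of the steps that actually ran and whether all of them succeeded.
def run_piped(steps)
  ran = []
  steps.sort_by { |name, _step| name }.each do |name, step|
    ran << name
    return { ran: ran, ok: false } unless step.call
  end
  { ran: ran, ok: true }
end

steps = {
  "03-crystalball-update"  => -> { true },
  "01-bundle-checkinstall" => -> { true },
  "02-db-migrate"          => -> { false } # e.g. the database server is down
}

result = run_piped(steps)
puts "ran: #{result[:ran].join(', ')}"
puts "ok:  #{result[:ok]}"
```

The third step never runs here, because the second one reports a failure.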

Frontend devs who are not interested in running RSpec can exclude the rspec tag in their lefthook.local.yml and still get installed gems and a migrated database. Automagically.
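
Tag exclusion is easy to picture as a filter over the declared scripts. The sketch below models only the idea, using the script names and tags from the configuration above; enabled_scripts is a hypothetical helper, not part of Lefthook.

```ruby
# Script name => list of tags, mirroring the post-checkout section above.
SCRIPT_TAGS = {
  "01-bundle-checkinstall" => %w[backend],
  "02-db-migrate"          => %w[backend],
  "03-crystalball-update"  => %w[rspec backend]
}.freeze

# Keeps only the scripts that carry none of the excluded tags.
def enabled_scripts(script_tags, excluded_tags)
  script_tags.reject { |_name, tags| (tags & excluded_tags).any? }.keys
end

# A frontend developer who excludes the rspec tag keeps the first two scripts.
puts enabled_scripts(SCRIPT_TAGS, %w[rspec])
```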

Any gotchas?

From now on, you always have to write reversible migrations. Look at the example application to learn how to do it.

Conclusion

Now we not only have checks that prevent us from pushing broken code to the repository (and run really fast) but also, as a side effect, always have our gems installed and our database migrated. You can forget about bundle install (unless you are the one who updates gems). And no more “Okay, now I need to roll back X and Y first, and only then check out master.”

Check out the example application published on GitHub: lefthook-crystalball-example.

Happy coding!
