Lefthook: refactoring the Git Hooks automation tool back into shape


Translations

If you’re interested in translating or adapting this post, please contact us first.

The need to refactor. The feeling starts creeping in whenever adding changes to existing code becomes painful. That was our exact situation while trying to bring some new features and fix existing issues working with Lefthook v0.8.0. In this quick post, we’ll talk about our Lefthook refactoring journey and share a few general signs to look out for to know when you might consider refactoring your own project.

Lefthook is a command line tool for automating Git Hooks actions. It uses a declarative approach to define what should be executed (and when) upon calling git commit -m 'fix typo'. It is simply a binary utility that can be used with zero overhead.

Lefthook allows us to describe Git hooks in a plain YAML file, which is much more convenient than writing shell scripts. It comes with parallelization, files filtering, skip options, and many other features from the box. Lefthook is delivered as an NPM package, Ruby gem, .rpm and .deb package, Snap package, Homebrew formula, and, of course, as a plain binary.

The main issues with Lefthook in 2022

By 2022, some issues had started to pile up: for newcomers starting to contribute, it was hard to add features or fix bugs. Changing some configuration options was becoming increasingly painful. And, adding new options was unclear—imagine you have to define a constant with an option name, a function to fetch its value, and some have kind of a validation somewhere else.

Eventually, we felt it was time to stop development for a few months and refactor Lefthook entirely.

After all, it had been a while since work on Lefthook started—more than 3 years ago at the time of this writing. With every new feature, the complexity of the code increased. It eventually became impossible to understand the structure of lefthook.yml by looking at the code itself.

At this point, the prospect of applying new features had just become too scary: you were liable to break something because, well, there were no unit tests. Further, the init function was being overused, which also made it difficult to properly test Lefthook—everything was already initialized by the time testing would get underway. For example, there was no room to substitute in-memory FS.

Lefthook uses the viper module. As such, it was fetching the config keys without unmarshalling. This was OK when the config file was small, but once it grew to the existing structure it became painful to fetch config options.

Big files are hard to read—and if they use functions defined in other files, it’s twice as hard!

And, having a file that imports a ton of modules is a point of concern. Importing 20 modules in a .go file is too much.

Imagine how difficult it will be to understand the code from this file. Or further, how complicated it could get to change/maintain the underlying logic:

// Too much
import (
  "bytes"
  "errors"
  "fmt"
  "io"
  "log"
  "os"
  "os/exec"
  "path/filepath"
  "regexp"
  "sort"
  "strconv"
  "strings"
  "sync"
  "time"

  "github.com/evilmartians/lefthook/context"

  arrop "github.com/adam-hanna/arrayOperations"
  "github.com/creack/pty"
  "github.com/gobwas/glob"
  "github.com/spf13/afero"
  "github.com/spf13/cobra"
  "github.com/spf13/viper"
  "gopkg.in/alessio/shellescape.v1"
)

This is when we should think about single-responsibility principle of SOLID. If you see something like this in a project, it’s probable you need to refactor.

Refactoring goals

As part of refactoring, these were our primary objectives:

  • Make the config structure possible to understand by looking at the code itself.
  • Split responsibility into obvious parts—config processing, CLI options parsing, and the Lefthook logic itself.
  • Ensure the ability to cover Lefthook’s logic with tests.

All in all, this doesn’t seem like too much, but it actually required rewriting the entire code base.

File structure and tests

Lefthook deals with files. This means it should interact with the OS. But with the os package, it’s hard to write tests. So, that’s why we used Afero.

Having everything in a project’s root folder is not convenient. So, we thought about the various responsibilities in the project and ended up following file structure:

├── cmd/             # Everything about CLI
└─── internal/       # "internal", shouldn't be used by other Go progs
    ├── config/      # lefthook.yml and lefthook-local.yml parsing
    ├── git/         # All git wrappers used by lefthook
    ├── lefthook/    # Internal logic
    │   └── runner/
    ├── log/         # All about printing and STDOUT
    ├── templates/   # Hook file and config file templates
    └── version/     # Lefthook version constant and version check logic

Dropping global variables

In the previous version of Lefthook there were a lot of global variables.

The code was hard to read because these global vars were used in many places. This is not convenient for medium-sized projects (like, for example, Lefthook).

You always have to lookup the place where the its value is assigned, and you need to be sure the value is correctly assign at the moment of referencing the variable. So, the next step was to drop global variables.

Going from procedural to OOP style

We had some procedural style code (because it was easy to write and understand). But after adding more features, this style became difficult to maintain: when the codebase grows, it can be hard to keep things small and controllable. You end up having huge files, long functions, and many global variables.

Consider the following code:

var (
  rootPath, gitHooksPath string
)

...

func initGitConfig() {
  setRootPath()
  setGitHooksPath(getHooksPathFromGitConfig())
}

func setRootPath() {
  cmd := exec.Command("git", "rev-parse", "--show-toplevel")

  outputBytes, err := cmd.CombinedOutput()

  if err != nil {
    log.Fatal(au.Brown(gitInitMessage))
  }

  rootPath = strings.TrimSpace(string(outputBytes))
}

func getRootPath() string {
  return rootPath
}

func getGitHooksPath() string {
  return gitHooksPath
}

func setGitHooksPath(path string) {
  if exists, _ := afero.DirExists(appFs, filepath.Join(getRootPath(), path)); exists {
    gitHooksPath = filepath.Join(getRootPath(), path)
    return
  }

  gitHooksPath = path
}

func getHooksPathFromGitConfig() string {
  cmd := exec.Command("git", "rev-parse", "--git-path", "hooks")

  ...
}

There are 6 functions just to set and get 2 global variables gitHooksPath and rootPath. And there are getters for global variables! Also, some logic is duplicated (git command execution, for example).

This is how we rewrote this logic and put it into internal/git/repository.go file:

const (
  cmdRootPath  = "git rev-parse --show-toplevel"
  cmdHooksPath = "git rev-parse --git-path hooks"
  ...
)

type Repository struct {
  HooksPath string
  RootPath  string
  ...
}

func NewRepository(fs afero.Fs) (*Repository, error) {
  rootPath, err := execGit(cmdRootPath)
  if err != nil {
    return nil, err
  }

  hooksPath, err := execGit(cmdHooksPath)
  if err != nil {
    return nil, err
  }
  if exists, _ := afero.DirExists(fs, filepath.Join(rootPath, hooksPath)); exists {
    hooksPath = filepath.Join(rootPath, hooksPath)
  }
  ...

  return &Repository{
    HooksPath: hooksPath,
    RootPath:  rootPath,
    ...
  }, nil
}

func execGit(command string) (string, error) {
  args := strings.Split(command, " ")
  cmd := exec.Command(args[0], args[1:]...)

  out, err := cmd.CombinedOutput()
  if err != nil {
    return "", err
  }

  return strings.TrimSpace(string(out)), nil
}

The main idea here is: while procedural style fits for small projects with small codebase, using OOP design helps with making it easy to add new features, grow, and maintain a suitably big codebase.

Rather than using a procedural approach with inits, setters, getters, and global variables, here, it’s better to use OOP approach and use constructors to build an object containing all the required fields.

Creating a config structure

Consider the code fetching config properties:

func getRoot(hooksGroup string, executableName string) string {
  key := strings.Join([]string{hooksGroup, commandsConfigKey, executableName, rootConfigKey}, ".")
  return viper.GetString(key)
}

func getCommandIncludeRegexp(hooksGroup, executableName string) string {
  key := strings.Join([]string{hooksGroup, commandsConfigKey, executableName, includeConfigKey}, ".")
  return viper.GetString(key)
}

func getCommandExcludeRegexp(hooksGroup, executableName string) string {
  key := strings.Join([]string{hooksGroup, commandsConfigKey, executableName, excludeConfigKey}, ".")
  return viper.GetString(key)
}

...

These repetitive calls to viper and the magic joins are hard to maintain. If you add a new config property, you will have to write one more getSomething method and use it everywhere to get the value.

Instead we created a config structure:

type Command struct {
  Run string `mapstructure:"run"`

  Skip  interface{} `mapstructure:"skip"`
  Tags  []string    `mapstructure:"tags"`
  Glob  string      `mapstructure:"glob"`
  Files string      `mapstructure:"files"`

  Root    string `mapstructure:"root"`
  Exclude string `mapstructure:"exclude"`
  ...
}

This change required us to add special logic for merging lefthook.yml and lefthook-local.yml, which viper was doing implicitly, but we ended up with a plain and simple structure containing all the properties of a command.

Avoiding code repetition

When you see the repetition of code there might be a place for refactoring. (If reasonable, of course.) Take a look at the following code:

if !isScriptExist(hooksGroup, executableName) {
  mutex.Lock()
  spinner.RestartWithMsg(sprintSuccess("\n", au.Bold(executableName), au.Brown("(SKIP BY NOT EXIST IN CONFIG)")))
  mutex.Unlock()
  return
}
if isSkipScript(hooksGroup, executableName) {
  mutex.Lock()
  spinner.RestartWithMsg(sprintSuccess("\n", au.Bold(executableName), au.Brown("(SKIP BY SETTINGS)")))
  mutex.Unlock()
  return
}
if result, _ := arrop.Intersect(getExcludeTags(hooksGroup), getTags(hooksGroup, scriptsConfigKey, executableName)); len(result.Interface().([]string)) > 0 {
  mutex.Lock()
  spinner.RestartWithMsg(sprintSuccess("\n", au.Bold(executableName), au.Brown("(SKIP BY TAGS)")))
  mutex.Unlock()
  return
}

The problem is that you have to stop the spinner when you print something to STDOUT. Instead, let’s delegate this to a printing module!

// internal/log/log.go

type Logger struct {
  level   Level
  aurora  aurora.Aurora
  out     io.Writer
  mu      sync.Mutex
  spinner *spinner.Spinner
}

func New() *Logger {
  return &Logger{
    level:  InfoLevel,
    out:    os.Stdout,
    aurora: aurora.NewAurora(true),
    spinner: spinner.New(
      spinner.CharSets[spinnerCharSet],
      spinnerRefreshRate,
      spinner.WithSuffix(" waiting"),
    ),
  }
}

func (l *Logger) Info(args ...interface{}) {
  l.Log(InfoLevel, args...)
}

func (l *Logger) Log(level Level, args ...interface{}) {
  if l.IsLevelEnabled(level) {
    l.Println(args...)
  }
}

func (l *Logger) Println(args ...interface{}) {
  l.mu.Lock()
  defer l.mu.Unlock()

  if l.spinner.Active() {
    l.spinner.Stop()
    defer l.spinner.Start()
  }

  _, _ = fmt.Fprintln(l.out, args...)
}

// internal/lefthook/runner/runner.go: Using Logger

script, ok := r.hook.Scripts[file.Name()]
if !ok {
  logSkip(file.Name(), "(SKIP BY NOT EXIST IN CONFIG)")
  continue
}
...
if script.DoSkip(r.repo.State()) {
  logSkip(file.Name(), "(SKIP BY SETTINGS)")
  return
}
if intersect(r.hook.ExcludeTags, script.Tags) {
  logSkip(file.Name(), "(SKIP BY TAGS)")
}

...

func logSkip(name, reason string) {
  log.Info(fmt.Sprintf("%s: %s", log.Bold(name), log.Yellow(reason)))
}

Now we don’t care about spinner and mutexes, and we delegate it to the Logger.

Consider reviewing that parts of code that repeat each other. It’s easier to maintain some logic in one place.

This is more robust because you embed this logic to the printing method and you never even think about handling spinner.

Where do we end up?

After refactoring, Lefthook kept all its features (and its bugs 😃). But, it became much easier to find where apply a new feature. Having tests prevented accidental regressions, and having a config structure in the code made it easier to understand how options are being processed and used.

See Lefthook on Github for more!

And please, don’t hesitate to contribute to our open source projects. You’re always welcome at Lefthook, check the issues page!

What’s next for Lefthook?

Moving forward with Lefthook, we plan to implement new features and make the tool more convenient to use:

  • Improve LFS support.
  • Improve Windows support.
  • Apply commands only on staged parts of files.
  • Stabilize the current implementation.
  • Improve packages support and add more ways to deliver Lefthook.

Stay tuned!

Besides Lefthook, Evil Martians is behind many open source and open core projects: AnyCable, imgproxy, PostCSS, and more! If you’ve got a project trying to get off the ground, or that needs consulting assistance, give us a shout!

Join our email newsletter

Get all the new posts delivered directly to your inbox. Unsubscribe anytime.

In the same orbit

How can we help you?

Martians at a glance
17
years in business

We transform growth-stage startups into unicorns, build developer tools, and create open source products.

If you prefer email, write to us at surrender@evilmartians.com