Fighting the Hydra of N+1 queries — Martian Chronicles, Evil Martians’ team blog
Martian Chronicles
Evil Martians’ team blog
Back-end

Fighting the Hydra of N+1 queries

Let’s talk about the N+1 problem in Rails. We will go through a short intro for beginners, speak of the ways to tame the problem (specifically, using the bullet gem), ActiveSupport instrumentation, and introduce the rspec-sqlimit gem.

The Hydra

In the Rails ecosystem, every developer knows about the so called N+1 query problem and how to deal with it. It is about impossible today to overlook tons of articles, blog posts, and tutorials mentioning both the problem and the solution to it.

The N+1 problem is a common performance anti-pattern. It happens when we fetch records from an associated table not with a single SQL query but instead using individual queries for every single record.

class User < ActiveRecord::Base
  has_many :messages
end

class Message < ActiveRecord::Base
  belongs_to :user
end
Message.where(id: 1..3).each do |message|
  puts message.user.name
end

We have executed 1 query to retrieve 3 messages and then did 3 other queries for retrieving a user of every single message.

SELECT * FROM "messages" WHERE "messages"."id" IN (1, 2, 3)
SELECT * FROM "users" INNER JOIN "messages" ON "users"."id" = "messages"."user_id" WHERE "messages"."id" = 1
SELECT * FROM "users" INNER JOIN "messages" ON "users"."id" = "messages"."user_id" WHERE "messages"."id" = 2
SELECT * FROM "users" INNER JOIN "messages" ON "users"."id" = "messages"."user_id" WHERE "messages"."id" = 3

Ineffective communication with the database will take a toll on the performance of your application. The solution per se is simple: you should eagerly preload associated records:

Message.where(id: 1..3).includes(:user).each do |message|
  puts message.user.name
end

This time, ActiveRecord runs only 2 queries instead of N. Here is how it uses the result of the first request to retrieve all associated users at once:

SELECT * FROM "messages" WHERE "messages"."id" IN (1, 2, 3)
SELECT * FROM "users" INNER JOIN "messages" ON "users"."id" = "messages"."user_id" WHERE "messages"."id" IN (1, 2, 3)

However simple and well-known the solution it is, preventing N+1 in real-world long-lasting projects can be tricky. Sometimes the problem acts like the Lernaean Hydra with a capacity to regenerate lost heads every time a developer distracts. Like Hercules, we need special instrumentation, a sort of “sword and fire” to keep the monster under our control.

A Simple Example

Let’s take a look at an oversimplified example—enough to illustrate the heart of the problem.

To start, we create two models for users and messages between them:

class User < ActiveRecord::Base
  has_many :incomings, class_name: "Message", foreign_key: :addressee_id
  has_many :outgoings, class_name: "Message", foreign_key: :addresser_id

  validates :name, presence: true # just to show that there is a name
end

class Message < ActiveRecord::Base
  belongs_to :addresser, class_name: "User"
  belongs_to :addressee, class_name: "User"

  validates :text, presence: true # just to show that there is a text
end

Yes, that simple.

   +---> User <--+
   |             |
addressor     addressee
   |             |
   +-- Message --+

Next, we create two classes responsible for representing messages and users. From the very beginning, we will separate concerns between two different classes. Why? Because in real-world projects exposition of every model can be complex and deserve its own representer.

UserPage = Struct.new(:user) do
  def to_h
    { name: user.name }
  end
end

MessagePage = Struct.new(:message) do
  def to_h
    {
      text:      message.text,
      addresser: UserPage.new(message.addresser).to_h,
      addressee: UserPage.new(message.addressee).to_h
    }
  end
end

Let’s check how that works so far:

joe = User.create name: "Joe"
ann = User.create name: "Ann"
message = Message.create addresser: joe, addressee: ann, message: "Hi!"

MessagePage.new(message).to_h
# => { text: "Hi!", addresser: { name: "Joe" }, addressee: { name: "Ann" } }

Now, what about the N+1 problem? Well, to display all the messages we should do something like this:

class MessagesPage
  def to_h
    Message.includes(:addresser, :addressee)          # here we prevent a N+1 query
           .map { |item| MessagePage.new(item).to_h } # and get "preloaded" users
  end
end

MessagesPage.new.to_h
# => [{ text: "Hi!", addresser: { name: "Joe" }, addressee: { name: "Ann" } }]

Here be Hydra

Time passes, and a new developer joins our project. He needs to add some details to our users by creating a new model for a Country and assigning it to every User:

class Country < ActiveRecord::Base
  has_many :users
  validates :name, presence: true # just to make it visible here
end

class User < ActiveRecord::Base
  # ... all the previous stuff
  belongs_to :country
end

Just a small extension… (a soft hiss is barely audible for now):

        Country
           ^
           |
           |
   +---> User <--+
   |             |
addressor     addressee
   |             |
   +-- Message --+

Then we make a simple addition to the representer:

UserPage = Struct.new(:user) do
  def to_h
    { name: user.name, country: user.country.name }
  end
end

And you can guess what happens next.

Because a new dependency has not been reflected in a query, now we have 2N+3 queries (1 for the list of messages, 2 for lists of assigned users, and another 2N for countries of addresser/addressee for every single message).

We can even go further. Following the Law of Demeter we could make several delegations:

UserPage = Struct.new(:user) do
  delegate :country, to: :user
  delegate :name,    to: :user,    prefix: true
  delegate :name,    to: :country, prefix: true, allow_nil: true

  def to_h
    { name: user_name, country: country_name }
  end
end

Or implement some memoization:

UserPage = Struct.new(:user) do
  delegate :name, to: :user, prefix: true
  delegate :name, :code, to: :country, prefix: true, allow_nil: true

  def country
    @country ||= user.country
  end

  def to_h
    { name: user_name, country: { name: country_name, code: country_code } }
  end
end

While these changes simplify the code and make it more readable, they hide its N+1 vulnerabilities at the same time.

The above examples show that the problem has deeper roots than developer’s carelessness. It is the breaking up of a “complex” query to separate classes and DRY-ing code that made the whole picture less observable. The more our structure grows, the more additional effort is needed to prevent this sort of mistakes.

As mentioned above, this particular example is oversimplified for the mere purpose of illustration. Take a look at a real-world structure below and try to think of all the possible places where N+1 could arise inadvertently during any code update.

                Shop <------------ ShippingService
                  ^                    ^
                  |                    |
                  |                    |
Account <---- Showcase <------------+  |
   ^              ^                 |  |
   |              |                 |  |
   |              |                 |  |
Product <----- Listing ----> ShippingProfile
   ^              ^
   |              |
   |              |
Variation <--- ListingVariation

While we know what to do to prevent N+1, it can take a lot of time and effort to discover, track and hunt the problem, either existing or potential.

Instead of reacting to problems after they have already occurred, we need a proactive approach to deal with such cases in more coherent and efficient way.

Swords, Shields, and Mirrors

There is a bunch of tools you can use to fight the beast.

You should have proper instrumentation tools in production. By that, I mean both log collectors (there are tons of those—just Heroku offers six different addons with various pricing plans) and monitoring. It is worth spending some time to search for spikes in response times, especially after deploying significant changes to the codebase and checking corresponding logs.

For several years already, the bullet gem is considered the Rails community’s gold standard for guarding against suboptimal queries. In the next section, I dwell on the gem in more detail.

But my whole point is that neither good production-, no development-specific tools are sufficient. We should prevent N+1 from the very beginning—our application’s test coverage.

That is what I am going to describe at the end of this post.

bullet

The most popular tool for fighting the N+1 problem in Ruby is the bullet gem by Richard Huang.

The bullet gem was designed to function as you browse through the application in the development environment. It will watch your queries while you develop your application and notify you when you should add eager loading (N+1 queries), when you use eager loading that is not necessary—and when you should use counter caching.

To use it, just add the gem to the development environment of your Gemfile:

# ./Gemfile
gem "bullet", group: "development"

Then configure it:

# ./config/environments/development.rb
config.after_initialize do
  Bullet.enable = true
  Bullet.rails_logger = true
end

Bullet has several ways to notify you about problems—JavaScript alerts, logging into files or messaging via the browser console. You can learn more from the gem’s README.

The problem with this approach is its human-dependency. Your application codebase grows, and some day you can no longer manually test all the possible cases after every single change. In the Ruby community—which is obsessed with testing—it is natural to solve problems by writing test cases.

Bullet provides support for testing environment as well to warn you every time N+1 occurs:

expect(MessagesPage.new.to_h.count).to eq 2

# 1.2) Failure/Error: Bullet.perform_out_of_channel_notifications if Bullet.notification?
#
#      Bullet::Notification::UnoptimizedQueryError:
#        user: nepalez
#
#      USE eager loading detected
#          User => [:country]
#          Add to your finder: :includes => [:country]
#        Call stack
#          ./app/pages/user_page.rb:3:in `to_h'
#          ./app/pages/message_page.rb:5:in `to_h'
#          ./app/pages/messages_page.rb:4:in `block in to_h'
#          ./app/pages/messages_page.rb:4:in `to_h'

Recently Vladimir Dementiev has pointed out to me that you can call bullet on demand with the following trick:

# spec/support.rb
shared_context "bullet", bullet: true do
  before(:each) do
    Bullet.enable = true
    Bullet.bullet_logger = true
    Bullet.raise = true # raise an error if N+1 query occurs
    Bullet.start_request
  end

  after(:each) do
    Bullet.perform_out_of_channel_notifications if Bullet.notification?
    Bullet.end_request
    Bullet.enable = false
    Bullet.bullet_logger = false
    Bullet.raise = false
  end
end
# spec/rails_helper.rb
require_relative "support"

config.alias_example_to :bulletify, bullet: true

Now you can use bulletify (in the same way as RSpec’s it) to check if a block is “N+1 safe”:

# spec/controllers/my_controller_spec.rb
context 'N+1' do
  bulletify { get :index }
end

There is no Silver Bullet

In many cases just switching bullet coverage on is not sufficient to catch all problems. The reason for that is that no tool can detect the problem unless you provide enough sufficient data. Speaking about tests, not only do you have to start using the tool, but also to provide good examples where a N+1 query could show itself.

However, even after you provide said test examples, debugging the source of N+1 requests can require much more detailed reports than bullet can provide. How about saving a nested hash containing data for several associated models:

{
  product: {
    id: 1,
    title: "The Hood"
    variations: [
      {
        id: 1,
        title: "The Red Hood",
        color: "Red"
      }
      # destroy all the other variations of the product
    ]
  }
}

Internally, after saving a top-level product, you should reload its variations before processing updates to them. Even when something goes south, and bullet tells that you have missed a preloader (actually it will not), you need a much more verbose description of “what is going on here”.

rspec-sqlimit

After bumping into the problem some time ago, I wrote a tiny rspec matcher, rspec-sqlimit. With it, you could make sure that your code makes a reasonable number of queries in the following way:

require "rspec"
require "rspec-sqlimit"

RSpec.describe "messages" do
  let(:joe) { User.create name: "Joe" }
  let(:ann) { User.create name: "Ann" }

  before do
    Message.create addresser: joe, addressee: ann, text: "Hi!"
    Message.create addressee: ann, addressee: joe, text: "Hola!"
  end

  subject { MessagesPage.new.to_h }.not_to exceed_query_limit(3).with(/SELECT/)
end

After adding Country (see examples above), our test suite will produce something like this:

Failure/Error: expect { MessagesPage.new.to_h }.not_to exceed_query_limit(3).with(/^SELECT/)

  Expected to run maximum 3 queries that match (?-mix:^SELECT)
  The following 7 queries were invoked among others (see mark ->):
  -> 1) SELECT "messages".* FROM "messages" (0.133 ms)
  -> 2) SELECT "users".* FROM "users" WHERE "users"."id" IN (2, 1) (0.154 ms)
  -> 3) SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2) (0.125 ms)
  -> 4) SELECT  "countries".* FROM "countries" WHERE "countries"."id" = ? LIMIT ? (0.21 ms)
  -> 5) SELECT  "countries".* FROM "countries" WHERE "countries"."id" = ? LIMIT ? (0.15 ms)
  -> 6) SELECT  "countries".* FROM "countries" WHERE "countries"."id" = ? LIMIT ? (0.118 ms)
  -> 7) SELECT  "countries".* FROM "countries" WHERE "countries"."id" = ? LIMIT ? (0.112 ms)

This way, not only you have a notification about the problem, but a full description of SQL queries that have been made against your expectations as well. This time, you are armed against the Hydra. rspec-sqlimit will display what is going wrong, and bullet will recommend you a remedy. Just make this kind of testing a habit, and you will keep the neverending N+1 nightmare under control.

Going a Bit Deeper

For those who are interested in how the rspec-sqlimit gem works, I will add some technical details with references for further investigation.

Under the hood, the gem uses the pub/sub pattern via ActiveSupport Notification and Instrumentation features. Its implementation is a thin wrapper around the old Ryan Bigg’s answer on the Stack Overflow.

To cite Rails documentation:

There is a hook provided within Active Record that is called every time Active Record uses an SQL query on a database. This hook could be subscribed to, and used to track the number of queries during a certain action.

Following the docs, to instrument an event you just need to wrap some block of code into an instrumenter with a unique name for the event and some extra info:

ActiveSupport::Notifications.instrument('event_name', extra: :information) do
  # do something useful
end

Every time the block is called, the instrumenter invokes a corresponding event.

Next, you can subscribe to a specific type of event using ActiveSupport::Notifications#subscribe:

ActiveSupport::Notifications.subscribe('event_name') do |name, start, finish, id, payload|
  # handle the event
end

You can see how the sql.active_record hook is defined, and how rspec-sqlimit is subscribed to notifications to register executed queries.

All we have to do is wrap the results into a custom RSpec matcher. You can see the detailed RSpec documentation to learn more about how to build your own.

By following this scenario, you can build matchers to instrument all the meaningful metrics of your application in TDD-style. For example, you could create a matcher to control the number of model instances to prevent GC overloading, or make a custom counter for the number of emails being sent or delivered.

You can even provide custom instrumentation for the critical parts of your Rails application, and then add test coverage for defined custom events.

With all that matchers you can write a distinctive “Here be Dragons” message to future developers in a form they could neither miss nor ignore.


To recap,

  • Always remember about the N+1 problem and the necessity of running queries eagerly;
  • Periodically check your code using bullet—gracefully integrated into the test suite;
  • Cover complex queries with specs limiting the total number of requests. rspec-sqlimit will help. Do it every time you spread business logic among several classes;
  • Use ActiveSupport notifications to provide RSpec instrumentation matchers of your own.

Take care!