TestProf II: Factory therapy for your Ruby tests
Learn how to bring your Ruby test suite back to full health (and full speed) with TestProf—a bag of powerful tools for diagnosing all test‑related problems. This time, we’ll talk about factories: how they can slow down your tests, how to measure that negative impact, how to avoid it, and how to make your factories as fast as fixtures.
Other parts:
- TestProf: a good doctor for slow Ruby tests
- TestProf II: Factory therapy for your Ruby tests
- TestProf III: guided and automated Ruby test profiling
NOTE: This post was updated in August 2024 to reflect changes in TestProf usage.
TestProf is used on many Evil Martians’ projects to shorten the TDD feedback loop. It’s a must-have tool for any Rails (or otherwise Ruby-based) application whose tests are taking more than a minute to run. TestProf works with both RSpec and Minitest by extending their functionality.
In our introductory article, we promised to dedicate a whole post to an often overlooked problem with testing Ruby web applications: factory cascades. This article makes good on that promise!
It’s better to explore TestProf by running it on your actual tests, so if you happen to have an RSpec-covered Rails project with factory_bot factories close at hand, we recommend installing the gem before reading on, as this is going to be an interactive walk-through!
Installing TestProf is as easy as adding a single line of code to your Gemfile’s :test group:
group :test do
  gem 'test-prof'
end
Crumbling factories
Whenever we’re testing our applications, we need to generate test data, and two common ways of doing this are by using factories and fixtures.
A factory is an object that generates other objects (that may or may not persist) according to a predefined schema, and it does it dynamically.
Fixtures represent a different approach: they declare a static state for the data loaded into a test database right away and this usually persists between test runs.
In the world of Rails, we have both built-in fixtures as well as a selection of popular third-party factory tools (like factory_bot, Fabrication, and some others). Fixtures are fast by design. Factories are more popular. We believe factories can compete in performance with fixtures, and can even be used as fixtures. Read on to see how.
EventProf
While the “factories vs. fixtures” debate never seems to cease, we consider factories to be a more flexible and a more maintainable way to deal with test data.
However, with great power comes great responsibility: factories make it easier to shoot yourself in the foot and bring your test suite to a painful crawl.
So how can we tell if we’re misusing that power or not, and what can we do about it anyway? First, let’s see how much time our test suite spends working in the factories.
For that, we should call our doctor: TestProf! This gem is chock full of diagnostic tools, EventProf being one of them. The name gives away what it does: it’s an event profiler that can be made to track a factory.create event, which is fired every time a factory-generated object is saved to a database (e.g., via the FactoryBot.create call).
EventProf works with both RSpec and Minitest and has a command-line interface, so fire up your terminal in any Rails project folder (it has to have tests and factories, of course, and all examples in this article assume RSpec) and run this line:
$ EVENT_PROF="factory.create" bundle exec rspec
...
[TEST PROF INFO] EventProf results for factory.create
Total time: 00:09.515 of 00:11.152 (85.00%)
Total events: 4891
Top 5 slowest suites (by time):
UsersController (users_controller_spec.rb:3) – 00:10.119 (581 / 248)
DocumentsController (documents_controller_spec.rb:3) – 00:07.494 (71 / 24)
RolesController (roles_controller_spec.rb:3) – 00:04.972 (181 / 76)
...
In the output, you see the total time spent on creating records from factories and the top five slowest suites. The most surprising piece of the output might be that 85% of the total time was occupied by factories. Yes, that’s not a joke, and we see this kind of high percentage spent in factories all the time.
So, run EventProf, check your numbers, keep calm and continue reading. We know how to fix this!
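To make the "share of total time" figure less abstract, here is a minimal plain-Ruby sketch of the idea behind an event profiler: accumulate the time spent inside one kind of event and compare it to the whole run. This is not how TestProf is implemented; `TinyEventProfiler`, `track`, and `share_of` are hypothetical names invented for this illustration, and `sleep` stands in for real work.

```ruby
# A toy event profiler: NOT TestProf's implementation, only an illustration
# of how time spent in one kind of event can be accumulated and compared
# to the total run time.
class TinyEventProfiler
  attr_reader :total_time, :count

  def initialize
    @total_time = 0.0
    @count = 0
  end

  # Wrap a block that stands in for a single "factory.create" event.
  def track
    started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    result = yield
    @total_time += Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
    @count += 1
    result
  end

  # Share of the overall run time (in percent) taken by tracked events.
  def share_of(run_time)
    return 0.0 if run_time.zero?

    (total_time / run_time * 100).round(2)
  end
end

profiler = TinyEventProfiler.new
run_started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)

3.times do
  profiler.track { sleep 0.01 } # pretend this is FactoryBot.create
  sleep 0.001                   # pretend this is the rest of the example
end

run_time = Process.clock_gettime(Process::CLOCK_MONOTONIC) - run_started_at
puts "Events: #{profiler.count}, share: #{profiler.share_of(run_time)}%"
```

The exact percentage varies from run to run; the point is only that most of the "test" time in this toy run is spent inside the tracked block.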
The name of the game is “cascade”
From years of observations and working on TestProf and profiling pretty much anything test-related, one reason for slow tests stands out the most: factory cascades.
Let’s play a little game:
factory :comment do
  sequence(:body) { |n| "Awesome comment ##{n}" }
  author
  answer
end

factory :answer do
  sequence(:body) { |n| "Awesome answer ##{n}" }
  author
  question
end

factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }
  author
  account # suppose it's our tenant in SaaS application
end

factory :author do
  sequence(:name) { |n| "Awesome author ##{n}" }
  account
end

factory :account do
  sequence(:name) { |n| "Awesome account ##{n}" }
end
Now, try to guess how many records are created in the database once you call create(:comment)? Once you’ve got an answer in mind, read on.
- First, we generate a body for the comment. No records created yet, so our score is zero.
- Next, we need an author for the comment. The author should belong to an account; thus we create two records. Score: 2.
- Every comment needs a commentable object, right? In our case, it’s an answer. An answer itself needs an author with an account. That is two more records. Score: 2 + 2 = 4.
- The answer also needs a question, which has its own author with its own account. Furthermore, our :question factory also contains an account association. Counting the question itself, that is four more records. Score: 4 + 4 = 8.
- Now we can create the answer and, finally, the comment itself. Score: 8 + 2 = 10.
And that’s it! Creating a comment with create(:comment) yields ten database records.
Do we need multiple accounts and different authors to test a single comment? Unlikely.
You can imagine what happens when we create multiple comments, say, create_list(:comment, 10). Houston, we have a problem.
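If you'd rather let Ruby do the counting, the walk-through above can be sketched in a few lines. This is a plain-Ruby model of the factories from our little game, not FactoryBot itself; the `ASSOCIATIONS` hash and the `records_created` helper are names made up for this illustration.

```ruby
# Associations declared by each factory in the example above.
ASSOCIATIONS = {
  comment:  %i[author answer],
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

# Number of records created by a single create(name): the record itself
# plus everything its associations create, recursively.
def records_created(name)
  1 + ASSOCIATIONS.fetch(name).sum { |association| records_created(association) }
end

puts records_created(:comment)      # 10
puts records_created(:comment) * 10 # 100, i.e. create_list(:comment, 10)
```

Ten comments cost a hundred inserts here because every comment repeats the whole cascade from scratch.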
A factory cascade is an uncontrollable process in which excess data is generated through nested factory invocations.
We can represent a cascade as a tree:
comment
|
|-- author
|   |
|   |-- account
|
|-- answer
    |
    |-- author
    |   |
    |   |-- account
    |
    |-- question
    |   |
    |   |-- author
    |   |   |
    |   |   |-- account
    |   |
    |   |-- account
Let’s call this representation a factory tree. We are going to use it later in our analysis.
Fire walk with me: FactoryProf
EventProf only shows us the total time spent in factories, so we can tell that something is wrong. However, we still have no idea where to look, unless we dig through the code and play a guessing game. With another tool out of TestProf’s doctor bag, we don’t have to.
Meet yet another profiler: FactoryProf. You can run it like this:
$ FPROF=1 bundle exec rspec
[TEST PROF INFO] Factories usage
 total   top-level   name
  1298           2   account
  1275          69   city
   524         516   room
   551         549   user
   396         117   membership
524 examples, 0 failures
The resulting report (above) lists all factories and their usage statistics. How are our total and top-level results different? The total value is the number of times a factory has been used to generate a record either explicitly (through a create call), or implicitly, within another factory (through associations and callbacks); the top-level value only considers explicit calls.
So, a noticeable difference between the top-level and total values might indicate a factory cascade: it tells us that a factory is more often invoked from other factories than by itself.
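To see how the two columns relate, here is a small plain-Ruby sketch that derives total and top-level counts from a list of recorded factory invocations. It assumes each invocation chain is stored as an array whose first element is the factory called explicitly; `usage_report` and `SAMPLE_CHAINS` are hypothetical names, and this is not FactoryProf's actual code.

```ruby
# Each inner array is one chain of factory invocations: the first element
# was called explicitly, the rest were triggered by associations.
SAMPLE_CHAINS = [
  %i[comment author account answer author account question author account account],
  %i[author account],
  %i[account]
].freeze

# Build rows with total and top-level usage per factory, busiest first.
def usage_report(chains)
  top_level = chains.map(&:first).tally
  total = chains.flatten.tally

  total.map do |name, count|
    { name: name, total: count, top_level: top_level.fetch(name, 0) }
  end.sort_by { |row| -row[:total] }
end

usage_report(SAMPLE_CHAINS).each do |row|
  puts format("%5d %9d  %s", row[:total], row[:top_level], row[:name])
end
```

In this sample, account is created six times but requested explicitly only once, which is exactly the kind of gap that hints at a cascade.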
But how do we pinpoint those “other factories”? With the help of the factory trees discussed earlier! Let’s flatten our tree (using pre-order traversal) and call the resulting list a factory stack:
# factory stack built from the factory tree above
[:comment, :author, :account, :answer, :author, :account, :question, :author, :account, :account]
Here’s how a factory stack can be built programmatically:
- Every time FactoryBot.create(:thing) is called, a new stack is initialized (with :thing as the first element).
- Every time another factory is used within :thing, we push it to the stack.
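The two rules above can be sketched in plain Ruby. The recorder below is a toy model and not FactoryProf's implementation: `StackRecorder` and `FACTORY_DEFINITIONS` are hypothetical names, and "creating" a record here only means walking the declared associations.

```ruby
# Associations declared by the factories from the cascade example.
FACTORY_DEFINITIONS = {
  comment:  %i[author answer],
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

class StackRecorder
  attr_reader :stacks

  def initialize(definitions)
    @definitions = definitions
    @stacks = []
    @depth = 0
  end

  def create(name)
    @stacks << [] if @depth.zero? # top-level call: start a new stack
    @stacks.last << name          # nested or not, push the factory name
    @depth += 1
    @definitions.fetch(name).each { |association| create(association) }
    @depth -= 1
    name
  end
end

recorder = StackRecorder.new(FACTORY_DEFINITIONS)
recorder.create(:comment)
recorder.create(:author)
p recorder.stacks
```

The first recorded stack matches the flattened factory tree of the comment, and the second one is simply `[:author, :account]`.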
Why are stacks cool? Because exactly as with call stacks, we can draw flame graphs! And what is cooler than a flame graph?!
FactoryProf knows how to generate interactive HTML flame graph reports out of the box. Here’s another command line invocation:
$ FPROF=flamegraph bundle exec rspec
...
[TEST PROF INFO] FactoryFlame report generated: tmp/test_prof/factory-flame.html
Open the FactoryFlame report in your browser to see something like this:
How do we read this?
Every column represents a factory stack. The wider the column, the more times this stack occurred in the test suite. The root cell shows the total number of top-level create calls.
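If it helps to think of the report as data, here is a rough plain-Ruby sketch of how identical stacks could be grouped into columns. It is an assumption-laden illustration rather than FactoryFlame's real code: `flame_columns` and `RECORDED_STACKS` are invented names, and we treat the stack length as the column height.

```ruby
RECORDED_STACKS = [
  %i[user account],
  %i[room city account],
  %i[user account],
  %i[account],
  %i[user account]
].freeze

# Group identical stacks: the number of occurrences becomes the column
# width, and the stack length becomes its height.
def flame_columns(stacks)
  stacks.tally
        .map { |stack, count| { stack: stack, width: count, height: stack.size } }
        .sort_by { |column| -column[:width] }
end

flame_columns(RECORDED_STACKS).each do |column|
  puts "#{'#' * column[:width]} #{column[:stack].join(' > ')} (height #{column[:height]})"
end
puts "root: #{RECORDED_STACKS.size} top-level create calls"
```

Tall columns are the deep cascades, and wide ones are the stacks that repeat the most, so a tall and wide column is usually the first thing worth fixing.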
If your FactoryFlame report looks like a photo of the New York City skyline, that means you have a lot of factory cascades (where each “skyscraper” represents a cascade):
Though this image can be a joy to behold in some respects, this is really not how the ideal cascade-less report should look. Instead, you should aim for something flatter, like the Dutch countryside:
Doctor, am I going to live?
Knowing how to find cascades is not enough—we also need to eliminate them. Let’s consider several techniques for doing that.
Explicit associations
The first thing that comes to mind is to remove all (or almost all) associations from our factories:
factory :comment do
  sequence(:body) { |n| "Awesome comment ##{n}" }
  # do not declare associations
  # author
  # answer
end
With this approach, you have to explicitly specify all required associations when using a factory:
create(:comment, author: user, answer: answer)
# But!
create(:comment) # => raises ActiveRecord::RecordInvalid
One may ask: aren’t we using factories precisely to avoid specifying all the required arguments every time? Yes, we are. With this approach, factories become faster, but also less useful.
Association inference
Sometimes (usually when dealing with denormalization) it’s possible to infer associations from other ones:
factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }
  author
  account do
    # infer account from author
    author&.account
  end
end
Now we can write create(:question) or create(:question, author: user) and do not need to create a separate account.
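The same inference idea can be shown without any factory library at all. The sketch below uses plain Structs and a hypothetical `build_question` helper (neither comes from FactoryBot nor TestProf) to show an account being derived from the author unless one is passed explicitly.

```ruby
Account  = Struct.new(:name)
Author   = Struct.new(:name, :account)
Question = Struct.new(:title, :author, :account)

# Build a question, deriving the account from the author unless one is given.
def build_question(title:, author: nil, account: nil)
  author ||= Author.new("Awesome author", account || Account.new("Awesome account"))
  account ||= author.account
  Question.new(title, author, account)
end

user = Author.new("Jane", Account.new("Acme"))
question = build_question(title: "Awesome question", author: user)
puts question.account.equal?(user.account) # true: no extra account was built
```

Only one account object exists after this call, instead of one for the author and another one for the question.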
We can also use lifecycle callbacks:
factory :question do
  sequence(:title) { |n| "Awesome question ##{n}" }

  transient do
    author { :undef }
    account { :undef }
  end

  after(:build) do |question, evaluator|
    # if only author is specified, set account to author's account
    question.account ||= evaluator.author.account unless evaluator.author == :undef
    # if only account is specified, set author to account's owner
    question.author ||= evaluator.account.owner unless evaluator.account == :undef
  end
end
This approach can be very efficient but requires a lot of refactoring (and, frankly speaking, makes factories less readable).
FactoryDefault
TestProf provides yet another way to eliminate cascades: FactoryDefault. It’s an extension for factory libraries, such as FactoryBot or Fabrication, that enables a more succinct and less error-prone DSL for creating defaults with associations by allowing you to implicitly re-use records inside factories. Consider this example:
describe 'PATCH #update' do
  let!(:account) { create_default(:account) }
  let!(:author) { create_default(:author) } # implicitly uses account defined above
  let!(:question) { create_default(:question) } # implicitly uses account and author defined above
  let(:answer) { create(:answer) } # implicitly uses question and author defined above
  let(:another_question) { create(:question) } # uses the same account and author
  let(:another_answer) { create(:answer) } # uses the same question and author
  # ...
end
The main advantage of this approach is that you don’t have to modify your factories. All you need is to replace some create(…) calls with create_default(…) in your tests.
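To get a feel for what re-using defaults saves, here is a toy plain-Ruby model of the idea. It is not how FactoryDefault is implemented: `ToyFactory` and `TOY_SCHEMA` are hypothetical names, and a "record" is just a hash. The only rule it encodes is that an association is created only when no default has been registered for that factory.

```ruby
# Toy model of default re-use: NOT TestProf's implementation.
TOY_SCHEMA = {
  answer:   %i[author question],
  question: %i[author account],
  author:   %i[account],
  account:  []
}.freeze

class ToyFactory
  attr_reader :created

  def initialize(schema)
    @schema = schema
    @defaults = {}
    @created = [] # names of every record "inserted", in order
  end

  # Create a record; associations reuse a registered default when one exists.
  def create(name)
    parents = @schema.fetch(name).to_h do |association|
      [association, @defaults.fetch(association) { create(association) }]
    end
    @created << name
    { factory: name, **parents }
  end

  # Create a record and register it as the default for its factory.
  def create_default(name)
    @defaults[name] = create(name)
  end
end

plain = ToyFactory.new(TOY_SCHEMA)
plain.create(:answer)
puts plain.created.size # 7 records for a single answer

with_defaults = ToyFactory.new(TOY_SCHEMA)
with_defaults.create_default(:account)
with_defaults.create_default(:author)
with_defaults.create_default(:question)
2.times { with_defaults.create(:answer) }
puts with_defaults.created.size # 5 records for two answers
```

In this model, every further answer costs exactly one record once the three defaults are in place.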
FactoryDefault also comes with a profiler that can help you to identify potential associated factory records (so you do not refactor blindly):
$ FACTORY_DEFAULT_PROF=1 bin/rspec
...
[TEST PROF INFO] Factory associations usage:
factory        count    total time
user              17     00:12.010
user[admin]       15     00:11.560
Total associations created: 33
Total uniq associations created: 3
Total time spent: 01:13.775
In the report, you can see which factories have been used to create associated records (and, thus, potentially, can be re-used via FactoryDefault).
FactoryDefault introduces a bit of magic to your tests, so use it with caution, as tests should stay as human-readable as possible. It’s a good idea to just use defaults for top-level entities (such as tenants in multi-tenancy apps) and keep the default definitions at the top of the test file.
Bonus: AnyFixture
So far, we’ve only talked about factory cascades. What else could we learn from TestProf reports? Let’s take a look at the FactoryProf report again:
[TEST PROF INFO] Factories usage
 total   top-level   name
  1298           2   account
  1275          69   city
   524         516   room
   551         549   user
524 examples, 0 failures
Note that the room and user factories are used about as many times as there are tests. Thus, it is likely that we need both in every example. What about creating those records once, for all tests? To do that, we can use fixtures!
Since we already have factories, it would be great to re-use them to generate fixtures. So, here comes AnyFixture.
You can use any block of code for data generation, and AnyFixture takes care of cleaning out the database at the end of the run.
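The core trick, building a record once and handing out the same one afterwards, can be sketched as a tiny in-memory registry. This is only an illustration of the caching idea and not AnyFixture's implementation: `ToyFixtures` is a hypothetical module, and its `reset` only clears memory, whereas AnyFixture cleans up the database.

```ruby
# Toy fixture registry: NOT TestProf's implementation.
module ToyFixtures
  @store = {}

  class << self
    # With a block: build the record once and remember it.
    # Without a block: return the remembered record.
    def fixture(name, &block)
      return @store.fetch(name) unless block

      @store[name] ||= block.call
    end

    # Forget all remembered records.
    def reset
      @store.clear
    end
  end
end

builds = 0
user = ToyFixtures.fixture(:user) { builds += 1; { name: "Jane" } }
ToyFixtures.fixture(:user) { builds += 1; { name: "John" } } # block is not called again

puts builds                                   # 1
puts ToyFixtures.fixture(:user).equal?(user)  # true
```

However many examples ask for the user, the expensive block runs only once per test run.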
AnyFixture works perfectly with RSpec’s shared contexts:
# Activate AnyFixture DSL (fixture) through refinements
using TestProf::AnyFixture::DSL

RSpec.shared_context "shared:user" do
  let(:room) { fixture(:room) }
  let(:user) { fixture(:user) }
end

RSpec.configure do |config|
  # We recommend creating fixtures once at startup
  config.before(:suite) do
    fixture(:room) { create(:room) }
    fixture(:user) { create(:user, room: fixture(:room)) }
  end

  config.include_context "shared:user", user: true
end
And then, activate this shared context as follows:
describe CitiesController, :user do
  before { sign_in user }
  # ...
end
With AnyFixture enabled, the FactoryProf report could look like this:
 total   top-level   name
  1298           2   account
  1275          69   city
     8           1   room
     2           1   user
524 examples, 0 failures
Looks good, doesn’t it?
There’s no need to choose between factories and fixtures—use both!
Let’s wrap up with a few more notes.
Factories bring simplicity and flexibility to your test data generation, but they are very fragile: cascades can come out of nowhere, and repetitive creation can consume too much time.
Take care of your factories and take them to a doctor (TestProf) regularly. Make tests faster and developers happier!
Read the TestProf introduction to learn more about the motivation behind the project and other use cases.
Thank you for reading!