Let there be docs! Generating an OpenAPI schema across the Rails stack


When does an implementation-first approach to documentation beat documentation-first? Read on to find out, then see how to apply it to an existing application by leveraging some tools in unexpected ways (including Martian ones!). Plus, AI-assisted migration tips.

In API development, documentation is often treated as an afterthought—something tackled once the code’s written and tested. But there’s a growing movement advocating for a documentation-first approach, where API specifications are crafted before a single line of code.

The other post in this series presented compelling arguments for this, but I’d like to share our experience with the opposing approach: using existing implementation to generate documentation while maintaining the benefits of schema validation—and making sure that it reflects the reality of your implementation.

Other parts:

  1. Let there be docs! A documentation-first approach to Rails API development
  2. Let there be docs! Generating an OpenAPI schema across the Rails stack

Documentation-first vs. implementation-first

So which approach is best for you? Both are valid! But frankly, the choice depends on your project’s priorities and constraints.

Big companies with multiple teams and external consumers might prefer the docs-first way, but an implementation-first workflow has one big advantage for smaller companies: your documentation reflects what has actually been built, without making you slow down to design the API beforehand.

This is valuable for startups as they must often move fast and can also afford to break API compatibility while they’re still looking for product-market fit.

Still, let’s break it down a bit more:

Documentation-first is preferable if:

  • API stability is a top priority (more important than delivery speed).
  • There are external consumers you’ll need to coordinate with.
  • You want a clear specification before implementation begins, for example, because multiple teams will coordinate on the API contract.

An implementation-first approach works well when:

  • Faster iteration is a top priority (more important than API stability).
  • You can afford breaking changes and can easily inform consumers.
  • You have a small team and can communicate changes easily.
  • You can negotiate API changes directly with consumers.

Reflecting implementation realities in documentation

Let’s examine this with Rails. Most Rails API projects follow a familiar pattern:

  1. Creating models and database structures to store data
  2. Building controllers to handle requests
  3. Implementing serializers to format responses
  4. Writing tests to validate behavior
  5. Finally, documenting the API (hopefully, anyway!)

The challenge is avoiding repetition of implementation details (like response fields and their types) in the documentation. Every duplicated detail is a chance for the docs to drift out of sync with the actual implementation.

But here’s the good news: we’re on Ruby on Rails! That gives us introspection into every corner of the tech stack, from controllers to the database, so the documentation generation process can stay DRY and far less error-prone. Let’s do it!

Choosing tooling

There are a bunch of tools for generating OpenAPI specs, and we did the research. The following didn’t fit the bill, but it’s worth mentioning why:

  • swagger-blocks is a DSL for generating OpenAPI documentation from Rails controllers. Unfortunately, it has limited support for the actual OpenAPI 3 specification and seems to have been abandoned.
  • apipie-rails is another DSL for controllers. It is maintained, but it supports only Swagger 2.0, not OpenAPI 3.

Then, we have these:

  • rspec-openapi is a gem that generates OpenAPI documentation from RSpec tests without any extra DSL.
  • rswag is a popular and feature-rich gem for generating OpenAPI documentation from RSpec tests. It uses custom syntax on top of request specs to describe API endpoints.

While both rswag and rspec-openapi are great tools that would cover our needs, we ultimately chose rswag for its expressive syntax and its ability to exercise greater control over the generated OpenAPI schema.

Inside RSwag

A typical RSwag request spec looks like this:

# spec/requests/api/v1/foos_spec.rb
RSpec.describe "/api/v1/foos", openapi_spec: "v1/schema.yml" do
  path "/v1/foos/{id}" do
    get "Get a Foo" do
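      # The {id} placeholder in the path needs a matching path parameter
      parameter name: :id, in: :path, schema: { type: :integer }, required: true, description: "The ID of the Foo to retrieve"

      response "200", "A successful response" do
        schema type: :object,
               properties: {
                 id: { type: :integer },
                 full_name: { type: :string },
                 bar: { type: :object, properties: {}}
               },
               required: [ 'id', 'full_name' ]

        # Assumes a Foo model with first_name and last_name columns
        let(:id) { Foo.create!(first_name: "Jane", last_name: "Doe").id }

        run_test!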
      end
    end
  end
end

The only unsatisfactory part here is the manual definition of the response schema; it’s error-prone and can easily become out of sync with the actual implementation. Not good!

Enter the Typelizer Gem!

Alas… if only we could retrieve information about API fields and their types from our serializers to automatically generate OpenAPI documentation.

But wait, there is a gem that generates TypeScript typings from serializers. And it can also be used to do exactly what we’re looking for! (Although its author didn’t plan it to be used like this.)

Typelizer was created by Svyatoslav Kryukov, a Martian engineer, and it generates TypeScript type definitions from Ruby serializers. It works with popular serialization libraries like ActiveModel::Serializer and Alba.

Here’s how a typical serializer looks with Typelizer annotations:

 class FooSerializer < ActiveModel::Serializer
   include Typelizer::DSL

   attribute :id # Will be inferred from the model Foo

+  typelize :string
   attribute :full_name do
     object.first_name + ' ' + object.last_name
   end

   has_one :bar, serializer: BarSerializer
 end

From Serializers to OpenAPI Schema

The key insight here is that the type information already present in our serializers can be converted into OpenAPI schema components. Doing so involves the following steps.

Step 1: Add annotations to serializers

We enhanced our existing serializers with type information using Typelizer’s DSL. This doesn’t change how they work, but enriches them with type metadata.

This only needs to be done for custom methods, as Typelizer automatically infers the type of fields that directly use data from database table columns. In practice, there will probably be only a handful of places where you need to add type information.
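To make the idea of inference concrete, here is a minimal sketch of how column metadata could be turned into field types. It is not Typelizer’s implementation: the `Column` struct stands in for the objects ActiveRecord returns from a model’s `columns`, and the `COLUMN_TYPES` table and `infer_field` helper are hypothetical names invented for this illustration.

```ruby
# Stand-in for the per-column metadata ActiveRecord exposes
# (name, type, null and comment); hypothetical, so no database is needed.
Column = Struct.new(:name, :type, :null, :comment, keyword_init: true)

# Illustrative mapping only; an assumption, not Typelizer's actual table.
COLUMN_TYPES = {
  integer: "integer", bigint: "integer",
  float: "number", decimal: "number",
  boolean: "boolean",
  string: "string", text: "string", datetime: "string"
}.freeze

# Build a schema-like description of one field from its column metadata.
def infer_field(column)
  field = { type: COLUMN_TYPES.fetch(column.type, "string") }
  field[:nullable] = true if column.null
  field[:description] = column.comment if column.comment
  field
end

columns = [
  Column.new(name: "id", type: :bigint, null: false),
  Column.new(name: "first_name", type: :string, null: false, comment: "Given name"),
  Column.new(name: "archived_at", type: :datetime, null: true)
]

INFERRED = columns.to_h { |column| [column.name, infer_field(column)] }
```

A computed attribute such as `full_name` has no column behind it, so nothing like `infer_field` can describe it. That is the gap the explicit annotations fill.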

Step 2: Define an RSwag schema template

Here we define RSwag’s global metadata (its documentation demonstrates this well if you want to read more), but instead of writing the schema definitions for our “models” manually, we insert some Typelizer magic to generate components from the serializers:

 # spec/swagger_helper.rb
 RSpec.configure do |config|
   config.openapi_specs = {
     "v1/schema.yml" => { # must match the openapi_spec: value used in the request specs
       openapi: "3.1.0",
       paths: {}, # RSwag will fill this in
       components: {
-        schemas: {} # hand-written definitions would go here
+        schemas: Typelizer::Generator.new.interfaces.to_h do |interface|
+          [
+            interface.name,
+            # Magic is here, see the next step
+          ]
+        end
       }
     }
   }
 end

Step 3: Convert Typelizer data to OpenAPI

Here’s the magic part, applied to every Typelizer interface. Take a look, and then we’ll unpack it below:

{
  type: :object,
  properties: interface.properties.to_h do |property|
    definition = case property.type
                 when Typelizer::Interface
                   { :$ref => "#/components/schemas/#{property.type.name}" }
                 else
                   { type: property.type.to_s }
                 end

    definition[:nullable] = true if property.nullable
    definition[:description] = property.comment if property.comment
    definition[:enum] = property.enum if property.enum
    definition = { type: :array, items: definition } if property.multi
    [
      property.name,
      definition
    ]
  end,
  required: interface.properties.reject(&:optional).map(&:name)
}

Above, we leverage Rails’ introspection capabilities through Typelizer to get the type information from across the stack and convert it into OpenAPI schema components:

  • We use ActiveRecord to access the underlying database schema and learn the type of each field (number, string, Boolean, and so on), as well as whether it’s nullable or an array.
  • If a database column has an associated comment in the schema, we use it as the description for the field in the OpenAPI schema.
  • We read the enums defined in the model to learn the possible values of enumerated fields.
  • We use Typelizer to understand the types of complex fields in serializers and generate OpenAPI schema components from them.
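To try the conversion without a Rails app, the sketch below runs the same logic against stand-in objects. The `Property` and `Interface` structs are hypothetical substitutes for what Typelizer provides (only the fields used in the conversion are modeled), and `openapi_property` and `openapi_schema` are names made up for this example.

```ruby
# Hypothetical stand-ins for Typelizer's property and interface objects.
Property  = Struct.new(:name, :type, :optional, :nullable, :multi, :comment, :enum, keyword_init: true)
Interface = Struct.new(:name, :properties, keyword_init: true)

# Convert one property into an OpenAPI property definition.
def openapi_property(property)
  definition =
    if property.type.is_a?(Interface)
      # Nested serializers become references to other components.
      { "$ref" => "#/components/schemas/#{property.type.name}" }
    else
      { type: property.type.to_s }
    end

  definition[:nullable] = true if property.nullable
  definition[:description] = property.comment if property.comment
  definition[:enum] = property.enum if property.enum
  # Collections wrap the item definition in an array schema.
  definition = { type: :array, items: definition } if property.multi
  definition
end

# Convert a whole interface into an OpenAPI object schema.
def openapi_schema(interface)
  {
    type: :object,
    properties: interface.properties.to_h { |property| [property.name, openapi_property(property)] },
    required: interface.properties.reject(&:optional).map(&:name)
  }
end

bar = Interface.new(name: "Bar", properties: [])
foo = Interface.new(
  name: "Foo",
  properties: [
    Property.new(name: "id", type: :number),
    Property.new(name: "status", type: :string, enum: %w[draft published], comment: "Publication state"),
    Property.new(name: "nickname", type: :string, nullable: true, optional: true),
    Property.new(name: "bars", type: bar, multi: true)
  ]
)

FOO_SCHEMA = openapi_schema(foo)
```

Here `nickname` is left out of `required` because it is marked optional, while `bars` becomes an array whose items reference the `Bar` component.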

Step 4: Write RSwag specs as usual

With that, we mostly carry on as expected and described in the RSwag documentation, but we do want to replace the manual schema definitions with references to the generated schema components using $ref:

 # spec/requests/api/v1/foos_spec.rb
 require "swagger_helper"

 RSpec.describe "/api/v1/foos", openapi_spec: "v1/schema.yml" do
   path "/v1/foos" do
     get "List Foos" do
       produces "application/json"
       description "Returns a collection of foos"

       parameter name: :page, in: :query, schema: { type: :integer, default: 1 }, required: false, description: "The page number to retrieve"

       response "200", "A successful response" do
-        schema type: :object,
-               properties: {
-                 id: { type: :integer },
-                 full_name: { type: :string },
-                 bar: { type: :object, properties: {}}
-               },
-               required: [ 'id', 'full_name' ]
+        schema type: :array, items: { "$ref" => "#/components/schemas/Foo" }

         run_test!
       end
     end
   end
 end

The result: an auto-generated OpenAPI schema

Running one simple command, bundle exec rails rswag, we get a complete OpenAPI specification that perfectly matches our implementation:

# This is generated by RSwag
paths:
  /v1/foos:
    get:
      responses:
        '200':
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Foo'
# This part is generated by Typelizer
components:
  schemas:
    Foo:
      type: object
      properties:
        id:
          type: integer
        full_name:
          type: string
        bar:
          $ref: '#/components/schemas/Bar'

AI-assisted migration

Here’s a fun little diversion: when migrating an existing codebase using this approach, we found that leveraging AI (in particular, Claude) significantly sped up the process of rewriting tests in order to follow the RSwag format.

  • To start, rewrite one typical test using RSwag by hand.
  • Then attach it, along with the source test file, to the AI context and describe which file is the source and which is the result. If you have the original handcrafted schema file, include it too.
  • In the following messages, ask the AI to rewrite the test using RSwag and generate the OpenAPI schema from the serializer.
  • If you have a lot of tests, you can ask the AI to rewrite them in batches, but be careful with the context size limit.
  • Manually check each generated test for correctness and completeness. Make sure all test variants were converted (sometimes the AI “simplified” a test by “forgetting” to carry some RSpec contexts over from the source).

Though we had to manually check and fix the generated specs, the AI-assisted migration saved us a lot of time and effort, and this helped us shift from regular controller tests to documentation-generating RSwag tests without changing our implementation.

Maintaining schema integrity

To make sure the schema stays up-to-date with implementation changes, we added three key validation steps to our CI/CD pipeline:

1. Validating the schema

Use Spectral to validate the OpenAPI schema by adding the following check to your GitHub Actions workflow:

- uses: stoplightio/spectral-action@latest
  with:
    file_glob: 'openapi/**/schema.yml'
    spectral_ruleset: 'openapi/.spectral.yml'
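Spectral does the heavy lifting, but one cheap property can also be checked locally: every local `$ref` should point at a component that actually exists. Below is a rough sketch using only Ruby’s standard library. The `collect_refs` and `dangling_refs` helpers are hypothetical, JSON Pointer escaping (`~0`, `~1`) is ignored for brevity, and the inline YAML is a trimmed copy of the example schema with `Bar` deliberately left out.

```ruby
require "yaml"

# Recursively collect every "$ref" string found in a parsed schema.
def collect_refs(node, refs = [])
  case node
  when Hash
    node.each do |key, value|
      if key == "$ref" && value.is_a?(String)
        refs << value
      else
        collect_refs(value, refs)
      end
    end
  when Array
    node.each { |item| collect_refs(item, refs) }
  end
  refs
end

# Return the local refs ("#/...") that do not resolve inside the document.
def dangling_refs(document)
  local_refs = collect_refs(document).uniq.select { |ref| ref.start_with?("#/") }
  local_refs.reject do |ref|
    keys = ref.delete_prefix("#/").split("/")
    # Walk the document key by key; nil means the target is missing.
    keys.reduce(document) { |node, key| node.is_a?(Hash) ? node[key] : nil }
  end
end

SAMPLE = YAML.safe_load(<<~SCHEMA)
  paths:
    /v1/foos:
      get:
        responses:
          '200':
            content:
              application/json:
                schema:
                  type: array
                  items:
                    $ref: '#/components/schemas/Foo'
  components:
    schemas:
      Foo:
        type: object
        properties:
          bar:
            $ref: '#/components/schemas/Bar'
SCHEMA

DANGLING = dangling_refs(SAMPLE)
```

In this sample, `DANGLING` contains only the reference to `Bar`, since `Foo` is defined and `Bar` is not.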

2. Regenerating the schema in CI and checking for changes

To verify that a developer didn’t forget to regenerate the schema, add the following check to your GitHub Actions workflow:

- name: Re-generate OpenAPI spec and check if it is up-to-date
  run: |
    bundle exec rails rswag
    if [[ -n $(git status --porcelain openapi/) ]]; then
      echo "::error::OpenAPI documentation is out of date. Please run 'rails rswag' locally and commit the changes."
      git status
      git diff
      exit 1
    fi

3. Detecting breaking changes

With our schema under version control, we can easily generate a changelog using tools like oasdiff:

docker run --rm -t -v $(pwd):/specs:ro tufin/oasdiff changelog \
  /specs/old_schema.yml /specs/new_schema.yml -f html > oas_diff.html

Execute oasdiff breaking with --fail-on ERR in your CI/CD pipeline to fail the build if there are any breaking changes. This will help you catch unexpected breaking changes before they reach production. See oasdiff documentation for breaking changes for more details.
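To give a feel for what “breaking” means on the response side, here is a deliberately tiny sketch. It is not how oasdiff works and covers only two cases (a removed property and a changed type). The `breaking_changes` helper and the sample data are hypothetical, and the hashes are assumed to be shaped like the `components.schemas` section of a parsed schema file.

```ruby
# Hypothetical helper: list response-side changes that could break consumers.
def breaking_changes(old_schemas, new_schemas)
  old_schemas.flat_map do |name, old_schema|
    new_schema = new_schemas[name]
    next ["#{name}: schema removed"] if new_schema.nil?

    old_props = old_schema.fetch("properties", {})
    new_props = new_schema.fetch("properties", {})

    # filter_map drops the nil results for properties that did not change.
    old_props.filter_map do |prop, old_def|
      new_def = new_props[prop]
      if new_def.nil?
        "#{name}.#{prop}: property removed"
      elsif old_def["type"] != new_def["type"]
        "#{name}.#{prop}: type changed from #{old_def["type"].inspect} to #{new_def["type"].inspect}"
      end
    end
  end
end

old_schemas = {
  "Foo" => {
    "properties" => {
      "id" => { "type" => "integer" },
      "full_name" => { "type" => "string" }
    }
  }
}

new_schemas = {
  "Foo" => {
    "properties" => {
      "id" => { "type" => "string" }
    }
  }
}

CHANGES = breaking_changes(old_schemas, new_schemas)
```

Added properties are intentionally not reported, since consumers that ignore unknown fields keep working. A real tool also has to consider requests, parameters, status codes, and much more.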

Did we use this approach in production?

Yes we did, and for Whop.com!

Whop is a rapidly growing social commerce platform, so development speed is their top priority, and the ability to generate an OpenAPI schema from serializers has provided some major pain relief for them.

To elaborate a bit, we migrated their main application from a hand-crafted OpenAPI schema to an implementation-first approach as described in this post.

The result? A significant reduction in time spent on documentation and a much more accurate and up-to-date API specification.

Pull request with OpenAPI schema generation for Whop.com: stats header

Let there be docs!

Documentation-first? Implementation-first? Or hybrid?

In our case, by leveraging the type information already present in serializers, we’ve really found a pragmatic middle ground that gives us the benefits of documentation-first development while maintaining a close connection to the actual implementation.

This approach keeps your docs accurate, your API well-typed, and your development workflow efficient!
