Apollo launch: Building a migration architecture for 2U
The microservice stitching gateway at 2U was designed to empower distributed development across multiple standalone engineering teams, but as it grew more complex and less efficient, it started to hold those teams back. Martians helped the company's services declare independence again by migrating the platform to Apollo GraphQL Federation.
2U, Inc. is a global leader in education technology and a trusted partner and brand steward of great universities. 2U builds, delivers, and supports more than 400 digital and in-person educational offerings, including graduate degrees, professional certificates, Trilogy-powered boot camps, and GetSmarter short courses. Together with their partners, 2U has positively transformed the lives of more than 225,000 students and lifelong learners.
Stitching kit
The 2U team defines their identity as “a diverse collection of innovators, dreamers, and doers all working to improve lives through higher education technology.” This authentic and vibrant culture of diversity extends to both the organizational and the technical level.
Basically, 2U is a massive distributed platform with many advanced features, deployed to the Amazon Web Services (AWS) cloud. The query language that powers the platform is, of course, GraphQL, since this schema-based technology simplifies and speeds up engineering processes. This factor is crucial for 2U, as their development is driven by multiple self-organized engineering teams. They work on different microservices and juggle multiple languages, from JavaScript and Python to Ruby, Elixir, and Clojure, whichever best fits the task at hand.
Seamless service integration is king in this situation. It was initially built on a single unified GraphQL API, powered by schema stitching and deployed to a cloud cluster (the team is planning to migrate to AWS Elastic Kubernetes Service). But something went wrong.
Perfect for distributed services at the start, this technological choice gradually lost its edge as keeping the stitching gateway in good shape took more and more effort. Almost any change in an underlying service required a matching gateway update. None of this helped with separation of concerns, and it eventually turned the distributed services into the notorious Monolith.
Apollo-gizing
After Apollo GraphQL Federation was announced, 2U decided to adopt it and move away from the monumental central planner to the bright distributed future. This cutting-edge technology uses a declarative programming model to coordinate multiple GraphQL services, granting them independence from the single gateway. Each team can now develop its own service and domain logic without investigating the neighboring services and their bottlenecks.
There was just one little thing left: turn a stitched GraphQL monolith into an Apollo Federation, cover it fully with tests, and experience no failures during the migration. Big things have small beginnings. The 2U team asked Martians to plan the first-ever service to be federalized, with a one-month deadline and strict performance requirements: less than 10ms response time at the 95th percentile and a low memory footprint under thousands of requests per minute. To do it in a genuinely federated manner, we chose Node.js, since Apollo Federation is primarily a technology based on the JavaScript stack.
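To make the latency requirement concrete, here is a minimal sketch of how a "p95 under 10ms" check can be computed from a set of measured response times. It uses the nearest-rank definition of a percentile, which is one of several common definitions and is an assumption here; the function names and the default budget parameter are hypothetical and do not come from the 2U codebase.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it.
function percentile(samplesMs: number[], p: number): number {
  if (samplesMs.length === 0) throw new Error("no samples");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = samplesMs.slice().sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Hypothetical budget check mirroring the requirement in the text:
// the 95th percentile must be strictly below the budget (10ms by default).
function meetsLatencyBudget(samplesMs: number[], budgetMs: number = 10): boolean {
  return percentile(samplesMs, 95) < budgetMs;
}
```

With 20 samples, the nearest rank for the 95th percentile is 19, so a single slow outlier still passes the budget, while two slow responses out of 20 do not.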
In four weeks, Martians implemented the very first federalized service and immediately faced a new challenge: this service required a fully functioning federated environment to work, but all we had was the outdated stitched gateway. The situation called for a flawless architecture to create a federalized landscape.
The straightforward approach is to add Apollo Federation support to existing services one by one, moving stitched logic away from the gateway and into the services themselves. With reams of services and an extremely sophisticated gateway, this could take too long and seemed too stressful even as a first move. Moreover, this approach entailed complex orchestration of teams and maintenance downtime, which was unrealistic. Martians immersed themselves in the search for a solution, talking to teams and scrutinizing the codebase. Eventually, we designed a practical plan and a functional architecture for the whole migration that would also cost less to maintain.
Before: Single GraphQL schema with schema stitching. Consumers query Stitching Gateway directly.
In tandem with 2U’s Data Infrastructure team, we started by turning the stitched gateway itself into the first federated service to live behind the unified Federated Gateway. From that moment on, each service could be independently extracted from the stitched gateway and added to the Federation according to internal priorities and demands.
After: Stitching Gateway is put behind Apollo Federation Gateway. Consumers query Apollo Federation Gateway.
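The incremental idea behind this architecture can be sketched in a few lines. The model below is deliberately simplified and is not Apollo's API: real Federation composes schemas and builds query plans across types and entities, while this sketch only tracks which service answers each root field. The service names and field names are hypothetical.

```typescript
type ServiceName = string;

// Hypothetical name for the old stitched gateway once it sits behind the
// federation gateway as just another service.
const LEGACY: ServiceName = "stitching-gateway";

interface FederationPlan {
  // Root field name mapped to the service that currently resolves it.
  owners: Map<string, ServiceName>;
}

// Step one: everything is still served by the stitched gateway.
function initialPlan(rootFields: string[]): FederationPlan {
  const owners = new Map<string, ServiceName>();
  for (const field of rootFields) owners.set(field, LEGACY);
  return { owners };
}

// Later steps: move a set of fields to a newly federated service.
// Returns a new plan and leaves the previous one untouched.
function extract(plan: FederationPlan, service: ServiceName, fields: string[]): FederationPlan {
  const owners = new Map(plan.owners);
  for (const field of fields) {
    if (owners.get(field) !== LEGACY) {
      throw new Error(`${field} is not served by the legacy gateway`);
    }
    owners.set(field, service);
  }
  return { owners };
}

// Which service should answer a given root field right now?
function route(plan: FederationPlan, field: string): ServiceName {
  const owner = plan.owners.get(field);
  if (owner === undefined) throw new Error(`unknown field: ${field}`);
  return owner;
}

// Fields still waiting to be extracted; the migration is done when empty.
function remainingLegacyFields(plan: FederationPlan): string[] {
  const fields: string[] = [];
  plan.owners.forEach((owner, field) => {
    if (owner === LEGACY) fields.push(field);
  });
  return fields;
}
```

Because consumers only ever talk to the gateway in front, each call to `extract` changes where a field is resolved without changing what consumers query, which is what lets teams migrate at their own pace.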
More things to do
Then, we federalized a few more services to confirm we had enough migration experience to write documentation on how to join services to the Federation smoothly. We supported other teams and communicated with them to make sure they clearly understood how to federalize their services and move them off the monolith platform with zero downtime. We also improved the default service template to make the development of brand-new services easier.
Finally, we designed and implemented authorization patterns for a distributed graph. For this, we built a new GraphQL tool for authorization policies that inherited the concepts of Action Policy, our open source solution for Rails and Ruby applications, but was specially tailored for services running on the original Apollo Server. We are going to make it open source too, so stay tuned for the release.
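The tool itself is not public yet, so its API is unknown; the following is only a rough sketch of the general policy-object idea that Action Policy popularized, transplanted into a resolver-style function. Every type, rule, and function name below is hypothetical and should not be read as the actual tool's interface.

```typescript
interface User {
  id: string;
  roles: string[];
}

interface Course {
  id: string;
  ownerId: string;
  title: string;
}

// A rule answers one question: may this user perform this action on this record?
type Rule<T> = (user: User | null, record: T) => boolean;
type Policy<T> = { [rule: string]: Rule<T> };

// Hypothetical policy: anyone signed in may view a course,
// but only its owner or an admin may update it.
const coursePolicy: Policy<Course> = {
  show: (user) => user !== null,
  update: (user, course) =>
    user !== null &&
    (user.roles.some((role) => role === "admin") || user.id === course.ownerId),
};

function allowed<T>(policy: Policy<T>, rule: string, user: User | null, record: T): boolean {
  const check = policy[rule];
  if (check === undefined) throw new Error(`Unknown rule: ${rule}`);
  return check(user, record);
}

// Throws instead of returning false, so a resolver can stay linear.
function authorize<T>(policy: Policy<T>, rule: string, user: User | null, record: T): void {
  if (!allowed(policy, rule, user, record)) {
    throw new Error(`Not authorized: ${rule}`);
  }
}

// Resolver-style usage: the authorization decision lives in the policy,
// not in the business logic of the mutation.
function renameCourse(user: User | null, course: Course, title: string): Course {
  authorize(coursePolicy, "update", user, course);
  return { ...course, title };
}
```

Keeping rules in a separate policy object means each federated service can own the authorization logic for its own part of the graph, rather than relying on one central gateway to enforce it.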
Evil Martians have been instrumental in guiding many of 2U’s Academic-Product teams through a difficult transition to new technology and paradigms as we migrated from a schema-stitched implementation to Apollo’s Federation. Through technical expertise both in system-design and implementation, we were able to achieve our goals of freeing up our different teams to work more independently.
Michael Schechter
Director of Engineering at 2U
Results
Now, 2U has it all: the platform was safely migrated to the Federation without a full rewrite of the GraphQL logic or any code freezes. The team now has all the means to keep enhancing it further, step by step.
The migration we performed was a challenging and impactful technical task that will benefit the entire business for years to come. And we had all the support from 2U we could dream of, including reviews, discussions, and active contributions to the migration process.
The last question to ask is how Martians can help your project architect, federalize, and enhance its microservices environment, with GraphQL or otherwise. Please give us a hint.