Merging Open Core and Machine Learning
In recent years, Deep Learning techniques have greatly improved accuracy in a whole family of Machine Learning tasks, making previously experimental technologies reliable enough for use in production. What used to be cutting-edge academic research is now an established industry with mature infrastructure and highly available tools that make building end-user solutions more straightforward than ever before.
One of the ways Machine Learning tools become available to businesses is a COSS (Commercial Open Source Software) model. It’s an open core that anyone can freely use and modify with an option to adopt commercial features and priority support—something that makes more economical sense for larger customers than relying on purely open source contributions. At Evil Martians, we have embraced the same approach for our OSS products: both imgproxy and AnyCable now offer commercial versions.
DVC (stands for Data Version Control) from Iterative.ai is a Git-based version control solution for Machine Learning engineers that follows the same COSS principles: free open core for individual developers and paid plans for teams and enterprises.
Martians helped the company rethink its web presence and shape the UI and frontend for sophisticated user-facing product Iterative.ai has—the DVC Studio.
Business on data
DVC was built for data scientists by data scientists: started in 2018 by Dmitry Petrov and Ivan Shcheklein, Iterative.ai is built on the experience both founders have acquired while working for companies like Microsoft, Node.io, Yandex Labs, and Google.
Remote-first and based in San Francisco, California, Iterative.ai raised a $20M Series A round in 2021, which brought the company’s total funding to over $25 million.
Rise of the versioning
Machine Learning models are different from “traditional” software products. Their success depends not only on planning and execution but also on training: a highly complicated process where even the slightest deviation in one of many parameters can make or break the outcome. There’s a strong demand to track all the inputs to the model and version the training variants: so that expensive CPU cycles don’t go to waste and a team can easily select the “winning” combination from a plethora of experiments. There were many approaches to that: from storing results in Python notebooks to having REST-accessible databases, but there was no standard, vendor-agnostic, lightweight toolchain to maintain data version control and model performance monitoring. Not before DVC came into play.