A broader picture: A guide on imgproxy for businesses
imgproxy is a Martian open source product that has been replacing the entire image processing tech stack in web applications for several years, both in its open source and extended Pro versions. It’s used by Photobucket, eBay, Dev.to, Algolia, and dozens of other businesses. This post collects the key use cases and gives a full picture of reducing development time and infrastructure costs.
Moment of history
imgproxy emerged from Evil Martians’ open source mines in 2017. Back then, we regularly faced the challenge of processing huge numbers of user-generated images that arrived from different sources. Like most web developers, we relied on image resizing libraries or plugins and on the tedious process of hosting many pre-processed files alongside the original, organizing background processing for every required resize variant, and running additional background resizing on every major landing page redesign.
So we decided to shop around for unconventional but platform-agnostic proxying tools. After careful investigation, we found that their performance varied from tool to tool and was not always stellar. Besides, we spotted severe potential security vulnerabilities that were a regrettable trend among them: anyone could craft a special URL to use the service for their own purposes or to launch DDoS attacks.
We saw no alternative but to build our own advanced image proxying tool that could deal with image processing from any source. We published it as an open source project and began to use it in our customers’ products. That’s why we could watch the project’s benefits—best-of-breed processing speed, low memory consumption, and all security features—in the real world.
imgproxy allows engineers to build image processing in their products without reinventing the wheel or installing additional tooling, plugins, and libraries to process and store images in the background. It’s a simple and small standalone web server to host on your own—you can host as many instances as you need without any unexpected additional expenses.
Customers don’t need to store processed images on-premises: imgproxy resizes them on the fly, saving disk space. Since imgproxy is designed to run behind a caching CDN (Content Delivery Network), you don’t have to re-process images from scratch every single time, but even if you choose to do so—this is not a big deal for imgproxy’s overall performance.
It covers all the tasks of changing image parameters and formats. For example, with imgproxy, an original 4000x2000 image of 650KB can be resized on the fly to a 500x250 image of 19KB without any noticeable delay for the end user. From the point of view of someone loading your website, it will look exactly as if you had stored the images statically on your server.
imgproxy is used by large projects that deal with tens of thousands of images daily—photo stocks, ecommerce platforms, media, and startups in many industries from financial companies to drone control services and hotel businesses. To name a few: Photobucket, the photo service that hosts more than 10 billion images from 100 million registered members; Algolia, the search-as-a-service platform serving over 70 billion queries per month; Ozon, the fastest-growing online retailer in Europe with almost $1B of gross merchandise value, and many other marketplaces and media resources.
Use case 1: Image for millions
imgproxy suits projects of all sizes and even fits startups at their early MVP stages. However, the benefits of imgproxy are the most impressive in large projects with millions of user-generated images: avatars, photos, product pictures. Storing, updating, and keeping in sync all the image variants for all possible screens and page designs, present and future, quickly becomes a fool’s errand. In such cases, imgproxy can eliminate hundreds of lines of hard-to-maintain code and dozens of technical issues that keep your engineers from focusing on core business tasks.
Good cases in point are Algolia, which processes about 1.5M images per month and does not store them but relies on a long client cache (over 1 year), and eBay spin-offs that have already processed over 2M images. In such cases, if you keep image versions locally, resizing and re-formatting them can take ages. imgproxy helps projects do it much faster: individual images and entire image collections are transformed in no time and cached by a CDN for further requests.
Shopware is a provider of flexible and future-proof ecommerce solutions that leverage the growth potential of their 100,000+ customers. One of its products is a plugin that gives access to dynamic thumbnails (by providing the thumbnail URLs) and the LazyLoading functionality. Under the hood, it uses imgproxy.
Almost all marketplaces have a strong demand for image thumbnails. By default, they are automatically generated and saved during the upload. Thanks to this plugin, thumbnails no longer have to be stored: they are generated and delivered in real time. This avoids wasting computing power and storage space on thumbnails and speeds up image uploads and backups, which now involve fewer files.
Use case 2: Superspeed bullet
Images account for up to 60% of the average web page’s total load time. These stats make speed one of imgproxy’s key strengths. Under the hood, we rely on libvips, the world’s most efficient and fast image processing library. It has a very low memory footprint, and that is what makes on-the-fly processing of a massive number of images feasible. In the following benchmark, we compared imgproxy with other popular open source image proxying tools. In the test, 4 concurrent users each made 250 image resizing requests (1000 requests in total). imgproxy handled that using only 200MB of RAM. According to results in real customer projects, imgproxy rarely consumes more than half of that.
Here are benchmarking results with some alternatives.
To speed up image processing, engineers can set read, write, and download timeouts and limit the maximum dimensions of images from remote sources to avoid slowdowns when processing massive pictures. They can also limit the number of image requests processed simultaneously (by default, it’s twice the number of processor cores).
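As a rough sketch of what such tuning might look like, the Python snippet below assembles a docker run command with the relevant limits. The environment variable names and the darthsim/imgproxy image name are given as recalled from the imgproxy documentation and may differ between versions, and all values are placeholders, so treat them as assumptions to check against the docs for your release.

```python
import os


def imgproxy_env(cpu_cores: int) -> dict[str, str]:
    """Build a hypothetical set of imgproxy limits (all values are placeholders)."""
    return {
        # Seconds allowed for reading the request and writing the response.
        "IMGPROXY_READ_TIMEOUT": "10",
        "IMGPROXY_WRITE_TIMEOUT": "10",
        # Seconds allowed for downloading the source image.
        "IMGPROXY_DOWNLOAD_TIMEOUT": "5",
        # Largest accepted source image, in megapixels.
        "IMGPROXY_MAX_SRC_RESOLUTION": "16.8",
        # Requests processed simultaneously: twice the number of cores.
        "IMGPROXY_CONCURRENCY": str(cpu_cores * 2),
    }


def docker_run_command(env: dict[str, str],
                       image: str = "darthsim/imgproxy:latest",
                       port: int = 8080) -> list[str]:
    """Return the argument list for `docker run`; nothing is executed here."""
    cmd = ["docker", "run", "--rm", "-p", f"{port}:8080"]
    for name, value in sorted(env.items()):
        cmd += ["-e", f"{name}={value}"]
    cmd.append(image)
    return cmd


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(" ".join(docker_run_command(imgproxy_env(cores))))
```

Because everything is passed as environment variables, the same dictionary could feed a Heroku config or a Kubernetes manifest instead of Docker.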
While choosing a solution for image processing, Photobucket focused on speed and cost optimization. The company first used the OSS version of imgproxy but then opted for its Pro variant with features like a GIF to MP4 converter and video thumbnails. We’ve also added support for ARM processors, which the platform needed in order to run on Amazon’s AWS Graviton architecture for better “bang per buck” on processing. Thus, we helped them achieve their goal of making image processing fast and affordable.
Use case 3: Easy to deploy
Building an image processing pipeline can be a tedious challenge and include many steps: tools and libraries installation, searching for a plugin or a library to support that tool in a specific framework or programming language, organizing storage, implementing image upload, figuring out security challenges, arranging background processing queues, and generating all image versions for different UI designs and screen sizes.
Following the “No code is better than no code” approach, imgproxy reduces this scheme to just one engineering task—image hosting (or re-using images from a remote source) and two steps—uploading the images and assigning the required parameters to the specially crafted request URL for imgproxy.
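To give a feel for that second step, here is a minimal Python sketch that assembles such a URL. It assumes the /{signature}/{processing options}/{encoded source}.{extension} layout and the rs (resize) option as described in the imgproxy docs; the host names and helper function names are hypothetical, and the "insecure" placeholder only works on an instance where URL signing is not enabled.

```python
import base64


def imgproxy_path(source_url: str, width: int, height: int,
                  resizing_type: str = "fit", extension: str = "webp") -> str:
    """Build the processing part of an imgproxy URL for a remote image."""
    # The source URL is URL-safe Base64 encoded, with padding stripped.
    encoded = base64.urlsafe_b64encode(source_url.encode()).rstrip(b"=").decode()
    return f"/rs:{resizing_type}:{width}:{height}/{encoded}.{extension}"


def unsigned_url(base: str, path: str) -> str:
    """Prefix the path with a placeholder instead of a real signature."""
    return f"{base}/insecure{path}"


if __name__ == "__main__":
    # Hypothetical hosts: a local imgproxy instance and an example origin.
    path = imgproxy_path("https://example.com/images/original.jpg", 500, 250)
    print(unsigned_url("http://localhost:8080", path))
```

The resulting URL can go straight into an img tag, and changing the target size for a redesign means changing two numbers in the URL rather than regenerating files.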
imgproxy fits any environment: since the product is language and framework agnostic, it can be used with any technology stack. For simplicity of installation and configuration, it sticks to the Twelve-Factor App methodology, so it is suitable for deployment on modern cloud platforms and is fully configured with environment variables only. It’s ready to be installed and used in any popular environment. You can use a Docker image, a Linux machine, a PaaS like Heroku, or your own cluster for full control over your infrastructure and spending.
Our goal was to design a lightweight and memory-efficient solution because, back then, the resizing solutions on the market didn’t satisfy our requirements. For instance, we discovered that Thumbor’s Docker container was 300 times bigger than imgproxy’s; it also used the much slower Pillow library under the hood. Picfit required storage to keep track of all changes. Pilbox didn’t use libvips, had no Twelve-Factor support, and relied on the Python-based Tornado web server, which may not fit every application. Hosted services like Cloudinary were unpredictable in pricing. Although they typically included on-the-fly image processing, some could count every image version as a new image, thus depleting a quota.
imgproxy installation is a no-brainer. On imgproxy’s site, you can find all the instructions to install it with a Docker pull. We even have a simple “Deploy to Heroku” button to get an up-and-running test instance of imgproxy instantly. You can generate a key/salt pair that is ready to copy and paste anywhere or create your own unique set of keys from random hex-encoded strings.
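If you prefer to create the pair yourself, a few lines of Python are enough. This is a minimal sketch: the 64-byte length and the IMGPROXY_KEY / IMGPROXY_SALT variable names follow the imgproxy docs as recalled here, so verify them for your version; the function name is hypothetical.

```python
import secrets


def generate_key_salt(num_bytes: int = 64) -> tuple[str, str]:
    """Return a (key, salt) pair of random hex-encoded strings."""
    # token_hex draws from the OS cryptographic random source and
    # returns two hex characters per byte.
    return secrets.token_hex(num_bytes), secrets.token_hex(num_bytes)


if __name__ == "__main__":
    key, salt = generate_key_salt()
    # Printed in a form that can be pasted into an env file.
    print(f"IMGPROXY_KEY={key}")
    print(f"IMGPROXY_SALT={salt}")
```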
Fitting the infrastructure from scratch
Optimole is an all-in-one image optimization solution for WordPress and beyond that processes images in real time in a cloud to avoid straining its customers’ servers. Optimole was born as a product to solve the problems that the project’s team had with their own WordPress sites in terms of image processing and optimizations. The company wanted to offer an affordable enterprise-grade media stack for everyone.
Being a small team, they wanted to focus only on what they knew best to build a reliable and valuable product—the WordPress plugin—and partner with professionals for the image processing engine and infrastructures like AWS.
The Optimole engineering team did some research and evaluated several popular solutions for image processing. They were looking for a project that would allow a rapid release of the MVP, which they could first use on their own network of sites. The key features the team was looking for were high configurability and friendliness with modern cloud platforms: it should be possible to customize every aspect without any code tweaks from engineers. Besides, the solution had to be secure and have low resource needs, as Optimole was looking to handle millions of image transformations each day.
The team discovered imgproxy when searching for a proxying server that would use libvips, allowing image processing from any remote URLs with low memory usage. After checking some popular projects on GitHub and different services, they found that imgproxy was the only one that met most of the requirements. After going through the API and code, they ended up using imgproxy from the first iteration of the service (from day 1, actually).
Today, with a performance-focused image processing engine like imgproxy and the power of AWS services like Lambda@Edge and CloudFront, Optimole can deliver images that are around 70% smaller on average to WordPress users. Right now, the WordPress plugin is actively used by more than 50K sites, according to wordpress.org. The imgproxy processing engine transformed more than 1.1B images (you can find a real-time counter of transformed images on optimole.com) in the 1 year since the team started Optimole.
Cleaning up the mess
Many current imgproxy customers tried other resizing and compression solutions in the past but ran into their weak points. Algolia initially used imageproxy but found it more complicated to maintain and tweak. The company switched to imgproxy, a similar-sounding but completely different tool.
The case that highlights this point is our customer project with Retail Zipline, an advanced communication and task management platform for retail headquarters, store managers, and retail associates that serves more than half a million users across over 25,000 stores. Initially, they relied on Rails’ built-in Active Storage for image processing, which required a Ruby web server with ImageMagick installed to transform images on the fly. (Besides ImageMagick, Active Storage can process images with the libvips library, but that requires a more complex deployment configuration and, again, relies on on-the-fly processing within a Ruby web server.) But this combination caused performance issues with image conversion. Even after an image had been transformed, loading it wasn’t completely smooth for end users: Active Storage had to check every image’s availability in the CDN before serving it (a way to avoid this problem was introduced only in the Rails 6.1 release).
Using imgproxy within their infrastructure helped optimize performance and speed up all the image processing workloads. Setting up imgproxy was dead simple since the Retail Zipline engineering team already had the right infrastructure combination up and running, so imgproxy could grab images directly from the CDN without bothering their Rails server in the process. They only needed to add a few lines of configuration to their Docker setup in the development environment, thanks to the official imgproxy Docker images’ availability.
Then, the team combined imgproxy, Active Storage, and their CDN to achieve backward compatibility with the current setup and feature toggling. Besides, they enhanced security to eliminate incidents in which non-expiring URLs leaked data through accidental image link sharing. For this purpose, they made CloudFront’s Trusted Signers feature work with imgproxy and Rails and adopted signed image URLs for imgproxy.
Use case 4: Like a pro
Customers’ needs have been driving imgproxy’s evolution ever since the project with eBay Bonus, where the initial version provided all the features we needed. Since then, nearly every new feature was first requested by a particular customer and then became available to, and highly sought after by, others. For example, Photobucket needed a build for AWS Graviton2 and video thumbnail generation (a feature we wanted to release too, so that customer’s request encouraged us to do it faster). Another potential client asked for PDF preview support, got it soon, and became a customer.
imgproxy comes in an open source version and a Pro version that adds priority commercial support and several advanced features for image processing, fine-tuning, and security on top. The open source version is full-scale and can be a perfect fit for most customers: it provides image compression, format conversion, resizing and cropping, metadata and image background operations.
The most popular Pro features are video thumbnail generation (turning any frame from a video file into a preview image), custom watermarks (adding a logo or copyright text to images, with the option to use different watermarks for different images by specifying their URLs), and advanced JPEG optimization (squeezing your JPEGs to the last byte without compromising on quality).
Typically, our customers try the open source version first, and if they need any advanced features, they come to us for one of these licenses. But lately, we are seeing more and more businesses that want to start with the Pro option from the outset.
Use case 5: Obsessed with security
Security out of the box has become one of imgproxy’s key advantages, since the product was born when we weren’t satisfied with the level of security in existing solutions. One of the first features we implemented in imgproxy was taken from our earlier work, carrierwave-bombshelter, a Ruby gem that protects Ruby applications from image bombs.
It came into being when we dealt with large projects with user-generated product photos and needed to determine image sizes. The standard way to measure a picture’s size was to decode the image first, a process that consumed both memory and disk space. carrierwave-bombshelter resolved the issue, letting engineers limit image sizes according to their application’s capacity.
The gem helped detect one of the most common attack vectors, image bombs (a subclass of decompression bombs, aka “zip bombs”), and protect customers from malicious attempts to overload image processing workloads. For example, a small PNG bomb of 5.8MB can inflate to take 141.4GB of space. This kind of attack often camouflages deeper and more serious attacks that aim to steal data while you are busy combating the first wave. imgproxy checks the image type and its “real” dimensions and cancels processing if the image is fake or its size is too big. Since imgproxy does this at the moment of downloading, the “mean” image will not even be fully downloaded (there is a setting for that).
The product’s further evolution solved other security problems related to the mass processing of images. For instance, imgproxy protects image URLs with a cryptographic signature, so attackers cannot launch a denial-of-service attack by requesting multiple image resizes. The feature ensures that no one will be able to use your imgproxy server unless they know both of your configuration credentials, the key and the salt.
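The idea of rejecting an image by its declared dimensions before decoding it can be illustrated with a small Python sketch. It is not imgproxy’s actual implementation: it handles PNG only, reads just the first 24 bytes of the file, and uses a hypothetical 16.8-megapixel limit and hypothetical names throughout.

```python
import io
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
HEADER_BYTES = 24  # signature (8) + chunk length (4) + "IHDR" (4) + width and height (8)


class ImageRejected(Exception):
    """Raised when the header is not a PNG or declares too many pixels."""


def check_png_header(header: bytes, max_megapixels: float = 16.8) -> tuple[int, int]:
    """Validate the declared PNG dimensions without decoding any pixel data."""
    if (len(header) < HEADER_BYTES
            or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ImageRejected("not a valid PNG header")
    # Width and height are two big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    if width * height > max_megapixels * 1_000_000:
        raise ImageRejected(f"{width}x{height} exceeds {max_megapixels} megapixels")
    return width, height


def check_stream(stream: io.BufferedIOBase, max_megapixels: float = 16.8) -> tuple[int, int]:
    """Read only the header bytes from a download stream and validate them."""
    return check_png_header(stream.read(HEADER_BYTES), max_megapixels)
```

Because only 24 bytes are read, an oversized image can be dropped before the rest of it is downloaded or any memory is allocated for its pixels.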
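For illustration, here is a minimal Python sketch of how such a signature can be produced and checked. It follows the scheme described in the imgproxy docs as recalled here (HMAC-SHA256 over the salt followed by the path, URL-safe Base64 without padding); the key and salt values in the usage example are placeholders, and the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def sign_path(path: str, key_hex: str, salt_hex: str) -> str:
    """Return the path prefixed with its signature: /{signature}{path}."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    # The digest covers the salt followed by the path, keyed with the key.
    digest = hmac.new(key, salt + path.encode(), hashlib.sha256).digest()
    signature = base64.urlsafe_b64encode(digest).rstrip(b"=").decode()
    return f"/{signature}{path}"


def is_valid(signed_path: str, key_hex: str, salt_hex: str) -> bool:
    """Recompute the signature for the received path and compare in constant time."""
    _signature, _sep, rest = signed_path.lstrip("/").partition("/")
    expected = sign_path("/" + rest, key_hex, salt_hex)
    return hmac.compare_digest(expected.encode(), signed_path.encode())


if __name__ == "__main__":
    # Placeholder credentials; never use fixed values like these in production.
    key_hex = "0123456789abcdef" * 4
    salt_hex = "fedcba9876543210" * 4
    path = "/rs:fit:500:250/plain/https://example.com/images/original.jpg"
    print(sign_path(path, key_hex, salt_hex))
```

Any change to the processing options or the source URL changes the signature, so a visitor cannot take a valid link and request other sizes from it.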
As an extra layer of security, imgproxy supports authorization by an HTTP header: you can require an authorization token in the header of every incoming HTTP request. It can hide your image’s origin to protect it from attackers while still processing images via a CDN or a caching server.
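On the client side (typically the CDN or caching server in front of imgproxy), this boils down to attaching one header. The sketch below assumes the token is configured through the IMGPROXY_SECRET variable and sent as a Bearer token, as recalled from the imgproxy docs; the URL, the token value, and the function name are hypothetical.

```python
import urllib.request


def authorized_request(url: str, secret: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the authorization token."""
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {secret}"},
    )


if __name__ == "__main__":
    # Hypothetical local instance and token; nothing is sent over the network here.
    req = authorized_request("http://localhost:8080/insecure/rs:fit:500:250/plain/https://example.com/a.jpg",
                             "my-secret-token")
    print(req.get_header("Authorization"))
```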
We are now focusing on two areas of the imgproxy evolution. The first is building customized end-to-end solutions that can include deploying and configuring, and even designing specific new features. The second is launching new features to fit the market: e.g., we’re now witnessing a surge in demand for images and videos in Machine Learning projects and neural networks, including object and face detection, smart cropping, and elements blurring.
Do you think imgproxy can be a good fit for your projects? Try it right away, go further with its Pro version, or let us know if you fancy a quick chat about building a custom solution based on the product, consulting your development team, adjusting imgproxy to your needs, and integrating it into your infrastructure.