A broader picture: A guide on imgproxy for businesses

May 24, 2021

imgproxy is a Martian open source product that’s been replacing the entire image processing tech stack in web applications for several years, both in its open source and extended Pro versions. It’s used by Photobucket, eBay, Dev.to, Algolia, and dozens of other businesses. This post collects the key benefits and gives a full picture of reducing development time and infrastructure costs.

Moment of history

imgproxy emerged from the Evil Martians open source mines back in 2017. Back then, we often faced the daily challenge of processing a huge number of user-generated images arriving from different sources. Like most web developers, we bet on image resizing libraries or plugins and the tedious process of hosting a ton of pre-processed files with the original, organizing background processing for all the necessary resize variants, and running additional background resizing on every major landing redesign.

So, with this tedium in mind, we decided to shop around for unconventional, but platform-agnostic, proxying tools. But after careful investigation, we found that performance varied from tool to tool, and it was not always stellar. Moreover, we spotted some potentially critical security vulnerabilities. This was a regrettable trend for many of them: anyone could craft a special URL to use the service for their own purposes or for DDoS attacks.

With this, we saw no alternative but to build our own advanced image proxying tool that could deal with image processing from any source. We published it as an open source project and began to use it in our customers’ products. This gave us the ability to observe the project’s real-world benefits: a best-of-its-kind processing speed, low memory consumption, and robust security features.

imgproxy today

imgproxy allows engineers to build image processing into their products without reinventing the wheel or installing the additional tooling, plugins, and libraries necessary to process and store images in the background. It’s a simple, small standalone web server that you host on your own, and you can host as many instances as needed without any unexpected additional expenses.

Customers don’t need to store processed images on-site: instead, imgproxy resizes them on the fly, saving disk space. Since imgproxy is designed to run behind a caching CDN (Content Delivery Network), you don’t have to re-process images from scratch every single time, but even if you choose to, this isn’t a big deal for imgproxy’s overall performance.

This covers all tasks related to changing different image parameters and formats. By way of example, with imgproxy, an original 4000x2000 image of 650KB can be resized on the fly to a 500x250 image of 19KB without any noticeable delay for the end-user. From the point of view of someone loading your website, it will look exactly as if you have statically stored the images on your server.

imgproxy is used by large projects that deal with tens of thousands of images daily: photo stocks, ecommerce platforms, media, and startups in many industries, from financial companies to drone control services and hotel businesses. To name a few: Photobucket, the photo service that hosts more than 10 billion images from 100 million registered members; Algolia, the search-as-a-service platform serving over 70 billion queries per month; Ozon, the fastest-growing online retailer in Europe with almost $1B of gross merchandise value; and many other marketplaces and media resources.

Benefit 1: Images for millions

imgproxy suits projects of all sizes; it even fits startups at early MVP stages. However, the benefits of imgproxy are most impressive in large projects with millions of user-generated images: avatars, photos, product pictures. Without a tool like imgproxy, storing, updating, and keeping all the image variants in sync for all possible screens and page designs, present and future, quickly becomes a fool’s errand. In these cases, imgproxy can eliminate hundreds of lines of difficult-to-maintain code (and dozens of technical issues that keep your engineers from focusing on core business tasks).

User-generated images

Some good illustrations of this power: Algolia, which processes about 1.5M images per month but does not store them and instead uses a long client cache (over 1 year), and eBay spin-offs which have already processed over 2M images. In these cases, if you keep the image variants locally, resizing and other re-formatting can take ages. imgproxy helps projects do this much faster: images and entire image collections are transformed in no time and cached by a CDN for further requests.

imgproxy solves the tedious task of optimizing every product image from eBay’s billions of listings. It only processes images that are requested, does it completely on the fly, and blazingly fast. Finally, imgproxy protects from a number of vulnerabilities, helping us to adhere to eBay’s strict security standards.

imgproxy in plugins

Shopware provides flexible, future-proof ecommerce solutions that leverage the growth potential of their 100,000+ customers. One of its products is a plugin that gives access to dynamic thumbnails (by providing thumbnail URLs) and LazyLoading functionality. And under the hood, it uses imgproxy.

Almost all marketplaces have a strong demand for image thumbnails. By default, they’re automatically generated and saved during upload. Thanks to this plugin, thumbnails no longer have to be stored—they’re generated and delivered in real time. This avoids wasting computing power or storage space for thumbnails and accelerates image upload and backups that require fewer files.

Benefit 2: Superspeed bullet

Images are responsible for up to 60% of the average web page’s total load time; thankfully, since speed is one of imgproxy’s key powers, these stats don’t have to be a concern. Under the hood, we rely on libvips, the world’s most efficient and fast image processing library. It has a very low memory footprint, which makes processing a massive number of images on the fly realistic. In the following benchmark, we compared imgproxy with other popular open source image proxying tools. In the test, 4 concurrent users each made 250 image resizing requests (1000 requests in total). imgproxy handled this using only 200MB of RAM. And, according to results from real customer projects, imgproxy rarely consumes more than half of that.

Below are benchmarking results with some alternatives.

To speed up image processing, engineers can set up read, write, and download timeouts and limit maximum dimensions for images from remote sources to avoid slowdowns when processing massive pictures. They can also limit the number of image requests to be processed simultaneously (by default, it’s double the number of processor cores).
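As a rough illustration of these settings, here is a minimal Python sketch that assembles them as environment variables. The variable names are the ones we recall from the imgproxy documentation of that period (newer releases may have renamed some of them, so check the docs for your version), and the helper `imgproxy_env` with its numeric values is purely hypothetical.

```python
import os


def imgproxy_env(cpu_cores=None, read_timeout=10, write_timeout=10,
                 download_timeout=5, max_src_megapixels=16.8):
    """Return environment variables for an imgproxy process as a dict of strings.

    Hypothetical helper: the values are illustrative, not recommendations.
    """
    cores = cpu_cores or os.cpu_count() or 1
    return {
        # Seconds allowed for reading a request and writing a response.
        "IMGPROXY_READ_TIMEOUT": str(read_timeout),
        "IMGPROXY_WRITE_TIMEOUT": str(write_timeout),
        # Seconds allowed for downloading the source image.
        "IMGPROXY_DOWNLOAD_TIMEOUT": str(download_timeout),
        # Largest accepted source image, in megapixels.
        "IMGPROXY_MAX_SRC_RESOLUTION": str(max_src_megapixels),
        # Requests processed simultaneously: double the core count.
        "IMGPROXY_CONCURRENCY": str(cores * 2),
    }


if __name__ == "__main__":
    for name, value in imgproxy_env(cpu_cores=4).items():
        print(f"{name}={value}")
```

The printed `NAME=value` lines can be pasted into an env file for a container or a PaaS, which fits the environment-variable style of configuration.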

When choosing a solution for image processing, Photobucket focused on speed and cost optimization. The company first used the OSS version of imgproxy but then opted for imgproxy Pro with features like GIF to MP4 conversion and video thumbnails. We’ve also added the ARM processor support that the platform needed in order to run on the AWS Graviton architecture for better “bang-per-buck” processing. Thus, we helped them achieve their goal: making image processing fast and affordable.

imgproxy helps us optimize millions of images every hour to deliver the best quality, size, and format for each device. Optimizing on the fly helps us avoid storing billions of extra thumbnails and other versions of the same image and helps our customers get fast and efficient viewing and sharing. The growing list of Pro features like video thumbnails and EXIF support has helped us enhance our platform and simplify other processes.

Benefit 3: Easy to deploy

Building an image processing pipeline can be a tedious challenge that includes many steps: tools and libraries installation, searching for a plugin or a library to support that tool in a specific framework or programming language, organizing storage, implementing image upload, figuring out security challenges, arranging background processing queues, and generating all image versions for different UI designs and screen sizes.

Following the “no code is better than no code” approach, imgproxy reduces this scheme to just one engineering task: image hosting (or re-using images from a remote source); and two steps: uploading images, and assigning the required parameters to the specially-crafted request URL for imgproxy.
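To show what such a specially-crafted request URL can look like, here is a minimal Python sketch. It assumes the documented `rs:<type>:<width>:<height>` resizing option and the `plain/` source URL form; the host `localhost:8080`, the example image address, and the helper name `processing_url` are hypothetical.

```python
from urllib.parse import quote


def processing_url(base_url, source_url, width, height,
                   resizing_type="fit", extension=None):
    """Build an unsigned imgproxy URL that resizes source_url to width x height."""
    # Escape characters such as "?" and "@" that would otherwise be ambiguous.
    encoded_source = quote(source_url, safe=":/")
    path = f"/rs:{resizing_type}:{width}:{height}/plain/{encoded_source}"
    if extension:
        # Optionally ask for a different output format, e.g. "webp".
        path += f"@{extension}"
    # "insecure" is a placeholder for the signature segment; it is only
    # accepted when the server runs without a key/salt pair.
    return f"{base_url.rstrip('/')}/insecure{path}"


if __name__ == "__main__":
    print(processing_url("http://localhost:8080",
                         "https://example.com/images/cat.jpg", 500, 250))
```

Changing the thumbnail size for a redesign then means changing two numbers in a URL rather than re-running background jobs over stored variants.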

imgproxy is ready to be installed and used in any popular environment. Since the product is language and framework agnostic, it can be used with any technology stack. For simplicity of installation and configuration, it sticks to the Twelve-Factor App methodology, so it is suitable for deployment on modern cloud platforms and can be fully configured with just environment variables. You can use a Docker image, a Linux machine, a PaaS like Heroku, or your own cluster for full control over your infrastructure and spending.

Our goal was to design a lightweight, memory-efficient solution; back then, the resizing solutions on the market didn’t satisfy our requirements. For instance, we discovered that Thumbor’s Docker container was 300 times bigger than imgproxy’s; it also used a much slower Pillow library under the hood. Picfit required storage to keep track of all changes. Pilbox didn’t use libvips, had no Twelve-Factor support, and relied on a Python-based Tornado web server that may not fit every application. Hosted services like Cloudinary were unpredictable in terms of pricing. Although they typically included on-the-fly image processing, some could take every image version as a new image, thus depleting a quota.

Installing imgproxy is a no-brainer. On imgproxy’s site, you can find all the installation instructions, starting with a Docker pull. We even have a simple “Deploy to Heroku” button to instantly get an up-and-running test instance of imgproxy. You can generate a key/salt pair to cut and paste or create your own unique set of keys and random hex-encoded strings.
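If you prefer to create the pair yourself, a minimal Python sketch could look like this. The 64-byte length and the helper name `generate_key_salt` are our assumptions; `IMGPROXY_KEY` and `IMGPROXY_SALT` are the environment variables the server reads, as far as we know.

```python
import secrets


def generate_key_salt(num_bytes=64):
    """Return a (key, salt) pair of independent random hex-encoded strings."""
    return secrets.token_hex(num_bytes), secrets.token_hex(num_bytes)


if __name__ == "__main__":
    key, salt = generate_key_salt()
    # Keep both values secret; anyone who has them can sign URLs.
    print(f"IMGPROXY_KEY={key}")
    print(f"IMGPROXY_SALT={salt}")
```

The `secrets` module draws from the operating system's cryptographically secure random source, which is the property you want for signing credentials.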

Fitting infrastructure from scratch

Optimole is an all-in-one image optimization solution for WordPress and beyond that processes images in real time in the cloud to avoid straining its customers’ servers. Optimole was born to solve image processing and optimization problems that the project’s team had with their own WordPress sites. The company wanted to offer an affordable enterprise-grade media stack for everyone.

Being a small team, they wanted to focus only on what they knew best: building a reliable and valuable product—the WordPress plugin—and partnering with professionals for the image processing engine and infrastructures like AWS.

The Optimole engineering team did some research and checked out some popular image processing solutions. They were looking for a project that would allow them to rapidly release an MVP, which they could use on their network of sites first. The key features the team was looking for were high configurability and friendliness with modern cloud platforms: an ability to customize every aspect without any code tweaks from engineers. Further, the solution had to be secure and have low resource needs, as Optimole was looking to handle millions of image transformations each day.

The team discovered imgproxy while searching for a proxying server that used libvips, allowing image processing from any remote URL with low memory usage. After checking some popular projects on GitHub and different services, they found that imgproxy was the only one that met most of their requirements. After going through the API and code, they ended up using imgproxy from the first iteration of the service (from day 1, actually).

I think being focused on performance is one of the main aspects that differentiates imgproxy from similar libraries. Every feature is well analyzed and fine-tuned to allow maximum performance.

Today, with a performance-focused image processing library like imgproxy and the power of AWS services like Lambda@Edge and CloudFront, Optimole delivers WordPress users images that are around 70% smaller on average. The WordPress plugin is actively used by more than 50K sites according to wordpress.org. The imgproxy processing engine transformed more than 1.1B images (you can find a real-time counter of transformed images on optimole.com) in the year since the team started Optimole.

Cleaning up the mess

Many current imgproxy customers tried other resizing and compression solutions in the past but ran into weak points. Algolia initially used imageproxy but found it more complicated to maintain and tweak. The company switched to imgproxy, which has a similar name but is a completely different tool.

We opted for imgproxy because of easiness to deploy and set with Heroku and environment variables; we did not want to spend much time implementing a new proxy and we found imgproxy to be a robust and reliable choice. Thanks for building a fast and user friendly image proxy!

A case that highlights this point is our customer project with Retail Zipline, an advanced communication and task management platform for retail headquarters, store managers, and retail associates that serves more than half a million users across over 25,000 stores. Initially, they relied on Rails’ built-in Active Storage for image processing, which required a Ruby web server with ImageMagick installed to transform images on the fly. (Besides ImageMagick, Active Storage can process images with the libvips library, but it requires a more complex deployment configuration and, again, is based on on-the-fly processing within a Ruby web server.) But this combination caused performance issues with image conversion. Even after an image had been transformed, its loading wasn’t completely smooth for end users: Active Storage had to check every image’s availability in the CDN before serving it (a way to avoid this problem was introduced only in the Rails 6.1 release).

Using imgproxy within their infrastructure helped optimize performance and speed up all the image processing workloads. Setting up imgproxy was dead simple since the Retail Zipline engineering team already had the right infrastructure combination up and running, so imgproxy could grab images directly from the CDN without bothering their Rails server in the process. They only needed to add a few lines of configuration to their Docker setup in the development environment, thanks to the official imgproxy Docker images’ availability.

Then, the team mixed imgproxy, Active Storage, and their CDN to achieve backward compatibility with the current setup and feature toggling. Besides, they enhanced security to eliminate incidents where non-expirable URLs leaked data through accidental image link sharing. For this purpose, they made CloudFront’s Trusted Signers feature work with imgproxy and Rails and adopted signed image URLs for imgproxy.

Benefit 4: Like a pro

Customers’ needs have been driving imgproxy’s evolution, because in our project with eBay Bonus, the initial version already provided all the features we needed. Since then, every new feature has started as a request from a particular customer and then become available to, and highly sought after by, others. For example, Photobucket needed a build for AWS Graviton2 and video thumbnail generation (that was a feature we wanted to release too, so the customer’s request encouraged us to do it faster). Another potential client asked for PDF preview support, got it soon, and became a customer.

The solution has had every feature we have needed, and information about possible future features usually has an associated public issue. We consider imgproxy to be a core component of Forem we hope to build around the long term.

imgproxy comes in an open source version and a Pro version that adds priority commercial support and several advanced features for image processing, fine-tuning, and security on top. The open source version is full-scale and can be a perfect fit for most customers: it provides image compression, format conversion, resizing and cropping, metadata and image background operations.

The most popular Pro features are video thumbnail generation (turning every frame from a video file into a preview image), custom watermarks (adding a logo or copyright text to images, with the option to use different watermarks for different images by specifying their URLs), and advanced JPEG optimization (squeezing your JPEGs to the last byte without compromising on quality).

We had some features that the OSS version was missing for a while in the backlog. Things like GIF to MP4, custom watermarking, or compatibility with MozJPEG were things we wanted to add to the image processing engine. Getting those features in imgproxy Pro and allowing us to focus more on the WordPress product was one of the main benefits of using it.

Typically, our customers try the open source version first, and if they need any advanced features, they come to us for a Pro license. But lately, we are seeing more and more businesses that want to start with the Pro option from the outset.

In fact, my next use of it will be for Speaker Deck, a little website that serves a lot of images. Seeing what a difference it made for Box Out has me quite excited to use it there. Recently, Evil Martians even added support for previewing PDFs to Pro, which could be fascinating in combination with (or in place of) our PDF processing on Speaker Deck.

Benefit 5: Obsessed with security

Security out of the box has become one of imgproxy’s key advantages, since the product was born at a time when we weren’t satisfied with the level of security in existing solutions. One of the first features we implemented in imgproxy was taken from our early work: carrierwave-bombshelter, a Ruby gem that protects Ruby applications from image bombs.

It came into being when we dealt with large projects with user-generated product photos and needed to determine the image sizes. Previously, the standard way to measure a picture’s size was to decode the image first, a process that consumed a lot of memory and disk space. carrierwave-bombshelter resolved the issue, letting engineers limit the accepted image sizes according to their application’s capacity.

The gem helped discover one of the most common attack vectors, image bombs (a subclass of decompression bombs, also known as “zip bombs”), and protect customers from malicious attempts to overload image processing workloads. For example, a small PNG bomb of 5.8MB can inflate to take up 141.4GB of space. This kind of attack often camouflages deeper and more serious attacks that aim to steal data while you are busy combatting the first wave. imgproxy checks the image type and its “real” dimensions and cancels processing if the image is fake or its size is too big. Since imgproxy does this at the moment of downloading, the “mean” image will not even be fully downloaded (there is a setting for that).
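The idea of checking dimensions before decoding can be sketched in a few lines of Python. This is an illustration of the general technique for PNG files only, not imgproxy’s actual implementation; the names `check_png` and `ImageRejected` and the 16.8 megapixel limit are assumptions made for the example.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


class ImageRejected(Exception):
    """Raised when a file is not a real PNG or declares too many pixels."""


def check_png(header, max_megapixels=16.8):
    """Validate the first 24 bytes of a download without decoding any pixels."""
    # A PNG starts with an 8-byte signature followed by the IHDR chunk:
    # 4-byte length, 4-byte type, then big-endian width and height.
    if (len(header) < 24 or header[:8] != PNG_SIGNATURE
            or header[12:16] != b"IHDR"):
        raise ImageRejected("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    if width * height > max_megapixels * 1_000_000:
        raise ImageRejected(f"{width}x{height} exceeds the resolution limit")
    return width, height


if __name__ == "__main__":
    # Hypothetical header declaring a 100000x100000 image.
    bomb = PNG_SIGNATURE + struct.pack(">I4sII", 13, b"IHDR", 100_000, 100_000)
    try:
        check_png(bomb)
    except ImageRejected as error:
        print(f"rejected: {error}")
```

Because only the first 24 bytes are needed, a downloader can make the decision as soon as they arrive and drop the connection instead of fetching the rest of the file.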

The product’s further evolution solved other security problems related to the mass processing of images. For instance, imgproxy protects image URLs with a cryptographic signature, so attackers cannot launch a denial-of-service attack by requesting multiple image resizes. The feature ensures that no one will be able to use your imgproxy server unless they know both of your configuration credentials: the key and the salt.

The steps for creating the signature are quite simple in any modern programming language and don’t require familiarity with cryptography. The project’s repository already contains a folder with example scripts for generating URLs in Ruby and Go; the web form we used for the demo is implemented in frontend JavaScript with the help of the jsSHA cryptographic library. This is an alternative to performing your cryptographic calculations only on your own backend.
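For reference, the signing steps can be sketched in Python as follows. The scheme shown (HMAC-SHA256 over the salt followed by the path, keyed with the hex-decoded key, then URL-safe Base64 without padding) is our understanding of the documented algorithm; the key, salt, and image URL in the usage lines are placeholders, and `sign_path` is a hypothetical helper name.

```python
import base64
import hashlib
import hmac


def sign_path(path, key_hex, salt_hex):
    """Prefix an imgproxy path with its signature segment."""
    key = bytes.fromhex(key_hex)
    salt = bytes.fromhex(salt_hex)
    # HMAC-SHA256 over salt + path, keyed with the secret key.
    digest = hmac.new(key, salt + path.encode("utf-8"), hashlib.sha256).digest()
    # URL-safe Base64 with the trailing "=" padding removed.
    signature = base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")
    return f"/{signature}{path}"


if __name__ == "__main__":
    # Placeholder credentials; real ones should be long random hex strings.
    path = "/rs:fit:500:250/plain/https://example.com/images/cat.jpg"
    print(sign_path(path, "736563726574", "68656c6c6f"))
```

Any change to the processing options or the source URL changes the signature, so a client cannot tweak a signed URL to request arbitrary resizes.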

As an extra layer of security, imgproxy supports authorization by an HTTP header: you can require an authorization token in a header of all incoming HTTP requests. It can hide your image’s origin to protect it from attackers while still processing images via a CDN or a caching server.
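A minimal Python sketch of a client (for example, a caching layer) attaching such a token might look like this. It assumes the server was started with the same token in `IMGPROXY_SECRET` and expects it as a `Bearer` value; the URL, the token, and the helper name `authorized_request` are placeholders.

```python
from urllib.request import Request


def authorized_request(url, token):
    """Build a request that carries the token as a Bearer Authorization header."""
    return Request(url, headers={"Authorization": f"Bearer {token}"})


if __name__ == "__main__":
    request = authorized_request(
        "http://localhost:8080/insecure/rs:fit:500:250/plain/"
        "https://example.com/images/cat.jpg",
        "my-secret-token",
    )
    # The request is only constructed here; pass it to urlopen() to send it.
    print(request.get_header("Authorization"))
```

In practice the header is usually added by the CDN or caching server in front of imgproxy, so end users never see the token.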

We use imgproxy on our ecommerce platform to deliver striking images of our beautiful bikes. imgproxy is fast, lean, secure, and focused on relevant features—the best choice to get the job done.

We are now focusing on two areas of imgproxy’s evolution. The first is building customized end-to-end solutions that can include deploying and configuring, and even designing specific new features. The second is launching new features to fit the market: e.g., we’re now witnessing a surge in demand for images and videos in Machine Learning projects and neural networks, including object and face detection, smart cropping, and element blurring.

imgproxy has been a breath of fresh air. It has become a critical component of the Forem infrastructure. It elegantly addresses the pitfalls of inflexible and inefficient stored image variations while giving us the peace-of-mind that it is open source.

Do you think imgproxy can be a good fit for your projects? Try it right away, go further with its Pro version, or let us know if you fancy a quick chat about building a custom solution based on the product, consulting your development team, adjusting imgproxy to your needs, and integrating it into your infrastructure.