Developing the Antman Project

Cloud storage services for media have risen greatly in popularity over the past few years; Google Photos and NAVER nCloud are two examples. LINE also provides its users with LINE Album, a service that lets users permanently store and view their photos on a cloud server. LINE Album is one of the most used features on the LINE app, and it is now in its sixth year of service since launching in September 2013. As with any service heavily used by a large number of users, the amount of data stored on our servers is astronomically large as well.

All media data on LINE, such as photos and videos, is managed by OBS (Object Storage), the LINE Media Platform's media storage solution. We use OBS to manage all media data across LINE and the LINE Family Services, which amounts to 100 PB (petabytes) of storage space used. Approximately 30 PB of that is used by LINE Album. As all of this data must be stored on our servers, the storage server maintenance cost alone is nothing to sneeze at.

Costs are one thing, but the larger problem is securing physical space for the servers at our data centers. While we should be happy that our service is actively used by so many of our users, our policy of permanent storage means we must hold on to all media data stored on our servers since the beginning. It's comparable to a housing problem: there are not enough homes to support a growing population.

Our first storage optimization strategy

To tell the truth, we had our concerns about the growing amount of data on LINE Album for a while, and we had made efforts to optimize our storage before. With support from the OBS team, we analyzed user media consumption patterns on LINE Album and observed a long tail pattern.

According to the OBS team's findings, users looked at newly uploaded photos often, but almost never went back to look at older ones. In other words, old photos were taking up storage space without even being used. However, that did not mean we could simply delete user-uploaded data at our own discretion.

The first thing we tried was separating storage space depending on how recently a photo or video was uploaded. Newer data that was frequently requested was put on servers with high-performance SSDs, while older data was put on high-density storage servers with low-performance, large-capacity SATA drives. We took to calling this approach “storage layering.” For better stability, most of our storage drives on OBS have two backups, while our high-density drives have one backup each. Using storage layering, we were able to somewhat work around the lack of space at our data centers.

However, storage layering only gave us minimal improvement in data storage efficiency. As time went by, LINE Album encroached on the physical space of our data centers once again, forcing us to find a more effective method of storage optimization. This was what led to the development of the “Antman” project.

Let’s try using HEIF

The role of Antman is very simple: it takes rarely accessed JPEG files from the high-density “cold” storage and converts them into HEIF (High Efficiency Image File Format) files. HEIF is an image format that boasts approximately twice the compression efficiency of JPEG. For those not in the know, allow me to briefly explain JPEG and HEIF.

Since being introduced in 1994, the JPEG (Joint Photographic Experts Group) format has become the most widely used image format around the world. In fact, most video codecs in use today are also improvements on the JPEG algorithm, making JPEG the precursor to video compression. Due to JPEG's universal use, most modern mobile processors come equipped with a hardware JPEG codec, and most software codecs are sufficiently optimized as well. However, with 25 years under its belt, JPEG is becoming a bit long in the tooth, with better-performing image formats aiming to take its place. HEIF (High Efficiency Image File Format) is one of those formats.

HEIF is an image format based on HEVC (also known as H.265), which was publicly released in 2013. HEVC is one of the most popular video codecs today, boasting up to twice the compression performance of H.264 depending on the video you're working with. HEIF acts as a container that wraps around the image data, and the actual image data is compressed using HEVC. Still images are stored as a single key frame of HEVC video, and animated images can use the same compression method as HEVC videos. As HEIF supports alpha channels for transparency, it's able to replace most of the widely used image formats available. Furthermore, HEIF also officially supports metadata formats such as Exif and XMP. All in all, HEIF was the best candidate for efficiently storing the back catalog of LINE Album images stored as JPEGs.

Let’s try using GPUs

Because of its more efficient compression, HEVC requires a large number of calculations. Depending on the implementation, it can require hundreds of times more calculations than JPEG. When we started developing Antman, there were already 20 PB of JPEG files in LINE Album's storage, and we needed a lot of servers to convert all of these files into HEIF in a short amount of time. We found a solution to this problem more easily than we thought: using equipment that fit the domain. We started looking at servers equipped with GPUs. With deep learning applied to many services in recent years, some people in the field tend to think that GPU-equipped servers are only useful for deep learning. However, GPUs are not only capable of hardware-accelerating deep learning; they also come with image and video hardware codecs. Fortunately, the GPUs we looked at supported the JPEG and HEVC codecs we needed, so we chose to adopt GPU-equipped servers without any hesitation.

All GPU codec interfaces are written in either C or C++. We developed our own JPEG ↔ HEIF conversion library in C and named it “Pym.” Just like how we named our image size conversion solution “Antman” after the Marvel superhero who can freely change the size of his own body, we named our JPEG ↔ HEIF conversion library “Pym” after the “Pym Particles” that are the source of Antman's power. Antman was developed as a Java Spring-based web server that provides a JPEG ↔ HEIF conversion API, which is served to OBS. To use Pym's capabilities from Antman, all of Pym's interfaces are integrated through JNA (Java Native Access). Antman processes not only new JPEGs being stored, but also all the JPEGs that accumulated over the six years LINE Album has been in service. Even with the GPUs, the high-density storage had trouble keeping up with the increased processing load, so we had to segment the files by the year they were created and then queue them for processing.
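As an illustration, here is a minimal sketch of what a JNA binding for a native conversion library can look like. The interface and function signatures below are hypothetical stand-ins; Pym's actual API is not public.

```java
import com.sun.jna.Library;
import com.sun.jna.Native;

// Hypothetical JNA binding for a native JPEG <-> HEIF conversion library.
// JNA maps the Java interface methods onto the exported C functions of
// libpym (loaded here by name), so the Spring server can call them directly.
public interface PymLibrary extends Library {
    PymLibrary INSTANCE = Native.load("pym", PymLibrary.class);

    // Hypothetical signatures: convert between JPEG and HEIF byte buffers.
    // dstLen is an in/out parameter (mapped to a C int*); the return value
    // is assumed to be 0 on success and a negative error code on failure.
    int pym_jpeg_to_heif(byte[] src, int srcLen, byte[] dst, int[] dstLen, int qualityFactor);
    int pym_heif_to_jpeg(byte[] src, int srcLen, byte[] dst, int[] dstLen);
}
```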

Even though Antman is successfully integrated into LINE Album today, the project went through many trials and tribulations during its development. In the rest of this post, I'd like to talk about some of these problems and how we solved them.

What should we do about devices that don’t support HEIF?

HEIF is a relatively new image format, and not all devices in use today support it. iOS has supported HEIF since iOS 11 (with its own customized version, HEIC, that supports distributed processing), and Android has supported the format since version 9.0. Meanwhile, most PCs running Windows do not support HEIF. As a workaround, when a device that doesn't support HEIF sends a download request, Antman converts the requested HEIF file back into JPEG and sends it in real time. However, this raises a question.

“How much should we compress this JPEG?”

If we set the quality of the restored JPEG too low, the small file size wouldn't cause much traffic, but the image quality would be poor, leading to a bad user experience. On the other hand, if we set the quality too high, we would keep the image quality but incur unnecessarily high traffic costs.

JPEG and HEIF both use lossy compression, which means quality is lost each time an image is converted from one to the other. The so-called “potato” images you might have seen on the internet are mostly images that have lost quality through repeated sharing, screencapping, or conversion. Just like you can't restore a crumpled-up piece of paper to its original form, an image converted with lossy compression can never have better quality than its original. This is what I meant by “unnecessarily high traffic costs”: past a certain point, extra bits only inflate the file without restoring any quality.

Creating DQT backups of original JPEGs

So how do you determine the amount of JPEG compression for an image being converted from HEIF? We came up with a simple and straightforward hypothesis: if we could record the original JPEG file's compression settings and reapply them when converting the HEIF back to JPEG, we could find the perfect balance between file size and image quality. The single piece of data that determines the amount of compression in a JPEG is the de-quantization matrix. A de-quantization matrix is comprised of 64 8-bit or 16-bit values matched to an 8×8 coefficient block. An image can use one matrix each for luminance and chrominance, or a single shared matrix. JPEGs store these matrices, along with which components they apply to, in a marker segment known as “DQT.” We chose to keep a backup of this DQT data inside the converted HEIF files so we could reference it when converting the HEIF back into a JPEG. We collected this data into a format of our own, the Pym header.
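To make the idea concrete, here is a simplified sketch of how DQT segments can be collected from a JPEG byte stream. This is not Pym's actual parser, just an illustration of the marker walk; real JPEG parsing handles more edge cases (fill bytes, multiple tables in one segment).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Collects the payloads of DQT (0xFFDB) marker segments from a JPEG stream.
public final class DqtBackup {
    public static List<byte[]> collectDqt(byte[] jpeg) {
        List<byte[]> tables = new ArrayList<>();
        int i = 2;                                  // skip SOI (0xFFD8)
        while (i + 4 <= jpeg.length) {
            if ((jpeg[i] & 0xFF) != 0xFF) break;    // not a marker: bail out
            int marker = jpeg[i + 1] & 0xFF;
            if (marker == 0xDA) break;              // SOS: entropy-coded data follows
            int len = ((jpeg[i + 2] & 0xFF) << 8) | (jpeg[i + 3] & 0xFF);
            if (marker == 0xDB) {
                // Keep everything after the marker and the 2-byte length,
                // i.e. the precision/ID byte(s) plus the table values.
                tables.add(Arrays.copyOfRange(jpeg, i + 4, i + 2 + len));
            }
            i += 2 + len;                           // length includes its own 2 bytes
        }
        return tables;
    }
}
```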

Pym headers begin with the 3-byte identifier “PYM,” and include the following information.

  • SAR (Sample Aspect Ratio)
    • SAR is one of the additional fields stored in JFIF (JPEG File Interchange Format) files, an extension of the JPEG format. It allows pixels to be stretched into ratios other than 1:1. For example, a 1080×1080 image with a 1:1 SAR is displayed as a square. However, if the SAR is set to 16:9, the same square image is stretched and displayed as a 16:9 image.
  • Original Image Resolution
    • Since resolutions are already defined by HEVC, it may seem unnecessary to store them separately. However, unlike JPEG, which can express any resolution down to the pixel, HEVC can only express resolutions in units of 2 pixels. Because of this, if you convert a JPEG with an odd-numbered resolution to HEIF and then back to JPEG, it won't keep its original resolution. For example, converting a 1920×1079 JPEG into HEIF produces a 1920×1080 image, and converting that HEIF back to JPEG produces a 1920×1080 JPEG. To get around this issue and maintain the original resolution, we store the original resolution of 1920×1079.
  • DQT
    • Pym stores the DQTs used for luminance and the other components of a JPEG. Everything excluding the JPEG's DQT marker and segment length is stored as-is. The topmost 4 bits determine the precision of the de-quantization table, with 0 representing 8-bit and 1 representing 16-bit precision. The next 4 bits are the table's ID, which is referenced when assigning the DQT each image component will use. Then the actual de-quantization table follows. With 64 coefficients in total, 8-bit precision results in 64 bytes of data, and 16-bit precision in 128 bytes.
  • DQT for chroma (optional)
    • JPEG can use any color format as long as the number of components is under 4; however, Antman requires that the original color format is YUV. In some JPEGs, the Y, U, and V components all share a single DQT, while in others a separate DQT is used for U and V. If the original file contains a separate DQT for U and V, that data is stored as well.

A Pym header containing the data above is embedded into the HEIF. HEIF was created by customizing the MP4 container used for video. MP4 has a box structure by default, with HEIF adding boxes for storing image-specific data. Among the many boxes used by MP4 there is a “free” box, which acts as padding between two boxes. The “free” box is used to preallocate space for boxes that can change in size when creating an MP4, and doesn't affect the image in any way. We put the Pym header into a “free” box appended to the end of the HEIF file. The stored Pym header is then extracted when converting the HEIF to JPEG and used as the settings for JPEG compression.
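Here is a rough sketch of the embedding step, under the assumption of the layout described above (a trailing “free” box whose payload starts with the “PYM” identifier); the exact byte layout of the real Pym header is internal to Antman.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Appends a Pym header to a HEIF file inside an MP4 "free" box:
// a 4-byte big-endian box size (header included), the 4-byte type "free",
// then the payload ("PYM" identifier followed by the header body).
public final class PymFreeBox {
    public static byte[] append(byte[] heif, byte[] pymHeaderBody) throws IOException {
        byte[] payload = concat("PYM".getBytes(StandardCharsets.US_ASCII), pymHeaderBody);
        ByteBuffer box = ByteBuffer.allocate(8 + payload.length);
        box.putInt(8 + payload.length);            // box size, big-endian by default
        box.put("free".getBytes(StandardCharsets.US_ASCII));
        box.put(payload);
        return concat(heif, box.array());
    }

    private static byte[] concat(byte[] a, byte[] b) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(a.length + b.length);
        out.write(a);
        out.write(b);
        return out.toByteArray();
    }
}
```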

Proving the hypothesis

If you recall, our hypothesis was that using the original JPEG's DQT when converting an HEIF file back to JPEG gives the optimal balance between file size and image quality. To prove this hypothesis, we ran a number of tests on selected samples. The test methods were as follows.

  • Group 1: Used DQT backup as-is when converting HEIF → JPEG
  • Group 2: Used DQT/2 when converting HEIF → JPEG
    • As DQT is the step size of quantization, a smaller number results in better image quality.

The blue line on the graph below shows the difference in file size between groups 1 and 2: group 1 had a 43% reduction in file size compared to group 2. The red bars represent the image quality difference between the original and groups 1 and 2. We used SSIM (Structural Similarity) as our IQA (Image Quality Assessment) algorithm, and found that most samples were of similar quality. Some samples showed an SSIM delta of 0.01, but considering that 1 is the maximum value, the difference was minuscule. In conclusion, we found that reducing the compression amount when converting an HEIF back to JPEG only increases the file size and does not improve the image quality.

To sum up, Antman provides JPEG conversion for devices that don’t support HEIF, and uses the original JPEG’s DQT to find the optimal balance between image quality and file size for the HEIF to JPEG conversion.

Have we properly converted JPEGs to HEIF?

Above, I explained our real-time JPEG conversion idea for devices that don't support HEIF. That leaves us with the task of converting JPEGs to HEIF. However, converting JPEGs to HEIF isn't the end. When we're done converting, we must delete the original JPEG files to actually free up storage space. And there lies the problem: once we delete the original JPEG, we have no way of reverting if something goes wrong with the HEIF. So we had to come up with a way to verify that the HEIF actually retains the original quality of the JPEG. When we say “verify,” we don't mean checking whether the image looks “good enough”; we verify that the converted HEIF retains even the lossiness of the original JPEG. While we would be more than willing to check each image with the human eye, as many as 1,000 JPEGs are uploaded to LINE Album each second. Not only would it be impossible to manually check each image, we shouldn't be doing that in the first place since the images are all personal user data. And that's why we had to develop a technology that can verify whether the converted image retains the quality of the original.

To check our conversion results, we compared the original JPEGs to their converted HEIF files. Following the typical IQA method, we compared the RAW images decoded from the respective JPEG and HEIF files. However, the hardware JPEG/HEVC codecs in the GPUs used by Antman are not 100% stable. In other words, even if an image compressed with Antman's hardware HEVC encoder can be read by Antman's hardware HEVC decoder, it doesn't mean that any HEVC decoder will be able to do the same. So we decided to use the CPU resources freed up by offloading work to the GPU to run reference codecs: we ran common software JPEG and HEVC decoders on the CPU and included the decoded RAW images in our comparison groups. The two RAW images decoded with software and hardware codecs from the JPEGs converted back from HEIF were also included. In summary, we set up a total of six RAW image comparison groups, and decided to use the group with the lowest IQA value as our standard.

We spent a lot of time choosing which IQA to use on the comparison groups. Since we knew there was no turning back once the original JPEGs were deleted, we tried many approaches and changed our methods several times.

PSNR & SSIM

The first IQA methods we chose were the most basic: PSNR (Peak Signal to Noise Ratio) and SSIM. PSNR is a traditional IQA method based on the simple deltas between each pixel. On the other hand, SSIM is an algorithm designed with the HVS (Human Visual System) model in mind. Unlike PSNR's simple approach, SSIM focuses on the relationship between the signal changes in small blocks inside an image. After taking a look at both algorithms, we found pros and cons to each. In the end, we decided to adopt PSNR for Antman. Considering what we were trying to do, PSNR was the more suitable choice: the goal of Antman is to seamlessly convert JPEGs to HEIF so that users don't notice any difference in image quality, and that includes preserving any lossiness of the original JPEG. That is why we decided against using a complex HVS-based IQA method. However, we ran into an unanticipated problem while running tests using PSNR: PSNR is calculated as an average over the entire image.
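For reference, these are the standard definitions for two W×H images x and y with 8-bit pixels; note how the sum over every pixel collapses into a single mean for the whole image:

```latex
\mathrm{MSE}(x, y) = \frac{1}{WH}\sum_{i=1}^{WH}(x_i - y_i)^2,
\qquad
\mathrm{PSNR}(x, y) = 10\log_{10}\frac{255^2}{\mathrm{MSE}(x, y)}
```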

Let’s take a look at the sample image below, which has a complex pattern on a simple background.

The PSNR between the two images above is approximately 50 dB, which in layman's terms means that the two images are basically identical. Even at a lower bitrate, the simple background doesn't contain complex signals, so it suffers almost no loss. However, the highlighted area contains a complex signal, and you can see that there is some loss there. In conclusion, PSNR, being based on average values, can be misleading and cannot properly detect lossiness in small areas.

Grid PSNR

To detect lossiness in small areas more easily, we tried using PSNR in a grid formation.

We put a grid over the image, calculated PSNR(N,M) for each tile, and used the lowest value as the representative value. With this grid PSNR method we were somewhat able to find localized lossiness, but it also gave us new problems to think about. One problem was that the representative PSNR value fluctuated too heavily depending on the size of the tiles. Another was that if the main source of lossiness sat on the border of two tiles, the lossiness was divided between them and couldn't be detected properly. It was ultimately impossible to find a one-size-fits-all tile structure.
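For illustration, here is a minimal CPU-side sketch of the grid approach, assuming two same-sized 8-bit grayscale planes; the actual computation runs on the GPU.

```java
// Grid PSNR sketch: split the image into tiles, compute PSNR per tile,
// and keep the worst (lowest) value as the representative.
public final class GridPsnr {
    public static double worstTilePsnr(byte[][] a, byte[][] b, int tile) {
        int h = a.length, w = a[0].length;
        double worst = Double.POSITIVE_INFINITY;
        for (int ty = 0; ty < h; ty += tile) {
            for (int tx = 0; tx < w; tx += tile) {
                int th = Math.min(tile, h - ty), tw = Math.min(tile, w - tx);
                double sum = 0;
                for (int y = ty; y < ty + th; y++) {
                    for (int x = tx; x < tx + tw; x++) {
                        double d = (a[y][x] & 0xFF) - (b[y][x] & 0xFF);
                        sum += d * d;
                    }
                }
                double mse = sum / (th * tw);
                // Identical tiles give MSE 0 (infinite PSNR); they can't be the worst.
                if (mse > 0) {
                    worst = Math.min(worst, 10 * Math.log10(255.0 * 255.0 / mse));
                }
            }
        }
        return worst;
    }
}
```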

Sliding-window PSNR

We came to two conclusions based on the problems we found while using Grid PSNR.

  • It’s easier to find local lossiness if the tile size is smaller.
  • Lossiness located where two tiles meet is difficult to detect, and a smaller tile size helps by letting the lossiness take up more of a single tile.

Based on these findings, we decided to adopt a “sliding window” PSNR method. Sliding-window PSNR proved to be quite useful for detecting localized lossy areas. With PSNR measurements as fine as a coffee filter, we were able to find every lossy area in an image.


It was then that we ran into another problem. Combing through the entire image with a small tile requires a very large number of PSNR calculations per image. For example, fully scanning a 1024×1024 image with an 8×8 tile using sliding-window PSNR requires 1,034,289 PSNR calculations. Reducing the number of pixels the tile shifts each time reduces the number of calculations, but you would still need multiple runs of PSNR checks. Furthermore, Pym's PSNR calculations use a CUDA (Compute Unified Device Architecture) function on the GPU, and CUDA functions take some time to load and release. All in all, it was very time-consuming to finish the PSNR calculations, needlessly loading and releasing the PSNR calculation function 1,034,289 times.
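For reference, the window count works out as follows: an 8×8 window sliding one pixel at a time has 1024 − 8 + 1 = 1017 valid positions in each dimension, so:

```latex
(1024 - 8 + 1)^2 = 1017^2 = 1{,}034{,}289
```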

Sliding-window MSE

We pressed on and reworked our sliding-window PSNR solution into a sliding-window MSE (Mean Squared Error) method with more optimized GPU usage. What you see below is the final version of the IQA algorithm applied to Antman.

If you look at the PSNR formula, you can see that it is merely a log-scale conversion of MSE. In other words, a PSNR threshold can be converted into an MSE threshold. Since we had already configured the PSNR threshold through numerous tests, we could adapt it to MSE as well. Thanks to that, we were able to implement a way to acquire the MSE of an image using the sliding-window method with relative ease.
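Concretely, for 8-bit images the relationship below holds, so a PSNR threshold carries over directly as an MSE threshold:

```latex
\mathrm{PSNR} = 10\log_{10}\frac{255^2}{\mathrm{MSE}}
\quad\Longleftrightarrow\quad
\mathrm{MSE} = 255^2 \cdot 10^{-\mathrm{PSNR}/10}
```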

First, we extract the absolute difference between the images of each comparison pair. Then, computing the dot product of the extracted values with themselves yields exactly the squared-difference terms inside the ∑ of the MSE formula. Lastly, by applying a 2-D mean filter with a K×K kernel, we arrive at the formula for sliding-window MSE.
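Below is a minimal CPU-side sketch of the same idea, assuming two same-sized 8-bit grayscale planes. It squares the per-pixel differences once and builds an integral image so every K×K window mean costs O(1); the real implementation expresses this on the GPU as a dot product followed by a 2-D mean filter.

```java
// Sliding-window MSE sketch: square the per-pixel differences, build a
// summed-area table over them, then read off the mean of every K×K window
// and keep the worst (highest) window MSE.
public final class SlidingWindowMse {
    public static double worstWindowMse(byte[][] a, byte[][] b, int k) {
        int h = a.length, w = a[0].length;
        // sat[y][x] holds the sum of squared differences over [0,y) × [0,x).
        double[][] sat = new double[h + 1][w + 1];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                double d = (a[y][x] & 0xFF) - (b[y][x] & 0xFF);
                sat[y + 1][x + 1] = d * d + sat[y][x + 1] + sat[y + 1][x] - sat[y][x];
            }
        }
        double worst = 0;
        for (int y = 0; y + k <= h; y++) {          // window slides one pixel at a time
            for (int x = 0; x + k <= w; x++) {
                double sum = sat[y + k][x + k] - sat[y][x + k] - sat[y + k][x] + sat[y][x];
                worst = Math.max(worst, sum / (k * k));
            }
        }
        return worst;
    }
}
```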

Optimizing compression

After developing an algorithm for detecting lossiness, we finally had time to actively look into ways to improve compression. The method we came up with was simple: we tried different quality factors (larger numbers mean lower quality) in the HEVC encoder, repeating the HEIF conversion and IQA process, and chose the highest quality factor that didn't cause lossiness for our final HEIF result.

Antman keeps statistics on the selected quality factors and uses them as the initial quality factor for the next image, minimizing the number of attempts required. The optimal quality factor is then found by gradually adjusting from that initial value. With this approach, we were able to surpass our initial storage reduction goal of 50% and achieve a 60% reduction.
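A sketch of that search loop is shown below. encodeHeif() and passesIqa() are hypothetical stand-ins for the Pym encoder and the sliding-window MSE check; the bounds and step size are illustrative.

```java
// Searches for the highest quality factor (QF) whose HEIF output still
// passes the IQA check, seeded with the statistically likely QF.
public final class QualityFactorSearch {
    public static byte[] convert(byte[] jpeg, int initialQf, int minQf, int maxQf, Codec codec) {
        int qf = initialQf;                          // seed from recent statistics
        byte[] heif = codec.encodeHeif(jpeg, qf);
        if (codec.passesIqa(jpeg, heif)) {
            // No detected loss: try higher QFs (smaller files) while the check passes.
            while (qf < maxQf) {
                byte[] next = codec.encodeHeif(jpeg, qf + 1);
                if (!codec.passesIqa(jpeg, next)) break;
                heif = next;
                qf++;
            }
            return heif;
        }
        // Detected loss: back off toward better quality until the check passes.
        while (qf > minQf) {
            heif = codec.encodeHeif(jpeg, --qf);
            if (codec.passesIqa(jpeg, heif)) break;
        }
        return heif;
    }

    // Hypothetical hooks standing in for the Pym encoder and the IQA check.
    public interface Codec {
        byte[] encodeHeif(byte[] jpeg, int qualityFactor);
        boolean passesIqa(byte[] originalJpeg, byte[] heif);
    }
}
```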

Infrastructure costs of Antman = 0

We adopted GPUs to handle the large number of calculations that HEVC requires. However, the GPUs used in these servers cost almost as much as a typical web server itself. Even if we managed to save on storage, investing in a hefty number of GPU-equipped servers wouldn't really cut our infrastructure costs. So we decided to make use of idle resources as much as we could. We were already using GPUs in the video transcoder on OBS (or, as we call it, Licoder), and OBS saw a massive reduction in infrastructure costs after moving transcoding to GPUs. (By the way, this will be a topic for an upcoming blog post.)

LINE has to handle about 3 times the usual traffic at the start of every year, and to make sure everything goes smoothly, we need equipment that can handle the increased load. Because of this, Licoder needs to have GPU-equipped servers ready even if they're not usually needed. While these high-spec servers come in handy when a hotly discussed topic spikes traffic, keeping these expensive GPUs around just in case is a huge waste. That's why we decided to use them for Antman. The GPUs now handle both Licoder and Antman instances: Antman continuously monitors Licoder requests and only enables its own instance when the Licoder instance is idle.
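As a rough sketch, the sharing policy can be pictured as a worker loop like the one below; the monitoring hook and queue are hypothetical stand-ins for Antman's actual integration with Licoder.

```java
import java.util.Optional;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the GPU-sharing policy: the Antman worker only
// pulls conversion work when the Licoder instance on the same GPU is idle.
public final class IdleGpuWorker implements Runnable {
    private final LicoderMonitor monitor;
    private final ConversionQueue queue;

    public IdleGpuWorker(LicoderMonitor monitor, ConversionQueue queue) {
        this.monitor = monitor;
        this.queue = queue;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                if (monitor.licoderIsBusy()) {
                    TimeUnit.SECONDS.sleep(5);   // yield the GPU to video transcoding
                    continue;
                }
                queue.takeNextJpeg().ifPresent(ConversionJob::convertToHeif);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Stand-ins for the actual monitoring and queueing hooks.
    public interface LicoderMonitor { boolean licoderIsBusy(); }
    public interface ConversionQueue { Optional<ConversionJob> takeNextJpeg(); }
    public interface ConversionJob { void convertToHeif(); }
}
```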

By using Licoder GPUs that would've otherwise sat idle, Antman saved infrastructure costs without requiring any additional costs of its own. As an added bonus, we could smile happily knowing that all of our GPUs were working around the clock instead of being wasted.

Applying Antman on LINE Album!

In the first half of 2019, LINE Album storage usage showed a steady increase of 0.8 PB each month. If it weren't for Antman, we estimate we would've needed at least 9 PB more, for a total of 34 PB. Antman has been in use on LINE Album since the end of May 2019, and we've been deleting the original JPEGs since July. Deleting these originals resulted in the steadily decreasing storage usage curve that you see above. Total usage at the end of December 2019 was 24 PB. We've managed to reduce usage by 10 PB so far, and Antman is still processing six years' worth of data as of this moment. We predict that this process will continue until mid-2020, a full year after Antman was first put into action.

Closing words

Antman was made possible by our team members coming up with ideas while sharing a beer. What started off as a simple idea to free up storage became much more over time. Dealing with a high-risk project that would permanently alter user-created data came with many headaches, and many sleepless nights were spent even after we rolled Antman out to LINE Album, just because we thought we might have missed something crucial. We felt very accomplished in the end, knowing that we were freeing up storage every day without a hitch. We even jokingly ask ourselves if we should go for another drink whenever we run out of good ideas.

Antman is only the beginning of the many media format transformations in store for LINE services. LINE services are spread across multiple countries and various devices, and keeping all of our platforms in sync is no easy task. We hope that the introduction of HEIF through Antman acts as a stepping stone for more improvements to come.

Lastly, I would like to thank Seunghoon Baek for sticking with me through thick and thin while developing Antman. And I would also like to give my thanks to Woochul Shin for integrating Antman into OBS.