Conditional Probability Models for Deep Image Compression

Year: Jan 2018
Authors: Fabian Mentzer, Eirikur Agustsson, Michael Tschannen, Radu Timofte, Luc Van Gool
Affiliations: ETH Zurich

Recently, DNNs trained as image auto-encoders have led to promising results in image compression. A key challenge in training such systems is optimizing the bitrate \(R\) of the latent image representation in the auto-encoder. After discretization, we measure the bitrate \(R\) by the entropy \(H\) of the resulting symbols. Since discretization is non-differentiable, this poses a challenge for gradient-based optimization methods.

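In this paper, the authors propose to model the entropy term in the optimization directly with a context model: a 3D-CNN that learns a conditional probability model of the latent symbols and is updated concurrently with the auto-encoder during training. Experiments show that this approach yields state-of-the-art (SOTA) results when measured in MS-SSIM.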
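To see why conditioning on context can lower the rate, the sketch below compares the ideal code length of a symbol sequence under an i.i.d. marginal model with its code length under a simple context model. The count-based previous-symbol model with add-one smoothing is a swapped-in toy, not the paper's 3D-CNN; `iid_bits` and `context_bits` are hypothetical names. Both models are fitted and evaluated on the same sequence, and their parameters are assumed to be known to encoder and decoder, so their cost is ignored.

```python
import math
from collections import Counter, defaultdict


def iid_bits(seq):
    """Ideal code length (bits) if each symbol is coded with its marginal frequency."""
    counts = Counter(seq)
    n = len(seq)
    return sum(-math.log2(counts[s] / n) for s in seq)


def context_bits(seq, alphabet_size):
    """Ideal code length (bits) under a toy context model P(s_i | s_{i-1}).

    Conditional probabilities come from bigram counts with add-one smoothing;
    the first symbol has no context and is coded uniformly.
    """
    pair_counts = defaultdict(Counter)
    for prev, cur in zip(seq, seq[1:]):
        pair_counts[prev][cur] += 1
    bits = math.log2(alphabet_size)  # cost of the first, context-free symbol
    for prev, cur in zip(seq, seq[1:]):
        row = pair_counts[prev]
        p = (row[cur] + 1) / (sum(row.values()) + alphabet_size)
        bits += -math.log2(p)
    return bits


seq = [0, 1] * 16            # strongly dependent: each symbol predicts the next
print(iid_bits(seq))         # 32.0: both symbols have marginal probability 1/2
print(context_bits(seq, 2))  # about 3.6 bits
```

The marginal model cannot exploit the alternating pattern, while the context model assigns high probability to each next symbol; the paper's context model plays the same role for the 3D volume of latent symbols.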

Method Overview

Given a set of training images \(\mathcal{X}\), we wish to learn a compression system which consists of an encoder, a quantizer, and a decoder. The encoder \(E: \mathbb{R}^d \to \mathbb{R}^m\) maps an image \(\mathbf{x}\) to a latent representation \(\mathbf{z} = E(\mathbf{x})\). The quantizer \(Q: \mathbb{R} \to \mathcal{C}\) discretizes the coordinates of \(\mathbf{z}\) to \(L = \lvert \mathcal{C} \rvert\) centers, obtaining \(\hat{\mathbf{z}}\) with \(\hat{z}_i = Q(z_i) \in \mathcal{C}\). The decoder \(D\) then forms the reconstructed image \(\hat{\mathbf{x}} = D(\hat{\mathbf{z}})\).
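As a concrete sketch of this notation, the Python toy below runs one input through \(E\), \(Q\), and \(D\). The functions `encode`, `quantize`, `decode` and the list `CENTERS` are hypothetical stand-ins (a fixed pairwise average and a repeat, with \(d = 6\), \(m = 3\), \(L = 3\)), not the learned CNN auto-encoder from the paper.

```python
# Toy pipeline x -> z = E(x) -> z_hat = Q(z) -> x_hat = D(z_hat).
# encode/decode are hypothetical fixed maps standing in for the learned CNN.

CENTERS = [-1.0, 0.0, 1.0]  # hypothetical C, so L = |C| = 3


def encode(x):
    """E: R^d -> R^m with m = d // 2: average each pair of neighbors."""
    return [(x[k] + x[k + 1]) / 2 for k in range(0, len(x) - 1, 2)]


def quantize(z, centers=CENTERS):
    """Q applied to every coordinate: pick the nearest center in C."""
    return [min(centers, key=lambda c: abs(z_i - c)) for z_i in z]


def decode(z_hat):
    """D: R^m -> R^d: repeat each quantized value twice."""
    return [v for v in z_hat for _ in range(2)]


x = [0.9, 1.3, -0.2, 0.1, -0.8, -1.4]   # d = 6
z = encode(x)                           # roughly [1.1, -0.05, -1.1], m = 3
z_hat = quantize(z)                     # [1.0, 0.0, -1.0], each entry in C
x_hat = decode(z_hat)                   # [1.0, 1.0, 0.0, 0.0, -1.0, -1.0]
print(z_hat, x_hat)
```

Only \(\hat{\mathbf{z}}\) would be entropy-coded and transmitted; the decoder sees nothing else.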

We want the encoded representation \(\hat{\mathbf{z}}\) to be compact (low entropy), while at the same time keeping the distortion \(d(\mathbf{x}, \hat{\mathbf{x}})\) small. This leads to minimizing the rate-distortion trade-off

\[d(\mathbf{x}, \hat{\mathbf{x}}) + \beta H(\hat{\mathbf{z}}),\]

where \(\beta > 0\) controls how strongly the rate is penalized relative to the distortion.
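A minimal sketch of the trade-off, assuming MSE as \(d\) and the empirical entropy of \(\hat{\mathbf{z}}\) (in bits per symbol) as a stand-in for \(H\); in the paper, \(H\) is estimated with the context model instead. The two candidates are hypothetical: a lossless "fine" code with four distinct symbols and a "coarse" code that maps every coordinate to one symbol.

```python
import math
from collections import Counter


def mse(x, x_hat):
    """Distortion d(x, x_hat): mean squared error (chosen here for illustration)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def entropy_bits(symbols):
    """Empirical entropy of the symbols in bits per symbol (stand-in for H)."""
    counts = Counter(symbols)
    n = len(symbols)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def rd_objective(x, x_hat, z_hat, beta):
    """d(x, x_hat) + beta * H(z_hat)."""
    return mse(x, x_hat) + beta * entropy_bits(z_hat)


x = [0.0, 1.0, 2.0, 3.0]
# Hypothetical candidates: (quantized symbols z_hat, reconstruction x_hat).
candidates = {
    "fine": ([0, 1, 2, 3], [0.0, 1.0, 2.0, 3.0]),    # H = 2 bits, d = 0
    "coarse": ([0, 0, 0, 0], [1.5, 1.5, 1.5, 1.5]),  # H = 0 bits, d = 1.25
}


def pick(beta):
    """Return the candidate with the lowest rate-distortion objective."""
    return min(candidates, key=lambda k: rd_objective(x, candidates[k][1], candidates[k][0], beta))


print(pick(0.1))  # fine
print(pick(1.0))  # coarse
```

With \(\beta = 0.1\) the fine code costs \(0 + 0.1 \cdot 2 = 0.2\) against 1.25, so it wins; with \(\beta = 1\) its rate term grows to 2.0 and the coarse code wins.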

This system is realized by modeling \(E\) and \(D\) as the encoder and decoder of a CNN auto-encoder.

Quantization

Given centers \(\mathcal{C} = \{c_1, \dots, c_L\} \subset \mathbb{R}\), we use nearest-neighbor assignment in the forward pass to compute

\[\hat{z}_i = Q(z_i) = \arg\min_{c_j \in \mathcal{C}} \lVert z_i - c_j \rVert,\]

but rely on (differentiable) soft quantization

\[\tilde{z}_i = \sum_{j=1}^L \frac{\exp(-\sigma \lVert z_i - c_j \rVert)}{\sum_{l=1}^L \exp(-\sigma \lVert z_i - c_l \rVert)}\, c_j\]

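to compute gradients during the backward pass. Here \(\sigma > 0\) controls the softness of the assignment: as \(\sigma \to \infty\), \(\tilde{z}_i\) approaches the hard assignment \(\hat{z}_i\).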
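The following sketch contrasts the two quantizers on a single coordinate, assuming hypothetical centers \(\{-1, 0, 1\}\); `hard_quantize`, `soft_quantize`, and `slope` are illustrative names. The hard quantizer is piecewise constant, so its derivative is zero almost everywhere, while the soft quantizer has a usable slope. A central finite difference stands in for backpropagation. The softmax subtracts the minimum distance before exponentiating, which leaves the weights unchanged but avoids dividing by zero when every exponential underflows for large \(\sigma\).

```python
import math

CENTERS = [-1.0, 0.0, 1.0]  # hypothetical centers c_1..c_L


def hard_quantize(z, centers=CENTERS):
    """Nearest-neighbor assignment Q(z), used in the forward pass."""
    return min(centers, key=lambda c: abs(z - c))


def soft_quantize(z, sigma, centers=CENTERS):
    """Softmax-weighted average of the centers with weights exp(-sigma * |z - c_j|)."""
    dists = [abs(z - c) for c in centers]
    d_min = min(dists)
    # Shifting by d_min rescales every weight by the same factor exp(sigma * d_min).
    weights = [math.exp(-sigma * (d - d_min)) for d in dists]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, centers)) / total


def slope(f, z, eps=1e-4):
    """Central finite-difference estimate of df/dz."""
    return (f(z + eps) - f(z - eps)) / (2 * eps)


z = 0.3
print(hard_quantize(z))         # 0.0
print(slope(hard_quantize, z))  # 0.0: piecewise constant, no gradient signal
for sigma in (1.0, 10.0, 100.0):
    soft = lambda t, s=sigma: soft_quantize(t, s)
    print(sigma, soft(z), slope(soft, z))
```

For this \(z\), increasing \(\sigma\) moves the soft value toward the hard value 0.0 while its slope shrinks, which is the tension in choosing \(\sigma\). In an autodiff framework, one common way to use \(\hat{z}_i\) in the forward pass and \(\tilde{z}_i\) for gradients is the stop-gradient identity \(\tilde{z}_i + \mathrm{sg}(\hat{z}_i - \tilde{z}_i)\), where \(\mathrm{sg}\) blocks gradients.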