Conditional Probability Models for Deep Image Compression
Recently, DNNs trained as image auto-encoders have led to promising results in image compression. A key challenge in training such systems is optimizing the bitrate \(R\) of the latent image representation in the auto-encoder. After discretization, the bitrate \(R\) is measured by the entropy \(H\) of the resulting symbols. Since discretization is non-differentiable, this complicates gradient-based optimization.
In this paper, the authors propose a new method that uses context models as the entropy term in the optimization. Experiments show that this approach yields state-of-the-art results as measured by MS-SSIM.
Method Overview
Given a set of training images \(\mathcal{X}\), we wish to learn a compression system which consists of an encoder, a quantizer, and a decoder. The encoder \(E: \mathbb{R}^d \to \mathbb{R}^m\) maps an image \(\mathbf{x}\) to a latent representation \(\mathbf{z} = E(\mathbf{x})\). The quantizer \(Q: \mathbb{R} \to \mathcal{C}\) discretizes the coordinates of \(\mathbf{z}\) to \(L = \lvert \mathcal{C} \rvert\) centers, obtaining \(\hat{\mathbf{z}}\) with \(\hat{z}_i = Q(z_i) \in \mathcal{C}\). The decoder \(D\) then forms the reconstructed image \(\hat{\mathbf{x}} = D(\hat{\mathbf{z}})\).
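We want the encoded representation \(\hat{\mathbf{z}}\) to be compact, while at the same time we want the distortion \(d(\mathbf{x}, \hat{\mathbf{x}})\) to be small. This results in the rate-distortion trade-off
\[ d(\mathbf{x}, \hat{\mathbf{x}}) + \beta H(\hat{\mathbf{z}}), \]
where \(\beta > 0\) weights the rate term \(H(\hat{\mathbf{z}})\) against the distortion.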
This system is realized by modeling \(E\) and \(D\) as the encoder and decoder of a CNN auto-encoder.
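To make the trade-off concrete, the following is a minimal sketch of evaluating a rate-distortion objective on already-quantized symbols. It rests on several assumptions not taken from the text: mean squared error stands in for the distortion \(d\), the entropy \(H\) is estimated from a plain symbol histogram (the paper instead uses a learned context model), and the names `empirical_entropy`, `mse`, `rate_distortion`, and the weight `beta` are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Entropy, in bits per symbol, of the empirical symbol distribution.

    Assumption: a histogram estimate stands in for the entropy term H.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())


def mse(x, x_hat):
    """Mean squared error, used here as an assumed distortion d(x, x_hat)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rate_distortion(x, x_hat, z_hat, beta):
    """Distortion plus beta times the estimated rate of the symbols z_hat."""
    return mse(x, x_hat) + beta * empirical_entropy(z_hat)


# Two equally likely symbols cost 1 bit per symbol.
rate = empirical_entropy([0, 0, 1, 1])
loss = rate_distortion([0.0, 1.0], [0.0, 0.5], [0, 0, 1, 1], beta=0.5)
```

A larger `beta` penalizes the rate more heavily, pushing toward fewer distinct symbols at the cost of higher distortion.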
Quantization
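Given centers \(\mathcal{C} = \{c_1, \dots, c_L\} \subset \mathbb{R}\), we use nearest-neighbor assignments to compute
\[ \hat{z}_i = Q(z_i) := \arg\min_{c_j \in \mathcal{C}} \lVert z_i - c_j \rVert, \]
but rely on (differentiable) soft quantization with a softness parameter \(\sigma > 0\),
\[ \tilde{z}_i = \sum_{j=1}^{L} \frac{\exp(-\sigma \lVert z_i - c_j \rVert)}{\sum_{l=1}^{L} \exp(-\sigma \lVert z_i - c_l \rVert)} \, c_j, \]
to compute gradients during the backward pass.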
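The two quantizers can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the function names are hypothetical, and the soft version is assumed to be a softmax over negative scaled distances to the centers, with a hypothetical sharpness parameter `sigma`.

```python
import math


def hard_quantize(z, centers):
    """Nearest-neighbor assignment: map each coordinate to its closest center."""
    return [min(centers, key=lambda c: abs(zi - c)) for zi in z]


def soft_quantize(z, centers, sigma=1.0):
    """Differentiable relaxation: softmax-weighted average of the centers.

    Assumption: weights are proportional to exp(-sigma * |z_i - c_j|).
    """
    out = []
    for zi in z:
        logits = [-sigma * abs(zi - c) for c in centers]
        m = max(logits)  # subtract the max logit for numerical stability
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        out.append(sum(w * c for w, c in zip(weights, centers)) / total)
    return out


centers = [-1.0, 0.0, 1.0]
z = [0.2, 0.9, -1.4]
z_hard = hard_quantize(z, centers)              # [0.0, 1.0, -1.0]
z_soft = soft_quantize(z, centers, sigma=50.0)  # approaches z_hard as sigma grows
```

In an autodiff framework, the pairing of hard values in the forward pass with soft gradients in the backward pass is commonly written as `z_soft + stop_gradient(z_hard - z_soft)`. A small `sigma` gives smoother gradients but a looser approximation of the hard assignment.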