
Pre-trained GAN Model To Super-resolution Technology


GAN (Generative Adversarial Network)

3 main points
✔️ Super-resolution using a pre-trained GAN model
✔️ Demonstrates good quality results with 64x super-resolution
✔️ Demonstrates the potential of pre-trained GAN models to be applied to a variety of tasks

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution
written by Kelvin C.K. Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, Chen Change Loy
(Submitted on 1 Dec 2020)
Comments: Published on arxiv.

Subjects: Computer Vision and Pattern Recognition (cs.CV)

code:  

The images used in this article are from the paper, the introductory slides, or were created based on them.

Introduction

GANs are used not only for image generation and image editing; there is also research (https://ai-scholar.tech/articles/gan/fewshotpartsegmentation) that repurposes their internal representations for other computer vision tasks.

In this article, we introduce a work that tackles super-resolution at high magnification factors (8x to 64x), which is usually difficult, by exploiting the knowledge of a pre-trained GAN model. The proposed method, GLEAN, successfully achieves high-magnification super-resolution such as the following examples.

The proposed method (GLEAN)

The architecture of the proposed method, Generative LatEnt bANk (GLEAN), is represented in the following figure.

This figure shows an example of super-resolution in which the input image is 32x32 and the output image is 256x256.

About Encoder

First, an RRDBNet (corresponding to $E_0$ in the figure) extracts the feature $f_0$ from the input low-resolution (LR) image.

Next, convolutions are applied repeatedly to obtain features at progressively lower resolutions:

$f_i = E_i(f_{i-1}), \quad i \in \{1, \dots, N\}$

Here, each $E_i$ is a stack of a stride-2 and a stride-1 convolution. From the lowest-resolution feature, we then obtain a matrix $C$ whose columns are the StyleGAN latent vectors $c_i$:

$C=E_{N+1}(f_N)$

These features and latent vectors are fed into the Generative Latent Bank, which is built on a pre-trained StyleGAN.
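The encoder's shape bookkeeping can be sketched as follows. This is a minimal stand-in, not the authors' code: `EncoderSketch` and all layer widths are illustrative assumptions, and a plain convolution stands in for RRDBNet.

```python
import torch
import torch.nn as nn

# Illustrative sketch of GLEAN's encoder (layer sizes are assumptions,
# not the paper's exact configuration).
class EncoderSketch(nn.Module):
    def __init__(self, channels=64, num_latents=14, latent_dim=512, num_blocks=3):
        super().__init__()
        # E_0 stands in for RRDBNet: extracts f_0 at the input resolution.
        self.e0 = nn.Conv2d(3, channels, 3, stride=1, padding=1)
        # E_1..E_N: each halves the spatial resolution (stride-2 then stride-1 conv).
        self.blocks = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, stride=2, padding=1),
                nn.Conv2d(channels, channels, 3, stride=1, padding=1),
            )
            for _ in range(num_blocks)
        ])
        # E_{N+1}: maps the lowest-resolution feature to the latent matrix C,
        # whose columns c_i feed the StyleGAN blocks.
        self.to_latent = nn.Linear(channels * 4 * 4, num_latents * latent_dim)
        self.num_latents, self.latent_dim = num_latents, latent_dim

    def forward(self, lr_image):
        f = self.e0(lr_image)
        feats = [f]                       # f_0, ..., f_N at decreasing resolutions
        for block in self.blocks:
            f = block(f)                  # f_i = E_i(f_{i-1})
            feats.append(f)
        c = self.to_latent(f.flatten(1))  # C = E_{N+1}(f_N)
        return feats, c.view(-1, self.num_latents, self.latent_dim)

x = torch.randn(1, 3, 32, 32)             # 32x32 LR input, as in the figure
feats, C = EncoderSketch()(x)
print([f.shape[-1] for f in feats], C.shape)
```

With a 32x32 input and three downsampling blocks, the feature resolutions shrink 32 → 16 → 8 → 4, and the latent matrix has one 512-dimensional vector per StyleGAN block.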

About Generative Latent Bank

To draw prior knowledge about images from the pre-trained StyleGAN, the following three modifications turn it into the Generative Latent Bank.

  • For each of the blocks $S_0, \dots, S_{K-1}$, give block $S_i$ its own latent vector $c_i$ as input.
  • To condition the bank on image features in addition to the latent vectors, additional convolutions are applied, yielding the fused features $g_i$.

  • Instead of generating a high-resolution image directly with the StyleGAN generator, the features $g_i$ from the Latent Bank and the features from the encoder are passed to a decoder, which better fuses the two.

In short, the idea of the Generative Latent Bank is to extract knowledge useful for super-resolution while adding only minimal modifications and a few extra convolution layers to StyleGAN.
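The three modifications can be sketched with a toy block like the one below. This is a hypothetical stand-in: a real Latent Bank block reuses frozen pre-trained StyleGAN weights, whereas here `LatentBankBlockSketch` and its layers are purely illustrative.

```python
import torch
import torch.nn as nn

# Toy stand-in for one Latent Bank block S_i: it receives its own latent
# vector c_i and is additionally conditioned on an encoder feature via an
# extra fusion convolution. A real implementation would reuse frozen
# pre-trained StyleGAN layers here.
class LatentBankBlockSketch(nn.Module):
    def __init__(self, channels=64, latent_dim=512):
        super().__init__()
        self.style = nn.Linear(latent_dim, channels)                 # per-block latent input
        self.fuse = nn.Conv2d(channels * 2, channels, 3, padding=1)  # extra conv for conditioning

    def forward(self, g_prev, c_i, f_enc):
        # Modulate the incoming feature with this block's own latent vector c_i...
        g = g_prev * self.style(c_i)[:, :, None, None]
        # ...then fuse in the encoder feature instead of running StyleGAN alone.
        g = self.fuse(torch.cat([g, f_enc], dim=1))
        # The fused feature g_i is upsampled and also handed to the decoder.
        return nn.functional.interpolate(g, scale_factor=2)

g = torch.randn(1, 64, 8, 8)       # feature from the previous block
c = torch.randn(1, 512)            # this block's latent vector c_i
f = torch.randn(1, 64, 8, 8)       # encoder feature at the same resolution
g_next = LatentBankBlockSketch()(g, c, f)
print(g_next.shape)
```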

About Decoder

The decoder consists of 3x3 convolutions $D_i$; each output $d_i$ is obtained by applying $D_i$ to the concatenation of the previous decoder output $d_{i-1}$ and the corresponding Latent Bank feature.
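A single decoder step can be sketched as below; the channel widths are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

# Sketch of one decoder step: a 3x3 convolution D_i fuses the previous
# decoder output d_{i-1} with the corresponding Latent Bank feature
# (concatenated along the channel dimension).
channels = 64
D_i = nn.Conv2d(channels * 2, channels, 3, padding=1)

d_prev = torch.randn(1, channels, 64, 64)   # previous decoder output d_{i-1}
g_bank = torch.randn(1, channels, 64, 64)   # feature passed in from the Latent Bank
d_i = D_i(torch.cat([d_prev, g_bank], dim=1))
print(d_i.shape)
```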

Training uses the standard L2 loss, a perceptual loss, and an adversarial loss. This loss setting follows the existing work ESRGAN, the main difference being the introduction of the pre-trained StyleGAN.
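Schematically, the three loss terms combine as follows. The weights and the `vgg_features`/`discriminator` stand-ins are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

# Schematic ESRGAN-style training loss: pixel-wise L2 + perceptual + adversarial.
# `vgg_features` and `discriminator` are hypothetical stand-ins for a pre-trained
# VGG feature extractor and the GAN discriminator; the weights are illustrative.
def glean_loss_sketch(sr, hr, vgg_features, discriminator,
                      w_pix=1.0, w_percep=1e-2, w_adv=5e-3):
    mse = nn.MSELoss()
    l_pix = mse(sr, hr)                                  # standard L2 loss
    l_percep = mse(vgg_features(sr), vgg_features(hr))   # perceptual loss
    # Non-saturating adversarial loss on the generator side.
    l_adv = torch.nn.functional.softplus(-discriminator(sr)).mean()
    return w_pix * l_pix + w_percep * l_percep + w_adv * l_adv

# Tiny smoke test with dummy networks standing in for VGG and the discriminator.
vgg = nn.Conv2d(3, 8, 3, padding=1)
disc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 1))
sr, hr = torch.randn(1, 3, 16, 16), torch.randn(1, 3, 16, 16)
loss = glean_loss_sketch(sr, hr, vgg, disc)
print(loss.item())
```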

Experimental Results

In our experiments, we use pre-trained StyleGAN or StyleGAN2.

Qualitative Comparison

To begin with, the results of comparison with existing methods in 16x super-resolution are as follows.

In general, the existing methods fail in terms of identity preservation, artifacts, texture, and detail, while the proposed method, GLEAN, succeeds in producing good-quality images. The results for further increasing the magnification factor are as follows.

Even in the challenging 64x setting, GLEAN produces good-quality images close to the Ground Truth.

Robustness to poses and content

As the following figure shows, the proposed method produces good images even when the subjects are not limited to frontal human faces.

While the existing method, PULSE, fails to generate non-human images and non-frontal images, the proposed method shows good results. Also, the results when applied to non-human animals and landscapes are as follows.

Even in this case, the proposed method shows good results, indicating that it is robust to content and pose.

Quantitative Comparison

For a quantitative comparison, we computed the cosine similarity to Ground Truth on the ArcFace embedding space for 100 images extracted from CelebA-HQ and the results are presented in the table below.
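The metric itself is just cosine similarity between embedding vectors; a minimal sketch (with dummy vectors in place of real ArcFace embeddings):

```python
import numpy as np

# Cosine similarity between two embedding vectors. In the paper this is
# computed on ArcFace face embeddings; the vectors below are dummies.
def cosine_similarity(a, b):
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb_sr = np.array([0.6, 0.8, 0.0])   # embedding of the super-resolved image
emb_gt = np.array([0.6, 0.8, 0.0])   # embedding of the Ground Truth image
print(cosine_similarity(emb_sr, emb_gt))  # identical embeddings -> 1.0
```

A similarity near 1 means the super-resolved face preserves the identity of the Ground Truth.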

The results for the different categories are as follows (average PSNR/LPIPS for 100 images is measured).
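For reference, PSNR (one of the two reported metrics) has a simple closed form; a minimal sketch for images with pixel values in [0, 1]:

```python
import numpy as np

# Peak signal-to-noise ratio between two images with values in [0, max_val].
def psnr(img1, img2, max_val=1.0):
    mse = np.mean((np.asarray(img1, float) - np.asarray(img2, float)) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

a = np.zeros((4, 4))
b = np.full((4, 4), 0.1)   # constant error of 0.1 -> MSE = 0.01
print(psnr(a, b))          # 10 * log10(1 / 0.01) = 20.0 dB
```

Higher PSNR means lower pixel-wise error; LPIPS, in contrast, is a learned perceptual distance where lower is better.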

The proposed method shows the best results for all the categories except Bedroom, which shows its superiority over the existing methods.

Ablation Study

About the encoder

In the proposed method, the Latent Bank is fed with multi-resolution features generated from the encoder.

The results of decreasing the number of features are shown below.

The more features are given, the higher the fidelity of the generated image to the original and the better its quality, demonstrating the effectiveness of this design.

About Latent Bank

Next, the results of decreasing the number of features used from the Latent Bank are shown below.

If the information is not available from the pre-trained GAN model, the network has to generate both the structure and texture of the image at the same time, and it does not perform well on either.

On the other hand, receiving information about these structures and textures from the Latent Bank allows us to obtain better results for both.

About the decoder

The results without using the decoder are as follows.

Without the decoder (w/o decoder), the overall image still looks plausible, but unnatural artifacts appear when zooming in.

Comparison with reference-based methods

The results of comparing the proposed method with the reference-based super-resolution methods SRNTT and DFDNet are as follows.

Existing methods improve the quality of image restoration by using a dictionary of images, but they do not work well on parts of the image that are not in the dictionary (e.g., skin and hair), and they cannot reproduce fine textures.

On the other hand, the proposed method succeeds in super-resolution with better quality than the existing methods without complicated procedures such as searching images in the dictionary.

Application to image retouching

Beyond super-resolution, the proposed method can also be applied to image retouching, as shown in the following figure.

In this figure, applying the proposed method to an image containing blurred regions successfully removes the unnatural artifacts (Retouched).

Thus, the proposed method has the potential to be applied to tasks other than super-resolution.

Summary

The proposed method, GLEAN, shows good results for super-resolution up to 64 times by using pre-trained GAN models such as StyleGAN.

The approach could be extended to a variety of image tasks, such as denoising, and this research highlights the broader possibility of transferring pre-trained GAN models to other tasks.
