Numerous Image Conversions Are Possible! The New Encoder Of StyleGAN! Pixel2Style2pixel

GAN (Generative Adversarial Network) 14/09/2020

3 main points

✔️ Proposes "pSp", an encoder that embeds real images into the latent space of StyleGAN
✔️ It can be applied to various image translation tasks.
✔️ It makes use of the diversity of images that StyleGAN can generate

Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
written by Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
(Submitted on 3 Aug 2020)
Comments: Published by arXiv
Subjects: Computer Vision and Pattern Recognition (cs.CV)


Introduction

Although StyleGAN can generate high-quality images, embedding real images into its latent space is difficult, and several methods have been proposed to address this. "pixel2Style2pixel (pSp)", introduced here, is an encoder that directly estimates the latent variables of StyleGAN from an image. Without any change to its architecture, pSp can be applied to various image translation tasks such as face image generation from segmentation maps, face frontalization, and super-resolution.

Structure of StyleGAN

First, let's take a quick look at the structure of StyleGAN. The actual experiments use StyleGAN2, but its general structure is the same as that of StyleGAN, so we'll look at the overall picture of StyleGAN.

Cited from A style-based generator architecture for generative adversarial networks Figure 1.(b)

A 512-dimensional vector z sampled from a normal distribution is passed through the mapping network to obtain the latent variable w, which is also a 512-dimensional vector. w is then fed into the synthesis network at each of its style inputs to produce the final output image. At each of these inputs, w first passes through a learned affine transformation (block A in the figure; its coefficients are trainable parameters). If the resolution of the final output is 1024 x 1024, the synthesis network has nine blocks, one for each resolution from $2^2 \times 2^2$ to $2^{10} \times 2^{10}$, and each block receives w twice, so there are 18 w inputs in total.
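The count of 18 can be checked with a few lines of arithmetic. The sketch below is only an illustration of the counting argument in the paragraph above; the function name `count_style_inputs` and its parameters are hypothetical and do not come from the StyleGAN or pSp code.

```python
def count_style_inputs(output_resolution: int, styles_per_block: int = 2) -> int:
    """Count the w inputs of a StyleGAN-style synthesis network.

    Assumes one block per resolution from 4 x 4 (2^2) up to the output
    resolution, with each block receiving w twice, as described in the text.
    """
    is_power_of_two = output_resolution & (output_resolution - 1) == 0
    if output_resolution < 4 or not is_power_of_two:
        raise ValueError("output_resolution must be a power of two >= 4")
    log2_resolution = output_resolution.bit_length() - 1  # 1024 -> 10
    num_blocks = log2_resolution - 1                      # exponents 2..10 -> 9 blocks
    return num_blocks * styles_per_block


# Resolutions handled by the blocks of a 1024 x 1024 generator.
block_resolutions = [2 ** k for k in range(2, 11)]  # 4, 8, ..., 1024

print(len(block_resolutions))    # 9 blocks
print(count_style_inputs(1024))  # 18 w inputs
```

Under the same assumptions, a 256 x 256 generator would have 7 blocks and therefore 14 w inputs.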

In StyleGAN, the same w is fed in all 18 times, but it is known that estimating only this single w does not work well for real images. pSp therefore estimates 18 different w's, one for each input. The space of these latent codes is called W+.
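The difference between the two kinds of latent code can be sketched in plain Python. This is a minimal illustration, not the pSp implementation: the random vectors are stand-ins for real latent codes (in practice w comes from the mapping network or from the encoder), and all names here are hypothetical.

```python
import random

LATENT_DIM = 512  # dimensionality of w, as stated in the text
NUM_STYLES = 18   # number of w inputs for a 1024 x 1024 generator


def sample_vector(rng: random.Random, dim: int = LATENT_DIM) -> list:
    """Draw a random vector; a stand-in for a real latent code."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def code_with_single_w(w: list) -> list:
    """Single-w case: the same vector is repeated for all 18 inputs."""
    return [list(w) for _ in range(NUM_STYLES)]


def code_in_w_plus(rng: random.Random) -> list:
    """W+ case: 18 separate vectors, one per input."""
    return [sample_vector(rng) for _ in range(NUM_STYLES)]


rng = random.Random(0)
single_w_code = code_with_single_w(sample_vector(rng))
w_plus_code = code_in_w_plus(rng)

# Both codes have the same shape (18 rows of 512 values), but only the
# single-w code is constrained to have identical rows.
single_w_rows_identical = all(row == single_w_code[0] for row in single_w_code)
w_plus_rows_identical = all(row == w_plus_code[0] for row in w_plus_code)

free_values_single_w = LATENT_DIM            # 512 values to estimate
free_values_w_plus = NUM_STYLES * LATENT_DIM  # 18 * 512 values to estimate

print(single_w_rows_identical, w_plus_rows_identical)
print(free_values_single_w, free_values_w_plus)
```

The point of the sketch is only that a W+ code leaves an encoder far more values to set than a single w does, which is one way to read why the text says estimating a single w is not enough.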

Let's look at the actual structure of pSp.


けやみぃ: I am a first-year student at the Faculty of Engineering, Kyoto University, and I am interested in image generation and image translation using GANs.
