A Wide Range of Image-to-Image Translations Is Possible! A New Encoder for StyleGAN! pixel2style2pixel

GAN (Generative Adversarial Network)

3 main points

✔️ Proposes an encoder, "pSp", that embeds real images into the latent space of StyleGAN
✔️ Can be applied to a variety of image-to-image translation tasks
✔️ Takes advantage of the diversity of images StyleGAN can generate


Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
written by Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
(Submitted on 3 Aug 2020)

Comments: Published on arXiv
Subjects: Computer Vision and Pattern Recognition (cs.CV)



StyleGAN can generate high-quality images, but embedding real images into its latent space is difficult, and several methods have been proposed for this. The pixel2style2pixel (pSp) model introduced here is an encoder that estimates StyleGAN's latent variables directly from an image. Without any change to its architecture, pSp can be applied to a variety of image-to-image translation tasks, such as generating face images from segmentation maps, face frontalization, and super-resolution.

Structure of StyleGAN

First, let's take a quick look at the structure of StyleGAN. The experiments actually use StyleGAN2, but its overall structure is the same as StyleGAN's, so here we look at the big picture of StyleGAN.

(Figure cited from "A Style-Based Generator Architecture for Generative Adversarial Networks", Figure 1(b))

A 512-dimensional vector z sampled from a normal distribution is passed through the mapping network to obtain the latent variable w, which is also a 512-dimensional vector. w is then fed into each layer of the synthesis network through the blocks labeled "A" in the figure. Each "A" is a learned affine transformation (its coefficients are trainable parameters) that converts w into the style used at that layer. When the final output is 1024 x 1024, the synthesis network has one block for each resolution from $2^2 \times 2^2$ (4 x 4) to $2^{10} \times 2^{10}$ (1024 x 1024), i.e. 9 blocks, and w is fed into each block twice, so w is input 18 times in total.
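To make the counting above concrete, here is a minimal Python sketch. It is toy code, not the official StyleGAN implementation: the function names `synthesis_resolutions`, `num_w_inputs`, `affine_A` and `broadcast_w` are hypothetical, it assumes two style inputs per block, and it models "A" as a plain matrix-vector product plus bias.

```python
import math

LATENT_DIM = 512  # z and w are both 512-dimensional in StyleGAN


def synthesis_resolutions(output_resolution: int) -> list[int]:
    """Resolutions of the synthesis blocks: 4x4 (2^2) up to the output size."""
    log_res = int(math.log2(output_resolution))
    if log_res < 2 or 2 ** log_res != output_resolution:
        raise ValueError("output_resolution must be a power of two >= 4")
    return [2 ** k for k in range(2, log_res + 1)]


def num_w_inputs(output_resolution: int, inputs_per_block: int = 2) -> int:
    """Total number of places where w enters the synthesis network."""
    return len(synthesis_resolutions(output_resolution)) * inputs_per_block


def affine_A(w: list[float], weight: list[list[float]], bias: list[float]) -> list[float]:
    """Learned affine transform "A": style = weight @ w + bias.

    weight and bias are trainable parameters; each style input has its own A.
    """
    return [sum(a * x for a, x in zip(row, w)) + b for row, b in zip(weight, bias)]


def broadcast_w(w: list[float], output_resolution: int) -> list[list[float]]:
    """In StyleGAN the SAME w is fed to every input, e.g. 18 copies at 1024x1024."""
    return [list(w) for _ in range(num_w_inputs(output_resolution))]


if __name__ == "__main__":
    print(synthesis_resolutions(1024))  # [4, 8, 16, 32, 64, 128, 256, 512, 1024]
    print(num_w_inputs(1024))           # 18
    print(len(broadcast_w([0.0] * LATENT_DIM, 1024)))  # 18
```

The same arithmetic gives 7 blocks and 14 w inputs for a 256 x 256 generator, since the block count depends only on how many times the resolution doubles from 4 x 4.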

StyleGAN feeds the same w into all 18 inputs. However, it is known that estimating only this single w from a real image does not reconstruct the image well. pSp therefore infers 18 different w's, one for each input. This extended latent space is called W+.
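The difference between W and W+ can be sketched as follows. This is a hypothetical illustration, not pSp's code: it assumes the 1024 x 1024 case (18 inputs of 512 dimensions each), the helper names are made up, and the random vectors only stand in for the 18 w's an encoder would output.

```python
import random

NUM_W = 18        # w inputs of a 1024x1024 StyleGAN (9 blocks x 2)
LATENT_DIM = 512  # dimensionality of each w vector


def check_w_plus_shape(w_plus: list[list[float]]) -> None:
    """A W+ code is an (18, 512) array: one w per style input."""
    if len(w_plus) != NUM_W or any(len(row) != LATENT_DIM for row in w_plus):
        raise ValueError(f"expected shape ({NUM_W}, {LATENT_DIM})")


def is_in_w(w_plus: list[list[float]]) -> bool:
    """A W+ code lies in ordinary W only when all 18 vectors are the same w."""
    check_w_plus_shape(w_plus)
    return all(row == w_plus[0] for row in w_plus)


def degrees_of_freedom(space: str) -> int:
    """Number of free values an inversion method can adjust in each space."""
    if space == "W":
        return LATENT_DIM            # one shared 512-d vector
    if space == "W+":
        return NUM_W * LATENT_DIM    # 18 independent 512-d vectors
    raise ValueError(f"unknown space: {space}")


if __name__ == "__main__":
    rng = random.Random(0)
    # StyleGAN-style code: one w copied into all 18 inputs.
    w = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
    w_code = [list(w) for _ in range(NUM_W)]
    # W+ code: random stand-ins for 18 different w's (NOT real pSp outputs).
    w_plus_code = [[rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)] for _ in range(NUM_W)]
    print(is_in_w(w_code))       # True
    print(is_in_w(w_plus_code))  # False
    print(degrees_of_freedom("W"), degrees_of_freedom("W+"))  # 512 9216
```

W is simply the subset of W+ in which all 18 rows coincide, so W+ offers 18 x 512 = 9216 adjustable values instead of 512. That extra freedom is the intuition behind inferring 18 separate w's for real images.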

Now let's look at the actual structure of pSp.

けやみぃ
I am a first-year student at the Faculty of Engineering, Kyoto University, and I am interested in image generation and image-to-image translation using GANs.
