Numerous Image Conversions Are Possible! The New Encoder Of StyleGAN! Pixel2Style2pixel
3 main points
✔️ Proposes "pSp", an encoder that embeds real images into the latent space of StyleGAN
✔️ Can be applied to various image transformation tasks
✔️ Makes use of the diversity of StyleGAN
Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation
written by Elad Richardson, Yuval Alaluf, Or Patashnik, Yotam Nitzan, Yaniv Azar, Stav Shapiro, Daniel Cohen-Or
(Submitted on 3 Aug 2020)
Comments: Published on arXiv
Subjects: Computer Vision and Pattern Recognition (cs.CV)
Introduction
StyleGAN can generate high-quality images, but embedding real images into its latent space is difficult, and several methods have been proposed to do so. "pixel2style2pixel (pSp)", introduced here, is an encoder that directly estimates StyleGAN's latent variables from an image. Without any change to its structure, pSp can be applied to various image conversion tasks such as face image generation from segmentation maps, face frontalization, and super-resolution.
Structure of StyleGAN
First, let's take a quick look at the structure of StyleGAN. The actual experiments use StyleGAN2, but its general structure is the same as StyleGAN's, so we'll look at the overall picture of StyleGAN.
Cited from "A Style-Based Generator Architecture for Generative Adversarial Networks", Figure 1(b)
A 512-dimensional vector z sampled from the normal distribution is passed through the mapping network to obtain the latent variable w, which is also a 512-dimensional vector. w is then fed into the synthesis network at each of the locations marked A in the figure. At each A, w goes through an affine transformation whose coefficients are learned parameters. If the resolution of the final output is 1024 x 1024, the synthesis network has one block per resolution from $2^2 \times 2^2$ to $2^{10} \times 2^{10}$, that is, 9 blocks. w enters each block twice, so there are 18 w inputs in total.
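The count of 18 follows directly from the output resolution. The short sketch below reproduces that arithmetic under the assumption stated above (one block per power-of-two resolution starting at 4 x 4, two w inputs per block). The function name `num_style_inputs` is made up for illustration and is not part of any StyleGAN code base.

```python
def num_style_inputs(resolution: int) -> int:
    """Count the w inputs of a StyleGAN-like synthesis network.

    Assumes one block per resolution from 4x4 up to `resolution`,
    with w entering each block twice (illustrative helper only).
    """
    # The resolution must be a power of two and at least 4.
    if resolution < 4 or resolution & (resolution - 1):
        raise ValueError("resolution must be a power of two >= 4")
    log2_res = resolution.bit_length() - 1  # 1024 -> 10
    num_blocks = log2_res - 1               # blocks for 2^2 ... 2^10 -> 9
    return 2 * num_blocks                   # two w inputs per block


print(num_style_inputs(1024))  # 18
```

With the same assumption, a lower output resolution simply yields fewer inputs, for example 14 for 256 x 256.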
StyleGAN feeds the same w into all 18 inputs, but it is known that estimating only this single w for a real image does not work well. So pSp instead infers 18 different w's, one for each input. This extended latent space is called W+.
Let's look at the actual structure of pSp.
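The difference between the two latent spaces is easiest to see in terms of array shapes. The following is a minimal sketch using NumPy. The random values are placeholders and not real latent codes, and the names `repeat_w`, `NUM_STYLES`, and `LATENT_DIM` are assumptions introduced only for this illustration.

```python
import numpy as np

NUM_STYLES = 18   # w inputs for a 1024 x 1024 generator
LATENT_DIM = 512  # dimensionality of each w


def repeat_w(w: np.ndarray, num_styles: int = NUM_STYLES) -> np.ndarray:
    """W space: a single 512-d code repeated for every style input."""
    return np.tile(w, (num_styles, 1))


rng = np.random.default_rng(0)

# W: one vector, copied 18 times, so every row is identical.
w = rng.standard_normal(LATENT_DIM)
w_codes = repeat_w(w)

# W+: 18 independent vectors. Here they are random placeholders
# standing in for what an encoder such as pSp would predict.
w_plus_codes = rng.standard_normal((NUM_STYLES, LATENT_DIM))

print(w_codes.shape, w_plus_codes.shape)
```

Both arrays have the same shape, so the generator can consume either one. The difference is that a code in W has only 512 free values, while a code in W+ has 18 x 512.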