Hairstyle Transfer With GAN Inversion! "LOHO"
3 main points
✔️ Latent Optimization of Hairstyles via Orthogonalization (LOHO), an optimization-based hairstyle transfer method using GAN inversion
✔️ Improves the quality of the resulting image by performing optimization in two stages
✔️ Achieves a better (lower) FID score than existing hairstyle transfer methods
LOHO: Latent Optimization of Hairstyles via Orthogonalization
written by Rohit Saha, Brendan Duke, Florian Shkurti, Graham W. Taylor, Parham Aarabi
(Submitted on 5 Mar 2021 (v1), last revised 10 Mar 2021 (this version, v2))
Comments: Accepted by CVPR 2021
Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
code:
The images used in this article are from the paper or created based on it.
Introduction
Several deep-learning-based hairstyle transfer methods have been studied in the past, but they shared a problem: the realism of the resulting image degrades when the structure of the source hairstyle differs from that of the target hairstyle.
The Latent Optimization of Hairstyles via Orthogonalization (LOHO) method presented in this article generates natural images by using an optimization-based GAN inversion approach.
GAN Inversion
GAN inversion is a technique for embedding an image into the latent space of a pre-trained GAN model, i.e., the inverse of generating an image from latent variables. Once an image has been embedded, any latent-space operation of the GAN can be applied to it, making image editing straightforward. Another advantage is that the output is high quality because the GAN is pre-trained. There are two approaches to embedding an image into the latent space, one using an encoder and one using optimization; LOHO uses the optimization approach (a minimal sketch follows the example results below). The following images are example results from the paper.
The leftmost image is the original image, and the image with the converted hairstyle is on the right.
The small images to the left of each transformed image are, from top to bottom, a reference image for the appearance and style of the hairstyle, a reference image for its structure, and a mask image for its shape (the same person as in the structure reference image). The resulting image reflects these attributes while remaining natural.
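As a rough sketch of the optimization approach, the PyTorch-style snippet below directly optimizes a latent code so that a pre-trained generator reproduces a target image. The `generator` and `target` placeholders and the LPIPS-plus-pixel objective are illustrative assumptions, not the authors' code.

```python
import torch
import lpips  # perceptual-similarity package: pip install lpips

generator = ...  # assumed pre-trained StyleGAN2-like generator: latent -> image in [-1, 1]
target = ...     # target image tensor of shape (1, 3, H, W), scaled to [-1, 1]

percep = lpips.LPIPS(net='vgg')              # LPIPS expects inputs in [-1, 1]
w = torch.randn(1, 512, requires_grad=True)  # latent code to optimize
opt = torch.optim.Adam([w], lr=0.01)

for step in range(1000):
    image = generator(w)
    # Perceptual + pixel losses pull the generated image toward the target.
    loss = percep(image, target).mean() + 0.1 * (image - target).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```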
Proposed method
The following diagram shows an overview of the method.
The image of I1 is optimized so that the result reflects the shape and structure of the hairstyle of I2 and the appearance and style of the hairstyle of I3.
Next, we will look at each loss function.
Loss functions
The above is the loss function for identity reconstruction; LPIPS is well suited to identity reconstruction because it is an evaluation metric based on human similarity judgments.
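A minimal sketch of such a masked LPIPS identity term is shown below; the mask and tensor names are illustrative assumptions, not the paper's exact formulation.

```python
import lpips

percep = lpips.LPIPS(net='vgg')  # images expected in [-1, 1]

def identity_loss(generated, i1, face_mask):
    """LPIPS between the generated image and I1, restricted by a binary
    mask to the identity (non-hair) region so the hair is free to change."""
    return percep(generated * face_mask, i1 * face_mask).mean()
```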
Next is the loss function for reconstructing the shape and structure of the hairstyle. If the hairstyle shape mask is used as-is, the image collapses when the source and target hairstyle shapes differ greatly, so a slightly eroded mask is used instead.
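A common way to erode a binary mask in PyTorch is min-pooling; the sketch below is illustrative (the kernel size is an assumption, not the paper's value).

```python
import torch.nn.functional as F

def erode(mask, kernel_size=5):
    """Erode a binary mask of shape (N, 1, H, W): min-pooling (negated
    max-pooling) shrinks the foreground by about kernel_size // 2 pixels."""
    return -F.max_pool2d(-mask, kernel_size, stride=1, padding=kernel_size // 2)
```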
This is the loss function for transferring the appearance of the hairstyle. Appearance here means the color of the hair, and it is represented using features from the shallowest layer of VGG. For a detailed description of A, please refer to the paper.
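To illustrate what "features from the shallowest layer of VGG" means in practice, here is a sketch using torchvision; the exact layer cut-off is an assumption for illustration.

```python
import torch
from torchvision import models

# Features up through relu1_2: these shallow activations mostly encode
# color and low-level texture, which is why they suit an appearance term.
vgg_shallow = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features[:4].eval()
for p in vgg_shallow.parameters():
    p.requires_grad_(False)  # freeze weights; gradients still reach the input image

def appearance_features(image):
    """Shallow VGG features of an ImageNet-normalized RGB batch (N, 3, H, W)."""
    return vgg_shallow(image)
```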
This is the loss function for transferring the style of the hairstyle. Style refers to properties such as the waviness and shading of the hair. The loss is based on the Gram matrix, which is widely used in style transfer.
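The Gram matrix itself is standard; a minimal sketch (the feature-map arguments are assumed to come from a VGG-style extractor as above):

```python
import torch

def gram_matrix(features):
    """Channel-wise correlation matrix of a feature map (N, C, H, W);
    it summarizes texture/style independently of spatial layout."""
    n, c, h, w = features.shape
    f = features.view(n, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def gram_style_loss(gen_feats, ref_feats):
    """Mean squared difference between the two Gram matrices."""
    return (gram_matrix(gen_feats) - gram_matrix(ref_feats)).pow(2).mean()
```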
Finally, a loss function is used to regularize the noise maps. It is introduced so that the optimization does not hide image information in the noise inputs.
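As a hedged illustration, the StyleGAN2 projector uses a multi-scale autocorrelation penalty for exactly this purpose; the sketch below is in that spirit, and the details are illustrative rather than necessarily LOHO's exact regularizer.

```python
import torch
import torch.nn.functional as F

def noise_regularization(noise_maps):
    """Penalize spatial autocorrelation of each noise map at multiple
    scales; noise that carries no image content drives this toward zero."""
    reg = 0.0
    for noise in noise_maps:  # each of shape (1, 1, H, W), roughly unit variance
        while True:
            reg = reg + (noise * torch.roll(noise, 1, dims=3)).mean() ** 2
            reg = reg + (noise * torch.roll(noise, 1, dims=2)).mean() ** 2
            if noise.shape[2] <= 8:
                break
            noise = F.avg_pool2d(noise, kernel_size=2)
    return reg
```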
Two-stage optimization
We optimize using the loss functions described above, but if all of them are optimized simultaneously, the hairstyle information from I2 and I3 conflicts and the result is not synthesized properly. The optimization is therefore divided into two stages: in the first stage, only the identity and the shape and structure of the hairstyle are reconstructed; in the second stage, the loss functions for the appearance and style of the hairstyle are added. The losses used in the first stage are kept in the second stage so that the information recovered in the first stage is preserved.
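Schematically, reusing the hypothetical helpers from the sketches above (shape_structure_loss, appearance_loss, style_loss, and the step counts are illustrative assumptions, not the paper's settings):

```python
# Stage 1: identity + hairstyle shape/structure only.
for step in range(stage1_steps):
    image = generator(w)
    loss = identity_loss(image, i1, face_mask) + shape_structure_loss(image, i2)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 2: add appearance and style terms while keeping the stage-1
# losses, so the information recovered in stage 1 is preserved.
for step in range(stage2_steps):
    image = generator(w)
    loss = (identity_loss(image, i1, face_mask)
            + shape_structure_loss(image, i2)
            + appearance_loss(image, i3)
            + style_loss(image, i3))
    opt.zero_grad()
    loss.backward()
    opt.step()
```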
Gradient orthogonalization
L_r, the loss that reconstructs the hairstyle from I2, captures all attributes of the hair, not just shape and structure. During the second stage of optimization, the appearance and style information of I2 could therefore overwrite that of I3. To avoid this, in the second stage the gradients of L_r are projected onto the vector subspace orthogonal to the gradients of the appearance and style losses, so that L_r does not reinject the appearance and style information of I2.
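The projection itself is plain linear algebra; below is a minimal sketch of removing from one gradient its component along another (tensor names are illustrative, and in practice this would be applied to the latent's gradients between backward passes).

```python
import torch

def project_orthogonal(grad_r, grad_as, eps=1e-8):
    """Return grad_r minus its component along grad_as, i.e. the part of
    the shape/structure gradient orthogonal to the appearance/style
    gradient, so updates from L_r cannot push along appearance/style."""
    g_r, g_as = grad_r.flatten(), grad_as.flatten()
    coeff = torch.dot(g_r, g_as) / (torch.dot(g_as, g_as) + eps)
    return (g_r - coeff * g_as).view_as(grad_r)
```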
The figure above compares results without (second column from the right) and with (rightmost column) the two-stage optimization and gradient orthogonalization. With them, the hair attributes are accurately reflected and the result is synthesized into a natural image.
Comparison with existing methods
The results of comparing this method with MichiGAN, a state-of-the-art hairstyle transfer model, in terms of FID are shown below.
We can see that LOHO achieves a lower (better) FID than MichiGAN. In addition, the LOHO-HF results are computed on images masked so that only the hair and face regions remain; the even lower score indicates that the synthesis quality of the hair and face regions is high. The following figure compares the output images of MichiGAN and LOHO (the second column from the right is MichiGAN, the rightmost is LOHO).
The results show that LOHO responds well to the deformation of the hair shape.
Another important measure is the quality of identity reconstruction. The results of comparing our method with two state-of-the-art image embedding methods in terms of PSNR and SSIM scores are shown below.
From the above results, we can see that the reconstruction of identities is of higher quality than Image2StyleGAN++ (I2S++).
The distance between the optimized latent variable and the average-face latent variable also relates to the quality of the synthesized image: I2S reports that latent variables of valid human faces lie at distances in the range [30.6, 40.5], and LOHO's optimized latents fall within that range.
Summary
In this article, we introduced LOHO, a hairstyle transfer method accepted at CVPR 2021.
This method devised several techniques to transfer hairstyles successfully even when the shape of the source hairstyle differs from that of the target hairstyle, and it obtains high-quality results. However, when transferring between hairstyles with very large shape differences, unnatural images can still be produced. Another drawback of LOHO is that, being an optimization-based method, inference takes a long time.
How to solve these shortcomings is expected to be the subject of future research.