Generate A Minecraft World With GAN!

GAN (Generative Adversarial Network) 27/09/2021

3 main points
✔️ Using GAN to generate Minecraft worlds
✔️ Overcome the problem of insufficient GPU memory with the word2vec approach
✔️ In the future, we may be able to generate from natural language

World-GAN: a Generative Model for Minecraft Worlds
written by Maren Awiszus, Frederik Schubert, Bodo Rosenhahn
(Submitted on 18 Jun 2021)
Comments: IEEE Conference on Games (CoG) 2021
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Neural and Evolutionary Computing (cs.NE)

code：

The images used in this article are from the paper or created based on it.

Outline of Research

This study proposes World-GAN, a data-driven approach that uses machine learning to generate Minecraft maps from a single sample. The authors overcame two problems that arise when TOAD-GAN, a GAN-based 2D map generation approach, is applied to Minecraft: "lack of GPU memory" and the "need to define the priority of each token". As a result, they succeeded in generating 3D maps.

Related research

TOAD-GAN

TOAD-GAN is a 2D map generation model using GANs. In its paper, the performance of TOAD-GAN is tested by generating maps of Super Mario Bros. The architecture of TOAD-GAN is based on that of SinGAN, as shown in the figure above. The difference between the two is the downsampling method: TOAD-GAN downsamples by defining a priority for each token in the Super Mario Bros. map. The TOAD-GAN authors define a priority table like the one below.

Downsampling is performed by replacing the token in the region of interest with the token with the highest priority.

For example, consider the region of interest shown in the figure above. This region contains two types of tokens: empty tokens and brick block tokens. To downsample this region to a single token, we compare priorities: the priority (Hierarchy) of the empty token is 0 and that of the brick block is 3, so the brick block, which has the higher priority, is selected as the result of the downsampling. By downsampling in this way, TOAD-GAN successfully takes advantage of the architecture of SinGAN.
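The rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as summarized in this article, not TOAD-GAN's actual code. Only the two priorities quoted above (empty = 0, brick = 3) come from the text; the "ground" token, its priority, and the function name are hypothetical.

```python
# Hypothetical priority table: only empty = 0 and brick = 3 come from the
# article; "ground" and its value are made up for illustration.
PRIORITY = {"empty": 0, "ground": 1, "brick": 3}


def downsample_by_priority(level, factor=2):
    """Shrink a 2D token grid by keeping the highest-priority token per patch."""
    rows, cols = len(level), len(level[0])
    out = []
    for r in range(0, rows, factor):
        out_row = []
        for c in range(0, cols, factor):
            # Collect every token inside the factor x factor region of interest.
            patch = [
                level[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            out_row.append(max(patch, key=PRIORITY.__getitem__))
        out.append(out_row)
    return out


level = [
    ["empty", "empty", "empty", "empty"],
    ["empty", "brick", "empty", "empty"],
    ["ground", "ground", "ground", "ground"],
    ["ground", "ground", "ground", "ground"],
]
small = downsample_by_priority(level)
print(small)
```

The top-left patch contains three empty tokens and one brick block, and the brick block wins because of its higher priority, as in the example above.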

Problems in using TOAD-GAN for Minecraft

Since TOAD-GAN can generate maps for Super Mario Bros., it may seem easy to apply it to Minecraft. However, there are two major problems with this.

GPU memory shortage
Need to define a priority table

Insufficient GPU memory

In TOAD-GAN, each token is represented by one-hot encoding. Since Minecraft has more token types than Super Mario Bros., each one-hot vector has more dimensions, and the map requires more GPU memory.

Another reason is that Minecraft maps are 3D, so they contain more information than Super Mario Bros. maps. A Super Mario Bros. map can be represented by (number of tokens) x (length of the map) x (width of the map), whereas a Minecraft map also needs a depth dimension, which requires more GPU memory.
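To get a feel for how quickly this grows, here is a rough back-of-the-envelope calculation. All token counts and map sizes below are illustrative assumptions, not figures from the paper, and the function name is hypothetical.

```python
from math import prod


def one_hot_elements(num_tokens, spatial_shape):
    """Number of values in a one-hot tensor: tokens x every spatial dimension."""
    return num_tokens * prod(spatial_shape)


# Assumed sizes, chosen only to show the scaling.
mario_2d = one_hot_elements(num_tokens=12, spatial_shape=(16, 200))
minecraft_3d = one_hot_elements(num_tokens=200, spatial_shape=(16, 200, 200))

print(mario_2d)                 # values needed for the 2D map
print(minecraft_3d)             # values needed for the 3D map
print(minecraft_3d / mario_2d)  # growth factor
```

Both the larger vocabulary and the extra spatial dimension multiply the tensor size, which is why the two effects together are so costly.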

Need to define a priority table

To perform downsampling in TOAD-GAN, a priority table needs to be defined. Super Mario Bros. has relatively few tokens, so defining the priority table by hand is not difficult, but in a game with many tokens like Minecraft, designing this table by hand is difficult. Applying TOAD-GAN to Minecraft as-is would therefore be a daunting task.

Proposed method

block2vec

To solve the above problems, the authors borrowed the idea of word2vec from natural language processing. word2vec trains a model on the task of predicting the context around an input word and then treats the model's first weight matrix as a distributed representation (embedding vector) of each word. With this method, words represented by one-hot encoding can be compressed into dense vectors. The authors applied the same idea to Minecraft tokens and named the method block2vec.

Learning block2vec

block2vec is trained in much the same way as word2vec. It takes a token as input and trains a model on the task of predicting the tokens around that token. The first weight matrix of the trained model is treated as the distributed representation of each token.
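As a minimal sketch of what the training data for such a model could look like, the snippet below collects (center, neighbor) token pairs from a small 3D grid. The 3x3x3 neighborhood, the `world[x][y][z]` layout, and the function name are assumptions made for this illustration; the paper's actual context window may differ.

```python
from itertools import product


def skipgram_pairs(world):
    """Collect (center, neighbor) training pairs from a 3D token grid.

    `world[x][y][z]` holds a block name. The 3x3x3 neighborhood used here
    is an assumption of this sketch.
    """
    size_x, size_y, size_z = len(world), len(world[0]), len(world[0][0])
    offsets = [o for o in product((-1, 0, 1), repeat=3) if o != (0, 0, 0)]
    pairs = []
    for x, y, z in product(range(size_x), range(size_y), range(size_z)):
        for dx, dy, dz in offsets:
            nx, ny, nz = x + dx, y + dy, z + dz
            # Skip neighbors that fall outside the sample.
            if 0 <= nx < size_x and 0 <= ny < size_y and 0 <= nz < size_z:
                pairs.append((world[x][y][z], world[nx][ny][nz]))
    return pairs


# Tiny 2x2x2 world: z = 0 is a layer of dirt, z = 1 is a layer of air.
world = [
    [["dirt", "air"], ["dirt", "air"]],
    [["dirt", "air"], ["dirt", "air"]],
]
pairs = skipgram_pairs(world)
print(len(pairs), pairs[:3])
```

Pairs like these would be fed to a skip-gram style model that predicts the neighbor from the center token; after training, each row of its first weight matrix serves as the embedding of one token.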

Why did block2vec solve the problem?

With block2vec, the representation of each token changes from one-hot encoding to a distributed vector. Why did this solve the problem?

Solving the GPU memory shortage

When each token is represented with one-hot encoding, the vector needs one dimension per token type. In contrast, the number of dimensions of a distributed representation is chosen when block2vec is trained (by changing the number of nodes in the middle layer), so it can be adjusted to fit the available GPU memory.
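The difference can be illustrated with a simple lookup. The embedding values and the 2-dimensional size below are hypothetical; in practice they would come from a trained block2vec model.

```python
# Hypothetical 2-dimensional embeddings for a 4-token vocabulary.
EMBEDDING = {
    "air": [0.0, 0.0],
    "dirt": [1.0, 0.2],
    "grass": [0.9, 0.4],
    "stone": [0.7, -0.5],
}


def embed(world):
    """Replace every token in a 3D grid with its embedding vector."""
    return [[[EMBEDDING[tok] for tok in row] for row in plane] for plane in world]


world = [[["dirt", "grass"], ["stone", "air"]]]
dense = embed(world)

one_hot_channels = len(EMBEDDING)     # grows with the number of token types
dense_channels = len(dense[0][0][0])  # fixed by the chosen embedding size
print(one_hot_channels, dense_channels)
```

Adding more block types to the vocabulary increases `one_hot_channels` but leaves `dense_channels` unchanged, which is the property that lets the representation fit a given memory budget.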

Need to define a priority table

Using block2vec, we can also skip the definition of the priority table. In this paper, downsampling is done using the bilinear method.

Specifically, downsampling is performed as shown in the figure above. First, the blocks in the region of interest are converted to their distributed representations. Since these are vectors, they can be averaged, and the block whose distributed representation is most similar to the averaged vector is treated as the representative of the region of interest.

Downsampling can be performed in this way without defining a priority table.
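A minimal sketch of this averaging step is shown below. The embedding values are hypothetical, and using Euclidean distance as the similarity measure is an assumption of this sketch, since the article only says "most similar".

```python
import math

# Hypothetical 2-dimensional embeddings; real ones would come from block2vec.
EMBEDDING = {
    "air": [0.0, 0.0],
    "dirt": [1.0, 0.0],
    "grass": [0.9, 0.3],
    "stone": [0.0, 1.0],
}


def nearest_token(vector, table):
    """Return the token whose embedding is closest to `vector`."""
    # Euclidean distance is an assumption; cosine similarity would also work.
    return min(table, key=lambda tok: math.dist(table[tok], vector))


def downsample_patch(patch_tokens, table):
    """Average the embeddings in a patch and map the mean back to a token."""
    vectors = [table[tok] for tok in patch_tokens]
    mean = [sum(column) / len(vectors) for column in zip(*vectors)]
    return nearest_token(mean, table)


# A 2x2x2 region of interest, flattened into a list of 8 blocks.
patch = ["dirt"] * 5 + ["grass"] * 2 + ["air"]
print(downsample_patch(patch, EMBEDDING))
```

No priority table appears anywhere: which block represents the region falls out of the geometry of the embedding space.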

World-GAN

The architecture of World-GAN is shown in the figure below.

The architecture is almost the same as TOAD-GAN. The differences are the 3D convolution and the downsampling method.

Experiment

We will now look at the validation results of the proposed method.

Generated example

Some examples of world generation are shown below.

As you can see, natural generation is achieved while maintaining the atmosphere of the underlying sample. However, there were some cases where the generation was not successful.

In this example, we are using a village as a sample. You can see that the overall atmosphere is generally well generated, but the details of the buildings are not well generated.

Comparison with TOAD-GAN

Although TOAD-GAN is not originally a model for 3D map generation, the authors extended it to 3D maps without changing its underlying idea and compared this extension (TOAD-GAN 3D) with the proposed method, World-GAN.

Comparing the two, it can be seen that they generate almost identical results. World-GAN requires less GPU memory than TOAD-GAN 3D and does not require a priority table, so if the generation quality is equal, World-GAN is the better model.

Next, we examine how well the patterns of the generated map match those of the base sample using KL divergence. The more consistent the patterns are with the base sample, the smaller the KL divergence becomes, so a smaller value means the generated map better represents the atmosphere of the base sample.
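One way such a pattern-based KL divergence could be computed is sketched below. The pattern size `k`, the smoothing constant `eps`, the direction of the divergence, and the function names are all assumptions of this sketch rather than the paper's exact settings.

```python
from collections import Counter
from itertools import product
from math import log


def pattern_counts(world, k=2):
    """Count every k x k x k block pattern in a 3D grid `world[x][y][z]`."""
    size_x, size_y, size_z = len(world), len(world[0]), len(world[0][0])
    counts = Counter()
    for x, y, z in product(
        range(size_x - k + 1), range(size_y - k + 1), range(size_z - k + 1)
    ):
        pattern = tuple(
            world[x + i][y + j][z + l] for i, j, l in product(range(k), repeat=3)
        )
        counts[pattern] += 1
    return counts


def pattern_kl(reference, generated, k=2, eps=1e-5):
    """KL divergence between the pattern distributions of two worlds."""
    p, q = pattern_counts(reference, k), pattern_counts(generated, k)
    patterns = set(p) | set(q)
    # eps smoothing keeps patterns missing from one world from dividing by zero.
    p_total = sum(p.values()) + eps * len(patterns)
    q_total = sum(q.values()) + eps * len(patterns)
    kl = 0.0
    for pat in patterns:
        p_prob = (p[pat] + eps) / p_total
        q_prob = (q[pat] + eps) / q_total
        kl += p_prob * log(p_prob / q_prob)
    return kl


layered = [[["dirt", "dirt", "air"] for _ in range(3)] for _ in range(3)]
solid = [[["dirt", "dirt", "dirt"] for _ in range(3)] for _ in range(3)]
print(pattern_kl(layered, layered))  # identical pattern statistics
print(pattern_kl(layered, solid))    # the dirt/air boundary pattern is missing
```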

Finally, we evaluate the uniqueness of the generated maps. The paper uses the Levenshtein distance as the measure for this evaluation. The greater this distance, the greater the variability, and thus the greater the uniqueness of the generated map.
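For reference, the Levenshtein distance counts the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another. The sketch below is the standard dynamic-programming version; treating a map as a flat sequence of block tokens is an assumption made here for illustration.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or lists of tokens)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(
                min(
                    prev[j] + 1,             # delete x
                    cur[j - 1] + 1,          # insert y
                    prev[j - 1] + (x != y),  # substitute, free if equal
                )
            )
        prev = cur
    return prev[-1]


map_a = ["dirt", "dirt", "air"]
map_b = ["dirt", "stone", "air", "air"]
print(levenshtein(map_a, map_b))
```

Averaging this distance over pairs of generated maps gives a single number for how much the outputs differ from one another.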

The results show that World-GAN generates the most unique maps.

Distributed representation of tokens with BERT

One of the most famous models in natural language processing is BERT. As BERT is a natural language processing model, it does not take one-hot encoded block tokens as input. Therefore, in this experiment, the authors feed a token description (e.g. mossy stone bricks) into a pre-trained BERT model and use the output of the final layer as a distributed representation of the token.

The example generated using the BERT embeddings is shown in the figure above. The patterns are not modeled as densely as with the block2vec distributed representation, which can be attributed to the high dimensionality of the BERT representation (the authors used a 768-dimensional distributed representation). However, the general structure of a stone ruin and the grass around it can still be generated, even though BERT was not trained on Minecraft data. Since the distributed representation was created using only textual descriptions, this experiment points toward future research on conditioning World-GAN on natural language.

Summary

This paper proposed World-GAN, a GAN that generates Minecraft worlds. By changing the token representation from one-hot encoding to a distributed representation using block2vec, the TOAD-GAN architecture can be used in games such as Minecraft. Also, by using an embedded representation from BERT, it may become possible in the future to describe map features in natural language and generate maps accordingly. There are other experiments in the paper that I couldn't show here, so check them out if you're interested.