Innovative Mobile UI Design Using "UI Grammar" With Large-scale Language Models!
3 main points
✔️ The utility of large-scale language models in generating UI layouts is attracting attention.
✔️ Proposes a new method to incorporate "UI grammars" that represent the hierarchical structure of UI into the prompts of large-scale language models.
✔️ Improves not only the performance of large-scale language models in generating UI layouts, but also the explainability and user controllability of large-scale language models.
UI Layout Generation with LLMs Guided by UI Grammar
written by Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, Toby Jia-Jun Li
(Submitted on 24 Oct 2023)
Comment: ICML 2023 Workshop on AI and HCI
Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Introduction
In recent years, artificial intelligence (AI) and human-computer interaction (HCI) researchers have focused on user interface (UI) and graphical user interface (GUI) layout generation. From an HCI perspective, the UI is a key element in providing a good user experience (UX), and various research and usability testing methods have been developed to improve the usability and functionality of UIs.
In particular, since the emergence of RICO, a large dataset on mobile UI, a number of AI models have been proposed for mobile UI layout generation. These include Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), diffusion models, Graph Neural Networks (GNNs), Transformers, and many others.
More recently, research has also been conducted on how large-scale language models can be used for UI-related tasks. For example, studies have been conducted on using large-scale language models to perform UI modeling tasks and automated GUI testing. These studies show that large-scale language models can be effective in UI tasks and may improve UI without the need for large data sets or complex learning processes.
This paper also investigates the utility of large-scale language models in mobile UI layout generation. In particular, it focuses on whether large-scale language models can generate high-quality UI layouts with one-shot learning. To address this, the paper proposes a new method built on "UI grammar".
This UI grammar accurately represents the hierarchical relationships among UI elements, making the LLM generation process more structured and context-aware. By incorporating the UI grammar, users can better understand how the LLM generates UI layouts and gain more control over the final generated results.
In this paper, we conduct experiments to evaluate the performance of large-scale language models in generating UI layouts and the impact of UI grammars on the process of generating large-scale language models.
What is UI Grammar?
UI elements within a screen have a hierarchical structure. This is reflected in the grouping capabilities of UI design tools such as Figma and the view hierarchy of Android. In this paper, we hypothesize that preserving this hierarchical structure between UI elements and using it to implicitly guide the generation process will improve the quality of mobile UI generation.
The UI grammar proposed in this paper defines a set of production rules describing the parent-child relationships between UI elements in a screen. These rules take the form A → B, where A represents a parent UI element and B represents a sequence of one or more child UI elements. For example, for the simple UI structure visualized in the figure below, the UI grammar would be "Root → Container Button, Container → Pictogram Text".
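The rule extraction above can be sketched as follows. This is a minimal illustration, not the paper's code: the dictionary-based tree with `type` and `children` keys is a hypothetical representation of a view hierarchy.

```python
# Derive production rules "Parent -> Child1 Child2 ..." from a UI tree
# (hypothetical node structure: {"type": ..., "children": [...]}).

def extract_rules(node, rules=None):
    """Walk a UI tree and collect one production rule per parent node."""
    if rules is None:
        rules = set()
    children = node.get("children", [])
    if children:
        rhs = " ".join(child["type"] for child in children)
        rules.add(f'{node["type"]} -> {rhs}')
        for child in children:
            extract_rules(child, rules)
    return rules

screen = {
    "type": "Root",
    "children": [
        {"type": "Container", "children": [
            {"type": "Pictogram"}, {"type": "Text"},
        ]},
        {"type": "Button"},
    ],
}
print(sorted(extract_rules(screen)))
# → ['Container -> Pictogram Text', 'Root -> Container Button']
```

Representing each screen as a small set of such rules, rather than as raw coordinates, is what lets the grammar be inserted compactly into a prompt.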
This paper evaluates the ability of a large-scale language model to generate mobile UIs, and the impact of including the UI grammar in its prompt. To evaluate this impact, two pipelines were created, one with and one without the UI grammar, and their performance was compared.
Note that rather than using the RICO dataset, which contains about 66k unique UI screens, as-is, this paper uses the CLAY dataset, an improved version based on RICO. CLAY removes noise present in RICO, such as mismatches between UI elements and their visual information, and contains approximately 59k human-annotated UI screen layouts with less noise than RICO. The SCREEN2WORDS dataset, which provides natural language summaries of RICO's UI screens, was also used to construct the prompts.
How to assess the impact of UI grammar
When generating layouts without using a UI grammar, as shown in the figure below, screens are randomly selected from CLAY as in-context examples and excluded from the generation targets to prevent data leakage. For each UI screen in CLAY, the corresponding SCREEN2WORDS natural language description is retrieved and used as the description of the generation target, and the list of 25 meaningful UI element labels defined in CLAY is included in the prompt as a constraint. The response format of the large language model API is also controlled to facilitate analysis of the generated results.
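A prompt of this shape might be assembled as below. This is a hypothetical sketch: the wording, the `build_prompt` helper, and the label subset are illustrative assumptions, not the paper's actual prompt (which uses all 25 CLAY labels).

```python
# Hypothetical sketch of the grammar-free prompt: one in-context example,
# a SCREEN2WORDS description of the target, and the CLAY label vocabulary
# as a constraint. The label list here is a subset of CLAY's 25 labels.
CLAY_LABELS = ["TEXT", "IMAGE", "BUTTON", "TOOLBAR", "LIST_ITEM"]

def build_prompt(example_layout: str, target_description: str) -> str:
    return (
        "You generate mobile UI layouts as JSON bounding boxes.\n"
        f"Allowed element labels: {', '.join(CLAY_LABELS)}\n\n"
        f"Example layout:\n{example_layout}\n\n"
        f"Generate a layout for this screen: {target_description}\n"
    )

prompt = build_prompt('{"Root": [...]}',
                      "a login screen with two text fields and a button")
```

Fixing the allowed labels and the output format in the prompt is what makes the generated responses machine-parsable for the later evaluation.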
When generating layouts with the UI grammar, as shown in the figure below, the large language model is not asked to directly generate the final screen layout. Instead, the prompt first introduces a list of screen UI grammars and then instructs the model to generate a sample UI layout using the provided grammar.
An important step in constructing a prompt containing a UI grammar is choosing which screens from the CLAY dataset to parse grammars from. If a layout were generated using the description of screen S from CLAY while the grammar parsed from S was also included in the prompt, data leakage would occur, since screen S could be reconstructed directly from that grammar. To avoid this, the CLAY dataset is randomly split 20/80, and the grammars parsed from the 20% set are used to guide generation for the 80% set.
In addition, since many screens in CLAY have similar layout structures within the same app, there is a possibility of data leakage where screens within the same app are included in both sets. To avoid this, the data set is split by app.
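The app-level split described above can be sketched as follows. The field names (`app_id`) and the 20/80 ratio helper are assumptions for illustration.

```python
import random
from collections import defaultdict

def split_by_app(screens, grammar_frac=0.2, seed=0):
    """Split screens 20/80 at the app level, so no app contributes
    screens to both the grammar set and the generation set."""
    by_app = defaultdict(list)
    for s in screens:
        by_app[s["app_id"]].append(s)
    apps = sorted(by_app)
    random.Random(seed).shuffle(apps)
    cut = int(len(apps) * grammar_frac)
    grammar_set = [s for a in apps[:cut] for s in by_app[a]]
    generation_set = [s for a in apps[cut:] for s in by_app[a]]
    return grammar_set, generation_set
```

Splitting by app rather than by screen guarantees that structurally near-identical screens from the same app never leak between the two sets.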
The above two pipelines are compared using OpenAI's GPT-4 API as of May 2023. The model used is gpt-4-0314, with max_tokens=2,000 and temperature=0.7. The results generated by the two pipelines are shown below.
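With the parameters reported in the paper, a request might be assembled as below. The helper function is an illustration; the commented-out client call assumes the current `openai` Python package, which differs from the May 2023 API the authors used.

```python
# Build the request parameters reported in the paper:
# model gpt-4-0314, max_tokens=2000, temperature=0.7.
def make_request_kwargs(prompt: str) -> dict:
    return {
        "model": "gpt-4-0314",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 2000,
        "temperature": 0.7,
    }

# Example call with the current openai package (requires an API key):
# from openai import OpenAI
# client = OpenAI()
# response = client.chat.completions.create(**make_request_kwargs(prompt))
```

A temperature of 0.7 leaves some diversity in the generated layouts, while the token cap bounds the size of each generated screen description.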
In each set of four screens, the two images on the left are the original screen and its corresponding bounding boxes, while the two images on the right are the UI layout generated by GPT-4.
In addition, quantitative evaluation is performed with MaxIoU, Overlap, and Alignment. MaxIoU is computed between the original CLAY screen S, whose screen summary was provided as part of the prompt, and the generated screen S′; Overlap and Alignment are computed on the generated result S′ alone. The results are shown in the table below.
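The core of MaxIoU is the per-box intersection-over-union; the metric then matches generated and reference boxes of the same label so as to maximize the average IoU. A simplified sketch of the per-box computation (not the full matching):

```python
# Intersection-over-union of two boxes given as (x1, y1, x2, y2).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # → 0.14285714285714285 (= 1/7)
```

Overlap and Alignment, by contrast, need no reference screen: they penalize overlapping boxes and reward axis-aligned element edges within the generated layout itself.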
The table shows that "GPT-4 no grammar" performs best on Overlap, while both "GPT-4 with grammar" and "GPT-4 no grammar" perform well on Alignment. On MaxIoU, "GPT-4 with grammar" is slightly better than "GPT-4 no grammar".
This paper suggests that large-scale language models are highly useful for generating UI layouts when the UI hierarchy, expressed as a UI grammar, is incorporated into their prompts. Beyond layout, large language models such as GPT-4 also have the potential to generate content and produce medium- to high-fidelity prototypes. For example, the authors suggest that combining existing UI templates and design systems, such as Google Material Design, with large-scale language models could lead to a more automated, customizable, and efficient UI prototyping methodology.
The paper states that introducing the proposed UI grammar as an intermediate representation not only improves generation quality on metrics such as MaxIoU, but also increases the explainability and user controllability of pre-trained, black-box large-scale language models.