Knowledge Graphs Evolve Through Human-AI Cooperation! All About KG-HAIT, A New Link Prediction Technology
3 main points
✔️ The importance of human-AI cooperation in knowledge graph embedding (KGE) models.
✔️ Generating Human Insight Feature (HIF) Vectors Using Human-Designed Dynamic Programming (DP).
✔️ Improving Performance and Training Efficiency of Link Prediction Tasks with the KG-HAIT System.
Harmonizing Human Insights and AI Precision: Hand in Hand for Advancing Knowledge Graph Task
written by Shurong Wang, Yufei Zhang, Xuliang Huang, Hongwei Wang
(Submitted on 15 May 2024)
Comments: Published on arXiv.
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
Knowledge graphs (KGs) have been applied in a variety of fields, including relationship discovery, question answering, and recommendation systems, thanks to their expressive power. However, despite their massive scale, KGs are inherently incomplete, and manual knowledge collection is never sufficient. For this reason, Knowledge Graph Completion (KGC), a strategy for uncovering additional information to expand a KG, has been attracting a lot of attention. At the core of KGC is link prediction (LP), which extracts new, reliable knowledge from existing knowledge and is greatly facilitated by machine learning (ML) techniques.
In this paper, we propose a novel approach to improving the performance of LP tasks by combining cutting-edge techniques in the field of knowledge graph embedding (KGE) with human insight. Specifically, we utilize fully human-designed dynamic programming (DP) to generate human insight feature (HIF) vectors that capture structural features and semantic similarities of knowledge graphs, and incorporate them into the training of KGE models to significantly improve model accuracy and convergence speed.
This approach aims to overcome the limitations of traditional LP methods by combining the computational power and accuracy of AI with the conceptual, analytical, and creative abilities of humans. The paper's results show significant improvements on a variety of benchmarks, suggesting the potential for further exploration and innovation in KG analysis techniques. It is hoped that this will open new avenues for AI and human collaboration to develop more effective and insightful methods for analyzing knowledge graphs.
Related Research
Human AI Team
Human-AI cooperation has been successful in areas such as data mining, decision making, and text generation. In data mining, Steyvers et al. showed that humans and AI make different types of misclassifications in classification problems, so their perspectives complement each other. Cao et al. also reported that teams of human and AI analysts performed well in collecting investment information and predicting stock prices. In the area of decision making, Munaka et al. studied how human-AI teams behave in cooperative games with partially observed information and reported good results.
Dynamic Programming
Dynamic programming (DP) solves a complex problem by breaking it into simpler subproblems and reusing their solutions, and has applications in diverse areas such as optimization and scheduling. The strong theoretical foundation and interpretability of DP can help improve the performance of AI models.
Link Prediction Model
Knowledge Graph Embedding (KGE) models are mainly classified into geometric, tensor decomposition, and deep learning models. Geometric models represent relationships as vector additions and are characterized by their simplicity and intuitive comprehensibility. For example, TransE has been successful in link prediction tasks. Other models include tensor decomposition models such as DistMult and TuckER, and deep learning models such as ConvE and KBGAT.
This study focuses specifically on geometric models and suggests ways to improve model performance by incorporating human insights.
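To make "relationships as vector additions" concrete, the following is a minimal sketch of the TransE scoring idea: a triple (h, r, t) is considered plausible when h + r lies close to t. The function name and the toy 2-dimensional embeddings are made up for illustration and are not taken from the paper.

```python
import math

def transe_score(h, r, t):
    """Negative L2 distance between h + r and t; a higher score means more plausible."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))

# Toy 2-d embeddings (illustrative values only).
paris = [1.0, 0.0]
france = [1.0, 1.0]
japan = [3.0, 4.0]
capital_of = [0.0, 1.0]

good = transe_score(paris, capital_of, france)  # h + r lands exactly on t
bad = transe_score(paris, capital_of, japan)    # h + r is far from t
print(good > bad)
```

Link prediction with such a model amounts to scoring every candidate tail (or head) for a query and ranking the candidates by score.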
Proposed Methodology (KG-HAIT)
We propose a new system, KG-HAIT, for link prediction (LP) in which humans and AI collaborate. The system leverages human insight to extract features of the knowledge graph (KG) and incorporates them into the training of AI models to improve their performance. The proposed method consists of three main parts (see Figure 2).
1. Construction of Human Insight Feature Vector (HIF-Entity)
First, fully human-designed Dynamic Programming (DP) is used to aggregate graph structure information around each entity to generate HIF-entities. This allows us to capture the local subgraph features and semantic similarities of the entities. The computational process of DP is as follows:
The initialization step computes the weights of the incoming and outgoing edges around each entity.
In subsequent steps, the interaction with the entity's neighborhood is repeated to obtain the final HIF-entity for each entity.
The detailed calculation procedure is shown in Algorithm 1.
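Since Algorithm 1 is not reproduced here, the two steps above can only be illustrated loosely. The sketch below is an assumption, not the paper's algorithm: it initializes each entity with counts of its outgoing and incoming edges per relation type, then repeatedly mixes in the averaged features of its neighbors. The function name, the `steps` and `decay` parameters, and the averaging rule are all hypothetical.

```python
from collections import defaultdict

def hif_entity_sketch(triples, num_relations, steps=2, decay=0.5):
    """Illustrative neighbor-aggregation DP; NOT the paper's Algorithm 1.

    triples: iterable of (head, relation_id, tail).
    Returns {entity: list of 2 * num_relations floats}.
    """
    dim = 2 * num_relations
    base = defaultdict(lambda: [0.0] * dim)
    neighbors = defaultdict(set)
    # Initialization: weight of outgoing and incoming edges per relation type.
    for h, r, t in triples:
        base[h][r] += 1.0                  # outgoing edge of type r
        base[t][num_relations + r] += 1.0  # incoming edge of type r
        neighbors[h].add(t)
        neighbors[t].add(h)

    feats = {e: list(v) for e, v in base.items()}
    # Subsequent steps: each state is built from the previous step's neighbor states.
    for _ in range(steps):
        nxt = {}
        for e, own in base.items():
            agg = [0.0] * dim
            for n in neighbors[e]:
                for i, x in enumerate(feats[n]):
                    agg[i] += x
            k = max(len(neighbors[e]), 1)
            nxt[e] = [o + decay * a / k for o, a in zip(own, agg)]
        feats = nxt
    return feats

toy_triples = [("a", 0, "b"), ("c", 0, "b"), ("a", 1, "d")]
toy_feats = hif_entity_sketch(toy_triples, num_relations=2, steps=1)
print(toy_feats["a"], toy_feats["c"])
```

In this toy graph, "a" and "c" both point to "b" through relation 0, so their vectors share components. This is the kind of structural similarity the HIF-entity is meant to expose.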
2. Dimensional Reduction
Next, we find a transformation matrix to adjust the dimensions of the HIF-entity to fit an arbitrary number of dimensions. Specifically, we compress the dimensions while preserving the cosine similarity of each HIF-entity pair. This transforms them into low-dimensional vectors that can be used by AI models while preserving the information in the original high-dimensional space.
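How the transformation matrix is found is not described here, so the sketch below substitutes a standard technique, truncated SVD, purely as a stand-in. It shows one way a linear map can shrink the dimension while keeping pairwise cosine similarities (exactly so when the data has rank at most d, approximately otherwise). The function names and the toy data are hypothetical.

```python
import numpy as np

def reduce_dim(X, d):
    """Project the rows of X (n x D) to d dimensions via the top-d right singular vectors."""
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    W = vt[:d].T  # D x d transformation matrix
    return X @ W, W

def cosine_matrix(X):
    """Pairwise cosine similarities between the rows of X."""
    unit = X / np.linalg.norm(X, axis=1, keepdims=True)
    return unit @ unit.T

rng = np.random.default_rng(0)
# Toy stand-in for HIF-entity vectors: 6 entities, 50 dimensions, rank 3.
X = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 50))
Z, W = reduce_dim(X, d=3)

# Largest change in any pairwise cosine similarity after compression.
print(np.abs(cosine_matrix(X) - cosine_matrix(Z)).max())
```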
3. Construction of Human Insight Feature Vector (HIF-Relation)
Because constructing HIF-relation directly is difficult, we use HIF-entity together with the AI model itself. Specifically, a KGE model initialized with HIF-entity is trained over multiple epochs, and the resulting relation embedding vectors are taken as the HIF-relation. This allows the relation embeddings to reflect human insights and facilitates training of the AI model.
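The data flow of this step (initialize entities from HIF-entity, train, read out the relation table) can be sketched as follows. This is a heavily simplified assumption rather than the paper's training setup: it uses a TransE-style squared-distance loss on positive triples only, with full-batch gradient descent and no negative sampling or margin. The function name and every hyperparameter value are hypothetical.

```python
import numpy as np

def warmup_relations(hif_entity, triples, num_relations, epochs=50, lr=0.05, seed=0):
    """Train a stripped-down TransE-style model and return its relation table.

    Simplification (not from the paper): positives only, squared L2 loss,
    full-batch gradient descent, no negative sampling or margin.
    """
    rng = np.random.default_rng(seed)
    E = np.array(hif_entity, dtype=float)  # entities start at the HIF-entity vectors
    R = rng.normal(scale=0.1, size=(num_relations, E.shape[1]))
    h, r, t = (np.array(col) for col in zip(*triples))
    losses = []
    for _ in range(epochs):
        res = E[h] + R[r] - E[t]           # residual of the constraint h + r = t
        losses.append(float((res ** 2).sum()))
        grad_E = np.zeros_like(E)
        grad_R = np.zeros_like(R)
        np.add.at(grad_E, h, 2 * res)      # accumulate over repeated indices
        np.add.at(grad_E, t, -2 * res)
        np.add.at(grad_R, r, 2 * res)
        E -= lr * grad_E
        R -= lr * grad_R
    return R, losses

toy_hif_entity = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
toy_triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2)]  # (head id, relation id, tail id)
hif_relation, losses = warmup_relations(toy_hif_entity, toy_triples, num_relations=2)
print(hif_relation.shape, losses[0], losses[-1])
```

A real warm-up would use the KGE model's own loss and negative sampling. The point here is only that the relation vectors are shaped by entity vectors that already carry the human-designed features.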
With the proposed method, KG-HAIT achieves significant performance gains in the link prediction task, showing the impact of human insight on the training of the KGE model.
Experiment
Experiment Details
Three datasets, FB15k-237, WN18RR, and LastFM-9, were used in this study to evaluate the effectiveness of the proposed KG-HAIT system. The details of each dataset are as follows (see Table II).
FB15k-237: A dataset extracted from Freebase and improved to avoid test leakage problems.
WN18RR: A dataset built on WN18 with the test leakage problem fixed.
LastFM-9: A music listening dataset collected from the online music platform last.fm.
The number of entities, relations, and triples for each dataset is shown in Table II.
Three KGE models (TransE, TransH, and TransR) were used in the experiments, and each was compared with and without the proposed HIF applied. All models were implemented in PyTorch and trained on an NVIDIA GeForce RTX 3090 GPU. Training was done in mini-batches (batch size 2000), and Adam was used as the optimizer. Hyperparameters were selected by grid search.
Results
1. Link prediction: Significant performance gains were observed for all KGE models when HIF was applied. Table III shows the results for each model with and without HIF. In particular, MR (mean rank) was reduced by an average of 42.8%, and H@1 (Hits@1) improved roughly fourfold on WN18RR, by an average of 44% on LastFM-9, and by over 20% on FB15k-237.
Table III shows the results of the TransE, TransH, and TransR models with and without HIF applied.
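For readers unfamiliar with the metrics, the sketch below shows how MR and Hits@k are usually computed from the rank of the correct entity in each test query (rank 1 is best), and how a relative change such as "MR reduced by 42.8%" is derived. The ranks and the before/after values are made-up toy numbers, not results from the paper.

```python
def mean_rank(ranks):
    """Average 1-based rank of the correct entity; lower is better."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of queries whose correct entity is ranked in the top k; higher is better."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

def relative_change(before, after):
    """Signed relative change; -0.5 means a 50% reduction."""
    return (after - before) / before

toy_ranks = [1, 3, 12, 2, 50]  # made-up ranks for five test triples
print(mean_rank(toy_ranks), hits_at_k(toy_ranks, 1), hits_at_k(toy_ranks, 10))
print(relative_change(200.0, 100.0))
```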
2. Semantic similarity: We demonstrated that HIF can capture the semantic similarity of entities.
Figure 1 shows an example where entities of the same type exhibit similar subgraph structures. The cosine similarity between the two selected entity types (country/region and institution) was calculated and the results are shown in Figure 4 as a confusion matrix.
The average similarity within countries/regions was 71.36% and within institutions was 72.30%.
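An "average similarity within a group" of this kind can be computed as the mean cosine similarity over all pairs of entities in the group. The sketch below assumes self-pairs are excluded, which the article does not specify. The helper names and the toy vectors are hypothetical.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def mean_pairwise_cosine(vectors):
    """Mean cosine similarity over all unordered pairs (self-pairs excluded by assumption)."""
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Toy stand-ins for the HIF-entity vectors of three entities of the same type.
toy_group = [[1.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
print(mean_pairwise_cosine(toy_group))
```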
3. Convergence rate: The TransE model with HIF applied converges significantly faster.
Figure 5 shows the epoch-by-epoch changes of H@10 and MR on the LastFM-9 dataset. For H@10, TransE with HIF applied rose rapidly in the first 100 epochs and converged after 200 epochs. In contrast, TransE without HIF showed no signs of convergence until 400 epochs.
Discussion
Experimental results show that the KG-HAIT system provides significant performance gains in the link prediction task. In particular, we found that human insight's ability to capture the semantic similarity of entities contributed to the improved performance of the KGE model. Training efficiency was also significantly improved due to the increased convergence speed. Future work will explore human-AI cooperation mechanisms more deeply through application to more advanced models and in-depth analysis of HIF vectors.
Conclusion
In this study, we proposed a new link prediction system, KG-HAIT, which combines human insight with the computational power of AI. KG-HAIT uses human-designed dynamic programming (DP) to generate human insight feature (HIF) vectors that capture subgraph features and semantic similarities in a knowledge graph (KG), and incorporates them into the training of the KGE model to improve link prediction performance. Experimental results confirm that HIFs significantly improve model accuracy and accelerate training convergence on multiple benchmark datasets.
As for future prospects, the first goal is to expand the scope of application of KG-HAIT to more advanced and complex KGE models. It is also important to conduct a detailed analysis of HIF vectors to clarify the mechanism of their performance improvement. Furthermore, we will deepen human-AI collaboration and promote the development of more effective and insightful knowledge graph analysis methods. This will lead to applications in diverse areas such as data mining, decision making, and recommendation systems. Ultimately, we hope to contribute to the further advancement of knowledge graph tasks by building new systems in which humans and AI cooperate closely.