
A New Method For Global Explanation Of Heterogeneous Graph Neural Networks Using Description Logic
3 main points
✔️ Proposes a new method that utilizes class expressions (CEs) from description logic (DL) to address the lack of global explanations for GNNs.
✔️ Generates an optimal class expression via beam search, scored by model fidelity and GNN output, to explain GNN behavior.
✔️ The proposed method allows spurious correlations to be identified, improving model transparency and reliability.
Utilizing Description Logics for Global Explanations of Heterogeneous Graph Neural Networks
written by Dominik Köhler, Stefan Heindorf
(Submitted on 21 May 2024)
Comments: Published on arXiv.
Subjects: Artificial Intelligence (cs.AI); Logic in Computer Science (cs.LO)
The images used in this article are from the paper, the introductory slides, or were created based on them.
Summary
Graph neural networks (GNNs) are widely used for node classification on graph-structured data such as knowledge graphs, product graphs, protein graphs, and citation graphs. However, the explainability of GNN predictions remains a challenge; in particular, there is a lack of "global explanation methods" that describe the behavior of the model as a whole.
Most existing explanation methods try to make a GNN's behavior intuitively understandable by visualizing it as subgraphs (small partial network structures). However, this approach cannot properly explain the higher-level conceptual patterns learned by the GNN (e.g., the semantic role of a particular relation).
In this paper, we propose a new method for global explanation of GNNs by utilizing "Class Expressions (CEs)" in Description Logic (DL). This enables explanations that take into account complex rules (e.g. negation, inclusion relations, number constraints) that cannot be handled by conventional subgraph-based explanations.
The method builds an algorithm that generates multiple candidate CEs explaining the GNN's predictions and selects the best one. This allows for more precise analysis and improved transparency of the GNN's decision-making process.
This research could provide a foundation for clarifying the grounds on which GNNs make their decisions, especially in areas where AI transparency is required (e.g., medicine, finance, law). Personally, I find the application of GNNs particularly promising in the medical field: when a patient's diagnostic data is analyzed with a GNN, being able to clearly explain which medical knowledge a decision rests on will go a long way toward gaining physicians' trust.
Related Research
Classification of GNN Explanatory Methods
Explanation methods for GNNs can be broadly divided into "local explanation" and "global explanation":
- Local explanation (e.g., GNNExplainer, PGExplainer)
  - Explains why a particular node receives its prediction by showing a small subgraph.
  - Most existing methods take this approach.
  - For example, explaining a paper's predicted impact by showing a small number of influential papers in its citation network.
- Global explanation (e.g., XGNN, GNNInterpreter)
  - Comprehensively explains all nodes that share a given label.
  - Existing methods treat graph patterns as simple subgraphs, so complex rules cannot be expressed.
  - For example, explaining why researchers in a particular field were classified into a particular cluster by showing overall research trends.
Use of Description Logic (DL)
Description Logic (DL) is a logical system used to define ontologies (knowledge representations). In this study, a type of DL, "EL description logic," is used to generate CEs that explain the predictions of GNNs.
While conventional methods have struggled to represent the rules a GNN has learned as graph patterns, DL allows the GNN's behavior to be described as more general rules.
For example, suppose a GNN predicts that a certain product is a "fine wine." A conventional explanation might be as simple as "this wine is in a high price range." With description logic, a more detailed logical explanation can be given: "this wine is from France, is made from a specific grape variety, has been aged for a long time, and has been well reviewed in the past."
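As a rough illustration, such a rule could be written as a single EL class expression, which combines concepts with conjunction (⊓) and existential restrictions (∃). The class and property names below are hypothetical, not taken from the paper:

```
Wine ⊓ (∃ producedIn.France) ⊓ (∃ madeFrom.PinotNoir) ⊓ (∃ agedFor.LongAging) ⊓ (∃ hasReview.PositiveReview)
```

The GNN's behavior for the class "fine wine" can then be summarized by how closely membership in such an expression tracks the model's positive predictions.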
Proposed Method
The proposed method consists of the following three steps:
Generating Class Expressions (CEs)
- Candidate CEs are generated by beam search, starting from randomly created initial CEs (a sketch follows this list).
- Each CE is compared against the GNN's predictions and scored.
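Below is a minimal sketch of what such a beam-search loop could look like. The `refine` and `score_ce` callables are hypothetical stand-ins for the paper's refinement operator and scoring function, not the authors' implementation:

```python
# Minimal beam-search sketch over class expressions (CEs).
# `refine` and `score_ce` are hypothetical stand-ins for the paper's
# refinement operator and scoring function; CEs are assumed hashable.

def beam_search_ce(initial_ces, refine, score_ce, beam_width=10, iterations=5):
    """Keep the `beam_width` best-scoring CEs and repeatedly refine them."""
    beam = sorted(initial_ces, key=score_ce, reverse=True)[:beam_width]
    for _ in range(iterations):
        candidates = list(beam)
        for ce in beam:
            candidates.extend(refine(ce))  # e.g., add a property restriction
        # Re-rank all candidates and keep only the top `beam_width`
        beam = sorted(set(candidates), key=score_ce, reverse=True)[:beam_width]
    return beam[0]  # the highest-scoring CE becomes the explanation
```

The key design choice is that scoring, not random sampling, drives the search: each iteration widens the candidate pool with refinements of the current beam and then prunes it back to the beam width.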
Scoring Function Design
Two scoring functions are used to select the best CE (a small sketch follows the list):
- Model Fidelity Score (Fidelity)
  - Evaluates the agreement between the CE's predictions and the GNN's predictions.
- GNN Score
  - Evaluates the GNN's output on graphs instantiated from the CE.
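The following sketch shows roughly how the two scores could be computed; the signatures and the `ce_to_graph` helper are illustrative assumptions, and the paper's exact formulation may differ:

```python
def fidelity(ce_matches, gnn_predictions):
    """Fraction of nodes where CE membership agrees with the GNN's prediction.

    ce_matches[i]      -- True if node i satisfies the class expression
    gnn_predictions[i] -- True if the GNN assigns node i the target label
    """
    agree = sum(c == g for c, g in zip(ce_matches, gnn_predictions))
    return agree / len(gnn_predictions)

def gnn_score(model, ce_to_graph, ce):
    """Score a CE by running the GNN on a small graph instantiated from it."""
    graph = ce_to_graph(ce)  # materialize a graph that satisfies the CE
    return model(graph)      # the GNN's output for the target class
```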
Figure 2 depicts the CE search process, starting with an initial random CE, and then deriving the best explanation through repeated scoring.
Applying The Global Explanation
After the best CE is selected, it is used as an explanation for the GNN's behavior. This improves the transparency of GNN decision-making and ensures consistency of explanations.
Experimental Results
Dataset
- The heterogeneous graph dataset "Hetero-BA-Shapes" was used in this study.
- The graphs include different node types and edge types, making the dataset suitable for evaluating the accuracy of GNN predictions and the validity of explanations.
Analysis of Results
In our experiments, the performance of the proposed method was measured using the following evaluation metrics:
- Fidelity
  - Evaluates the agreement between the GNN's predictions and the CE.
  - CE-based methods outperform traditional graph-based methods.
- Explanation Accuracy (EA)
  - Measures whether the proposed method actually captures the GNN's behavior correctly.
- GNN Output Score
  - Evaluates the extent to which the GNN supports the generated CE.
Table 2 compares GNN scores and fidelity, showing that the proposed method achieves high fidelity. This high fidelity in particular confirms the method's ability to identify spurious correlations (false associations) in the model.
Detection of Spurious Correlations
The proposed method has shown that it is possible to detect spurious correlations (false associations) that the GNN has unintentionally learned.
For example, Table 3 compares GNN scores when different edge types are removed, showing that the GNN is overly dependent on certain relations (a sketch of this ablation procedure follows).
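A minimal sketch of such an edge-type ablation, assuming the heterogeneous graph is represented as a plain dict of edge lists and `model` is any callable returning the GNN's score for the target class (all names here are illustrative):

```python
def edge_type_ablation(model, graph_edges, target_nodes):
    """Compare the GNN's score before and after removing each edge type."""
    baseline = model(graph_edges, target_nodes)
    impact = {}
    for edge_type in graph_edges:
        # Drop one edge type and re-score the same target nodes
        ablated = {t: e for t, e in graph_edges.items() if t != edge_type}
        impact[edge_type] = baseline - model(ablated, target_nodes)
    # A large drop means the GNN relies heavily on that relation,
    # potentially a spurious correlation rather than a meaningful rule.
    return impact
```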
Conclusion
In this study, we proposed a global explanation method for GNNs that utilizes description logic, with the following results:
- The behavior of a GNN can be explained using CEs.
  - This increases the transparency of the GNN's decision-making and improves its trustworthiness.
- Improved detection of spurious correlations
  - The proposed method provides a new means of validating GNN predictions.
- Improved model fidelity and explanation accuracy
  - Experimental results show that the proposed method achieves higher accuracy than conventional graph-pattern-based explanation methods.
Future work includes extensions to a more expressive version of DL (ALCQ) and application to different GNN architectures.
Further evaluation on larger datasets is expected to further improve the utility of the proposed method.