Generate-and-contrast Self-supervised Learning Anomaly Detection That Fully Exploits The Information Contained In Graphs.
3 main points
✔️ An anomaly detection model has emerged that takes full advantage of the information contained in the graph structure
✔️ It incorporates a generative model and contrast learning based on GNN encoder/decoder
✔️ Combine the anomaly scores from the two models to obtain the final anomaly score
Generative and Contrastive Self-Supervised Learning for Graph Anomaly Detection
written by Yu Zheng, Ming Jin, Yixin Liu, Lianhua Chi, Khoa T. Phan, Yi-Ping Phoebe Chen
(Submitted on 23 Aug 2021)
Comments: Published on arxiv.
Subjects: Machine Learning (cs.LG)
The images used in this article are from the paper, the introductory slides, or were created based on them.
first of all
Recently, we have seen an increase in the number of domains that continuously produce complex, interdependent, and connected data, represented in the form of graphs or networks. Typical examples include social networks, biological networks, transportation networks, and financial transaction networks. Data mining and analysis from these graph-structured data have received significant attention, especially in the task of graph anomaly detection, which aims to identify patterns (e.g., nodes, edges, subgraphs) that are significantly different from the majority of the graph. For example, in financial trading networks, it is very important to identify anomalous edges (illegal transactions) between two accounts [1 ]. In social networks, it is also important to detect anomalous nodes (social bots) as they may spread rumors on social networks [2 ]. However, detecting anomalies in graphs is a challenging task because many graphs contain complex linkage (structure) information and node attribute information. As a result, anomalies may be hidden in the structure space, attribute space, and combinations of both. Furthermore, in many cases, the correct solution to the anomaly is unknown, making many supervised classification approaches inapplicable. These two challenges have prompted an increase in efficient anomaly detection efforts, ranging from shallow to deep representations for anomaly detection in a purely unsupervised learning fashion.
Shallow methods concentrate primarily on defining the weighing of anomalous quantities relative to the graph and developing an anomaly capture method based on these weighings.
For deep learning-based methods, in anomaly detection, autoencoders are often used in a purely unsupervised learning environment with no correct data. It encodes the context and structure into latent embedding and calculates the anomaly score by the recovery error.
However, existing graph- and autoencoder-based methods do not make full use of contextual information, even though it is important for anomaly detection.
Therefore, in this paper, we propose a self-supervised learning method shown in Fig. 1. After the subgraphs around the target node are encoded into latent representations by GNNs, generative and control models are generated, and the two are combined to produce the final heterogeneity score.
prior art (patents)
Traditional statistical methods, deep learning, and semi-supervised anomaly detection all deal with target data in Euclidean space, but recently anomaly detection from graph structure data outside of Euclidean space has been attracting attention. In AMEN, normality is defined as a measure of anomaly. In AMEN, normality is defined as a measurement. Matrix regression approach is applied, and there are two approaches, Rader approach, and Anomalous approach, in which anomalies are detected when residuals are large. In the graph, the Deep approach is applied. In CoLA, self-supervised learning, local information from network data is explored by a pairwise sampling of instances, and contrast learning is used to learn node representations. The anomaly score is computed by the predicted score on the control pairs; CoLA only uses self-supervised learning to capture anomalous patterns in contrast.
Self-supervised learning, which has been largely successful in computer vision and natural language processing, has been extended to the graph domain. DGI (Deep Graph Infomax) is the first contrast learning algorithm for unsupervised learning of graph data in the embedded form. MVGRL is from first-order neighbors and graph diffusion in two views.CG3 uses graph structure and limited label information-based contrast learning between nodes; MERIT enhances the supervised signal by maximizing the agreement of node embeddings across different views and networks; JOAO is a self-learning model that automatically learns graph expansions.
However, all these algorithms only represent the nodes and do not perform anomaly detection.
graph representation learning
The goal of graph representation learning is to learn the representation of each node or the entire graph to facilitate downstream graph analysis tasks; GCNs (Graph Convolutional Networks) convey messages in the spectral domain and perform well in node classification; GATs (Graph Attention network) automatically learns the weights of neighbors in the message transmission process; GraphSage algorithm improves scalability, SIGN performs graph convolution at different sizes and reduces graph sampling dependency; ARGA ( Adversarial Regularised Graph Autoencoder) uses adversarial learning to regulate embeddings in the latent space to improve the robustness of graph learning; GSNN (Graph Stochastic Neural Network) learns a family of functions at the GIL uses the strengths of Euclidean and hyperbolic geometry.
However, existing GNN approaches largely focus on learning generic graph representations and are still exploring anomaly detection (paper authors).
SL-GAD consists of three parts as shown in Fig. 2: graph view sampling, adversarial-contrast self-supervised learning, and graph heterogeneity scoring.
First, we select a target node from the input graph and use the context information of that node. Specifically, we generate two related graph views using different extensions. We then make full use of the rich mode and sub-graph level information to generate two self-supervised objects: generative attribute restoration and multi-view contrast learning. The former, inspired by the idea of Graph Auto-encoder (GAE), mismatch the attributes of the regression loss between the original feature vector and the restoration vector if the chosen target node is anomalous. The latter is a contrasting objective that directly compares the target node with the surrounding context in embedding and structure space. Overall, the model optimizes two self-supervised objects that are closely related to graph anomaly detection. During inference, two scoring functions are specifically designed based on the two objects above.
Graph view establishment for anomaly detection
Prior work has shown that the design of the fractionator pair is the key for graph encoders to extract rich structural and attribute information. Broadly, there are two categories: generative and contrastive, where the former mainly predicts attributes and structural auxiliary properties, and the latter can perform sorting at different scales (e.g., nodes and graphs).
Here, we propose two self-supervised learning objects (OBJECTIVES) from different scales/spaces to establish the connection between the target node and its surrounding context. First, node-level segregation is performed. It recovers the feature vector of the target node with GAE and compares it with the correct answer in attribute space. Next, to inject rich structural information, we construct embeddings between target nodes and local subgraphs, and mixed-level contrasts in structure space. By sampling multiple views, the in the contrast module. We explore a variety of semi-global information during differentiation.
In the left block of Fig. 2, we sample the target node from the input graph and then sample two different views around it to give different graph extensions. Increasing the number of views to three or more seems to degrade the properties.
For the generating target, the fractionator pairs are the original and the restored target nodes. The contrast targets are two sampled graph views consisting of a target node and two fractionator pairs.
Target node sampling
To focus primarily on node-level anomaly detection in the graph, we first sample the target nodes.
Graph view sampling
Random Walk with Restart ( RWR ) is applied as a data extension to the graph, sampling with fixed size K from the graph view centered on the target node.
Graph View Anonymization
The sampled graph views are anonymized. In other words, the feature values are set to zero. This means that the raw attribute information of the target node is not used in the computation of feature recovery or graph view embedding. It prevents information leakage and relies only on context information for anomaly detection.
Generative Learning with Attribute Recovery
Deep self-encoders are suitable for anomaly detection, but for graphs, they only recover node attribute information, so we use a GNN-based encoder/decoder such as
The encoder is represented using the embedding matrix, node feature matrix, and proximity matrix as follows. For more details, please check the paper.
GNN Base Decoder
Similarly, the decoder is represented as follows, where Wdec is the learning parameter matrix.
Generating graphs to detect abnormalities
As shown in the central block of Fig.2, we collect the proximity information and optimize it with the following loss function to minimize the difference between the original and restored features (MSE) of the two graph views.
multiview contrast learning
As illustrated in the central block of Fig.2, the control module consists of three parts.
The feature vector/matrix is obtained using the target node and two associated graph views as input. The graph encoder is the same as the one in the generative model. However, the transformation of the feature vector is as follows
To contrast the hidden layer ht with the surrounding subgraphs, the readout module generates two semi-global representations. (Fig. 2, middle part) In this paper, we use mean pooling for simplicity.
We generate positive and negative pairs P of h, g as follows.
To contrast the two elements of a pair, we design a fractionator with bilinear transformation using sigmoid function and obtain the contrast fractionation score as follows
Multi-view control graph anomaly detection
At the anomaly points of the graph, the surrounding context, attributes, and topological properties should be different. In other words, (12) should be noticeably larger than (13). This forms a multi-view contrast object with the Jensen-Shannon divergence. This is maximized when the target node and the surrounding context match.
Add them together for the two subgraphs to get the following equation
graph heterogeneity scoring
In the generative model, the attribute recovery of the target node is based only on the local context information, and the score function is as follows using the L2 norm distance
The contrast model segregates the target node from the surrounding nodes. Hence the score function is as follows
The above two heteroskedasticity score functions are combined to obtain the final graph heteroskedasticity score function of the following equation, where α and β are adjustable balancing factors.
Model Optimization and Algorithms
The loss function used for model optimization is a combination of (7) and (15), which yields
We use the following real-world data as our experimental dataset. The top two (BlogCatalog, Flickr) are social network data. Nodes represent users and edges represent relationships between two users. The rest is data about citations (ACM, Cora, CiteSeer, Pubmed ). Nodes represent published papers and edges represent citation relationships between two papers.
AMEN, Radar, ANOMALOUS, DOMINANT, DGI, and CoLA are used for baseline. The evaluation measure is ROC-AUC. The size k of the subgraph is 4. For other parameters, please refer to the paper.
The results are shown in Table 3, Fig.3. As we can see, SL-GAD shows outstanding performance against past SOTA as we expected. The fact that SL-GAD outperforms other deep models, not to mention the above three which do not use deep models and have inferior expressive power, shows the effect of combining two learning methods (generative model and control model).
In addition, the citation data pushes better performance, which may be related to the fact that SNS data is composed of higher dimensions. In this work, we fixed the size of the subgraphs, but efficient view sampling of higher-order graphs is future work.
Efficiency per component
Table 4 shows the results of isolating and evaluating the effects of the generative model, the control model, and the heteroskedasticity scoring component. The contribution of the control model seems to be larger than that of the generative model. We can also see the effect of scoring.
The dependence of the generative and control models within Eqs. (18) and (19) on the balance factors α and β are shown in Fig. 4. As above, the contribution of the control model appears to be significant but degraded when the generative model is set to zero.