# Multivariate Time-series Anomaly Detection Based On Actual Anomaly Patterns

*3 main points*

✔️ Proposes FMUAD, a framework that explicitly captures three heterogeneous anomaly patterns specific to multivariate time series, which conventional classical and deep-learning models cannot capture

✔️ Features a modular structure and a loss function designed for conciseness (low variance) of the learned representation

✔️ Compared with SOTA methods on the same real-world datasets, it scores high on precision and recall in a balanced manner, outperforming them in F1 score by more than 17%

**Forecast-based Multi-aspect Framework for Multivariate Time-series Anomaly Detection**

written by Lan Wang, Yusan Lin, Yuhang Wu, Huiyuan Chen, Fei Wang, Hao Yang

(Submitted on 13 Jan 2022)

Comments: Published at IEEE BigData 2021

Subjects: Machine Learning (cs.LG)

code:

The images used in this article are from the paper, the introductory slides, or were created based on them.

## Introduction

Multivariate time-series anomaly detection is of strong interest for real-world applications, including road traffic monitoring, financial fraud detection, web log analysis, and network analysis. Several methods, including graph-based ones, have been introduced on this site. Researchers at Visa are tackling the additional challenges that arise in real-world operation.

One of the most important challenges in anomaly detection is the nature of multivariate time-series data itself: mutual effects between multiple series, relationships within a single series, and frequency shifts or sudden trend changes. Most forecast-based models fall short because they do not explicitly address such patterns.

The proposed method takes a modular approach, with each module characterizing one of three heterogeneous patterns: inter-series correlation dynamics, intra-series temporal dynamics, and multi-scale spatial dynamics. These modules are trained jointly under a unified objective.

Another challenge is the lack of labels for heterogeneous events. Since anomaly data is scarce, a label-independent model is needed, i.e., unsupervised learning.

To summarize:

The proposed FMUAD (Forecast-based Multi-aspect Unsupervised Anomaly Detection framework) captures different patterns of anomalies. We confirmed that its F1 score is more than 17% higher than other SOTA forecast-based methods.

Its modular architecture responds explicitly to different anomaly patterns, making it an intuitive and interpretable framework. Because the modules are trained jointly, it captures not only the heterogeneity of each pattern but also anomalies with mixed patterns.

We define a new loss function that includes not only the forecast error but also a conciseness (variance) term. Conciseness is a desirable property that controls the within-class variance: by keeping the normal class concise, the model learns tighter representations that are more sensitive to outliers.

## Related Work

To represent the complexity of time series data, graph models have been proposed as described above, but the increasing complexity of the models makes it difficult to extend them to real-world applications. As a result, a universally satisfactory representation of time series data is still heavily debated today.

The extensive literature on time series anomaly detection can be divided into three main groups.

1) Proximity-based

Quantifies the similarity of objects by a defined distance; objects far from the majority are flagged as dissimilar, i.e., anomalous.

2) Reconstruction-based

The reconstruction error is central: anomalies are assumed to lie apart from the majority of data points and therefore cannot be reconstructed effectively from a low-dimensional space.

3) Forecast-based

Patterns that cannot be predicted from past data are assumed to be anomalies, which are detected from the prediction error.

Forecast-based methods are further classified by how they forecast. The classical method ARIMA is a statistical model that learns the autocorrelation of time-series data to predict future values; Holt-Winters and FDA follow similar ideas. While efficient, these are sensitive to the choice of dataset and model parameters and require domain knowledge. Machine-learning-based methods attempt to address these limitations: Hierarchical Temporal Memory (HTM) is an unsupervised sequence-memory algorithm for detecting anomalies in streaming data, and Ding et al. combine HTM with Bayesian networks for multivariate time series. LSTM-RNNs are widely used in SOTA research; Hundman et al.'s LSTM-NDT is an unsupervised, non-parametric thresholding approach that uses LSTM networks to generate predictions. DAGMM jointly optimizes the parameters of a deep network and a Gaussian mixture model to detect anomalies.

These methods are risky in that they seek a single solution to capture all possible anomalies, even though anomaly patterns vary markedly.

## Proposed Method

Fig. 2 shows the high-level conceptual diagram and Fig. 4 the detailed structure. Data is fed in windows of length k, and anomaly detection is performed window by window. The current window W_t is combined with the data from the past τ intervals to form the input to the prediction model.
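The paper does not release code, so as a rough sketch of this windowing scheme (the function name, array shapes, and toy values below are our own assumptions, not the authors' implementation):

```python
import numpy as np

def make_windows(series, k, tau):
    """Split a multivariate series of shape (T, m) into non-overlapping
    windows of length k, then pair each window W_t with its tau
    preceding windows as forecast context (hypothetical helper)."""
    T = series.shape[0]
    windows = [series[i:i + k] for i in range(0, T - k + 1, k)]
    pairs = []
    for t in range(tau, len(windows)):
        context = np.stack(windows[t - tau:t])  # shape (tau, k, m)
        pairs.append((context, windows[t]))     # (past windows, current W_t)
    return pairs

# toy example: 3 series, 40 time steps, window length k=5, tau=2
data = np.random.rand(40, 3)
pairs = make_windows(data, k=5, tau=2)
assert pairs[0][0].shape == (2, 5, 3)  # context: tau windows of (k, m)
assert pairs[0][1].shape == (5, 3)     # current window W_t
```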

The input branches into three change detectors D_c, D_t, and D_s, which look at changes in inter-series correlation, intra-series temporal patterns, and multi-scale spatial patterns, respectively.

To capture the characteristics of each series, we transform the input data W_t into F_t and S_t, where F_t is a frequency matrix and S_t is a similarity matrix containing the correlation information between series.

**Inter-series correlation detector**

To obtain the correlation between series, the cosine similarity is used rather than the inner product. As shown in Fig. 4, the transform TF1 converts the window data series into a series of similarity matrices; a ConvLSTM then extracts the temporal information, and a temporal attention operation adaptively adjusts the weights.
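As a minimal illustration of the similarity-matrix transform (our own sketch, not the paper's TF1 implementation), the pairwise cosine similarity between the m series in one window can be computed as:

```python
import numpy as np

def cosine_similarity_matrix(W):
    """Pairwise cosine similarity between the m series of a window W
    of shape (k, m); returns an (m, m) symmetric matrix."""
    X = W.T                                        # (m, k): one row per series
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    X_unit = X / np.clip(norms, 1e-12, None)       # guard against zero series
    return X_unit @ X_unit.T

W = np.array([[1.0, 2.0, -1.0],
              [2.0, 4.0, -2.0],
              [3.0, 6.0, -3.0]])
S = cosine_similarity_matrix(W)
assert np.allclose(np.diag(S), 1.0)
assert np.isclose(S[0, 1], 1.0)   # series 1 is 2x series 0 -> similarity 1
assert np.isclose(S[0, 2], -1.0)  # series 2 is -1x series 0 -> similarity -1
```

Unlike the raw inner product, this is invariant to each series' scale, which is why it suits correlation-change detection.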

**Intra-series temporal pattern detector**

To capture the temporal pattern within each series, a discrete Fourier transform TF2 converts the window data series into a series of frequency matrices. As before, ConvLSTM and temporal attention operations follow.
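A minimal sketch of such a frequency transform, assuming a real-valued DFT magnitude over each series in the window (the exact form of TF2 in the paper may differ):

```python
import numpy as np

def frequency_matrix(W):
    """Magnitude spectrum of each series in a window W of shape (k, m),
    via the real DFT along the time axis; returns (k//2 + 1, m)."""
    return np.abs(np.fft.rfft(W, axis=0))

k = 32
t = np.arange(k)
# two toy series: a 4-cycle sine and an 8-cycle sine over the window
W = np.stack([np.sin(2 * np.pi * 4 * t / k),
              np.sin(2 * np.pi * 8 * t / k)], axis=1)
F = frequency_matrix(W)
assert np.argmax(F[:, 0]) == 4  # dominant bin matches each series' frequency
assert np.argmax(F[:, 1]) == 8
```

A frequency shift in a series thus shows up as energy moving between rows of F_t, which the downstream ConvLSTM can track over time.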

**Multi-scale pattern detector**

Another important anomaly pattern is spatial dynamics, i.e., "value change", such as (c) and (d) in Fig. 1: (c) is easy to spot, but (d) is difficult because the change is slow and slight. In the case of (d), aggregating the data along the time axis turns it into an easily detected spike. To capture such long-term dynamics, we use dilated CNNs, followed by a 1x1 convolution and a fully connected layer.
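To see why a wide receptive field helps with slow drifts, here is a hand-rolled toy dilated 1-D convolution (a sketch only, not the paper's dilated CNN): a simple difference kernel barely registers a gradual ramp at dilation 1, but exposes it clearly at dilation 8.

```python
import numpy as np

def dilated_conv1d(x, w, dilation):
    """'Valid' 1-D convolution of signal x with kernel w, where kernel
    taps are spaced `dilation` samples apart (stride 1, no padding)."""
    k = len(w)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    return np.array([
        sum(w[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ])

# a slow drift: flat, then a gradual ramp from 0 to 1
x = np.concatenate([np.zeros(8), np.linspace(0.0, 1.0, 8)])
w = np.array([-1.0, 1.0])  # difference kernel
assert dilated_conv1d(x, w, dilation=1).max() < 0.2  # local diffs stay small
assert dilated_conv1d(x, w, dilation=8).max() > 0.8  # wide view exposes drift
```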

**Loss function**

We use a loss function based on the forecast error and its variability. The forecast error l1 is the batch average of the L2 norm between the true and predicted data. We further introduce a conciseness loss l2, the batch variance of the forecast error. The final loss is the product of l1 and l2.
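One plausible reading of this loss, sketched in code (the exact normalization and combination in the paper may differ):

```python
import numpy as np

def fmuad_loss(y_true, y_pred):
    """Sketch of the article's loss description: forecast error l1 times
    a conciseness term l2 (our reading, not the authors' exact formula).
    l1 = batch mean of the per-sample L2 forecast error;
    l2 = batch variance of that error, penalizing spread within a batch."""
    err = np.linalg.norm(y_true - y_pred, axis=-1)  # per-sample L2 error
    l1 = err.mean()
    l2 = err.var()
    return l1 * l2

y_true = np.array([[1.0, 0.0], [0.0, 0.0]])
y_pred = np.zeros((2, 2))
# per-sample errors are [1, 0]: l1 = 0.5, l2 = 0.25, loss = 0.125
assert np.isclose(fmuad_loss(y_true, y_pred), 0.125)
```

Multiplying rather than adding the terms means the loss vanishes when either the average error or its spread is zero, pushing the normal class toward a tight, low-error cluster.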

## Experiments

The datasets used for evaluation are SMD (Server Machine Dataset) and MSL (Mars Science Laboratory rover), which are often used in other papers.

The three benchmarked models are LSTM-NDT, DAGMM, and LSTM-VAE; the first two are forecast-based, and the last is also LSTM-based. The evaluation metrics are precision (P), recall (R), and F1 score (F1). Table II lists the evaluation results. FMUAD balances precision and recall well and consistently scores the best F1, an average improvement of 17.8 percent.

### Intuitive and easy to understand

Fig. 5 shows the FMUAD output data for the two datasets MSL and SMD. The red area is the range where the anomaly was detected, the upper part is the manually selected original series that may be related to the anomaly, and the lower part is the FMUAD output.

In the left case, the frequency of the fourth series has changed. In the right case, the sudden change in the second and fourth series is easy to notice, but the sustained change in the latter half is difficult for people to judge. Both are clearly detected in the output of FMUAD.

### Ablation study

To confirm the modularity of the system, an ablation experiment was performed in which each of the three detectors was run independently. Fig. 6 shows the results: each detector excels at detecting a different pattern.

TABLE IV shows the F1 scores. The results are as expected.

TABLE V compares the results when each detector operates alone and when all detectors operate together. On both datasets, the best results are obtained by running all detectors. It is also noteworthy that the model is very stable, with a variation of only 0.0001 when the dataset is changed.

The loss function includes a term that controls variance; TABLE VI confirms the effect of this term, which is evident on both datasets.

## Summary

In this paper, we have described a forecast-based unsupervised multi-aspect framework for multivariate time-series anomaly detection, FMUAD for short. It is a bold attempt, within this category of anomaly detection models, to identify the characteristics of anomalous patterns and handle them in a "divide and conquer" fashion. A modular framework is introduced in which the input time series is transformed into intermediate representations that better capture each target pattern and fed into detector modules that make predictions on those representations. We also experimented with a new loss function that encourages conciseness of the learned representation during training. Bringing it all together, the loss function computes the prediction error and generates an anomaly score.

By training the three detectors together and comparing against SOTA on public reference datasets, we showed that the framework achieves the best anomaly detection performance: the standard F1 score of FMUAD is indeed higher than that of other models in its class. We also showed that each detector module performs favorably, by relating its detection mechanism to human intuition and mapping it to its assigned anomaly pattern. In addition, we investigated the role of modularity through ablations and confirmed that combining all three modules gives consistently higher and more stable performance than using any one alone. Finally, the conciseness-oriented l2 loss term provides a small but consistent improvement.

A recent trend we see in the machine learning community is a move towards "end-to-end" or "one-size-fits-all" models, in what appears to be a rebellion against the old-fashioned feature engineering approach. However, FMUAD may remind us that the individual patterns present in the input data may require some degree of tuned model design, even though it is an end-to-end trainable, self-adaptive model.

This is not the end of the journey: the flexibility demonstrated in FMUAD opens up opportunities for further enhancement. First, we can continue to discover new types of anomaly patterns and tailor corresponding detectors to their characteristics so that they are captured effectively. Second, better implementations of the detectors D_c, D_t, and D_s may allow the individual modules to perform even more optimally. Finally, the three detectors of FMUAD are completely isolated from each other until the final aggregator; enabling interaction between these modules may lead to interesting observations worth investigating. Overall, this model raises the potential of forecast-based anomaly detection, and we hope it starts a discussion toward more general directions.

With graph structures, we may be able to explore structures that optimize the balance between complexity and expressiveness.
