Detecting Anomalies In IoT Device Data With Both Accuracy And Latency

Time-series 09/09/2021

3 main points
✔️ There is a need for highly accurate and fast anomaly detection for IoT device data
✔️ A method for building a hierarchical distributed system, from IoT devices up to the cloud, that adapts to the input data and changes the layer at which inference is performed
✔️ Experiments confirm a better balance of accuracy and latency than either purely centralized or purely distributed systems

Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach
written by Mao V. Ngo, Tie Luo, Tony Q.S. Quek
(Submitted on 9 Aug 2021)
Comments: Accepted by ACM Transactions on Internet of Things (TIOT)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)

The images used in this article are from the paper or created based on it.

outline

With the proliferation of IoT, demand for making effective use of IoT data will inevitably grow. This paper proposes a system that performs anomaly detection on a hierarchical edge computing (HEC) system, spanning IoT devices, edge servers, and the cloud, while achieving both real-time response and high accuracy on data generated by IoT devices. Its distinguishing feature is that it adaptively decides, for each input, which layer's model to apply based on the context of the data at that point in time. This model selection policy is trained with reinforcement learning. Let's take a look at the details.

introduction

Smart IoT devices are driving applications in smart factories, smart homes, autonomous vehicles, and digital health around the world. Large networks of these devices generate vast amounts of sensor data, from which machine learning and deep learning models extract insights and detect anomalies. Some systems, such as fire alarms, require real-time responses. For these, the traditional approach of streaming all IoT data to the cloud causes communication latency, overloads the backbone network, and creates data privacy risks.

Edge or fog computing, on the other hand, performs anomaly detection close to the sensor data sources, but it is constrained by the limited computing power and resources of edge devices. Pruning and distillation can adapt large, complex models to the capabilities of IoT devices, but they must be fine-tuned case by case and apply only to a limited subset of sparse DNN models.

Existing distributed anomaly detection approaches have three problems: 1) they use a one-size-fits-all model, although in practice each case is different; 2) they optimize for accuracy without taking latency or memory usage into account; and 3) they may not properly exploit locality in the distributed system and may send data back and forth to the cloud many times.

Therefore, this paper proposes an anomaly detection system for hierarchical edge computing (HEC) that achieves high real-time performance while taking computational load into account.

related research

The prior work that most influenced this paper is BranchyNet, an image classification network with early exits: inference stops at an intermediate branch once the prediction is confident enough. It has been applied to hierarchical computing by partitioning the network across different layers of the hierarchy.
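A minimal sketch of the early-exit idea follows, in the spirit of BranchyNet: each exit branch returns class probabilities, and inference stops at the first branch whose softmax entropy is below a threshold. The `entropy` and `early_exit` helpers and the threshold values are illustrative assumptions, not BranchyNet's actual code; in a hierarchical deployment, successive branches could run on successive tiers.

```python
import math
from typing import Callable, Sequence


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (natural log) of a probability vector; low entropy = confident."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def early_exit(
    x: object,
    branches: Sequence[Callable[[object], Sequence[float]]],
    thresholds: Sequence[float],
) -> tuple[int, int]:
    """Run exit branches in order and return (exit_index, predicted_class).

    Stops at the first branch whose output entropy is below its threshold.
    `thresholds` covers all but the last branch: the last branch always answers.
    """
    for i, branch in enumerate(branches):
        probs = branch(x)
        is_last = i == len(branches) - 1
        if is_last or entropy(probs) < thresholds[i]:
            pred = max(range(len(probs)), key=lambda c: probs[c])  # argmax class
            return i, pred
    raise ValueError("at least one branch is required")
```

For simplicity each branch here receives the raw input; in a real early-exit network, later branches reuse the features already computed by earlier layers, which is what makes stopping early cheap.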

Because input data often varies in how difficult it is to analyze, the paper also draws on other prior work: kNN-classification-based models that select an appropriate inference model depending on the input image and the required accuracy; Lin et al., who use a reinforcement learning-based approach to prune networks dynamically at runtime; and BlockDrop, which dynamically decides which residual blocks to use or drop during inference.

Description of the proposed method

This paper applies the same idea to anomaly detection in IoT and edge computing rather than to images. Instead of a sequence of kNN classifiers, it uses a single policy network that directly outputs the appropriate model for the given context, and it handles multiple models in a distributed computing system using reinforcement learning.

overall configuration

Fig. 1 shows the overall structure of adaptive anomaly detection, which consists of three flows: 1) training of the anomaly detection models (solid black line), 2) training of the policy network (purple line), and 3) online adaptive detection (orange line). Data preprocessing is common to all three flows.

In the first flow, we train the anomaly detection models (univariate or multivariate) and determine the anomaly score and the detection threshold. In the second flow, we train the policy network, which selects the best anomaly detection model according to the context of the input. Finally, in the third flow, online adaptive detection, the IoT device running the trained policy network selects the appropriate anomaly detection model at run time.

Multiple Anomaly Detection in Hierarchical Edge Computing (HEC)

We assume a hierarchical edge computing (HEC) system with k layers. Layer 1 is the IoT device, layers 2 to k-1 are edge servers, and layer k is the cloud.

Univariate anomaly detection model

For univariate IoT data, an autoencoder is used. Models at higher layers have more hidden layers.

Multivariate anomaly detection model

For multivariate data, we use an LSTM-based sequence-to-sequence (seq2seq) model. Here too, the model becomes more complex toward the upper layers, and in the cloud a bidirectional LSTM-based seq2seq model is used.

Dissimilarity Score

After training, the reconstruction error is small for normal data and large for anomalous data. The reconstruction error is assumed to follow a Gaussian distribution, and the log probability density (logPD) of the reconstruction error is used as the anomaly score: normal data show high logPD and anomalous data show low logPD. The minimum logPD over the normal dataset is used as the anomaly detection threshold.
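The scoring rule above can be sketched in a few lines. This is a minimal illustration under a simplifying assumption: each window's reconstruction error is reduced to a single scalar and modeled with a univariate Gaussian, whereas vector-valued errors (as from the multivariate model) would call for a multivariate Gaussian. `LogPDScorer` is a hypothetical name.

```python
import math
import statistics
from typing import Sequence


class LogPDScorer:
    """Anomaly scorer based on the log probability density (logPD) of reconstruction errors.

    Assumption: one scalar reconstruction error per input, modeled as a univariate Gaussian.
    """

    def __init__(self, normal_errors: Sequence[float]) -> None:
        # Fit the Gaussian on reconstruction errors from normal (anomaly-free) data.
        self.mu = statistics.fmean(normal_errors)
        self.var = statistics.pvariance(normal_errors, mu=self.mu)
        if self.var == 0.0:
            raise ValueError("normal errors must not all be identical")
        # Threshold = the lowest logPD observed on the normal data.
        self.threshold = min(self.log_pd(e) for e in normal_errors)

    def log_pd(self, error: float) -> float:
        """Log of the Gaussian density at `error`; high = normal, low = anomalous."""
        return (-0.5 * math.log(2 * math.pi * self.var)
                - (error - self.mu) ** 2 / (2 * self.var))

    def is_anomaly(self, error: float) -> bool:
        # Flag anything less likely than the least likely normal sample.
        return self.log_pd(error) < self.threshold
```

Because the Gaussian is two-sided, an unusually small reconstruction error can also fall below the threshold; that is a direct consequence of using logPD as the score.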

Adaptive model selection scheme

We propose an adaptive model selection scheme that selects the best anomaly detection model for the context of the input data, so that each input is processed at the appropriate layer (IoT, edge, or cloud). Every input first arrives at the IoT device and is then forwarded upward to the selected layer.
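To see why the choice of layer matters for latency, here is a hypothetical cost model: an input enters at the IoT device and is forwarded one hop at a time up to the selected layer, paying each link's transfer time plus the inference time at the destination. The `Tier` fields and the additive form are assumptions for illustration; the paper's own cost term may be defined differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Tier:
    name: str
    inference_ms: float     # hypothetical time to run this tier's detector
    uplink_ms: float = 0.0  # hypothetical time to forward one input to the next tier up


def end_to_end_delay(tiers: Sequence[Tier], target: int) -> float:
    """Delay for an input that enters at tiers[0] and is processed at tiers[target].

    The input is forwarded hop by hop: tiers[0] -> tiers[1] -> ... -> tiers[target].
    """
    if not 0 <= target < len(tiers):
        raise IndexError("target tier out of range")
    transfer = sum(t.uplink_ms for t in tiers[:target])  # one uplink per hop taken
    return transfer + tiers[target].inference_ms
```

Under this model a faster model at a higher layer only pays off when its inference-time saving exceeds the extra transfer time, which is the trade-off the policy network has to learn.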

We use a contextual-bandit formulation in which the policy is learned from a reward that accounts for both detection accuracy and the cost of data transfer. To reduce training time, we use mini-batches, training on N input contexts at a time. To prevent overfitting, we add an L2 regularization term to the loss function. To balance exploration and exploitation, we use the decayed-$\epsilon$-greedy algorithm.
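The exploration side of the scheme can be sketched as follows. This assumes an exponential decay schedule for $\epsilon$ (one common choice; the paper's exact schedule and constants are not given here) and treats the policy network's output for one context as a list of per-layer scores. `decayed_epsilon` and `select_action` are hypothetical names.

```python
import math
import random
from typing import Sequence


def decayed_epsilon(step: int, eps_start: float = 1.0, eps_min: float = 0.05,
                    decay: float = 0.001) -> float:
    """Exponentially decaying exploration rate (assumed schedule and constants)."""
    return eps_min + (eps_start - eps_min) * math.exp(-decay * step)


def select_action(policy_scores: Sequence[float], step: int,
                  rng: random.Random) -> int:
    """Decayed-epsilon-greedy choice of which layer's detector to use.

    `policy_scores` stands in for the policy network's output for one context
    (one score per anomaly detection model, i.e. per layer).
    """
    if rng.random() < decayed_epsilon(step):
        return rng.randrange(len(policy_scores))  # explore: pick a random layer
    # Exploit: pick the layer the policy currently scores highest.
    return max(range(len(policy_scores)), key=lambda a: policy_scores[a])
```

Early in training $\epsilon$ is close to 1 and layers are picked almost at random, so the policy observes rewards from every layer; as training proceeds, it increasingly trusts its own scores.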

Implementation and Experiment

data set

For univariate data, we use a UC Riverside (UCR) time series dataset of one year of electricity consumption at a Dutch research institute; for multivariate data, we use the MHEALTH dataset from UC Irvine (UCI). The former is also available in a newer, updated version of the database.

Implementation of anomaly detection model and policy network

The system is implemented with TensorFlow and Keras, using a Raspberry Pi 3 as the IoT device and a Jetson TX2 as the edge computer, with GPUs at the edge and in the cloud.

Accelerated Learning of Policy Networks

Following a distributed mechanism recently published by Google that efficiently accelerates deep reinforcement learning training, parts of the algorithm can be modified as follows: 1) instead of querying the reward for each input sequentially, group the inputs assigned the same action and pass them to the corresponding anomaly detection model together for batch inference; 2) if the actions output by the $\epsilon$-greedy method involve more than one anomaly detection model, run those models in parallel.
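The first modification, grouping inputs by action, can be sketched as below, with the second (parallel execution) folded in via a thread pool. `batched_rewards` and the `BatchRewardFn` type are hypothetical: each detector is assumed to accept a batch of inputs and return one reward per input.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a detector takes a batch of inputs and returns one reward per input.
BatchRewardFn = Callable[[List[object]], List[float]]


def batched_rewards(inputs: Sequence[object], actions: Sequence[int],
                    detectors: Dict[int, BatchRewardFn]) -> List[float]:
    """Query rewards for a mini-batch by grouping inputs that share an action.

    All inputs routed to the same detector are sent as one batch, and the
    per-detector batches are executed concurrently.
    """
    groups: Dict[int, List[int]] = defaultdict(list)
    for idx, action in enumerate(actions):
        groups[action].append(idx)

    rewards = [0.0] * len(inputs)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = {
            action: pool.submit(detectors[action], [inputs[i] for i in idxs])
            for action, idxs in groups.items()
        }
        for action, future in futures.items():
            # Scatter each batch's rewards back to the inputs' original positions.
            for i, r in zip(groups[action], future.result()):
                rewards[i] = r
    return rewards
```

Threads only illustrate the structure here; in the actual HEC setting each batch would go to a different device, so the per-layer calls overlap naturally.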

Software Architecture and Experimental Setup

The software architecture, shown in Fig. 7, consists of a GUI, the adaptive model selection scheme driven by the policy network, and the anomaly detection model at each layer. From the GUI, the user can choose the dataset and the selection scheme and check the results (Fig. 8).

SOTA scheme for comparison

The baselines are kNN-sequence, Adaptive-BlockDrop, and kNN-Single.

experimental results

Comparison of anomaly detection models

Table 1 compares the performance of the anomaly detection models at the three layers. The differences reflect the hardware capability of each layer and the size of each model.

Model selection scheme comparison

Table 2 compares the baselines with "Proposed", the hierarchical adaptive scheme. The proposed scheme reduces delay while also improving accuracy.

Cost function: trade-off between accuracy and delay

Fig. 11 shows the results of varying the parameter α, which weights accuracy against delay. For univariate data, the results clearly depend on α, whereas for multivariate data the accuracy stays at roughly the same level as the cloud model regardless of α.
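The role of α can be illustrated with a hypothetical reward of the form "1 if the decision is correct, minus α times the normalized delay". The paper's exact reward may differ; `reward` and `best_layer` are illustrative names, and `best_layer` is an oracle that assumes per-layer correctness is already known.

```python
from typing import Sequence


def reward(correct: bool, delay: float, max_delay: float, alpha: float) -> float:
    """Hypothetical reward: +1 for a correct decision, minus an alpha-weighted,
    normalized delay penalty. Larger alpha favours lower layers (less delay)."""
    return (1.0 if correct else 0.0) - alpha * delay / max_delay


def best_layer(correct: Sequence[bool], delays: Sequence[float], alpha: float) -> int:
    """Index of the layer with the highest reward for one input (oracle choice).

    Layers are ordered from IoT (index 0) upward; ties go to the lower layer.
    """
    max_delay = max(delays)
    scores = [reward(c, d, max_delay, alpha) for c, d in zip(correct, delays)]
    return max(range(len(scores)), key=lambda i: scores[i])
```

With α = 0 only correctness matters, so the lowest layer that is correct wins; as α grows, the delay penalty eventually outweighs the accuracy gain and the IoT layer is chosen even when it is wrong.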

Accelerated Learning of Policy Networks

Applying the accelerated learning algorithm described above substantially reduces the transfer time for univariate data and the computation time for multivariate data.

Context information: hand settings and encoding feature values

Using encoding feature values as the context gives better accuracy but greater delay. Therefore, if you want to improve both accuracy and latency, one option is to set the context features by hand. However, without domain knowledge, it is difficult to design effective hand-set features.
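For comparison, a hand-set context could be as simple as a few statistics of the input window. The specific features below are assumptions for illustration, not the ones used in the paper; encoding features would instead reuse the hidden representation produced by the IoT-layer model's encoder. `handcrafted_context` is a hypothetical name.

```python
import statistics
from typing import List, Sequence


def handcrafted_context(window: Sequence[float]) -> List[float]:
    """Hypothetical hand-set context vector: simple statistics of one input window."""
    diffs = [b - a for a, b in zip(window, window[1:])]
    return [
        statistics.fmean(window),                    # level
        statistics.pstdev(window),                   # spread
        min(window),
        max(window),
        max((abs(d) for d in diffs), default=0.0),   # largest step change
    ]
```

Features like these cost almost nothing to compute on the IoT device, but choosing ones that actually help the policy requires domain knowledge.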

Conclusion

As we have seen, the proposed real-time adaptive model selection scheme effectively assigns data from IoT devices to an anomaly detection model and a level of computational resources appropriate to each input.

(The author's opinion) Several methods have been proposed that let a model capture correlations between variables or feature values. I am curious whether models other than the LSTM-based seq2seq used here could also be arranged into this kind of hierarchy.

友安昌幸 (Masayuki Tomoyasu): JDLA G Certificate 2020#2, E Certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; Amiko Consulting LLC, CEO
