
Detecting Anomalies in IoT Device Data with Both High Accuracy and Low Latency
3 main points
✔️ There is a need for highly accurate and fast anomaly detection for IoT device data
✔️ Proposes a hierarchical distributed system, from IoT devices up to the cloud, that adapts to the input data by changing the layer at which inference is executed
✔️ Results confirm a better accuracy-latency balance than either a purely centralized or a purely distributed system
Adaptive Anomaly Detection for Internet of Things in Hierarchical Edge Computing: A Contextual-Bandit Approach
written by Mao V. Ngo, Tie Luo, Tony Q.S. Quek
(Submitted on 9 Aug 2021)
Comments: Accepted by ACM Transactions on Internet of Things (TIOT)
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)
The images used in this article are from the paper or created based on it.
Overview
With the proliferation of IoT, demand to make more organic use of IoT data is inevitably growing. This paper proposes a system that performs anomaly detection on a hierarchical edge computing (HEC) system spanning IoT devices, edge computers, and the cloud, achieving both low latency and high accuracy for the data generated by IoT devices. Characteristically, the system adaptively decides, based on the context of each input, which layer's model to apply. The model-selection policy is trained with reinforcement learning. Let's take a look at the details.
Introduction
Smart IoT devices are spurring applications in smart factories, smart homes, autonomous vehicles, and digital health around the world. These huge networks of IoT devices generate vast amounts of sensor data, from which machine learning and deep learning extract insights and detect anomalies. Some systems, such as fire alarms, require real-time response. Streaming all IoT data to the cloud in the traditional way, however, raises issues of communication latency, congestion of the backbone network, and data privacy.
Edge (or fog) computing, on the other hand, performs anomaly detection close to the sensor data sources, but faces the challenge of limited computational capability and resources. Pruning and distillation can adapt large, complex models to the capabilities of IoT devices, but they must be fine-tuned case by case and are limited in that they apply only to a subset of sparse DNN models.
There are three problems with existing distributed anomaly detection approaches: 1) they are one-size-fits-all, although in practice each input differs in difficulty; 2) they pursue accuracy alone, without attending to latency or memory usage; and 3) they do not properly exploit locality in the distributed system, and may send data back and forth to the cloud many times.
Therefore, this paper proposes an anomaly detection system for hierarchical edge computing (HEC) that combines real-time responsiveness with a modest computational load.
Related Work
The prior work we draw on most is BranchyNet, an image-classification model with early exits: inference terminates at an intermediate branch once its confidence is high enough. It can be applied to hierarchical computing by splitting the branches across different layers.
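To make the early-exit idea concrete, here is a minimal sketch using the entropy-of-softmax criterion that BranchyNet describes. The branch classifiers below are placeholders that all read the same features, whereas real exit branches hang off intermediate layers of the backbone:

```python
import torch
import torch.nn as nn

def entropy(p):
    # Shannon entropy of a softmax output; low entropy = high confidence.
    return -(p * p.clamp_min(1e-12).log()).sum(dim=-1)

branches = [nn.Linear(64, 10) for _ in range(3)]  # placeholder exit branches
thresholds = [0.5, 1.0, float("inf")]             # final exit always answers

def early_exit_infer(x):
    for branch, t in zip(branches, thresholds):
        probs = torch.softmax(branch(x), dim=-1)
        if entropy(probs).max().item() < t:       # confident enough: exit here
            return probs.argmax(dim=-1)

pred = early_exit_infer(torch.randn(1, 64))
```

In the hierarchical setting, each exit branch would live at a different layer of the system, so a confident early exit also saves the cost of transmitting the data upward.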
Observing that input data varies in how difficult it is to analyze, we also consulted other prior work: cascaded kNN classifiers that select an appropriate inference model depending on the input image and the required accuracy; Lin et al., who use a reinforcement-learning approach to prune dynamically at runtime; and BlockDrop, which dynamically uses or drops residual blocks during inference.
Description of the proposed method
This paper applies these ideas to anomaly detection on IoT and edge computing rather than to images. Instead of multi-layered sequential kNNs, a single policy network directly outputs the appropriate model for the current context, and the multiple models in the distributed system are handled via reinforcement learning.
Overall architecture
Fig. 1 shows the overall structure of adaptive anomaly detection, which consists of three flows: 1) training of the anomaly detection models (black solid line), 2) training of the policy network (purple line), and 3) online adaptive detection (orange line). The steps up to data preprocessing are shared by all three.
In the first flow, the anomaly detection models are trained, and the anomaly score and detection threshold are determined. In the second flow, the policy network is trained; it selects the best anomaly detection model according to the context of the input. Finally, in the online adaptive detection phase, the IoT device runs the trained policy network and selects the appropriate anomaly detection model at execution time.
Multiple Anomaly Detection in Hierarchical Edge Computing (HEC)
We assume a hierarchical edge computing (HEC) system with K layers: layer 1 consists of the IoT devices, layers 2 through K−1 are edge servers, and layer K is the cloud.
Univariate anomaly detection model
For univariate IoT data, autoencoders are used; the number of hidden layers increases at higher HEC layers.
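As a minimal sketch of such a model family, the PyTorch snippet below builds fully connected autoencoders whose capacity grows from the IoT device toward the cloud; the window length and layer widths are illustrative assumptions, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

def make_autoencoder(hidden_sizes, window=32):
    """Autoencoder over a sliding window of a univariate series.
    `hidden_sizes` lists encoder widths; the decoder mirrors them."""
    dims = [window] + list(hidden_sizes)
    enc, dec = [], []
    for i in range(len(dims) - 1):
        enc += [nn.Linear(dims[i], dims[i + 1]), nn.ReLU()]
    for i in reversed(range(len(dims) - 1)):
        dec += [nn.Linear(dims[i + 1], dims[i]), nn.ReLU()]
    dec[-1] = nn.Identity()                  # linear output for reconstruction
    return nn.Sequential(*enc, *dec)

# One model per HEC layer: more hidden layers toward the cloud.
models = {
    "iot": make_autoencoder([8]),            # smallest: runs on the device
    "edge": make_autoencoder([16, 8]),       # mid-size: edge server
    "cloud": make_autoencoder([32, 16, 8]),  # largest: cloud
}
recon = models["iot"](torch.randn(4, 32))    # reconstructs a batch of windows
```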
Multivariate anomaly detection model
For multivariate data, LSTM-based sequence-to-sequence (seq2seq) models are used. As in the univariate case, the models become more complex toward the higher layers, and the cloud uses a bidirectional-LSTM-based seq2seq model.
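A sketch of what this could look like in PyTorch; the hidden sizes and the feature count (23 sensor channels, as in MHEALTH) are assumptions for illustration:

```python
import torch
import torch.nn as nn

class Seq2SeqAD(nn.Module):
    """LSTM encoder-decoder that reconstructs a multivariate window.
    Higher HEC layers get a larger hidden size; the cloud variant
    uses a bidirectional encoder."""
    def __init__(self, n_features, hidden, bidirectional=False):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, batch_first=True,
                               bidirectional=bidirectional)
        enc_out = hidden * (2 if bidirectional else 1)
        self.decoder = nn.LSTM(enc_out, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                    # x: (batch, time, features)
        enc_seq, _ = self.encoder(x)
        dec_seq, _ = self.decoder(enc_seq)
        return self.out(dec_seq)             # reconstruction of x

iot_model = Seq2SeqAD(n_features=23, hidden=16)
edge_model = Seq2SeqAD(n_features=23, hidden=32)
cloud_model = Seq2SeqAD(n_features=23, hidden=64, bidirectional=True)
recon = cloud_model(torch.randn(8, 50, 23))  # (batch, time, features) in and out
```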
Dissimilarity Score
After training, the reconstruction error is small for normal inputs and large for anomalous ones. The reconstruction error is assumed to follow a Gaussian distribution, and the log probability density (logPD) of the error is used as the anomaly score: normal values yield a high logPD and anomalies a low one. The minimum logPD observed on the normal dataset is used as the detection threshold.
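A minimal NumPy/SciPy sketch of this scoring scheme, with synthetic residuals standing in for real reconstruction errors:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Synthetic residuals; in practice these are |x - reconstruction| vectors
# produced by the trained autoencoder / seq2seq model on normal data.
rng = np.random.default_rng(0)
train_errors = rng.normal(0.1, 0.02, size=(1000, 4))  # normal training residuals
val_errors = rng.normal(0.1, 0.02, size=(200, 4))     # normal validation residuals

# Fit a Gaussian to the reconstruction errors of normal data.
mu = train_errors.mean(axis=0)
cov = np.cov(train_errors, rowvar=False) + 1e-6 * np.eye(train_errors.shape[1])
dist = multivariate_normal(mean=mu, cov=cov)

# Anomaly score = log probability density (logPD) of the residual;
# the threshold is the lowest logPD observed on normal data.
threshold = dist.logpdf(val_errors).min()

def is_anomaly(residual):
    return dist.logpdf(residual) < threshold  # low logPD => anomaly
```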
Adaptive model selection scheme
We propose an adaptive model-selection scheme that chooses the best anomaly detection model depending on the context of the input data, so that each input is processed at the appropriate layer (IoT, edge, or cloud). Data first enters the IoT device and is then forwarded upward to the chosen layer.
We use a contextual-bandit model in which the policy is learned from a reward, and the reward combines detection accuracy with the cost of data transfer. To shorten training, we use mini-batches, learning from N input contexts at a time. To prevent overfitting, we add an L2 regularization term to the loss function. To balance exploration and exploitation, we use the decayed-$\epsilon$-greedy algorithm.
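The sketch below shows one plausible shape of such a training loop in PyTorch. The context features, reward coefficients, and simulated detector feedback are all assumptions rather than the paper's exact formulation; the L2 term enters through the optimizer's weight_decay, and the update is a simple REINFORCE-style bandit step:

```python
import torch
import torch.nn as nn

K, N = 3, 32                               # K HEC layers; mini-batch of N contexts
policy = nn.Sequential(                    # context features -> preference over K models
    nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, K))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3,
                       weight_decay=1e-4)  # weight_decay acts as L2 regularization

eps, eps_min, decay = 1.0, 0.05, 0.995     # decayed-epsilon-greedy schedule

def reward(correct, layer):
    # Assumed reward shape: detection correctness minus a transmission-cost
    # penalty that grows with the chosen layer (0 = IoT ... K-1 = cloud).
    return (1.0 if correct else 0.0) - 0.1 * layer

for step in range(2000):
    ctx = torch.randn(N, 16)               # dummy stand-in for real input contexts
    logits = policy(ctx)
    greedy = logits.argmax(dim=1)
    explore = torch.rand(N) < eps
    actions = torch.where(explore, torch.randint(0, K, (N,)), greedy)

    correct = torch.rand(N) < 0.8          # simulated detector feedback (bandit)
    r = torch.tensor([reward(bool(c), int(a)) for c, a in zip(correct, actions)],
                     dtype=torch.float32)

    logp = torch.log_softmax(logits, dim=1)[torch.arange(N), actions]
    loss = -(r * logp).mean()              # REINFORCE-style bandit update
    opt.zero_grad(); loss.backward(); opt.step()
    eps = max(eps_min, eps * decay)        # decay exploration over time
```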
Implementation and Experiment
Datasets
We use the UC Riverside (UCR) time-series dataset of one year of power consumption at a Dutch research institute, and the UC Irvine (UCI) multivariate MHEALTH dataset. A further updated version of the former archive is also available.