Detecting Anomalies in IoT Device Data with Both High Accuracy and Low Latency

3 main points
✔️ There is a need for highly accurate and fast anomaly detection for IoT device data
✔️ A hierarchical distributed system spanning IoT devices, edge servers, and the cloud, which adapts to the input data by choosing the layer at which inference runs
✔️ The results show a better balance between accuracy and latency than both centralized and distributed systems.

written by Mao V. Ngo, Tie Luo, Tony Q.S. Quek
(Submitted on 9 Aug 2021)
Comments: Accepted by ACM Transactions on Internet of Things (TIOT)

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC); Networking and Internet Architecture (cs.NI)


The images used in this article are from the paper or created based on it.

outline

With the proliferation of IoT, demand for making fuller use of IoT data is bound to grow. In this paper, the authors propose a system that detects anomalies in data generated by IoT devices on a hierarchical edge computing (HEC) system spanning IoT devices, edge computers, and the cloud, achieving both real-time response and high accuracy. Its distinguishing feature is that it adaptively decides which layer's model to apply, depending on the context of the data at each point in time. This selection is learned with reinforcement learning. Let's take a look at the details.

introduction

Smart IoT devices are spurring applications in smart factories, smart homes, autonomous vehicles, and digital health around the world. The huge networks of these devices generate vast amounts of sensor data, from which machine learning and deep learning are used to extract insights and detect anomalies. Some systems, such as fire alarms, require a real-time response. In such cases, the traditional approach of streaming all IoT data to the cloud raises issues of communication latency, backbone network congestion, and data privacy.

Edge or fog computing, on the other hand, performs anomaly detection close to the sensor data sources, but faces the challenge of limited computing power and resources. Pruning and distillation can adapt large, complex models to the capabilities of IoT devices, but they must be tuned case by case and can only be applied to a limited subset of (sparse) DNN models.

Existing distributed anomaly detection approaches have three problems: 1) they take a one-size-fits-all approach, although in practice each problem is different; 2) they pursue accuracy alone instead of also considering latency or memory usage; and 3) they may not properly exploit locality in the distributed system, sending data back and forth to the cloud many times.

Therefore, this paper proposes an anomaly detection system for HEC that achieves real-time performance while keeping the computational load manageable.

related research

The main prior work that influenced this paper is BranchyNet, an image classification network with early-exit branches: inference terminates at an intermediate branch once that branch is confident enough, and the network has been applied to hierarchical computing by partitioning it across different layers.
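
To make the early-exit idea concrete, here is a minimal Python sketch. It assumes the exit criterion BranchyNet uses, namely the entropy of a branch's softmax output compared against a per-branch threshold; the exit functions, the threshold value, and the toy inputs are hypothetical and only illustrate the control flow, not the paper's system.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def early_exit_predict(x, exits, thresholds):
    """Run exits shallowest-first and stop at the first confident one.

    exits: callables mapping an input to a softmax vector (later exits are deeper
    and costlier). thresholds: entropy threshold for each side branch; the final
    exit always answers. Returns (exit_index, predicted_class).
    """
    for i, exit_fn in enumerate(exits):
        probs = exit_fn(x)  # deeper exits are only evaluated when needed
        if i == len(exits) - 1 or entropy(probs) < thresholds[i]:
            return i, max(range(len(probs)), key=lambda c: probs[c])
    raise ValueError("at least one exit is required")


# Toy usage (hypothetical exits): a shallow side branch and a deep final classifier.
EVALUATED = []


def shallow(x):
    EVALUATED.append("shallow")
    return [0.98, 0.02] if x == "easy" else [0.5, 0.5]


def deep(x):
    EVALUATED.append("deep")
    return [0.1, 0.9]
```

An easy input is answered by the shallow branch alone, while an ambiguous one (high entropy) falls through to the deep classifier; this is the per-input "stop early when confident" behavior that the proposed method generalizes to choosing a layer in the HEC hierarchy.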

Because input data often vary in how difficult they are to analyze, the paper also draws on other prior work. One approach uses kNN classification to select an appropriate inference model depending on the input image and the required accuracy. Lin et al. use a reinforcement learning-based approach to prune networks dynamically at runtime. BlockDrop dynamically decides which residual blocks to use or drop during inference.

Description of the proposed method

This paper applies the same idea to anomaly detection on IoT and edge computing data instead of images. Rather than a multi-stage sequence of kNN classifiers, it uses a single policy network that directly outputs the appropriate model for the given context, and it handles multiple models in distributed computing with reinforcement learning.

overall configuration

Fig. 1 shows the overall structure of adaptive anomaly detection, which consists of three flows: 1) training of the anomaly detection models (black solid line), 2) training of the policy network (purple line), and 3) online adaptive detection (orange line). Data preprocessing is shared by all three flows.

In the first flow, we train the anomaly detection models and determine the anomaly score and detection threshold for each. In the second flow, we train the policy network, which selects the best anomaly detection model according to the context of the input. Finally, in the third flow, online adaptive detection, the IoT device running the trained policy network selects the appropriate anomaly detection model at run time.

Multiple Anomaly Detection Models in Hierarchical Edge Computing (HEC)

We assume a hierarchical edge computing (HEC) system with k layers: layer 1 is the IoT device, layers 2 to k-1 are edge servers, and layer k is the cloud.

Univariate anomaly detection model

For univariate IoT data, autoencoders are used, and the number of hidden layers increases toward the upper layers of the hierarchy.

Multivariate anomaly detection model

For multivariate data, LSTM-based sequence-to-sequence (seq2seq) models are used. Again, the models become more complex toward the upper layers, and the cloud uses a bidirectional LSTM-based seq2seq model.

Anomaly Score

After training, the reconstruction error is small for normal data and large for anomalous data. The reconstruction error is assumed to follow a Gaussian distribution, and the log probability density (logPD) of the reconstruction error under that distribution is used as the anomaly score: normal data have a high logPD and anomalies a low logPD. The minimum logPD over the normal data set is used as the detection threshold, so any input whose logPD falls below it is flagged as anomalous.
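
A minimal sketch of this scoring rule for univariate reconstruction errors, assuming a single Gaussian $\mathcal{N}(\mu, \sigma^2)$ fitted to the errors of normal training data, so that $\log p(e) = -\tfrac{1}{2}\log(2\pi\sigma^2) - \tfrac{(e-\mu)^2}{2\sigma^2}$. The function names and sample errors are hypothetical; for multivariate data one would fit a multivariate Gaussian instead.

```python
import math
from statistics import fmean, pstdev


def log_pd(error, mu, sigma):
    """Log probability density of a reconstruction error under N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (error - mu) ** 2 / (2 * sigma ** 2)


def fit_detector(normal_errors):
    """Fit N(mu, sigma^2) to reconstruction errors of normal data and derive the threshold.

    The threshold is the minimum logPD observed on the normal data set.
    """
    mu = fmean(normal_errors)
    sigma = pstdev(normal_errors, mu)
    threshold = min(log_pd(e, mu, sigma) for e in normal_errors)
    return mu, sigma, threshold


def is_anomaly(error, mu, sigma, threshold):
    """Flag inputs whose logPD is lower than that of every normal sample."""
    return log_pd(error, mu, sigma) < threshold


# Hypothetical reconstruction errors from normal training data.
NORMAL_ERRORS = [0.10, 0.12, 0.09, 0.11, 0.13, 0.10]
MU, SIGMA, THRESHOLD = fit_detector(NORMAL_ERRORS)
```

Because the Gaussian is two-sided, an unusually small reconstruction error also receives a low logPD under this rule.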

We propose an adaptive model selection scheme that chooses the best anomaly detection model for each input according to its context, so that each input is processed at the appropriate layer (IoT device, edge, or cloud). Data always enter at the IoT device and are then forwarded upward to the selected layer.
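
The upward forwarding can be sketched as follows. This is a hypothetical illustration: the `Layer` class, the per-hop `uplink_cost` values, and the toy threshold detectors stand in for the real models and transfer costs, and indices are 0-based (index 0 is the IoT device, the last index is the cloud).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Layer:
    name: str
    detect: Callable[[Sequence[float]], bool]  # returns True if the window is anomalous
    uplink_cost: float  # hypothetical cost of sending data up to this layer from the one below


def run_at_layer(layers: List[Layer], action: int, window: Sequence[float]) -> Tuple[bool, float]:
    """Data enter at layers[0] (the IoT device) and are forwarded up to layers[action].

    Returns the detection result and the accumulated transfer cost.
    """
    if not 0 <= action < len(layers):
        raise ValueError(f"action {action} is not a valid layer index")
    cost = sum(layer.uplink_cost for layer in layers[1:action + 1])
    return layers[action].detect(window), cost


# Hypothetical three-layer HEC (k = 3): stricter detectors are further from the device.
EXAMPLE_LAYERS = [
    Layer("iot", lambda w: max(w) > 5.0, 0.0),
    Layer("edge", lambda w: max(w) > 4.0, 0.2),
    Layer("cloud", lambda w: max(w) > 3.0, 1.0),
]
```

The `action` here is what the policy outputs; the cost returned alongside the detection result is the kind of quantity the reward can penalize.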

The selection is formulated as a contextual bandit: the policy is learned from a reward that accounts for both detection accuracy and the cost of data transfer. To reduce training time, mini-batches of N input contexts are processed at a time. To prevent overfitting, an L2 regularization term is added to the loss function. To balance exploration and exploitation, the decayed $\epsilon$-greedy algorithm is used.
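
The training loop described above can be sketched with a deliberately simplified learner. Instead of the paper's policy network, this sketch assumes a linear contextual bandit that estimates each action's reward and is trained with mini-batch SGD, an L2 penalty, and decayed $\epsilon$-greedy exploration. The reward formula (+1 for a correct detection minus a weighted transfer cost), all hyperparameters, and the two-action toy environment are hypothetical.

```python
import random


def reward(correct, cost, cost_weight=0.5):
    """Hypothetical reward: +1 for a correct detection, minus a weighted transfer cost."""
    return (1.0 if correct else 0.0) - cost_weight * cost


class EpsGreedyBandit:
    """Linear contextual bandit: one weight vector per action estimates that action's reward."""

    def __init__(self, n_features, n_actions, lr=0.1, l2=1e-3,
                 eps_start=1.0, eps_min=0.05, eps_decay=0.99, seed=0):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr, self.l2 = lr, l2
        self.eps, self.eps_min, self.eps_decay = eps_start, eps_min, eps_decay
        self.rng = random.Random(seed)

    def q(self, x, a):
        """Estimated reward of action a (which layer's model to use) for context x."""
        return sum(wi * xi for wi, xi in zip(self.w[a], x))

    def greedy(self, x):
        return max(range(len(self.w)), key=lambda a: self.q(x, a))

    def act(self, x):
        """Decayed epsilon-greedy: explore with probability eps, otherwise exploit."""
        if self.rng.random() < self.eps:
            return self.rng.randrange(len(self.w))
        return self.greedy(x)

    def update(self, batch):
        """One mini-batch SGD step on squared error plus an L2 penalty, then decay eps."""
        grads = [[0.0] * len(row) for row in self.w]
        for x, a, r in batch:
            err = self.q(x, a) - r
            for i, xi in enumerate(x):
                grads[a][i] += err * xi / len(batch)
        for a, row in enumerate(self.w):
            for i in range(len(row)):
                row[i] -= self.lr * (grads[a][i] + self.l2 * row[i])
        self.eps = max(self.eps_min, self.eps * self.eps_decay)


def train_toy(n_batches=2000, batch_size=32, seed=0):
    """Toy environment (hypothetical): context = [bias, difficulty].

    Action 0 = IoT model: free, but only correct on easy inputs.
    Action 1 = cloud model: always correct, but costs 1.0 to reach.
    """
    env = random.Random(seed + 1)
    bandit = EpsGreedyBandit(n_features=2, n_actions=2, seed=seed)
    for _ in range(n_batches):
        batch = []
        for _ in range(batch_size):
            hard = env.random() < 0.5
            x = [1.0, 1.0 if hard else 0.0]
            a = bandit.act(x)
            correct = (a == 1) or not hard
            cost = 1.0 if a == 1 else 0.0
            batch.append((x, a, reward(correct, cost)))
        bandit.update(batch)
    return bandit
```

In this toy setting the learned greedy policy should keep easy inputs on the IoT model and send hard ones to the cloud, which is exactly the accuracy-versus-cost trade-off the reward encodes.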

Implementation and Experiment

data set

We use a one-year time series of electricity consumption at a Dutch research institute from the UC Riverside (UCR) time series archive, and the multivariate MHEALTH dataset from UC Irvine. The former is also available in an updated version of the archive.

Implementation of anomaly detection model and policy network

The system is implemented with TensorFlow and Keras, using a Raspberry Pi 3 as the IoT device, a Jetson TX2 as the edge computer, and GPUs at the edge and in the cloud.

Accelerated Learning of Policy Networks

Following a distributed mechanism recently published by Google that efficiently accelerates deep reinforcement learning, the algorithm is modified in two ways: 1) instead of querying the reward of each input sequentially, inputs assigned the same action are grouped and fed to the corresponding anomaly detection model as one batch; and 2) if the actions chosen by the $\epsilon$-greedy method involve more than one anomaly detection model, those models are run in parallel.
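
Both modifications, grouping inputs by chosen action so that each anomaly detection model runs one batched inference and running the selected models concurrently, can be sketched as follows. The function name, the thread-pool choice, and the toy models are assumptions for illustration, not the paper's implementation; with real TensorFlow models, each callable would wrap a batched prediction.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def batched_rewards(contexts, actions, models, reward_fn):
    """Group inputs by chosen action, run each model once on its group, in parallel.

    contexts: list of inputs; actions: chosen model index per input;
    models: callables mapping a list of inputs to a list of outputs (batch inference);
    reward_fn(action, output) -> float. Returns rewards in the original input order.
    """
    groups = defaultdict(list)  # action -> positions of the inputs that chose it
    for pos, a in enumerate(actions):
        groups[a].append(pos)

    def run(a):
        idx = groups[a]
        outputs = models[a]([contexts[i] for i in idx])  # one batched call per model
        return a, idx, outputs

    rewards = [0.0] * len(contexts)
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        for a, idx, outputs in pool.map(run, list(groups)):
            for i, out in zip(idx, outputs):
                rewards[i] = reward_fn(a, out)
    return rewards


# Toy usage (hypothetical models): "model k" multiplies each input by k.
CALLS = []


def make_model(k):
    def model(batch):
        CALLS.append((k, len(batch)))  # record one call per batch
        return [x * k for x in batch]
    return model


EXAMPLE = batched_rewards([1, 2, 3, 4], [0, 1, 0, 1],
                          [make_model(10), make_model(100)],
                          lambda a, out: float(out))
```

Each model is called once per mini-batch rather than once per input, which is where the speed-up comes from.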

Software Architecture and Experimental Setup

The software architecture, shown in Fig. 7, consists of a GUI, the adaptive model selection scheme driven by the policy network, and the anomaly detection models at each layer. From the GUI, the user can choose which dataset and selection scheme to use and check the results (Fig. 8).