[AdaptIoT] Self-labeling System Using Cause-and-effect Relationships In The Manufacturing Industry

Internet Of Things 27/09/2024

3 main points

✔️ AdaptIoT supports self-labeling with interactive causality in manufacturing for high throughput, low latency data acquisition, and ML application integration
✔️ Self-labeling service automates task model adaptation and improves model accuracy and stability through continuous learning
✔️ 3D printer experiments achieve higher accuracy than other semi-supervised learning methods

A Cyber Manufacturing IoT System for Adaptive Machine Learning Model Deployment by Interactive Causality Enabled Self-Labeling
written by Yutian Ren, Yuqi He, Xuyin Zhang, Aaron Yen, G. P. Li
[Submitted on 9 Apr 2024]
Comments: Accepted by arXiv
Subjects: Machine Learning (cs.LG); Systems and Control (eess.SY); Methodology (stat.ME)


The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Machine learning (ML) has proven to be a productivity enhancer in many manufacturing applications. Several software and Industrial IoT (IIoT) systems have been proposed for manufacturing applications to receive these ML applications. Recently, a self-labeling method (SLB) that leverages interactive causality has been proposed to evolve adaptive ML applications. This method can automatically adapt and personalize ML models to accommodate changes in data distribution after deployment. The unique characteristics of the self-labeling method require a new software system that can dynamically adapt at various levels.

This paper proposes the AdaptIoT system, which includes an end-to-end data streaming pipeline, ML service integration, and an automated self-labeling service. The self-labeling service consists of a causal knowledge base and an automated, full-cycle self-labeling workflow that adapts many ML models simultaneously. AdaptIoT employs a containerized microservices architecture to provide a scalable and portable solution for small and mid-size manufacturers. A field demonstration of a self-labeling adaptive ML application was conducted in a makerspace and demonstrated reliable performance.

Introduction

The integration of real-time machine learning (ML) technology into cyber-physical systems (CPS), especially in smart manufacturing, requires hardware and software platforms to coordinate sensor data streams, ML application deployment, and data visualization. Modern manufacturing systems leverage advanced cyber technologies such as Internet of Things (IoT) systems, service-oriented architectures, microservices, data lakes, and data warehousing. ML applications are integrated with these existing tools to support and realize smart manufacturing systems.

For example, Yen et al. developed a software-as-a-service (SaaS) framework to manage the health of manufacturing systems using IoT sensor integration to facilitate data and knowledge sharing. Mourtzis et al. proposed an IIoT system for small and medium-sized manufacturers (SMMs) that incorporates big data software engineering techniques to process terabytes of data on a monthly basis. Liu et al. designed a service-oriented IIoT gateway and data schema to facilitate efficient data management and transmission in a cloud manufacturing paradigm.

The main goal of the authors' research is to provide personalized intelligence in manufacturing, which requires adaptation of ML models to the environment after deployment. However, there are several barriers to developing and deploying personalized ML systems in a manufacturing environment. For example, the cost of manually collecting and annotating training datasets has slowed the adoption of ML-enhanced smart manufacturing systems, especially for small and medium-sized manufacturers (SMMs).

Recently, adaptive machine learning, which adapts to various deployment environments, has emerged as an effective solution to lower the ML barrier to entry for SMMs. Several adaptive ML methods have been proposed, including semi-supervised learning (SSL) with pseudo-labels, lazy labels, and learning that leverages domain knowledge.

A novel interactive causal-based self-labeling method has been proposed to enable adaptive machine learning in manufacturing cyber-physical systems applications. The method uses causal relationships extracted from domain knowledge to automate the post-deployment self-labeling workflow and adapt the ML model to the local environment. The self-labeling method automatically captures and labels data in real-time, effectively utilizing limited pre-assigned or public data sets.

To support and implement this approach, especially for SMMs, a system infrastructure is needed that meets the following requirements:

Real-time time-stamped data transfer of sensor, voice and video data from heterogeneous services and devices.
A causal knowledge base that manages interactions between models and facilitates self-labeling ML among causal nodes.
A core self-labeling service that connects ML services, routes data streams, executes self-labeling workflows, and autonomously retrains and redeploys ML models at the edge.
Scalable architecture for easy integration of new edge, ML, and SLB services.

To address the unique needs of interactive causality, a new software system is needed that enables self-labeling capabilities for a variety of ML models. This software system will leverage real-time IoT sensor data, ML, and self-labeling services to allow models to adapt as the environment changes.

Related Research

Relevant studies for this paper are listed below.

IoT and Smart Manufacturing: Lu and Cecil proposed an IoT-based collaborative framework for advanced manufacturing. This will enable cooperation and data sharing throughout the manufacturing process.
Service Oriented Smart Manufacturing: Tao and Qi have demonstrated a new IT-driven Service Oriented Smart Manufacturing framework and its characteristics. This framework enables flexible and adaptable manufacturing processes.
Microservices and Manufacturing Systems: Thramboulidis et al. proposed a cyber-physical microservices and IoT-based framework (CPUS-IoT) for manufacturing assembly systems. This system monitors and controls the entire assembly line.
Data Lake and High Pressure Die Casting: Rudack et al. applied a data lake to high pressure die casting, enabling efficient management and analysis of large amounts of manufacturing data.
Monitoring and Diagnosis of Manufacturing Systems: Yen et al. have developed a framework for monitoring and diagnosing manufacturing systems using IoT sensor integration. This framework facilitates data and knowledge sharing.
Big Data and IIoT: Mourtzis et al. proposed an IIoT system to handle monthly terabytes of data generation and transmission at a manufacturing site with 100 machines.
Cloud Manufacturing and IIoT Gateway: Liu et al. designed a service-oriented IIoT gateway and data schema to facilitate efficient data management and transmission.
CNC Machine and Edge-Cloud Coordination: Sheng et al. proposed a multimodal ML-based system for quality checking of CNC machines. This system coordinates from the edge (sensor data acquisition) to the cloud (deep learning computation).
Predictive Scheduling and Cloud Manufacturing: Morariu et al. designed an end-to-end big data software architecture for predictive scheduling in service-oriented cloud manufacturing systems.
ML Life Cycle Challenges: Paleyes et al. summarized the challenges in deploying ML systems at various stages.
Cyberinfrastructure for Smart Manufacturing: Davis et al. discussed cyberinfrastructure for democratizing smart manufacturing.
Adaptive ML and semi-supervised learning: Yan et al. proposed a self-labeling enhancement method for source-free adaptive ML with semi-supervised learning; Zhou et al. proposed a theory-driven self-labeling refinement method for contrastive representation learning.
Lazy labels and performance evaluation: Grzenda et al. investigated performance measures for evolution prediction in lazy labeling classification.
Data Programming and Physical Laws: Ratner et al. proposed a data programming method for rapidly creating large training sets; Stewart and Ermon proposed label-free supervision of neural networks using physical laws and domain knowledge.
Benefits of Self-Labeling: Ren et al. proposed a self-labeling method for adaptive machine learning in manufacturing cyber-physical systems. This method automates the self-labeling workflow after deployment and adapts the ML model to the local environment.

Overview of Interactive Causality and Self-Labeling Methods

Self-Labeling Methods Utilizing Causal Relationships

The self-labeling with interactive causality (SLB) method was developed to enable adaptive learning of ML systems. This method allows deployed ML models to adapt to changes in the local data distribution and performs self-labeling in real time.

As shown in Figure 1, self-labeling begins with the selection of causal nodes in a dynamic causal knowledge graph (KG) extracted from domain knowledge and ontologies. The causal relationships of the selected nodes may vary over time and are associated with transitions in effect states. SLB monitors one or more data streams and observes the time of occurrence of causal events.

Self-labeling requires three types of models:

Effect state detector (ESD): Monitors effect data and detects effect state transitions.
Interaction time model (ITM): Predicts the causal time delay using effect data as input.
Task model: Takes causal data as input features and is trained with effect state transitions as labels.

Figure 1 illustrates the entire self-labeling procedure.

Figure 1. Illustration of the overall self-labeling procedure.
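
The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names EffectTransition, self_label, and constant_itm, the fixed 2-second delay, the window length, and the state name "printing_started" are all hypothetical. The sketch assumes an ESD has already produced effect transitions, asks an ITM for the cause-to-effect delay, and cuts the matching segment out of the cause stream.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class EffectTransition:
    """An effect state transition reported by an effect state detector (ESD)."""
    timestamp: float  # time at which the effect was observed (seconds)
    state: str        # new effect state, used as the label


def self_label(
    cause_stream: Sequence[Tuple[float, float]],
    transitions: Sequence[EffectTransition],
    itm: Callable[[EffectTransition], float],
    window: float,
) -> List[Tuple[List[float], str]]:
    """Pair cause-side data with effect-side labels.

    cause_stream: time-ordered (timestamp, value) samples from the cause sensor.
    itm: interaction time model; returns the estimated cause-to-effect delay.
    window: assumed length (seconds) of the cause segment to keep.
    """
    labeled = []
    for tr in transitions:
        cause_end = tr.timestamp - itm(tr)   # estimated end of the cause event
        cause_start = cause_end - window
        segment = [v for t, v in cause_stream if cause_start <= t <= cause_end]
        if segment:                          # skip transitions with no cause data
            labeled.append((segment, tr.state))
    return labeled


def constant_itm(_: EffectTransition) -> float:
    """Toy ITM: assume the effect always lags the cause by 2 seconds."""
    return 2.0


# Cause stream sampled once per second; the effect is observed at t = 10 s.
stream = [(float(t), float(t) * 0.1) for t in range(20)]
events = [EffectTransition(timestamp=10.0, state="printing_started")]
samples = self_label(stream, events, constant_itm, window=3.0)
print(samples)
```

Each returned pair is a candidate training example for the task model: the cause segment is the input and the effect state is the label, so no manual annotation is involved.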

Continuous Learning of Task Models

The task model is trained continuously through SLB. Continuous learning is especially beneficial when the input and/or output data distribution drifts from what it was at initial training time. Causal relationships are robust to data drift, and this robustness carries over to the self-labeling method and provides the basis for continuous learning. SLB relates causal data to effect-state transitions and uses these pairs to train task models without human intervention.

System Infrastructure Requirements

To support and implement the SLB methodology, especially for small and medium-sized manufacturers (SMMs), a system infrastructure is needed that meets the following requirements:

Real-time time-stamped data transfer from heterogeneous services and devices.
A causal knowledge base that manages interactions between models and facilitates self-labeling ML among causal nodes.
Core self-labeling service that routes data streams, executes self-labeling workflows, and autonomously retrains and redeploys ML models at the edge.
Scalable architecture for easy integration of new edge, ML, and SLB services.

Software Architecture of AdaptIoT

The software architecture of the AdaptIoT system has a modular structure designed to support self-labeling applications. This section describes the main functional modules of the system and their respective roles.

Module-Level Architecture

The AdaptIoT system consists of edge services, a data streaming manager (DSM), a database for storage, a cluster of machine learning (ML) services, an interactive causal engine (ICE), and a front-end graphical user interface (GUI) handler. Edge services consist of sensors, edge computing devices, external applications, and plant machinery. Local edge services stream data to databases and applications via the DSM, which acts as a back-end that routes high-throughput streaming data to the appropriate destination.

Several types of databases are implemented, including time series databases, SQL databases, and NoSQL databases. They store raw time-stamped sensor data, service and device metadata, processed ML results, self-labeling results, etc. In addition, clusters of ML services such as task models, effect state detectors (ESD), and interaction time models (ITM) are operationalized to provide actionable intelligence while participating in self-labeling workflows.

Figure 2 shows a high-level block diagram of an IIoT system for a self-labeling application.

Figure 2. High-level block diagram of the proposed IIoT system for a self-labeling application.

Interactive Causal Engine (ICE)

The Interactive Causal Engine (ICE) is the core engine for the adaptability of deployed ML task models. ICE consists of four components: a causal knowledge graph database, an information integrator, a self-labeling service, and a self-labeling trainer. Each component is responsible for a different task, and together they automatically execute the self-labeling workflow.

The causal knowledge graph database stores multiple knowledge graphs (KGs) representing interactions and causal relationships among nodes. These KGs are extracted and reconstructed from existing domain knowledge. Figure 3(a) shows an example of a simplified knowledge graph for a 3D printer. Links in the graph indicate interactions, and connections between nodes suggest possible causal relationships.

Figure 3 (a) Simplified knowledge graph of the 3D printer use case; (b) Corresponding state transition relationships for the causal Hand&Arm and Controller nodes.
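
To make the idea of a causal knowledge graph concrete, the following sketch stores cause-to-effect links in a small in-memory structure. It is only an illustration: the CausalGraph class and its methods are hypothetical, the direction of the Hand&Arm to Controller link is an assumption based on the figure caption, and the "Heater" node is invented for the example. The system described here keeps its graphs in a graph database rather than in Python objects.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class CausalGraph:
    """Tiny in-memory directed graph: an edge points from cause to effect."""

    def __init__(self) -> None:
        self._effects: Dict[str, List[str]] = defaultdict(list)

    def add_causal_link(self, cause: str, effect: str) -> None:
        """Record that a state change in `cause` can lead to one in `effect`."""
        if effect not in self._effects[cause]:
            self._effects[cause].append(effect)

    def effects_of(self, cause: str) -> List[str]:
        """Return the nodes directly affected by `cause` (empty if unknown)."""
        return list(self._effects.get(cause, []))

    def self_labeling_pairs(self) -> List[Tuple[str, str]]:
        """Every (cause, effect) pair is a candidate self-labeling workflow."""
        return [(c, e) for c, effects in self._effects.items() for e in effects]


kg = CausalGraph()
kg.add_causal_link("Hand&Arm", "Controller")   # direction assumed from Figure 3
kg.add_causal_link("Controller", "Heater")     # "Heater" is a made-up node
print(kg.self_labeling_pairs())
```

Enumerating the pairs this way mirrors how a user could pick two causally linked nodes and start a self-labeling workflow between them.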

The Information Integrator connects the causal KG database, self-labeling services, sensor metadata, ML services, and users, integrating the necessary information to control self-labeling. Through the Information Integrator, users can start or stop self-labeling workflows between causally linked nodes.

The self-labeling service receives input from the information integrator and initiates the self-labeling workflow. This involves collecting the raw data stream, interfacing with the ML service, and coordinating with the self-labeling trainer.

The self-labeling trainer constantly monitors the number of self-labeled samples and retrains and redeploys task models upon user command. This trainer is designed to be independent of the self-labeling service for reusability and scalability.
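
The trainer's behavior, counting self-labeled samples and retraining only on command, can be sketched as follows. The class name SelfLabelingTrainer, the retrain_fn callback, and the min_samples guard are hypothetical choices made for this example; the source does not specify a minimum sample count.

```python
from typing import Callable, List, Tuple

Sample = Tuple[list, str]  # (cause-side features, effect-side label)


class SelfLabelingTrainer:
    """Buffers self-labeled samples and retrains only when asked to."""

    def __init__(self, retrain_fn: Callable[[List[Sample]], None], min_samples: int) -> None:
        self._retrain_fn = retrain_fn      # injected so the trainer stays model-agnostic
        self._min_samples = min_samples    # assumed guard against retraining on too little data
        self._buffer: List[Sample] = []

    def add_sample(self, sample: Sample) -> None:
        self._buffer.append(sample)

    @property
    def sample_count(self) -> int:
        return len(self._buffer)

    def retrain(self) -> bool:
        """Handle a user retrain command; refuse if too few samples are buffered."""
        if self.sample_count < self._min_samples:
            return False
        self._retrain_fn(list(self._buffer))
        return True


trained_on: List[int] = []
trainer = SelfLabelingTrainer(lambda batch: trained_on.append(len(batch)), min_samples=2)
trainer.add_sample(([0.1], "idle"))
first = trainer.retrain()    # refused: only one sample so far
trainer.add_sample(([0.9], "active"))
second = trainer.retrain()   # accepted: two samples are buffered
print(first, second, trained_on)
```

Passing the retraining routine in as a callback keeps the trainer independent of any particular self-labeling service or model, which matches the reusability goal stated above.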

Unit Service Model

To ensure scalability and homogeneity of the AdaptIoT system, an abstract, layered unit service model has been designed. This model applies to all services in the system and provides a standardized interface to generate and send data to storage locations. The unit service model consists of four layers: asset layer, data generation layer, service layer, and API layer.

Asset layer: abstracts independent components such as sensors and machines.

Data Generation Layer: Responsible for data generation and interfaces with the asset layer.

Service Layer: Integrates with the data generation layer and performs necessary service functions.

API layer: Defines API endpoints and manages interactions with other services.
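
The four layers can be pictured as thin wrappers around one another. The sketch below is an assumption-laden illustration, not the AdaptIoT code: the class names Asset, DataGenerator, Service, and Api, the "/latest" endpoint, and the unit-conversion service function are all invented for the example.

```python
import json
import time
from typing import Any, Callable, Dict


class Asset:
    """Asset layer: abstracts a physical component such as a sensor."""

    def __init__(self, name: str, read_fn: Callable[[], float]) -> None:
        self.name = name
        self._read_fn = read_fn

    def read(self) -> float:
        return self._read_fn()


class DataGenerator:
    """Data generation layer: turns asset readings into time-stamped records."""

    def __init__(self, asset: Asset) -> None:
        self._asset = asset

    def generate(self) -> Dict[str, Any]:
        return {"source": self._asset.name, "ts": time.time(), "value": self._asset.read()}


class Service:
    """Service layer: applies the service function (here, a unit conversion)."""

    def __init__(self, generator: DataGenerator, scale: float = 1.0) -> None:
        self._generator = generator
        self._scale = scale

    def step(self) -> Dict[str, Any]:
        record = self._generator.generate()
        record["value"] = record["value"] * self._scale
        return record


class Api:
    """API layer: exposes named endpoints that other services can call."""

    def __init__(self, service: Service) -> None:
        self._routes = {"/latest": service.step}

    def handle(self, path: str) -> str:
        if path not in self._routes:
            return json.dumps({"error": "not found"})
        return json.dumps(self._routes[path]())


# A fake power meter that always reads 1.5 kW, reported in watts.
api = Api(Service(DataGenerator(Asset("power_meter", lambda: 1.5)), scale=1000.0))
response = json.loads(api.handle("/latest"))
print(response)
```

Because every service exposes the same layered shape, a new sensor or ML service only needs to supply its own asset and service function to fit into the system.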

System Implementation and Analysis

The implementation of an AdaptIoT system includes hardware and software infrastructure, with the implementation of a self-labeling service as a specific example.

Hardware Infrastructure for Cyber Makerspace

The AdaptIoT system is deployed in the CyberMaker Space Lab, which includes the following manufacturing equipment:

3D printer
CNC machines (milling machines and lathes)
collaborative robot
TIG welding machine

Each machine is equipped with multimodal sensors, including cameras, power meters, vibration sensors, acoustic sensors, distance sensors, and environmental sensors. Sensors are placed on critical components of the machine and at multiple locations to collect data. The CNC machines and robots are also controlled by programmable logic controllers (PLCs), which directly obtain information on the machine's operating status.

Software Implementation of AdaptIoT System

The AdaptIoT system implementation includes the following software components:

Message queues: Apache Kafka provides asynchronous communication between distributed services, with horizontal scalability and high throughput.
Database and storage: different types of data are stored in different backends:
Metadata: MySQL
High-throughput sensor data: InfluxDB (time series database)
ML service results: MongoDB and MySQL
Causal knowledge graphs: Neo4j (graph database)
Video and audio data: file system

Figure 4 shows the hardware and software infrastructure of the AdaptIoT system.

Figure 4. Hardware and software infrastructure. Each block represents a containerized software service.

Figure 5. Web GUI displaying real-time data and ML results.

Data Flow

The AdaptIoT data flow, from edge sensors to ML services, proceeds in the following steps:

Data generation: Edge sensors generate samples and send them to the Kafka cluster.
Data routing: Samples are processed within the Kafka cluster and stored in InfluxDB. The data dispatcher also forwards them as an HTTP data stream, which the ML service receives.
Result storage: The inferred ML results are again sent to Kafka and stored in MongoDB.
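
The three steps can be traced with a small simulation. This sketch deliberately uses Python's standard queue module and plain lists as stand-ins for Kafka, InfluxDB, and MongoDB, so it shows only the order of operations, not the real client APIs. All function names, the record fields, and the 5 W threshold in the toy model are assumptions.

```python
import queue
from typing import Callable, Dict, List

raw_topic: queue.Queue = queue.Queue()      # stand-in for a Kafka topic of raw samples
result_topic: queue.Queue = queue.Queue()   # stand-in for a Kafka topic of ML results
time_series_store: List[Dict] = []          # stand-in for InfluxDB
result_store: List[Dict] = []               # stand-in for MongoDB


def edge_sensor_publish(sample: Dict) -> None:
    """Data generation: an edge sensor sends a sample to the message queue."""
    raw_topic.put(sample)


def dispatch(ml_service: Callable[[Dict], Dict]) -> None:
    """Data routing: store each raw sample, then forward it to the ML service."""
    while not raw_topic.empty():
        sample = raw_topic.get()
        time_series_store.append(sample)
        result_topic.put(ml_service(sample))   # inferred result goes back to the queue


def store_results() -> None:
    """Result storage: persist the ML results."""
    while not result_topic.empty():
        result_store.append(result_topic.get())


def toy_ml_service(sample: Dict) -> Dict:
    """Hypothetical model: flags the machine as active above 5 W."""
    return {"ts": sample["ts"], "active": sample["power_w"] > 5.0}


edge_sensor_publish({"ts": 0.0, "power_w": 1.2})
edge_sensor_publish({"ts": 1.0, "power_w": 42.0})
dispatch(toy_ml_service)
store_results()
print(result_store)
```

Routing results back through the queue rather than writing them directly to storage keeps the ML service decoupled from the databases, which is the usual motivation for a message-queue-centered design.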

Figure 6 shows a detailed implementation of the unit service model receiving data from an external application.

Figure 6. Example of a unit service receiving data from an external application.

Interactive Causal Engine (ICE) Implementation

The ICE implementation includes data structures for causal relationships between nodes and for managing causal logical relationships. The causal knowledge graph is stored in the Neo4j database, and the truth table is stored as key/value pairs in MongoDB.

The self-labeling service defines a standard class SlbService that can apply self-labeling with associated parameters. The output of self-labeling includes three key values: effect state, cause state end timestamp, and cause state duration.
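
The three output values are enough to locate the cause-side data that a label applies to. The record below is a hypothetical sketch: the SelfLabel class, its field names, and the example state are not taken from the SlbService implementation.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SelfLabel:
    """One self-labeling output, holding the three key values named in the text."""
    effect_state: str       # label derived from the effect-side detector
    cause_end_ts: float     # timestamp at which the cause state ended (seconds)
    cause_duration: float   # length of the cause state (seconds)

    def cause_interval(self) -> Tuple[float, float]:
        """Return (start, end) of the cause data that this label applies to."""
        return (self.cause_end_ts - self.cause_duration, self.cause_end_ts)


label = SelfLabel(effect_state="printing_started", cause_end_ts=120.0, cause_duration=4.5)
start, end = label.cause_interval()
print(start, end)
```

With the interval in hand, the matching slice of the cause data stream can be retrieved from storage and paired with effect_state to form one training sample.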

Figure 7. Self-labeling modular structure for multiple effects.

Figure 8. Illustration of virtual interaction between ML services due to initialization of pairwise self-labeling.

System Characterization

A system characterization was conducted to evaluate key performance indicators of the AdaptIoT system. The following elements were evaluated:

Edge node throughput: One edge node has an average throughput of 284 msg/s, an average message size of 250.2 bytes, an average delay of 31 ms, and a maximum delay of 64 ms.
Camera Streaming: Using the Raspberry Pi 4B and the Raspberry Pi Camera Module 3, the camera streams in two modes: preview and full HD. The average delay in preview mode is 39 ms, while high quality image data is acquired in full HD mode.

Table 1 shows test results for a single edge node.

Table 1: Test results for a single edge node

These results show that the AdaptIoT system provides high throughput and low latency and is capable of integrating multiple edge and ML services.
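
As a quick sanity check on scale, the reported message rate and message size can be converted into a data rate. The two input numbers come from the edge node measurement above; the conversion itself is plain arithmetic and does not account for protocol overhead.

```python
msgs_per_second = 284        # average throughput of one edge node (from the text)
avg_message_bytes = 250.2    # average message size in bytes (from the text)

bytes_per_second = msgs_per_second * avg_message_bytes
megabits_per_second = bytes_per_second * 8 / 1_000_000

print(f"{bytes_per_second:.1f} B/s")          # about 71 kB/s of payload
print(f"{megabits_per_second:.3f} Mbit/s")    # well under 1 Mbit/s per edge node
```

At roughly 0.57 Mbit/s of payload per node, a single edge node uses only a small fraction of a typical wired network link.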

Self-Labeling Experiments Running on AdaptIoT

This section provides a real-world example of a self-labeling application to demonstrate the effectiveness of the AdaptIoT system. The application uses an adaptive model to detect operator-machine interactions with a 3D printer.

Experiment Summary

The experiment aims to adapt a motion recognition model of a worker at a 3D printer by utilizing interactive causality. On one side of the causal relationship, a camera is used to detect the worker's actions, and on the other side, a power meter is used to detect the machine's response in the form of energy consumption.

Knowledge Graph Construction

This self-labeling application is implemented by extracting domain knowledge representing causal relationships among operators, machines, and materials to construct a causal knowledge graph (KG). Figure 3 shows a simplified knowledge graph for a 3D printer. This graph is used to relate the operator's actions to changes in the machine's state.

Sensor and ML Service Implementation

Five sensors corresponding to five nodes have been implemented, each with a corresponding ML service to detect changes in state. The implemented nodes include:

Worker behavior: recognized using cascaded OpenPose and Graph Convolutional Networks (GCN).

Power changes in machines: Detected using event detectors and classifiers.

Figure 9 shows the experimental setup for the 3D printer self-labeling use case.

Figure 9. Experimental setup for the 3D printer self-labeling use case. (a) shows the data processing pipeline for the task model and (b) illustrates the ESD pipeline for the current signal.
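
The text does not describe how the event detector for the power signal works, so the following is only one plausible sketch: a two-threshold (hysteresis) detector that reports on/off transitions in a current signal. The function name detect_events, the thresholds, and the sample readings are all made up for illustration.

```python
from typing import List, Sequence, Tuple


def detect_events(
    signal: Sequence[float],
    on_threshold: float,
    off_threshold: float,
) -> List[Tuple[int, str]]:
    """Return (sample index, new state) for each on/off transition.

    Two thresholds (hysteresis) keep noise near a single threshold from
    producing a burst of spurious transitions.
    """
    if off_threshold > on_threshold:
        raise ValueError("off_threshold must not exceed on_threshold")
    events = []
    active = False
    for i, value in enumerate(signal):
        if not active and value >= on_threshold:
            active = True
            events.append((i, "on"))
        elif active and value <= off_threshold:
            active = False
            events.append((i, "off"))
    return events


current = [0.1, 0.2, 1.6, 1.4, 1.5, 0.9, 0.2, 0.1]   # made-up current readings (A)
events = detect_events(current, on_threshold=1.5, off_threshold=0.5)
print(events)
```

In a self-labeling setting, each detected transition would then be passed to a classifier to decide which effect state it represents before being used as a label.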

Perform Self-Labeling

In this experiment, a dataset of 400 samples was manually collected and labeled for validation and testing. A 3D printer was then used for three weeks, during which the AdaptIoT system automatically collected and labeled 200 self-labeled samples.

Model Evaluation

We retrained the task model using the dataset generated by self-labeling and evaluated its accuracy. Table 3 shows the accuracy compared to several semi-supervised learning approaches.

Table 3: Accuracy of models trained on experimental data sets (%)

The results show that the self-labeling method consistently achieves higher accuracy and better training stability than the other semi-supervised learning methods.

Conclusion

In this study, we designed and demonstrated AdaptIoT, an IoT system that supports self-labeling workflows leveraging interactive causality to aid the development of adaptive machine learning applications in cyber-physical systems. AdaptIoT is designed as a web-based microservices platform for digitizing manufacturing IoT and making it intelligent. It includes an end-to-end data streaming component, a machine learning integration component, and a self-labeling service. AdaptIoT provides high throughput and low latency data acquisition, and ensures seamless integration and deployment of ML applications.

The self-labeling service automates the entire self-labeling workflow to enable real-time and parallel task model adaptation. The AdaptIoT system is deployed in a makerspace that serves as a university lab, and it will serve as a foundation for the development of future adaptive learning cyber-physical manufacturing applications.

This system has the following features:

High throughput and low latency data acquisition: AdaptIoT processes large amounts of data in real time and acquires data efficiently.
Seamless ML application integration: AdaptIoT facilitates the integration and deployment of ML applications and promotes intelligent manufacturing processes.
Automated self-labeling: Self-labeling services automate the adaptation of task models in real time, enabling continuous learning.

In the future, it is expected that more adaptive ML applications will be developed and commercialized in the field of cyber-physical manufacturing based on AdaptIoT.


友安昌幸 (Masayuki Tomoyasu): JDLA G certificate 2020#2, E certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; Amiko Consulting LLC, CEO