Topological Data Analysis In Smart Manufacturing Processes--A Survey Of Current Technologies

Topological Data Analysis 11/01/2024

3 main points
✔️ Literature survey on topological data analysis (TDA) in Industry 4.0
✔️ TDA can be used to identify patterns and relationships in data that are difficult to detect using traditional methods
✔️ TDA has been shown to be particularly suitable for analyzing complex data sets from sensors and other devices in industrial production and manufacturing processes

Topological Data Analysis in smart manufacturing processes -- A survey on the state of the art
written by Martin Uray, Barbara Giunti, Michael Kerber, Stefan Huber
(Submitted on 13 Oct 2023)
Comments: Preprint still under review
Subjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Applications (stat.AP)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

Topological data analysis (TDA) is a mathematical approach that uses techniques from topology to analyze complex multidimensional data, and it has been widely and successfully applied in fields such as medicine, materials science, and biology. This survey summarizes the state of the art of TDA in yet another application area: industrial production and manufacturing in the context of Industry 4.0. A rigorous and reproducible literature search is conducted on the application of TDA in industrial production and manufacturing settings. The results are then categorized and analyzed by application area and by type of input data in the manufacturing process. The main advantages of TDA and its tools in this field are highlighted, and its challenges and future possibilities are described. Finally, the survey discusses which TDA methods are underutilized in specific areas of industry and which types of applications were identified.

Introduction

Industry 4.0 is the fourth industrial revolution, which combines digital and physical technologies. It is transforming manufacturing processes by enabling smart production systems that collect and analyze data in real time, make intelligent decisions, and adapt to changing conditions such as customized products and on-demand production. Topology is the field of mathematics that studies the shape of objects. Topological data analysis (TDA) is a more recent field at the intersection of data analysis, computer science, and topology, drawing in particular on the latter. TDA has been shown to be useful in a wide range of applications, including anomaly detection, image processing, genome sequencing, and predictive maintenance. However, its potential in Industry 4.0 has yet to be realized. The authors believe that TDA is particularly well suited for smart manufacturing systems, because it can extract insights from the complex data sets generated by sensors and other devices and support decisions based on them. In addition, TDA can identify patterns and relationships in the data that are difficult to detect using traditional methods. The study reviews current applications of TDA in manufacturing and production processes and identifies future research directions in these areas.

The applications of topological data analysis (TDA) are so diverse that it is impossible for individual researchers to track all developments. This raises the question of how to structure the interface to the application areas of TDA so that it is accessible to both theorists and practitioners. One recent effort is DONUT, a search engine that provides an easy way to search for TDA applications. Another way to structure this knowledge is through survey articles that summarize and compare different approaches to linking TDA with application settings. These articles usually focus on explaining the theory and present a few application examples that demonstrate the diversity of areas that can benefit from TDA. This approach is reasonable because a comprehensive survey of all TDA applications would result in a document of unwieldy size. In particular, no other review article could be found that bridges the gap between the theoretical results of TDA and their application in industrial production and manufacturing processes.

Literature reviews that bridge the gap to other areas, as well as surveys on manufacturing processes, do exist, but they are not specific to TDA. One review covers the application of machine learning (ML) and other data analysis techniques in electronic design automation for semiconductor manufacturing and briefly discusses the use of the mapper algorithm in this process. A work on the challenges of big data analytics for smart factories states that TDA methods, e.g., the mapper algorithm for clustering data, are promising, but it does not provide empirical studies on real-world applications of TDA. A recent review of chatter detection in machining highlights the vast amount of research in this area and the potential application of TDA to specific problems. A survey of data-driven approaches to metal forming and blanking technology explicitly mentions the Uniform Manifold Approximation and Projection (UMAP) method. However, Industry 4.0 and TDA cover many more areas and methods. Big data is another area relevant to Industry 4.0; a survey published in 2017 aims to bridge the gap between the theoretical results of geometric and topological methods and this engineering discipline. The application of persistent homology (PH) and the mapper algorithm has also been discussed for additive manufacturing, where 3D printing is the prominent technology. In contrast, this paper proposes a different kind of survey, dedicated to domain experts in industrial applications.

To the best of the authors' knowledge, this is the first paper to address the following questions: What are the current applications of TDA in industrial manufacturing and production? And which applications are missing and should therefore be focused on?

This paper presents an overview of the current literature on the application of TDA in industrial production and manufacturing processes. We believe that this survey will be of interest to both theorists in the TDA field and practitioners in industrial production. Although both fields are very active and the number of publications is growing, mutual recognition is still lacking. The purpose of this survey is to bridge this gap and facilitate the exchange of ideas and methods between these worlds.

The contributions of this paper are as follows:

1) provide an overview of the literature on the application of TDA in the field of industrial production and manufacturing processes;
2) indicate the areas and method combinations used;
3) highlight underutilized application areas and method combinations.

Smart Manufacturing in Industry 4.0

Industry 4.0

The term "Industry 4.0" was proposed by the German government in the early 2010s to promote the movement known as the "Fourth Industrial Revolution." This movement calls for greater flexibility in production, more adaptive machine operations, and smarter, more autonomous machines, production lines, factories, and even entire supply chains. This enables lot-size-1 production, mass customization, and optimization of production not only machine by machine but along the entire value chain, contributing to new business and operational models. Industry 4.0 has also given rise to terms such as "smart factory" and "cognitive factory." However, the context of Industry 4.0 is broader than the production of goods alone. Combining it with other technologies that are advancing simultaneously, from gene sequencing to nanotechnology and from renewable energy to quantum computing, opens up the potential for innovative and revolutionary products. The operation of these systems rests on the four design principles of Industry 4.0:

"Interconnection" (sensors, machines, and humans are interconnected),
"Information transparency" (information about all components in the system is transparent),
"Technical assistance" (technical equipment assists in decision making and problem solving, or assists with or takes over difficult or hazardous tasks),
"Decentralized decision making" (decisions are made at the edge rather than by a centralized entity).

Cyber-physical systems can make their own decisions based on the information they have; exceptional cases are left to higher-level instances. In Erboz's review, the key components of an Industry 4.0 system are: big data and analytics, autonomous robotics, simulation, horizontal and vertical system integration, industrial IoT, cloud, cybersecurity, additive manufacturing, and augmented reality (AR). For cybersecurity in the context of Industry 4.0, the term operational technology (OT) security is also used; it denotes the cybersecurity of OT systems. The term additive manufacturing refers to 3D printing technology in an industrial manufacturing context, in which three-dimensional objects are created by layering materials in a computer-controlled process.

Production and Manufacturing

This section describes the use of the terms "manufacturing" and "production" as they relate to the process of creating a product. In some fields, such as semiconductors, the term "fabrication" is also found. Although the three terms differ in meaning, this paper treats "production," "manufacturing," and "fabrication" as synonyms. The manufacture of a product involves a series of process steps applied by industrial machinery on a production line. At the beginning of the design process, the product requirements are defined, followed by conceptual design and evaluation. Based on this, a prototype is created and drawings are prepared for industrial reproduction. Here, "industrial" means repeatable, efficient, and effective. These drawings and the product requirements define the specifications for selecting materials, processes, and production equipment. Production is then accompanied by inspection and quality assurance and concludes with the packaging of the product. Manufacturing engineering, or production engineering, is the field of engineering involved in all of these processes; it is concerned with planning and optimizing the production process. Figure 1 shows the stages of the manufacturing engineering process in sequence. In general, smart manufacturing in Industry 4.0 poses new challenges compared to traditional manufacturing: additional strategies and technologies are used to improve manufacturing processes and to meet the needs of integration into Industry 4.0. An overview of the technologies and architectures for smart manufacturing systems is provided in a separate work.

Figure 1. The manufacturing engineering process, with stages along the manufacturing of a product, starting with the product definition and ending with the final mass-produced deliverable. For readability, feedback connections are omitted.

Topological Data Analysis

The field of Topological Data Analysis (TDA) comprises three main methods: the mapper algorithm, persistent homology (PH), and Uniform Manifold Approximation and Projection (UMAP). What these methods have in common is that they first transform the data at hand into an appropriate geometric representation and then analyze its topological properties. An important observation is that analyzing data involves handling many parameters (tuning, removing, or weighting them, and so on). Topology is a mathematical discipline concerned with geometric shape, and since data often carries geometric structure, topology can be used to handle the parameters that relate to this structure.

These three methods treat parameters in different ways. The mapper algorithm combines parameters (and their values) into different groups and clusters the inputs accordingly. These groupings often reveal previously unknown relationships in the data set. PH, on the other hand, removes the need to select parameter thresholds: it analyzes the data for all possible threshold values and tracks how the shapes in the data evolve as the threshold changes. This makes it particularly suitable for automated production. UMAP is a dimensionality reduction method that removes some parameters by projecting the data into a lower-dimensional ambient space, where it can be analyzed more easily.

The pipeline for each method is shown in Figure 2 and discussed in more detail in the sections that follow.

Figure 2. (a) Mapper pipeline. (b) Persistent homology pipeline. (c) UMAP pipeline.

Mapper Algorithm

The mapper algorithm is a conceptually simple approach; the only topological property it considers is connectivity. Essentially, it maps a set of objects V into a low-dimensional space R^d by a function f and constructs a graph of clusters. The mapping f (the filter function) is obtained, for example, by PCA or an autoencoder. This step is important because the mapping determines which elements end up near each other in the low-dimensional space; with a poorly chosen mapping, meaningful associations between elements may go undetected. Next, the image f(V) is covered by (overlapping) sets U_1, ..., U_k. Each U_i is pulled back to the original space, and the preimage f^-1(U_i) is clustered using the clustering method of choice (e.g., k-means with fixed k if V lies in a Euclidean space, or kernel k-means for more general V). The clusters of all f^-1(U_i) form the vertices of the mapper graph G, and two vertices are joined by an edge whenever the corresponding clusters intersect. Clustering thus takes place in the space of the original point set, but is guided by the filter function and the cover.

Mapper graphs are used for exploratory data analysis. One typically looks for flares in the graph, that is, subpopulations that stay connected across several scales (intervals) and are separated from the other objects at these scales. These subpopulations are then analyzed with traditional data analysis methods to find their characteristics.
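The pipeline described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in any of the surveyed works: the filter function is simply the x-coordinate (a stand-in for PCA or an autoencoder), the cover consists of overlapping intervals, the clustering is a simple distance-threshold rule (a stand-in for k-means), and the names mapper_graph and cluster_by_threshold are hypothetical.

```python
import math
from itertools import combinations


def cluster_by_threshold(points, indices, eps):
    """Group the given point indices into connected components of the graph
    that joins points closer than eps (assumed stand-in for k-means)."""
    remaining = set(indices)
    clusters = []
    while remaining:
        seed = remaining.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) < eps}
            remaining -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, filter_fn, n_intervals, overlap, eps):
    """Return (vertices, edges) of a mapper graph for a 1D filter function."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    length = (hi - lo) / n_intervals
    vertices = []
    for k in range(n_intervals):
        # Overlapping interval U_k covering part of f(V).
        a = lo + k * length - overlap * length
        b = lo + (k + 1) * length + overlap * length
        # Preimage of U_k, clustered in the original space.
        members = [i for i, v in enumerate(values) if a <= v <= b]
        vertices.extend(cluster_by_threshold(points, members, eps))
    # Two clusters are connected if they share at least one point.
    edges = {(s, t) for s, t in combinations(range(len(vertices)), 2)
             if vertices[s] & vertices[t]}
    return vertices, edges


# Toy input: 12 points on a circle, filtered by their x-coordinate.
circle = [(math.cos(2 * math.pi * k / 12), math.sin(2 * math.pi * k / 12))
          for k in range(12)]
vertices, edges = mapper_graph(circle, filter_fn=lambda p: p[0],
                               n_intervals=3, overlap=0.3, eps=0.6)
print(len(vertices), len(edges))  # 4 vertices and 4 edges: a single loop
```

For the circle, the middle interval splits into an upper and a lower arc, so the resulting graph is a loop. This illustrates how the connectivity of the graph reflects the shape of the data, and also how strongly the result depends on the chosen cover and threshold.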

To give a practical example, the paper by Rodrigo Rivera-Castro et al. aims to improve state-of-the-art demand forecasting methods. The problem is as follows: a manufacturer needs to forecast the demand for a product and its (hierarchical) components. The demand for each component is a time series, labeled with the best-fitting forecasting model. These time series are fed into a mapper graph and divided into clusters based on the best forecasting model. This approach not only improves the understanding of the forecasting models, but also allows a forecasting model for a new component to be selected efficiently. In practice, choosing the cover U_1, ..., U_k of f(V) is the main hurdle of the mapper algorithm, and the interpretability of the results depends entirely on this choice. Several standard choices are known, but obtaining meaningful insights from the mapper pipeline usually requires the prior knowledge of a domain expert. Nevertheless, the mapper algorithm should be considered a powerful and versatile interactive tool that can reveal hidden connectivity within a data set.

Persistent Homology (PH)

Homology is a fundamental concept from algebraic topology. It is used to distinguish shapes that cannot be continuously deformed into each other. A formal description is beyond the scope of this article, but informally, homology counts the k-dimensional holes of a shape for every integer k. For k = 0, 1, 2, this corresponds to the number of connected components, tunnels, and cavities in the shape. Importantly, given a continuous map between two shapes, for example an inclusion from X to Y, there is a well-defined map between these holes.

The PH pipeline constructs a family of growing shapes X_r, one for each scale parameter r ≥ 0, called a filtration, and observes how holes of different dimensions appear and disappear when the growth of X_r is viewed as a continuous process. The evolution of these topological features can be represented as a barcode (also known as a persistence diagram): a set of intervals (bars), each representing the lifetime of a hole in the filtration. The length of a bar is called the persistence of the corresponding topological feature.
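To make the notion of a barcode concrete, the following sketch computes only the 0-dimensional part (connected components) for a point cloud. It assumes a Vietoris-Rips style filtration in which an edge between two points appears once the scale r reaches their distance; the function name h0_barcode is hypothetical, and higher-dimensional holes would in practice be computed with a dedicated TDA library.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """0-dimensional barcode of a point cloud.

    Every point starts as its own component at r = 0. When an edge of
    length r first joins two components, one of them dies at r.
    """
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path compression.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, processed in order of increasing length.
    edges = sorted((math.dist(p, q), i, j)
                   for (i, p), (j, q) in combinations(enumerate(points), 2))
    bars = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            bars.append((0.0, r))      # a component dies at scale r
    bars.append((0.0, math.inf))       # one component never dies
    return bars


# Two well-separated groups of points.
cloud = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0)]
bars = h0_barcode(cloud)
print(bars)
```

In this toy example three bars end at r = 1, one bar ends at r = 9, and one bar is infinite. The two long bars correspond to the two groups of points, while the short bars reflect small-scale structure within each group.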

As a practical example, consider the results of Jeffrey Mahler et al. In manufacturing, grasping an object is a well-known task. In addition to form-closure and force-closure grasps, one can also consider energy-bounded cages. A force field f acting on an object O pushes the object into the gripper, so a certain amount of energy is required for the object to escape from the gripper against f. Bounding this escape energy yields an energy-bounded cage, and the goal of the work is to identify such cages. To do so, the authors consider the configuration space of O, sample the free space, approximate it using alpha complexes, and construct a superlevel set filtration according to the energy potential of each simplex. Energy-bounded cages in the configuration space then show up as persistent homology classes in the persistence diagram: birth time corresponds to the escape potential, death time to the deepest potential in the cage, and persistence to the escape energy, which is the difference between these two potentials.

One advantage of the PH framework is that the pipeline is comparatively easy to automate, since there are often natural choices of filtration. There is also a wealth of theory on how to compare the barcodes of two data sets and how to integrate PH into ML methods (e.g., kernel-based methods and neural networks). Well-established theory and the interpretability of the obtained features contribute to the success of PH in practice. In addition, there are many efficient algorithms for computing filtrations and barcodes and for comparing them. Despite these advances, however, PH does not scale easily to very large data sets. This is in contrast to the conceptually simple mapper algorithm and to UMAP, described next, which is designed for large data sets.

Figure 3. Left: an energy-bounded cage. Given a force field f and a given pose of the object (blue), a certain energy is required to escape from the gripper (black). Right: the points of the persistence diagram, where persistence corresponds to the required escape energy.

Uniform Manifold Approximation and Projection (UMAP)

Data often lives in a high-dimensional ambient space, and UMAP (Uniform Manifold Approximation and Projection) is a means of reducing this dimension. Data frequently carries multidimensional information, including spatial dimensions, costs, materials, and hierarchical position in production. For a given analysis, however, much of this information is unnecessary and can even hinder understanding. It is therefore sometimes desirable to reduce the dimension of the data before the actual analysis, and UMAP is very well suited for this task.

UMAP works by creating a weighted graph from the input points and projecting it into a lower-dimensional space to obtain another, simpler graph that preserves the information deemed important. What counts as important is determined by selecting an appropriate projection. The construction of the graph is not straightforward, because it has to preserve information about the local distances between points. It is based on the k-nearest neighbor method and a fuzzy construction, in which membership in a set is weighted: an element does not simply belong or not belong to a set, but can belong to it to some degree. This construction is very abstract, so the details are omitted here.
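The graph construction step can be illustrated with a simplified sketch. It follows the general idea of a fuzzy k-nearest-neighbor graph: each point assigns membership weights to its k nearest neighbors, and the two directed weights of a pair are merged with a fuzzy union, a + b - a*b. Two simplifications are assumptions of this sketch: the scale sigma is a fixed constant instead of being calibrated per point, and the projection into the low-dimensional space is omitted entirely. The function name fuzzy_knn_graph is hypothetical and is not part of any UMAP library.

```python
import math


def fuzzy_knn_graph(points, k, sigma=1.0):
    """Weighted, undirected k-nearest-neighbor graph with fuzzy weights.

    sigma is fixed here for simplicity (an assumption of this sketch).
    """
    n = len(points)
    directed = {}
    for i in range(n):
        # The k nearest neighbors of point i, as (distance, index) pairs.
        dists = sorted((math.dist(points[i], points[j]), j)
                       for j in range(n) if j != i)[:k]
        rho = dists[0][0]  # distance to the nearest neighbor
        for d, j in dists:
            # Membership decays with the distance beyond the nearest
            # neighbor, so the nearest neighbor always gets weight 1.
            directed[(i, j)] = math.exp(-max(0.0, d - rho) / sigma)
    weights = {}
    for (i, j), a in directed.items():
        b = directed.get((j, i), 0.0)
        # Fuzzy union of the two directed memberships.
        weights[(min(i, j), max(i, j))] = a + b - a * b
    return weights


pts = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
weights = fuzzy_knn_graph(pts, k=2)
for edge, w in sorted(weights.items()):
    print(edge, round(w, 4))
```

Every weight lies between 0 and 1, and pairs that are nearest neighbors of each other receive the maximal weight. A low-dimensional layout would then be optimized so that its own fuzzy graph resembles this one, which is the part left out here.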

As an example, consider additive manufacturing of electromagnetic devices. There, manufacturing anomalies (geometric information) can cause unpredictable performance problems. Therefore, all information other than the geometric information and the electromagnetic performance is ignored, which is exactly what UMAP does in this case. The remaining information is fed to an ML pipeline whose output is the relationship between geometric defects and performance. Unlike the output of the mapper algorithm or PH, the output of UMAP is not directly interpretable and must be analyzed further (e.g., using ML). Nevertheless, UMAP has significant applications within and outside of industry.

Method of investigation

To ensure reproducibility of the findings, the review was conducted as an exhaustive literature review in which each step is documented and reproducible. The problem defined for this work is a review of the application of TDA methods in industrial production and manufacturing processes. The research methodology consists of the following pipeline: definition of suitable keywords for the search, identification of the digital libraries to be searched, and filtering of the obtained works. These steps are described in detail below.

Keywords and Queries

This section describes how the search queries for the literature review are defined. The queries are built from two categories: Method and Domain. Keywords in the Method category are used to find applications of topological data analysis (TDA) tools. Keywords in the Domain category, on the other hand, describe applications and tasks in industrial manufacturing processes. Figure 4 shows these categories and the identified keywords. The intersection of both categories is the search space for this literature review.

Figure 4. Venn diagram of the set of 13 keywords in the category "Domain" (left segment) and the set of 4 keywords in the category "Method" (right segment). The intersection of the two segments indicates the scope of the literature search. Note: The asterisk "*" indicates a wildcard, allowing for keyword variants (e.g., Technologies and Technology).

To create a meaningful search query, keywords in each category are connected using an "OR" statement. The search string results for both categories are connected using the Boolean "AND" operator. The resulting single search query is used to collect literature from the digital library.
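The query construction just described can be sketched as follows. The keyword lists below are illustrative placeholders only, since the actual 13 Domain and 4 Method keywords are given in Figure 4 and are not reproduced here; the function name build_query is likewise hypothetical.

```python
def build_query(domain_keywords, method_keywords):
    """Join keywords with OR inside each category and AND between them."""
    domain = " OR ".join(f'"{k}"' for k in domain_keywords)
    method = " OR ".join(f'"{k}"' for k in method_keywords)
    return f"({domain}) AND ({method})"


# Placeholder keywords (assumptions, not the survey's actual lists).
domain_keywords = ["manufactur*", "industr*"]
method_keywords = ["topological data analysis", "persistent homology"]

query = build_query(domain_keywords, method_keywords)
print(query)
# ("manufactur*" OR "industr*") AND ("topological data analysis" OR "persistent homology")
```

Keeping the two categories as separate lists makes it easy to regenerate the query when a keyword is added or removed, which supports the reproducibility the survey aims for.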

Digital library

The five digital libraries used in the literature review are:

IEEE (IEEE Xplore Digital Library)
Springer (SpringerLink)
Elsevier (ScienceDirect)
ACM (ACM Digital Library)
The American Society of Mechanical Engineers (ASME) (ASME Digital Collection)

These digital libraries were selected because they are the most prominent digital libraries for scientific publications in computer science and engineering, especially for TDA and ML applications, and in industrial engineering. The only exception is the ASME Digital Collection. It was selected because an initial, semi-exhaustive search in Google Scholar with a restricted keyword set, similar to the method used by Maximilian E. Tschuchnig et al., showed additional relevant results that were not found in the first four digital libraries.

IEEE Xplore, SpringerLink, and ACM Digital Library provide search interfaces that accept the generated search string directly. ScienceDirect's interface limits the number of Boolean expressions that can be combined in one search string. Since the Boolean operator "AND" distributes over "OR," the search string can be split into multiple shorter search strings whose results are combined with "OR." This reduces the number of keywords per search string and yields the same search results. Finally, the ASME Digital Collection does not provide advanced searching and supports only one Boolean operator per query. In this case, the 13 x 4 = 52 keyword combinations are queried independently. The resulting reasonably small number of publications made this approach feasible.
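The splitting relies on the distributive law: (d1 OR d2 OR ...) AND (m1 OR m2 OR ...) is equivalent to the OR of all pairwise terms (d AND m). The sketch below enumerates these pairs for a library that accepts only one Boolean operator per query. The keywords are numbered placeholders rather than the survey's actual keywords, and split_query is a hypothetical helper name.

```python
from itertools import product


def split_query(domain_keywords, method_keywords):
    """One single-operator query per (domain, method) keyword pair."""
    return [f'"{d}" AND "{m}"'
            for d, m in product(domain_keywords, method_keywords)]


# Placeholder keywords: 13 Domain and 4 Method entries, as in the survey.
domain_keywords = [f"domain-{i}" for i in range(1, 14)]
method_keywords = [f"method-{i}" for i in range(1, 5)]

single_queries = split_query(domain_keywords, method_keywords)
print(len(single_queries))  # 13 * 4 = 52
print(single_queries[0])    # "domain-1" AND "method-1"
```

The union of the result sets of these 52 queries equals the result set of the original combined query, so only the number of requests changes, not the search space.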

From all of these digital libraries, all results are collected and stored using the Zotero Reference Manager. Metadata for each publication is automatically extracted through Digital Object Identifiers (DOIs).

Filtering Results

The search results were not limited to a specific time period, because topological data analysis (TDA) is still a young research field and its application to industrial manufacturing processes is even newer. The search took place at the end of June 2023, and all works published up to that date were considered. Filtering was done step by step, and items were removed according to the following criteria:

Duplicates
Relevant type of publication
Language of the publication
Availability of full text
Context of works

The first step in the filtering procedure was the removal of duplicates, identified by DOI and publication title. Duplicates found on the basis of the title were identified manually. Each duplicate was removed so that only one instance of the publication was retained.

To provide a meaningful, high-quality review, only publications that met certain quality criteria, i.e., peer-reviewed publications, were considered. Based on this requirement, only conference proceedings and journal publications were included. Since the status of the peer review process was not provided by the digital libraries, these publications were assumed to be peer reviewed. Other references such as preprints, presentations, books, and reports were excluded from the analysis.

To ensure that information could be extracted correctly from the publications and that the results were reproducible, only publications with full text available in English were considered. Availability depends heavily on the authors' institutional access and subscriptions to the digital libraries. Therefore, all publications without available full text were screened manually to ensure that no relevant publications were missed. For relevant publications with missing full text, alternative sources such as preprint servers and author web pages were to be searched. This fallback turned out not to be necessary, since the full text of the majority of publications was available in the digital libraries.

The references remaining after this semi-automatic filtering were then analyzed for context, which required screening all publications manually. Here, the keywords of the Method and Domain categories were searched for within each publication. They had to appear in the part of the publication describing its contribution; mentioning a method or domain only in the related work or outlook was not sufficient.

Only the publications remaining after this filtering procedure were considered further. In total, 4683 results were screened and 27 publications were considered relevant to the research question. These publications were examined in detail, manually sorted into different categories, and are presented below.
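The duplicate-removal step can be sketched as follows. Note the assumptions: the survey identified title-based duplicates manually, whereas this sketch matches normalized titles automatically; the record layout (dictionaries with "doi" and "title" keys), the example records, and the DOIs are all made up for illustration.

```python
def normalize_title(title):
    """Lowercase the title, drop punctuation, and collapse whitespace."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in title)
    return " ".join(cleaned.split())


def deduplicate(records):
    """Keep the first record for each DOI or normalized title."""
    seen_dois, seen_titles, unique = set(), set(), []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        title = normalize_title(rec.get("title", ""))
        if (doi and doi in seen_dois) or (title and title in seen_titles):
            continue  # duplicate of an earlier record
        if doi:
            seen_dois.add(doi)
        if title:
            seen_titles.add(title)
        unique.append(rec)
    return unique


# Hypothetical export from a reference manager.
records = [
    {"doi": "10.0000/example.1", "title": "A Study of Mapper"},
    {"doi": "10.0000/EXAMPLE.1", "title": "A study of mapper"},  # same DOI
    {"doi": "", "title": "A Study of Mapper."},                  # same title
    {"doi": "10.0000/example.2", "title": "Another Work"},
]
unique = deduplicate(records)
print([rec["title"] for rec in unique])  # ['A Study of Mapper', 'Another Work']
```

In a real review, automatic title matches would still deserve a manual check, since different publications can share a title and the same publication can appear under slightly different titles.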

Results

TDA methods are used not only for data analysis tasks proper, but also to validate analyses performed with other methods. For example, one paper uses persistent homology (PH) for evaluation, and several papers use UMAP for the same purpose. However, since this survey focuses on direct applications of TDA, such uses are not included in the following discussion.

Studies from production areas other than manufacturing, such as oil production, were also found during the literature search. They were not included in this survey because they are not related to manufacturing processes. In addition, some studies mention the use of TDA methods in manufacturing processes without presenting empirical work. These include potential applications of TDA methods to hybrid twins for smart manufacturing and to 3D printing.

In total, 27 works were identified as relevant to this study. Each work was assigned to one of three clusters (A-C) based on its application within the manufacturing process. The clusters identified are as follows:

A: Product-level quality control
B: Process-level quality control
C: Manufacturing engineering

The relevant works are listed in Table I. In this table, each work is assigned to one of three clusters (A-C) and arranged according to the TDA method used. Figure 5 provides an overview of these works in relation to the clusters to which they were assigned. The figure also shows the TDA method used in each application area.

The publication dates of the listed works show that interest in TDA methods in the manufacturing process has increased over the past years. The first publication using TDA methods appeared in 2016. A significant increase in interest can be observed starting in 2022: the number of publications from 2022 onward is higher than in the preceding years. No publications from 2020 were found for this survey. Figure 6 shows the absolute number of publications identified per year. Although Figure 6 appears to show a decrease in interest toward 2023, the data for 2023 is incomplete, since this survey only includes publications added to the digital libraries by June 30, 2023.

A more detailed summary of the results is presented in Table II. This table shows each individual work, its associated cluster, the TDA method used, and the type of input data used to solve the task. The input data types were extracted from the referenced works. The most common data type is time series data, followed by point clouds and scalar fields. In addition, one work was found that applied TDA methods to text log files and labeled graphs.

In the remainder of this section, the three application clusters are discussed. For each application area, a brief description is followed by a short summary of the relevant works. For more information on these works, please refer to the original publications.

Figure 5. This figure shows the relationship between the identified clusters and the works. Additionally, the TDA methodology used is shown above each cluster. The numbers in parentheses indicate the number of publications involved.

Figure 6. Number of relevant papers per year.

Product-level quality control

Two different methods of quality control are considered in the works found, either at the product level or at the production process level. In the first class the quality of the production is assessed based on the goods produced; in the second class the quality is assessed by observations from the production process. Here we focus on the results related to the first class.

The use of TDA methods at the product level allows for a very efficient analysis of the quality of the goods produced. In general, TDA methods are very well suited to analyze structures, surfaces, and shapes, and are very efficient in terms of computational complexity as well as robustness to noise. The works identified in this part perform the "classical" tasks of TDA, and these tasks are commonly found in the literature. Nevertheless, we included only those works that explicitly mention the application of TDA at the product level in the production process.

A natural application of TDA at the product level is the analysis of topological differences between products. One work describes the classification of topological differences in additive manufacturing (AM); the products are represented as meshes in R^3, and plain homology is primarily used. The analysis of surface textures is another natural application. An early work uses PH to distinguish surface textures without addressing a specific task, whereas later works do address one. The latter discuss surface texture as an important factor in product quality; their method was first applied to surface profiles and then to the more specific task of analyzing microscopic images.

For shape segmentation, a new method combining PH and graph convolutional networks has been proposed. This PH-based graph convolutional network surpasses state-of-the-art methods for fine-grained 3D shape segmentation on point cloud data.

More specialized use cases are also presented. One is quality control in wafer production, where the task is to cluster defect patterns using the mapper algorithm; the input features are extracted from wafer map images by a vision transformer. Another is the production of electric motors, where PH is used to detect eccentricity. The data is given as a time series of process parameters of the motors. The study predicts failure levels with reasonable accuracy while keeping computational complexity low through the use of a simple regression model. A further work analyzes the root causes of part-to-part variation in the manufacturing of mechanical components; applied to point cloud data obtained from optical scans, the ML method is extended with UMAP.

Another paper does not classify individual product anomalies; its use case is to detect and discard defects in manufactured wafer maps, and UMAP is used for dimensionality reduction within one of the subsystems of its deep learning pipeline. The same task, detecting defect patterns in wafer maps during production, is addressed in a very recent work [27], whose authors propose PH for feature generation in neural networks. For further processing by the neural network, the resulting persistence diagram is converted into a persistence image.

For the additive manufacturing of RF devices, the authors propose the use of UMAP and convolutional neural networks on microscope images. Identifying defect mechanisms and their impact on performance by mapping geometric variations to electromagnetic performance metrics contributes to faster and cheaper quality control, since in-line electromagnetic simulation is not required.
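The conversion of a persistence diagram into a persistence image, mentioned above as input for a neural network, can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline of [27]: the function name `persistence_image`, the grid resolution, the Gaussian width `sigma`, the linear weighting by persistence, and the example diagram are all hypothetical choices.

```python
import math


def persistence_image(diagram, resolution=8, sigma=0.5, extent=(0.0, 4.0)):
    """Rasterize finite (birth, death) pairs into a square grid.

    Each pair is moved to (birth, persistence) coordinates, weighted by
    its persistence, and smoothed with an unnormalized isotropic Gaussian
    evaluated at the pixel centres.
    """
    lo, hi = extent
    step = (hi - lo) / resolution
    centres = [lo + (k + 0.5) * step for k in range(resolution)]
    image = [[0.0] * resolution for _ in range(resolution)]
    for birth, death in diagram:
        if math.isinf(death):
            continue  # infinite pairs cannot be placed on the grid
        pers = death - birth
        for r, y in enumerate(centres):      # rows: persistence axis
            for c, x in enumerate(centres):  # columns: birth axis
                d2 = (x - birth) ** 2 + (y - pers) ** 2
                image[r][c] += pers * math.exp(-d2 / (2 * sigma ** 2))
    return image


# Hypothetical diagram: one prominent feature, two short-lived ones.
diagram = [(0.0, 1.0), (0.5, 3.5), (1.0, 1.2), (0.0, math.inf)]
image = persistence_image(diagram)
features = [v for row in image for v in row]  # fixed-length vector
```

The point of the transformation is that `features` has a fixed length regardless of how many pairs the diagram contains, which is what makes it usable as input to a standard neural network layer.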

Quality control at the process level

Following an overview of the literature on quality control, we discuss the results at the process level. The goal is to use process data to assess the quality of the production process by observing process variables rather than product quality (see Figure 7). Examples of data include machine conditions, sensor data, or other data obtained from the production process. Seven studies were found during the review for this task. While this number suggests a wide variety of applications, two major applications were actually identified.

Figure 7. Process-level quality control captures key process parameters as time series data, which form the basis for quality control of the machine process. The process parameters are shown in a schematic diagram of an injection molding machine. With respect to the type of input data, the diagram is complete: time series is the only input type in this cluster, as shown in Table II.

The goal of the studies using observations of key process parameters is to predict the productivity of the manufacturing process. According to the authors, these studies are the first to use TDA methods for manufacturing applications.

There is also a proposal to use the mapper algorithm to identify distinct clusters in a benchmark processing data set. Using the output network of the mapper algorithm, the key process variables or features that affect final product quality are selected. The authors show that using only the selected variables achieves the same level of predictive accuracy as using all process variables, while being more cost effective.
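For readers unfamiliar with the mapper algorithm, the following minimal sketch shows its basic steps: apply a filter function, cover the filter range with overlapping intervals, cluster the points in each interval, and connect clusters that share points. It is not the variable selection procedure of the surveyed work; the function names, the single-linkage clustering, the parameter values, and the circle-shaped toy data are assumptions chosen for illustration.

```python
import math
from itertools import combinations


def single_linkage(indices, points, eps):
    """Group indices into connected components of the eps-neighbour graph."""
    clusters, unseen = [], set(indices)
    while unseen:
        stack = [unseen.pop()]
        cluster = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in unseen if math.dist(points[i], points[j]) <= eps}
            unseen -= near
            cluster |= near
            stack.extend(near)
        clusters.append(frozenset(cluster))
    return clusters


def mapper(points, filter_fn, n_intervals=4, overlap=0.3, eps=0.5):
    """Return the mapper graph as (nodes, edges); nodes are sets of point indices."""
    values = [filter_fn(p) for p in points]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # Overlapping interval of the cover of the filter range.
        a = lo + k * width - overlap * width
        b = lo + (k + 1) * width + overlap * width
        members = [i for i, v in enumerate(values) if a <= v <= b]
        nodes.extend(single_linkage(members, points, eps))
    # Two clusters are connected when they share at least one point.
    edges = [
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    ]
    return nodes, edges


# Hypothetical toy data: 24 points on a circle, filtered by x-coordinate.
circle = [
    (math.cos(2 * math.pi * k / 24), math.sin(2 * math.pi * k / 24))
    for k in range(24)
]
nodes, edges = mapper(circle, filter_fn=lambda p: p[0], eps=0.3)
```

On this toy input the resulting graph is itself a cycle, which illustrates how the mapper output summarizes the shape of the data. In a process-data setting the points would be measurement vectors and the filter would be chosen by the analyst.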

The second application within the process-level quality control cluster is chatter detection. Chatter detection in machining has received a lot of attention in the last few years, as the number of studies on this application shows. It is important because chatter can damage workpieces and machine tools. Detecting chatter with TDA methods has attracted particular attention, with Firas A. Khasawneh contributing five works. The first was a proof of concept showing that chatter can be detected using PH. The subsequent work [26] proposed a chatter detection method based on PH and supervised learning, which the authors state detects chatter with high accuracy. In [60], a supervised chatter detection method based on topological feature vectors obtained using PH is proposed; the same work is described in more detail in [59]. A subsequent work proposes a transfer learning method [58], showing and evaluating how transfer learning can improve chatter detection performance when training with different datasets. In this work, dynamic time warping is also used to align the time series.

All process tasks rely on sequential steps in time, so it is natural that all the works in this cluster apply their methods to time series data.
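Dynamic time warping, mentioned above as a way to align time series, can be sketched with the classic dynamic programming recurrence. This is a generic textbook version and not the specific variant used in [58]; the function name `dtw_distance`, the absolute-difference cost, and the example signals are assumptions for illustration.

```python
import math


def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # cost[i][j]: best cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical signals: the same shape at a different speed, and an offset copy.
reference = [0.0, 1.0, 2.0, 1.0, 0.0]
stretched = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
shifted_up = [v + 1.0 for v in reference]

d_same = dtw_distance(reference, stretched)
d_diff = dtw_distance(reference, shifted_up)
```

Here `d_same` is zero because the stretched signal is only a time-warped copy of the reference, whereas `d_diff` is positive because an amplitude offset cannot be removed by warping the time axis.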

Manufacturing Engineering

This section describes the application of TDA in the field of manufacturing engineering. Manufacturing engineering is the engineering discipline that designs, analyzes, and improves manufacturing processes and systems. Tasks in this field do not focus on the products themselves, but on the processes and systems used to manufacture them. They include optimizing material flow, production processes, and production systems, selecting components, and designing production lines.

A common task for manufacturers is the temporal planning of production. Product demand can fluctuate with the season, region, weather, and events such as promotions and holidays. Failure to meet demand can result in lost customers, while overproduction can lead to financial losses due to storage and disposal of surplus products. It may be beneficial to share forecast models by grouping similar products, and Rodrigo Rivera-Castro et al. propose several methods for forecasting demand. To generate a forecast for a new product, a forecast model needs to be selected. For this selection, a k-nearest neighbor algorithm based on the Mapper graph is proposed. By exploiting the topological properties of historical time series data, the authors claim that the selection of the forecasting model is more accurate and faster than with other methods.

Recent studies have addressed problems that depend on the expertise and experience of machine operators. Machine alterations require reparameterization, but these parameter changes are based not on numerical evidence but on the manual work and experience of the machine operator. The disadvantages are that the reparameterization can only be reproduced to a certain extent and that operators need to be trained for long periods to gain the necessary experience with a particular machine type. The authors propose the joint use of existing ML tools: interpolation techniques on reduced manifolds are used to generate new geometric designs, with missing information inferred using clustering techniques. Their research relies heavily on PH and persistence images.

Material flow optimization is the task of optimizing the schedule and flow of materials through the production system. This includes the transportation of raw materials from the depot to the production site, of semi-finished products between production lines, and of finished products from production to the depot. Given the complexity of the production system, material flow optimization is challenging because all involved components have different capacities, exchange times, and other constraints, and it must be addressed from both a business and a technical perspective. It is a multi-objective optimization problem that aims to minimize the cost and time of material flow. As a benchmark task, one work proposes optimizing the material flow from the depot to the production line as a multi-vehicle routing problem, using data represented as a point cloud in a multi-dimensional space. The evaluation is performed using PH.

Also covered are optimization tasks for independently moving objects (such as grippers and robots) within a manufacturing environment. These objects need to be protected from collisions with other objects, the surrounding environment, and, most importantly, human operators. Protection is usually achieved through physical cages in which the objects operate, but these cages are complex to construct, inflexible, and expensive. A more cost-effective solution is to use virtual cages that confine objects by virtual boundaries. The task of this work is to synthesize a planar energy-limiting cage with the optimal configuration for a given object. The optimal configuration is found by identifying the gripper and force direction configurations and applying PH. For this purpose, objects and grippers are modeled as point clouds.

Ensuring that products leaving the production line are of consistent quality is a major task in manufacturing engineering. Shipping defective products can lead to loss of reputation and, in the worst case, loss of human life. System-level tests are performed on each product to ensure that quality requirements are met. To generate classification rules for defective products, Ho-Chieh Hsu et al. propose a method of dimensionality reduction using UMAP.

Operational technology (OT) systems are typically cyber-physical production systems, where the physical parts are controlled by computer systems via sensors and actuators. Building and extending such systems is difficult because they tend to be very complex and heterogeneous. Markus Unterdechler et al. therefore suggest a method for finding repeating patterns in these systems, so that components already established in the system can be reused. This increases reliability, reduces costs, and reduces maintenance effort. In their method, UMAP is used for dimensionality reduction.
A case study in the herbal drug manufacturing industry is also presented. In this study, the authors analyze the degradation of the evaporation process, which is a major factor in manufacturing costs. For the analysis, UMAP is used to reduce the dimensionality of the time series data.

For the task of preventive maintenance, Xiaoyu Zhang et al. propose a method for analyzing machine maintenance data. Such data sets are often heterogeneous, multidimensional logs that need to be analyzed to find patterns indicating failure. The authors introduce a visual analysis approach for diagnosing such machine maintenance data (text log data), in which UMAP is used for dimensionality reduction as part of the processing pipeline. Another approach to the analysis of machine maintenance data comes from the precision blanking industry, where machine deterioration is mitigated by using acoustic emissions to observe machine wear. This approach is based on UMAP and hierarchical clustering of time series data. The authors claim that visualizing the data in two dimensions allows machine wear to be identified while representing temporal dependencies in the data.

In Industry 4.0, the security of OT systems is a key concern. OT has very different requirements than information technology (IT) systems, which is reflected in the challenges of the security area. Only one work covering the topic of OT security was found during the survey; Joaquín Ordieres-Meré et al. The data is provided as time series data, and they propose to use UMAP in the

Discussion and future research directions

In recent years, interest in the application of topological data analysis (TDA) to industrial production and manufacturing processes has increased. Since the first application in 2016, the number of publications has grown, peaking in 2022, and interest in TDA in this field is expected to continue.

The number of publications per application varies. One of the popular applications in this survey is very natural from a topological point of view: the analysis of product geometry, surfaces, and features. There are, of course, many other publications on similar topics beyond those applied to production and manufacturing settings. Another popular application is the analysis of process data in the field of manufacturing engineering.

The most widely used TDA method is PH, which 14 works use in some way; UMAP is used in nine works. Several works highlight the advantageous properties of UMAP with respect to topology preservation, but rarely discuss it or compare it empirically with other dimensionality reduction methods. The least used method is the mapper algorithm, used by only four works. However, it is very successful in other application areas, such as medicine. We therefore believe that the potential of the mapper algorithm in Industry 4.0 has not yet been fully exploited.

In terms of data types, TDA is most commonly applied to time series data in this context. In particular, the detection of chatter in machining processes is a very popular task. This application has received a lot of attention in the past few years, and TDA is one of the most competitive approaches among many. Nevertheless, it is a bit surprising that more classical TDA data such as point clouds are not as popular as time series data in this context.

Looking at how TDA methods are applied to data types, time series data are handled by all observed methods. The same is true for scalar fields, but here only one work uses the mapper algorithm, while the other methods are used more frequently. An interesting observation is that the works using the mapper algorithm apply it only to time series data (3 works) and wafer maps (1 work). This is a bit surprising, because the mapper algorithm is primarily applied to clustering problems, which are a natural task for point clouds. In the context of this survey, we did not find a single work that uses the mapper algorithm with point clouds. Cluster (B), where all the works relate only to time series data, is not surprising: a process is essentially a series of actions and is naturally described as a time series of events. However, many other data in Industry 4.0 are modeled as point clouds and can be analyzed with the mapper algorithm. Practitioners working with this type of data are therefore encouraged to consider the mapper algorithm for their analysis.

In the future, we hope to see TDA applied more in the context of industrial production, because we believe there are many settings where this could be beneficial. From the perspective of this work, the number of publications is still small but growing. The discussion so far has already shown some areas where we see potential for future research, as well as some underutilized combinations of input types and methods. Other areas of potential future research remain. First, there is great potential in applying TDA to behavioral data. Run-time measurement data from machines can be used not only for chatter detection but also for other tasks such as detecting anomalies in production processes, predictive maintenance, and detecting security incidents.

One of the main topics of OT is cyber security and the protection of cyber-physical systems, or OT security. Designing intrusion detection and prevention systems is one of many measures to protect the OT environment. There are many different approaches to such systems, one of which is data-driven anomaly detection. This can again be phrased as a time series classification task. Applying TDA methods to this task is a natural fit, but only one work using TDA was found, and it uses UMAP for dimensionality reduction. More is expected here in the future, including the application of PH to anomaly detection.

Finally, we hope that a regular survey on the same topic as this work will be conducted in the next few years. This would make it possible to observe how TDA applications in this domain develop. Given the nature of the methods used, this work is highly replicable and open for other researchers to extend.

Conclusion

This study presented an overview of the current literature on TDA in industrial production and manufacturing processes and showed the interrelationships between TDA methods and application areas in this domain. It contributes to the literature by providing a comprehensive overview of the state of the art, showing the areas of application and the methods used, and highlighting underutilized combinations of application areas and methods. We ensured the reproducibility of this study by employing a transparent and rigorous method for searching and identifying literature. This method revealed 27 relevant references. These were manually classified and assigned to one of three categories: product-level quality control, process-level quality control, or manufacturing engineering. For each work, a brief description of the data format used and the TDA method employed in the particular use case is given. We show that TDA is particularly suitable for analyzing complex data sets from sensors and other devices in industrial production and manufacturing processes. Furthermore, we show that the application of TDA in this area is still in its infancy and that there is great potential for future research.


友安昌幸 (Masayuki Tomoyasu): JDLA G certificate 2020#2, E certificate2021#1 Japan Society of Data Scientists, DS Certificate Japan Society for Innovation Fusion, DX Certification Expert Amiko Consulting LLC, CEO