# Multiscale Feature Extraction Without Domain Knowledge for Machine Lifetime Prediction

3 main points
✔️ Improved remaining useful life (RUL) prediction model, demonstrated on rotating bearings
✔️ Proposes a classification/prediction model that requires no domain knowledge or manual configuration by combining a U-Net structure, which models multiple levels of detail, with a GAN
✔️ Further research is needed to realize the unsupervised model that is required in the field

written by Sungho Suh, Paul Lukowicz, Yong Oh Lee
(Submitted on 26 Sep 2021)

Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.

## first of all

Judging from the AI applications for manufacturing shown at exhibitions, systems that detect abnormalities caused by the deterioration of mechanical parts seem to be the most common after those based on images and documents (natural language). In particular, many applications tackle vibration analysis. This paper addresses the prediction of wear and deterioration in rotating bearings, and I think a similar method could be applied to pumps and valves.

(ROHM presented a different approach to the subject of this paper at AEC/APC Asia 2021; interested readers may wish to refer to it.)

PHM (Prognostics and Health Management) technology collects condition information from industrial systems such as manufacturing machines, facilities, and power plants, and predicts failure locations through analysis and predictive verification, thereby detecting system failures and allowing maintenance to be scheduled in advance. Predicting the remaining useful life (RUL) of rolling bearings, one of the PHM technologies, can prevent unexpected failures and improve reliability. Model-based methods estimate the RUL with analytical models based on physical laws or mathematical functions; examples include physical law-based methods, empirical methods, Kalman filters, and particle filters. However, building accurate models of increasingly complex industrial systems with these methods requires expert knowledge. With the significant progress of machine learning in recent years, data-driven methods that do not require expert knowledge have attracted attention. These methods use machine learning techniques to capture the direct relationship between the collected machine data and the degradation state.

In recent years, deep learning-based RUL prediction methods have been proposed and have outperformed traditional data-driven methods. Despite this success, they have paid little attention to three challenging problems.

(1) These methods require domain knowledge to extract features, or require the feature types to be specified manually.

(2) These data-driven methods assume that the training and test data are collected by the same sensor under the same working conditions, or come from the same distribution. In industry, however, this assumption is unrealistic: the working conditions of machines often vary from task to task, and the training and test data may be collected from different entities.

(3) Accurate RUL prediction requires properly determining the health stage (HS) of the machine, because while a machine is healthy its run-to-failure training data show no noticeable difference over time. However, conventional methods predict the RUL without determining the First Predicting Time (FPT), i.e., the start time of the unhealthy stage.

In this paper, we propose a generalized multiscale feature extraction method for RUL prediction that uses a Generative Adversarial Network (GAN) to learn the distributions of training data from multiple bearings and to extract domain-invariant, generalized prognostic features.

The proposed feature extraction method for FPT and RUL prediction consists of two steps.

In the first step, a multiscale adversarial neural network is trained to reconstruct the input vibration signal into generalized prognostic features. Here, 1D U-Net architectures with three different depths are trained to minimize the loss function of the proposed GAN-based feature extraction method.

In the second step, the generalized features are transformed into nested scatterplot (NSP) images for FPT determination and RUL prediction. NSP is an imaging method for multivariate correlation analysis. Although NSP is a heuristic method, NSP images transformed from raw vibration signals reduce the feature engineering effort that would otherwise rely on domain knowledge. NSP images can also be combined with convolutional neural networks (CNNs) for feature extraction in rotating machinery fault diagnosis. A CNN-based binary regression model determines the FPT, and a CNN-LSTM (long short-term memory) model predicts the RUL. The main contributions of this research can be summarized as follows.

- A novel multiscale feature extraction method designed for HS segmentation and RUL prediction: we formulate 1D feature extraction as a principal signal separation task and introduce U-Net to reconstruct prognostic features for RUL prediction. We also introduce a new domain generalization method based on the GAN scheme that learns invariant representations and predicts RUL.

- Converting the multiscale prognostic features into NSP images and combining the generalized multiscale features reduces computational cost without requiring domain knowledge or manual configuration.

- We also propose a method for determining the HS without a threshold using a CNN-based binary regression model, and a method for predicting RUL using a CNN-LSTM model. These methods predict RUL with lower error and higher prognostic accuracy than existing methods.

- To validate the proposed method, we conducted experiments on two rotating machinery datasets: the Franche-Comté Electronics Mechanics Thermal Science and Optics-Sciences and Technologies Institute (FEMTO) dataset and the Xi'an Jiaotong University and Changxing Sumyoung Technology Company (XJTU-SY) dataset. Experiments on multiple datasets verify the effectiveness of the proposed method across different bearing wear patterns.

## technique

### Generalized Multiscale Feature Extraction

For HS segmentation and RUL prediction, we use a generalized multiscale feature extraction method based on the GAN scheme and the U-Net architecture. Figure 1 shows the simplified overall framework of the training procedure for the proposed feature extraction model.

For the GAN structure, we use the Classification Enhanced GAN (CEGAN) from the authors' previous work. CEGAN consists of three independent networks: a discriminator, a generator, and a classifier. Unlike conventional GAN methods, the CEGAN classifier is trained on both real and generated data, which prevents the generated minority-class data from overfitting to the majority data when the data are imbalanced. In other words, a classifier trained only on real data is biased toward the majority class, whereas also training it on generated minority-class data counteracts this bias and improves the classifier's performance. Conventional GAN methods use an auxiliary classifier that shares its network structure and weight parameters with the discriminator, so improving the auxiliary classifier does not lead to the generation of high-quality images. Following the CEGAN structure, the proposed GAN scheme defines three independent networks: 1) a multiscale feature extraction network as the generator, 2) a discriminator that separates the features generated by the generator from the real input data and distinguishes data from different domains, and 3) a classifier network consisting of a CNN-based HS segmenter and an LSTM-based RUL predictor.

Generator

The generator consists of three multiscale generators, as shown in Figure 1. It adopts the basic concept of U-Net to reconstruct one-dimensional (1D) time series features, replacing the original 2D convolutions with 1D operations. U-Net was designed for image segmentation: it concatenates feature maps of different levels to combine low-level detail information with high-level semantic information, and this combination has achieved promising performance on various image segmentation problems. The CNN-based U-Net architecture consists of an encoder, a decoder, and skip connections between the encoder and the decoder. The encoder consists of multiple levels of convolution, which extract an abstract representation of the input, and downsampling blocks, which reduce its complexity; each convolution stage is followed by max-pooling downsampling, so the input is encoded into feature representations at several levels. The decoder likewise consists of multiple levels of convolution, upsampling blocks, and concatenation blocks, and semantically projects the discriminative features learned by the encoder onto a higher-resolution image space for dense classification. Unlike the usual use of U-Net for image segmentation, we found that the U-Net structure allows deep and shallow features to be considered simultaneously, which makes feature extraction more effective, so the features can be reconstructed to meet the design requirements. Whereas a typical GAN feeds Gaussian noise into the generator, here noise is provided only in the form of dropout applied to multiple layers of the generator. The architecture of the generator is shown in Figure 2.
The generator consists of an encoder and a decoder which process the input data.

Even though U-Net can consider deep and shallow features at the same time, a U-Net with deep-level convolutions usually focuses on more local features, and detailed information about the input may be lost at high feature levels. Integrating different feature levels can therefore improve feature extraction, so the three multiscale generators have different numbers of encoder and decoder blocks.
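To make the generator description concrete, here is a minimal, untrained NumPy sketch of three 1D U-Net generators that differ only in depth. It shows the encoder (convolution, then max-pooling), the decoder (upsampling, concatenation with the skip connection, convolution), and dropout as the only noise source. The depths (1, 2, 3), channel widths, kernel size, dropout rate, and the two-channel (horizontal/vertical) input and output are our assumptions for illustration, not the paper's configuration.

```python
import numpy as np


def conv1d(x, w, b):
    """'Same'-padded 1D convolution. x: (C_in, L), w: (C_out, C_in, K) with odd K, b: (C_out,)."""
    k = w.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)  # (C_in, L, K)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]


def relu(x):
    return np.maximum(x, 0.0)


def maxpool2(x):
    """Max-pooling with window 2 and stride 2 along time."""
    c, length = x.shape
    return x.reshape(c, length // 2, 2).max(axis=2)


def upsample2(x):
    """Nearest-neighbour upsampling by 2 along time."""
    return np.repeat(x, 2, axis=1)


class UNet1D:
    """Untrained 1D U-Net generator; `depth` sets the number of encoder/decoder blocks."""

    def __init__(self, depth, in_channels=2, base_channels=8, kernel=3, dropout=0.2, seed=0):
        self.rng = np.random.default_rng(seed)
        self.depth, self.dropout, self.kernel = depth, dropout, kernel
        c = in_channels
        self.enc = []
        for d in range(depth):
            self.enc.append(self._init(base_channels * 2**d, c))
            c = base_channels * 2**d
        self.bottleneck = self._init(2 * c, c)
        c = 2 * c
        self.dec = []
        for d in reversed(range(depth)):
            skip_c = base_channels * 2**d
            self.dec.append(self._init(skip_c, c + skip_c))  # input = upsampled + skip channels
            c = skip_c
        self.head = self._init(in_channels, c)  # reconstruct as many channels as the input has

    def _init(self, c_out, c_in):
        std = np.sqrt(2.0 / (c_in * self.kernel))  # He initialisation
        return self.rng.normal(0.0, std, (c_out, c_in, self.kernel)), np.zeros(c_out)

    def _drop(self, h, train):
        # Inverted dropout: the only noise injected into the generator.
        if not train or self.dropout == 0:
            return h
        mask = self.rng.random(h.shape) >= self.dropout
        return h * mask / (1.0 - self.dropout)

    def forward(self, x, train=True):
        if x.shape[1] % 2**self.depth:
            raise ValueError("signal length must be divisible by 2**depth")
        skips, h = [], x
        for w, b in self.enc:
            h = self._drop(relu(conv1d(h, w, b)), train)
            skips.append(h)
            h = maxpool2(h)
        h = relu(conv1d(h, *self.bottleneck))
        for (w, b), skip in zip(self.dec, reversed(skips)):
            h = np.concatenate([upsample2(h), skip], axis=0)  # skip connection
            h = self._drop(relu(conv1d(h, w, b)), train)
        return conv1d(h, *self.head)


# Three generators of different depth (the depths are an assumption).
generators = [UNet1D(depth=d, seed=d) for d in (1, 2, 3)]
x = np.random.default_rng(42).normal(size=(2, 256))  # one horizontal + vertical snapshot
features = [g.forward(x) for g in generators]
```

Each generator returns a reconstruction with the same shape as its input, so the three outputs can later be compared or combined channel by channel; the shallow network keeps more fine detail, while the deeper ones see a wider context.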

Discriminator

The proposed GAN is developed for domain adaptation from multiple source domains. In general, a GAN trains a discriminator to distinguish generated data from real data; here, however, the discriminator also acts as a domain discriminator that identifies the domain of the data, because the proposed generalized multiscale feature extraction framework aims to extract wear features of rotating bearings from multiple domains. In this paper, a data domain refers to the type of bearing and the conditions under which its data were collected. Adversarial learning drives the generator to extract high-level features that carry domain-unbiased information, making it difficult for the discriminator to classify not only the source domain but also whether the data are real or fake. Accordingly, the convolutional layers of the discriminator, with Leaky ReLU activations, extract features from the input (the real data and the features reconstructed by the generator), and two separate linear layers output the real/fake score and the domain classification, respectively (see Figure 3).
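The two-headed discriminator described above can be sketched as a shared convolutional trunk with Leaky ReLU followed by two independent linear layers. This is an untrained NumPy illustration under our own assumptions: one convolution layer, global average pooling over time, and three domains are hypothetical choices, not the authors' layer configuration.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


class TwoHeadDiscriminator:
    """Untrained sketch: shared conv trunk, real/fake head D_R, domain head D_D."""

    def __init__(self, in_channels=2, hidden=16, kernel=5, n_domains=3, seed=0):
        rng = np.random.default_rng(seed)
        self.kernel = kernel
        self.w_conv = rng.normal(0.0, 0.1, (hidden, in_channels, kernel))
        self.b_conv = np.zeros(hidden)
        self.w_real = rng.normal(0.0, 0.1, hidden)                 # linear layer 1: real/fake score
        self.w_domain = rng.normal(0.0, 0.1, (n_domains, hidden))  # linear layer 2: domain logits

    def features(self, x):
        """x: (C, L) real signal or reconstructed feature -> (hidden,) pooled features."""
        windows = np.lib.stride_tricks.sliding_window_view(x, self.kernel, axis=1)  # (C, L-K+1, K)
        h = np.einsum("clk,ock->ol", windows, self.w_conv) + self.b_conv[:, None]
        return leaky_relu(h).mean(axis=1)  # global average pooling over time

    def __call__(self, x):
        f = self.features(x)  # both heads read the same shared features
        return float(self.w_real @ f), self.w_domain @ f


disc = TwoHeadDiscriminator()
score, domain_logits = disc(np.random.default_rng(0).normal(size=(2, 256)))
```

Sharing the trunk means the gradient from both tasks shapes the same features, which is what lets the generator be pushed toward outputs that are hard to classify both as fake and by domain.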

Classifier

As mentioned earlier, accurate RUL prediction requires determining the HS of the machine rather than training a direct regression model for RUL, because the run-to-failure bearing data show no noticeable difference while the machine is healthy. To improve the performance of HS segmentation and RUL prediction, the proposed GAN structure makes the classifier an independent network. In Figure 1, the first stage consists of a generalized multiscale feature extraction model $G_{HS}$ for HS segmentation and a CNN-based HS segmentation model $C_{HS}$ that classifies the NSP images transformed from the extracted features. In the second stage, a generalized multiscale feature extraction model $G_{RUL}$ is trained for RUL prediction, and a CNN-LSTM model $C_{RUL}$ predicts the RUL from the NSP images transformed from the extracted features.

Loss function and learning procedure

In this study, we assume that the vibration signals are collected by two high-frequency vibration sensors mounted horizontally and vertically on each bearing, and that run-to-failure vibration data covering the whole life cycle of $N_{train}$ different bearings are available to train the proposed DNN. Let $X_j = \{x_j^i\}_{i=1}^{n_j}$, with $x_j^i \in \mathbb{R}^{N_{samples}}$ and $j = 1, 2, \dots, N_{train}$, denote the $n_j$ consecutive training samples from the $j$-th bearing. For each bearing, every sample consists of two signals, one from the horizontal and one from the vertical vibration sensor.

The objective is to extract generalized prognostic features that improve HS segmentation and RUL prediction. HS segmentation requires labeled target data for supervised learning in order to distinguish the HS features of the machine under different bearing degradation conditions. The training dataset is divided into two health stages based on acquisition time: the initial part of each vibration recording, which corresponds to normal operation, is labeled healthy, and the last part, in which the bearing is damaged, is labeled unhealthy. This relies on the assumption that degradation data for rotating machinery can be obtained up to failure. The approach not only reduces the labeling effort but also improves HS segmentation when ground-truth HS labels are unavailable (most open datasets provide no HS information). As a precondition, if no health stage labels are given, we label only a small fraction of the run-to-failure data where the signal can be clearly identified as healthy or unhealthy.

Next, we express the RUL label as follows

where $y^{i}_{RUL,j}$ denotes the RUL label of $x^i_j$. The generator network is decomposed into two sub-generators, $G_{HS}$ and $G_{RUL}$, which are paired with discriminators $D_{HS}$ and $D_{RUL}$ of the same structure. For a stable training procedure and high-quality generated data, WGAN-GP is applied to the objective function that guides training. The objective function is defined as follows.

where $x$ is a sequential training sample; $y$ is the HS label for HS segmentation or the RUL label for RUL prediction; $d$ is the corresponding domain label; $\lambda_D$ is a hyperparameter that controls the effect of domain generalization; $\lambda_G$ controls the relative importance of the different loss terms; $L_{CE}$ is the standard cross-entropy loss; and $\theta_D$, $\theta_G$, and $\theta_C$ are the parameters of the discriminator, generator, and classifier, respectively. Both real and generated data are fed to the discriminator, whose outputs $D_R$ and $D_D$ give the real/fake state and the domain classification of the data, respectively. The discriminator $D$ is trained to minimize $L_D$ so that it simultaneously identifies data from different domains and distinguishes real from generated data, whereas the generator $G$ is trained to minimize $L_G$. The classifier $C$, consisting of a CNN-based HS segmenter $C_{HS}$ and an LSTM-based RUL predictor $C_{RUL}$, is trained to classify the HS and predict the RUL. In other words, the generator reconstructs generalized features that mislead the discriminator while improving HS segmentation and RUL prediction. Because the reconstructed features stay close to the real data, the feature values remain within a certain range and the time series characteristics are preserved.
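The exact objective is given in the original paper. As a rough illustration of its discriminator side only, the sketch below computes the standard WGAN-GP critic loss (Wasserstein estimate plus a gradient penalty on random interpolates between real and generated samples, with the penalty weight of 10 used in the WGAN-GP paper) and adds a $\lambda_D$-weighted domain cross-entropy. The toy tanh critic, its analytic input gradient, applying the domain term to real samples only, and this particular way of summing the terms are all our assumptions.

```python
import numpy as np


def critic(x, W, c, v):
    """Toy one-hidden-layer critic D_R(x) = v . tanh(W x + c)."""
    return float(v @ np.tanh(W @ x + c))


def critic_grad(x, W, c, v):
    """Analytic gradient of the toy critic with respect to its input x."""
    t = np.tanh(W @ x + c)
    return W.T @ (v * (1.0 - t**2))


def cross_entropy(logits, label):
    z = logits - logits.max()  # numerically stable log-softmax
    return float(np.log(np.exp(z).sum()) - z[label])


def discriminator_loss(real, fake, domains, W, c, v, U, lambda_gp=10.0, lambda_d=1.0, seed=0):
    """WGAN-GP critic loss plus a weighted domain-classification term (assumed combination).

    real, fake: (n, dim) arrays; domains: (n,) int domain labels of the real samples;
    U: (n_domains, hidden) weights of the domain head on the shared hidden layer.
    """
    rng = np.random.default_rng(seed)
    wasserstein = np.mean([critic(f, W, c, v) for f in fake]) - np.mean([critic(r, W, c, v) for r in real])
    penalty = 0.0
    for r, f in zip(real, fake):
        eps = rng.random()  # random point on the segment between a real and a fake sample
        x_hat = eps * r + (1.0 - eps) * f
        penalty += (np.linalg.norm(critic_grad(x_hat, W, c, v)) - 1.0) ** 2
    penalty /= len(real)
    domain = np.mean([cross_entropy(U @ np.tanh(W @ r + c), d) for r, d in zip(real, domains)])
    return float(wasserstein + lambda_gp * penalty + lambda_d * domain)
```

In a real implementation the input gradient would come from automatic differentiation rather than a hand-derived formula; the analytic version here only keeps the example dependency-free.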

Finally, we define the classifier losses for HS segmentation and RUL prediction as follows.

where $L_{BCE}$ is the standard binary cross-entropy loss, and $L_{MAE}$, $L_{RMSE}$, and $L_{MAPE}$ are the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), respectively.
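For reference, the four quantities named above have standard definitions, sketched here in NumPy. How the paper weights and combines them into the classifier loss is not reproduced; skipping zero targets in MAPE (for example the failure point of a RUL curve, where the percentage error is undefined) is our own choice.

```python
import numpy as np


def bce(p, y, eps=1e-7):
    """Binary cross-entropy for HS labels y in {0, 1} and predicted probabilities p."""
    p = np.clip(np.asarray(p, dtype=float), eps, 1.0 - eps)
    y = np.asarray(y, dtype=float)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def mae(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))


def mape(y_true, y_pred):
    """MAPE in percent; points with y_true == 0 are skipped (assumption)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    return float(100.0 * np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])))


# Normalised RUL targets and predictions for four snapshots (illustrative values).
y_true = [1.0, 0.5, 0.25, 0.0]
y_pred = [0.9, 0.5, 0.5, 0.1]
```

RMSE penalises the single large miss (0.25) more heavily than MAE does, while MAPE is dominated by errors near the end of life, where the true RUL is small.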

The three reconstructed features are transformed into the three channels of an NSP image in two steps. In the first step, the two reconstructed channels of each feature are compressed into nested clusters: the values of the two channels are mapped to the x- and y-axes, and the features reconstructed by the three multiscale generators are colored red, green, and blue, respectively. To represent the intensity of each cluster, the count of mapped values is converted to pixel intensity. In the second step, the three scatter plots are aggregated into a single RGB image, as shown in Figure 1. The training procedure of the proposed multiscale feature extraction method is summarized in Algorithm 1 (see the original paper).
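The two-step NSP construction can be sketched with a 2D histogram per generator output: the counts of (x, y) points per pixel become the intensity of one color channel, and the three channels are stacked into an RGB image. The image size, the fixed value range, and the linear max-normalisation of counts are hypothetical choices; the paper may scale intensities differently.

```python
import numpy as np


def nsp_image(features, bins=64, value_range=(-3.0, 3.0)):
    """Build an RGB nested-scatterplot image from three reconstructed features.

    features: sequence of three arrays of shape (2, L); channel 0 -> x-axis, channel 1 -> y-axis.
    Returns a (bins, bins, 3) uint8 image; R, G, B come from generators 1, 2, 3.
    """
    planes = []
    for f in features:
        # Count how many (x, y) points fall in each pixel; points outside value_range are dropped.
        counts, _, _ = np.histogram2d(f[0], f[1], bins=bins, range=[value_range, value_range])
        counts = counts.T[::-1]  # rows = y (top row = high values), columns = x
        peak = counts.max()
        planes.append(counts / peak if peak > 0 else counts)  # counts -> intensity in [0, 1]
    return np.round(np.stack(planes, axis=-1) * 255).astype(np.uint8)


rng = np.random.default_rng(0)
feats = [rng.normal(scale=s, size=(2, 2560)) for s in (0.5, 1.0, 1.5)]  # stand-in features
img = nsp_image(feats)
```

Because each channel is normalised separately, a tight cluster (small spread) and a diffuse one both reach full brightness at their densest pixel, so the image encodes the shape of each distribution rather than its absolute count.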

### Health stage partitioning and remaining useful life prediction by NSP

By reconstructing the continuous raw vibration time series into multiscale features and transforming those features into NSP images, the signal processing problem becomes an image classification problem, handled by a CNN (CNN-HS) for HS segmentation and a CNN-LSTM (CNN-LSTM-RUL) for RUL prediction. The structures of CNN-HS and CNN-LSTM-RUL are shown in Figure 4. Once trained, CNN-HS and CNN-LSTM-RUL can classify NSP images from test data that were not used for training, provided the test bearings operate under the same conditions as the training bearings. For HS segmentation, CNN-HS computes binary regression results over the entire run-to-failure recording, so no threshold needs to be specified. Because the trained CNN-HS has learned the difference between the NSP features of the healthy and unhealthy training data, it can recognize the degradation patterns in all the data. After the FPT is determined with CNN-HS, multiscale features are extracted and transformed into NSP images, and CNN-LSTM-RUL recognizes the degradation pattern from the FPT to the end time: the CNN part extracts features from each NSP image, and the LSTM and RUL prediction part computes the RUL as a percentage.
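One way to read the FPT off the HS classifier's outputs is sketched below: take the predicted class of each snapshot in time order and return the first snapshot that starts a run of unhealthy predictions. We assume a two-class (healthy, unhealthy) output and add a hypothetical `consecutive` parameter to suppress isolated misclassifications; neither detail is confirmed by the summary.

```python
import numpy as np


def first_predicting_time(stage_scores, consecutive=3):
    """stage_scores: (n_snapshots, 2) healthy/unhealthy outputs of the HS classifier.

    Returns the index of the first snapshot that starts `consecutive` successive
    unhealthy predictions (argmax == 1), or None if no such run occurs.
    """
    unhealthy = np.argmax(np.asarray(stage_scores), axis=1) == 1
    run = 0
    for i, flag in enumerate(unhealthy):
        run = run + 1 if flag else 0  # length of the current unhealthy streak
        if run == consecutive:
            return i - consecutive + 1
    return None


scores = [[0.9, 0.1]] * 5 + [[0.2, 0.8]] + [[0.7, 0.3]] + [[0.1, 0.9]] * 4
fpt = first_predicting_time(scores)  # the lone unhealthy prediction at index 5 is ignored
```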

## experimental results

To evaluate the proposed method (denoted GMFE) on run-to-failure vibration data, we used two popular datasets, FEMTO and XJTU-SY. The FEMTO dataset was collected on the PRONOSTIA test rig and has been publicly available since the IEEE PHM 2012 Prognostic Challenge (PHM 2012). The test rig contains an asynchronous motor, a shaft, a speed controller, an assembly of two pulleys, and the tested rolling bearing, as shown in Figure 5(a). The dataset consists of 17 run-to-failure recordings, each from one tested bearing and each containing two channels of vibration data (horizontal and vertical). Because each recording has a different time to failure, a failure detection method that adapts to time-varying operating conditions and environments is required. When the amplitude of the vibration data exceeds 20 g, the run-to-failure experiment is stopped and the bearing is considered failed.
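The 20 g stopping criterion translates into a simple end-of-life check. The array layout (snapshots × horizontal/vertical channel × points, values in g) is our assumption about how the recordings are loaded, not the dataset's file format.

```python
import numpy as np

FAILURE_AMPLITUDE_G = 20.0  # stopping criterion stated for the FEMTO run-to-failure tests


def end_of_life_index(snapshots, threshold=FAILURE_AMPLITUDE_G):
    """snapshots: (n_snapshots, 2, n_points) horizontal/vertical acceleration in g.

    Returns the index of the first snapshot whose peak |amplitude| exceeds threshold, or None.
    """
    peaks = np.abs(np.asarray(snapshots, dtype=float)).max(axis=(1, 2))
    over = np.flatnonzero(peaks > threshold)
    return int(over[0]) if over.size else None


snaps = np.zeros((5, 2, 10))
snaps[3, 1, 4] = 25.0  # vertical channel spikes past 20 g in snapshot 3
eol = end_of_life_index(snaps)
```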

Three metrics, MAE, RMSE, and MAPE, are used for comparative evaluation.