# Self-encoding Shapelets Applied To Time Series Clustering

*3 main points*

✔️ Propose AUTOSHAPE, an autoencoder-based shapelet approach that searches for discriminative shapelets for time series clustering in an unsupervised manner while jointly learning time series subsequence representations

✔️ Propose four objectives: a self-supervised loss for latent representations, a diversity loss for both universality and heterogeneity, a reconstruction loss to preserve shape, and a DBI objective that improves clustering performance; together they jointly learn the final shapelets for clustering

✔️ Verify that AUTOSHAPE is highly competitive in accuracy with state-of-the-art methods, and evaluate its interpretability

AUTOSHAPE: An Autoencoder-Shapelet Approach for Time Series Clustering

written by Guozhong Li, Byron Choi, Jianliang Xu, Sourav S Bhowmick, Daphne Ngar-yin Mah, Grace Lai-Hung Wong

(Submitted on 6 Aug 2022 (v1), last revised 18 Aug 2022 (this version, v2))

Comments: Published on arxiv.

Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)

code：

The images used in this article are from the paper, the introductory slides, or were created based on them.

## Outline

Shapelets of time series have recently been found to be effective discriminative subsequences for time series clustering (TSC), and they are also useful for interpreting clusters. The main challenge in TSC is to discover high-quality variable-length shapelets that discriminate between different clusters. In this paper, we propose a novel autoencoder-shapelet approach (AUTOSHAPE), the first work to exploit the advantages of both autoencoders and shapelets to determine shapelets in an unsupervised manner. The autoencoder is specifically designed for learning high-quality shapelets and has the following three features:

- To guide latent representation learning, we employ a state-of-the-art self-supervised loss to learn unified embeddings for variable-length candidate shapelets (subsequences of a time series) of different variables, and propose a diversity loss to select discriminative embeddings in the unified space.

- For clustering, we introduce a reconstruction loss to recover the shapelets in the original time series space.

- The Davies-Bouldin index (DBI) is used to inform AUTOSHAPE about the clustering performance during training.

We evaluate AUTOSHAPE extensively. For univariate time series (UTS), we compare its clustering performance with 15 representative methods on UCR archive datasets. For multivariate time series (MTS), we evaluate AUTOSHAPE against five competitive methods on 30 UEA archive datasets. The results validate that AUTOSHAPE outperforms all the compared methods. We also use the shapelets to interpret the clusters and obtain interesting insights in two UTS case studies and one MTS case study.

(Article author: Besides offsets, drifts, spikes, and oscillations, time series data contain other peculiar anomaly patterns. A similar idea has traditionally been used in quality control, in control charts with the Western Electric rules, and shapelets can be seen as an extension of this idea. AUTOSHAPE, introduced in this paper, can handle variable-length shapelets, which may be an advantage in practical applications. Note that its purpose is clustering, not anomaly detection.)

## Introduction

Time series clustering (TSC) has numerous applications in both academia and industry, and many approaches have therefore been proposed for it. Classical TSC approaches can be categorized as whole-time-series-based, feature-based, and model-based. They operate on the raw time series, on extracted features, or on transformed model parameters, and then apply K-means, DBSCAN, or other clustering algorithms. The recent trend in TSC is to find local patterns or features in the raw time series. Among these approaches, shapelet-based methods have repeatedly demonstrated excellent performance. Fig. 1 shows an example of a shapelet $S_1$ discovered by AUTOSHAPE from ItalyPowerDemand, one of the UCR UTS datasets.

Although shapelets were initially proposed for time series classification, representative studies show that shapelets are discriminative subsequences of time series that provide interpretable results, and their interpretability has also been assessed with human cognitive measures. Unsupervised shapelets (u-shapelets) were first learned from unlabeled time series data to cluster time series, and a scalable u-shapelet method was later proposed to make shapelet learning in TSC more efficient. To improve clustering quality further, Zhang et al. introduced an unsupervised shapelet learning model called USSL, and STCN was proposed to optimize feature extraction and self-supervised clustering simultaneously.

Recently, autoencoder-based methods (e.g., DEC and IDEC) have been applied to clustering with good results. They learn a mapping from the raw data space to a lower-dimensional space that is optimized for clustering. However, they were developed for text and image clustering, not for time series. Several autoencoder-based methods (e.g., DTC and DTCR) have since been proposed for time series. They use autoencoder networks to learn a general representation of the entire time series instance under several objectives. The learned encoder then embeds the raw time series into a new representation, which replaces the raw data for the final clustering (e.g., K-means). However, these methods focus on the whole time series instance, ignore the importance of local features (parts of the time series), and cannot explain why instances are clustered together, i.e., they lack interpretability.

Unlike the autoencoder-based work above, we learn a unified representation for variable-length subsequences of time series from different variables (i.e., candidate shapelets). Once all candidate shapelets are embedded in the same latent space, it is easy to measure the similarity between candidates and to determine the shapelets for clustering. Importantly, thanks to the autoencoder, shapelets are no longer restricted to actual subsequences of the raw time series, which extends the scope of shapelet discovery beyond the raw data.

In this paper, we propose a novel autoencoder-based shapelet approach to the TSC problem, called AUTOSHAPE. To the best of our knowledge, this is the first work that exploits the advantages of both shapelet-based and autoencoder-based methods for TSC. Fig. 2 shows an overview of AUTOSHAPE.

Four objectives are specifically designed to learn the final shapelets for clustering.

1) The self-supervised loss learns a general unified embedding of time series subsequences (candidate shapelets). Specifically, we employ a cluster-wise triplet loss, which is effective for representing time series.

2) After clustering all the embeddings, we propose a diversity loss that learns the top-k candidates. The learned candidates are of high quality with two characteristics: they are closest to the centroids of large clusters, and those clusters are far from each other.

3) The selected embeddings are decoded through a reconstruction loss to obtain decoded shapelets, which preserve the subsequence shape of the original time series for human interpretation. The original time series is then transformed with respect to the decoded shapelets, and the transformed representation is passed to a clustering model (e.g., K-means).

4) After the clustering results are obtained, the Davies-Bouldin index (DBI) is calculated to adjust the shapelets.

The autoencoder network jointly learns subsequence representations of the time series in order to select high-quality shapelets for the transformation. These shapelets are not necessarily restricted to the raw time series but are decoded by the autoencoder. At the same time, the reconstruction loss in AUTOSHAPE preserves the shape of the raw time series subsequences in the final shapelets, rather than learning subsequences that are very different from the original time series.

Comprehensive experiments have been conducted on both univariate time series (UCR archive) and multivariate time series (UEA archive). In terms of normalized mutual information (NMI) and Rand index (RI), AUTOSHAPE performs best among the 15 and 6 representative methods compared for univariate time series (UTS) and multivariate time series (MTS), respectively, achieving the best performance on 15 of the 36 UTS datasets and on 24 of the 30 MTS datasets. In addition, ablation studies validate the effectiveness of the self-supervised loss, the diversity loss, and the DBI objective. Three examples from the UCR archive (human motion recognition, power demand, and images) and one example from the UEA archive (human activity recognition) are presented to describe the intuition behind the learned shapelets.

The main contributions of this paper can be summarized as follows:

- To discover discriminative shapelets for TSC in an unsupervised manner, we propose AUTOSHAPE, an autoencoder-based shapelet approach that jointly learns time series subsequence representations.

- Four objectives, namely a self-supervised loss for latent representations, a diversity loss for both universality and heterogeneity, a reconstruction loss to preserve the shape, and a DBI objective to improve the final clustering performance, are specifically designed to learn the final shapelets for clustering.

- Extensive experiments on the UCR (UTS) and UEA (MTS) datasets verify that AUTOSHAPE is highly competitive in accuracy with state-of-the-art TSC methods.

- Although the learned shapelets may not be actual subsequences from the raw time series data, their interpretability is shown in four case studies on the UTS and MTS datasets.

## Related work

We describe the shapelet-based and autoencoder-based methods related to our method.

### Shapelet-based methods

Shapelets were introduced in "Time series shapelets: a new primitive for data mining" with an emphasis on their interpretability, followed by studies on logical shapelets, shapelet transformations, learning shapelets, matrix profiles, and efficient learning. Research on shapelets has mainly targeted time series classification. Unsupervised shapelets (commonly known as u-shapelets) were proposed for clustering time series in "Clustering time series using unsupervised-shapelets," and Ulanova et al. introduced the scalable u-shapelet method, a hash-based algorithm for discovering u-shapelets efficiently.

k-Shape relies on a scalable iterative refinement procedure to generate homogeneous and well-separated clusters, using a normalized cross-correlation measure to compute the distance between two time series. Zhang et al. proposed an unsupervised salient subsequence learning (USSL) model for TSC, which incorporates shapelet learning, shapelet regularization, spectral analysis, and pseudo-labeling. USSL is similar to LTS, a shapelet learning method for time series classification.
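To make k-Shape's distance concrete, here is a minimal sketch of its shape-based distance (SBD), assuming z-normalized inputs: one minus the maximum normalized cross-correlation over all shifts. The helper names `z_normalize` and `shape_based_distance` are illustrative. k-Shape itself computes the cross-correlation with FFTs for efficiency, whereas this sketch calls `np.correlate` directly for clarity.

```python
import numpy as np


def z_normalize(x):
    """Z-normalize a 1-D series (zero mean, unit variance)."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def shape_based_distance(x, y):
    """SBD = 1 - max normalized cross-correlation over all shifts.

    Returns (distance, shift): x[n + shift] is aligned with y[n]
    at the best-matching lag.
    """
    x, y = z_normalize(x), z_normalize(y)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    if denom == 0:
        return 1.0, 0
    # 'full' mode covers every lag from -(len(y) - 1) to len(x) - 1
    cc = np.correlate(x, y, mode="full") / denom
    best = int(np.argmax(cc))
    shift = best - (len(y) - 1)
    return float(1.0 - cc[best]), shift
```

For example, a spike pattern and the same pattern shifted two steps to the right yield a shift of -2 and a small distance, which is why SBD tolerates misalignment that plain Euclidean distance would penalize.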

Self-supervised time series clustering networks (STCNs) optimize feature extraction with one-step time series prediction using RNNs to capture the temporal dynamics of time series and preserve the local structure of time series.

Since unsupervised shapelets are discovered without label information, they can be used not only for time series classification but also for time series clustering. Li et al. proposed the ShapeNet framework to discover shapelets for multivariate time series classification. In contrast, this paper is the first work to investigate how to discover shapelets for clustering both univariate and multivariate time series.

### Autoencoder-based methods

Deep embedding clustering (DEC) is a popular method for simultaneously learning feature representations and cluster assignments for many data-driven application domains using deep neural networks. After learning a low-dimensional feature space, the clustering objective is iteratively optimized.
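DEC's clustering objective can be sketched in a few lines. Following the DEC paper, each embedded point $z_i$ is softly assigned to centroid $\mu_j$ with a Student's t kernel, $q_{ij} \propto (1 + \lVert z_i - \mu_j \rVert^2/\alpha)^{-(\alpha+1)/2}$; a sharpened target $p_{ij} \propto q_{ij}^2 / f_j$ with $f_j = \sum_i q_{ij}$ is derived from these assignments; and $KL(P \| Q)$ is minimized. This NumPy sketch computes only the forward quantities: the encoder, centroid initialization, and gradient updates are omitted, and the function names are illustrative.

```python
import numpy as np


def soft_assignment(z, centroids, alpha=1.0):
    """Student's t soft assignment Q of points z (n, d) to centroids (k, d)."""
    sq_dist = ((z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + sq_dist / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)


def target_distribution(q):
    """Sharpened target P: square Q, divide by cluster frequency, renormalize rows."""
    weight = q ** 2 / q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_clustering_loss(p, q, eps=1e-12):
    """KL(P || Q) summed over all points and clusters."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Squaring $q_{ij}$ pushes confident assignments closer to 1, so minimizing the KL divergence sharpens the clusters, while dividing by $f_j$ keeps large clusters from dominating.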

Guo et al. found that the clustering loss defined in DEC can corrupt the feature space and produce meaningless representations. Their Improved Deep Embedded Clustering (IDEC) algorithm preserves the local structure of the data-generating distribution by incorporating an under-complete autoencoder.

Deep temporal clustering (DTC) naturally integrates an autoencoder network for dimensionality reduction and a novel temporal clustering layer for clustering the new latent representations into a single end-to-end learning framework that requires no labels. DTCR proposes a seq2seq autoencoder representation learning model that integrates a reconstruction task (for the autoencoder), a K-means task (for the hidden representations), and a classification task (to enhance the encoder's capabilities).

In these methods, once the autoencoder is trained, classical methods (e.g., K-means) are applied to the hidden representation. In contrast, as described below, we specifically design the loss functions of the autoencoder to determine shapelets for time series clustering.

## Autoencoder for Shapelets (AUTOSHAPE)

Here, we propose an autoencoder-based shapelet approach called AUTOSHAPE, which, as its name suggests, employs an autoencoder network for shapelet search. It learns unified embeddings of candidate shapelets while preserving the subsequence shape of the original time series, allowing an intuitive understanding of the clusters. Specifically, AUTOSHAPE uses an autoencoder network to learn a general unified embedding of time series subsequences (shapelet candidates) with the following four objectives:

1. Self-supervised loss

2. Diversity loss

3. Reconstruction loss

4. Davies-Bouldin index (DBI) objective

In this approach, we use all four objectives to jointly learn the shapelets without labels; Table I summarizes the notations used and their meanings.

### Shapelet search

Here, we present in detail the self-supervised loss for latent representations, the diversity loss for both universality and heterogeneity, and the reconstruction loss used to train the autoencoder.

1) Self-supervised loss

The goal is to learn a unified embedding of variable-length shapelet candidates from different variables. As the self-supervised loss, we employ a cluster-wise triplet loss, which is effective for representing time series subsequences, to learn the embedding in an unsupervised manner. The loss combines (i) the distance between the anchor and multiple positive samples ($D_{AP}$), (ii) the distance between the anchor and multiple negative samples ($D_{AN}$), and (iii) the intra distance $D_{intra}$ among the positive samples and among the negative samples. The cluster-wise triplet loss is restated in the original paper.

(See original paper for an explanation of symbols)

The distances among the positive (negative) samples are also included and should be small (large); the maximum distance among all positive (negative) samples is given by Equation 2 (Equation 3).

The intra-sample loss is defined in the original paper.
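The general idea of contrasting one anchor against several positives and several negatives can be illustrated with a simplified margin-based variant. This sketch is an assumption for illustration, not the paper's cluster-wise triplet loss, whose exact form and intra-distance weighting are given in the original paper: it averages the anchor-positive and anchor-negative distances, applies a hinge with a hypothetical `margin`, and separately reports the maximum pairwise distance within the positive and negative sets, analogous to the quantities in Equations 2 and 3.

```python
import numpy as np


def max_pairwise_distance(x):
    """Largest Euclidean distance between any two rows of x (m, d)."""
    diff = x[:, None, :] - x[None, :, :]
    return float(np.linalg.norm(diff, axis=-1).max())


def multi_triplet_loss(anchor, positives, negatives, margin=1.0):
    """Hinge triplet loss averaged over several positives and negatives.

    anchor: (d,), positives: (P, d), negatives: (N, d) embeddings.
    Returns (loss, max positive-positive distance, max negative-negative distance).
    """
    d_ap = np.linalg.norm(positives - anchor, axis=1).mean()  # mean anchor-positive distance
    d_an = np.linalg.norm(negatives - anchor, axis=1).mean()  # mean anchor-negative distance
    loss = max(0.0, float(d_ap - d_an + margin))
    return loss, max_pairwise_distance(positives), max_pairwise_distance(negatives)
```

Averaging over many positives and negatives uses more of the neighborhood structure than a single positive-negative pair, which is the motivation the Analysis paragraph below gives for the paper's loss.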

The encoder network maps the original time series space to the latent space, and this embedding function is trained with the self-supervised loss. It can be parameterized by any neural network architecture, with the only requirement that it follows a causal ordering (i.e., future values do not affect current values). Here, the encoder network is implemented with a Temporal Convolutional Network (TCN); a vanilla RNN, which is a recurrent network, is also implemented for the autoencoder. TCN is the default network in the following experiments.
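The causal-ordering requirement is easy to check on a single dilated causal convolution, the building block of a TCN. In this minimal NumPy sketch (the kernel weights and dilation are arbitrary illustrative values, not the paper's settings), the input is left-padded with zeros so that output $y_t$ depends only on $x_t, x_{t-d}, x_{t-2d}, \ldots$; changing a future input therefore leaves earlier outputs untouched.

```python
import numpy as np


def causal_dilated_conv1d(x, kernel, dilation=1):
    """1-D causal convolution: y[t] = sum_i kernel[i] * x[t - i * dilation].

    Positions before the start of the series are treated as zeros, so the
    output has the same length as the input and never looks ahead.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    pad = (len(kernel) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])  # left padding only
    y = np.empty_like(x)
    for t in range(len(x)):
        # indices of x[t], x[t - d], x[t - 2d], ... inside the padded signal
        idx = t + pad - dilation * np.arange(len(kernel))
        y[t] = np.dot(kernel, xp[idx])
    return y
```

Stacking such layers with dilations 1, 2, 4, ... and kernel size $k$ gives a receptive field of $1 + (k-1)\sum_\ell d_\ell$, which is how a TCN covers long subsequences with few layers while keeping the causal ordering.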

2) Diversity loss

We propose a novel diversity loss for the autoencoder to discover high-quality, diverse shapelets.

Following prior work (USSL and DTCR), we select diverse shapelets for the shapelet transformation. We first cluster the candidate shapelets in the new representation space, which yields several clusters of representations, and then select the candidate closest to the centroid of each cluster. The proposed diversity loss considers both (i) the size of each cluster and (ii) the distance between the selected candidates.

(See original paper for an explanation of symbols)

The diversity loss is designed to select shapelets with two properties: the size of a representation cluster determines the universality of its candidate, while the distance between the selected candidates indicates the heterogeneity of the clusters.
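The two ingredients of the diversity loss can be sketched as follows, assuming K-means is used to cluster the candidate embeddings: for each cluster, pick the candidate closest to the centroid, record the cluster size (universality), and measure how far apart the selected candidates are (heterogeneity). How the paper combines these terms into the loss is defined in the original paper and is not reproduced here; `kmeans` and `select_diverse_candidates` are illustrative helpers.

```python
import numpy as np


def kmeans(x, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means on rows of x; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmin(((x[:, None] - centroids[None]) ** 2).sum(-1), axis=1)
        new = np.array([x[labels == j].mean(0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids


def select_diverse_candidates(emb, k, seed=0):
    """Return (selected indices, cluster sizes, min distance between selected)."""
    labels, centroids = kmeans(emb, k, seed=seed)
    selected, sizes = [], []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size == 0:
            continue
        dist = np.linalg.norm(emb[members] - centroids[j], axis=1)
        selected.append(int(members[np.argmin(dist)]))  # closest to centroid
        sizes.append(int(members.size))  # universality signal
    chosen = emb[selected]
    pair = np.linalg.norm(chosen[:, None] - chosen[None], axis=-1)
    iu = np.triu_indices(len(selected), k=1)
    min_sep = float(pair[iu].min()) if len(selected) > 1 else 0.0  # heterogeneity signal
    return selected, sizes, min_sep
```

A loss built on these quantities would reward large clusters and well-separated representatives, which matches the two properties described above.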

3) Reconstruction loss

Next, we introduce a decoder network guided by the mean squared error (MSE) as the reconstruction loss.

(See original paper for an explanation of symbols)

ANALYSIS: The traditional triplet loss considers only one positive and one negative sample per anchor, does not fully use the contextual information of the neighborhood structure, and its triplet terms are not always consistent. To learn general embeddings of the input data, our self-supervised loss considers many positive and many negative samples. For the diversity loss, we select high-quality shapelets for the shapelet transformation by considering two aspects: cluster size, which represents universality, and distance, which represents diversity. The reconstruction loss supports the interpretability of the final shapelets.

### Shapelet Adjustment

After the shapelet search, the original time series is transformed into new representations using the shapelets: each representation is a vector whose elements are the Euclidean distances between the time series and each of the shapelets.

Intuitively, the distance between a shorter sequence $T_p$ and a longer sequence $T_q$ is the distance between $T_p$ and the most similar subsequence of $T_q$, i.e., at the best-matching position.
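The shapelet transformation just described can be sketched as follows: slide the shapelet over the series, take the Euclidean distance at the best-matching position, and stack one such distance per shapelet into the transformed vector. This is a minimal NumPy sketch with illustrative names; some shapelet methods additionally normalize the distance by the shapelet length, which is omitted here. `np.lib.stride_tricks.sliding_window_view` requires NumPy 1.20 or later.

```python
import numpy as np


def subsequence_distance(shapelet, series):
    """Euclidean distance between a shapelet and its best-matching window of series."""
    s = np.asarray(shapelet, dtype=float)
    t = np.asarray(series, dtype=float)
    windows = np.lib.stride_tricks.sliding_window_view(t, len(s))  # (len(t) - len(s) + 1, len(s))
    dists = np.linalg.norm(windows - s, axis=1)
    best = int(np.argmin(dists))
    return float(dists[best]), best


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector of its distances to every shapelet."""
    return np.array([[subsequence_distance(s, t)[0] for s in shapelets]
                     for t in dataset])
```

The resulting matrix has one row per series and one column per shapelet, and it is this representation that a clustering algorithm such as K-means receives.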

1) DBI loss

We apply a classical clustering method (e.g., K-means) to the transformed representation and then propose a DBI objective to guide adjustments of the shapelets.

DBI was chosen because it does not require ground-truth labels, which is consistent with the unsupervised learning in AUTOSHAPE.
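For reference, the Davies-Bouldin index can be computed as $\mathrm{DBI} = \frac{1}{K}\sum_{i=1}^{K}\max_{j \neq i}\frac{s_i + s_j}{\lVert c_i - c_j \rVert}$, where $c_i$ is the centroid of cluster $i$ and $s_i$ is the mean distance of its members to $c_i$; lower values indicate more compact, better-separated clusters. The sketch below uses this standard definition (the same one scikit-learn's `davies_bouldin_score` implements); the paper's own notation is in its Equation 8, and the function name is illustrative.

```python
import numpy as np


def davies_bouldin_index(x, labels):
    """Standard Davies-Bouldin index (requires at least two clusters)."""
    x = np.asarray(x, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    centroids = np.array([x[labels == c].mean(axis=0) for c in clusters])
    # s_i: mean distance of each cluster's members to its centroid
    scatter = np.array([np.linalg.norm(x[labels == c] - centroids[i], axis=1).mean()
                        for i, c in enumerate(clusters)])
    k = len(clusters)
    ratios = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            if i != j:
                ratios[i, j] = (scatter[i] + scatter[j]) / np.linalg.norm(centroids[i] - centroids[j])
    # the per-cluster max over j is the non-differentiable step
    return float(ratios.max(axis=1).mean())
```

Because it needs only the data and the predicted labels, DBI can be evaluated on every training iteration without ground truth.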

To compute the derivative of the loss function, every function involved in the model must be differentiable. However, the maximum function in Equation 8 is not differentiable everywhere, so we introduce a differentiable approximation of the maximum function. For clarity, the approximated form is omitted here; see the original paper.
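One common way to smooth a maximum, shown here as an assumption rather than the paper's exact approximation, is a softmax-weighted average: $\widetilde{\max}(v) = \sum_i v_i e^{\alpha v_i} / \sum_j e^{\alpha v_j}$. It is differentiable everywhere, equals the mean at $\alpha = 0$, and approaches the true maximum as $\alpha$ grows (LTS uses the analogous soft minimum for shapelet distances). The sharpness `alpha` is a hypothetical hyperparameter.

```python
import numpy as np


def soft_max(values, alpha=10.0):
    """Softmax-weighted average of values: a smooth approximation of max(values)."""
    v = np.asarray(values, dtype=float)
    w = np.exp(alpha * (v - v.max()))  # subtract the max for numerical stability
    w /= w.sum()
    return float(np.dot(w, v))
```

Replacing each $\max_{j \neq i}$ in the DBI with such a function lets gradients flow to every competing cluster pair, weighted by how close each ratio is to the maximum.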

### Overall loss function

Finally, the overall loss $\mathcal{L}_{AS}$ for AUTOSHAPE combines the four losses above (Eq. 10 in the original paper), where λ is the regularization parameter.

By minimizing the overall loss (Eq. 10), the shapelets for the transformation are learned jointly (see Fig. 2). After candidate shapelets are generated, (i) $\mathcal{L}_{Triplet}$ learns latent representations so that the candidate shapelets capture their properties; (ii) $\mathcal{L}_{Diversity}$ selects candidates with both universality and heterogeneity; and (iii) $\mathcal{L}_{Reconstruction}$ reconstructs the latent representations and preserves the shapes of the candidates. A clustering algorithm (e.g., K-means) is then applied to the representation transformed by the selected shapelet candidates, and (iv) $\mathcal{L}_{DBI}$ is computed from the clustering results to adjust the shapelets and improve the final clustering performance.

All the losses train the encoder network, whereas only the reconstruction and DBI losses train the decoder network.

For details of the algorithm, see the original paper, which covers the shapelet search, shapelet variants, and a complexity analysis.

## Experiments

First, we present comprehensive experiments comparing AUTOSHAPE with 15 related methods on the UCR (univariate) datasets. Next, we report the results of comparing AUTOSHAPE with five related methods on the UEA (multivariate) datasets. The compared methods follow those used in STCN, DTCR, and USSL.

### Experimental Setup

All experiments were run on a machine with two Xeon E5-2630v3 @ 2.4GHz (2S/8C) / 128GB RAM / 64GB SWAP and two NVIDIA Tesla K80 on CentOS 7.3 (64-bit).

The important parameters, i.e., the batch size, the number of channels, the kernel size of the convolutional network, and the network depth, are set to 10, 40, 3, and 10, respectively. The learning rate is fixed to a small value η = 0.001, and the number of training epochs is set to 400. The number of shapelets is chosen from {1, 2, 5, 10, 20}, and the length of the sliding window (i.e., the length of the candidate shapelets) is chosen from {0.1, 0.2, 0.3, 0.4, 0.5}, where each number is a fraction of the original time series length (e.g., 0.1 means 10% of the length). The number and length of the shapelets follow the settings of LTS, ShapeNet, and USSL.
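The reported settings and the length-ratio grid for candidate shapelets can be summarized in a small sketch. The dictionary simply restates the values above; `candidate_subsequences`, its `stride` argument, and the two-point minimum length are hypothetical choices showing how a ratio such as 0.1 translates into a sliding-window length, not the paper's code.

```python
import numpy as np

# Hyperparameters as reported in the experimental setup.
REPORTED_SETTINGS = {
    "batch_size": 10,
    "channels": 40,
    "kernel_size": 3,
    "depth": 10,
    "learning_rate": 0.001,
    "epochs": 400,
    "num_shapelets_grid": [1, 2, 5, 10, 20],
    "length_ratio_grid": [0.1, 0.2, 0.3, 0.4, 0.5],
}


def candidate_subsequences(series, length_ratios, stride=1):
    """Map each window length (ratio * series length) to its sliding windows."""
    t = np.asarray(series, dtype=float)
    candidates = {}
    for ratio in length_ratios:
        length = max(2, int(round(ratio * len(t))))  # hypothetical floor of 2 points
        windows = np.lib.stride_tricks.sliding_window_view(t, length)
        candidates[length] = windows[::stride]
    return candidates
```

For a series of length 100, the grid yields candidate lengths 10, 20, 30, 40, and 50, with 91 windows of length 10 at stride 1; a larger stride trades coverage for fewer candidates to embed.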

### Compared methods

Here we compare 15 typical TSC methods and provide brief information about each method below.

K-means: K-means on the whole original time series.

UDFS: Unsupervised discriminant feature selection method (l2,1-norm regularization).

NDFS: Nonnegative discriminative feature selection via nonnegative spectral analysis.

RUFS: Robust unsupervised discriminative feature selection using orthogonal non-negative matrix factorization.

RSFS: Robust spectral learning and sparse graph embedding for unsupervised feature extraction.

KSC: K-means with a pairwise scaling distance and a spectral-norm-based centroid computation.

KDBA: K-means clustering with DTW Barycenter Averaging (DBA).

k-Shape: A scalable iterative refinement procedure that searches for shapes under a normalized cross-correlation measure.

U-shapelet: Discovers unsupervised shapelets (u-shapelets) for time series clustering.

USSL: Learning salient subsequences from unlabeled time series using shapelet regularization, spectral analysis, and pseudo-labels.

DTC: An autoencoder for dimensionality reduction of time series combined with a novel temporal clustering layer.

DEC: A method that simultaneously learns feature representations and cluster assignments using deep neural networks.

IDEC: Scattering data points by manipulating the feature space with autoencoders using clustering loss as guidance.

DTCR: Learns cluster-specific temporal representations using temporal reconstruction, K-means, and classification tasks.

STCN: A self-supervised time series clustering framework that jointly optimizes feature extraction and time series clustering.

### Experiments on univariate time series

We follow the protocols used in previous studies such as k-Shape, USSL, DTCR, and STCN.

Thirty-six datasets from the UCR archive, a well-known benchmark for time series datasets, were tested. More information about the datasets can be obtained from the UCR Time Series Classification Archive.

We use normalized mutual information (NMI) as the evaluation metric because results for the Rand index (RI) show a similar trend; background on RI is provided in the Supplementary Material (see the original paper). Results are averages over 10 runs, and the standard deviation on all datasets is less than 0.005.
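Both metrics compare a predicted clustering with the ground-truth classes and are invariant to label permutations. The sketch below computes NMI normalized by the geometric mean of the two entropies, and RI as the fraction of point pairs on which the two labelings agree. Several NMI normalizations exist, so the original paper should be consulted for the exact one; scikit-learn's `normalized_mutual_info_score` (with `average_method="geometric"`) and `rand_score` provide tested implementations.

```python
from itertools import combinations

import numpy as np


def _entropy(p):
    """Shannon entropy (natural log) of a probability vector."""
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def nmi(labels_a, labels_b):
    """Normalized mutual information with geometric-mean normalization."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    joint = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(joint, (a, b), 1.0)  # contingency table of counts
    joint /= joint.sum()
    pa, pb = joint.sum(axis=1), joint.sum(axis=0)
    nz = joint > 0
    mi = float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pa, pb)[nz])))
    ha, hb = _entropy(pa), _entropy(pb)
    if ha == 0.0 or hb == 0.0:
        return 1.0 if ha == hb else 0.0  # degenerate single-cluster labelings
    return float(mi / np.sqrt(ha * hb))


def rand_index(labels_a, labels_b):
    """Fraction of point pairs grouped consistently by both labelings."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum((labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
                for i, j in pairs)
    return agree / len(pairs)
```

For example, swapping cluster names leaves both scores at 1, while a labeling that is independent of the classes drives NMI to 0.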

1) NMI for univariate time series

All baseline NMI results are taken from their original papers. Table II presents the overall NMI results for the 36 UCR datasets.

From Table II, we can see that the overall performance of AUTOSHAPE ranks first among the 15 methods compared. Furthermore, AUTOSHAPE performs best on 10 datasets, far more than the other methods except STCN. In 1-to-1 comparisons, AUTOSHAPE's number of NMI wins is at least 1.6 times its number of losses against USSL, DTCR, and STCN. AUTOSHAPE achieves much higher NMI on some datasets, such as BirdChicken and ToeSegmentation1, while on the datasets where it loses, its results are only slightly lower than those of USSL (e.g., Ham, Lightning2) and DTCR (e.g., Car, ECGFiveDays).

2) Friedman and Wilcoxon tests

For all methods, we run the Friedman test and the Wilcoxon signed-rank test with Holm's alpha (5%). The Friedman test is a nonparametric statistical test that detects differences among the 15 methods across the 36 datasets. The resulting p-value is below 0.001, which is smaller than α = 0.05. We therefore reject the null hypothesis and conclude that there are significant differences among the 15 methods.
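The Friedman statistic can be computed from the average rank $R_j$ of each of the $k$ methods over $N$ datasets as $\chi^2_F = \frac{12N}{k(k+1)}\left[\sum_j R_j^2 - \frac{k(k+1)^2}{4}\right]$, which is compared against a $\chi^2$ distribution with $k-1$ degrees of freedom. The sketch below ranks methods per dataset (rank 1 for the highest score, ties averaged) and omits the tie correction; `scipy.stats.friedmanchisquare` offers a tested implementation, and the function names here are illustrative.

```python
import numpy as np


def average_ranks(scores):
    """scores: (n_datasets, n_methods), higher is better. Returns mean rank per method."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    ranks = np.empty_like(scores)
    for i in range(n):
        row = scores[i]
        order = np.argsort(-row, kind="stable")
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        for value in np.unique(row):  # average the ranks of tied scores
            tied = row == value
            r[tied] = r[tied].mean()
        ranks[i] = r
    return ranks.mean(axis=0)


def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction), k - 1 degrees of freedom."""
    n, k = np.asarray(scores).shape
    mean_ranks = average_ranks(scores)
    return float(12.0 * n / (k * (k + 1))
                 * (np.sum(mean_ranks ** 2) - k * (k + 1) ** 2 / 4.0))
```

For instance, if one method beats two others on each of three datasets, the average ranks are (1, 2, 3) and $\chi^2_F = 6$, just above the 5% critical value of about 5.99 for 2 degrees of freedom.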

We then perform a post hoc analysis between all methods. The results are visualized by the critical difference diagram in Fig. 3.

The thick horizontal lines group methods that are not significantly different. AUTOSHAPE significantly outperforms all other methods except STCN, DTCR, and USSL. However, unlike the black-box STCN and DTCR, AUTOSHAPE provides shapelets, i.e., interpretable subsequences, for clustering. Its reconstruction loss keeps it from learning subsequences that deviate from the original time series, preserving the details of the final shapelets and making them easier to interpret.

3) Change in the number of shapelets

We compared the impact of the number of shapelets on the final NMI of AUTOSHAPE on four datasets: BirdChicken, Coffee, SwedishLeaf, and ToeSegmentation1. Fig. 4 shows the NMI of AUTOSHAPE for varying numbers of shapelets.

The four datasets show different trends, indicating that the appropriate number of shapelets depends on the dataset. For example, on BirdChicken the NMI is stable as the number of shapelets varies. On SwedishLeaf, the NMI increases rapidly as the number of shapelets grows from 1 to 20 and then stabilizes, so its number of shapelets is set to 20.

4) Ablation study

To test the effects of $\mathcal{L}_{Triplet}$, $\mathcal{L}_{Diversity}$, and $\mathcal{L}_{DBI}$, we conducted a series of ablation experiments, comparing AUTOSHAPE with three ablated models: without the self-supervised loss (w/o triplet), without the diversity loss (w/o diversity), and without the DBI loss (w/o DBI).

From Table III, we can see that all three components contribute substantially to the final clustering performance. In particular, the general unified representation learning (self-supervised loss) plays an important role, since the NMI results of w/o triplet are always worse than those of the other two ablated models. The candidate shapelet selection (diversity loss) and the DBI objective also clearly and consistently improve the final performance.

5) Comparison with other methods for RI

All baseline RI results are taken from their original papers. Table IV shows the overall RI results for the 36 UCR datasets.

From Table IV, we can see that the overall performance of AUTOSHAPE ranks first among the 15 methods compared. Furthermore, AUTOSHAPE performs best on 9 datasets, more than all other methods except STCN. AUTOSHAPE's number of 1-to-1 RI wins is larger than its number of losses against every other method. AUTOSHAPE achieves much higher RI on some datasets, such as BirdChicken and ToeSegmentation1, while on the datasets where it loses, its results are only slightly lower than those of USSL (e.g., Meat, SonyAIBORobotSurface) and DTCR (e.g., Lightning2, Wine).

6) Network comparison

We compare the Temporal Convolutional Network (TCN) and a recurrent network (vanilla RNN) as the autoencoder in terms of the final NMI (Fig. 5(a)) and RI (Fig. 5(b)). The differences in NMI and RI between TCN and vanilla RNN are negligible on most datasets, and statistical tests provide no evidence that either network is better than the other. The only requirement is that the network follows a causal ordering (i.e., future values do not affect current values).

7) Experiments on the interpretability of UTS

We further explored a strength of the shapelet method, namely interpretability, by reporting the shapelets (k = 1, 2) generated by AUTOSHAPE on two datasets. These datasets were chosen simply because they can be presented without much domain knowledge. From Fig. 6 and Fig. 7, we observe that some subsequences of the original time series in each cluster are similar to the corresponding shapelets.

Case Study 1: ToeSegmentation1

The ToeSegmentation1 dataset contains z-axis values of the left toe from human gait recognition data in the CMU Graphics Lab Motion Capture Database (CMU). It consists of two classes, "normal gait" and "abnormal gait," where "abnormal gait" includes walking with a limp or leg pain: in this class, the actors were asked to appear as if they had difficulty walking normally.

From Fig. 6, it is easy to see that the top two shapelets $S_1$ and $S_2$ appear more frequently in the normal gait class: $S_1$ represents one unit of normal gait, and $S_2$ represents the interval between two consecutive gait units.

Case Study 2: ItalyPowerDemand

ItalyPowerDemand was derived from 12 months of electricity consumption time series for Italy in 1997. The dataset has two classes: summer (April to September) and winter (October to March). The shapelet $S_1$ learned by AUTOSHAPE is shown on the left side of Fig. 7. From the learned shapelet, we can see that the electricity demand from 5 am to 11 pm is lower in summer than in winter. This is because, when the data were collected, heating was used on winter mornings, whereas cooling was still uncommon in Italian summers.

### Experiments on multivariate time series

Next, we present the main results of the MTS experiments, using NMI as the evaluation metric. The Rand index (RI) results are omitted because they show a similar trend. K-means, GMM, k-Shape, USSL, and DTCR are selected as the compared methods. The results for all six methods are averages over 10 runs, and the standard deviations on all datasets are less than 0.01.

1) NMI for multivariate time series

The overall NMI results for the 30 UEA MTS datasets are shown in Table V.

From Table V, we can see that the overall performance of AUTOSHAPE is ranked first among the six methods compared.

Moreover, AUTOSHAPE performs best on 22 MTS datasets, which is significantly more than the other five methods. This result shows that AUTOSHAPE can learn high-quality shapelets from different variables.

2) Friedman test and Wilcoxon test

We run the Friedman test and the Wilcoxon signed-rank test with Holm's alpha (5%). The Friedman test detects differences among the six methods across the 30 UEA datasets. The resulting p-value is below 0.001, which is smaller than α = 0.05. We therefore reject the null hypothesis and conclude that there are significant differences among the six methods.

We then perform a post hoc analysis between all the methods compared. The results are visualized by the critical difference diagram in Fig. 8, where we see that AUTOSHAPE significantly outperforms the other five methods.

3) Change in the number of shapelets

We further compared the impact of a different number of shapelets on AUTOSHAPE's final NMI for the four MTS datasets BasicMotions, Epilepsy, SelfRegulationSCP1, and StandWalkJump.

Fig. 9 shows the NMI for six different numbers of shapelets.

The four datasets show different trends, indicating that the appropriate number of shapelets depends on the dataset. For example, on Epilepsy, the NMI increases rapidly when the number of shapelets grows from 1 to 2 and then stabilizes, so the number of shapelets is set to 2.

4) Experiments on the interpretability of MTS

Finally, we examine how the learned shapelets can be interpreted on MTS. Fig. 10 shows the shapelets (k = 2) generated by AUTOSHAPE from the Epilepsy dataset. Again, we chose this dataset simply because it can be illustrated without much domain knowledge. The Epilepsy dataset was generated by healthy participants simulating the class activities, and it consists of four classes: walking, running, sawing, and seizure mimicking.

## Summary

In this paper, we propose a novel autoencoder-based shapelet approach for time series clustering, called AUTOSHAPE. We proposed an autoencoder network that learns a unified embedding of shapelet candidates through the following objectives.

The self-supervised loss learns general embeddings of time series subsequences (candidate shapelets). To select diverse candidates, we propose a diversity loss between candidate shapelets. The reconstruction loss preserves the shape of the original time series to improve interpretability, and DBI serves as an internal metric that guides network learning toward better clustering performance. Extensive experiments show that AUTOSHAPE outperforms the other 14 and 5 methods on the UTS and MTS datasets, respectively. We also demonstrate the interpretability of the learned shapelets with three case studies on the UCR UTS datasets and one case study on the UEA MTS dataset. In future work, we plan to study the efficiency of shapelet-based TSC methods and their handling of missing values.
