
Analog And Multimodal Manufacturing Data Sets Acquired On The Future Factory Platform


3 main points

✔️ In manufacturing settings, anomalies occur infrequently, so machine learning models must be built from imbalanced data.
✔️ To support such applications, a dataset was created at the Future Factories Lab at the University of South Carolina.
✔️ Two datasets were collected: an analog dataset and a multimodal dataset, the latter adding synchronized image data.

Analog and Multi-modal Manufacturing Datasets Acquired on the Future Factories Platform
written by Ramy Harik, Fadi El Kalach, Jad Samaha, Devon Clark, Drew Sander, Philip Samaha, Liam Burns, Ibrahim Yousif, Victor Gadow, Theodros Tarekegne, Nitol Saha
[Submitted on 28 Jan 2024]
Comments: accepted by arXiv
  Machine Learning (stat.ML); Machine Learning (cs.LG); Image and Video Processing (eess.IV); Systems and Control (eess.SY)


The images used in this article are from the paper, the introductory slides, or were created based on them.


This paper presents two industrial datasets collected at the Future Factories Lab at the University of South Carolina on December 11 and 12, 2023. They were generated on a manufacturing assembly line that follows industrial standards for its actuators, control mechanisms, and transducers. Both datasets were recorded simultaneously by running the assembly line for 30 continuous hours (with slight filtering) and collecting data from sensors throughout the system. During operation, some assembly defects were induced by manually removing parts required for final assembly. The resulting datasets comprise a time-series analog dataset and a time-series multimodal dataset that pairs the analog data with images of the system. They are intended as a resource for research aimed at enhancing manufacturing intelligence: real manufacturing datasets are scarce, and datasets containing anomalies and defects are rarer still. These datasets therefore address that gap and give researchers a foundation for building and training artificial intelligence models applicable to the manufacturing industry. Finally, they are the first data releases from the Future Factories Lab and are likely to be further tailored to the needs of more researchers in the future.


Over the past 100 years, the United States has shifted from a self-sufficient, interconnected manufacturing powerhouse to one that is heavily dependent on other countries. This transition has created real risks and barriers for manufacturing. A McKinsey Global Institute study identified key benefits rooted in Industry 4.0, including optimized processes, increased equipment utilization, improved supply chain management, and more efficient inventory management.

The shift to data-driven manufacturing is allowing artificial intelligence to have a significant impact in key areas such as predictive maintenance, quality control, worker safety, and process optimization. As a result, the need for industrial datasets is growing. Generating such datasets, however, poses many challenges: data privacy and security concerns, the complexity of manufacturing processes, the difficulty of producing datasets that contain anomalies, and the difficulty of handling large volumes of data.

To address these challenges, the Future Factories Lab at the University of South Carolina presents two datasets generated on a manufacturing line built to industrial standards, with the goal of supporting research to improve intelligence in manufacturing. An analog dataset and a multimodal dataset were collected, the latter including synchronized image data. These datasets are expected to be useful for applying artificial intelligence in the manufacturing industry.

Experimental Setup

The Future Factories Lab testbed includes five Yaskawa robotic arms, conveyor systems, and material handling stations.

Robotic arms play a central role in many manufacturing processes: two Yaskawa HC10 robotic arms handle material input and output, while three Yaskawa GP8 robotic arms perform product assembly and disassembly. These arms are controlled by YRC1000 and YRC1000micro robot controllers. With their high speed and precision in repetitive tasks, the arms are able to assemble products in a coordinated manner. Each arm is equipped with a custom-designed, 3D-printed gripper.

The conveyor system plays an important role in transporting products to their respective stations: four interconnected conveyor belts circulate between the robot arms. The conveyors are driven by a SINAMICS G120 variable frequency drive (VFD), which communicates with a programmable logic controller (PLC). This conveyor system enables coordination between the robot arms.

A Siemens S7-1500 PLC is used, programmed using Siemens Totally Integrated Automation (TIA) Portal engineering software. The PLC is connected to the robot controller and conveyor VFD using the Profinet communication protocol.

The assembly process proceeds as follows. First, the R01 robotic arm retrieves unassembled rocket parts from the material handling station and places them on a conveyor. When the conveyor brings the parts to the R02 station, the R02 robotic arm takes the two body parts and places them on the assembly table. Next, the conveyor brings the parts to the R03 station, where the R03 robotic arm attaches the pedestal and assembles the fuselage parts received from R02. Finally, R03 attaches the nose cone to create the finished product, which the conveyor brings to the R04 station; there, the R04 robotic arm disassembles the finished product and returns it to its original state, ready for the next cycle. This assembly-disassembly cycle is repeated over a 30-hour period.

Figure 1: Future Factories testbed setup (View 1)

Figure 2: Future Factories testbed setup (View 2)

Data metrics

Analog data set

This dataset contains data from a 30-hour run of the assembly and disassembly process. After the experiment was completed, the various sensor values shown in the paper appendix were downloaded and sorted into several CSV files by facility (e.g., R01_Data.csv contains signals for R01).

In addition, the data was cleaned: during the 30 hours of operation, the testbed experienced brief periods of downtime, and the data recorded during those periods carried no useful information. The periods when the testbed was not in operation were therefore filtered out, leaving a final dataset of 325 complete cycles.

During the 30 hours of operation, several anomalies were also introduced by team members manually removing rocket parts from the tray. These anomalies were grouped into three categories depending on how many of the four parts were missing:

- NoNoseCone

- NoBody2,NoNose

- NoBody1,NoBody2,NoNose

These anomalies are annotated in the analog dataset: for example, cycle 1 is normal, cycle 50 has a NoNoseCone anomaly, and so on. Apart from the absence of image data and the presence of anomaly annotations, the other major difference from the multimodal dataset is the data acquisition rate of 10 Hz.
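As a minimal sketch, separating annotated anomalous cycles from normal ones might look like the following. The column names (`cycle`, `label`), the label string `"Normal"`, and the toy rows are assumptions for illustration, not the dataset's documented schema; check them against the actual CSV header (e.g., in `R01_Data.csv`) before use.

```python
import pandas as pd

def split_by_label(df: pd.DataFrame):
    """Return (normal cycles, anomalous cycles) based on the annotation column."""
    normal = df[df["label"] == "Normal"]
    anomalous = df[df["label"] != "Normal"]
    return normal, anomalous

# Toy stand-in for rows that would be loaded with pd.read_csv("R01_Data.csv");
# "Potentiometer" is a hypothetical sensor column name.
toy = pd.DataFrame({
    "cycle": [1, 2, 50],
    "Potentiometer": [0.12, 0.15, 0.91],
    "label": ["Normal", "Normal", "NoNoseCone"],
})

normal, anomalous = split_by_label(toy)
print(len(normal), len(anomalous))  # prints: 2 1
```

Because the anomalies are a small fraction of the 325 cycles, a split like this is a natural first step for building the imbalanced-data training sets the article mentions.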

Figure 2: Structure of the analog data set

Multimodal data set

Like the analog dataset, the multimodal dataset was generated from the assembly and disassembly of the rocket prototype; it was collected during the same run and under the same conditions. In addition to the sensor values shown in the appendix, this dataset includes synchronized image data taken by two cameras mounted on either side of the testbed. As a result, the data acquisition rate was reduced to 2-3 Hz, and a total of 166,000 records were collected over the entire run time.

While the analog dataset is organized in a tabular format in a CSV file, the multimodal dataset has a different structure. As shown in Figure 3, the images are divided into batches of 1000 samples each and stored in separate folders for each camera view. Each batch has a JSON file containing synchronized sensor values and the corresponding image paths. Due to the large number of records, the dataset folder contains a total of 166 image batch folders and their respective JSON files.
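A minimal sketch of walking one batch is shown below. The JSON layout assumed here (a list of records, each holding a dictionary of sensor values plus an image path) and the key names `sensors` and `image_path` are guesses based on the description above; inspect one of the 166 batch files to confirm the actual structure before use.

```python
import json
from pathlib import Path

def iter_batch(json_path: Path):
    """Yield (sensor_values, image_path) pairs from one batch's JSON file.

    Assumes the file contains a JSON array of records, each with a
    "sensors" dict and an "image_path" string (hypothetical keys).
    """
    records = json.loads(json_path.read_text())
    for rec in records:
        yield rec["sensors"], rec["image_path"]

# Hypothetical usage over the per-camera batch folders described above:
# for json_file in sorted(Path("dataset").glob("batch_*/*.json")):
#     for sensors, img_path in iter_batch(json_file):
#         ...  # pair the image with its synchronized sensor record
```

Keeping the iteration lazy (a generator) avoids loading all 166,000 records at once, which matters given the dataset's size.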

Figure 3: Structure of a multimodal data set


The datasets presented in this paper are publicly available for use in research to improve intelligence in manufacturing.

These datasets were collected at the Future Factories Lab at the University of South Carolina. The analog dataset contains 30 hours of operating data from product assembly and disassembly, annotated with three artificially introduced anomaly conditions (NoNoseCone; NoBody2,NoNose; and NoBody1,NoBody2,NoNose). The multimodal dataset additionally includes synchronized image data taken by two cameras.

These datasets are intended to support the application of artificial intelligence in the manufacturing industry. Data from manufacturing sites are generally difficult to obtain, and datasets containing anomalies are rarer still. These datasets are therefore expected to be a valuable resource for researchers.

Further adjustments to the data will be made in the future to meet the needs of the researchers. For example, changes to the type and frequency of anomalies, additional sensor values, and improved image resolution are being considered. Plans are also in place to release larger datasets.

In this way, the Future Factories Lab team aims to contribute to the development of artificial intelligence in the manufacturing industry and to support researchers in their work.

Masayuki Tomoyasu (友安 昌幸)
JDLA G certificate 2020#2, E certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; CEO, Amiko Consulting LLC
