
Federated Learning-based Integrated Object Detection That Leverages Distributed Data While Protecting Privacy

3 main points
✔️ Applying federated learning to quality inspection tasks that leverage distributed data while protecting privacy
✔️ Using YOLOv5 as the object detection algorithm and Federated Averaging as the FL algorithm
✔️ Achieved better generalization performance than models trained only on local client data sets

Federated Object Detection for Quality Inspection in Shared Production
written by Vinit Hegiste, Tatjana Legler, Martin Ruskowski
[Submitted on 30 Jun 2023 (v1), last revised 25 Aug 2023 (this version, v2)]
Comments: Will submit it to an IEEE conference
Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)


The images used in this article are from the paper, the introductory slides, or were created based on them.


Federated Learning (FL), proposed by Google Inc. in 2017, has emerged as a promising approach for training machine learning models on decentralized data without compromising data privacy. In this paper, we propose an FL algorithm for object detection in quality inspection tasks, using YOLOv5 as the object detection algorithm and Federated Averaging (FedAvg) as the FL algorithm. We apply this approach to a manufacturing use case in which multiple factories/clients provide data for training a global object detection model on a non-IID data set while maintaining data privacy. Experiments demonstrate that this FL approach achieves better generalization performance on the overall client test data set and generates improved bounding boxes around objects compared to models trained on local client data sets. This study demonstrates the potential of FL for quality inspection tasks in manufacturing and provides valuable insight into the performance and feasibility of leveraging YOLOv5 and FedAvg for federated object detection.


Object detection (OD) is one of the most common and useful tasks in deep learning. We will skip the introductory explanation. Federated machine learning offers the possibility to address privacy concerns in collaborative learning scenarios, where multiple clients/stakeholders collaborate on machine learning models without sharing personal data. Thus, the use of horizontal FL for object detection will help the global model by enabling it to obtain a variety of samples from different clients belonging to the same class. This makes the final global federated model more robust compared to locally trained models.

In previous work by the authors, FL was applied to an image classification task in manufacturing quality inspection, showing that this approach can achieve performance comparable to centralized learning while preserving data privacy. In some cases, global models can even generalize better to the feature space of a data set than models trained using centralized learning. In this paper, the authors extend that work by proposing an FL algorithm for object detection in quality inspection tasks, using YOLOv5 as the object detection algorithm and Federated Averaging (FedAvg) as the FL algorithm. Object detection is an important task in quality inspection, where detecting and localizing defects in images is essential to assure product quality. Using the FL approach, a global object detection model that incorporates data from multiple factories can be trained while maintaining data privacy. The authors' experiments demonstrate the feasibility and effectiveness of this approach for quality inspection tasks in manufacturing. In addition, they highlight how FL-based quality inspection services can be integrated into a shared production ecosystem at Production Level 4, which is based on skill-based production, and offered as a service in the marketplace. This means that quality inspection services must have self-description capabilities. To achieve interoperability and provide software to the marketplace, the authors use the Asset Administration Shell (AAS) to describe software services in the form of sub-models.

Related Research

The output of object detection (OD) can be used for multiple applications, such as locating objects in a fixed environment, object counting, object segmentation, and fault detection and classification. These capabilities are very useful in the quality inspection industry when multiple clients working on similar products or use cases collaborate to train a global model that can detect all types of errors and faults. Federated object detection (federated OD) is very useful in this scenario, where a global federated model can detect, for example, all traffic sign samples belonging to a particular class. Very few papers have been published in this area, and none applies the algorithm to a specific custom use case. Luo et al. attempt to address this problem by implementing the algorithm on a real-world data set, but their application focuses only on general object detection in images from city surveillance cameras; it does not address specific use cases or analyze in depth the accuracy of the global model and how it compares to local client models. That paper also uses YOLOv3, which is easily outperformed by YOLOv5, the current state of the art. Among the remaining papers that use FL with the YOLOv5 algorithm, Bommel et al. introduce active learning during federated OD to solve the unlabeled data problem, and Zhang et al. introduce FedVisionBC, a blockchain-based federated learning system for visual object detection that addresses FL's privacy challenges. FedCV focuses on creating a framework to automate the process of federated OD. Su et al. use RetinaNet with a ResNet50 backbone as their detection algorithm. The authors' paper follows a similar algorithm for federated OD using FedAvg and explores the utility of federated OD for quality inspection in shared production scenarios, using non-IID (not independent and identically distributed) data sets from different use cases.

The concept of the Asset Administration Shell (AAS) serves as an implementation of the digital twin in the context of Industry 4.0. Acting as a digital representation of an asset or service, an AAS consists of various sub-models that encompass all relevant information and functionalities of an asset or service. This includes its features, characteristics, properties, and capabilities; the AAS facilitates communication through a variety of channels and applications and serves as the critical link between the physical object and the connected, digital, distributed world. Initially employed to create digital representations of physical assets, AAS can now also accommodate software modules. Its integration is a fundamental prerequisite for the incorporation of production modules in the new production architecture: toward 2025, the flexible production network aims to work seamlessly through the adoption of the digital platform Gaia-X. In order for assets such as machines and services to be integrated into the European data platform Gaia-X in the future, certain technical standards must be met, including considerations regarding security and skill descriptions. This relevant information will be encapsulated within the AAS to ensure compliance and allow assets to participate effectively within the Gaia-X ecosystem.


The use case of quality inspection of USB sticks in manufacturing was extended by annotating the previous data set in the YOLO format used by the YOLOv5 algorithm. To further demonstrate the applicability of this algorithm, we introduce a new use case in which two companies/clients manufacture a cabin and a windshield, as shown in Figure 1. The quality inspection task here is to detect whether the cabin comes with or without a windshield.

Figure 1. Examples of cabins with windshields (four types) and cabins without windshields.
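
As an illustration of the YOLO annotation format mentioned above, the following minimal sketch converts a pixel bounding box into one YOLO label line (class id, then box center and size normalized by the image dimensions). The function name, the example box, and the image size are assumptions made for illustration; the class ids 0 ("cabin without windshield") and 1 ("cabin with windshield") follow the labels described later in this article, and this is not the authors' annotation tooling.

```python
def to_yolo_line(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) into a YOLO label line.

    YOLO labels store: class_id x_center y_center width height,
    with all four box values normalized to the range [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / image_width
    y_center = (y_min + y_max) / 2 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a "cabin with windshield" (class 1) in a 640x480 image.
line = to_yolo_line(1, (100, 50, 300, 250), 640, 480)
print(line)
```

Each image gets a text file with one such line per annotated object, which is why agreeing on a shared label nomenclature across clients matters before federated training starts.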

The FL algorithm employed in this paper is FedAvg. A neutral server is used to compute a federated average over the model weights from all clients. The authors assume that all clients participate actively in each round of communication and that model weights are shared reliably. Before the training process starts, all clients agree on a standardized label nomenclature and on hyperparameters such as the YOLO model architecture, local epochs, optimizer, and batch size. Each client's local training procedure is then initiated.
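
The server-side averaging step can be sketched as follows. This is a minimal sketch in plain Python, assuming each client's model is represented as a dictionary that maps a layer name to a flat list of floats (the layer names and values below are hypothetical). The optional weighting by client data set size follows the general FedAvg formulation; with equal sizes it reduces to a plain average, and the article does not state which variant the authors used.

```python
def federated_average(client_weights, client_sizes=None):
    """Average model weights from several clients, FedAvg style.

    client_weights: list of dicts, each mapping layer name -> list of floats.
    client_sizes:   optional list of local data set sizes used as weights;
                    if omitted, every client counts equally.
    """
    if client_sizes is None:
        client_sizes = [1] * len(client_weights)
    total = sum(client_sizes)

    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Hypothetical two-client example with tiny "models".
client_1 = {"conv.weight": [1.0, 2.0], "conv.bias": [0.0]}
client_2 = {"conv.weight": [3.0, 6.0], "conv.bias": [1.0]}
global_weights = federated_average([client_1, client_2])
print(global_weights)
```

Only these weight dictionaries travel between client and server; the raw images never leave the client.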


The use cases for USB and cabin quality inspection are described in this section. Starting with USB quality inspection, there are three clients, each with three classes: "OK", "Not OK", and "Hidden". The data sets are not independent and identically distributed (non-IID), and the distribution of each class and data set is shown in Figure 2. Client 1 has a Huawei USB stick, Client 2 has a blue-brick style USB stick, and Client 3 has a red-brick style USB stick. Each client's "Not OK" class has a different error type. As shown in Figure 3, Client 1's USB error is a small sticker mark indicating damage to the USB port, Client 2's USB error is a damaged USB port, and Client 3's error is a rusty USB port. As in the use case of the authors' earlier paper, which showcased the success of federated image classification for custom quality inspection, the goal is to see whether the same result can be achieved with federated OD: a global federated model that learns all the different types of USB errors and, more importantly, draws complete bounding boxes around them.

Figure 2. Training data set distribution and label instances for client 1 (left: Huawei), client 2 (center: SF blue), and client 3 (right: SF red).

Figure 3. Example of a small subset of the USB quality inspection data set: client 1 (left: Huawei), client 2 (center: SF blue), and client 3 (right: SF red).

The second use case involves two companies/clients that manufacture cabins with and without windshields. The main task is to create an object detection model that classifies and correctly detects objects in a given video or image. Client 1 manufactures only blue cabins and blue windshields (types A and B), while client 2 manufactures red cabins (a slightly different design than the blue cabins) and windshields (types C and D) (see Figure 5). Class instances of the training data set for each client are shown in Figure 4. Each client had a total of about 600 images, of which 15% were used for validation and another 15% for testing. The data set was created with the cabins placed on the chassis (only the cabins are annotated) and with three different backgrounds. Different lighting conditions, shadows, blurred images, and various other variations were also introduced during the creation of the custom data set.

Figure 4. Training data set distribution and label instances for client 1 (left: blue cabin) and client 2 (right: red cabin).

Figure 5. Small subset of the cabin quality inspection data set: client 1 (left: blue cabin) and client 2 (right: red cabin).
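
The 15% validation and 15% test split described above can be sketched as below. This is an illustrative sketch only: the file names, the fixed random seed, and the shuffling strategy are assumptions, since the article states only the approximate data set size and the split fractions.

```python
import random


def split_dataset(image_paths, val_fraction=0.15, test_fraction=0.15, seed=0):
    """Shuffle image paths reproducibly and split them into train/val/test."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)

    n_val = round(len(paths) * val_fraction)
    n_test = round(len(paths) * test_fraction)

    val = paths[:n_val]
    test = paths[n_val:n_val + n_test]
    train = paths[n_val + n_test:]
    return train, val, test


# Hypothetical file names standing in for one client's roughly 600 images.
images = [f"cabin_{i:04d}.jpg" for i in range(600)]
train, val, test = split_dataset(images)
print(len(train), len(val), len(test))
```

In the federated setting each client performs such a split locally, so the test images used to judge the global model also stay on the client.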


As mentioned earlier, two clients are involved in the manufacture of cabins and windshields of different designs and types. Each trains a local quality inspection model on its own local data set (see Figure 5): a YOLOv5 model that detects the presence or absence of a windshield in a given frame. Both clients' models achieved greater than 95% accuracy on their respective local test data sets. Client 1's model was evaluated on blue cabins with no windshield and with type A and B windshields, while Client 2's model was tested on red cabins with no windshield and with type C and D windshields. Now suppose that Client 1 will produce cabins with windshield types C and D in addition to its existing production, while Client 2 will produce cabins with windshield types A and B. The locally trained models, each based on its client's old data set, were tested with the new cabin and windshield combinations. The results showed that although the local models correctly classified the images as either "cabin without windshield" or "cabin with windshield," the bounding boxes they produced were inaccurate, and sometimes a portion of the windshield was cut off. In some cases, false positives were detected and assigned labels with lower confidence scores, as shown in the lower left of Figure 6. While the clients could in principle share their data sets and train a centralized YOLOv5 model for quality checks, privacy and competitive reasons prevent them from sharing their local raw image data. As a result, both clients would need to create additional data for the new combinations, annotate it, and retrain the entire model to classify the new "cabin with windshield" images. Manually creating and annotating a data set for each client is tedious. This is where FL plays an important role, allowing the development of a final global model that can accurately detect objects for both clients without having to share raw image data.

Once local training is complete, under the assumptions and hyperparameters described in "Methods," the model weights obtained by each client are sent to a neutral server. Upon receiving the model weights from all clients, the server computes a federated average and sends the updated global weights back to each client. This process is called one communication round (CR). Over multiple CRs, the global model gradually improves and performs better on both clients' test data sets. In this particular use case, each client runs the global model received from the server on its local test data set, reports the accuracy of the global model, and then sends new local weights. The average accuracy over the test data sets of all clients serves as the stopping criterion. Once the server has received the accuracy of the previous global model on all clients' local test data sets and finds that the average accuracy is greater than 96%, the previous global weights are sent to the clients as the final global federated model. In this particular use case, the global model was obtained after 10 CRs with 15 local epochs. The output of this model on a similar test data set showed very high accuracy, with very accurate bounding boxes around the objects. Figure 6 shows the output of the global federated model for similar images. The confidence scores of the predictions are very high and the bounding boxes accurately enclose the windshield. There are no false positives, and the model output is robust to blurred images. A similar setup was used for the USB quality inspection data set (Figure 3), and the final global model correctly classified the various errors and accurately drew bounding boxes around specific USB sticks. The global federated USB stick model was even able to detect combinations not seen in the data set, such as the sticker error from Client 1 applied to the USB stick from Client 2. This is consistent with the results presented in the paper [7], which focuses on federated image classification settings.

Figure 6. Output of the models trained on the local data sets, client 1 (top left) and client 2 (bottom left), and of the global federated YOLOv5 model (right column) for the data set of unseen windshield types.
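
The communication rounds and the stopping rule based on average client test accuracy can be sketched as a small loop. This is a toy sketch under loud assumptions: `ToyClient`, its scalar "weight", and its accuracy function are hypothetical stand-ins for a factory that trains and evaluates a YOLOv5 model locally; only the control flow (evaluate the current global model on every client, stop if the mean accuracy exceeds 96%, otherwise train locally and average) mirrors the process described in the article.

```python
class ToyClient:
    """Hypothetical client; a real one would train and evaluate YOLOv5."""

    def __init__(self, optimum):
        self.optimum = optimum  # weight value that suits this client's data best

    def evaluate(self, weight):
        # Toy accuracy: 1.0 at the client's optimum, dropping with distance.
        return max(0.0, 1.0 - abs(weight - self.optimum))

    def train(self, weight, step=0.5):
        # Toy local training: move part of the way toward the local optimum.
        return weight + step * (self.optimum - weight)


def mean_accuracy(clients, weight):
    return sum(c.evaluate(weight) for c in clients) / len(clients)


def run_rounds(clients, weight, target=0.96, max_rounds=50):
    """Return (final weight, completed communication rounds, mean accuracy)."""
    for completed in range(max_rounds):
        accuracy = mean_accuracy(clients, weight)
        if accuracy > target:
            # The current global model is good enough: it becomes the final one.
            return weight, completed, accuracy
        local_weights = [c.train(weight) for c in clients]
        weight = sum(local_weights) / len(local_weights)  # server-side average
    return weight, max_rounds, mean_accuracy(clients, weight)


clients = [ToyClient(0.98), ToyClient(1.02)]
final_weight, rounds, accuracy = run_rounds(clients, weight=0.0)
print(rounds, round(accuracy, 4))
```

Note that the server only ever sees weights and accuracy figures, which is what allows the stopping decision to be made without access to any client's images.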


This paper focuses primarily on the cabin quality inspection use case and on experiments based on this scenario, with particular attention to evaluating the performance of the different models in the cabin quality inspection domain; a summary of the results for the USB federated OD model is shown in Figure 8. Throughout the experiments, we refer to the locally trained model for Client 1 as the "Blue Cabin model," the model for Client 2 as the "Red Cabin model," and the global federated model as "FedOD."

1) Testing of the Blue Cabin, Red Cabin, and FedOD models on test data sets of new cabin and windshield combinations.

2) Evaluation of live object detection for three models with various cabin and windshield combinations.

3) Testing of three models against images obtained from the quality inspection module of the manufacturing process.

In the first experiment, the test data set consists of a blue cabin with windshields of types C and D and a red cabin with windshields of types A and B. The objective is to evaluate the performance of the models for these unknown combinations; in the second experiment, three models (blue cabin, red cabin, and FedOD) are run simultaneously to detect objects in frames with different cabin and windshield combinations. The goal of this experiment is to compare the outputs of the models in different scenarios involving frames with multiple cabins. Finally, the third experiment uses images from the quality inspection module at SmartFactory-Kaiserslautern (SF-KL) as input for the model. It is important to note that the background and lighting conditions of these images are very different from the training data set, adding a new level of complexity to the evaluation process.

Integration with Industry 4.0 shared production architecture

One possible approach is to offer the authors' quality inspection AI software service to various companies on the Gaia-X platform. To this end, we followed the way production services are provided in the public Gaia-X data space. In this case, potential users can download the service and use it on their own production line. To provide quality inspection services in Gaia-X, we described the service (its features, characteristics, properties, and capabilities) in a sub-model of the AAS. Currently, there is no standard sub-model template available to describe the capabilities of AI services. With the help of the Gaia-X connector, a provider can connect to the relevant data space and supply an AAS-based description of its software services. The services can then be found in the Gaia-X service catalog, and service providers can offer them through the marketplace (Figure 7).

Figure 7. Industry 4.0 data space for software services

When a customer connects to the data space, they can browse the available software services in the marketplace, select the one that matches their requirements, download it (as a Docker container), and use it on the production line. All of this is made possible by the generalized description of the services available via the AAS. The customer simply downloads the service or, if the service is already running on the customer's side, updates the model weights from the global federated model. Since the quality inspection service is based on the FL algorithm, customers also have the opportunity to contribute to the service by improving the model through additional rounds of training on their own local data sets. The main challenge here, however, is to ensure that all customers have similar data classes and use cases. Each FL model is trained for a specific use case, and accurately describing the possible use cases is necessary to offer the product in the marketplace. Some precautions can be taken, such as trying to automatically align class labels from different clients [24], but a trustworthy environment must be in place to guarantee cooperation.

Results and Discussion

In this section, we present the results of the federated USB quality inspection and of the experiments described in Subsection III-C on the cabin quality inspection use case. We focus on three main experiments to compare the performance of the different models: the Blue Cabin model, the Red Cabin model, and the global federated OD model (FedOD). Figure 8 summarizes the USB results: the global federated model was obtained with 15 local epochs over 5 CRs and is compared with the client 1, client 2, and client 3 models, each trained for 150 epochs on its respective local data set. The global OD model can not only predict errors for all clients, but also detect client 3's error type (rust) on client 1's USB stick.

Figure 8. Live comparison of a federated global model and a model trained on a local data set

In the first experiment, we evaluated these models on a test data set that included blue and red cabins with different combinations of windshields. The results showed that the Blue Cabin model was unable to accurately classify and detect the red cabin and windshield. Similarly, the Red Cabin model struggled to detect blue cabins and windshields and generated incorrect bounding boxes for these images. In contrast, the FedOD model successfully detected all the different cabin and windshield combinations and produced very accurate bounding boxes for most test images. Detailed results of this experiment can be found in Table I, which shows the average precision (AP) averaged over IoU thresholds ranging from 0.50 to 0.95 and the mean average precision (mAP) at an IoU threshold of 0.5. The FedOD model achieved an AP[.50:.05:.95] of 0.93 and an mAP of 1.0, demonstrating robust performance over a wide range of IoU thresholds. To further investigate the accuracy of the Blue Cabin and Red Cabin models, a further test was conducted in which each model was tested specifically on the corresponding cabin and windshield color combination. In addition, the FedOD model was tested with both combinations to allow for direct comparison. The results are shown in Table II. The mAP and AP[.50:.05:.95] values in Table II show that the FedOD model outperformed the local models in terms of accuracy. It consistently achieved higher mAP and AP scores, suggesting that the global federated OD model is better at predicting accurate bounding boxes even when faced with unknown combination types. These results highlight the effectiveness of the FedOD model in the cabin quality inspection use case and provide strong evidence of FL's advantages in collaborative object detection scenarios.

Table I.

Comparison of mAP metrics for the Blue Cabin, Red Cabin, and FedOD models on the unknown test data set (AP = average precision, APm = AP for medium-size objects, APl = AP for large-size objects, AR = average recall).

Table II.

The locally trained YOLOv5 models (client 1 and client 2), each trained on its local data set, and the FedOD model, tested on windshield combinations not present in the training data set.
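
The AP[.50:.05:.95] metric reported in Tables I and II averages precision over IoU thresholds from 0.50 to 0.95 in steps of 0.05, so a box that clips part of the object can still count as a match at loose thresholds while failing at strict ones. The sketch below shows only this IoU matching step, not a full AP computation; the function names and the example boxes are assumptions made for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


# The ten IoU thresholds behind AP[.50:.05:.95]: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matches_per_threshold(predicted, truth):
    """For each threshold, report whether the prediction counts as a match."""
    overlap = iou(predicted, truth)
    return {t: overlap >= t for t in IOU_THRESHOLDS}


# Hypothetical case: the predicted box cuts off the lower part of the object.
truth = (0, 0, 100, 100)
clipped_prediction = (0, 0, 100, 78)
hits = matches_per_threshold(clipped_prediction, truth)
print(iou(clipped_prediction, truth), sum(hits.values()))
```

This is why a model can classify every image correctly and still score poorly on AP[.50:.05:.95]: inaccurate boxes lose their matches as the threshold tightens.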

For the second experiment, we developed custom code to run all three models (Blue Cabin model, Red Cabin model, and FedOD) simultaneously and in parallel on a live video stream. This setup allowed us to directly compare the output of each model and observe visible differences. Multiple cabin combinations were tested within a single frame, as shown in Figures 9 and 10. Each figure consists of three windows. The upper left window displays the output of the FedOD model and represents the global federated OD model. The upper right window shows the output of the Blue Cabin model trained using Client 1's local data set, and the lower left window shows the output of the Red Cabin model trained using Client 2's local data set. Figures 9 and 10 show the same pattern of windows, with the output labels changed to 0 and 1 to represent "cabin without windshield" and "cabin with windshield," respectively, providing a clear visual comparison of the model output across the different bounding boxes.

Figure 9: Comparison of a federated global model and a model trained on local data with live object detection.

Figure 10: Comparison of a federated global model and a model trained on local data with live object detection.

In Figure 9, the frame contains four cabins in the combinations available in the training data set. The FedOD model shows superior performance, correctly detecting all four cabins with high confidence for each client combination. The two local models each correctly identify only their own design type, which is consistent with their poor performance in Table I. Moving on to Figure 10, a red-design cabin from Client 2 with a type A windshield from Client 1 was tested. The results are surprising: the Red Cabin model successfully classified the object with a high confidence score, but the bounding box it drew was inaccurate and cut off part of the windshield. The Blue Cabin model, on the other hand, failed completely to detect this particular object. Figure 10 also shows the reverse scenario: a blue cabin with a type C windshield from Client 2. The Blue Cabin model correctly classifies the object but, as in the previous scenario, struggles to generate an accurate bounding box. The output of the Red Cabin model is interesting: it seems to classify the object as a "cabin with windshield" because windshield type C was part of its training data set. In contrast, the FedOD model not only accurately classifies both objects, but also draws accurate bounding boxes for these unknown combination types.

In the third experiment, the same three models were tested against images obtained from the demonstrator in the quality inspection module of the SF-KL, as described in Subsection III-C. Figures 11, 12, and 13 show the results of these tests. It is worth noting that the images taken in this environment have significantly different background and lighting conditions compared to the images used in the training data set. The same image set is used in all three figures, which allows for a direct comparison of the output produced by each model. In Figure 11, the Blue Cabin model fails to correctly classify instances of "cabin without windshield." Furthermore, the model predicts a large number of false positives on trailers and even on frames where no objects are present. Similarly, Figure 12 shows the performance of the Red Cabin model, which also struggles to accurately detect objects in images from the quality inspection module. This model exhibits misclassifications and false positives, especially on frames with trailers and frames with no objects. The results obtained from the global federated OD model on the same test images are striking: the combination of FL and OD shows a significant improvement in accuracy and precision compared to the individual client models. Figure 13 shows the output of the FedOD model on the test images, where it predicts bounding boxes with remarkable accuracy and high confidence scores. Notably, the model did not produce any false positives on frames with different types of trailers or on frames containing only background and no objects. These findings provide compelling evidence of the versatility and generality of the federated OD model, especially in detecting the same objects in diverse and previously unseen environments and in detecting object combinations that were not seen during training.

Figure 11: Output of the model trained only on the blue cabin dataset (blue cabin model) for the quality inspection image of the demonstration machine.

Figure 12: Output of the model trained only on the red cabin dataset (red cabin model) for the quality inspection image of the demonstration machine

Figure 13: Output of the global federated model (FedOD model) for the quality inspection image of the demonstration machine

Our future work will focus on the implementation of the standard AAS sub-model for AI service function description soon to be published by the Industrial Digital Twin Association [22]. Typical use cases require a standard, vendor-independent method of describing software functionality. An accurate description is really important for a federated learning approach to allow each partner to participate in the global model creation. As mentioned previously, the Google AI Model Card was used as a reference in this study [23]. In addition, an updated version of the Gaia-X data space connector will be implemented to provide simultaneous access and service downloads to multiple customers from the Gaia-X marketplace [25].


This paper presented a comprehensive investigation into the effectiveness of a global federated OD model (FedOD) for quality inspection in a shared production environment. Combining FL and object detection achieved a significant improvement in accuracy and precision compared to the individual client models. Our experimental results showed remarkable performance of the FedOD model across multiple scenarios: it outperformed the local models by accurately detecting all combinations and generating highly accurate bounding boxes. The achieved average precision (AP) and mean average precision (mAP) scores further demonstrated the robustness of the model on multiple unknown test data combinations. In addition, the simultaneous evaluation of the three models on a live video stream provided valuable insight: the FedOD model consistently performed well in object detection, even in the presence of different cabin and windshield combinations. Experiments on images obtained from a quality inspection module demonstrator further validated the versatility of the FedOD model. Despite background and lighting conditions that differed widely from the training data set, the model showed excellent performance in correctly classifying objects and generating accurate bounding boxes. It significantly outperformed the Blue Cabin and Red Cabin models, which struggled with accurate detection and generated false positives. These results highlight the effectiveness and versatility of the FedOD model in detecting cabin and windshield combinations in unknown environments while maintaining data privacy, and they offer promising implications for real-world applications in industry.
In conclusion, our study demonstrates the remarkable potential of the global FedOD model for many such custom use cases: the combination of FL and object detection not only increases accuracy and precision, but also enables detection of unknown object combinations in diverse environments. This research contributes to the advancement of collaborative learning approaches and paves the way for more efficient and effective quality inspection systems in various industries.

One of the authors: Mr. Vinit Hegiste
Machine learning engineer and PhD student at RPTU Kaiserslautern-Landau.

友安 昌幸 (Masayuki Tomoyasu)
JDLA G Certificate 2020#2, E Certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; Amiko Consulting LLC, CEO
