Frontiers Of Manufacturing Service Recommendation Combining Knowledge Graph And ChatGPT

Manufacturing 28/09/2024

3 main points

✔️ Integrates knowledge graphs and LLMs to provide a fast and accurate way to identify manufacturers
✔️ Leverages data extraction and embedding techniques from the web to build a manufacturing services knowledge graph
✔️ Highly accurate QA system significantly increases reliability and efficiency of manufacturing services discovery

Building A Knowledge Graph to Enrich ChatGPT Responses in Manufacturing Service Discovery
written by Yunqing Li, Binil Starly
[Submitted on 9 Apr 2024]
Comments: Accepted by arXiv
Subjects: Artificial Intelligence (cs.AI)

The images used in this article are from the paper, the introductory slides, or were created based on them.

Summary

This study explores how to build a knowledge graph to help manufacturing system integrators identify new manufacturing partners and mitigate risk through supply chain diversification. It proposes a method to improve the accuracy and completeness of ChatGPT responses using the Manufacturing Services Knowledge Graph (MSKG). This research integrates structured and unstructured data from the digital footprints of small manufacturers across North America to develop a manufacturing services knowledge graph. The knowledge graph and learned graph embedding vectors are used to address complex queries in the digital supply chain network to improve reliability and interpretability. This approach can scalably form a global manufacturing services knowledge network graph that integrates knowledge graphs across multiple industries, geographic boundaries, and business domains. The published dataset contains over 13,000 manufacturer web links, manufacturing services, certifications, and location entity types.

Introduction

With the advance of digitization, manufacturing industries are increasingly adopting a data-driven approach. In particular, integrators of manufacturing systems are seeking effective means to identify new manufacturing partners and mitigate risk through supply chain diversification. The Manufacturing Services Knowledge Graph (MSKG) is a tool developed to address these needs, providing reliability and interpretability for complex queries.

Manufacturing Services Knowledge Graph (MSKG) Overview

The MSKG is built by integrating structured and unstructured data from the digital footprint of small manufacturers across North America. This knowledge graph includes data on manufacturers' web links, manufacturing services, certifications, and locations, and ties these data together to support supply chain optimization and risk management.

Figure 1. Comparison of ChatGPT and MSKG-enhanced ChatGPT responses

Background and Objectives of the Study

The purpose of this study is to leverage MSKG to improve the accuracy and completeness of ChatGPT responses. Specifically, it aims to address the following challenges faced by integrators of manufacturing systems:

Identifying new manufacturing partners
Diversifying the supply chain
Reducing risk

To address these challenges, we use knowledge graphs and learned graph embedding vectors. This improves reliability and interpretability for complex queries in digital supply chain networks.

Scalability of Approach

The approach proposed in this study is scalable to form a global manufacturing services knowledge network graph that integrates knowledge graphs across multiple industries, geographic boundaries, and business domains. This scalability makes it applicable to other geographies and industries, and it is expected to function as part of a broader digital ecosystem.

Related Research

Knowledge graphs (KGs) are used to link concepts across domains such as medicine, social networks, and chemistry; KG embedding models transform entities and relations into low-dimensional vectors and preserve the KG structure. These models are useful for machine learning tasks such as clustering and link prediction; Mohamed et al. explored knowledge graph embedding for drug target prediction and clustering, and Wang et al. used it for drug recommendation.

While the construction of KGs from structured data is well established, construction from unstructured data such as text and multimedia is challenging due to unreliable extraction and a lack of datasets. Recent attempts include COVID-KG (built from scientific literature) and an industry KG built from text in the Chinese automotive sector. Extracting accurate information from websites is difficult due to noise and outdated HTML structures; natural language processing (NLP) and topic label generation (TLP) techniques such as BERT and GPT-4 are important for processing large unstructured texts.

Knowledge mapping in the industrial sector is important for visualizing knowledge, data, and relationships. Methods such as LangChain and LlamaIndex use LLMs for data processing and are complemented by ontology-driven approaches such as the Industrial Ontologies Foundry and Industry 4.0 manufacturing ontologies. These form the basis for services such as manufacturing service discovery and equipment queries to support industrial problem solving and decision making. In addition, Siddharth et al. are working on the extraction of engineering knowledge from patents. However, there is a lack of literature on mapping, integrating, and analyzing real-time manufacturing data. This gap stems from the limitations of current LLM-based methods for industry knowledge mapping when manufacturing data keeps evolving.

Question and answer (QA) systems combine information retrieval and knowledge-based methods to provide accurate answers. Knowledge-based QA uses KG to retrieve answers, and graph embedding converts KG data into vectors to help ML and neural networks reason. Knowledge-based QA provides structured context, which enhances the ability of the LLM to generate and interpret more accurate and contextualized answers.

Recent studies have emphasized integrating KGs and LLMs to improve QA systems. Daull et al. explored how KGs can help improve LLMs and reduce errors; Truong and Coleen emphasized incorporating KGs for accurate answer generation; and Linyao et al. examined using KGs and LLMs together to improve answer quality and factual reasoning. While these developments show promise for improving QA accuracy, little research applies these methods to supply sourcing in the manufacturing industry. Tailoring these integrations specifically to the manufacturing industry could significantly improve service discovery and supply chain process optimization.

Architecture

This section describes the integrated architecture of the Manufacturing Services Knowledge Graph (MSKG) and ChatGPT, designed to enhance manufacturing service discovery. Clients in the manufacturing industry interact with ChatGPT through question answering (QA). Upon receiving a client's question, the application forwards it to the OpenAI GPT-4 endpoint with a request to translate it into a query statement that can be run against the graph database. The endpoint responds with a query statement, which is used to retrieve the relevant manufacturing capabilities from MSKG. The retrieved data is then used to build a comprehensive answer to the client's original question.
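The question-to-query-to-answer flow described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the triples, the manufacturer names, and the `fake_translate` function (a keyword matcher standing in for the LLM call that produces a graph query) are all assumptions made for this example.

```python
from typing import Callable

# Toy stand-in for the MSKG: (manufacturer, relation, value) triples.
# All names and values are invented for illustration.
TRIPLES = [
    ("Acme Machining", "PROVIDES", "CNC milling"),
    ("Acme Machining", "LOCATED_IN", "Ohio"),
    ("Borealis Castings", "PROVIDES", "Sand casting"),
    ("Borealis Castings", "LOCATED_IN", "Ontario"),
]


def fake_translate(question: str) -> dict:
    """Hypothetical stand-in for the LLM translation step.

    A real system would send the question to an LLM endpoint and receive a
    query statement for the graph database; here a keyword match builds a
    small structured query instead.
    """
    services = {value for _, rel, value in TRIPLES if rel == "PROVIDES"}
    for service in sorted(services):
        if service.lower() in question.lower():
            return {"relation": "PROVIDES", "value": service}
    return {}


def run_query(query: dict) -> list[str]:
    """Return the manufacturers whose triples match the structured query."""
    if not query:
        return []
    return sorted(
        subject
        for subject, rel, value in TRIPLES
        if rel == query["relation"] and value == query["value"]
    )


def answer(question: str, translate: Callable[[str], dict] = fake_translate) -> str:
    """Question -> query -> graph lookup -> answer text."""
    matches = run_query(translate(question))
    if not matches:
        return "No matching manufacturer found in the graph."
    return "Manufacturers found: " + ", ".join(matches)


print(answer("Who offers CNC milling near me?"))
```

Passing the translator in as a parameter keeps the graph lookup testable without any network call; a production system would swap in a real LLM client and a real graph database driver.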

Figure 2 shows the architecture for enhancing ChatGPT using MSKG.

Figure 2. Architecture for enriching ChatGPT with MSKG

In addition, MSKG is updated in near real time from a wide range of manufacturer websites. Schema.org vocabulary extensions for the manufacturing domain let manufacturers attach specific manufacturing service tags to their websites through HTML markup. When manufacturers add these tags, the tagged content is associated with the MSKG ontology, making query results more current and accurate.
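To make the tagging idea concrete, the sketch below reads Schema.org markup embedded as JSON-LD in a page and pulls out (manufacturer, service) pairs. It is only an assumption about how such tags might be consumed: the example uses core Schema.org terms (`Organization`, `makesOffer`, `itemOffered`, `serviceType`) rather than the manufacturing-specific extension the paper refers to, and the page content is invented.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer: list[str] = []
        self.documents: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.documents.append(json.loads("".join(self._buffer)))
            self._in_jsonld = False


def extract_services(html: str) -> list[tuple[str, str]]:
    """Return (organization name, service type) pairs found in the markup."""
    parser = JsonLdParser()
    parser.feed(html)
    pairs = []
    for doc in parser.documents:
        name = doc.get("name", "")
        for offer in doc.get("makesOffer", []):
            service = offer.get("itemOffered", {}).get("serviceType")
            if service:
                pairs.append((name, service))
    return pairs


# Invented example page; a real manufacturer site would differ.
PAGE = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Acme Machining",
 "makesOffer": [
   {"@type": "Offer", "itemOffered": {"@type": "Service", "serviceType": "CNC milling"}},
   {"@type": "Offer", "itemOffered": {"@type": "Service", "serviceType": "Anodizing"}}
 ]}
</script>
</head><body></body></html>
"""

print(extract_services(PAGE))
```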

Process Workflow

This section describes the overall procedure for building MSKG and enhancing QA for ChatGPT. The process consists of four main parts: Textual Knowledge Extraction, KG Design, Graph Embedding, and Knowledge-Driven QA.

Figure 3 shows the process workflow for enhancing a question and answer (QA) system through a knowledge graph (KG) designed from information available on the Internet.

Figure 3. From information on the Internet to a knowledge graph (KG) designed to enhance question and answer (QA) systems

Textual Knowledge Extraction: information is extracted from manufacturer websites and other data sources to obtain the data to be imported into MSKG. MSKG is built by bulk import of the extracted entities together with entities from Wikidata.
KG Design: the KG has four types of node labels and four types of relation labels. Examples of node and relation types are shown.
Graph Embedding: embedding vectors are learned from MSKG subgraphs using graph embedding techniques (Node2Vec and GraphSAGE). Dimensionality reduction is applied to these vectors, and they are used for the downstream tasks of manufacturer recommendation and multi-label classification.
Knowledge-Driven QA: an MSKG-based QA system is built to address complex questions related to manufacturing service discovery. Manufacturer recommendations from the QA system are evaluated with the P@N and MRR metrics.

Data Integration and Enrichment

This process standardizes and integrates the collected data so that it can be used to build a knowledge graph. It involves the following steps:

Data Standardization:
- Convert data collected from different sources into a consistent format to ensure data consistency. Standardization involves unifying data formats, converting units, and integrating data fields. Examples include unifying date formats and converting numeric data to common units.
Entity Matching:
- Match and consolidate data from different sources about the same entity. This eliminates duplicate data and improves data integrity. Entity matching uses criteria such as name similarity, address matching, and shared product IDs. For example, data about the same manufacturer collected from different sources is merged into a single entity.
Enrichment:
- Enrich the data with additional information from external data sources. This expands the information contained in the nodes and edges of the knowledge graph. Enrichment adds company financial information, industry reports, patent data, and similar material, making the content of the knowledge graph more detailed and comprehensive.
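As a rough illustration of the entity matching step, the sketch below merges records that refer to the same manufacturer using name similarity plus a matching postal code. The record fields, the list of legal suffixes, and the 0.85 threshold are assumptions chosen for this example, not values from the paper.

```python
from difflib import SequenceMatcher

# Legal suffixes ignored when comparing names (an illustrative list).
SUFFIXES = {"inc", "llc", "ltd", "co", "corp"}


def normalize(name: str) -> str:
    """Lowercase the name, drop punctuation, and remove legal suffixes."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(word for word in cleaned.split() if word not in SUFFIXES)


def same_entity(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Two records match if their names are similar and postal codes agree."""
    ratio = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    return ratio >= threshold and a["zip"] == b["zip"]


def merge_records(records: list[dict]) -> list[dict]:
    """Fold each record into the first matching entity, else start a new one."""
    merged: list[dict] = []
    for rec in records:
        for existing in merged:
            if same_entity(existing, rec):
                existing["services"] |= rec["services"]
                break
        else:
            merged.append({**rec, "services": set(rec["services"])})
    return merged


# Invented records from two hypothetical sources.
records = [
    {"name": "Acme Machining, Inc.", "zip": "44101", "services": {"CNC milling"}},
    {"name": "ACME Machining", "zip": "44101", "services": {"Turning"}},
    {"name": "Borealis Castings Ltd", "zip": "10001", "services": {"Sand casting"}},
]
merged = merge_records(records)
print(len(merged), merged[0]["services"])
```

Requiring the postal code to agree in addition to the name is a deliberately conservative choice: it avoids merging two distinct companies that happen to share a common name.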

Table 1 shows the entity types extracted.

Table 1: Extracted entity types

Table 2 shows a sample service extraction.

Table 2: Service Extraction Sample

Knowledge Graph Construction

The integrated data is used to build a knowledge graph. The knowledge graph consists of nodes (entities) and edges (relationships) and includes information such as manufacturers, products, services, certifications, and geographic locations.

Node generation:
- Entities such as manufacturers, products, services, certifications, geographic locations, etc. are generated as nodes. This identifies each entity individually and clarifies their interrelationships. Node generation identifies entities based on data attributes and defines each as a separate node.
Edge generation:
- The relationships between entities are represented as edges. For example, a relationship between a manufacturer and the services it provides. Edge generation defines relationships based on interactions and dependencies between entities. For example, if manufacturer A provides service X, an edge is formed between A and X.
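Node and edge generation can be sketched with a small in-memory structure. The node labels follow the entity types named in this article (manufacturer, service, certification, location); the relation names `PROVIDES`, `HAS_CERTIFICATION`, and `LOCATED_IN`, the `KnowledgeGraph` class, and the sample record are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    nodes: dict[str, str] = field(default_factory=dict)  # node id -> label
    edges: set[tuple[str, str, str]] = field(default_factory=set)  # (source, relation, target)

    def add_node(self, node_id: str, label: str) -> None:
        self.nodes[node_id] = label

    def add_edge(self, source: str, relation: str, target: str) -> None:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist as nodes")
        self.edges.add((source, relation, target))

    def neighbors(self, source: str, relation: str) -> list[str]:
        """Targets reachable from `source` over edges with the given relation."""
        return sorted(t for s, r, t in self.edges if s == source and r == relation)


def build_graph(record: dict) -> KnowledgeGraph:
    """Turn one integrated manufacturer record into nodes and edges."""
    kg = KnowledgeGraph()
    name = record["name"]
    kg.add_node(name, "Manufacturer")
    for service in record["services"]:
        kg.add_node(service, "Service")
        kg.add_edge(name, "PROVIDES", service)
    for cert in record["certifications"]:
        kg.add_node(cert, "Certification")
        kg.add_edge(name, "HAS_CERTIFICATION", cert)
    kg.add_node(record["location"], "Location")
    kg.add_edge(name, "LOCATED_IN", record["location"])
    return kg


# Invented record for illustration.
record = {
    "name": "Acme Machining",
    "services": ["CNC milling", "Turning"],
    "certifications": ["ISO 9001"],
    "location": "Ohio",
}
kg = build_graph(record)
print(kg.neighbors("Acme Machining", "PROVIDES"))
```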

Figure 4 shows the general structure of MSKG.

Figure 4: Typical MSKG Structure

Table 3 shows the total number of entities and relationships in the KG.

Table 3: Total KG Entities and Relationships

Graph Embedding

The graph embedding module learns relationships between nodes and embeds the nodes into a low-dimensional vector space. This makes it easier to compute node similarities and improves the accuracy of responses to complex queries.
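Once nodes are represented as vectors, similarity between manufacturers reduces to a vector comparison. The sketch below ranks manufacturers by cosine similarity to a target; cosine similarity is a common choice and is assumed here, and the three-dimensional vectors and names are invented for readability.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(target: str, embeddings: dict[str, list[float]], k: int = 2):
    """Return the k nodes closest to `target`, most similar first."""
    scores = [
        (name, cosine(embeddings[target], vec))
        for name, vec in embeddings.items()
        if name != target
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


# Invented embedding vectors for illustration only.
embeddings = {
    "Acme Machining": [0.9, 0.1, 0.0],
    "Delta Precision": [0.8, 0.2, 0.1],
    "Borealis Castings": [0.0, 0.1, 0.9],
}
print(most_similar("Acme Machining", embeddings, k=1))
```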

Use of node2vec:
- Based on the method of Grover and Leskovec (2016), node2vec captures the neighborhood information of nodes through random walks and generates embedding vectors. It captures the context of nodes in a network and represents similar nodes as nearby vectors, which allows node features to be learned efficiently.
Graph convolutional networks (GCNs):
- Based on the method of Kipf and Welling (2017), a GCN improves prediction accuracy by integrating node features with information from neighboring nodes. It is a deep learning approach for graph-structured data, and it improves the accuracy of node classification and link prediction.
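The random walk at the core of node2vec can be sketched in a few lines. This follows the second-order bias from Grover and Leskovec, where `p` controls the tendency to return to the previous node and `q` the tendency to move outward; the toy graph is invented, and the later step of training a skip-gram model on the walks to obtain vectors is omitted.

```python
import random


def biased_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """Generate one node2vec-style walk over an undirected adjacency dict."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay within the previous node's neighborhood
            else:
                weights.append(1.0 / q)  # move further away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Invented manufacturer-service graph (undirected adjacency sets).
graph = {
    "Acme": {"CNC milling", "Turning"},
    "Delta": {"CNC milling"},
    "CNC milling": {"Acme", "Delta"},
    "Turning": {"Acme"},
}
print(biased_walk(graph, "Acme", length=6, p=2.0, q=0.5, rng=random.Random(0)))
```

With `p` above 1 and `q` below 1 the walk tends to explore outward, while the reverse setting keeps it close to the start node.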

Knowledge-Driven QA

Background

Building a QA system for manufacturing service discovery requires addressing the complex and dynamic nature of the manufacturing industry. A key challenge is integrating detailed industry-specific data into KG and continually updating it to reflect new developments and market trends. It also requires accurate modeling of the complex relationships within the manufacturing supply chain. Building an effective QA system is challenging because the manufacturing industry requires a high degree of accuracy and reliability, plus limited access to proprietary data.

Evaluation Method

There are several ways to evaluate a QA system, including Mean Reciprocal Rank (MRR), Precision at N (P@N), Recall, F1 score, and human evaluation. P@N measures the percentage of correct responses among the top N responses returned by the recommender system. The P@N metric (N = 10, 100, 300) is used to evaluate manufacturer recommendation performance and assess the system's capability:

$$P@N = \frac{N_{rel}}{N_{top}}$$

where $N_{rel}$ is the number of services relevant to the target manufacturer among the top N results and $N_{top}$ is the number of services provided by the top N results. MRR is also used to evaluate the effectiveness of the recommendations; MRR is expressed as

$$MRR = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{rank_i}$$

where $|Q|$ is the number of queries and $rank_i$ is the rank of the first related manufacturer for the i-th query.

These metrics are chosen because they require accurate, top-ranked responses in manufacturer discovery; P@N evaluates the accuracy of top recommendations, while MRR assesses the effectiveness of the system in identifying the most relevant manufacturers first.
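Both metrics are simple to compute. The sketch below follows the service-based reading of P@N given in this article (relevant services among the top N results, divided by all services those results provide) and the standard definition of MRR; the function names and the sample data are assumptions for illustration.

```python
def precision_at_n(ranked_services: list[set], target_services: set, n: int) -> float:
    """P@N over services.

    ranked_services holds one set of services per recommended manufacturer,
    best recommendation first.
    """
    top = ranked_services[:n]
    n_top = sum(len(services) for services in top)
    n_rel = sum(len(services & target_services) for services in top)
    return n_rel / n_top if n_top else 0.0


def mean_reciprocal_rank(first_relevant_ranks: list) -> float:
    """MRR from the 1-based rank of the first relevant result per query.

    Use None for a query with no relevant result; it contributes 0.
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank)
    return total / len(first_relevant_ranks)


# Invented example: services of the target and of three recommendations.
target = {"CNC milling", "Turning"}
ranked = [
    {"CNC milling", "Turning"},
    {"CNC milling", "Welding"},
    {"Sand casting"},
]
print(precision_at_n(ranked, target, n=2))  # 3 relevant of 4 services -> 0.75
print(mean_reciprocal_rank([1, 2, 4]))      # (1 + 1/2 + 1/4) / 3
```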

Establishment of QA System

The study employs an index approach to classify and organize text retrieved from manufacturers' websites, based on a number of techniques used in the KG construction process. The main contributions are as follows:

We propose a mechanism to extract and organize domain-specific text from independent websites of small manufacturers. This will allow for natural interaction with technical domain-specific text.
Integrating continuously evolving KG into LLM provides a new solution for identifying manufacturing capacity and changing the landscape of manufacturer recommendations.
We present a novel integration of bottom-up ontology construction and advanced machine learning models to efficiently build MSKG from structured and unstructured data sources. This approach streamlines the integration of diverse data and improves the accuracy and relevance of KGs.
We build an advanced graph-based QA system designed to address the complex queries associated with digital supply chain networks. It combines KG and graph embedding technologies to perform in-depth analysis and provide highly accurate, similarity-based recommendations.

System Performance

Figure 5 shows an example of combining MSKG and ChatGPT to solve a simple question.

Figure 5. Solving an easy-level problem by combining MSKG and ChatGPT

Figure 6 shows an example of combining MSKG and ChatGPT to solve a difficult question.

Figure 6. Solving a hard-level problem by combining MSKG and ChatGPT

Results

Verification of Text Extraction Results

In the text extraction results, negative examples may outnumber positive ones because the main page of a manufacturer's website often lacks textual information. ROC and PR curves are computed to show the reliability of the models, and they indicate that the extraction models perform well. In particular, the certification extraction model shows the highest AUC-ROC score, while the location extraction model shows the lowest performance.

Figure 7: ROC curve and PR curve

The cutoff values used to calculate accuracy, recall, and F1 score are optimized separately for each data type's extraction model. This approach improves the overall performance of data extraction and increases the reliability of the data used to build the MSKG.
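One common way to pick a cutoff for a single extraction model is to scan candidate thresholds and keep the one with the best F1 score. The sketch below shows that idea; the selection criterion, scores, and labels are assumptions for illustration, since the article does not state how the cutoffs were chosen.

```python
def f1_at_cutoff(scores: list[float], labels: list[int], cutoff: float) -> float:
    """F1 score when every score >= cutoff is predicted positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_cutoff(scores: list[float], labels: list[int]) -> float:
    """Try each observed score as a cutoff and return the best one by F1."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda c: f1_at_cutoff(scores, labels, c))


# Invented model scores and ground-truth labels (1 = entity present).
scores = [0.95, 0.80, 0.60, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

cutoff = best_cutoff(scores, labels)
print(cutoff, f1_at_cutoff(scores, labels, cutoff))
```

Running the same search once per entity type gives each extraction model its own cutoff, which matters when the class balance differs between types.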

Graph Embedding and Its Downstream Task Results

Node2Vec and GraphSAGE each produced 100-dimensional embedding vectors, and t-SNE was used for dimensionality reduction. Figures 8 and 9 compare how well manufacturers cluster by service-related attributes; GraphSAGE yields better-defined clusters than Node2Vec and distinguishes service features more clearly.

Figure 8. t-SNE visualization of GraphSAGE embeddings for manufacturers' service-related attributes

Figure 9. t-SNE visualization of Node2Vec embeddings for manufacturers' service-related attributes

The embedding vectors generated by Node2Vec were used in the multi-label classification task, where an MLP model was trained and evaluated on them. Training accuracy was 98.90%; multi-label prediction accuracy, F1 score, recall, and precision were 98.72%, 94.62%, 99.93%, and 89.85%, respectively.

Evaluation of MSKG-Based QA

The Appendix of the paper includes a detailed analysis of questions related to manufacturing service discovery, comparing the MSKG responses with the GPT-4 responses. Simple questions can be answered by either GPT-4 or MSKG alone, but more complex questions such as Q13 and Q14 require integrating MSKG and GPT-4.

Table 5 shows the results of the manufacturer recommendation evaluations, with GraphSAGE slightly ahead of Node2Vec in Q13 and Node2Vec superior in Q14. It shows that the performance of the recommendation function varies depending on the number of services offered by the manufacturer.

Table 5: Manufacturer recommendation evaluation results

Discussion

The study adopted a bottom-up approach, collecting raw data from manufacturers' websites and constructing a knowledge graph (KG) with four entity types and their corresponding relationships. However, because some websites lack basic SEO code, information could be extracted from only a subset: more than 13,000 of the 17,000 companies. Future work includes expanding MSKG and integrating other relevant data.

According to the paper, future research will also extend the current framework to improve understanding of the KG context through LLM training and pre-training strategies.

Conclusion

This paper presented a framework that leverages a knowledge graph (KG), which is updated in near real-time, to enhance manufacturing service identification and manufacturer recommendation. The constructed MSKG has four entity types and corresponding relation types, including manufacturing services, with a total of 13,240 entities and 58,521 relations, including text content from some manufacturers in North America.

Knowledge graphs and learned graph embedding vectors are used to support QA in ChatGPT and to answer questions from clients in the manufacturing industry, leveraging the transformation between human natural language and graph query language. The evaluation results show that the proposed MSKG-based QA can effectively address complex questions in manufacturing service discovery.

MSKG can be scaled to include domains adjacent to the manufacturing supply chain and specific industrial supply chains. According to the paper, future frameworks will aim to integrate LLMs and knowledge models to enable richer searches.

友安昌幸 (Masayuki Tomoyasu): JDLA G certificate 2020#2, E certificate 2021#1; Japan Society of Data Scientists, DS Certificate; Japan Society for Innovation Fusion, DX Certification Expert; Amiko Consulting LLC, CEO