Monday, 5 August 2024



Smart and user-centric manufacturing information recommendation using multimodal learning to support human-robot collaboration in mixed reality environments

The future manufacturing system must be capable of supporting customized mass production while reducing costs, and it must be flexible enough to accommodate market demands. Additionally, workers must possess the knowledge and skills to adapt to the evolving manufacturing environment. Previous studies have been conducted to provide customized manufacturing information to the worker. However, most have not considered the worker's situation or region of interest (ROI), so they have had difficulty providing information tailored to the worker. Thus, a manufacturing information recommendation system should utilize not only manufacturing data but also the worker's situational information and intent to assist the worker in adjusting to the evolving working environment. This study presents a smart and user-centric manufacturing information recommendation system that harnesses a vision-text dual-encoder multimodal deep learning model to offer the most relevant information based on the worker's vision and query, supporting human-robot collaboration (HRC) in a mixed reality (MR) environment. The proposed recommendation model assists the worker by analyzing the manufacturing environment image acquired from smart glasses, the worker's specific question, and the related manufacturing documents. By establishing correlations between the MR-based visual information and the worker's query with the multimodal deep learning model, the proposed approach identifies the most suitable information to recommend. Furthermore, the recommended information can be visualized through MR smart glasses to support HRC. For quantitative and qualitative evaluation, we compared the proposed model with existing vision-text dual-encoder models, and the results demonstrated that the proposed approach outperformed previous studies. Thus, the proposed approach has the potential to assist workers more effectively in MR-based manufacturing environments, enhancing their overall productivity and adaptability.
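To make the overall flow concrete, the following is a minimal sketch of the recommendation step summarized above: the image captured by the worker's smart glasses and the worker's query are scored against every candidate manufacturing document, and the best match is returned for visualization in MR. This is an illustration, not the authors' implementation; the names recommend, score_relevance, and show_in_mr are hypothetical.

```python
# Minimal sketch of the recommendation step: rank candidate manufacturing documents
# by a vision-text relevance score and return the best match for MR visualization.
# All names here (score_relevance, candidate_documents, show_in_mr) are hypothetical.
def recommend(image, query, candidate_documents, score_relevance, top_k=1):
    """Return the top-k documents ranked by relevance to the image-query pair."""
    scored = [(doc, score_relevance(image, query, doc)) for doc in candidate_documents]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical usage: 'glasses_frame' is the current camera frame from the smart
# glasses, 'manuals' is the list of candidate documents, and 'model_score' wraps a
# trained vision-text relevance model such as the one sketched later in this post.
# best_doc, score = recommend(glasses_frame, "How do I fasten the gripper bolts?",
#                             manuals, model_score)[0]
# show_in_mr(best_doc)  # render the recommended document on the worker's MR display
```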

Meanwhile, extended reality (XR), encompassing augmented reality (AR), virtual reality (VR), and mixed reality (MR), has been gaining popularity in various applications, including manufacturing and human-robot collaboration (HRC). Smart glasses such as the HoloLens 2 [10] are commonly used to deliver MR experiences [11]. Several studies have visualized and enabled interaction with manufacturing information and virtual objects using VR and AR [6,8,12]. For example, in the assembly of specific products, 3D parts can be visualized in the AR environment to demonstrate how they should be assembled [6,12]. Moreover, the MR environment allows workers to automatically check the location of real objects in their surroundings and thus conduct their tasks more effectively [8]. However, these approaches have inherent limitations in providing information tailored to the worker's specific judgments and work situation. Thus, further research is needed to address this limitation by offering customized information according to the worker's queries and the specific requirements of their tasks. This improvement will be crucial in aiding workers in evolving smart manufacturing environments.

This study aims to utilize multimodal deep learning with text and vision dual encoders to recommend the most relevant manufacturing information to the worker for HRC in an MR environment. The proposed approach is designed to consider the worker's current situation and question based on a hybrid of a Vision Transformer-based image model and a BERT-based text model. Thus, it can leverage computer vision and natural language processing to recognize the relationship between visual images and queries. It calculates a relevance score from the physical objects identified in the image captured by the smart glasses, the worker's question, and the related manufacturing documents. Image features are obtained using the Vision Transformer model, while text features for the question-document pair are generated using a BERT model. These two types of features are then combined and used as input to a regression module, yielding a relevance score between the vision data and the question-document data. To enhance the model's performance, we incorporate the object class obtained through object detection into the image-query-document tuple during training. In addition, we employ mask tokens to create another tuple of image, query, and document data. By applying a contrastive loss to the vision and text features during training, the proposed method learns the relevance between images, queries, and documents more effectively by measuring the differences between the features obtained from the two modalities, providing the most appropriate recommendation to the worker for HRC. Finally, the recommended information can be visualized through the worker's MR glasses.
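One plausible reading of the architecture described above is sketched below: a Vision Transformer encodes the workplace image, a BERT model encodes the question-document pair packed as two segments, the two [CLS] features are fused by a small regression head that outputs the relevance score, and a symmetric contrastive loss is applied to projected vision and text features during training. The sketch assumes PyTorch and Hugging Face Transformers; the checkpoint names, projection size, and head dimensions are illustrative assumptions rather than the values used in the paper.

```python
# Hedged sketch of a vision-text dual-encoder relevance model, assuming PyTorch and
# Hugging Face Transformers. Checkpoints and layer sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import ViTModel, BertModel

class DualEncoderRelevance(nn.Module):
    def __init__(self, proj_dim=256):
        super().__init__()
        self.vision = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.text = BertModel.from_pretrained("bert-base-uncased")
        # Project both [CLS] features into a shared space for the contrastive loss.
        self.img_proj = nn.Linear(self.vision.config.hidden_size, proj_dim)
        self.txt_proj = nn.Linear(self.text.config.hidden_size, proj_dim)
        # Regression head over the fused features -> relevance score in [0, 1].
        fused = self.vision.config.hidden_size + self.text.config.hidden_size
        self.head = nn.Sequential(nn.Linear(fused, 256), nn.ReLU(), nn.Linear(256, 1))

    def forward(self, pixel_values, input_ids, attention_mask, token_type_ids=None):
        # [CLS] embedding of the workplace image and of the query-document pair
        # (query and document packed as BERT segments A and B).
        img = self.vision(pixel_values=pixel_values).last_hidden_state[:, 0]
        txt = self.text(input_ids=input_ids, attention_mask=attention_mask,
                        token_type_ids=token_type_ids).last_hidden_state[:, 0]
        score = torch.sigmoid(self.head(torch.cat([img, txt], dim=-1))).squeeze(-1)
        return score, self.img_proj(img), self.txt_proj(txt)

def contrastive_loss(img_feat, txt_feat, temperature=0.07):
    # Symmetric InfoNCE-style loss: matching image/text pairs in the batch are pulled
    # together and non-matching pairs pushed apart (one plausible reading of the
    # "contrastive loss for vision and text features" mentioned above).
    img_feat = F.normalize(img_feat, dim=-1)
    txt_feat = F.normalize(txt_feat, dim=-1)
    logits = img_feat @ txt_feat.t() / temperature
    targets = torch.arange(img_feat.size(0), device=img_feat.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```

During training, such a contrastive term would typically be combined with a regression loss on the predicted relevance score (for example, mean squared error against annotated scores); the exact combination and weighting are not specified here.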



