
Research On Multimodal Decision-Making Technology For Audio-Visual Perception In Artificial Intelligence Co-Pilot For Flight

Posted on: 2024-06-04
Degree: Master
Type: Thesis
Country: China
Candidate: Z Z Lei
Full Text: PDF
GTID: 2542307088996459
Subject: Mechanics
Abstract/Summary:
With the rapid development of artificial intelligence (AI) technology, applying AI to aircraft cockpits has become a research hotspot. A flight co-pilot robot receives voice commands from the pilot and executes the corresponding actions, sharing part of the pilot's workload; this allows the pilot to focus on unexpected situations that may arise, thereby improving flight safety. Based on the commands in the standard cockpit checklist of the Boeing 737-800 and the cockpit environment, this thesis studies the overall process of an AI flight co-pilot, an audio-visual multimodal decision-making robot, from receiving pilot voice commands to completing task decisions. The main research contents and contributions are as follows:

(1) A YOLOv5-based target-instrument detection and positioning method was designed to accurately locate target instruments within the camera's field of view. First, a cockpit-instrument dataset was constructed and a YOLOv5 cockpit-instrument recognition model was trained, achieving a recognition accuracy of 98.7%. Then, a target-instrument positioning method combining the recognition model with a depth camera was designed, which can accurately locate the target instrument among many similar instruments (an illustrative sketch follows the abstract).

(2) Processing methods for auditory and visual inputs were designed to extract the information required by the multimodal decision-making system. For auditory input, a method was designed to convert pilot voice commands into text instructions the AI co-pilot can understand. To address the small sample size of open-source and self-made cockpit-scene speech datasets, transfer learning was used to train a speech recognition model for cockpit scenes, achieving a recognition accuracy of 94.18% on the test set. The recognized text was then segmented with Chinese word-segmentation methods to help the robot understand the command intent: a part-of-speech tagging dictionary of specialized cockpit nouns was constructed, and the Jieba word-segmentation library was used to segment the speech-recognition text accurately. For visual input, a method for recognizing knob positions based on the instrument recognition model and Hough-transform image processing, and a method for judging instrument status based on the instrument recognition model and HSV color-proportion reading, were designed, providing the information needed for the AI co-pilot's decision-making (sketches of these steps also follow the abstract).

(3) A multimodal decision-making system for the AI co-pilot with auditory and visual inputs was designed, and a method in which a mechanical arm completes the decided actions along a predetermined trajectory was adopted, achieving fast and safe operation of the arm in the complex cockpit environment (see the trajectory sketch below). Finally, a simple AI co-pilot prototype was built, and the entire decision-making process from receiving voice commands to completing tasks was tested; the prototype accurately completed multiple representative command tasks, and the test results met expectations.
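The following is a minimal sketch of the detection-and-positioning pipeline in contribution (1): a YOLOv5 model picks out the target instrument, and the detection-box center is back-projected to a 3D point using a registered depth image. The weight file 'cockpit_yolov5.pt', the image and depth file names, the class id, and the pinhole intrinsics are all illustrative assumptions, not values from the thesis.

```python
# Sketch: YOLOv5 detection + depth-camera positioning of a target instrument.
import numpy as np
import torch

# Hypothetical custom weights trained on the cockpit-instrument dataset.
model = torch.hub.load('ultralytics/yolov5', 'custom', path='cockpit_yolov5.pt')
results = model('panel.jpg')                       # run inference on one frame
boxes = results.xyxy[0]                            # rows: (x1, y1, x2, y2, conf, cls)

depth = np.load('depth.npy')                       # depth map aligned to the RGB frame, meters
fx, fy, cx, cy = 615.0, 615.0, 320.0, 240.0        # placeholder pinhole intrinsics

target_cls = 3                                     # hypothetical class id of the commanded instrument
for x1, y1, x2, y2, conf, cls in boxes.tolist():
    if int(cls) != target_cls:
        continue
    u, v = int((x1 + x2) / 2), int((y1 + y2) / 2)  # box center in pixels
    z = float(depth[v, u])                         # depth at the center pixel
    # Back-project the pixel to a 3D point in the camera frame.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    print(f'instrument {int(cls)} at ({x:.3f}, {y:.3f}, {z:.3f}) m, conf {conf:.2f}')
```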
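Contribution (2)'s transfer-learning step can be pictured as fine-tuning a pretrained speech model on the small cockpit-scene corpus. The abstract does not name the base model, so the sketch below uses a generic PyTorch encoder/decoder stand-in with a frozen encoder and a CTC loss; every name in it (the weight file, the `.encoder` attribute, the data loader) is hypothetical.

```python
# Sketch: transfer learning for cockpit-scene speech recognition.
import torch
import torch.nn as nn

model = torch.load('pretrained_asr.pt')           # hypothetical model pretrained on a general corpus
for p in model.encoder.parameters():              # assumes an encoder/decoder split
    p.requires_grad = False                       # freeze the general acoustic features

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.CTCLoss()                          # a common loss for ASR fine-tuning

cockpit_loader = []  # stand-in: DataLoader over the small cockpit-scene dataset
for features, targets, in_lens, tgt_lens in cockpit_loader:
    log_probs = model(features)                   # (T, batch, vocab) log-probabilities
    loss = criterion(log_probs, targets, in_lens, tgt_lens)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```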
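The segmentation step maps directly onto the real Jieba API: a user dictionary registers cockpit-specific nouns so they survive segmentation intact, and posseg tagging exposes the part of speech the decision system needs. The dictionary file name and the example command are assumptions.

```python
# Sketch: Jieba segmentation with a cockpit-term user dictionary.
import jieba
import jieba.posseg as pseg

# Hypothetical file; each line is "word frequency pos_tag", e.g. "起落架手柄 10 n".
jieba.load_userdict('cockpit_terms.txt')

text = '放下起落架手柄'                            # e.g. "lower the landing-gear lever"
for word, flag in pseg.cut(text):
    print(word, flag)                             # token and its part-of-speech tag
```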
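One plausible reading of the Hough-transform knob step: detect line segments on a cropped knob image, take the segment reaching farthest from the knob axis as the pointer, and map its angle to the nearest detent. The crop name, Canny/Hough thresholds, and the detent table are illustrative, not the thesis's tuned values.

```python
# Sketch: knob-position reading via probabilistic Hough transform.
import math
import cv2
import numpy as np

roi = cv2.imread('knob_crop.jpg', cv2.IMREAD_GRAYSCALE)  # crop from the detection box
edges = cv2.Canny(roi, 50, 150)
lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=30,
                        minLineLength=15, maxLineGap=5)

h, w = roi.shape
cx, cy = w / 2, h / 2                             # assume the knob axis is the crop center
best = None
for x1, y1, x2, y2 in (lines.reshape(-1, 4) if lines is not None else []):
    # Keep the segment reaching farthest from the center: likely the pointer.
    d = max(math.hypot(x1 - cx, y1 - cy), math.hypot(x2 - cx, y2 - cy))
    if best is None or d > best[0]:
        best = (d, x1, y1, x2, y2)

if best:
    _, x1, y1, x2, y2 = best
    tip = (x1, y1) if math.hypot(x1 - cx, y1 - cy) > math.hypot(x2 - cx, y2 - cy) else (x2, y2)
    angle = math.degrees(math.atan2(cy - tip[1], tip[0] - cx))  # 0 deg = 3 o'clock, CCW positive
    detents = {'OFF': 90, 'ON': 0, 'AUTO': -90}                 # hypothetical angle table
    position = min(detents, key=lambda k: abs(detents[k] - angle))
    print('knob position:', position)
```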
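The HSV color-proportion status check can be sketched as: convert the instrument crop to HSV, mask the indicator color, and compare the lit-pixel proportion against a threshold. The amber hue range and the 5% threshold below are assumptions.

```python
# Sketch: instrument on/off status from HSV color proportion.
import cv2
import numpy as np

roi = cv2.imread('annunciator_crop.jpg')          # crop from the detection box
hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)

# Amber warning-light range (hue ~15-35 on OpenCV's 0-179 scale).
lower = np.array([15, 100, 100])
upper = np.array([35, 255, 255])
mask = cv2.inRange(hsv, lower, upper)

proportion = float(np.count_nonzero(mask)) / mask.size
status = 'ON' if proportion > 0.05 else 'OFF'
print(f'lit pixel proportion {proportion:.3f} -> {status}')
```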
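Finally, the predetermined-trajectory idea in contribution (3) amounts to replaying recorded joint-space waypoints rather than planning online, so the arm's path through the crowded cockpit is repeatable. The waypoints and the driver stub below are hypothetical.

```python
# Sketch: mechanical-arm motion along a predetermined joint-space trajectory.
import numpy as np

waypoints = np.array([                            # recorded joint angles (radians)
    [0.0, -0.5, 1.2, 0.0, 0.3, 0.0],              # home pose
    [0.4, -0.2, 1.0, 0.1, 0.5, 0.0],              # above the target switch
    [0.4, -0.1, 0.9, 0.1, 0.6, 0.0],              # contact pose
])

def send_joint_command(q):
    print('joint target:', np.round(q, 3))        # stand-in for the robot driver call

steps = 50                                        # interpolation steps per segment
for a, b in zip(waypoints[:-1], waypoints[1:]):
    for t in np.linspace(0.0, 1.0, steps):
        send_joint_command((1 - t) * a + t * b)   # linear joint-space interpolation
```

Fixing the path in joint space trades flexibility for predictability, which suits a cockpit where every reachable control has a known, rehearsable approach.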
Keywords/Search Tags: Artificial intelligence, Audio-visual multimodal perception, Speech recognition, Object recognition, Decision systems