
Research on Key Technologies and System Implementation of Multimodal Human-Computer Interaction and Dialogue

Posted on: 2020-02-19    Degree: Master    Type: Thesis
Country: China    Candidate: P P Liu    Full Text: PDF
GTID: 2518306305995989    Subject: Computer application technology
Abstract/Summary:
Multimodal information can improve the naturalness and efficiency of human-computer interaction and dialogue, and thereby improve the quality of service. This thesis explores and studies several forms of interaction.

(1) When a robot searches for a service object in a noisy environment, speaker recognition that incorporates face recognition is preferable to voiceprint recognition alone. This thesis proposes a speaker recognition scheme based on image information and constructs a dataset of 916 samples, each consisting of 20 consecutive images. Speaker recognition from image information is accomplished in two steps: the mouth regions of all detected faces are located by face recognition and used for lip movement detection, and the faces showing lip movement are then recognized. The lip movement detection model is trained on constructed sample features using either an SVM or a CNN+LSTM network. Experimental results show that speaker recognition based on image information achieves high accuracy.

(2) Abnormal sounds generally indicate the occurrence of abnormal events. When nobody, or only the elderly, is at home, a robot that monitors abnormal sounds and reports them helps family members stay informed about the situation at home. This thesis proposes an abnormal sound detection scheme that collects a dataset on the Qt platform, extracts acoustic features with an improved MFCC algorithm, and trains a sound classifier with an SVM. In actual operation, the scheme classifies each sound segment as normal or abnormal and smooths the classification results of several adjacent segments by voting to obtain the final decision. Experiments show that the proposed abnormal sound detection scheme achieves good results.

(3) In human-computer dialogue, multimodal information better reflects the richness of the dialogue content. Drawing on existing research, this thesis identifies seven kinds of modal information commonly used in interactive scenes; the modal information is characterized and fused with reference to natural language processing methods, and sentences expressing the fused multimodal information are generated by a deep-neural-network encoder-decoder. Experiments show that this method achieves the basic effect of multimodal information fusion.

Research on multimodal human-computer interaction and dialogue is mainly presented to the public in the form of software functions, but the robot hardware that carries the software is also important. This thesis introduces the outline, structure, and circuit design of a research robot and uses the robot to verify the above research results.
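The thesis does not detail how sample features for lip movement detection are constructed. A minimal sketch, assuming the features are mean absolute inter-frame differences over the 20-image mouth-region sequence (the function names and the threshold are hypothetical stand-ins for the trained SVM or CNN+LSTM classifier):

```python
import numpy as np

def lip_motion_features(mouth_frames):
    """Summarize motion in a sequence of cropped mouth-region frames.

    mouth_frames: array of shape (T, H, W) holding grayscale mouth crops
    from T consecutive face images (the thesis uses T = 20).
    Returns a length-(T-1) vector of mean absolute inter-frame
    differences, usable as input to an SVM-style classifier.
    """
    frames = np.asarray(mouth_frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) motion maps
    return diffs.mean(axis=(1, 2))            # one motion score per step

def is_speaking(features, threshold=5.0):
    # Hypothetical stand-in for the trained model: a speaking mouth shows
    # sustained inter-frame change, a still mouth does not.
    return float(features.mean()) > threshold
```

A still mouth yields near-zero scores, while alternating open/closed frames yield large ones, which is the separation the trained classifier would exploit.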
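The thesis states only that classification results of adjacent segments are smoothed by voting; the window size and the exact voting rule below are assumptions. A minimal sketch of such majority-vote smoothing:

```python
def smooth_by_voting(segment_labels, window=5):
    """Majority-vote smoothing over adjacent sound segments.

    segment_labels: per-segment classifier outputs, 1 = abnormal,
    0 = normal. Each position is relabelled by the majority of the
    surrounding window, so an isolated misclassified segment does not
    trigger a false alarm.
    """
    half = window // 2
    smoothed = []
    for i in range(len(segment_labels)):
        neighbourhood = segment_labels[max(0, i - half): i + half + 1]
        # Strict majority of the (possibly truncated) window wins.
        smoothed.append(1 if 2 * sum(neighbourhood) > len(neighbourhood) else 0)
    return smoothed
```

For example, a single spurious "abnormal" segment in `[0, 0, 1, 0, 0]` is voted away, while a sustained run of abnormal segments survives smoothing and would be reported.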
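The thesis characterizes and fuses seven kinds of modal information before feeding them to an encoder-decoder. A toy sketch of one common fusion scheme, projecting each modality's feature vector into a shared space and averaging; the random projection matrices here are placeholders for weights that, in the thesis's setting, would be learned jointly with the encoder-decoder network:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(modality_vectors, shared_dim=16):
    """Fuse variable-size modality feature vectors into one representation.

    modality_vectors: list of 1-D arrays, one per modality (the thesis
    uses seven modalities; their dimensions generally differ).
    Each vector is linearly projected into a shared_dim-dimensional
    space and the projections are averaged, yielding a single fused
    vector an encoder-decoder model could consume.
    """
    fused = np.zeros(shared_dim)
    for vec in modality_vectors:
        vec = np.asarray(vec, dtype=np.float64)
        # Placeholder projection; learned in a real system.
        W = rng.standard_normal((shared_dim, vec.size)) / np.sqrt(vec.size)
        fused += W @ vec
    return fused / len(modality_vectors)
```

The point of the shared space is that modalities of different dimensionality become directly comparable and summable before decoding.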
Keywords/Search Tags: Human-computer interaction and dialogue, Speaker recognition, Abnormal sound detection, Multimodal information fusion, Deep learning, Machine learning