
Research on Key Technologies and System Implementation of Multimodal Human-Computer Interaction and Dialogue

Posted on: 2020-02-19    Degree: Master    Type: Thesis
Country: China    Candidate: P P Liu    Full Text: PDF
GTID: 2518306305995989    Subject: Computer application technology
Abstract/Summary:
Multimodal information can improve the naturalness and efficiency of human-computer interaction and dialogue, and thereby improve the quality of service. This thesis explores and studies several forms of interaction.

(1) When a robot searches for a service object in a noisy environment, speaker recognition that incorporates face recognition is preferable to voiceprint recognition alone. This thesis proposes a speaker recognition scheme based on image information and constructs a dataset of 916 samples, each consisting of 20 consecutive images. Speaker recognition from image information is accomplished in two steps: the mouth regions of all detected faces are located by face recognition and used for lip movement detection, and the faces showing lip movement are then recognized. The lip movement detection model is trained on constructed sample features using either an SVM or a CNN+LSTM network. Experimental results show that speaker recognition based on image information achieves high accuracy.

(2) Abnormal sounds generally indicate the occurrence of abnormal events. When nobody, or only the elderly, is at home, a robot that monitors abnormal sounds and reports them helps family members stay informed about the situation at home. This thesis proposes an abnormal sound detection scheme that collects a dataset on the Qt platform, extracts acoustic features with an improved MFCC algorithm, and trains a sound classifier with an SVM. In actual operation, the scheme classifies each sound segment as normal or abnormal and smooths the classification results of several adjacent segments by voting to obtain the final decision. Experiments show that the proposed abnormal sound detection scheme achieves good results.

(3) In human-computer dialogue, multimodal information better reflects the richness of the dialogue content. Drawing on existing research, this thesis identifies seven kinds of modal information commonly used in interactive scenes; the modal information is characterized and fused with reference to natural language processing methods, and sentences expressing the fused multimodal information are generated by a deep-neural-network encoder-decoder. Experiments show that this method achieves the basic effect of multimodal information fusion.

Research on multimodal human-computer interaction and dialogue is mainly presented to the public in the form of software functions, but the robot hardware that carries the software is also important. This thesis introduces the outline, structure, and circuit design of a research robot and uses the robot to verify the above research results.
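The thesis does not detail how sample features for lip movement detection are constructed. A minimal sketch, assuming the features are mean absolute inter-frame differences over the 20-image mouth-region sequence (the function names and the threshold are hypothetical stand-ins for the trained SVM or CNN+LSTM classifier):

```python
import numpy as np

def lip_motion_features(mouth_frames):
    """Summarize motion in a sequence of cropped mouth-region frames.

    mouth_frames: array of shape (T, H, W) holding grayscale mouth crops
    from T consecutive face images (the thesis uses T = 20).
    Returns a length-(T-1) vector of mean absolute inter-frame
    differences, usable as input to an SVM-style classifier.
    """
    frames = np.asarray(mouth_frames, dtype=np.float32)
    diffs = np.abs(np.diff(frames, axis=0))   # (T-1, H, W) motion maps
    return diffs.mean(axis=(1, 2))            # one motion score per step

def is_speaking(features, threshold=5.0):
    # Hypothetical stand-in for the trained model: a speaking mouth shows
    # sustained inter-frame change, a still mouth does not.
    return float(features.mean()) > threshold
```

A still mouth yields near-zero scores, while alternating open/closed frames yield large ones, which is the separation the trained classifier would exploit.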
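The thesis states only that classification results of adjacent segments are smoothed by voting; the window size and the exact voting rule below are assumptions. A minimal sketch of such majority-vote smoothing:

```python
def smooth_by_voting(segment_labels, window=5):
    """Majority-vote smoothing over adjacent sound segments.

    segment_labels: per-segment classifier outputs, 1 = abnormal,
    0 = normal. Each position is relabelled by the majority of the
    surrounding window, so an isolated misclassified segment does not
    trigger a false alarm.
    """
    half = window // 2
    smoothed = []
    for i in range(len(segment_labels)):
        neighbourhood = segment_labels[max(0, i - half): i + half + 1]
        # Strict majority of the (possibly truncated) window wins.
        smoothed.append(1 if 2 * sum(neighbourhood) > len(neighbourhood) else 0)
    return smoothed
```

For example, a single spurious "abnormal" segment in `[0, 0, 1, 0, 0]` is voted away, while a sustained run of abnormal segments survives smoothing and would be reported.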
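The thesis characterizes and fuses seven kinds of modal information before feeding them to an encoder-decoder. A toy sketch of one common fusion scheme, projecting each modality's feature vector into a shared space and averaging; the random projection matrices here are placeholders for weights that, in the thesis's setting, would be learned jointly with the encoder-decoder network:

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse_modalities(modality_vectors, shared_dim=16):
    """Fuse variable-size modality feature vectors into one representation.

    modality_vectors: list of 1-D arrays, one per modality (the thesis
    uses seven modalities; their dimensions generally differ).
    Each vector is linearly projected into a shared_dim-dimensional
    space and the projections are averaged, yielding a single fused
    vector an encoder-decoder model could consume.
    """
    fused = np.zeros(shared_dim)
    for vec in modality_vectors:
        vec = np.asarray(vec, dtype=np.float64)
        # Placeholder projection; learned in a real system.
        W = rng.standard_normal((shared_dim, vec.size)) / np.sqrt(vec.size)
        fused += W @ vec
    return fused / len(modality_vectors)
```

The point of the shared space is that modalities of different dimensionality become directly comparable and summable before decoding.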
Keywords/Search Tags: Human-computer interaction and dialogue, Speaker recognition, Abnormal sound detection, Multimodal information fusion, Deep learning, Machine learning