
User Intention Understanding And Feedback Generation For Intelligent Speech Interaction

Posted on: 2018-05-06
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y S Ning
GTID: 1368330566488075
Subject: Computer Science and Technology
Abstract/Summary:
Recently, intelligent speech interaction systems have attracted the attention of many users and have become widespread across society. The main purpose of these systems is to provide feedback that satisfies users' speech queries. The most critical problem in these systems is how to analyze user intention and generate harmonious, natural and expressive feedback. In this dissertation, based on large-scale multimodal data from social network platforms, we construct a cross-modal emphasis detection model, a user intention understanding framework and an expressive speech synthesis model. The main contributions are summarized as follows:

Firstly, we propose an emphasis detection model based on a Bayesian network using local intonational features. To capture the local prominence of emphatic speech, we first integrate local intonational features (such as the tilt parameter) with global acoustic features (such as fundamental frequency, duration and energy), and then improve emphasis detection performance on a corpus of a specific style by leveraging the Bayesian network's ability to model the correlations between features.

Secondly, we propose a low-resource emphasis detection framework based on bidirectional long short-term memory (BLSTM). Emphasis detection performance degrades when training data is limited. To address this, we propose a multilingual BLSTM model structure that learns cross-lingual knowledge shared among languages, using large-scale complementary data to improve emphasis detection for low-resource languages. The cross-lingual knowledge benefits both low-resource and high-resource languages.

Thirdly, we propose a multi-task deep learning framework for user intention understanding, aimed at improving the user experience of existing speech interaction systems. We first analyze users' language expression characteristics on the Internet. Guided by these observations, we use a multi-task learning strategy to model the correlations between text focus, which encodes semantically salient information, and speech emphasis, which represents the prominence of the utterance. We then combine focus and emphasis with location information, which reflects users' dialect conventions, to understand intentions in speech interaction systems.

Finally, we propose an improved emphatic speech synthesis framework based on context parameters in the decision tree. To address the problem of limited training data, this framework uses a large-scale neutral corpus to extend the leaf nodes of the general decision tree based on speaker adaptation techniques, thus improving the naturalness and voice quality of the synthesized speech. In addition, we propose a feedback generation method based on intention understanding and construct a technical architecture for intelligent speech interaction that integrates user intention to generate feedback according to users' real needs, thus enhancing satisfaction with intelligent speech interaction.
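
The abstract names the tilt parameter as an example of a local intonational feature but does not define it. As a minimal sketch, assuming the widely used Tilt intonation model formulation (tilt as the average of the amplitude and duration rise/fall ratios), the feature could be computed from one F0 segment as below. The function name `tilt_features`, the 10 ms `frame_shift` default and the single-peak segmentation are illustrative assumptions, not the dissertation's actual feature extraction.

```python
from typing import Dict, Sequence


def tilt_features(f0: Sequence[float], frame_shift: float = 0.01) -> Dict[str, float]:
    """Describe one voiced F0 segment (e.g. a syllable) by its rise/fall shape.

    Hypothetical helper: assumes the segment holds a single rise-fall event
    and that `f0` is sampled every `frame_shift` seconds.
    """
    if len(f0) < 2:
        raise ValueError("need at least two F0 frames")

    # Split the segment into a rise and a fall at the (first) F0 peak.
    peak = max(range(len(f0)), key=lambda i: f0[i])
    a_rise = abs(f0[peak] - f0[0])               # rise amplitude in Hz
    a_fall = abs(f0[peak] - f0[-1])              # fall amplitude in Hz
    d_rise = peak * frame_shift                  # rise duration in seconds
    d_fall = (len(f0) - 1 - peak) * frame_shift  # fall duration in seconds

    def ratio(rise: float, fall: float) -> float:
        # (rise - fall) / (rise + fall), defined as 0 for an empty event.
        total = rise + fall
        return 0.0 if total == 0 else (rise - fall) / total

    tilt_amp = ratio(a_rise, a_fall)
    tilt_dur = ratio(d_rise, d_fall)
    return {
        "amplitude": a_rise + a_fall,
        "duration": d_rise + d_fall,
        "tilt": 0.5 * (tilt_amp + tilt_dur),  # +1 pure rise, -1 pure fall
    }
```

In this sketch, a tilt near 0 marks a symmetric rise-fall contour. A detector such as the Bayesian network described above would take these local values as inputs alongside global features like mean F0, duration and energy.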
Keywords/Search Tags:Emphasis, Emphasis Detection, Emphatic Speech Synthesis, User Intention Understanding, Feedback Generation