
Robust Spoken Intent Recognition Based On Deep Learning

Posted on: 2022-05-29
Degree: Master
Type: Thesis
Country: China
Candidate: J Qin
Full Text: PDF
GTID: 2518306551970939
Subject: Master of Engineering
Abstract/Summary:
Intent recognition is essentially a text classification task in natural language processing, and it can be divided into spoken intent recognition and text intent recognition. Spoken intent recognition takes the text output by speech recognition as its input, so its accuracy depends on the accuracy of speech recognition. Speech recognition accuracy is generally degraded by environmental noise, and most such problems can be addressed with speech enhancement and noise reduction. However, because spoken expressions are diverse and each person's pronunciation is distinct, a gap remains between what the speech recognition system outputs and what the user actually said. To address the low prediction accuracy and poor robustness of downstream spoken intent recognition models caused by inaccurate speech recognition text, this paper draws on other useful information output by speech recognition, such as N-best text, phonemes, and sentence scores, and studies the robustness of spoken intent recognition using deep learning. The main work of this paper is as follows:

(1) This paper combines the N-best text, acoustic model scores, and language model scores output by the speech recognition module and proposes a discrete bucketing method and a weighted sentence vector method to improve the robustness of spoken intent recognition. The speech recognition module generally produces multiple recognized texts for a piece of input audio, and each recognized text has a corresponding acoustic model score and language model score. To make better use of this information, the discrete bucketing method divides the N-best texts into different intervals according to their acoustic model scores and language model scores and splices them together, fusing the N-best text information with the corresponding sentence score information in a coarse-grained way. The weighted sentence vector method assigns a different weight to each N-best text by computing normalized acoustic model and language model scores, fusing the N-best text information with the corresponding sentence score information in a fine-grained way. Both methods make fuller use of the output of the speech recognition module and improve the robustness of spoken intent recognition.

(2) Existing pre-trained language models are usually pre-trained on standard written text and perform poorly on noisy speech recognition text. Based on the BERT pre-trained model, this paper proposes a pre-training method that incorporates sentence phoneme information, which strengthens the pronunciation-level representation ability of the word vectors in the pre-trained language model and improves the robustness of spoken intent recognition.

(3) Adversarial training is a common way to improve the robustness of deep learning models. For the spoken intent recognition task, this paper extends the existing 1-best adversarial training method with speech recognition N-best text and proposes a new adversarial training method, which further improves the prediction accuracy of spoken intent recognition on speech recognition text.

The research in this paper is carried out on the Snips dataset and a logistics outbound-call dataset. Comparative experiments are designed and completed to verify the effectiveness of the proposed discrete bucketing method, weighted sentence vector method, phoneme-informed pre-training method, and adversarial training method. The final experimental results show that the proposed methods effectively improve the robustness of spoken intent recognition.
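
As a rough illustration of the two fusion ideas in contribution (1), the following Python sketch shows one possible reading of them. The abstract does not specify the normalization, the bucket boundaries, or the splicing format, so the softmax normalization, the `lm_weight` parameter, the bucket edges, and the `[AM*]`/`[LM*]` tags are all assumptions made for illustration and should not be taken as the thesis's actual implementation.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Normalize raw scores into weights that sum to 1 (assumed normalization)."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sentence_vector(
    sentence_vectors: Sequence[Sequence[float]],
    am_scores: Sequence[float],
    lm_scores: Sequence[float],
    lm_weight: float = 1.0,  # hypothetical interpolation weight
) -> List[float]:
    """Fine-grained fusion: weight each N-best sentence vector by its score."""
    combined = [a + lm_weight * l for a, l in zip(am_scores, lm_scores)]
    weights = softmax(combined)
    dim = len(sentence_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, sentence_vectors))
        for i in range(dim)
    ]


def bucket_index(score: float, edges: Sequence[float]) -> int:
    """Return how many bucket edges the score reaches (edges sorted ascending)."""
    return sum(score >= e for e in edges)


def bucketed_input(
    hypotheses: Sequence[str],
    am_scores: Sequence[float],
    lm_scores: Sequence[float],
    am_edges: Sequence[float],
    lm_edges: Sequence[float],
    sep: str = " [SEP] ",
) -> str:
    """Coarse-grained fusion: tag each hypothesis with its score buckets, then splice."""
    parts = []
    for text, a, l in zip(hypotheses, am_scores, lm_scores):
        tag = f"[AM{bucket_index(a, am_edges)}] [LM{bucket_index(l, lm_edges)}]"
        parts.append(f"{tag} {text}")
    return sep.join(parts)
```

In this sketch, the bucketed string would be fed to a text classifier as a single input, while the weighted vector would replace the 1-best sentence vector before the classification layer. Both choices are illustrative only.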
Keywords/Search Tags: Intention Recognition, Robustness, Phoneme, Pre-training