
Keyword Spotting Based On Sub-word Decoding And System Combination

Posted on: 2020-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhao
Full Text: PDF
GTID: 2518306524963219
Subject: Control Engineering
Abstract/Summary:
With the rapid development of the Internet and mobile communication technology, the volume of audio data has grown enormously. Speech keyword retrieval technology has been proposed to make use of this speech information. It efficiently locates the keywords a user is searching for in a large amount of speech data and returns the relevant speech segments, and it has broad application prospects in information services and public security. Retrieval based on large vocabulary continuous speech recognition is currently the dominant approach to speech keyword retrieval. However, problems remain, such as low detection rates for out-of-vocabulary words and difficulty in judging candidate words. This thesis focuses on recognition accuracy in speech keyword detection, the detection of out-of-vocabulary words, and system fusion:
(1) A speech recognition system based on a deep neural network (DNN) is built as the baseline system. We further use deep feedforward sequential memory networks (DFSMN) and time-delay neural networks (TDNN-chain). On the THCHS-30 dataset, the word error rate of TDNN-chain was lower than that of DFSMN and the traditional DNN, with relative reductions of 2.2% and 1.7%, respectively.
(2) A keyword retrieval system based on weighted finite-state transducers is established. We propose regularizing keyword scores by improving the threshold selection formula, and retaining appropriate candidate words by setting a threshold. The experimental results show that keyword detection performance with regularized confidence scores improves by 65.3%.
(3) Sub-words carrying tone information and position information are used as decoding units to detect Chinese keywords. Unlike non-tonal languages such as English, Chinese has tones, and adding tone or position information helps the sub-word language model capture the segmentation between words, which can reduce false alarms in keyword and out-of-vocabulary word detection. Using sub-words with tone and position information resulted in a 30% improvement in keyword detection performance and a 54% reduction in false alarm rate.
(4) A system fusion strategy based on adaptive weighting is proposed. The Actual Term Weighted Value (ATWV) and the weighted sum of non-zero terms are used as the coefficients, so that subsystem weights are allocated more reasonably and the best keyword detection performance can be obtained. Compared with the best-performing single system, keyword detection performance after fusion improves by 12.3%. The proposed fusion method takes 25% of the time required by the method based on linear logistic regression.
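The abstract does not give the adaptive weighting formula, so the following is only a minimal sketch of one plausible reading: each subsystem's weight is taken to be proportional to its ATWV on a development set, and a candidate hit is scored by a weighted average over the subsystems that returned a non-zero score for it. The function names, the subsystem names ("word", "syllable"), and the normalization are assumptions made for illustration, not the thesis's actual method.

```python
from typing import Dict


def atwv_weights(dev_atwv: Dict[str, float]) -> Dict[str, float]:
    """Turn per-subsystem dev-set ATWV values into weights that sum to 1.

    Assumption: weight is proportional to ATWV, with negative ATWV clipped to 0.
    """
    clipped = {name: max(value, 0.0) for name, value in dev_atwv.items()}
    total = sum(clipped.values())
    if total == 0.0:
        # No subsystem has positive ATWV: fall back to uniform weights.
        return {name: 1.0 / len(clipped) for name in clipped}
    return {name: value / total for name, value in clipped.items()}


def fuse_scores(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Fuse one candidate hit's confidence scores across subsystems.

    Only subsystems with a non-zero score for this hit contribute, and the
    weights of those subsystems are renormalized so they sum to 1.
    """
    active = {name: s for name, s in scores.items() if s > 0.0}
    norm = sum(weights[name] for name in active)
    if norm == 0.0:
        # Nothing detected the hit, or every detecting subsystem has zero weight.
        return 0.0
    return sum(weights[name] * s for name, s in active.items()) / norm


# Hypothetical usage: two subsystems with made-up dev-set ATWV values.
weights = atwv_weights({"word": 0.6, "syllable": 0.4})
fused = fuse_scores({"word": 0.8, "syllable": 0.5}, weights)
```

Because the weights come directly from one pass of dev-set scoring rather than from an iterative fit, a scheme of this kind needs no training loop, which is consistent with the reported time saving over linear logistic regression.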
Keywords/Search Tags: Speech Recognition, Spoken Term Detection, Sub-word, System Combination