
Language Recognition Based On High Level Semantic Feature Extraction And Mismatch Compensation Of Data Sets

Posted on: 2020-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: R X Tang
Full Text: PDF
GTID: 2428330590474450
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence research, automatic speech recognition and related technologies have found increasingly wide application. As an important component of automatic speech recognition, language recognition plays a key role in fields such as automatic speech translation. In recent years, new results in language recognition have emerged continuously, and performance has improved significantly compared with work from the last century. However, language recognition still faces many problems, such as the underuse of high-level semantic information, poor performance on short utterances, and vulnerability to mismatch between different data sets. With the growing importance of language recognition technology, new research methods are urgently needed to improve its performance.

To improve the performance of language recognition models, we propose two methods: a long short-term memory (LSTM) network method based on temporal information, and an embedding-vector method based on high-level semantic information. In the former, we first analyze the ability to extract temporal and high-level semantic information in language recognition. Based on this ability, we propose a new network structure that exploits high-level semantic information of speech segments, such as bottleneck features. Furthermore, we use the extracted embedding vector, which contains this high-level semantic information, in place of the traditional i-vector to build a new language recognition model. Experiments show that the two methods achieve 30.07% and 20.60% relative improvement, respectively, over an i-vector baseline using cosine distance as the classifier.

To address the problem of data-set mismatch in language recognition, we propose a method based on a factorized hidden variability subspace. Using matrix decomposition in this subspace, driven by information relevant to the input speech segment, the method modifies the output of the network's statistics pooling layer to improve language recognition performance. Compared with the original model, the two methods improve performance by about 12.6% and 23%, respectively; compared with the baseline system, which is based on i-vector with a support vector machine (SVM) classifier using a radial basis function kernel, they improve performance by about 10.10% and 10.88%.
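The baseline scoring scheme described above (fixed-length utterance embeddings compared with cosine distance) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the statistics pooling stands in for the network's pooling layer, and the function names and the use of per-language mean embeddings as models are assumptions for the example.

```python
import numpy as np

def statistics_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a variable-length sequence of frame-level features
    (shape T x D) into a fixed utterance-level vector by concatenating
    the per-dimension mean and standard deviation (length 2*D)."""
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])

def cosine_score(embedding: np.ndarray, language_model: np.ndarray) -> float:
    """Cosine similarity between an utterance embedding and a language's
    model embedding; a higher score means a closer match."""
    num = float(np.dot(embedding, language_model))
    den = float(np.linalg.norm(embedding) * np.linalg.norm(language_model))
    return num / den

def classify(embedding: np.ndarray, language_models: dict) -> str:
    """Return the language whose model embedding scores highest."""
    return max(language_models,
               key=lambda lang: cosine_score(embedding, language_models[lang]))

# Toy usage: two 2-dimensional "languages" and one test utterance.
models = {"en": np.array([1.0, 0.0]), "zh": np.array([0.0, 1.0])}
utterance_frames = np.array([[0.8, 0.1], [1.0, 0.3]])
embedding = statistics_pooling(utterance_frames)[:2]  # keep means only here
print(classify(embedding, models))
```

In the thesis, the embedding would come from the LSTM network's pooling layer rather than raw frame statistics, and the factorized-subspace method would adjust the pooled statistics before scoring.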
Keywords/Search Tags:language recognition, Long Short-Term Memory network, embedding vector, factorized hidden variability subspace, channel compensation