
Language Recognition Based On High Level Semantic Feature Extraction And Mismatch Compensation Of Data Sets

Posted on: 2020-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: R X Tang
Full Text: PDF
GTID: 2428330590474450
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of artificial intelligence research, automatic speech recognition and related technologies have found increasingly wide application. As an important component of automatic speech recognition, language recognition plays a key role in fields such as automatic speech translation. In recent years, new results in language recognition have emerged continuously, and performance has improved significantly compared with work from the last century. However, language recognition still faces many problems, such as the underuse of high-level semantic information, poor performance on short utterances, and vulnerability to mismatch between different data sets. With the growing importance of language recognition technology, new research methods are urgently needed to improve its performance.

To improve the performance of language recognition models, we propose two methods: a long short-term memory (LSTM) network method based on temporal information, and an embedding-vector method based on high-level semantic information. In the former, we first analyze the ability to extract temporal and high-level semantic information in language recognition. Based on this ability, we propose a new network structure that exploits high-level semantic information of speech segments, such as bottleneck features. Furthermore, we use the extracted embedding vector, which contains this high-level semantic information, in place of the traditional i-vector to build a new language recognition model. Experiments show that the two methods achieve 30.07% and 20.60% relative improvement, respectively, over an i-vector baseline using cosine distance as the classifier.

To address the problem of data-set mismatch in language recognition, we propose a method based on a factorized hidden variability subspace. Using matrix decomposition in this subspace, driven by information relevant to the input speech segment, the method modifies the output of the network's statistics pooling layer to improve language recognition performance. Compared with the original model, the two methods improve performance by about 12.6% and 23%, respectively; compared with the baseline system, which is based on i-vector with a support vector machine (SVM) classifier using a radial basis function kernel, they improve performance by about 10.10% and 10.88%.
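The baseline scoring scheme described above (fixed-length utterance embeddings compared with cosine distance) can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the statistics pooling stands in for the network's pooling layer, and the function names and the use of per-language mean embeddings as models are assumptions for the example.

```python
import numpy as np

def statistics_pooling(frame_features: np.ndarray) -> np.ndarray:
    """Collapse a variable-length sequence of frame-level features
    (shape T x D) into a fixed utterance-level vector by concatenating
    the per-dimension mean and standard deviation (length 2*D)."""
    mean = frame_features.mean(axis=0)
    std = frame_features.std(axis=0)
    return np.concatenate([mean, std])

def cosine_score(embedding: np.ndarray, language_model: np.ndarray) -> float:
    """Cosine similarity between an utterance embedding and a language's
    model embedding; a higher score means a closer match."""
    num = float(np.dot(embedding, language_model))
    den = float(np.linalg.norm(embedding) * np.linalg.norm(language_model))
    return num / den

def classify(embedding: np.ndarray, language_models: dict) -> str:
    """Return the language whose model embedding scores highest."""
    return max(language_models,
               key=lambda lang: cosine_score(embedding, language_models[lang]))

# Toy usage: two 2-dimensional "languages" and one test utterance.
models = {"en": np.array([1.0, 0.0]), "zh": np.array([0.0, 1.0])}
utterance_frames = np.array([[0.8, 0.1], [1.0, 0.3]])
embedding = statistics_pooling(utterance_frames)[:2]  # keep means only here
print(classify(embedding, models))
```

In the thesis, the embedding would come from the LSTM network's pooling layer rather than raw frame statistics, and the factorized-subspace method would adjust the pooled statistics before scoring.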
Keywords/Search Tags:language recognition, Long Short-Term Memory network, embedding vector, factorized hidden variability subspace, channel compensation