Research Of IV-PTN Hybrid System Based On Dynamic Weights For Language Recognition

Posted on: 2020-06-27   Degree: Master   Type: Thesis
Country: China   Candidate: J Lin   Full Text: PDF
GTID: 2428330590978620   Subject: Electronic and communication engineering
Abstract/Summary:
Language recognition is a technology that automatically identifies the language spoken in speech data. It serves as a front end for speech recognition and related applications and is an important branch of voice technology. As in speaker recognition, language is a property of the speech signal and can therefore be characterized in a specific way. To meet the demand for shorter response times in current language recognition systems, this paper proposes a dynamic-weight-based IV-PTN fusion system that improves performance on short-duration speech while maintaining good performance on long-duration speech, thereby reducing system response time.

This paper first introduces several language recognition methods. In terms of language characteristics, a language can be recognized from its acoustic, phonemic, or prosodic features, among others; in terms of technique, methods include probabilistic models and neural network models. Considering industrial application trends in recent years, this paper focuses on the PTN (Phonetic Temporal Neural) language recognition system, which is based on neural networks, and the Ivector (Identity vector) language recognition system, which is based on a probabilistic model. To make the Ivector system more robust, this paper combines the Ivector and PTN systems into a new IV-PTN fusion system, which performs better under short-duration speech conditions. After visualizing the misjudgment features of the two systems with the t-SNE algorithm, we observed that the regions where the Ivector system and the PTN system misjudge most frequently do not coincide. We therefore added a dynamic weight module to the back end of the fusion system. The module automatically assigns fusion weights according to each subsystem's misjudgment probability on a particular speech segment, which improves the performance of the fused system.

The experiments use the AP16-OL7 and AP17-OL3 data sets, which together contain ten different languages, with [illegible], EER, ER and DET as evaluation indicators. We discuss the PTN and Ivector language recognition methods and build IV-PTN fusion systems with fixed weights and with dynamic weights based on the two methods. Experimental results show that the fixed-weight fusion system outperforms either single subsystem. Compared with fixed-weight systems and with dynamic-weight systems that model score vectors, the model proposed in this paper, which models misjudgment features with dynamic weights, achieves better results.
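
The difference between fixed-weight and dynamic-weight score fusion can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the normalization rule for the weights, and the example numbers are all assumptions made for illustration, and the per-segment misjudgment probabilities are taken as given inputs, whereas in the thesis they come from a trained dynamic weight module.

```python
def fuse_fixed(iv_scores, ptn_scores, w_iv=0.5):
    """Fixed-weight fusion: the same weight is used for every speech segment."""
    return [w_iv * a + (1.0 - w_iv) * b for a, b in zip(iv_scores, ptn_scores)]


def fuse_dynamic(iv_scores, ptn_scores, p_err_iv, p_err_ptn, eps=1e-8):
    """Dynamic-weight fusion (assumed rule): weight each subsystem by its
    estimated reliability (1 - misjudgment probability) on this segment."""
    r_iv = 1.0 - p_err_iv
    r_ptn = 1.0 - p_err_ptn
    # eps keeps the weight defined (0.5) when both subsystems look unreliable.
    w_iv = (r_iv + eps) / (r_iv + r_ptn + 2.0 * eps)
    return fuse_fixed(iv_scores, ptn_scores, w_iv), w_iv


def argmax(values):
    """Index of the largest score, i.e. the predicted language."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical posteriors over three languages for one short segment.
iv_post = [0.7, 0.2, 0.1]   # Ivector subsystem favors language 0
ptn_post = [0.3, 0.6, 0.1]  # PTN subsystem favors language 1

fixed = fuse_fixed(iv_post, ptn_post, w_iv=0.5)
# Assumed misjudgment probabilities: Ivector is likely wrong on this segment.
dynamic, w_iv = fuse_dynamic(iv_post, ptn_post, p_err_iv=0.8, p_err_ptn=0.2)

print("fixed-weight decision:  ", argmax(fixed))
print("dynamic-weight decision:", argmax(dynamic), "with w_iv =", round(w_iv, 3))
```

In this made-up example the fixed-weight fusion follows the Ivector subsystem, while the dynamic weights shift the decision toward the PTN subsystem because the Ivector subsystem is assumed to be unreliable on that segment.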
Keywords/Search Tags: Language Recognition, Probabilistic Model, Ivector, Neural Network, Hybrid System, Dynamic Weight