
Feature Comparison And System Improvement In Multilingual Recognition

Posted on: 2020-04-26
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Li
GTID: 2518306518967159
Subject: Electronics and Communications Engineering
Abstract/Summary:
This thesis studies multilingual language recognition from two angles: speech features and system models.

First, the thesis addresses three tasks on which recognition is poor: short utterances with a duration of less than 1 s, confusable languages, and open-set language recognition. Using an i-vector model as the common baseline system, it compares the performance of different speech features on these three tasks. Experiments show that the bottleneck feature (BNF) outperforms Shifted Delta Cepstra (SDC), the shifted-delta Phone Log-Likelihood Ratio (SD-PLLR), and sub-band envelope features, achieving the best recognition results on all three tasks. To improve the results further, the thesis proposes several optimizations for the BNF i-vector system. It applies noise reduction and variable-speed enhancement at the front end of the system, and it replaces the traditional Probabilistic Linear Discriminant Analysis (PLDA) and Cosine Distance Scoring (CDS) models with machine learning classifiers: Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM). Combining these optimizations, it determines experimentally the best improved system for each test task.

Second, the thesis studies models for short-utterance language recognition. It compares i-vector, Phone Recognition followed by Language Model (PRLM), Parallel Phone Recognition followed by Language Model (PPRLM), an end-to-end time delay neural network (TDNN), x-vector based on the TDNN structure, and a Bi-directional Long Short-Term Memory (BLSTM) network. Experiments show that the i-vector model achieves the best recognition result among these models. To further improve short-utterance language recognition, the thesis studies the Speaker ResNet network, which is based on the TDNN residual module, and proposes an improved Speaker ResNet system that fuses noise reduction, the TSM algorithm, multi-head attention, and a learnable dictionary encoding (LDE) layer. The improved system surpasses the baseline Speaker ResNet system and achieves the best recognition results on short utterances with a duration of less than 3 seconds.
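As an illustration of the cosine distance scoring back-end named above, the following is a minimal sketch and not the thesis's implementation. It assumes each language is represented by the mean of its training i-vectors and that a test i-vector is assigned to the language with the highest cosine similarity. The toy 3-dimensional vectors, the function names, and the optional `reject_threshold` used to return "unknown" for the open-set case are all assumptions made for this example.

```python
import math


def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train_language_models(train_ivectors):
    """Assumed modeling choice: one mean i-vector per language."""
    return {lang: mean_vector(vecs) for lang, vecs in train_ivectors.items()}


def identify_language(test_ivector, models, reject_threshold=None):
    """Return (label, scores); label is "unknown" if the best score
    falls below reject_threshold (hypothetical open-set rule)."""
    scores = {lang: cosine_score(test_ivector, m) for lang, m in models.items()}
    best = max(scores, key=scores.get)
    if reject_threshold is not None and scores[best] < reject_threshold:
        return "unknown", scores
    return best, scores


# Toy data standing in for real i-vectors (which typically have hundreds of dimensions).
train = {
    "lang_a": [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]],
    "lang_b": [[0.0, 1.0, 0.2], [0.1, 0.8, 0.3]],
}
models = train_language_models(train)

label, scores = identify_language([0.8, 0.3, 0.0], models)
open_set_label, _ = identify_language([0.0, 0.0, 1.0], models, reject_threshold=0.5)
print(label, open_set_label)
```

Because this kind of scoring has no trainable parameters beyond the language means, it is a natural baseline against which classifiers such as XGBoost, Random Forest, or SVM can be compared on the same i-vectors.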
Keywords/Search Tags: Short Utterance Language Recognition, Confusing Language Recognition, Speech Feature, Language Recognition Model, Neural Network