Feature Comparison And System Improvement In Multilingual Recognition

Posted on:2020-04-26

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Li

GTID:2518306518967159

Subject:Electronics and Communications Engineering

Abstract/Summary:

This thesis focuses on multilingual recognition and studies both speech features and system models. The details are as follows.

The thesis first addresses poor recognition in three tasks: short utterances with a duration of less than 1 s, confusable languages, and open-set language recognition. Using the i-vector model as a common baseline, it compares the performance of different speech features on these three tasks. Experiments show that the bottleneck feature (BNF) surpasses Shifted Delta Cepstra (SDC), the shifted-delta Phone Log-Likelihood Ratio (SD-PLLR), and sub-band envelope features, achieving the best recognition results in all three tasks. To further improve the results, the thesis proposes several optimization schemes for the BNF i-vector system. It first applies noise reduction and variable-speed enhancement at the front end of the system, and then replaces the traditional Probabilistic Linear Discriminant Analysis (PLDA) and Cosine Distance Scoring (CDS) models with machine learning classification models: Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Support Vector Machine (SVM). By combining these optimization schemes, the thesis determines through experiments the best improved system for each test task.

The thesis then studies models for short-utterance language recognition and compares the recognition results of the i-vector model, Phone Recognition Followed by Language Model (PRLM), Parallel Phone Recognition Followed by Language Model (PPRLM), an end-to-end time delay neural network (TDNN), the x-vector model based on the TDNN network structure, and a Bi-directional Long Short-Term Memory (BLSTM) neural network. Experiments show that the i-vector model achieves the best recognition results among these models. To further improve short-utterance language recognition, the Speaker ResNet network, which is based on the TDNN residual module, is studied, and an improved Speaker ResNet system is proposed that fuses noise reduction, the TSM algorithm, multi-head attention, and a learnable dictionary encoding (LDE) layer. The improved Speaker ResNet system surpasses the baseline Speaker ResNet system and achieves the best recognition results on short utterances with a duration of less than 3 seconds.
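As a rough illustration of the cosine distance scoring back end that the abstract names as one of the traditional baselines, the sketch below scores utterance-level vectors against per-language mean vectors. It is not the thesis's implementation: the function names, the toy 4-dimensional vectors standing in for real i-vectors, and the labels `lang_a` and `lang_b` are all assumptions made for the example.

```python
import numpy as np

def length_normalize(x):
    """Scale each row vector to unit Euclidean length."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def language_means(train_vectors, train_labels):
    """Average the length-normalized training vectors of each language."""
    vectors = length_normalize(train_vectors)
    labels = np.asarray(train_labels)
    languages = sorted(set(labels.tolist()))
    means = np.stack([vectors[labels == lang].mean(axis=0) for lang in languages])
    return languages, length_normalize(means)

def cosine_scores(test_vectors, means):
    """Cosine similarity between every test vector and every language mean."""
    return length_normalize(test_vectors) @ means.T

def classify(test_vectors, languages, means):
    """Pick the language whose mean vector has the highest cosine score."""
    scores = cosine_scores(test_vectors, means)
    return [languages[i] for i in scores.argmax(axis=1)]

# Toy data (hypothetical): two languages clustered around different directions.
rng = np.random.default_rng(0)
centers = {
    "lang_a": np.array([1.0, 0.0, 0.0, 0.0]),
    "lang_b": np.array([0.0, 1.0, 0.0, 0.0]),
}
train_x = np.vstack(
    [centers[lang] + 0.1 * rng.standard_normal((20, 4)) for lang in centers]
)
train_y = [lang for lang in centers for _ in range(20)]

languages, means = language_means(train_x, train_y)
test_x = np.array([[0.9, 0.1, 0.0, 0.0], [0.1, 0.8, 0.05, 0.0]])
predictions = classify(test_x, languages, means)
print(predictions)
```

Because this scorer only compares directions of vectors, it has no trainable decision boundary. Replacing it with a trained classifier such as an SVM or a tree ensemble, as the abstract describes, amounts to fitting that classifier on the same utterance-level vectors and labels.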

Keywords/Search Tags:

Short Utterance Language Recognition, Confusing Language Recognition, Speech Feature, Language Recognition Model, Neural Network

Related items

1	Research On Sign Language Recognition In Sign Language To Speech Conversion
2	Researching Of The Mongolian Language Model Based On Speech Recognition
3	Research And Implementation Of Mongolian-Chinese Mixed Language Speech Recognition System Based On Deep Learning
4	Application Research On Statistical Language Model Of Large Vocabulary Continuous Speech Recognition System
5	Recurrent Neural Network Language Model For Continuous Speech Recognition
6	Research On Tibetan Language Model For Continuous Speech Recognition
7	Researching And Building Of The Mongolian Large Vocabulary Independent Continuous Speech Recognition System
8	Cross-language End-to-end Speech Recognition Research For Endangered Language
9	Research On Statistical Language Model Of Large-Vocabulary Continuous Speech Recognition System
10	Design And Implementation Of Speech Recognition System Based On DNN-LSTM