
Research Of Automatic Speech Recognition Of The Asante-Twi Dialect For Translation

Posted on: 2021-02-09
Degree: Master
Type: Thesis
Country: China
Candidate: Boakye-Yiadom Adwoa Agyeiwaa
Full Text: PDF
GTID: 2428330602971931
Subject: Speech Recognition
Abstract/Summary:
The Automatic Speech Recognition (ASR) phase is the first and most important phase of any speech-translation system, with the speech database being the single most important resource. However, high-grade ASR systems require a very large speech database. The Asante-Twi dialect, which belongs to the Akan language, is extraordinarily under-resourced, and data collection is a serious obstacle. This thesis presents a new way of building ASR systems for low-resourced dialects using a small database and achieves good results in this respect. First, the characteristics of the dialect were analyzed and a representative Asante-Twi speech database was compiled manually, setting the stage for further speech recognition work and putting not only the Asante-Twi dialect but also Ghana and Africa on the map.

To help select a good algorithm and feature set for an Asante-Twi speech recognition system, three ASR systems with different features and methods were built using the Kaldi toolkit. The three systems were analyzed and compared with each other because no prior Asante-Twi work has been carried out, so there is no external reference point. To enhance performance, the feature extraction front end of every system applies Cepstral Mean and Variance Normalization (CMVN) and delta (Δ) dynamic features. In addition, the acoustic model of each system, which uses the GMM-HMM pattern classifier, was improved by training two context-dependent (triphone) models, one on top of the other, and both on top of context-independent (monophone) models. The first Asante-Twi ASR system used the MFCC feature extraction method; the second used MFCCs with different context-dependent parameters from the first; and the third used the PLP feature extraction method. The SRILM toolkit was used to build N-gram language models.

All ASR systems were developed on 348 different utterances from 29 speakers of different age groups and genders, recorded in a noise-free environment. 70% of this speech database was used to train the systems and 30% was used for testing. Word Error Rate (WER) and Sentence Error Rate (SER) are used as the accuracy metrics. With the correct parameter settings for the triphone models, the second ASR system achieved about a 50% reduction in WER and SER for the first triphone model and about a 10% reduction for the second triphone model, compared with the first ASR system. Decoding results show that the second ASR system was the most efficient of all the systems, producing the lowest values of 5.15% WER and 5.56% SER from its context-dependent triphone models. The third ASR system, using the same triphone parameters as the second, performed worst of the three. Thus, MFCCs are found to be the most suitable feature extraction technique for noise-free data, and context-dependent acoustic models the best method for GMM-HMM acoustic modeling on a limited amount of data. The second ASR system could be further improved with discriminative transforms for triphone modeling (i.e. adding LDA+MLLT and LDA+MLLT+SAT on top of the delta MFCC features), after first enlarging the Asante-Twi dataset to pave the way for a reliable speech recognition framework using DNNs.
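The CMVN and delta-feature steps applied to every system's front end can be sketched in plain NumPy. This is a minimal illustration, not the thesis pipeline (Kaldi's apply-cmvn and add-deltas binaries implement these steps); the MFCC matrix below is random stand-in data, not real speech features.

```python
import numpy as np

def cmvn(feats):
    """Per-utterance Cepstral Mean and Variance Normalization:
    zero mean and unit variance along the time (frame) axis."""
    mean = feats.mean(axis=0)
    std = feats.std(axis=0)
    return (feats - mean) / (std + 1e-10)

def deltas(feats, window=2):
    """First-order delta (dynamic) features via the standard
    regression formula over +/- `window` neighboring frames."""
    num_frames = feats.shape[0]
    denom = 2 * sum(n * n for n in range(1, window + 1))
    padded = np.pad(feats, ((window, window), (0, 0)), mode="edge")
    out = np.zeros_like(feats)
    for t in range(num_frames):
        for n in range(1, window + 1):
            out[t] += n * (padded[t + window + n] - padded[t + window - n])
    return out / denom

# Hypothetical 13-dimensional MFCC frames for one utterance.
mfcc = np.random.randn(100, 13) * 3.0 + 5.0
norm = cmvn(mfcc)
full = np.hstack([norm, deltas(norm)])  # static + delta features, 26-dim
```

Stacking the deltas doubles the feature dimension, giving the GMM-HMM models some of the temporal dynamics that the static cepstra alone miss.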
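An N-gram language model of the kind SRILM trains assigns word-sequence probabilities from counts. A toy maximum-likelihood bigram model can show the idea; note that SRILM applies smoothing and discounting by default, which this sketch omits, and the two sentences are invented examples rather than thesis data.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram model: P(w2 | w1) = c(w1 w2) / c(w1).
    Each sentence is padded with <s> and </s> boundary tokens."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(words[:-1])               # history counts
        bigrams.update(zip(words[:-1], words[1:]))  # pair counts
    return {pair: c / unigrams[pair[0]] for pair, c in bigrams.items()}

model = train_bigram(["me din de Adwoa", "me din de Kofi"])
# "me" is always followed by "din" in this toy corpus,
# so P(din | me) = 1.0, while P(Adwoa | de) = 0.5.
```

During decoding, the language model score is combined with the acoustic (GMM-HMM) score to rank competing word sequences.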
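WER, the main metric reported above, is the Levenshtein (edit) distance between the reference and hypothesis word sequences, normalized by the reference length. A small self-contained sketch (the sentences are invented examples, not from the Asante-Twi test set):

```python
def wer(reference, hypothesis):
    """Word Error Rate: (substitutions + deletions + insertions)
    divided by the number of reference words, via edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat", "the cat sat"))   # 0.0 (exact match)
print(wer("the cat sat", "the cat sits"))  # one substitution out of three words
```

SER is simpler still: the fraction of test utterances whose hypothesis differs from the reference in any word.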
Keywords/Search Tags:Asante-Twi dialect, Automatic Speech Recognition(ASR), Kaldi toolkit, MFCC, PLP, GMM-HMM