Research On Automatic Speech-Text Alignment For Mongolian Long Audio

Posted on:2021-05-05

Degree:Master

Type:Thesis

Country:China

Candidate:M J Niu

GTID:2428330620976427

Subject:Computer Science and Technology

Abstract/Summary:

Automatic speech recognition (ASR) systems based on deep learning have been widely applied in various fields, and their acoustic models are trained on large-scale speech databases. However, existing Mongolian speech databases are relatively small and cannot meet the requirements of a Mongolian large-vocabulary continuous speech recognition system, so the Mongolian speech corpus urgently needs to be expanded. Manually recording a speech database not only consumes considerable manpower and material resources but also produces data that differ from real application scenarios. In the era of big data, long Mongolian recordings and their corresponding transcriptions can be obtained from the internet and from relevant institutions, and these resources can help expand the Mongolian speech database. Focusing on Mongolian TV drama audio, this thesis studies Mongolian speech-text alignment methods based on ASR technology. The main contents and innovations of the thesis are as follows.

First, for speech-text alignment of Mongolian TV drama audio, the thesis implements automatic segmentation of the audio and improves the dialogue segmentation algorithm. Voice activity detection (VAD) based on double thresholding is used to remove silent parts of the audio, and hidden Markov models (HMMs) are built to detect and remove the social signals that frequently occur in spoken Mongolian dialogue. The dialogue is then segmented based on a Bayesian distance matrix. Experiments show that the false detection rate of dialogue segmentation based on the Bayesian distance matrix is 4.22% lower than that of traditional dialogue segmentation based on the Bayesian information criterion (BIC).

Second, the thesis proposes a speech-text alignment algorithm based on intermediate-code RNN language model adaptation. The algorithm converts all Mongolian words into intermediate code and trains a general RNN language model, which is then fine-tuned on the drama scripts. In addition, latent Dirichlet allocation (LDA) topic features are fed into the RNN to produce a topic-adaptive RNN language model. After speech recognition with this adapted language model, every word in both the ASR output and the drama scripts is split into a stem and a suffix; the suffixes are discarded, and the stems serve as the units for subsequent alignment. Compared with the baseline system, the alignment algorithm based on intermediate-code RNN language model adaptation improves recall by 7.95% and F-score by 4.88%, further improving alignment performance.

Finally, the thesis proposes a speech-text alignment algorithm based on a phone confusion matrix. The speech is decoded by the acoustic model into a phone sequence, and the phone sequence of the drama scripts is generated by a grapheme-to-phoneme (G2P) model. In addition, part of the speech is used to compute a Mongolian phone confusion matrix, according to which both the Levenshtein and the Needleman-Wunsch alignment algorithms are improved. Compared with the baseline system, the alignment algorithm based on the phone confusion matrix improves recall by 10.42% and F-score by 2.97%.
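Double-threshold VAD is commonly implemented as a two-level energy test: a frame whose short-time energy exceeds a high threshold triggers a speech segment, which is then grown in both directions while the energy stays above a lower threshold. The abstract does not give the thesis's features or threshold values, so the Python sketch below assumes frame energy is the only feature; the names frame_energies and double_threshold_vad and the threshold values are hypothetical.

```python
# Sketch of double-threshold voice activity detection on frame energies.
# Assumptions: energy is the only feature and the thresholds are supplied by
# the caller; the thesis's actual features and threshold values are not given.

def frame_energies(samples, frame_len, hop):
    """Short-time energy (sum of squared samples) for each full frame."""
    return [
        sum(x * x for x in samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


def double_threshold_vad(energies, high, low):
    """Return speech segments as (start, end) frame indices, end exclusive.

    A segment is triggered by a frame above `high` and grown left and right
    while neighbouring frames stay above `low` (expects low < high).
    """
    segments = []
    i, n = 0, len(energies)
    while i < n:
        if energies[i] > high:
            start = i
            while start > 0 and energies[start - 1] > low:
                start -= 1
            end = i
            while end < n and energies[end] > low:
                end += 1
            segments.append((start, end))
            # Resume scanning at the first frame that fell to or below `low`.
            i = end
        else:
            i += 1
    return segments


if __name__ == "__main__":
    print(double_threshold_vad([0, 0, 5, 12, 8, 1, 0, 0, 3, 0, 20, 6, 0], high=10, low=2))
    # prints [(2, 5), (10, 12)]
```

In the example, the isolated frame with energy 3 never crosses the high threshold, so it is treated as noise rather than speech; that is the point of using two thresholds. Practical detectors often add a zero-crossing-rate test and minimum-duration rules.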
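The traditional segmentation baseline based on Bayesian information is commonly formulated as a Delta-BIC test: a window of feature vectors is modelled either by one Gaussian or by two Gaussians split at a candidate frame, and a positive Delta-BIC favours a speaker change at that frame. The thesis's Bayesian distance matrix method is not described in enough detail to reproduce, so this is only a sketch of the baseline criterion, assuming diagonal covariances and a penalty weight lam; the function names are hypothetical.

```python
import math

# Sketch of the Delta-BIC speaker-change test used by BIC-based segmentation.
# Assumptions: diagonal Gaussian covariances and a penalty weight `lam`;
# this is NOT the thesis's Bayesian-distance-matrix method.

def _log_det_diag(frames):
    """log|Sigma| for a diagonal covariance: sum of log per-dimension variances."""
    n, d = len(frames), len(frames[0])
    total = 0.0
    for k in range(d):
        col = [f[k] for f in frames]
        mean = sum(col) / n
        var = sum((x - mean) ** 2 for x in col) / n
        total += math.log(max(var, 1e-10))  # variance floor avoids log(0)
    return total


def delta_bic(frames, split, lam=1.0):
    """Delta-BIC for splitting `frames` (list of feature vectors) at index `split`.

    dBIC = N/2 log|S| - N1/2 log|S1| - N2/2 log|S2| - lam * P,
    where P = 0.5 * (parameters of one extra diagonal Gaussian) * log N.
    Positive values favour a change point.
    """
    n, d = len(frames), len(frames[0])
    left, right = frames[:split], frames[split:]
    penalty = 0.5 * (2 * d) * math.log(n)  # d means + d variances
    return (0.5 * n * _log_det_diag(frames)
            - 0.5 * len(left) * _log_det_diag(left)
            - 0.5 * len(right) * _log_det_diag(right)
            - lam * penalty)


def best_change_point(frames, margin=2, lam=1.0):
    """Best split with positive Delta-BIC as (index, score), or None if none."""
    best = None
    # Keep at least `margin` frames on each side of the split.
    for t in range(margin, len(frames) - margin + 1):
        score = delta_bic(frames, t, lam)
        if score > 0 and (best is None or score > best[1]):
            best = (t, score)
    return best
```

In a complete segmenter this test is typically slid along the audio over a growing window, and a change is declared wherever the best Delta-BIC is positive.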
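The stem-level alignment step can be illustrated by stripping suffixes from both the ASR output and the script and then aligning the stem sequences. This sketch assumes longest-suffix-first stripping with a tiny, hypothetical list of romanized suffixes and uses Python's difflib.SequenceMatcher as a stand-in aligner; the thesis's actual stem/suffix segmentation rules and intermediate-code inventory are not given in the abstract.

```python
from difflib import SequenceMatcher

# Hypothetical suffix inventory in a romanized form, for illustration only;
# a real system would use a proper Mongolian morphological segmenter.
SUFFIXES = sorted(["iin", "aar", "tai", "ig", "d"], key=len, reverse=True)


def stem(word, min_stem=2):
    """Strip the longest listed suffix, keeping at least `min_stem` characters."""
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)]
    return word


def align_stems(hyp_words, ref_words):
    """Return (hyp_index, ref_index) pairs of words whose stems match.

    SequenceMatcher finds matching blocks between the two stem sequences;
    suffix differences no longer block a match.
    """
    hyp = [stem(w) for w in hyp_words]
    ref = [stem(w) for w in ref_words]
    pairs = []
    for block in SequenceMatcher(a=hyp, b=ref, autojunk=False).get_matching_blocks():
        for k in range(block.size):  # the final sentinel block has size 0
            pairs.append((block.a + k, block.b + k))
    return pairs
```

For toy tokens such as ["bi", "geriin", "nomig", "unshsan"] against ["bi", "ger", "shine", "nom", "unshsan"], the inflected forms match their bare stems and the extra script word is skipped, giving [(0, 0), (1, 1), (2, 3), (3, 4)].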
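One way a phone confusion matrix can modify Levenshtein alignment is to estimate P(decoded phone | reference phone) from aligned phone pairs and make substitutions between frequently confused phones cheaper. The abstract does not say how the thesis maps confusion statistics to costs; this sketch assumes a substitution cost of 1 - P(decoded | reference) with unit insertion and deletion costs (a -log P cost would be another reasonable choice). All names are hypothetical.

```python
from collections import Counter, defaultdict


def confusion_probs(pairs):
    """Estimate P(decoded | reference) from (reference_phone, decoded_phone) pairs."""
    counts = defaultdict(Counter)
    for ref, hyp in pairs:
        counts[ref][hyp] += 1
    return {ref: {hyp: c / sum(row.values()) for hyp, c in row.items()}
            for ref, row in counts.items()}


def sub_cost(probs, ref, hyp):
    """Assumed substitution cost: 0 for identical phones, else 1 - P(hyp | ref)."""
    if ref == hyp:
        return 0.0
    return 1.0 - probs.get(ref, {}).get(hyp, 0.0)


def weighted_levenshtein(ref_phones, hyp_phones, probs, indel=1.0):
    """Edit distance whose substitution costs come from the confusion statistics."""
    n, m = len(ref_phones), len(hyp_phones)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel  # delete all reference phones
    for j in range(1, m + 1):
        d[0][j] = j * indel  # insert all decoded phones
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # deletion
                d[i][j - 1] + indel,  # insertion
                d[i - 1][j - 1] + sub_cost(probs, ref_phones[i - 1], hyp_phones[j - 1]),
            )
    return d[n][m]
```

If "b" is decoded as "p" in 3 of 10 cases, then aligning ["b", "a"] with ["p", "a"] costs about 0.7 instead of the plain Levenshtein cost of 1, while an unseen confusion still costs the full 1.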
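Needleman-Wunsch global alignment maximizes a similarity score rather than minimizing a distance, and it yields an explicit phone-to-phone alignment through traceback. As a sketch only, the mismatch score below is assumed to be P(decoded | reference) - 1, so common confusions are penalized less than unseen ones; the match score, gap penalty and the function name are hypothetical choices, not the thesis's settings.

```python
def needleman_wunsch(ref, hyp, probs, match=1.0, gap=-1.0):
    """Global alignment of phone lists `ref` and `hyp`.

    `probs[a][b]` is an estimated P(decoded b | reference a); the assumed
    mismatch score is P(b | a) - 1. Returns (score, aligned), where aligned
    is a list of (ref_phone or None, hyp_phone or None).
    """
    def sim(a, b):
        if a == b:
            return match
        return probs.get(a, {}).get(b, 0.0) - 1.0

    n, m = len(ref), len(hyp)
    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    # Cumulative borders so traceback equality checks are exact.
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sim(ref[i - 1], hyp[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover the alignment.
    aligned, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sim(ref[i - 1], hyp[j - 1]):
            aligned.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            aligned.append((ref[i - 1], None))  # reference phone not decoded
            i -= 1
        else:
            aligned.append((None, hyp[j - 1]))  # extra decoded phone
            j -= 1
    aligned.reverse()
    return F[n][m], aligned
```

With probs = {"b": {"p": 0.3, "b": 0.7}}, aligning ["b", "a", "t"] with ["p", "a"] pairs "b" with "p", "a" with "a", and leaves "t" unmatched, for a score of about -0.7.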
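The reported recall and F-score follow the usual definitions: recall is the fraction of reference alignments recovered, precision is the fraction of produced alignments that are correct, and F is their harmonic mean. The abstract does not state what counts as a correct alignment (for example, any timing tolerance), so this sketch assumes exact matches between hashable alignment items; the function name and item format are hypothetical.

```python
def alignment_prf(predicted, reference):
    """Precision, recall and F-score over sets of alignment items.

    Assumption: an item is correct only if it appears exactly in `reference`,
    e.g. (utterance_id, script_segment_id) pairs.
    """
    pred, ref = set(predicted), set(reference)
    correct = len(pred & ref)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(ref) if ref else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f
```

For three predicted pairs of which two appear among four reference pairs, precision is 2/3, recall is 1/2, and F is 4/7.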

Keywords/Search Tags:

Speech-text alignment, Audio segmentation, Language model, Phone alignment, Speech Recognition

Related items

1	Research On Unannotated Long Chinese Speech Text-speech Alignment
2	Research Of Mandarin Text-Speech Alignment Based On SailAlign
3	Text-Speech Alignment Based On General Speech Recognition
4	Research And Application Of Speech And Text Automatic Alignment Technology Based On Text Similarity Algorithm
5	Research On Text-Audio Alignment
6	Speech-Text Soft-Alignment With Semantic And Monotonic Constraints For End-to-End Speech Recognition
7	Audio-Visual Asynchrony Modeling and Analysis for Speech Alignment and Recognition
8	Research Of Long Speech And Text Alignment
9	Research On Automatic Construction Of Speech Corpus And Speech Minimized Labeling
10	Researching And Building Of The Mongolian Large Vocabulary Independent Continuous Speech Recognition System