Text-Speech Alignment, built on automatic speech recognition techniques, is the process of aligning speech with its text in time. In recent years, the rapid development of the internet has made ever more speech and text data available, and aligning these speech and text data in time is the key to exploiting them, so Text-Speech Alignment has attracted growing research interest.

Text-Speech Alignment is a key technology in the field of speech recognition. The conventional method uses a speech recognizer to transcribe the speech, obtaining a recognition result that includes time information; this result is aligned with the original text to find their common part, which in turn identifies the corresponding speech segments. The aligned data are used to train acoustic models, evaluate speech, build corpora automatically, and support multimedia information retrieval, among other applications. To improve the accuracy and robustness of this approach, however, the recognizer must be trained on a large amount of labeled data, whose collection consumes enormous labor, material and financial resources, and time.

This paper reviews the research status at home and abroad and proposes a Text-Speech Alignment algorithm that does not depend on a speech recognizer trained with large amounts of labeled data. Using this algorithm, aligned data can be obtained automatically and then used to train a continuous speech recognition system based on context-dependent triphones, demonstrating the application.

The contributions of this paper mainly include the following aspects.

First, to remove the dependence on labeled data, we propose a Text-Speech Alignment method based on an open speech recognizer (Google Voice Recognition, GVR) and a language model constructed as a finite state automaton. Using this algorithm, aligned speech-text data can be obtained automatically.
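The "common part" extraction used by the conventional method can be sketched with a longest-matching-subsequence comparison between the original transcript and the recognizer output. The sketch below is illustrative only (the function name and the tuple layout of the recognizer output are assumptions, not part of the proposed system); it keeps exactly those recognized words, with their timestamps, that match the original text in order.

```python
from difflib import SequenceMatcher

def align_recognition(original_words, recognized):
    """Find the common part between the original transcript and the
    recognizer output, keeping the recognizer's time information.

    original_words: list of words from the original text.
    recognized: list of (word, start_sec, end_sec) from the recognizer.
    Returns a list of (word, start_sec, end_sec) anchor words that
    match the original transcript in order.
    """
    rec_words = [w for w, _, _ in recognized]
    matcher = SequenceMatcher(a=original_words, b=rec_words, autojunk=False)
    anchors = []
    for block in matcher.get_matching_blocks():
        # Each matching block is a run of identical words in both sequences.
        for k in range(block.size):
            anchors.append(recognized[block.b + k])
    return anchors
```

For example, if the recognizer mishears "sat" as "fat", only the five correctly recognized words survive as time-stamped anchors, and the speech between consecutive anchors can then be attributed to the skipped text.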
In particular, the speech data are first submitted to GVR to obtain a recognition text; this text, however, carries no time information, which is the key element of Text-Speech Alignment. To recover the time information, the speech is recognized a second time, using an acoustic model trained on the original speech and text data and a language model based on the finite state automaton. This second recognition pass yields the time information and completes the Text-Speech Alignment.

Next, the data obtained in the previous step are used to train an acoustic model, on top of which the SailAlign algorithm is adopted and improved to align speech and text data effectively and complete the corpus construction. It has been demonstrated that the alignment accuracy reaches 95% when the text noise is 10% or less.

Finally, a continuous speech recognition system based on context-dependent triphones is constructed to test the performance of the proposed Text-Speech Alignment algorithm. In the feature extraction step, pitch is added to the feature vector; because pitch discriminates well between voiced and unvoiced sounds, the recognition accuracy is higher than that of a recognizer based on Mel cepstral parameters alone.
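The finite-state-automaton language model described above constrains the second recognition pass to the known text. A minimal sketch of the idea is a linear-chain automaton built from the transcript, which accepts exactly the transcript's word sequence; the function names here are illustrative assumptions, and a practical alignment grammar would additionally allow optional silences and word skips.

```python
def build_transcript_fsa(words):
    """Build a linear finite state automaton from a transcript:
    state i --words[i]--> state i+1, with final state len(words).
    Returns (transition table, final state)."""
    transitions = {(i, w): i + 1 for i, w in enumerate(words)}
    return transitions, len(words)

def accepts(transitions, final_state, sequence):
    """Run the automaton over a word sequence; True iff it ends in
    the final state, i.e. the sequence is exactly the transcript."""
    state = 0
    for w in sequence:
        nxt = transitions.get((state, w))
        if nxt is None:
            return False  # word not allowed in this state
        state = nxt
    return state == final_state
```

During decoding, such an automaton replaces an n-gram language model: at each state only one word hypothesis is permitted, so the recognizer's job reduces to placing the known words in time, which is exactly the time information the alignment needs.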