Research On Mandarin Speech Retrieval Technique Based On Confusion Network

Posted on: 2011-04-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X S Huang
Full Text: PDF
GTID: 1118330332460137
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of Internet and multimedia technology, an overwhelming number of audio documents emerge every day. How to search and classify these speech documents effectively has therefore become a hot topic in information processing. At present, research on speech retrieval is mostly based on statistical pattern recognition theory, in which the speech signal is considered at two levels: the acoustic level and the language level. The lattice, an emerging representation, is capable of describing both levels. It retains several candidate results from the search stage in the form of transcripts, so it is especially well suited to speech document retrieval. A confusion network, which is obtained by pruning a lattice, is more compact in structure and can improve the recognition accuracy. Using a lattice as the input of a speech retrieval system is therefore very promising, and speech retrieval based on lattices and confusion networks has attracted increasing attention.

Lattice building and the indexing strategy used during query search are two significant components of speech retrieval technology. The thesis first studied the generation of a confusion network, the search strategy in the indexing process, and the calculation of the confidence measure in a speech retrieval system. It then focused on how to enrich the acoustic-level and language-level features: a tone information model was appended to the acoustic model, and a prosody information model was appended to the language model. The main contributions of the thesis were as follows.

First, because the segmentation results for continuous speech were unsatisfactory at low SNR, an approach based on candidate selection was proposed for continuous speech signal segmentation.
The method selected among several candidate segmentations obtained by different methods in order to increase segmentation accuracy. Our experiments showed that the results of the proposed method were closer to manual segmentations.

Second, for speech retrieval based on the lattice structure, a pivot-based algorithm was proposed for generating a confusion network. With no marked decrease in indexing performance, the structure of the lattice became more compact, the index size was reduced, and the additional information was richer. For the search strategy, an improved DMLS method was proposed that uses minimum edit distance to compensate for the insertion, deletion and substitution errors of the syllable recognizer in the indexing stage. Furthermore, for the calculation of the confidence measure in a speech retrieval system, a method was proposed that uses the mutual information of two neighbouring syllables as the confidence measure. A novel confidence measure was then obtained by combining the mutual information of context syllables with the posterior probability of a syllable. The validity of the proposed approaches was demonstrated through simulation experiments.

Third, to capture more comprehensive information in the confusion network, a tone model was built and merged into the confusion network to improve the overall performance of a speech retrieval system. Tone features were extracted from the tone nucleus instead of the whole syllable, and on this basis we constructed a multi-space probability distribution HMM tone model based on the tone nucleus. To obtain the final acoustic model of the confusion network, the tone model was combined with the original acoustic model in the confusion network structure. Speech retrieval experiments were carried out with the original language model unchanged.
The experimental results showed that tone features were effective supplementary information in a speech retrieval system.

Finally, to improve the performance of a speech retrieval system, we attempted to add prosody information to the confusion network. We first studied prosodic event detection using acoustic, lexical and syntactic features separately. The resulting prosody model was then fused with the original acoustic model and language model in the confusion network. The speech indexing experiments indicated that prosodic features did help to improve the performance of a speech retrieval system.

In conclusion, the thesis studied Chinese continuous speech retrieval based on confusion networks. The main contributions were improvements to the generation of a confusion network and to the search strategy in the indexing process: a pivot-based method was proposed for generating the confusion network, and an improved DMLS method was proposed for the search stage. In addition, supplementary information was exploited to optimize the acoustic model and language model of the speech retrieval system: tone features were combined with the original acoustic model, and prosodic features were combined with the original language model. The experimental results indicated that the proposed approaches achieved good results and improved the performance of a speech retrieval system.
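
The abstract's second contribution uses minimum edit distance to compensate for insertion, deletion and substitution errors in syllable recognition. The abstract does not specify the improved DMLS method itself, so the following is only a minimal sketch of the textbook edit distance over syllable sequences. The function names, the uniform costs, the `max_cost` threshold and the pinyin examples are all illustrative assumptions and are not taken from the thesis.

```python
def edit_distance(query, candidate, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Minimum edit distance between two syllable sequences.

    Textbook dynamic programming with uniform costs (an assumption);
    the thesis's improved DMLS costs are not given in the abstract.
    """
    m, n = len(query), len(candidate)
    # dist[i][j] = cheapest way to turn query[:i] into candidate[:j]
    dist = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dist[i][0] = dist[i - 1][0] + del_cost
    for j in range(1, n + 1):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            same = query[i - 1] == candidate[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,                        # drop a query syllable
                dist[i][j - 1] + ins_cost,                        # add a candidate syllable
                dist[i - 1][j - 1] + (0.0 if same else sub_cost), # match or substitute
            )
    return dist[m][n]


def is_match(query, candidate, max_cost=1.0):
    """Accept a candidate whose distance to the query is within max_cost (hypothetical threshold)."""
    return edit_distance(query, candidate) <= max_cost


if __name__ == "__main__":
    query = ["bei", "jing", "da", "xue"]
    print(edit_distance(query, ["bei", "jin", "da", "xue"]))  # one substitution
    print(edit_distance(query, ["bei", "da", "xue"]))         # one deletion
    print(is_match(query, ["shang", "hai"]))
```

With a threshold of one edit, a candidate path that differs from the query by a single misrecognized or missing syllable is still accepted, which is the kind of recognition-error tolerance the abstract describes.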
Keywords/Search Tags: Speech retrieval, Confusion network, Lattice, Tone recognition, Prosody detection