Research & Implementation Of Multimedia Lecture Retrieval Based On Content

Posted on:2010-01-30

Degree:Master

Type:Thesis

Country:China

Candidate:Z X Liu

Full Text:PDF

GTID:2178360275970228

Subject:Computer software and theory

Abstract/Summary:

In recent decades, the availability of online academic lecture material has increased dramatically. These multimedia resources could change the way people learn. With the development of text-based information retrieval and automatic speech recognition (ASR) technology, the problem of how to find content of interest in these multimedia academic resources has attracted growing attention from researchers.

Research on multimedia lecture retrieval is still at an early stage. Most related work focuses on ASR technology and tries to extract more useful information by mining the speech content, while also integrating text retrieval techniques to improve retrieval performance. This thesis systematically discusses several key issues in the speech retrieval process, proposes algorithms for different requirements, and analyzes these algorithms experimentally. The main work is as follows:

a) Compare the state-of-the-art technologies for spoken document retrieval and analyze the key problems in the spoken document retrieval process.

b) Discuss the problems of language model and vocabulary adaptation for spoken documents, and propose a language model adaptation algorithm based on the context of n-grams. To reduce the effect of out-of-vocabulary (OOV) words, a secondary vocabulary is used and a method for extracting new words is proposed.

c) Taking the time order of index entries into account, propose an algorithm for building an audio index based on posterior probability; for the different forms of queries, propose two similarity criteria that incorporate the vector model. To handle OOV words, propose a sub-word index method based on posterior probability and use a hybrid index (a word-level index plus a sub-word-level index) to improve search performance. Finally, give two pruning approaches to reduce redundancy in the index.

d) Implement the spoken document retrieval model proposed in this thesis on top of the HTK package, and analyze the algorithms in detail through extensive experiments.
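
The abstract does not give the indexing algorithms themselves, so the following is only a minimal sketch of the general idea behind a posterior-probability index with a word-level and sub-word-level fallback. Everything in it is an assumption made for illustration: the function names (`build_index`, `expected_counts`, `search`), the tuple layout of the recognizer hypotheses, the threshold-based pruning rule, and the scoring by summed posteriors are hypothetical and are not taken from the thesis. The sub-word fallback also ignores whether the units are adjacent in time, which a real system would need to check.

```python
from collections import defaultdict


def build_index(hypotheses, min_posterior=0.1):
    """Build an inverted index from (doc_id, term, start_time, posterior) tuples.

    The tuple layout and the threshold are assumptions for this sketch.
    """
    index = defaultdict(list)
    for doc_id, term, start_time, posterior in hypotheses:
        # Threshold pruning (an assumed scheme): drop unlikely hypotheses.
        if posterior >= min_posterior:
            index[term].append((doc_id, start_time, posterior))
    for postings in index.values():
        # Keep postings in time order within each document.
        postings.sort(key=lambda p: (p[0], p[1]))
    return dict(index)


def expected_counts(index, term):
    """Sum the posteriors of a term per document (its expected count)."""
    counts = defaultdict(float)
    for doc_id, _start, posterior in index.get(term, []):
        counts[doc_id] += posterior
    return counts


def search(word_index, subword_index, query_terms, split_into_subwords):
    """Score documents with the word index, falling back to sub-words for OOV terms."""
    scores = defaultdict(float)
    for term in query_terms:
        if term in word_index:
            for doc_id, count in expected_counts(word_index, term).items():
                scores[doc_id] += count
            continue
        # OOV fallback: require every sub-word unit and score by the weakest one.
        units = split_into_subwords(term)
        per_unit = [expected_counts(subword_index, u) for u in units]
        if not per_unit:
            continue
        common = set(per_unit[0]).intersection(*per_unit[1:])
        for doc_id in common:
            scores[doc_id] += min(c[doc_id] for c in per_unit)
    # Highest score first; ties broken by document id.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

In this sketch a query term found in the word-level index is scored directly, while any other term is split by the caller-supplied `split_into_subwords` function and matched against the sub-word index, which is one simple way to combine the two index levels.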

Keywords/Search Tags:

spoken document retrieval, audio index, posterior probability, sub-words index, model adaptation

Related items

1	Research On Syllable Lattice Based Chinese Spoken Document Retrieval Method
2	Research On Lattice Based Spoken Document Retrieval
3	Research On Web-based Mandarin News Retrieval
4	Audio parsing and rapid speaker adaptation in speech recognition for spoken document retrieval
5	The Research Of Index Technology Based On Semantic Web Document
6	Studies On Affinity Propagation Based Pseudo-Relevance Feedback And Document Expansion For Spoken Document Retrieval
7	Research On Self-indexing Algorithms For Highly Repetitive Document Collections Based On FM-index
8	Research On Mandarin Spoken Document Retrieval Based On Lattice
9	Audio Index Based On LSH Distance And Retrieval System
10	Research On Document Retrieval Based On Index Optimization And Text Snippet Mechanism