Topic Classification Of Speech Documents Based On The Word Fragment Network

Posted on: 2011-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
GTID: 2178330332460553
Subject: Signal and Information Processing
Abstract/Summary:
Topic classification of speech documents assigns each document in a large collection to one or more predefined topics according to its attributes and content. It is an important research area with broad application prospects and is attracting increasing attention; with the arrival of the multimedia era, classifying speech documents has become essential.

Against this background, this thesis builds on a traditional Chinese speech classification system and uses HTK to produce two kinds of recognition output: the one-best result, which has a single-path structure, and the lattice, which has a multi-path structure. After analyzing the characteristics of speech classification, the thesis improves the traditional text categorization method and implements a classification system. The experimental data cover 4 topics and comprise 748 speech documents together with more than 8,700 documents used in the system experiments.

The system uses the one-best structure as the baseline of the speech classification system, against which the lattice structure is compared. The baseline follows the framework of a traditional text classification system. In the lattice-based system, a confusion network is used to optimize the lattice. Word fragments are extracted from the confusion network in place of the word segmentation step of the traditional text classification system; this highlights the competition among all keyword candidates and provides enough candidates for classification. By using the confusion network in the classification structure and emphasizing the competitive relationships between candidates, the method avoids the errors introduced by the best-path search of the traditional method, and the adequate supply of candidates helps safeguard classification accuracy.
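The abstract does not spell out how word fragments are read off the confusion network. The following is a minimal sketch under stated assumptions: the network is stored as a list of slots that each map a candidate word to its posterior probability, a "fragment" is any run of up to `max_len` consecutive candidates, and its weight is the product of the arc posteriors (which treats slots as independent). The names `one_best`, `fragment_counts`, `max_len`, `min_post`, and the `<eps>` deletion label are hypothetical and are not taken from the thesis.

```python
from collections import defaultdict
from itertools import product

# Assumed representation: one dict per confusion-network slot,
# mapping each competing word to its posterior probability.
ConfusionNetwork = list[dict[str, float]]


def one_best(cn: ConfusionNetwork) -> list[str]:
    """Single-path baseline: keep only the highest-posterior word in each slot."""
    return [max(slot, key=slot.get) for slot in cn]


def fragment_counts(cn: ConfusionNetwork, max_len: int = 2,
                    min_post: float = 0.1) -> dict[str, float]:
    """Expected counts of fragments spanning 1..max_len consecutive slots.

    Every candidate with posterior >= min_post is kept, so competing
    hypotheses all contribute. A fragment's weight is the product of its
    arc posteriors (a simplifying independence assumption). Deletion arcs
    labelled "<eps>" are skipped; a slot with no surviving candidate
    simply yields no fragments that span it.
    """
    counts: dict[str, float] = defaultdict(float)
    for start in range(len(cn)):
        for n in range(1, max_len + 1):
            if start + n > len(cn):
                break
            # Surviving candidates for each slot in the window [start, start + n)
            slots = [[(w, p) for w, p in cn[i].items()
                      if p >= min_post and w != "<eps>"]
                     for i in range(start, start + n)]
            for combo in product(*slots):
                weight = 1.0
                for _, p in combo:
                    weight *= p
                counts["".join(w for w, _ in combo)] += weight
    return dict(counts)


if __name__ == "__main__":
    cn = [{"股": 0.7, "古": 0.3}, {"票": 0.6, "漂": 0.4}]
    print(one_best(cn))  # ['股', '票']
    print({k: round(v, 2) for k, v in sorted(fragment_counts(cn).items())})
```

With this toy network the one-best path keeps only 股 and 票, whereas `fragment_counts` also returns the competing candidates 古 and 漂 and their combinations with fractional weights; this is the kind of extra competition among keyword candidates the abstract describes.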
In addition, introducing posterior probabilities effectively reduces the classification errors caused by recognition errors. The topic-center classifier is constructed with the classical singular value decomposition (SVD) method. Experimental results show that, compared with the one-best system, the proposed method, which introduces the confusion network into the topic classification system, provides more competing candidates and improves classification performance when the term vectors are sparse.
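The abstract names the classical singular value decomposition but not the exact form of the topic-center classifier. The following is a minimal sketch assuming an LSA-style setup: documents are columns of a term-document weight matrix, a truncated SVD folds each document into a k-dimensional space as S_k^{-1} U_k^T d, each topic center is the mean of its training documents in that space, and a new document is assigned to the center with the highest cosine similarity. The function names, the centroid rule, and the cosine measure are assumptions, not details confirmed by the thesis. The sketch requires NumPy.

```python
import numpy as np


def fit_topic_centers(A, labels, k):
    """Fit an LSA-style topic-center classifier (illustrative assumption).

    A      : (n_terms, n_docs) term-document weight matrix
    labels : length-n_docs sequence of topic labels
    k      : number of singular values kept; must not exceed the rank of A
    """
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    Uk, sk = U[:, :k], s[:k]
    # Fold each training document into the latent space: S_k^{-1} U_k^T d
    docs = (Uk.T @ A) / sk[:, None]
    labels = np.asarray(labels)
    topics = sorted(set(labels.tolist()))
    # Topic center = mean latent vector of that topic's documents (one column per topic)
    centers = np.stack([docs[:, labels == t].mean(axis=1) for t in topics], axis=1)
    return Uk, sk, topics, centers


def classify(d, Uk, sk, topics, centers):
    """Return the topic whose center has the highest cosine similarity to term vector d."""
    q = (Uk.T @ d) / sk
    sims = (centers.T @ q) / (np.linalg.norm(centers, axis=0) * np.linalg.norm(q) + 1e-12)
    return topics[int(np.argmax(sims))]


if __name__ == "__main__":
    # Toy data: terms 0-1 belong to "sports", terms 2-3 to "finance"
    A = np.array([[2, 1, 0, 0],
                  [1, 2, 0, 0],
                  [0, 0, 2, 1],
                  [0, 0, 1, 2]], dtype=float)
    model = fit_topic_centers(A, ["sports", "sports", "finance", "finance"], k=2)
    print(classify(np.array([1.0, 0.0, 0.0, 0.0]), *model))  # sports
```

In this kind of setup, truncating to k dimensions is what can help with sparse term vectors: a query that shares only one or two terms with a topic's training documents is still projected onto the topic directions learned from term co-occurrence, rather than being matched term by term.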
Keywords: speech recognition, topic classification, lattice, confusion network