
Research On Unstructured Information Management Of Digital TV

Posted on: 2008-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: L Zhao
GTID: 2178360212976217
Subject: Computer software and theory

Abstract/Summary:
As digital television becomes widespread, the number of TV programs is growing rapidly, so helping users quickly find the programs they like has become an urgent task. An index is an efficient way to do this, but neither conventional databases nor conventional full-text indexing is well suited to it. First, traditional databases are not designed to manage unstructured text. Second, full-text indexing is ill-suited to our digital TV media database, which must run on an embedded system. For example, the inverted index model performs well but requires the text to be segmented into words. ChaSen, the most commonly used Japanese segmentation tool, has a large space cost (more than 23 MB). It also cannot extract the words that represent the meaning of a text, so every word in the text gets indexed, which increases the space cost further. In addition, a digital TV program database needs good dynamic (update) performance, yet little research has addressed how to improve it.

The thesis is structured as follows. First, we discuss index models and the factors that affect dynamic performance. Second, we present a Japanese segmentation tool with feature-extraction functions that is suitable for use in an embedded system. Third, we present an improved hybrid index update strategy for the inverted index model; theoretical analysis shows that it has better dynamic performance.

The main contents and results of this thesis are as follows:
1. A comparison and analysis of the strengths and shortcomings of commonly used index models.
2. A Japanese segmentation tool with some feature extraction...
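The abstract names three pieces: dictionary-based segmentation, selection of meaningful terms, and an inverted index with a hybrid update strategy. It gives no algorithmic details. The following is a minimal sketch of how those pieces might fit together, under stated assumptions. It is not the thesis's actual method.

The assumptions are as follows:
- Segmentation uses forward maximum matching against a tiny hypothetical dictionary (`DICTIONARY`).
- "Feature extraction" is approximated by the hypothetical `extract_terms`, which simply drops single-character tokens.
- The hypothetical `HybridInvertedIndex` (with illustrative parameters `buffer_limit` and `long_list_threshold`) follows a common hybrid scheme. New postings are buffered in memory. On flush, long posting lists are extended in place and short ones are merged and rewritten.
- Disk storage is simulated with Python dicts.

```python
from __future__ import annotations

from collections import defaultdict

# Hypothetical toy lexicon; a real segmentation dictionary would be far larger.
DICTIONARY = {"今日", "の", "天気", "予報", "天気予報", "番組", "ニュース"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text: str) -> list[str]:
    """Forward maximum matching: at each position take the longest dictionary
    word, falling back to a single character when nothing matches."""
    words, i = [], 0
    while i < len(text):
        for length in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += length
                break
    return words


def extract_terms(text: str) -> list[str]:
    """Assumed stand-in for feature extraction: keep only multi-character tokens."""
    return [word for word in segment(text) if len(word) >= 2]


class HybridInvertedIndex:
    """Inverted index with an assumed hybrid update policy. New postings are
    buffered in memory; on flush, long lists are extended in place and short
    lists are merged and rewritten. The 'main' index is simulated by a dict."""

    def __init__(self, buffer_limit: int = 4, long_list_threshold: int = 2):
        self.main: dict[str, list[int]] = {}
        self.buffer: defaultdict[str, list[int]] = defaultdict(list)
        self.buffered = 0  # number of postings currently held in the buffer
        self.buffer_limit = buffer_limit
        self.long_list_threshold = long_list_threshold
        self.in_place_updates = 0
        self.merge_updates = 0

    def add_document(self, doc_id: int, text: str) -> None:
        # Assumes doc_ids arrive in increasing order, so appends keep lists sorted.
        for term in set(extract_terms(text)):
            self.buffer[term].append(doc_id)
            self.buffered += 1
        if self.buffered >= self.buffer_limit:
            self.flush()

    def flush(self) -> None:
        for term, postings in self.buffer.items():
            existing = self.main.get(term)
            if existing is not None and len(existing) >= self.long_list_threshold:
                existing.extend(postings)  # in-place update for a long list
                self.in_place_updates += 1
            else:
                # Merge update: write a fresh list combining old and new postings.
                self.main[term] = sorted((existing or []) + postings)
                self.merge_updates += 1
        self.buffer.clear()
        self.buffered = 0

    def search(self, term: str) -> list[int]:
        # Queries see both flushed postings and those still in the buffer.
        return sorted(self.main.get(term, []) + self.buffer.get(term, []))


if __name__ == "__main__":
    index = HybridInvertedIndex(buffer_limit=4, long_list_threshold=2)
    docs = ["今日の天気予報", "天気予報番組", "ニュース番組", "天気予報", "今日のニュース"]
    for doc_id, text in enumerate(docs, start=1):
        index.add_document(doc_id, text)
    print(index.search("天気予報"))  # [1, 2, 4]
    print(index.in_place_updates, index.merge_updates)  # 1 6
```

The long-list threshold is the main design lever in this sketch. On real storage, in-place appends avoid rewriting large lists but require free space after each list. Merging keeps short lists contiguous at a cost proportional to their small size.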
Keywords/Search Tags: Text indexing, Segmentation dictionary, Inverted index, Hybrid index update strategy