
Study Of Text Categorization And Image Restoration In Modern Information Retrieval

Posted on: 2007-10-30
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Liu
GTID: 1118360185967803
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the information society, and especially the global spread of the World Wide Web, the volume of information continues to grow exponentially. On the one hand, people benefit from this large amount of information; on the other hand, there is a growing need for tools that help them find what is useful, because separating useful information from redundant material by hand is difficult. To build a more efficient retrieval system, this vast body of data should be classified automatically, so text categorization is attracting more and more attention. This dissertation studies several key technologies: word segmentation, feature selection, and categorization algorithms in learning-based automatic text categorization, and blind image restoration in image retrieval. A series of research results have been obtained. The main contributions of the study are summarized as follows:

Chinese word segmentation is the fundamental task and the first step in Chinese text categorization. Dictionary-based segmentation generally has to deal with the ambiguity problem. Non-dictionary-based methods achieve relatively high precision, but they are difficult to implement because of their high time and space complexity. The maximal matching (MM) method is the most commonly used dictionary-based method. It is a greedy search routine that walks through a sentence and, starting from a given point, tries to find the longest string of characters that matches a word entry in a pre-compiled dictionary. The most successful segmentation dictionary-based...
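The maximal matching method described in the abstract can be illustrated with a minimal Python sketch. This is not the dissertation's implementation: the function name `forward_maximal_match`, the `max_word_len` parameter, the toy dictionary, and the fallback of emitting a single character when nothing matches are all assumptions made for illustration. It shows the forward variant only, scanning left to right and always taking the longest dictionary entry that starts at the current position.

```python
def forward_maximal_match(sentence, dictionary, max_word_len=None):
    """Greedy forward maximal matching (illustrative sketch).

    sentence: the string to segment.
    dictionary: a set of known words (hypothetical toy data in the example).
    max_word_len: longest candidate to try; defaults to the longest entry.
    """
    if max_word_len is None:
        max_word_len = max((len(w) for w in dictionary), default=1)

    tokens = []
    i = 0
    while i < len(sentence):
        longest = min(max_word_len, len(sentence) - i)
        # Try the longest candidate first, then shrink one character at a time.
        for length in range(longest, 0, -1):
            candidate = sentence[i:i + length]
            # Assumption: an unmatched single character is emitted as its own token.
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += length
                break
    return tokens


# Toy dictionary, invented for this example.
toy_dictionary = {"研究", "研究生", "生命", "的", "起源"}
result = forward_maximal_match("研究生命的起源", toy_dictionary)
print(result)  # ['研究生', '命', '的', '起源']
```

With this toy dictionary the greedy choice takes 研究生 ("graduate student") first, which leaves 命 stranded, whereas a reader would segment the sentence as 研究 / 生命 / 的 / 起源. This is one instance of the ambiguity problem that the abstract attributes to dictionary-based segmentation.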
Keywords/Search Tags: Information Retrieval, Text Categorization, Word Segmentation, Feature Selection, Image Retrieval, Image Preprocessing