Research Of Chinese Text Categorization Algorithms Based On Information Entropy

Posted on: 2008-10-31; Degree: Master; Type: Thesis
Country: China; Candidate: L Wang
GTID: 2178360215968990; Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the Web has grown into a global, massive, distributed and shared information space, giving people a new means of searching for information. However, as online information has grown explosively, a user's query now returns an avalanche of irrelevant results in which the relevant information is buried. Amid this volume of information, automatic text classifiers play an important role in locating the information users need and in making effective use of shared information; by organizing and managing information effectively, they improve the efficiency of information retrieval.

This thesis first reviews the state of text categorization research in China and abroad. It then studies the key techniques of text categorization: text representation models, Chinese word segmentation, feature selection and classification methods. Focusing on Chinese word segmentation, we propose a new key phrase extraction algorithm; experiments show that the resulting extraction system can identify named entities in most cases. Next, we describe the framework of an entropy-based text categorization system and, drawing on information entropy theory, present a new text categorization method. The method uses entropy to measure the contribution a new text makes to each category set, and uses this entropy value to decide which category the new text belongs to. Finally, we design and validate the entropy-based categorization model. Experimental results show that the model performs relatively stably, demonstrating the effectiveness of the algorithm.
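The abstract does not give the exact entropy formula, so what follows is only a minimal sketch of one plausible reading. It assumes each category is summarized by the term-frequency counts of its already segmented training texts, and that a new text is assigned to the category whose term-distribution entropy increases least when the text's terms are merged in. The function names `entropy`, `entropy_delta` and `classify`, the minimum-increase decision rule, and the toy data are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(counts: Counter) -> float:
    """Shannon entropy (in bits) of the term distribution given by raw counts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts.values():
        if c > 0:
            p = c / total
            h -= p * math.log2(p)
    return h


def entropy_delta(category: Counter, doc_terms: List[str]) -> float:
    """Change in a category's entropy if the new text's terms were merged into it."""
    merged = category + Counter(doc_terms)
    return entropy(merged) - entropy(category)


def classify(categories: Dict[str, Counter], doc_terms: List[str]) -> str:
    """Hypothetical decision rule: pick the category whose entropy grows least."""
    return min(categories, key=lambda name: entropy_delta(categories[name], doc_terms))


# Toy, hypothetical training data: texts assumed to be segmented into terms already.
training = {
    "sports": ["football", "match", "goal", "team", "match", "goal"],
    "finance": ["stock", "market", "price", "bank", "stock", "market"],
}
categories = {name: Counter(terms) for name, terms in training.items()}

print(classify(categories, ["goal", "match", "team"]))  # sports
print(classify(categories, ["stock", "price"]))  # finance
```

With the toy data, the terms "goal", "match" and "team" already dominate the "sports" distribution, so merging them slightly lowers its entropy, while they add three unseen terms to "finance" and raise its entropy sharply. One caveat of this naive rule is that larger categories are perturbed less by any single text, so a practical version would likely need some normalization for category size.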
Keywords/Search Tags: Text categorization, Feature selection, Chinese word segmentation, Categorization algorithm, Information entropy