
The Study Of Chinese Text Categorization Based On Concept

Posted on: 2006-12-13  Degree: Master  Type: Thesis
Country: China  Candidate: Z F Wu  Full Text: PDF
GTID: 2168360155450334  Subject: Computer application technology
Abstract/Summary:
Text categorization is the task of automatically assigning texts to categories that have been defined in advance. It is an active research topic and a basic task in natural language processing. Concept-based text categorization uses concepts rather than terms as features. Terms that share a meaning are linked to the same concept, and a term that has different meanings in different domains is split into several distinct concepts. Concepts therefore make better features than terms, and using them overcomes the weaknesses of using terms directly as features. In this paper we use the HowNet knowledge system to map features from the keyword space to the concept space. We also put forward three new ideas in our text categorization system. First, we found that names of people and institutions can be synonymous with one another, and that some names appear in multiple domains; these names also need to be mapped to the concept space. Second, we improved the weight computation method and show that it performs better than TF-IDF. Third, we modified the naive Bayes algorithm, and the modified version gives better results than the original.
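The idea of classifying on concepts rather than raw terms can be illustrated with a minimal sketch. It rests on several assumptions: the `TERM_TO_CONCEPT` table is a hypothetical stand-in for a HowNet lookup, the classifier is a textbook multinomial naive Bayes with add-one smoothing (not the modified algorithm or the improved weighting described in the abstract), and only synonym merging is shown, not the splitting of terms with several senses.

```python
import math
from collections import Counter, defaultdict

# Hypothetical term-to-concept table standing in for a HowNet lookup.
# Synonymous terms share one concept label.
TERM_TO_CONCEPT = {
    "computer": "COMPUTER",
    "PC": "COMPUTER",
    "soccer": "FOOTBALL",
    "football": "FOOTBALL",
}

def to_concepts(tokens):
    """Replace each token by its concept; unknown tokens are kept as-is."""
    return [TERM_TO_CONCEPT.get(t, t) for t in tokens]

class NaiveBayes:
    """Standard multinomial naive Bayes over concept features."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # concept counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            concepts = to_concepts(tokens)
            self.feature_counts[label].update(concepts)
            self.vocab.update(concepts)
        return self

    def predict(self, tokens):
        concepts = to_concepts(tokens)
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            # Add-one (Laplace) smoothing over the concept vocabulary.
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for c in concepts:
                score += math.log((counts[c] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Toy training data: already-segmented documents with their categories.
train_docs = [["computer", "chip"], ["soccer", "goal"]]
train_labels = ["tech", "sports"]
model = NaiveBayes().fit(train_docs, train_labels)

# "PC" and "football" never occur in training, but their concepts do.
print(model.predict(["PC"]))
print(model.predict(["football"]))
```

In this toy setup the unseen terms are classified through the concepts they share with training terms, which is the benefit the abstract attributes to concept features.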
Keywords/Search Tags:Text Categorization, Naive Bayes, KNN, HowNet, Chinese Word Segmentation