
Research On Ontology-Based Semantic Text Categorization

Posted on: 2009-01-16 | Degree: Master | Type: Thesis
Country: China | Candidate: Y J He | Full Text: PDF
GTID: 2178360245474833 | Subject: Computer application technology
Abstract/Summary:
Text classification plays an important role in text mining and information retrieval systems. It can improve query results, provide intuitive navigation and browsing mechanisms, and find similar texts. Research on text classification is therefore an important topic in text mining.

The central issue in text classification is the mathematical representation of text data. In most classification algorithms, a text or document is represented using the Vector Space Model. This representation is simple, but it raises one severe problem: the high dimensionality of the feature space and the inherent sparsity of the data. In addition, it cannot handle synonymy or polysemy in text data. These problems interfere greatly with the learning process and sharply degrade classification performance. The main techniques for addressing them are weight adjustment and dimensionality reduction, but each has its own drawbacks. Weight adjustment does not solve these problems effectively, so it improves classification quality only slightly. Dimensionality reduction does address high dimensionality, but at a high cost. To address these problems, this thesis proposes a semantic feature vector representation and a semantic similarity calculation based on HowNet.

Finally, Chinese text classification systems based on semantic SVM, semantic k-NN, and semantic simple vector distance were tested. The experiments show that the semantic classification algorithms achieve higher F1 values than the traditional text classification algorithms, and that semantic SVM performs best.
Keywords/Search Tags: text classification, semantic feature vector, HowNet, sememe, semantic weight algorithm, semantic similarity
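
The synonym problem described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's actual algorithm, so what follows is only a toy illustration under stated assumptions: `TOY_CONCEPTS` is a hypothetical hand-written word-to-concept table standing in for a sememe lookup (it is not HowNet data or a HowNet API), and `term_vector`, `concept_vector`, and `cosine` are illustrative names. The sketch compares two documents that use different words for the same meaning, first in a plain Vector Space Model and then after mapping words to shared concepts.

```python
import math
from collections import Counter

# Hypothetical toy lexicon mapping surface words to a shared concept label.
# It stands in for a sememe lookup; it is NOT HowNet data or the HowNet API.
TOY_CONCEPTS = {
    "car": "vehicle",
    "automobile": "vehicle",
    "buy": "purchase",
    "purchase": "purchase",
}

def term_vector(tokens):
    """Plain Vector Space Model: one dimension per surface word."""
    return Counter(tokens)

def concept_vector(tokens, lexicon=TOY_CONCEPTS):
    """Assumed semantic variant: one dimension per concept.
    Words missing from the lexicon keep their own dimension."""
    return Counter(lexicon.get(t, t) for t in tokens)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Two documents with the same meaning but no words in common.
doc_a = ["buy", "car"]
doc_b = ["purchase", "automobile"]

term_sim = cosine(term_vector(doc_a), term_vector(doc_b))
concept_sim = cosine(concept_vector(doc_a), concept_vector(doc_b))

print("term-level similarity:   ", term_sim)
print("concept-level similarity:", concept_sim)
```

In the term-level space the two documents share no dimensions, so their similarity is zero; after concept mapping they occupy the same dimensions and the similarity is close to 1. Mapping also merges dimensions, which reduces the size of the feature space in this toy case. Polysemy is not handled here, since each word maps to a single concept.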