Text Classification Method Based On WordNet

Posted on: 2009-09-11
Degree: Master
Type: Thesis
Country: China
Candidate: D W Liu
GTID: 2178360242489900
Subject: Computer applications
Abstract/Summary:
This thesis argues for introducing concepts into text representation, motivated by three weaknesses of the vector space model: its dimensionality is too high, it carries too little semantic information, and it relies on word-form statistics alone. It describes the structure of the semantic dictionary WordNet, the kinds of relations that hold among concepts, and the definition and applications of the Concept Chain. Each synset (synonym set) in WordNet can be treated as a concept with an unambiguous meaning, so the lemmas of a text are replaced by their synsets. The thesis proposes a Concept-based Vector Space Model, which captures a more abstract level of semantic information, as a replacement for the feature vector space model of the text. The model adjusts the feature weights using the hypernymy-hyponymy relations between synsets and the Concept Chain in WordNet, the degree of generalization of each concept, and the inverse category frequency of each concept.

The thesis then presents a text classification algorithm based on semantic analysis, and a classification system that integrates text classification techniques with the information supplied by WordNet and uses a Naive Bayes classifier. It details the construction of the semantics-based Vector Space Model. Experiments compare the semantics-based system with the lemma-based system. The results show that the model is feasible and efficient, and that it achieves higher precision, recall, and F1-value.
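The pipeline summarized above (lemma to synset, hypernym-based weighting, Naive Bayes) can be illustrated with a minimal sketch. It is not the thesis's implementation: the tiny lexicon `LEMMA_TO_CONCEPT`, the `HYPERNYMS` table, the concept ids, and the `HYPERNYM_WEIGHT` discount are all hypothetical stand-ins for WordNet data and for the thesis's weighting scheme, and the Concept Chain and inverse category frequency terms are left out.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon standing in for WordNet (not real WordNet data):
# lemma -> concept id (synset), and concept -> hypernym chain, nearest first.
LEMMA_TO_CONCEPT = {
    "car": "car.n", "automobile": "car.n", "truck": "truck.n",
    "dog": "dog.n", "puppy": "dog.n", "cat": "cat.n",
}
HYPERNYMS = {
    "car.n": ["vehicle.n"], "truck.n": ["vehicle.n"],
    "dog.n": ["animal.n"], "cat.n": ["animal.n"],
}
HYPERNYM_WEIGHT = 0.5  # assumed discount per step up the hypernym chain


def concept_vector(tokens):
    """Map lemmas to concepts and add discounted weight to their hypernyms."""
    vec = Counter()
    for tok in tokens:
        concept = LEMMA_TO_CONCEPT.get(tok)
        if concept is None:
            continue  # lemma unknown to the toy lexicon
        vec[concept] += 1.0
        for depth, hyper in enumerate(HYPERNYMS.get(concept, []), start=1):
            vec[hyper] += HYPERNYM_WEIGHT ** depth
    return vec


def train_naive_bayes(labelled_docs):
    """Collect per-class document counts and per-class concept weight totals."""
    class_docs = Counter()
    class_mass = defaultdict(Counter)
    vocab = set()
    for tokens, label in labelled_docs:
        vec = concept_vector(tokens)
        class_docs[label] += 1
        class_mass[label].update(vec)  # adds the weights concept by concept
        vocab.update(vec)
    return class_docs, class_mass, vocab


def classify(tokens, model):
    """Return the class with the highest Laplace-smoothed log posterior."""
    class_docs, class_mass, vocab = model
    total_docs = sum(class_docs.values())
    vec = concept_vector(tokens)
    best_label, best_score = None, -math.inf
    for label in class_docs:
        score = math.log(class_docs[label] / total_docs)
        denom = sum(class_mass[label].values()) + len(vocab)
        for concept, weight in vec.items():
            if concept in vocab:  # ignore concepts never seen in training
                score += weight * math.log((class_mass[label][concept] + 1.0) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train_naive_bayes([(["car"], "transport"), (["cat"], "animals")])
print(classify(["automobile"], model))  # synonym of a training lemma
print(classify(["puppy"], model))       # related only through a shared hypernym
```

In this toy setup, "automobile" shares a concept with the training lemma "car", and "puppy" is linked to the "animals" class only through the shared hypernym. Neither link is available to a purely lemma-based representation, which is the motivation the abstract gives for the concept-based model.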
Keywords/Search Tags: Text Classification, WordNet, Concept Vector Space Model, Bayesian classifier