
Research And Improvement Of Automatic Text Classification Algorithm Based On The Vector Space Model

Posted on: 2007-11-10 | Degree: Master | Type: Thesis
Country: China | Candidate: X Wang
GTID: 2178360185451586 | Subject: Computer application technology
Abstract/Summary:
Text categorization can give information retrieval more efficient search strategies and better query results. With the rapid growth of information resources on the Internet, it has become increasingly important for information processing. This thesis gives an overall introduction to the key techniques of automatic text categorization based on the Vector Space Model. It analyzes the shortcomings of the Vector Space Method and Naive Bayes classifiers and proposes improvements to both. First, it introduces feedback learning into the Vector Space Method text classifier, giving the text categorization system a self-learning capability. Second, it relaxes the attribute independence assumption of the Naive Bayes text classifier and presents an improved text classification model based on Bayes' theorem, called the Stump Network, which amends the Naive Bayes classifier. Experiments show that the two revised text categorization models meet the needs of text categorization and improve on the performance of the original models.
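The abstract names two baseline classifiers and a feedback step but does not specify the term weighting, similarity measure, or feedback rule. The following is a minimal sketch under stated assumptions, not the thesis's method: it assumes raw term-frequency vectors, one centroid per class scored by cosine similarity, and a Rocchio-style correction with a hypothetical learning rate `beta`. The Naive Bayes part is a standard multinomial model with Laplace smoothing, whose per-word product is exactly the attribute independence assumption the thesis relaxes. It does not implement the Stump Network, whose structure the abstract does not describe. The names `CentroidClassifier`, `NaiveBayes`, `feedback`, and `beta`, and the toy training data, are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def tf_vector(tokens):
    """Raw term-frequency vector (assumption: the thesis may use TF-IDF instead)."""
    return Counter(tokens)

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class CentroidClassifier:
    """Hypothetical VSM classifier: one mean vector per class, predict by max cosine."""
    def __init__(self, beta=0.5):
        self.beta = beta  # hypothetical feedback learning rate
        self.centroids = {}

    def fit(self, docs, labels):
        sums, counts = defaultdict(Counter), Counter(labels)
        for tokens, y in zip(docs, labels):
            sums[y].update(tf_vector(tokens))
        self.centroids = {y: {t: w / counts[y] for t, w in s.items()}
                          for y, s in sums.items()}
        return self

    def predict(self, tokens):
        v = tf_vector(tokens)
        return max(self.centroids, key=lambda y: cosine(v, self.centroids[y]))

    def feedback(self, tokens, true_label):
        """Rocchio-style self-learning (assumed rule): on a mistake, move the
        true class centroid toward the document and the wrongly predicted
        centroid away from it. Returns the prediction made before updating."""
        predicted = self.predict(tokens)
        if predicted != true_label:
            good = self.centroids.setdefault(true_label, {})
            bad = self.centroids[predicted]
            for t, w in tf_vector(tokens).items():
                good[t] = good.get(t, 0.0) + self.beta * w
                bad[t] = max(0.0, bad.get(t, 0.0) - self.beta * w)
        return predicted

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""
    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, y in zip(docs, labels):
            self.word_counts[y].update(tokens)
            self.vocab.update(tokens)
        self.totals = {y: sum(c.values()) for y, c in self.word_counts.items()}
        return self

    def predict(self, tokens):
        n, v = sum(self.class_counts.values()), len(self.vocab)
        def log_posterior(y):
            # Independence assumption: log P(y) + sum over words of log P(w | y).
            lp = math.log(self.class_counts[y] / n)
            for w in tokens:
                lp += math.log((self.word_counts[y][w] + 1) / (self.totals[y] + v))
            return lp
        return max(self.class_counts, key=log_posterior)

# Toy training data (hypothetical).
train = [
    ("stock market shares rise".split(), "finance"),
    ("bank interest rate market".split(), "finance"),
    ("football match goal team".split(), "sport"),
    ("team wins league match".split(), "sport"),
]
docs, labels = zip(*train)

vsm = CentroidClassifier().fit(docs, labels)
nb = NaiveBayes().fit(docs, labels)
print(vsm.predict("market shares".split()))  # -> finance
print(nb.predict("goal team".split()))       # -> sport
```

With these toy data, the ambiguous document `["match", "market"]` is first assigned to `finance`; after one call to `feedback(["match", "market"], "sport")` the corrected centroids assign it to `sport`, which is the kind of self-learning behavior the abstract describes, here under an assumed update rule.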
Keywords/Search Tags: Text categorization, Vector Space Method, Naive Bayes, feedback, attribute independence assumption, Stump Network