Word Correlation in Text Classification

Posted on: 2008-11-01
Degree: Master
Type: Thesis
Country: China
Candidate: X F Wang
Full Text: PDF
GTID: 2208360215997812
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, the amount of information stored on the web is growing explosively. This information includes text, voice, images and so on. Because storing and transmitting text is comparatively simple, and text is easy to upload and download, most information exists in the form of text. Against this background, people urgently need a technology to analyze and filter text information rapidly. Text categorization can help solve this problem: it can effectively organize and manage text information and help users find the information they are looking for promptly and precisely.

This thesis analyzes the theory and techniques of text categorization and studies the impact of word relevancy on text categorization, taking the Bayes classification method as its basis.

The traditional Naive Bayes classifier is applied in many kinds of text categorization research because it is fast and easy to use. It assumes that all attributes are mutually independent. This assumption makes the calculation simple, but it does not hold in reality, and many experiments also indicate that in some cases the performance of the Naive Bayes classifier is poor.

Based on the above, this thesis first studies the Naive Bayes classifier, then modifies it to take attribute relevancy into account, introducing a Bayes classifier based on estimating word relevancy. We then discuss Bayesian networks and use their ability to express dependencies among items to apply them to text categorization with word relevancy taken into account. We place a restriction on the independence assumption and introduce the 2-P Bayes classifier, which considers strongly relevant parents. Finally, we run experiments on the three models: the Naive Bayes classifier, the Bayes classifier based on estimating word relevancy, and the 2-P Bayes classifier. The results indicate that taking word relevancy into account improves the performance of text categorization; word relevancy has a positive effect on text categorization.
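
For readers unfamiliar with the baseline, the independence assumption of the Naive Bayes classifier can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the function names `train_naive_bayes` and `classify`, the add-one smoothing, and the toy documents are all assumptions made for illustration. Each word contributes its own log-probability term, independently of the other words in the document, which is the simplification the abstract says does not hold in practice.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per class and words per class.

    docs: list of (list_of_words, label) pairs (hypothetical input format).
    """
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in docs:
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def classify(model, words):
    """Return the label with the highest log posterior score."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        # Log prior of the class.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for w in words:
            if w not in vocab:
                continue  # ignore words never seen in training
            # Independence assumption: each word adds its own term,
            # regardless of which other words appear with it.
            # Add-one smoothing is an assumption of this sketch.
            count = word_counts[label][w]
            score += math.log((count + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented purely for illustration.
docs = [
    (["goal", "match", "team"], "sports"),
    (["team", "win", "score"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["market", "bank", "price"], "finance"),
]
model = train_naive_bayes(docs)
print(classify(model, ["team", "score"]))
print(classify(model, ["market", "price"]))
```

Because the score is a plain sum over words, two strongly correlated words are counted as if they were independent evidence. Models that account for word relevancy, such as those studied in the thesis, aim to correct for this.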
Keywords/Search Tags: Text Categorization, Words Relevancy, Naive Bayes, Bayesian Network, 2-P Bayes