The Research And Implement Of Naive Bayes Text Classification Algorithm

Posted on: 2008-05-23
Degree: Master
Type: Thesis
Country: China
Candidate: J M Chen
Full Text: PDF
GTID: 2178360215990927
Subject: Computer system architecture
Abstract/Summary:
The task of data mining is to extract useful information from large volumes of data. With the rapid development of the Internet, text mining has become one of the focuses of data mining, because text is the main information carrier of web pages. Text classification is the foundation and core of text mining.

Automatic text classification based on machine learning gradually became mainstream after the 1990s. It offers a short turnaround, high efficiency, and highly consistent results. Despite these merits, its accuracy is still not satisfactory. As the amount of information on the Internet grows rapidly, text classification finds ever wider application and faces both opportunities and challenges; current research focuses on how to improve classification accuracy.

The Naive Bayes classifier has proved to be one of the most effective classifiers and is widely used. It applies statistical theory to text classification. The Bayesian classification method relies on an "independence hypothesis": the occurrence of each attribute is assumed to be independent of the occurrence of the other attributes. In practice this condition is rarely satisfied, and a particular combination of related terms may take on a new meaning in a particular text.

This thesis first describes the text classification system, including text representation, feature extraction, and text classification methods. It then discusses the Bayes classifier model and algorithm. To relax the independence hypothesis of the Naive Bayes classification method, terms with high correlation strength are merged during training. The experimental data indicate that this improved method appreciably improves the accuracy of the algorithm.
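
The idea of merging strongly correlated terms before Naive Bayes training can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not state the exact correlation measure, so document-level pointwise mutual information is assumed here, and the names `find_merges`, `apply_merges`, `threshold`, and `min_count`, as well as the toy corpus, are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def pointwise_mi(docs, a, b):
    """Document-level pointwise mutual information of terms a and b (assumed measure)."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_ab == 0:
        return float("-inf")
    return math.log((n_ab * n) / (n_a * n_b))


def find_merges(docs, threshold, min_count):
    """Term pairs that co-occur in at least min_count documents with PMI above threshold."""
    vocab = sorted({t for d in docs for t in d})
    merges = []
    for a, b in combinations(vocab, 2):
        together = sum(1 for d in docs if a in d and b in d)
        if together >= min_count and pointwise_mi(docs, a, b) > threshold:
            merges.append((a, b))
    return merges


def apply_merges(tokens, merges):
    """Replace each merged pair present in a document with one joint feature."""
    present = set(tokens)
    out = list(tokens)
    for a, b in merges:
        if a in present and b in present:
            out = [t for t in out if t not in (a, b)]
            out.append(a + "_" + b)
    return out


def train(labeled_docs):
    """Collect the counts needed by a multinomial Naive Bayes classifier."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in labeled_docs:
        class_counts[label] += 1
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict(model, tokens):
    """Return the class with the highest log-posterior, using Laplace smoothing."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        denom = sum(term_counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # terms unseen in training are ignored
                score += math.log((term_counts[label][t] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy corpus (invented for illustration only).
corpus = [
    (["new", "york", "stock"], "finance"),
    (["new", "york", "bank"], "finance"),
    (["stock", "bank", "market"], "finance"),
    (["match", "goal", "team"], "sport"),
    (["match", "goal", "coach"], "sport"),
    (["team", "coach", "league"], "sport"),
]
docs = [tokens for tokens, _ in corpus]
merges = find_merges(docs, threshold=1.0, min_count=2)
model = train([(apply_merges(tokens, merges), label) for tokens, label in corpus])
prediction = predict(model, apply_merges(["new", "york", "bank"], merges))
```

In this sketch a pair such as `new` and `york` becomes the single feature `new_york`, so the classifier no longer multiplies two probabilities for terms that nearly always appear together. The `min_count` guard is an added assumption that keeps rare terms, which pointwise mutual information tends to overrate, from being merged.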
Keywords/Search Tags: text classification, independence hypothesis, relativity, Mutual Information