The Research And Implement Of Naive Bayes Text Classification Algorithm

Posted on:2008-05-23

Degree:Master

Type:Thesis

Country:China

Candidate:J M Chen

GTID:2178360215990927

Subject:Computer system architecture

Abstract/Summary:

Data mining aims to extract useful information from large volumes of data. With the rapid development of the Internet, text mining has become one of its focal areas, because text is the main information carrier of web pages. Text classification is the foundation and core of text mining.

Since the 1990s, automatic text classification based on machine learning has gradually become the mainstream approach. It offers a short turnaround time, high efficiency, and highly consistent results. Despite these merits, its accuracy is still not satisfactory. As the volume of information on the Internet grows rapidly, text classification finds wide application and faces both opportunities and challenges, and research focuses on how to improve the accuracy of classification results.

The Naive Bayes classifier has proved to be one of the most effective classifiers and is widely used. It applies statistical theory to text classification. The Bayesian classification method rests on an "independence hypothesis": the occurrence of each attribute is assumed to be independent of the occurrence of the other attributes. In practice this condition is not easily satisfied, because related characters that appear in a particular combination may take on a new meaning in a particular text.

This thesis first describes the text classification system, including text representation, feature extraction, and text classification methods. It then discusses the Bayes classifier model and algorithm. To overcome the limitation that the independence hypothesis imposes on the Naive Bayes classification method, features with high correlation strength are merged during training. Experimental data indicate that this improved method appreciably improves the accuracy of the algorithm.
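As a rough illustration of the ideas named in the abstract, the sketch below shows a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, plus a mutual-information score between two terms. It is not the thesis's implementation: the abstract does not say how correlation strength is measured or how features are merged, so the use of mutual information over term presence (suggested only by the keyword list), the function names `train_nb`, `predict_nb`, and `mutual_information`, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect per-class document counts, per-class term counts, and the vocabulary.

    docs is a list of token lists; labels is the matching list of class names.
    """
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab


def predict_nb(model, tokens):
    """Return the class with the highest log posterior.

    The independence hypothesis appears here as a plain sum of per-term
    log likelihoods: each term contributes independently of the others.
    """
    class_counts, term_counts, vocab = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        total = sum(term_counts[label].values())
        score = math.log(n_c / n_docs)  # log prior
        for t in tokens:
            if t not in vocab:
                continue  # ignore terms never seen in training
            # Add-one smoothing avoids log(0) for terms unseen in this class.
            score += math.log((term_counts[label][t] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def mutual_information(docs, a, b):
    """Mutual information (in nats) between the presence of terms a and b.

    Assumed correlation measure: a high value suggests the two terms are far
    from independent and could be candidates for merging into one feature.
    """
    n = len(docs)
    joint = Counter((a in d, b in d) for d in map(set, docs))
    mi = 0.0
    for (x, y), c in joint.items():  # only non-zero joint counts are visited
        p_xy = c / n
        p_x = sum(v for (i, _), v in joint.items() if i == x) / n
        p_y = sum(v for (_, j), v in joint.items() if j == y) / n
        mi += p_xy * math.log(p_xy / (p_x * p_y))
    return mi


# Toy data, invented for this example only.
docs = [
    ["ball", "goal", "team"],
    ["goal", "match"],
    ["stock", "market"],
    ["market", "price", "stock"],
]
labels = ["sports", "sports", "finance", "finance"]

model = train_nb(docs, labels)
print(predict_nb(model, ["goal", "team"]))
print(mutual_information(docs, "stock", "market"))
```

In this toy corpus "stock" and "market" always occur together, so their mutual information is high, which is the kind of dependence the independence hypothesis ignores. A merging step in the spirit of the abstract might replace such a pair with a single combined feature before training, but the exact rule used in the thesis is not given here.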

Keywords/Search Tags:

text classification, independence hypothesis, relativity, Mutual Information

Related items

1	Research And Improvement To Text Classification Algorithm
2	The Study On Feature Selection Methods For Automatic Text Categorization
3	Study On Short Text Classification Algorithms Based On Mutual Information
4	Research Of Chinese Text Classification Algorithms Based On VSM
5	Text Emotional Classification Based On Text Mining
6	Classification Of Chinese Text Subject Classification And Emotion Based On Machine Learning
7	Research On Text Classification Algorithms Based On Word Vector
8	The Calculation And Use Of Anchor Text Similarity Based On Topic Relativity Of Source Web Pages
9	Study Of Mutual Information Feature Selection In Chinese Text Classification
10	The Research And Implementation Of Chinese Text Classification Based On Feature Selection And LDA