
Text Classification Algorithm Research Based On Naive Bayes

Posted on: 2019-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2428330590995350
Subject: Circuits and Systems
Abstract/Summary:
The rapid development of Internet technology has brought people into the era of big data. As the main channel for obtaining information today, the Internet is becoming ever more closely tied to human life. Most of the information on the Internet is text data, so finding methods that can process text data effectively and classify it accurately has become an important research field. As one of the classical machine learning algorithms, Naive Bayes has become an important subject of text classification research because of its simple model, fast classification speed and high classification efficiency.

For a Naive Bayes text classification system, on the one hand, traditional Naive Bayes theory rests on the assumption that all features are mutually independent, that is, that the feature words are independent of each other. This assumption degrades classifier performance to some extent, so if the feature independence assumption can be weakened or eliminated, classifier performance will improve accordingly. On the other hand, for massive data, omitting feature selection increases the burden on the classification system and reduces classifier performance. The thesis therefore pursues three directions for the text classification system and proposes feature weighting with IGDC for Naive Bayes text classification (IGDCNB), deep weighting with IGDC for Naive Bayes text classification (IGDC-DWNB), and an improved feature-size-customized fast correlation-based filter for Naive Bayes (IFSC-FCBF) text classification.

The main contributions of this thesis are:

(1) We studied and improved the Naive Bayes feature weighting model and proposed feature weighting with IGDC for Naive Bayes text classification (IGDCNB). The model calculates the information gain of each feature in each category and each document in a new way, and combines the information from the two dimensions by linear normalization, which greatly weakens the feature independence assumption of Naive Bayes.

(2) We studied the deep feature weighting model for Naive Bayes and modified the training method for its conditional probabilities. At the same time, IGDC is applied to the deep feature weighting of Naive Bayes, yielding the deep weighting with IGDC for Naive Bayes text classification algorithm (IGDC-DWNB). Experimental results show that this model further weakens the feature conditional independence assumption.

(3) This is the first application of the fast correlation-based filter (FCBF) to text classification. The application fields of the FCBF algorithm and its shortcomings in text classification are summarized. We improved the calculation of feature correlation, optimized the algorithm's steps, and proposed the improved feature-size-customized fast correlation-based filter for Naive Bayes (IFSC-FCBF) text classification. At the same feature dimensionality, it selects superior features more quickly and consumes less time.
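The abstract does not give the exact IGDC weighting formula, but the general idea behind feature-weighted Naive Bayes (contribution 1) can be sketched as follows: each term's log-likelihood contribution is scaled by a relevance weight. Here plain per-term information gain is used as a stand-in for the IGDC weight, and the toy corpus, labels, and weight defaults are illustrative assumptions, not the thesis's actual data or formula.

```python
import math
from collections import Counter

# Toy corpus of (tokens, label) pairs -- illustrative only.
docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "offer"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
vocab = sorted({t for toks, _ in docs for t in toks})
classes = sorted({y for _, y in docs})

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(term):
    """IG of the binary 'term present' indicator w.r.t. the class label."""
    n = len(docs)
    with_t = [y for toks, y in docs if term in toks]
    without_t = [y for toks, y in docs if term not in toks]
    cond = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy([y for _, y in docs]) - cond

# Stand-in for the IGDC weight: one scalar weight per vocabulary term.
weights = {t: info_gain(t) for t in vocab}

# Laplace-smoothed, feature-weighted multinomial Naive Bayes.
prior = {c: sum(1 for _, y in docs if y == c) / len(docs) for c in classes}
term_counts = {c: Counter() for c in classes}
for toks, y in docs:
    term_counts[y].update(toks)

def predict(tokens):
    scores = {}
    for c in classes:
        total = sum(term_counts[c].values())
        score = math.log(prior[c])
        for t in tokens:
            p = (term_counts[c][t] + 1) / (total + len(vocab))
            # The weight scales each term's log-likelihood contribution.
            score += weights.get(t, 1.0) * math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict(["cheap", "offer"]))  # -> "spam" on this toy corpus
```

Note that a weight of zero silences an irrelevant term entirely, while a large weight lets a discriminative term dominate; unweighted Naive Bayes is the special case where every weight is 1.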
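Contribution (3) builds on FCBF, which ranks features by symmetric uncertainty with the class and then drops any feature that is more correlated with a stronger, already-selected feature than with the class itself. The following is a minimal sketch of that baseline idea only; the thesis's IFSC-FCBF improvements to the correlation calculation and step ordering are not reproduced here, and the toy feature table is an illustrative assumption.

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # IG(X; Y) = H(X) - H(X | Y)
    n = len(x)
    h_x_given_y = 0.0
    for yv, cnt in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == yv]
        h_x_given_y += (cnt / n) * entropy(subset)
    return 2 * (hx - h_x_given_y) / (hx + hy)

def fcbf(features, labels, delta=0.0):
    """Simplified FCBF: keep features with SU(f, class) > delta, then
    drop f_j when a stronger, already-kept f_i satisfies
    SU(f_i, f_j) >= SU(f_j, class)  (f_j is redundant given f_i)."""
    relevant = [(name, symmetric_uncertainty(col, labels))
                for name, col in features.items()
                if symmetric_uncertainty(col, labels) > delta]
    relevant.sort(key=lambda kv: kv[1], reverse=True)
    selected = []
    for name, su_class in relevant:
        if all(symmetric_uncertainty(features[prev], features[name]) < su_class
               for prev, _ in selected):
            selected.append((name, su_class))
    return [name for name, _ in selected]

# Binary term-presence features over 4 toy documents.
features = {
    "f1": [1, 1, 0, 0],
    "f2": [1, 1, 0, 0],   # exact copy of f1 -> redundant
    "f3": [1, 0, 1, 0],   # uncorrelated with the label -> irrelevant
}
labels = ["spam", "spam", "ham", "ham"]
print(fcbf(features, labels))  # -> ["f1"]: f2 dropped as redundant, f3 as irrelevant
```

Because redundancy checks only run against features already selected, FCBF scans the ranked list once, which is the source of the speed the abstract emphasizes.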
Keywords/Search Tags: Naive Bayes, text classification, feature weighting, deep weighting, fast correlation-based filter