
Text Classification Algorithm Research Based On Naive Bayes

Posted on: 2019-10-10
Degree: Master
Type: Thesis
Country: China
Candidate: W He
Full Text: PDF
GTID: 2428330590995350
Subject: Circuits and Systems
Abstract/Summary:
The rapid development of Internet technology has brought people into the era of big data. As the main channel for obtaining information today, the Internet is becoming ever more closely tied to human life. Most of the information on the Internet is text data, so finding methods that can process text data effectively and classify it accurately has become an important research field. As one of the classical machine learning algorithms, Naive Bayes has become an important subject of text classification research because of its simple model, fast classification speed and high classification efficiency.

For a Naive Bayes text classification system, on the one hand, traditional Naive Bayes theory rests on the assumption that all features are mutually independent, that is, that the feature words are independent of each other. This assumption degrades classifier performance to some extent, so if the feature independence assumption can be weakened or eliminated, classifier performance will improve accordingly. On the other hand, for massive data, omitting feature selection increases the burden on the classification system and reduces classifier performance. The thesis therefore pursues three directions for the text classification system and proposes feature weighting with IGDC for Naive Bayes text classification (IGDCNB), deep weighting with IGDC for Naive Bayes text classification (IGDC-DWNB), and an improved feature-size-customized fast correlation-based filter for Naive Bayes (IFSC-FCBF) text classification.

The main contributions of this thesis are:

(1) We studied and improved the Naive Bayes feature weighting model and proposed feature weighting with IGDC for Naive Bayes text classification (IGDCNB). The model calculates the information gain of each feature in each category and each document in a new way, and combines the information from the two dimensions by linear normalization, which greatly weakens the feature independence assumption of Naive Bayes.

(2) We studied the deep feature weighting model for Naive Bayes and modified the training method for its conditional probabilities. At the same time, IGDC is applied to the deep feature weighting of Naive Bayes, yielding the deep weighting with IGDC for Naive Bayes text classification algorithm (IGDC-DWNB). Experimental results show that this model further weakens the feature conditional independence assumption.

(3) This is the first application of the fast correlation-based filter (FCBF) to text classification. The application fields of the FCBF algorithm and its shortcomings in text classification are summarized. We improved the calculation of feature correlation, optimized the algorithm's steps, and proposed the improved feature-size-customized fast correlation-based filter for Naive Bayes (IFSC-FCBF) text classification. At the same feature dimensionality, it selects superior features more quickly and consumes less time.
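The abstract does not give the exact IGDC weighting formula, but the general idea behind feature-weighted Naive Bayes (contribution 1) can be sketched as follows: each term's log-likelihood contribution is scaled by a relevance weight. Here plain per-term information gain is used as a stand-in for the IGDC weight, and the toy corpus, labels, and weight defaults are illustrative assumptions, not the thesis's actual data or formula.

```python
import math
from collections import Counter

# Toy corpus of (tokens, label) pairs -- illustrative only.
docs = [
    (["cheap", "pills", "buy"], "spam"),
    (["buy", "cheap", "offer"], "spam"),
    (["meeting", "agenda", "notes"], "ham"),
    (["project", "meeting", "notes"], "ham"),
]
vocab = sorted({t for toks, _ in docs for t in toks})
classes = sorted({y for _, y in docs})

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def info_gain(term):
    """IG of the binary 'term present' indicator w.r.t. the class label."""
    n = len(docs)
    with_t = [y for toks, y in docs if term in toks]
    without_t = [y for toks, y in docs if term not in toks]
    cond = (len(with_t) / n) * entropy(with_t) + (len(without_t) / n) * entropy(without_t)
    return entropy([y for _, y in docs]) - cond

# Stand-in for the IGDC weight: one scalar weight per vocabulary term.
weights = {t: info_gain(t) for t in vocab}

# Laplace-smoothed, feature-weighted multinomial Naive Bayes.
prior = {c: sum(1 for _, y in docs if y == c) / len(docs) for c in classes}
term_counts = {c: Counter() for c in classes}
for toks, y in docs:
    term_counts[y].update(toks)

def predict(tokens):
    scores = {}
    for c in classes:
        total = sum(term_counts[c].values())
        score = math.log(prior[c])
        for t in tokens:
            p = (term_counts[c][t] + 1) / (total + len(vocab))
            # The weight scales each term's log-likelihood contribution.
            score += weights.get(t, 1.0) * math.log(p)
        scores[c] = score
    return max(scores, key=scores.get)

print(predict(["cheap", "offer"]))  # -> "spam" on this toy corpus
```

Note that a weight of zero silences an irrelevant term entirely, while a large weight lets a discriminative term dominate; unweighted Naive Bayes is the special case where every weight is 1.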
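Contribution (3) builds on FCBF, which ranks features by symmetric uncertainty with the class and then drops any feature that is more correlated with a stronger, already-selected feature than with the class itself. The following is a minimal sketch of that baseline idea only; the thesis's IFSC-FCBF improvements to the correlation calculation and step ordering are not reproduced here, and the toy feature table is an illustrative assumption.

```python
import math
from collections import Counter

def entropy(xs):
    n = len(xs)
    return -sum((c / n) * math.log2(c / n) for c in Counter(xs).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    # IG(X; Y) = H(X) - H(X | Y)
    n = len(x)
    h_x_given_y = 0.0
    for yv, cnt in Counter(y).items():
        subset = [xi for xi, yi in zip(x, y) if yi == yv]
        h_x_given_y += (cnt / n) * entropy(subset)
    return 2 * (hx - h_x_given_y) / (hx + hy)

def fcbf(features, labels, delta=0.0):
    """Simplified FCBF: keep features with SU(f, class) > delta, then
    drop f_j when a stronger, already-kept f_i satisfies
    SU(f_i, f_j) >= SU(f_j, class)  (f_j is redundant given f_i)."""
    relevant = [(name, symmetric_uncertainty(col, labels))
                for name, col in features.items()
                if symmetric_uncertainty(col, labels) > delta]
    relevant.sort(key=lambda kv: kv[1], reverse=True)
    selected = []
    for name, su_class in relevant:
        if all(symmetric_uncertainty(features[prev], features[name]) < su_class
               for prev, _ in selected):
            selected.append((name, su_class))
    return [name for name, _ in selected]

# Binary term-presence features over 4 toy documents.
features = {
    "f1": [1, 1, 0, 0],
    "f2": [1, 1, 0, 0],   # exact copy of f1 -> redundant
    "f3": [1, 0, 1, 0],   # uncorrelated with the label -> irrelevant
}
labels = ["spam", "spam", "ham", "ham"]
print(fcbf(features, labels))  # -> ["f1"]: f2 dropped as redundant, f3 as irrelevant
```

Because redundancy checks only run against features already selected, FCBF scans the ranked list once, which is the source of the speed the abstract emphasizes.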
Keywords/Search Tags: Naive Bayes, text classification, feature weighting, deep weighting, fast correlation-based filter