
Research On Text Classification Algorithm Based On Naive Bayes Method

Posted on: 2021-01-14
Degree: Master
Type: Thesis
Country: China
Candidate: B W Zhao
Full Text: PDF
GTID: 2428330614453844
Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, information technology has developed rapidly and Internet users have entered a new era in which massive data offers an unprecedented experience. Although users can retrieve more information to meet a wider range of needs, advances in science and technology are often accompanied by new problems. Large amounts of raw data are cluttered and unorganized, which is a great inconvenience to users, and text classification technology emerged in response. Text classification automatically assigns texts to categories based on the feature words they contain, and it has been widely used in information retrieval, natural language processing, and other fields. Many methods have been applied to text classification, such as Naive Bayes, KNN, decision trees, and SVM, but choosing an efficient and accurate method that achieves better classification results remains an urgent problem. This thesis focuses on the Naive Bayes algorithm and proposes two improved variants: a weighted Naive Bayes text classification algorithm based on the Poisson distribution, and a Naive Bayes tree text classification algorithm based on feature depth weighting. The main work of this thesis is as follows:

(1) The research background and development status of text classification are introduced, and the definition of text classification is explained. The specific process of text classification is described, along with the principles, advantages, and disadvantages of several classic classifiers.

(2) A weighted Naive Bayes text classification algorithm based on the Poisson distribution is proposed to address the insufficient accuracy of the Naive Bayes algorithm in text classification. A Poisson random variable is first introduced into the derivation of Naive Bayes, and the text feature words are then weighted by their information gain ratio, which weakens the influence of the attribute independence assumption on classification accuracy. Experiments on two classic datasets, 20-newsgroups and the Sogou News dataset, show that the method greatly improves accuracy, recall, and F1 value compared with several other algorithms such as KNN and SVM, while maintaining execution efficiency.

(3) A Naive Bayes text classification algorithm based on feature depth weighting is proposed to further improve the text classification accuracy of Naive Bayes. The method is a hybrid model that combines a decision tree with Naive Bayes and exploits the fact that Naive Bayes performs better on small datasets. First, a decision tree is constructed to partition a large dataset layer by layer. A Naive Bayes model is then built on the small amount of data at each leaf node of the decision tree, and each feature in the Naive Bayes model is weighted according to the depth at which it appears in the tree. This preserves the integrity of the data while weakening the impact of the attribute independence assumption. Experimental results on the 20-newsgroups and Sogou News datasets show that the proposed method greatly improves text classification accuracy compared with several other algorithms such as Naive Bayes, decision tree, and SVM, while its execution time stays at the same level as that of Naive Bayes and decision-tree-based algorithms, which demonstrates the effectiveness of the method.
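The feature-weighting idea in item (2) can be illustrated with a small sketch. The abstract does not give the thesis's Poisson-based derivation or its exact weighting formula, so the code below is an assumption throughout: it uses an ordinary multinomial Naive Bayes with Laplace smoothing (not the Poisson model), and it scales each word's log-likelihood by the information gain ratio of that word's presence or absence. The function names (`gain_ratio_weights`, `train`, `predict`) and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(counts):
    """Shannon entropy in bits of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gain_ratio_weights(docs, labels):
    """Assumed weighting: information gain ratio of each word's presence/absence."""
    h_class = entropy(list(Counter(labels).values()))
    vocab = {w for d in docs for w in d}
    n = len(docs)
    weights = {}
    for w in vocab:
        present = Counter(y for d, y in zip(docs, labels) if w in d)
        absent = Counter(y for d, y in zip(docs, labels) if w not in d)
        n_p, n_a = sum(present.values()), sum(absent.values())
        h_cond = (n_p / n) * entropy(list(present.values())) \
            + (n_a / n) * entropy(list(absent.values()))
        split_info = entropy([n_p, n_a])
        weights[w] = (h_class - h_cond) / split_info if split_info > 0 else 0.0
    return weights


def train(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes with Laplace smoothing plus per-word weights."""
    vocab = {w for d in docs for w in d}
    word_counts = defaultdict(Counter)
    for d, y in zip(docs, labels):
        word_counts[y].update(d)
    model = {"vocab": vocab, "weights": gain_ratio_weights(docs, labels),
             "log_prior": {}, "log_like": {}}
    for y, n_y in Counter(labels).items():
        model["log_prior"][y] = math.log(n_y / len(labels))
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        model["log_like"][y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, doc):
    """Score each class as log prior + sum of weight * log-likelihood per known word."""
    scores = {}
    for y, log_prior in model["log_prior"].items():
        score = log_prior
        for w in doc:
            if w in model["vocab"]:
                score += model["weights"][w] * model["log_like"][y][w]
        scores[y] = score
    return max(scores, key=scores.get)


# Toy corpus (hypothetical), already tokenized.
docs = [["goal", "match", "team"], ["team", "score", "goal"],
        ["stock", "market", "bank"], ["bank", "loan", "market"]]
labels = ["sports", "sports", "finance", "finance"]
model = train(docs, labels)
print(predict(model, ["goal", "team"]))
print(predict(model, ["bank", "market"]))
```

In this sketch, words whose presence separates the classes well receive weights near 1 and dominate the score, while weakly informative words are damped. This is one plausible reading of how weighting can soften the attribute independence assumption, not a reproduction of the thesis's method.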
Keywords/Search Tags:Text Classification, Naive Bayes, Poisson Distribution, Feature Weighting, Decision Tree