Research On News Classification Based On Improved Naive Bayes

Posted on:2021-04-07

Degree:Master

Type:Thesis

Country:China

Candidate:X F Lai

Full Text:PDF

GTID:2370330623481119

Subject:Statistics

Abstract/Summary:

With the rapid development of artificial intelligence and the continuous advance of data mining technology, text classification has become one of the most common application scenarios in natural language processing, and it is widely used in public opinion analysis, machine translation, and chatbots. Many text classification techniques exist, but the Naive Bayes Classifier (NBC) has become one of the most commonly used classification models thanks to its solid mathematical foundation and its simple, efficient performance. The Naive Bayes model classifies well in many fields, but it also has limitations, chiefly the requirement that attributes be conditionally independent of one another, an assumption that is often difficult to satisfy in practice. To address this assumption, researchers have extended the model in four directions (structure extension, feature selection, feature weighting, and combining Naive Bayes with other models) and have achieved good results.

Building on this previous research, this paper uses Principal Component Analysis (PCA) to improve the Naive Bayes classification model. The resulting Naive Bayes classification model based on principal component analysis is referred to as the PCA_WNBC model. The principal components produced by PCA are mutually uncorrelated, which effectively relaxes the conditional independence assumption of Naive Bayes. The variance contribution rate of each principal component is then used as the feature weight of the corresponding attribute, which removes the defect that an attribute carries the same weight for every category (all weights equal to 1).

The paper then applies the PCA_WNBC model to news text classification. A web crawler written in Python collects news from ten categories on the Internet, 1,200 articles per category, for a total of 12,000 news texts used as the training set. Subsets of 3,000, 6,000, 9,000, and 12,000 articles are randomly drawn from the 12,000 articles as the horizontal dimension of the comparison, and NBC, PCA_WNBC, logistic regression, K-nearest neighbors, and support vector machine form the vertical dimension. The classification performance of each model on the different datasets is evaluated on four measures: accuracy, recall, F1 value, and training time. The conclusions are as follows: across the different datasets, the accuracy of the PCA_WNBC model is about 5% higher than that of the NBC model; and as the amount of data increases, the classification performance of the PCA_WNBC model is better than that of logistic regression, K-nearest neighbors, and support vector machine.
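The idea described in the abstract (project the features onto principal components, then weight each component's likelihood by its variance contribution rate) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the Gaussian likelihood, the small Jacobi eigen-solver, the function names `jacobi_eigh`, `project`, `fit_pca_wnb`, and `predict`, and the synthetic three-dimensional data are all hypothetical choices made here for the example. The abstract does not say how the text features are built or whether the weights are rescaled.

```python
import math
import random


def jacobi_eigh(A, sweeps=50):
    """Eigen-decomposition of a symmetric matrix via cyclic Jacobi rotations."""
    d = len(A)
    A = [row[:] for row in A]
    V = [[float(i == j) for j in range(d)] for i in range(d)]
    for _ in range(sweeps):
        off = sum(A[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
        if off < 1e-18:
            break
        for p in range(d - 1):
            for q in range(p + 1, d):
                if A[p][q] == 0.0:
                    continue
                diff = A[q][q] - A[p][p]
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2 * A[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for M in (A, V):  # column update: M <- M J
                    for k in range(d):
                        mp, mq = M[k][p], M[k][q]
                        M[k][p], M[k][q] = c * mp - s * mq, s * mp + c * mq
                ap, aq = A[p][:], A[q][:]  # row update: A <- J^T A
                A[p] = [c * u - s * v for u, v in zip(ap, aq)]
                A[q] = [s * u + c * v for u, v in zip(ap, aq)]
    order = sorted(range(d), key=lambda j: -A[j][j])  # largest variance first
    return [A[j][j] for j in order], [[V[k][j] for k in range(d)] for j in order]


def project(x, mean, comps):
    """Coordinates of x on each principal component."""
    return [sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v)) for v in comps]


def fit_pca_wnb(X, y):
    n, d = len(X), len(X[0])
    mean = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mean[j] for j in range(d)] for r in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    eigvals, comps = jacobi_eigh(cov)
    clipped = [max(v, 0.0) for v in eigvals]
    weights = [v / sum(clipped) for v in clipped]  # variance contribution rates
    Z = [project(r, mean, comps) for r in X]
    stats = {}
    for c in sorted(set(y)):
        Zc = [z for z, label in zip(Z, y) if label == c]
        mu = [sum(z[j] for z in Zc) / len(Zc) for j in range(d)]
        var = [sum((z[j] - mu[j]) ** 2 for z in Zc) / len(Zc) + 1e-9 for j in range(d)]
        stats[c] = (math.log(len(Zc) / n), mu, var)  # log prior, mean, variance
    return {"mean": mean, "comps": comps, "weights": weights, "stats": stats}


def predict(model, x):
    z = project(x, model["mean"], model["comps"])

    def score(c):
        # Assumed Gaussian likelihood; each component's log-likelihood is
        # scaled by its weight instead of the implicit weight 1 of plain NBC.
        log_prior, mu, var = model["stats"][c]
        return log_prior + sum(
            w * (-0.5 * math.log(2 * math.pi * v) - (zj - m) ** 2 / (2 * v))
            for w, zj, m, v in zip(model["weights"], z, mu, var)
        )

    return max(model["stats"], key=score)


# Synthetic two-class data (illustrative only, not the news corpus).
random.seed(0)
X = [[random.gauss(0, 0.5) for _ in range(3)] for _ in range(50)]
X += [[5 + random.gauss(0, 0.5), 5 + random.gauss(0, 0.5), random.gauss(0, 0.5)]
      for _ in range(50)]
y = [0] * 50 + [1] * 50
model = fit_pca_wnb(X, y)
print([round(w, 3) for w in model["weights"]])
print(predict(model, [0.1, -0.2, 0.0]), predict(model, [5.2, 4.9, 0.1]))
```

Because the contribution rates sum to 1, the weighted likelihood term is smaller relative to the class prior than in unweighted Naive Bayes. Whether the thesis rescales the weights to compensate is not stated, so the sketch leaves them unscaled.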

Keywords/Search Tags:

Naive Bayes, Principal Component Analysis, Web Crawler, News Classification

Related items

1	Credit Risk Management Research Of E-business Based On Naive Bayes Model
2	Research And Application Of Several Improved Naive Bayesian Classification Algorithms
3	Improvement And Research Of Naive Bayes Classification Algorithm
4	Research On Feature Weighted Multinomial Naive Bayes Algorithms And Applications
5	Empirical Research On Stock Selection Based On Naive Bayes,Linear Discriminant,Quadratic Discriminant Classification Algorithms
6	Research And Application Of Naive Bayesian Classification Algorithm In Rainfall Prediction
7	Application Of Spatial Weighting And Higher-Order Principal Component Analysis In Multivariate Geoscience Information Synthesis
8	Analysis Of Commodity Reviews Based On Text Mining
9	How To Effectively Use The Principal Component Of Principal Component Analysis
10	Research On FMRI Data Classification Method Based On Statistical Features