Research On Improved Naive Bayes Classification Model For Imbalanced E-commerce Review Text

Posted on: 2020-06-14  Degree: Master  Type: Thesis
Country: China  Candidate: D X Zou  Full Text: PDF
GTID: 2438330590457913  Subject: Statistics
Abstract/Summary:
As Big Data applications mature, text mining has become a prominent topic in data mining. E-commerce review data is a typical example: a large number of orders are placed every day, producing a huge volume of review text to which text mining can be applied. Mining this text can uncover valuable information that helps businesses understand customer needs and helps consumers find the items they prefer. This paper focuses on the sample imbalance problem in e-commerce review data. Its purpose is to optimize the Naïve Bayes algorithm according to the imbalance characteristics of e-commerce product review data, so as to improve the accuracy of text classification. To improve classification accuracy on imbalanced e-commerce review data, the work proceeds along three lines: sample space, model algorithm, and Ensemble Learning. (1) If the sample space of imbalanced data is not modified, classification results tend to favor the categories with more samples. This paper combines a sampling method with a Word Mover's Distance weighting method based on Word2Vec to build a balanced sample structure. (2) Furthermore, it optimizes the Multinomial Naive Bayes algorithm so that the model can make full use of the sample information during training. (3) To improve the predictive ability of the model, Ensemble Learning is considered. On the basis of the optimized Naïve Bayes algorithm proposed in this paper, several weak classifiers are trained iteratively and integrated to form a strong classifier. Compared with a single classifier, the integrated classifier has better generalization ability.
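The effect described in point (1), and the idea of letting sample weights enter Multinomial Naive Bayes training, can be illustrated with a minimal sketch. This is not the thesis's method: the abstract does not give the Word Mover's Distance weighting formula, so the sketch substitutes plain inverse-class-frequency weights. The class `WeightedMultinomialNB`, the helper `balanced_weights`, and the toy reviews are all hypothetical names and data introduced for illustration.

```python
import math
from collections import Counter, defaultdict


def balanced_weights(labels):
    """Assumed stand-in for the thesis's weighting: n / (k * class_count),
    so that every class carries the same total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return [n / (k * counts[y]) for y in labels]


class WeightedMultinomialNB:
    """Multinomial Naive Bayes with per-sample weights and Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, docs, labels, weights=None):
        if weights is None:
            weights = [1.0] * len(docs)
        self.class_weight = defaultdict(float)   # weighted document mass per class
        self.token_weight = defaultdict(lambda: defaultdict(float))
        self.total_tokens = defaultdict(float)   # weighted token mass per class
        self.vocab = set()
        for tokens, y, w in zip(docs, labels, weights):
            self.class_weight[y] += w
            for t in tokens:
                self.token_weight[y][t] += w
                self.total_tokens[y] += w
                self.vocab.add(t)
        return self

    def log_scores(self, tokens):
        total = sum(self.class_weight.values())
        v = len(self.vocab)
        scores = {}
        for y, cw in self.class_weight.items():
            s = math.log(cw / total)             # weighted class prior
            denom = self.total_tokens[y] + self.alpha * v
            for t in tokens:
                if t in self.vocab:              # ignore unseen words
                    s += math.log((self.token_weight[y].get(t, 0.0) + self.alpha) / denom)
            scores[y] = s
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy imbalanced corpus: six positive reviews, one negative review.
docs = [["good", "fast"], ["good", "cheap"], ["good", "nice"],
        ["fast", "nice"], ["good", "ok"], ["ok", "fast"], ["bad", "ok"]]
labels = ["pos"] * 6 + ["neg"]

plain_model = WeightedMultinomialNB().fit(docs, labels)
weighted_model = WeightedMultinomialNB().fit(docs, labels, balanced_weights(labels))

print(plain_model.predict(["bad"]))     # pos: the majority prior dominates
print(weighted_model.predict(["bad"]))  # neg: reweighting removes the bias
```

On this toy corpus the unweighted model labels the single word "bad" as positive because the class prior of 6/7 outweighs the word likelihood, while the reweighted model labels it negative. A WMD-based scheme would replace `balanced_weights` with weights derived from document distances.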
Keywords/Search Tags: Review data, Sentiment classification, Unbalanced data set, Naïve Bayes, Sample weighted, Word Mover's Distance, Ensemble Learning