Research And Application Of Naive Bayesian Classification

Posted on: 2020-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Zhou
GTID: 2428330599453370
Subject: Applied statistics
Abstract/Summary:
This thesis studies Naive Bayesian text classification. It improves the traditional Naive Bayesian method through attribute selection and attribute weighting, which adjust the influence of each attribute in the model. For attribute selection, several measures of the correlation between attributes are studied, and random forests are used to determine the final attribute subset. Because the comment categories are not known in advance, the number of topics must then be determined: the thesis uses the LDA topic model and estimates the optimal number of categories with the maximum likelihood function. Attribute weighting follows. Because each attribute affects different categories differently, a separate weight is assigned to each attribute for each category. Two weighting schemes, TF-IDF and DC-TF-IDF, are compared. On both a film review dataset and a Tmall comment dataset, the accuracy of the TF-IDF weighted Naive Bayesian classifier is one point lower than that of the DC-TF-IDF weighted classifier, so the DC-TF-IDF weighted Naive Bayesian classifier is selected to evaluate the data. Tmall comments on a women's clothing item are classified and predicted, and the score of each category of film reviews, together with an overall (comprehensive) score, is calculated. This classification method sorts comments into reasonable categories and presents their content more clearly and conveniently, making it easier for consumers to browse and consult.
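The abstract does not give the weighting formulas, so the following is only a rough sketch of what attribute-weighted Naive Bayes looks like in code. It assumes a multinomial model in which every term count is scaled by an IDF weight, idf(t) = ln(N / df(t)) + 1, in both training and prediction. The class name `TfidfWeightedNB`, the smoothing constant `alpha`, and the toy clothing-review data are hypothetical illustrations, not the thesis's implementation. The sketch also simplifies in two ways: it uses one weight per term rather than the class-specific weights described above, and it does not implement DC-TF-IDF.

```python
import math
from collections import Counter, defaultdict


class TfidfWeightedNB:
    """Hypothetical sketch: multinomial Naive Bayes where each term count is
    multiplied by the term's IDF weight (one weight per term, shared by all
    classes), during both training and prediction."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)

    def fit(self, docs, labels):
        n_docs = len(docs)
        # Document frequency: number of training documents containing each term.
        df = Counter(term for doc in docs for term in set(doc))
        self.vocab = set(df)
        # IDF with a +1 offset so a term present in every document keeps weight 1.
        self.idf = {t: math.log(n_docs / df[t]) + 1.0 for t in self.vocab}

        self.classes = sorted(set(labels))
        class_counts = Counter(labels)
        self.log_prior = {c: math.log(class_counts[c] / n_docs) for c in self.classes}

        # Accumulate IDF-weighted term mass per class.
        mass = {c: defaultdict(float) for c in self.classes}
        for doc, c in zip(docs, labels):
            for term, count in Counter(doc).items():
                mass[c][term] += count * self.idf[term]

        # Smoothed log P(term | class) over the training vocabulary.
        v = len(self.vocab)
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(mass[c].values()) + self.alpha * v
            self.log_likelihood[c] = {
                t: math.log((mass[c][t] + self.alpha) / denom) for t in self.vocab
            }
        return self

    def decision_scores(self, doc):
        # Terms never seen in training are ignored.
        counts = Counter(t for t in doc if t in self.vocab)
        return {
            c: self.log_prior[c]
            + sum(n * self.idf[t] * self.log_likelihood[c][t] for t, n in counts.items())
            for c in self.classes
        }

    def predict(self, doc):
        scores = self.decision_scores(doc)
        return max(scores, key=scores.get)


# Invented toy data (tokenised clothing comments), for illustration only.
train_docs = [
    "good quality fabric nice".split(),
    "nice fit good price".split(),
    "bad fabric poor quality".split(),
    "poor fit bad smell".split(),
]
train_labels = ["pos", "pos", "neg", "neg"]

model = TfidfWeightedNB().fit(train_docs, train_labels)
print(model.predict("good nice".split()))  # pos
print(model.predict("bad poor".split()))   # neg
```

A class-specific variant would replace `self.idf[t]` in `decision_scores` with a weight looked up per (class, term) pair. Such weights need care, because a weight that shrinks a term's log-probability to zero in one class removes that term's penalty for that class.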
Keywords/Search Tags: Naive Bayes, Attribute selection, Attribute weighting