
Research On Improved Multinomial Naive Bayes Text Classification Algorithms

Posted on: 2019-05-19    Degree: Master    Type: Thesis
Country: China    Candidate: L G Zhang    Full Text: PDF
GTID: 2348330566458595    Subject: Computer Science and Technology
Abstract/Summary:
With the explosive growth of text information on the Internet, automated processing of massive text data has become a challenge. Automatic text classification, which assigns text to predefined classes, helps people retrieve, query, and utilize information. Common text classification algorithms include Bayesian networks, decision trees, support vector machines, and artificial neural networks. Because of its natural expression of uncertain knowledge, rich probabilistic expressiveness, and support for incremental learning from prior knowledge, multinomial naïve Bayes has become one of the most popular of these methods.

The main assumption of multinomial naïve Bayes is that the attribute variables are independent of each other given the class of the document, which is obviously not true in reality. To mitigate this attribute conditional independence assumption, scholars have proposed many improved algorithms along five directions: attribute weighting, attribute selection, instance weighting, instance selection, and structure extension. Existing results show that few attribute weighting and attribute selection methods can significantly improve classification accuracy while maintaining the model's simplicity and low time complexity. In addition, no prior work has discriminatively chosen among different models, i.e., improved multinomial naïve Bayes from the direction of model selection.

In view of these deficiencies, this paper studies improvements to multinomial naïve Bayes in the directions of attribute weighting, attribute selection, and model selection. Specifically, combining the efficiency of attribute weighting filters with the effectiveness of deep attribute weighting, two attribute-weighted improved algorithms are proposed; combining the efficiency of attribute selection filters with the effectiveness of attribute selection wrappers, an attribute-selected multinomial naïve Bayes improved algorithm is proposed; and exploiting the complementary ways in which multinomial naïve Bayes and complement naïve Bayes model the class distribution, a discriminative model selection improved multinomial naïve Bayes algorithm is proposed. In summary, the main contributions and innovations of this paper include:

1) Two attribute-weighted multinomial naïve Bayes text classification algorithms are proposed. One is an information gain ratio-based attribute-weighted multinomial naïve Bayes text classification algorithm; the other is a decision tree-based attribute-weighted multinomial naïve Bayes text classification algorithm. These two algorithms respectively combine the efficiency of information gain ratio attribute weighting and of decision tree attribute weighting with the effectiveness of deep attribute weighting. They not only significantly improve the classification accuracy of multinomial naïve Bayes, but also have lower time complexity than the CFS-based attribute weighting method (the gain-ratio weighting idea is sketched after this list of contributions).

2) An information gain ratio-based attribute-selected multinomial naïve Bayes text classification algorithm is proposed. The algorithm is a hybrid attribute selection method that combines the efficiency of filter-based attribute selection with the effectiveness of wrapper-based attribute selection. Specifically, it first ranks the attributes by information gain ratio and then uses nine runs of 5-fold cross-validation to select a subset of attributes with larger information gain ratios (see the second sketch below). The algorithm thus enjoys both the low time complexity of filter-based attribute selection and the high classification accuracy of wrapper-based attribute selection.
3) A discriminative model selection multinomial naïve Bayes text classification algorithm is proposed. The algorithm is a hybrid that exploits the complementary strengths of multinomial naïve Bayes (MNB) and complement naïve Bayes (CNB). It first trains the two classification models, MNB and CNB, on the entire training dataset and determines, for each training document, which of the two models is more reliable for it. To classify a new document, it uses the nearest-neighbor algorithm to find the closest training document and predicts the test document's class label with that training document's more reliable model (see the third sketch below).
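To make the ideas above concrete, the following minimal Python sketches use scikit-learn and NumPy; they are illustrative reconstructions rather than the thesis's exact algorithms. The first sketch corresponds to contribution 1: an information gain ratio is computed for each word (here from the binarized presence or absence of the word, an assumption made for simplicity) and used as a weight that rescales the document-term count matrix before a standard MultinomialNB model is trained.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB

    def entropy(p):
        """Entropy (in bits) of a discrete distribution given as probabilities."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    def gain_ratio_weights(X, y):
        """Information gain ratio of each word's (binarized) presence w.r.t. the class.
        X is assumed to be a dense document-term count matrix."""
        X = (np.asarray(X) > 0).astype(float)               # term presence/absence
        y = np.asarray(y)
        n, n_features = X.shape
        classes, y_idx = np.unique(y, return_inverse=True)
        h_class = entropy(np.bincount(y_idx) / n)            # H(C)
        weights = np.zeros(n_features)
        for j in range(n_features):
            present = X[:, j] == 1
            p_present = present.mean()
            h_split = entropy(np.array([p_present, 1.0 - p_present]))  # split information
            if h_split == 0:
                continue                                     # word in all or no documents
            h_cond = 0.0                                     # H(C | word present/absent)
            for mask, p in ((present, p_present), (~present, 1.0 - p_present)):
                if p > 0:
                    cond = np.bincount(y_idx[mask], minlength=len(classes)) / mask.sum()
                    h_cond += p * entropy(cond)
            weights[j] = (h_class - h_cond) / h_split        # gain ratio = IG / split info
        return weights

    # Usage (X_train, X_test are dense count matrices):
    # w = gain_ratio_weights(X_train, y_train)
    # clf = MultinomialNB().fit(X_train * w, y_train)
    # pred = clf.predict(X_test * w)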
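The second sketch corresponds to contribution 2 and reuses gain_ratio_weights from the sketch above: attributes are ranked by gain ratio (the filter step), and repeated 5-fold cross-validation of a multinomial naïve Bayes classifier picks how many top-ranked attributes to keep (the wrapper step). The candidate subset sizes are arbitrary placeholders.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

    def select_attributes(X, y, candidate_sizes=(100, 500, 1000, 2000)):
        """Hybrid filter + wrapper selection: rank words by gain ratio, then keep
        the top-k subset that maximizes repeated 5-fold cross-validation accuracy."""
        order = np.argsort(gain_ratio_weights(X, y))[::-1]   # best-ranked words first
        cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=9, random_state=0)
        best_idx, best_score = order, -np.inf
        for k in candidate_sizes:
            idx = order[:k]
            score = cross_val_score(MultinomialNB(), X[:, idx], y, cv=cv).mean()
            if score > best_score:
                best_idx, best_score = idx, score
        return best_idx

    # Usage:
    # keep = select_attributes(X_train, y_train)
    # clf = MultinomialNB().fit(X_train[:, keep], y_train)
    # pred = clf.predict(X_test[:, keep])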
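The third sketch corresponds to contribution 3. The per-document reliability criterion and the distance metric are assumptions made for illustration: here a model is considered more reliable for a training document if it assigns a higher probability to that document's true class, and nearness is measured with cosine distance on the count vectors.

    import numpy as np
    from sklearn.naive_bayes import MultinomialNB, ComplementNB
    from sklearn.neighbors import NearestNeighbors

    def fit_model_selection(X_train, y_train):
        """Train MNB and CNB on the whole training set and record, per training
        document, which model assigns its true class the higher probability."""
        mnb = MultinomialNB().fit(X_train, y_train)
        cnb = ComplementNB().fit(X_train, y_train)
        rows = np.arange(len(y_train))
        cols = np.searchsorted(mnb.classes_, y_train)        # column of the true class
        p_mnb = mnb.predict_proba(X_train)[rows, cols]
        p_cnb = cnb.predict_proba(X_train)[rows, cols]
        prefer_cnb = p_cnb > p_mnb                           # per-document model choice
        nn = NearestNeighbors(n_neighbors=1, metric='cosine').fit(X_train)
        return mnb, cnb, prefer_cnb, nn

    def predict_model_selection(X_test, mnb, cnb, prefer_cnb, nn):
        """Classify each test document with the model preferred by its nearest
        training document."""
        _, neighbor = nn.kneighbors(X_test)
        use_cnb = prefer_cnb[neighbor.ravel()]
        pred = mnb.predict(X_test)
        if use_cnb.any():
            pred[use_cnb] = cnb.predict(X_test[use_cnb])
        return pred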
Keywords/Search Tags: Multinomial Naïve Bayes, Text Classification, Attribute Weighting, Attribute Selection, Model Selection