
Research on Naive Bayes Classifiers and Their Improved Algorithms

Posted on: 2021-02-05  Degree: Master  Type: Thesis
Country: China  Candidate: X L Zhang  Full Text: PDF
GTID: 2518306032466414  Subject: Mathematics, operational research and cybernetics
Abstract/Summary:
With the rapid development of databases in recent decades, the amount of stored information has surged, and these vast collections of data contain much useful knowledge. Data mining technology emerged to extract and exploit this knowledge, and classification is a major branch of data mining research. The naive Bayes classification algorithm is among the most widely applied classifiers and offers high overall efficiency. Its biggest defect, however, is the assumption that attributes are conditionally independent of one another, which harms its classification performance. Aiming at this inherent shortcoming of the conditional independence assumption, this paper combines feature selection and frequent closed itemsets from association rule mining to improve the performance of the naive Bayes classifier and make its classification more accurate. The main work of this paper is as follows:

Firstly, a naive Bayes classification algorithm based on support vector machines is proposed. Because instances often contain a large number of irrelevant features, the recognition ability of the whole learning system is weakened to some extent; it is therefore usually necessary to perform feature selection before learning, retaining only the most relevant features. Based on this, the paper uses a support vector machine for feature selection and, combined with naive Bayes theory, proposes a naive Bayes classification algorithm based on support vector machines. Numerical experiments comparing it with the original naive Bayes algorithm verify that the proposed algorithm performs better.

Secondly, well-known association rule mining algorithms rely on the support-confidence framework to prune patterns with no practical meaning during rule generation. In view of the shortcomings of this framework, the concept of correlation measurement from statistics is introduced and a new interest model is proposed, which empirical analysis shows to be not only feasible but also more efficient. Furthermore, a data set usually produces a large number of frequent itemsets, and the smaller the minimum support, the larger that number becomes. The number of frequent closed itemsets, however, is much smaller, yet they preserve the complete information of the frequent itemsets: all frequent itemsets and their supports can be inferred from the frequent closed itemsets, and all rules generated by frequent itemsets can be derived from the rules generated by the closed ones. This paper therefore uses frequent closed itemsets as the training set, exploiting the dependence among attributes within frequent closed itemsets to reduce the error caused by the conditional independence assumption of the naive Bayes classification algorithm, and proposes a naive Bayes classification algorithm based on frequent closed itemsets. Numerical experiments verify that the performance of the proposed algorithm far exceeds that of the standard naive Bayes classifier.

Finally, the paper summarizes its main research content and puts forward directions for further research.
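The standard naive Bayes classifier that the thesis builds on can be sketched in a few lines. The implementation below is a minimal illustration for categorical features with Laplace smoothing, not the thesis's SVM-based variant; the toy weather data and function names are purely illustrative.

```python
from collections import Counter, defaultdict
import math

def train_nb(X, y):
    """Count class priors and per-class feature-value frequencies."""
    classes = Counter(y)
    cond = defaultdict(Counter)          # (feature index, class) -> value counts
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            cond[(j, yi)][v] += 1
    return classes, cond, len(y)

def predict_nb(x, classes, cond, n):
    """Pick the class maximizing log P(c) + sum_j log P(x_j | c)."""
    best, best_lp = None, -math.inf
    for c, cnt in classes.items():
        lp = math.log(cnt / n)
        for j, v in enumerate(x):
            vc = cond[(j, c)]
            # Laplace smoothing over the values of feature j seen with class c
            lp += math.log((vc[v] + 1) / (cnt + len(vc) + 1))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

# Made-up toy data: (outlook, temperature) -> play
X = [("sunny", "hot"), ("sunny", "cool"), ("rain", "cool"), ("rain", "hot")]
y = ["no", "no", "yes", "yes"]
model = train_nb(X, y)
print(predict_nb(("rain", "cool"), *model))   # -> yes
```

The product of per-feature likelihoods in `predict_nb` is exactly where the conditional independence assumption enters; the thesis's improvements aim to compensate for the error that this factorization introduces.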
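The weakness of the support-confidence framework that motivates a correlation-based interest measure can be shown concretely. The sketch below computes support, confidence, and lift (a standard correlation-style measure used here for illustration; the thesis's own interest model is not reproduced), on made-up basket data where a rule looks acceptable by confidence while its lift reveals a negative correlation.

```python
def rule_metrics(transactions, A, B):
    """Support, confidence, and lift for the rule A -> B over sets of items."""
    n = len(transactions)
    s_a = sum(1 for t in transactions if A <= t) / n
    s_b = sum(1 for t in transactions if B <= t) / n
    s_ab = sum(1 for t in transactions if (A | B) <= t) / n
    confidence = s_ab / s_a
    lift = s_ab / (s_a * s_b)      # lift < 1 signals negative correlation
    return s_ab, confidence, lift

# Made-up baskets: milk and bread each appear in 3 of 4 baskets
baskets = [{"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread"}]
support, conf, lift = rule_metrics(baskets, {"milk"}, {"bread"})
print(round(conf, 3), round(lift, 3))   # -> 0.667 0.889
```

The rule milk → bread passes a confidence threshold of, say, 0.6, yet its lift is below 1, meaning buying milk actually makes bread slightly less likely; a pure support-confidence filter would keep this misleading rule.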
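The compression offered by frequent closed itemsets can also be illustrated. The brute-force sketch below (illustrative only, not the thesis's mining algorithm) enumerates frequent itemsets on a tiny transaction set and keeps the closed ones, i.e. those with no proper superset of exactly the same support.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration; fine for tiny examples only."""
    items = sorted({i for t in transactions for i in t})
    freq = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):
            s = sum(1 for t in transactions if set(cand) <= t)
            if s >= min_support:
                freq[frozenset(cand)] = s
    return freq

def closed_itemsets(freq):
    """Closed = no proper superset with exactly the same support."""
    return {s: c for s, c in freq.items()
            if not any(s < t and c == ct for t, ct in freq.items())}

baskets = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}]
freq = frequent_itemsets(baskets, min_support=2)
closed = closed_itemsets(freq)
print(len(freq), len(closed))   # -> 5 3
```

Even on three transactions, five frequent itemsets collapse to three closed ones ({a}, {a,b}, {a,c}), and every frequent itemset's support is recoverable as the support of its smallest closed superset; this lossless compression is what makes frequent closed itemsets an attractive training set for the classifier.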
Keywords/Search Tags:Data mining, association rules, feature selection, naive Bayes, frequent itemsets, frequent closed itemsets, support-confidence framework