
Research And Application On Naive Bayes Classification Algorithm

Posted on: 2015-05-19
Degree: Master
Type: Thesis
Country: China
Candidate: M A
GTID: 2298330467485813
Subject: Computer application technology
Abstract/Summary:
In the information age, data has become a valuable asset, and both the scientific and industrial communities are greatly concerned with how to extract valuable information and knowledge from data efficiently and accurately. In data mining, Bayesian classification is one of the most important families of classification methods, and Naive Bayes is a simple Bayesian classification method. Compared with other methods, it has a solid theoretical foundation and is simple, fast, and stable, so it has been widely used. However, the Naive Bayes model assumes that the attributes are conditionally independent of one another given the class. In practical applications this assumption often does not hold, which limits the accuracy of Naive Bayes. Many researchers have therefore studied ways to relax the attribute-independence assumption in order to improve the performance of the Naive Bayes classifier, and attribute weighting is one effective approach.

With the rapid development of information technology, the amount of information is growing exponentially, and it carries considerable commercial value. Massive data processing and large-scale computation are common problems in data mining. Data mining was initially applied to small-scale, well-structured data, but as data volumes grow, traditional data mining algorithms are no longer adequate. Cloud computing is effective for processing large-scale data and performing large-scale computation. If traditional data mining algorithms can be parallelized and deployed on a cloud computing platform, the problem of mining massive data efficiently can be addressed.
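As an illustration of the attribute-weighting idea described above, the following Python sketch implements a categorical Naive Bayes classifier in which each attribute's conditional probability is raised to a weight w_i, so the predicted class is argmax_c P(c) · ∏_i P(a_i | c)^{w_i}, computed in log space. Setting every w_i = 1 recovers standard Naive Bayes. The function names, the Laplace smoothing, and the toy data are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict


def train_nb(X, y, alpha=1.0):
    """Count class frequencies and per-class attribute-value frequencies."""
    n_attrs = len(X[0])
    class_counts = Counter(y)
    # value_counts[c][i][v]: number of class-c rows whose attribute i equals v
    value_counts = defaultdict(lambda: [Counter() for _ in range(n_attrs)])
    # distinct values seen for each attribute (used for Laplace smoothing)
    values = [set() for _ in range(n_attrs)]
    for row, c in zip(X, y):
        for i, v in enumerate(row):
            value_counts[c][i][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values, alpha


def predict(model, row, weights=None):
    """Return argmax_c log P(c) + sum_i w_i * log P(a_i | c); all w_i = 1 is plain NB."""
    class_counts, value_counts, values, alpha = model
    n = sum(class_counts.values())
    if weights is None:
        weights = [1.0] * len(row)
    best, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)
        for i, v in enumerate(row):
            # Laplace-smoothed estimate of P(a_i = v | c)
            p = (value_counts[c][i][v] + alpha) / (nc + alpha * len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy data (hypothetical): attribute 0 = outlook, attribute 1 = temperature
X = [("sunny", "hot"), ("sunny", "mild"), ("rain", "mild"),
     ("rain", "cool"), ("overcast", "hot")]
y = ["no", "no", "yes", "yes", "yes"]
model = train_nb(X, y)
print(predict(model, ("sunny", "hot")))                      # → no
print(predict(model, ("sunny", "hot"), weights=[0.0, 1.0]))  # → yes
```

With all weights equal to 1 the sunny/hot example is assigned "no"; setting the outlook weight to 0 removes that attribute's influence and the prediction flips to "yes", which shows how the weights reshape the independence-based product.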
When a cloud computing platform is used, the key issue is whether the data mining algorithm can be parallelized appropriately.

In this thesis, we introduce the theory of Naive Bayesian classification, analyze a number of improved Bayesian classification algorithms, and focus on the impact of attribute weighting on classification results. On this basis, we propose an attribute-weighted Naive Bayesian classification algorithm based on differential evolution, in which the differential evolution algorithm is used to optimize the attribute weights. Experiments show that the algorithm improves classification accuracy.

We then introduce the Hadoop-based cloud computing platform and the MapReduce programming model, and analyze in detail how the Naive Bayesian classification algorithm can be parallelized. We propose and implement a parallel Gaussian Naive Bayesian classification algorithm on the Hadoop distributed platform to handle large-scale continuous data. Experiments show that the algorithm not only improves classification accuracy but also speeds up data processing.
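The differential-evolution weighting described above can be sketched as follows, assuming the classic DE/rand/1/bin scheme, weights restricted to [0, 1], and training-set classification accuracy as the fitness function. The abstract does not specify the mutation strategy, the parameter values (F, CR, population size), or the fitness used in the thesis; all of these, and the helper names, are hypothetical.

```python
import math
import random
from collections import Counter


def fit_log_tables(X, y, alpha=1.0):
    """Precompute log P(c) and Laplace-smoothed log P(a_i = v | c)."""
    classes = sorted(set(y))
    n_attrs = len(X[0])
    values = [sorted({row[i] for row in X}) for i in range(n_attrs)]
    log_prior = {c: math.log(y.count(c) / len(y)) for c in classes}
    log_cond = {}
    for c in classes:
        rows = [row for row, label in zip(X, y) if label == c]
        for i in range(n_attrs):
            counts = Counter(row[i] for row in rows)
            for v in values[i]:
                log_cond[(c, i, v)] = math.log(
                    (counts[v] + alpha) / (len(rows) + alpha * len(values[i])))
    return classes, log_prior, log_cond


def accuracy(weights, tables, X, y):
    """Fitness: fraction of rows the attribute-weighted NB classifies correctly."""
    classes, log_prior, log_cond = tables
    correct = 0
    for row, label in zip(X, y):
        scores = {c: log_prior[c] + sum(w * log_cond[(c, i, v)]
                                        for i, (w, v) in enumerate(zip(weights, row)))
                  for c in classes}
        correct += max(scores, key=scores.get) == label
    return correct / len(y)


def de_optimize(fitness, dim, pop_size=20, F=0.5, CR=0.9, generations=50, seed=0):
    """DE/rand/1/bin search for a weight vector in [0, 1]^dim that maximizes fitness."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    pop[0] = [1.0] * dim  # include unweighted NB as one starting individual
    scores = [fitness(ind) for ind in pop]
    for _ in range(generations):
        for j in range(pop_size):
            # mutation: three distinct individuals other than the target j
            a, b, c = rng.sample([k for k in range(pop_size) if k != j], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for d in range(dim):
                # binomial crossover between the mutant and the target vector
                if rng.random() < CR or d == j_rand:
                    v = pop[a][d] + F * (pop[b][d] - pop[c][d])
                    trial.append(min(1.0, max(0.0, v)))  # clip to [0, 1]
                else:
                    trial.append(pop[j][d])
            # greedy selection: keep the trial if it is at least as fit
            s = fitness(trial)
            if s >= scores[j]:
                pop[j], scores[j] = trial, s
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


# Toy data (hypothetical); the fitness here is training-set accuracy
X = [("a", "x", "m"), ("a", "y", "n"), ("b", "x", "n"),
     ("b", "y", "m"), ("a", "x", "n"), ("b", "y", "n")]
y = ["p", "p", "q", "q", "p", "q"]
tables = fit_log_tables(X, y)
weights, acc = de_optimize(lambda w: accuracy(w, tables, X, y), dim=len(X[0]))
print([round(w, 3) for w in weights], acc)
```

Seeding the population with the all-ones vector means the returned weights are never worse on the fitness data than unweighted Naive Bayes, because DE's greedy selection never lowers an individual's fitness.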
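One common way to parallelize Gaussian Naive Bayes, assumed here because the abstract does not detail the thesis's design, is to have map tasks emit per-class counts, sums, and sums of squares for each feature and to have reducers add them up; the class priors, means, and variances then follow from those totals. The sketch below simulates that data flow locally in plain Python rather than calling the Hadoop API, and all names and data are hypothetical.

```python
import math
from collections import defaultdict
from functools import reduce


def mapper(record):
    """Map: emit (class label, partial statistics) for one (features, label) record."""
    features, label = record
    yield label, (1, list(features), [x * x for x in features])


def reducer(a, b):
    """Reduce: merge two partial statistics (count, per-feature sums, sums of squares)."""
    return (a[0] + b[0],
            [s + t for s, t in zip(a[1], b[1])],
            [s + t for s, t in zip(a[2], b[2])])


def run_mapreduce(splits):
    """Simulate one MapReduce job locally; each split stands in for one map task."""
    grouped = defaultdict(list)
    for split in splits:
        for record in split:
            for key, value in mapper(record):
                grouped[key].append(value)  # shuffle: group values by class label
    return {label: reduce(reducer, vals) for label, vals in grouped.items()}


def build_model(stats, eps=1e-9):
    """Turn per-class sums into log priors, means and variances."""
    total = sum(n for n, _, _ in stats.values())
    model = {}
    for label, (n, sums, sqsums) in stats.items():
        means = [s / n for s in sums]
        # E[x^2] - E[x]^2 can dip slightly below 0 from rounding; clamp, then add eps
        variances = [max(sq / n - m * m, 0.0) + eps for sq, m in zip(sqsums, means)]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, features):
    """Return the class with the largest Gaussian NB log posterior (up to a constant)."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            for x, m, v in zip(features, means, variances))
    return max(model, key=log_posterior)


# Two hypothetical input splits of (features, label) records
splits = [
    [((1.0, 2.0), "A"), ((1.2, 1.8), "A")],
    [((5.0, 6.0), "B"), ((5.5, 6.2), "B"), ((0.9, 2.1), "A")],
]
model = build_model(run_mapreduce(splits))
print(predict(model, (1.1, 2.0)))  # → A
```

Because the reducer only adds numbers, it is associative and commutative, so on Hadoop the same function could also serve as a combiner to shrink the data shuffled between map and reduce tasks.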
Keywords/Search Tags: Naive Bayes, Data Mining, Differential Evolution, Cloud Computing, Hadoop Platform, MapReduce Model