Research And Application On Naive Bayes Classification Algorithm

Posted on:2015-05-19

Degree:Master

Type:Thesis

Country:China

Candidate:M A

Full Text:PDF

GTID:2298330467485813

Subject:Computer application technology

Abstract/Summary:

In the information age, data has become a valuable asset. The scientific and industrial communities have long been concerned with how to extract valuable information and knowledge from data efficiently and accurately. In data mining research, Bayesian classification is one of the most important classification methods, and Naive Bayes is a simple form of it. It has a solid theoretical foundation and, compared with other methods, is simple, fast, and stable, so it has been widely used. However, the Naive Bayes model assumes that the attributes are conditionally independent of each other given the class. In practical applications this assumption often does not hold, which limits the quality of Naive Bayes results. For this reason, many researchers have studied ways to relax the attribute independence assumption in order to improve the performance of the Naive Bayes classifier; attribute weighting is one effective approach.

With the rapid development of information technology, the amount of information is growing exponentially, and it holds considerable commercial value. Massive data processing and large-scale computation are common problems in data mining. Data mining was initially applied to small-scale, well-structured data, but as data volumes grow, traditional data mining algorithms are no longer adequate. Cloud computing is effective for large-scale data and large-scale computation. If traditional data mining algorithms can be parallelized and deployed on a cloud computing platform, the problem of mining data effectively can be solved. Whether the data mining algorithm can be parallelized appropriately is the key to solving this problem on a cloud computing platform.

In this thesis, we introduce the theory of Naive Bayes classification, analyze a number of improved Bayesian classification algorithms, and focus on the impact of attribute weighting on classification results. Accordingly, we propose an attribute-weighted Naive Bayes classification algorithm based on differential evolution, in which the differential evolution algorithm is used to optimize the attribute weights. Experiments show that the algorithm improves classification accuracy.

We then introduce the Hadoop-based cloud computing platform and the MapReduce programming model, and analyze in detail how the Naive Bayes classification algorithm can be parallelized. We propose and implement a parallel Gaussian Naive Bayes classification algorithm on the Hadoop distributed platform to handle large-scale continuous data. Experiments show that the algorithm not only improves classification accuracy but also speeds up data processing.
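
The abstract does not give the thesis's formulas or code, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It uses the common attribute-weighted decision rule, log P(c) + sum_i w_i * log P(x_i | c), with Gaussian likelihoods, and computes the per-class statistics in a map step and a reduce step written in plain Python rather than on Hadoop. The function names (`map_partition`, `merge_stats`, `fit`, `predict`) are hypothetical, and combining weighting with the parallel Gaussian model in one example is a choice made here for brevity, not something the abstract states. The weights are passed in by hand; the differential evolution search that the thesis uses to tune them is not shown.

```python
import math
from functools import reduce


def map_partition(rows):
    """Map step: per-class count, feature sums and sums of squares for one partition."""
    stats = {}
    for features, label in rows:
        zeros = [0.0] * len(features)
        n, s, sq = stats.get(label, (0, zeros, zeros))
        stats[label] = (
            n + 1,
            [a + x for a, x in zip(s, features)],
            [a + x * x for a, x in zip(sq, features)],
        )
    return stats


def merge_stats(left, right):
    """Reduce step: add the statistics of two partitions class by class."""
    merged = dict(left)
    for label, (n, s, sq) in right.items():
        if label in merged:
            m, t, tq = merged[label]
            n = m + n
            s = [a + b for a, b in zip(t, s)]
            sq = [a + b for a, b in zip(tq, sq)]
        merged[label] = (n, s, sq)
    return merged


def fit(partitions, min_var=1e-9):
    """Turn merged statistics into log priors, means and variances per class."""
    stats = reduce(merge_stats, (map_partition(p) for p in partitions), {})
    total = sum(n for n, _, _ in stats.values())
    model = {}
    for label, (n, s, sq) in stats.items():
        means = [a / n for a in s]
        # Population variance E[x^2] - E[x]^2, floored to avoid division by zero.
        variances = [max(q / n - m * m, min_var) for q, m in zip(sq, means)]
        model[label] = (math.log(n / total), means, variances)
    return model


def predict(model, features, weights=None):
    """Return the class with the highest weighted log posterior score."""
    if weights is None:
        weights = [1.0] * len(features)  # all ones gives plain Gaussian Naive Bayes

    def score(label):
        log_prior, means, variances = model[label]
        total = log_prior
        for x, m, v, w in zip(features, means, variances, weights):
            log_density = -0.5 * math.log(2 * math.pi * v) - (x - m) ** 2 / (2 * v)
            total += w * log_density
        return total

    return max(model, key=score)


# Toy data split into two partitions, standing in for input splits.
partitions = [
    [([1.0, 10.0], "a"), ([1.2, 11.0], "a")],
    [([5.0, 2.0], "b"), ([5.5, 2.5], "b"), ([0.9, 9.5], "a")],
]
model = fit(partitions)
```

Because counts, sums, and sums of squares are additive, each partition can be summarized independently and the summaries merged in any order, which is what makes Gaussian Naive Bayes training fit the MapReduce model. Calling `predict(model, [1.1, 10.0], weights=[0.5, 1.5])` shows where tuned attribute weights would enter.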

Keywords/Search Tags:

Naive Bayes, Data Mining, Differential Evolution, Cloud Computing, Hadoop Platform, MapReduce Model

Related items

1	Research On Algorithms For Naive Bayes Classification And Its Tools Based On Hadoop
2	Research Of Data Mining Classification Algorithm Based On Cloud Computing And The Solar Wind Data
3	Based The Hadoop Platform Job Scheduling Algorithm
4	Data Mining Based On Hadoop Platform
5	The Research Of Mapreduce Implementing Of Text Classification Algorithm Based On Mass Data
6	Research On Massive Data Mining Algorithm Based On Cloud Computing Cotton Storage
7	Research On Massive Digital Image Data Mining Based On Hadoop Cloud Platform
8	The Research Of MapReduce Job Scheduling Algorithm Based On The Hadoop Platform
9	Research Of Meteorological Data Mining Based On Cloud Computing And Bayes
10	Research On Classification Algorithm Of Massive Data Based On Cloud Computing