A Study Of Weighted Naive Bayes Classification Based On Particle Swarm Optimization

Posted on:2012-07-07

Degree:Master

Type:Thesis

Country:China

Candidate:J Lin

Full Text:PDF

GTID:2208330338455287

Subject:Computer application technology

Abstract/Summary:


Data mining is a young and vibrant area of research, and classification is one of its main topics. Naive Bayesian classification, one of the Bayesian classification methods, is widely used and is comparable to decision tree and neural network classification. It has also shown high accuracy and speed in large database applications. However, naive Bayes relies on the assumption that the effect of each attribute value on a given class is independent of the values of the other attributes. This class conditional independence assumption limits the accuracy of naive Bayesian classification, because in practice some attributes are related to one another and violate it. Weighted naive Bayesian classification extends naive Bayesian classification by assigning each attribute its own weight, which relaxes the class conditional independence assumption and brings the model closer to real data. The key issue in weighted naive Bayesian classification is how to determine the weights. This thesis therefore proposes Weighted Naive Bayes based on Particle Swarm Optimization (WNB-PSO), which uses the particle swarm optimization algorithm to search for the weights automatically, and shows through experiments that the algorithm improves the accuracy of naive Bayesian classification.

Classification is the process of constructing a model, or classifier. It can be divided into two steps: learning and classification. First, the data is randomly divided into training data and test data. The training data is used in the learning stage, where particle swarm optimization determines the weights; the test data is used in the classification stage to measure the classification accuracy of the algorithm. Accuracy serves both as the fitness function of the particle swarm optimization and as the performance measure of the classifier in the classification stage.
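As an illustration of how attribute weights can enter a naive Bayes classifier, the following is a minimal Python sketch. It assumes the common formulation in which each conditional probability P(x_i | c) is raised to the power of its weight w_i (a multiplication in log space); the abstract does not state the exact formula used in the thesis. The function names `train_counts` and `predict`, the restriction to categorical attributes, and the Laplace smoothing are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict

def train_counts(rows, labels):
    """Collect class counts and per-attribute value counts from categorical data."""
    class_counts = Counter(labels)
    # value_counts[(attribute index, class)][value] -> number of occurrences
    value_counts = defaultdict(Counter)
    # domains[attribute index] -> set of values seen for that attribute
    domains = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            domains[i].add(value)
    return class_counts, value_counts, domains

def predict(row, weights, model):
    """Return the class with the largest weighted log-posterior score."""
    class_counts, value_counts, domains = model
    total = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, n_c in class_counts.items():
        score = math.log(n_c / total)  # log prior of class c
        for i, value in enumerate(row):
            # Laplace-smoothed estimate of P(x_i | c) (smoothing is an assumption)
            p = (value_counts[(i, c)][value] + 1) / (n_c + len(domains[i]))
            # Assumed weighting scheme: the weight acts as an exponent on P(x_i | c)
            score += weights[i] * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class
```

With all weights equal to 1 this reduces to ordinary naive Bayes; a weight of 0 removes an attribute from the decision entirely.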
In the learning stage (also called the training stage), the posterior probability of each class is calculated with Bayes' theorem for a tuple whose class is unknown, and the tuple is assigned to the class with the largest posterior probability. The weighted naive Bayes algorithm adds weights to the posterior probability formula, so that each attribute's probability contributes in a different proportion. To determine the weights, this thesis uses the particle swarm optimization algorithm to search the training data for the best weights, with accuracy as the fitness function. Initially, the swarm size and the number of iterations are set, and each particle is given a random position, that is, an arbitrary set of weights. In each iteration, if the fitness of a particle's current position is better than its previous best fitness, the particle's best position is updated to the current position; the particle's velocity is updated as well. The search ends, and the weights are taken as found, when the number of iterations or the threshold reaches its set value. In the classification stage, the weights found are used to construct the classifier, which is then tested on the test data: the numbers of correctly and incorrectly classified tuples are counted to obtain the classification accuracy, which is finally compared with the accuracy of the naive Bayes classifier. The experimental data come from the UCI data sets.

The main contents of this thesis are as follows:
1. The development of data mining is reviewed, and the data mining process, the types of data to be mined, data preprocessing, and several major classification algorithms are described.
2. Bayesian classification is studied systematically: relevant background in probability theory and Bayes' theorem are introduced first, followed by Bayesian classification algorithms, Bayesian networks, and weighted naive Bayesian classification.
3. The particle swarm optimization algorithm is described, including the basic algorithm and improved versions with an inertia weight and with a constriction factor, together with an algebraic analysis and an analytical analysis.
4. The weighted naive Bayesian classification algorithm based on PSO is proposed and analyzed experimentally.

The original contributions of this thesis are as follows:
1. A general process of Bayesian classification is established, from data collection to classification prediction, and each step is described in detail.
2. The weighted naive Bayesian classification algorithm based on PSO is proposed.

Experiments show that the weighted naive Bayesian classification based on particle swarm optimization outperforms naive Bayesian classification in accuracy and can be used on most of the data sets.
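The weight search outlined in the abstract can be sketched in Python as a particle swarm with an inertia weight. This is only an illustration under stated assumptions, not the thesis's implementation: the function name `pso_maximize`, the parameter values (`inertia=0.7`, `c1=c2=1.5`), the restriction of weights to the range [0, 1], and the fixed iteration count as the only stopping rule are all chosen for this example.

```python
import random

def pso_maximize(fitness, dim, n_particles=20, n_iters=100,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search [0, 1]^dim for a position that maximizes fitness(position)."""
    rng = random.Random(seed)
    # Random initial positions (arbitrary weights) and zero initial velocities
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position and fitness
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    # The swarm remembers the best position found by any particle
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iters):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity update: inertia + pull toward personal and global bests
                vel[k][d] = (inertia * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                # Position update, clamped to the assumed weight range [0, 1]
                pos[k][d] = min(1.0, max(0.0, pos[k][d] + vel[k][d]))
            fit = fitness(pos[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest, gbest_fit
```

In the WNB-PSO setting, `dim` would be the number of attributes and `fitness` would return the training-set accuracy of the weighted naive Bayes classifier for a given weight vector; any callable that maps a list of weights to a number can be used to try the sketch.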

Keywords/Search Tags:

Data Mining, Classification, Bayes, Weighted, Particle Swarm Optimization


Related items

1	Data Mining Method Research Based On Rough Set And Particle Swarm Optimization
2	Adaptive Weighted KNN Text Classification
3	Prediction Of Protein Contact Map Based On Weighted Naive Bayes Classifier And Extreme Random Tree
4	Construction And Application Of Weighted Bayesian Model
5	Classification Rule Data Mining Based On PSO
6	The Research Of Web Data Mining Based On Particle Swarm Optimization (PSO)
7	The Particle Swarm Optimization And Its Application
8	The Research Of Particle Swarm Optimization Based On Classification
9	Clustering Algorithm Analysis Of Data Stream Mining Based On Particle Swarm Optimization
10	Research On Attribute Reduction Method Based On Particle Swarm Optimization