
Study on Naive Bayesian Algorithm Based on Attribute Weighting and Reduction

Posted on: 2014-04-19 | Degree: Master | Type: Thesis
Country: China | Candidate: Z Q Yang
GTID: 2268330401485901 | Subject: Computer application technology
Abstract/Summary:
Data mining is the complex process of extracting knowledge of interest from large amounts of data. It spans numerous fields, among which data classification is one of the most important. Classification is a data analysis method whose main function is to use a classification function or classification model to assign each data item in a database to one of a set of given classes. Many classification algorithms exist; among them, the Bayesian algorithm is built on Bayes' theorem and therefore has a solid mathematical foundation. A Bayesian classifier starts from prior probabilities and, through a series of calculations, obtains posterior probabilities (by Bayes' theorem, P(c | x) is proportional to P(c) P(x | c)). The method is simple and easy to understand, so Bayesian algorithms have been studied further and applied in many fields. Bayesian algorithms can be divided into naive Bayesian algorithms and Bayesian networks. This thesis mainly studies a naive Bayesian algorithm based on attribute weighting and reduction. The main work and results are as follows:

(1) Study of a naive Bayesian algorithm based on attribute weighting. The naive Bayesian algorithm is simple and efficient, but it completely ignores dependencies between attributes, which seriously degrades classification when those dependencies are strong. To relax the naive Bayesian assumption that attributes are mutually independent, a method based on attribute weighting is proposed. It determines the weight coefficients by combining covariance theory with chi-square goodness-of-fit statistics.
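The abstract does not give the thesis's exact weighted formula. A common formulation of attribute-weighted naive Bayes, assumed here, raises each conditional probability to its attribute's weight, so the predicted class is argmax over c of log P(c) + Σ_i w_i log P(a_i | c); with every w_i = 1 it reduces to ordinary naive Bayes. The minimal sketch below assumes discrete attributes, Laplace smoothing, and externally supplied weights (the thesis derives them from covariance and chi-square statistics, which is not reproduced here). The names `train` and `predict` and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    # value_counts[(i, c)][v] = number of class-c rows whose attribute i equals v
    value_counts = defaultdict(Counter)
    # distinct values seen for each attribute (used for Laplace smoothing)
    values = defaultdict(set)
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[(i, c)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def predict(model, row, weights):
    """Return the class maximising log P(c) + sum_i w_i * log P(a_i | c).

    The weights are assumed to be given; how they are computed
    (covariance plus chi-square in the thesis) is out of scope here.
    """
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    best_class, best_score = None, -math.inf
    for c, nc in class_counts.items():
        score = math.log(nc / n)  # log prior P(c)
        for i, v in enumerate(row):
            # Laplace-smoothed conditional probability P(a_i = v | c)
            p = (value_counts[(i, c)][v] + 1) / (nc + len(values[i]))
            score += weights[i] * math.log(p)
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy usage (hypothetical data): two discrete attributes, two classes.
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train(rows, labels)
print(predict(model, ("rainy", "mild"), weights=[1.0, 1.0]))  # yes
```

Working in log space avoids underflow when many probabilities are multiplied, and it turns the exponent w_i into a simple multiplier, so a weight below 1 shrinks an attribute's influence and a weight of 0 removes it entirely.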
Covariance theory expresses the correlation between attributes mainly through the covariance of attribute values, while chi-square statistics determine weight coefficients from the frequencies with which attributes appear; the final weight coefficients combine the two. The proposed method therefore considers both attribute values and attribute frequencies, which lets it express the dependencies between attributes well. Three sets of comparison experiments show that the improved algorithm achieves somewhat higher classification accuracy.

(2) Study of a naive Bayesian algorithm based on attribute reduction. In terms of classification performance, the naive Bayesian algorithm is well suited only to discrete data; continuous and high-dimensional data must be preprocessed before they can be classified. Preprocessing includes data discretization, dimension reduction, and so on. To address the naive Bayesian algorithm's poor handling of high-dimensional data, several dimension reduction methods are applied: principal component analysis, information entropy, and independent component analysis. After the data are processed with these methods, the weighted naive Bayesian algorithm is used for classification. Experiments show that the choice of reduction method affects the results: principal component analysis performs best at dimension reduction, while the information entropy method yields slightly lower classification accuracy.
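Information entropy is one of the reduction methods named above, but the abstract does not say how it is applied. One common reading, assumed in this sketch, ranks each discrete attribute by its information gain with respect to the class, IG(C; A) = H(C) − H(C | A), and keeps the top k attributes before classification. The function names and toy data are hypothetical, and this is not PCA or ICA, which the thesis also evaluates.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy H(C), in bits, of a list of class labels."""
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(rows, labels, i):
    """IG(C; A_i) = H(C) - sum_v P(A_i = v) * H(C | A_i = v)."""
    n = len(rows)
    groups = {}
    for row, c in zip(rows, labels):
        groups.setdefault(row[i], []).append(c)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def reduce_attributes(rows, labels, k):
    """Keep the k attributes with the highest information gain.

    Returns the kept attribute indices (in original order) and the
    projected rows; ties are broken by original attribute order.
    """
    m = len(rows[0])
    ranked = sorted(range(m), key=lambda i: information_gain(rows, labels, i),
                    reverse=True)
    keep = sorted(ranked[:k])
    return keep, [tuple(row[i] for i in keep) for row in rows]


# Toy usage (hypothetical data): attribute 0 determines the class,
# attribute 1 is noise, attribute 2 is constant.
rows = [("a", "x", "z"), ("a", "y", "z"), ("b", "x", "z"), ("b", "y", "z")]
labels = ["p", "p", "q", "q"]
keep, reduced = reduce_attributes(rows, labels, k=1)
print(keep)  # [0]
```

The reduced rows can then be passed to a weighted naive Bayes classifier in place of the full attribute set. Because each attribute is scored independently, this ranking shares naive Bayes's blindness to redundancy between attributes, one plausible reason an entropy-based reduction could trail PCA in accuracy.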
Keywords/Search Tags: classification algorithms, naive Bayesian algorithm, attribute weighting, weight coefficient, dimension reduction