Research On Attribute Selection Algorithm Based On Analysis Of Correlation Between Attributes

Posted on:2010-04-20

Degree:Master

Type:Thesis

Country:China

Candidate:J Z Shao

Full Text:PDF

GTID:2178360275473280

Subject:Computer Science and Technology

Abstract/Summary:

Data mining is a technology for extracting latent, useful information from large volumes of day-to-day transactional data. Data mining algorithms often place strict requirements on a data set, such as completeness, little redundancy, and weak correlation between attributes. Day-to-day transactional data, however, may be incomplete, redundant, or ambiguous, so raw data usually has to be preprocessed before data mining algorithms are applied. Attribute selection is an important preprocessing method: it reduces the noise in a data set and makes data mining algorithms more effective.

In this thesis, we first introduce the theory of attribute selection and the basic concepts of information theory. We then analyze in detail the static organizational structure and the dynamic running process of the algorithms in the attribute selection package. Next, we review existing correlation-based evaluation methods and describe in depth a new analysis of redundancy between attributes and the max-relevance, min-redundancy evaluation criterion. Finally, we design two novel attribute selection algorithms based on the analysis of correlation between attributes. The first is an attribute redundancy removal algorithm, which uses decision-independent correlation and decision-dependent correlation to measure, respectively, the relevance between an attribute and the class attribute and the redundancy between one attribute and another. The second is a rank-wrapper algorithm, a two-stage approach: a ranking method first uses the max-relevance, min-redundancy criterion to select several good candidate attribute subsets, and a wrapper method then uses cross-validation to select the best of them. The classification algorithms Naive Bayes and C4.5 are used to evaluate the results of attribute selection.

Experiments show that, on most of the data sets tested, these attribute selection algorithms select attributes effectively while maintaining classification performance.
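
The max-relevance, min-redundancy criterion mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes discrete attributes, uses mutual information for both relevance (attribute versus class) and redundancy (attribute versus already selected attributes), and combines them as a simple difference. The function names and the toy data are hypothetical, and only the ranking stage is shown; the cross-validated wrapper stage is omitted.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y) for two discrete sequences."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def mrmr_rank(columns, labels):
    """Greedily order attributes by relevance minus mean redundancy.

    columns: dict mapping attribute name -> list of discrete values
    labels:  list of class values, same length as every column
    (Assumed scoring rule: I(attr; class) minus the average
    I(attr; s) over the attributes s selected so far.)
    """
    relevance = {a: mutual_information(col, labels) for a, col in columns.items()}
    selected = []
    remaining = set(columns)
    while remaining:
        def score(a):
            if not selected:
                return relevance[a]
            redundancy = sum(
                mutual_information(columns[a], columns[s]) for s in selected
            ) / len(selected)
            return relevance[a] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "b" is a noisy near-copy of "a",
# "c" is weakly relevant but almost independent of "a".
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = {
    "a": [0, 0, 0, 0, 1, 1, 1, 0],
    "b": [0, 0, 0, 1, 1, 1, 1, 0],
    "c": [0, 1, 0, 1, 0, 1, 1, 1],
}

ranking = mrmr_rank(columns, labels)
print(ranking)  # prints ['a', 'c', 'b']
```

Ranked by relevance alone, the order would be a, b, c. Penalizing redundancy moves "b" behind "c", because most of what "b" says about the class is already carried by "a". In a two-stage scheme such as the rank-wrapper approach described above, prefixes of a ranking like this one could serve as the candidate subsets that the wrapper stage compares by cross-validation.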

Keywords/Search Tags:

Data mining, Attribute selection, Information theory, Weka, Correlation

Related items

1	Research On Attribute Selection Algorithm Based On Classification Theory Of Correlation Between Attributes
2	Some Data Mining Algorithms Based On Information Theory
3	Research And Practice Of Network Teaching Data Analysis Based On Weka Platform
4	Improvement And Application Of Naive Bayes Algorithm Based On Attribute Selection Weighting
5	Research On Feature Selection Algorithm Based On Rough Set Theory
6	Research On Mining Technology Application Based On Weka Platform College Enrollment Admission Data
7	Analysis And Research Of Attribute Selection Methods In Data Mining
8	Attribute selection in machine learning based on information theory (Spanish text)
9	The Design And Implementation Of A Wireless Modeling System Based On The WEKA Data Mining
10	Design And Realization For Online Diagnoses System Based On Medical Data Mining