
Research On Attribute Selection Algorithm Based On Classification Theory Of Correlation Between Attributes

Posted on: 2009-04-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y Cao
Full Text: PDF
GTID: 2178360242474641
Subject: Computer application technology
Abstract/Summary:
Data mining is a relatively new discipline that extracts useful information from large volumes of daily transactional data, and it has developed rapidly since the 1990s. Such data may be incomplete, redundant, or indistinct, so raw data usually must be preprocessed before data mining algorithms are applied. Attribute selection is an important preprocessing method: it reduces the dimensionality and noise of a data set and makes data mining algorithms more effective. This thesis introduces the general structure of the open-source data mining platform Weka and analyzes in depth the code organization and execution flow of its attribute selection algorithms. It proposes the concept of a reference distribution law and attributes the correlation between attributes to the difference between the reference distribution law and the distribution law. It summarizes the shortcomings of existing methods for computing correlation between attributes and, based on this new definition of correlation, proposes the α-index and the β-index to measure it. The distributions of these two indices are found to be highly regular and can be used to divide correlations between attributes into four categories. Two attribute selection algorithms are designed in which the criterion for retaining an attribute is the type of correlation between that attribute and the class attribute; the classification algorithms Naive Bayes and C4.5 are used to evaluate the results of attribute selection. Experiments show that, on most of the data sets tested, attribute selection algorithms based on the classification theory of correlations between attributes select attributes effectively while maintaining classification performance.
Keywords/Search Tags:Data mining, Attribute Selection, Distribution Law, Weka, Correlation
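
The abstract does not define the reference distribution law, the α-index, or the β-index, so the following Python sketch only illustrates the general idea of scoring an attribute by how far its joint distribution with the class attribute departs from a reference distribution. It assumes, purely for illustration, that the reference is the product of the two marginal distributions (what would be observed if the attribute and the class were independent), that the departure is measured by total variation distance, and that an attribute is kept when its score exceeds a hypothetical threshold. None of the function names, the score, or the threshold come from the thesis.

```python
from collections import Counter


def joint_distribution(xs, ys):
    """Empirical joint distribution of two equal-length nominal columns."""
    n = len(xs)
    counts = Counter(zip(xs, ys))
    return {pair: c / n for pair, c in counts.items()}


def reference_distribution(xs, ys):
    """ASSUMPTION: the reference is the product of the marginals,
    i.e. the joint distribution expected under independence."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return {(x, y): (px[x] / n) * (py[y] / n) for x in px for y in py}


def deviation(xs, ys):
    """Total variation distance between the observed joint distribution
    and the assumed reference (0 means they coincide)."""
    joint = joint_distribution(xs, ys)
    ref = reference_distribution(xs, ys)
    return 0.5 * sum(abs(joint.get(k, 0.0) - p) for k, p in ref.items())


def select_attributes(columns, labels, threshold=0.1):
    """Keep attributes whose deviation from the reference exceeds a
    hypothetical threshold; `columns` maps attribute name -> values."""
    return [name for name, values in columns.items()
            if deviation(values, labels) > threshold]


if __name__ == "__main__":
    labels = ["yes", "yes", "no", "no"]
    columns = {
        "informative": ["a", "a", "b", "b"],  # determines the class
        "noise": ["a", "b", "a", "b"],        # independent of the class
    }
    print(select_attributes(columns, labels))
```

In this toy data set the attribute that determines the class is retained and the independent one is dropped. The thesis instead decides retention from the category of correlation given by its two indices, and evaluates the result with Naive Bayes and C4.5 in Weka.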