Research On Quantitative Associative Classification Based On Lazy Method

Posted on:2015-01-29

Degree:Master

Type:Thesis

Country:China

Candidate:B F Li

Full Text:PDF

GTID:2268330422971939

Subject:Computer software and theory

Abstract/Summary:


Nowadays, with the rapid development of computer technology, large amounts of valuable information are being stored, and how to uncover and make effective use of the implicit knowledge in this information is an ongoing research question. In data mining, associative classification algorithms are widely used in many fields because of their high classification accuracy and strong adaptability. However, current research methods and models assume discrete data, and how to apply them better to quantitative data remains an open problem.

Currently, quantitative data are classified in two steps: first, the quantitative data are transformed into discrete data; second, a traditional associative classification algorithm performs the classification. This "discretize then learn" approach can cause discrete blindness. For example, when a test record is incomplete, it may fail to match any rule of the classifier, and classification accuracy suffers. On the other hand, an attribute-based projection associative classification algorithm built on the Lazy idea postpones classifier construction to the classification phase: each test record is projected onto the original training set, which yields a small training set without irrelevant attributes. Studies have shown that Lazy associative classification substantially improves on traditional associative classification.

To address the drawbacks of traditional associative classification algorithms, we combine associative classification with the Lazy method and propose a new algorithm, QLAC.
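The attribute-based projection used by Lazy associative classification can be illustrated with a minimal sketch. It assumes training records are already discretized and stored as Python dicts mapping attribute names to values; the names `project`, `Record`, and `Labeled` are hypothetical and do not come from the thesis. Each training record keeps only the (attribute, value) items it shares with the test record, so a test record with missing values still produces a usable training subset.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical types for this sketch only.
Record = Dict[str, Optional[str]]     # attribute -> discretized value (None = missing)
Labeled = Tuple[Dict[str, str], str]  # (training record, class label)


def project(train: List[Labeled], test: Record) -> List[Labeled]:
    """Project the training set onto the items of one test record.

    An item is an (attribute, value) pair. Each training record keeps only the
    items it shares with the test record; records sharing nothing are dropped.
    Missing test values (None) contribute no items, so an incomplete test record
    still yields a small, relevant training set.
    """
    test_items = {(a, v) for a, v in test.items() if v is not None}
    projected: List[Labeled] = []
    for record, label in train:
        shared = {a: v for a, v in record.items() if (a, v) in test_items}
        if shared:  # records with no shared item are irrelevant to this test record
            projected.append((shared, label))
    return projected


train = [
    ({"a": "low", "b": "high"}, "yes"),
    ({"a": "high", "b": "high"}, "no"),
    ({"a": "high", "b": "low"}, "no"),
]
print(project(train, {"a": "low", "b": None}))  # [({'a': 'low'}, 'yes')]
```

Because the rules are mined only after projection, there is no precomputed rule set for an incomplete record to miss; the projected subset simply contains fewer attributes.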
For quantitative data, QLAC first applies the K-nearest-neighbor idea to select the N training records closest to the test record as a new training set. It then uses the K-means clustering algorithm to discretize this new training set together with the test record. Finally, based on the characteristics of the discretized data, it mines association rules from closed frequent itemsets and constructs a classifier to classify the test record. In addition, if all records in the newly selected training set share the same class label, that label is assigned to the test record directly; otherwise, the associative classification steps above are carried out.

To verify the effectiveness of QLAC, we selected 7 quantitative datasets from the UCI repository. Compared with K-NN, our method improves classification accuracy by 1.03%. Compared with traditional associative classification algorithms such as CBA, CPAR, CMAR, and the Lazy method, it improves average classification accuracy by 0.66% to 1.65%. We also compared classifier size with CBA: the classifier built by our algorithm shows a decrease of 39.5. QLAC therefore produces a smaller classifier with more effective class association rules.
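The QLAC classification steps described above (nearest-neighbor selection, the label-consistency shortcut, K-means discretization, and rule mining from closed frequent itemsets) can be sketched as follows. This is a hypothetical, minimal illustration and not the thesis's implementation. The function names `knn_select`, `kmeans_1d`, `discretize`, and `qlac_predict`, the parameters `n`, `k`, `minsup`, and `minconf`, Euclidean distance, per-attribute 1-D k-means, brute-force enumeration of closed frequent itemsets drawn only from the test record's items (instead of a dedicated closed-itemset miner), and prediction by the single highest-confidence rule are all assumptions.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import List, Optional, Sequence

Vector = Sequence[Optional[float]]  # None marks a missing attribute value


def knn_select(X: List[List[float]], y: List[str], x: Vector, n: int):
    """Return the n training rows (and labels) closest to x.

    Euclidean distance is computed only over attributes present in x.
    """
    def dist(row: List[float]) -> float:
        return sqrt(sum((r - v) ** 2 for r, v in zip(row, x) if v is not None))

    order = sorted(range(len(X)), key=lambda i: dist(X[i]))[:n]
    return [X[i] for i in order], [y[i] for i in order]


def kmeans_1d(values: List[float], k: int, iters: int = 20) -> List[float]:
    """Plain 1-D k-means with evenly spread initial centroids; returns sorted centroids."""
    distinct = sorted(set(values))
    k = min(k, len(distinct))
    cents = [distinct[round(j * (len(distinct) - 1) / max(k - 1, 1))] for j in range(k)]
    for _ in range(iters):
        groups: List[List[float]] = [[] for _ in cents]
        for v in values:
            groups[min(range(len(cents)), key=lambda j: abs(v - cents[j]))].append(v)
        cents = [sum(g) / len(g) if g else c for g, c in zip(groups, cents)]
    return sorted(cents)


def discretize(rows: List[Vector], k: int) -> List[List[Optional[int]]]:
    """Replace each value by the index of its nearest centroid, attribute by attribute."""
    out: List[List[Optional[int]]] = [[None] * len(rows[0]) for _ in rows]
    for a in range(len(rows[0])):
        cents = kmeans_1d([r[a] for r in rows if r[a] is not None], k)
        for i, r in enumerate(rows):
            if r[a] is not None:
                out[i][a] = min(range(len(cents)), key=lambda j: abs(r[a] - cents[j]))
    return out


def qlac_predict(X, y, x: Vector, n=5, k=3, minsup=2, minconf=0.6) -> str:
    """Hypothetical QLAC-style prediction; rule selection by best confidence is assumed."""
    nX, ny = knn_select(X, y, x, n)
    if len(set(ny)) == 1:  # consistent class labels: assign directly
        return ny[0]
    disc = discretize(nX + [list(x)], k)  # neighbours and test record together
    rows = [set(enumerate(r)) for r in disc[:-1]]  # items are (attribute, cluster id)
    test_items = sorted((a, v) for a, v in enumerate(disc[-1]) if v is not None)
    # Lazy step: only itemsets made of the test record's items can match it,
    # so enumerate just those (brute force, fine for a handful of attributes).
    support = {}
    for size in range(1, len(test_items) + 1):
        for items in combinations(test_items, size):
            cnt = sum(1 for r in rows if set(items) <= r)
            if cnt >= minsup:
                support[frozenset(items)] = cnt
    # Closed: no proper superset with the same support.
    closed = [s for s in support
              if not any(s < t and support[t] == support[s] for t in support)]
    best = None
    for s in closed:  # rule s -> majority class among records containing s
        c, hits = Counter(lab for r, lab in zip(rows, ny) if s <= r).most_common(1)[0]
        key = (hits / support[s], support[s], len(s))  # confidence, support, length
        if key[0] >= minconf and (best is None or key > best[0]):
            best = (key, c)
    return best[1] if best else Counter(ny).most_common(1)[0][0]
```

Because every candidate itemset is built from the test record's own items, the enumeration stays small when the number of attributes is small; wider data would call for a dedicated closed-itemset mining algorithm instead of brute force.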

Keywords/Search Tags:

Data mining, Associative classification, Lazy, Quantitative data, Discrete blindness


Related items

1	Research On Incremental Associative Classification Algorithm And Multi-Label Classification Approach
2	Associative Classifier For Uncertain Data
3	Data Mining Research And Applications Of The Classification Algorithm
4	Research On An Associative Classification Algorithm To Data With Uncertain Attributes
5	Research And Implement Of A Frequent Pattern List Based Associative Classification Algorithm
6	Associative Classification Based On Hybrid Strategy
7	The Algorithm Research Of Associative Classification Rule Mining And Its Application In Medical Image Data Mining
8	Research On Association Rules Mining And Associative Classification Based On Bit Table
9	Methods For Complex Data Classification And The Application In Personalized Recommender System
10	The Algorithm Research Of Associative Classification And Classification Based On Imbalanced Data