Condition-induced Concept Drift Detection And Optimizing Selection Of Attribute Reducts

Posted on: 2020-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: Y W Ge
Full Text: PDF
GTID: 2428330578961314
Subject: Computer Science and Technology
Abstract/Summary:
Concept drift is a hot topic in data streams and big data, and it is also a common phenomenon in the real world. As one of the challenges in data mining, concept drift has attracted growing attention from scholars. In data stream mining, there is much research on concept drift detection, and most of the related literature studies concept drift caused by changes over time. However, concept drift is caused not only by changes in time, but also by changes in space or conditions. Moreover, there is a lack of research on concept drift between different expressions of the same concept (or the same family of concepts). In real life, human beings always think and reason with concepts, but it is hard for these concepts to express their meanings completely, and even the meanings of the same concept (or the same family of concepts) are not the same when the concept is employed at different times, in different spaces, or under different conditions. So how do we measure the differences and connections between them? And how do we choose a better expression?
This thesis discusses and investigates these two questions. Rough set theory is a mathematical tool for studying imprecise and uncertain knowledge; it can extract meaningful rules from incomplete information. Attribute reduction is one of the core topics of rough set theory. A data set usually has more than one attribute reduct, and each reduct is an expression of either a single concept or the whole data set. Heuristic algorithms are typically used to find one of them, which is then verified experimentally. When there are many attribute reducts, it is hard to distinguish between them, and valid methods for selecting the best one, or a better one, are lacking. To address this problem, this thesis builds a tree structure to express and explain a concept (or a family of concepts). Essential concept drift and quantitative concept drift are defined as criteria for detecting concept drift between concepts expressed as trees. The properties of rough concept trees, and of concept drift for the same concept (or the same family of concepts) expressed with different concept intensions (different conditional attributes), are investigated. Theoretical analysis and examples show that the proposed methods are valid. Experimental results show that essential concept drift and quantitative concept drift are more sensitive than classification accuracy when they are employed as criteria to detect concept drift between heterogeneous data (different conditional attributes). From the viewpoint of epistemology, the results of this thesis can express and explain concept drift phenomena in the real world, and can also explain why correct classification rates differ when different feature selections for the same training data are used to classify the same testing data.
The concept drift metric is then used to solve the optimization problem of attribute reducts. Because a data set has more than one attribute reduct, indexes of concept drift and information loss are employed to compare Pawlak attribute reducts of the same type in a knowledge system. The focus of attribute reducts is presented, and its properties are investigated. Experimental results show that the attribute reduct closest to the focus of attribute reducts achieves better classification accuracy than the other attribute reducts. Indexes of concept drift detection and information loss can distinguish different attribute reducts, and the focus of attribute reducts can be employed to select the best attribute reduct or a better one.
Keywords/Search Tags: Rough Set, Attribute Reducts, Concept Drift, Formal Concept, Focus of Attribute Reducts
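
To make concrete the abstract's point that a single data set can have more than one attribute reduct, here is a minimal sketch that enumerates Pawlak reducts by brute force. It uses the standard definition: a reduct is a minimal set of condition attributes that preserves the positive region of the decision. The toy table `ROWS`, the attribute names `a`, `b`, `c`, `d`, and the helper names `positive_region` and `pawlak_reducts` are all hypothetical illustrations, not taken from the thesis. The thesis's own drift indexes and the focus of attribute reducts are not defined in the abstract, so they are not implemented here.

```python
from itertools import combinations

# Hypothetical toy decision table (not from the thesis).
# Here c = a XOR b, and the decision d is "yes" exactly when c == 1.
ROWS = [
    {"a": 0, "b": 0, "c": 0, "d": "no"},
    {"a": 0, "b": 1, "c": 1, "d": "yes"},
    {"a": 1, "b": 0, "c": 1, "d": "yes"},
    {"a": 1, "b": 1, "c": 0, "d": "no"},
]
CONDITIONS = ("a", "b", "c")
DECISION = "d"


def positive_region(rows, attrs, decision):
    """Return indices of rows whose equivalence class under `attrs`
    carries exactly one decision value."""
    blocks = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        blocks.setdefault(key, []).append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][decision] for i in members}) == 1:
            pos.update(members)
    return pos


def pawlak_reducts(rows, conditions, decision):
    """Enumerate all minimal attribute subsets that preserve the positive
    region of the full condition set (exponential; toy tables only)."""
    full = positive_region(rows, conditions, decision)
    reducts = []
    # Visit subsets by increasing size so any superset of an already
    # found reduct can be skipped as non-minimal.
    for size in range(1, len(conditions) + 1):
        for subset in combinations(conditions, size):
            if any(set(r) <= set(subset) for r in reducts):
                continue
            if positive_region(rows, subset, decision) == full:
                reducts.append(subset)
    return reducts


print(pawlak_reducts(ROWS, CONDITIONS, DECISION))
# prints [('c',), ('a', 'b')]
```

On this toy table both `('c',)` and `('a', 'b')` classify every row consistently, so both are reducts. A criterion for choosing between such alternatives is the selection problem the abstract addresses.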