
Research On Attribute Reduction And Concept Drift With F-rough Sets

Posted on: 2018-11-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y N Li
Full Text: PDF
GTID: 2348330518475040
Subject: Computer Science and Technology
Abstract/Summary:
In industry and everyday life, data are large in volume and complex in type. Data mining techniques are well suited to finding latent, valuable knowledge in imprecise data. However, as computer network technology improves, especially data collection and storage technology, data volumes keep growing. Moreover, data change over time, which gives rise to concept drift. Effective drift detection methods can help people handle uncertainty in data streams and discover latent knowledge.

Rough set theory is a mathematical analysis tool that deals effectively with imprecise, inconsistent, and incomplete information. F-rough sets are a rough set model defined over a family of information systems or a family of decision systems. They extend Pawlak rough sets and are suited to studying parallel computing and dynamically changing phenomena. Rough sets and attribute reduction are the methods most commonly used to study uncertainty problems. Attribute reduction based on rough sets and F-rough sets preserves the classification or decision ability of the decision system while deleting redundant condition attributes. Every principle of attribute reduction aims to preserve some specified criterion. As a result, generalization ability is weak and classification accuracy is low when the data contain exceptions.

Classification accuracy, joint probability distribution, and attribute reduction are all used to detect concept drift. These methods detect concept drift effectively and have been widely used, but they have disadvantages in practice. For example, classification accuracy can detect concept drift as a whole, but for the same classifier and test set, different feature selections give different results.

This thesis presents a reduction method called attribute reducts of various positive regions, and a detection method based on the dependence of conditional attributes and on conditional information entropy. The former allows the positive region to change; the latter combines the testability of classification accuracy with the theoretical analyzability of joint probability. The specific research contents are as follows:

1. Attribute reduction of various positive regions based on rough sets. The proposed method allows the positive region to change while attribute reduction is carried out. The few attributes that hinder generalization are thereby removed, which improves generalization ability and classification accuracy.
2. In contrast to attribute reduction, this thesis summarizes the existing criteria for concept drift detection. Among them, the joint probability criterion is limited to certain concepts and is inflexible.
3. Two criteria for concept drift detection are presented, based on the dependence of conditional attributes and on conditional information entropy. Concept drift is analyzed from the viewpoint of attribute reduction, and attribute reduction is in turn analyzed in depth from the viewpoint of concept drift. This thesis investigates the differences and relations between concept drift and attribute reduction in order to explore what the two reveal about the nature of uncertainty problems.
4. Experimental results show that the dependence of conditional attributes and conditional information entropy are valid and efficient for detecting concept drift. The two classical criteria for concept drift (classification accuracy and joint probability distribution) are analyzed and compared with the two attribute reduction criteria (dependence of conditional attributes and conditional information entropy).
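The two detection criteria named in the abstract can be illustrated with the standard rough set definitions: the dependence of the decision attribute on the condition attributes, gamma_C(D) = |POS_C(D)| / |U|, and the conditional entropy H(D | C). The abstract does not give the thesis's exact formulas or its decision rule, so the following is only a minimal sketch under assumptions: the function names, the toy data, and the fixed `threshold` comparison between two stream windows are all hypothetical and are not taken from the thesis.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes that agree on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependence(rows, cond, dec):
    """gamma_C(D) = |POS_C(D)| / |U|: the share of objects whose
    condition class carries a single decision value."""
    pos = 0
    for block in partition(rows, cond):
        if len({rows[i][dec] for i in block}) == 1:
            pos += len(block)
    return pos / len(rows)


def conditional_entropy(rows, cond, dec):
    """H(D | C) in bits, estimated from relative frequencies."""
    n = len(rows)
    h = 0.0
    for block in partition(rows, cond):
        counts = Counter(rows[i][dec] for i in block)
        for c in counts.values():
            p = c / len(block)
            h -= (len(block) / n) * p * log2(p)
    return h


def drift_signal(old, new, cond, dec, threshold=0.1):
    """Assumed decision rule (not from the thesis): flag drift when either
    criterion changes by more than `threshold` between two windows."""
    d_gamma = abs(dependence(new, cond, dec) - dependence(old, cond, dec))
    d_h = abs(conditional_entropy(new, cond, dec)
              - conditional_entropy(old, cond, dec))
    return d_gamma > threshold or d_h > threshold


# Hypothetical toy data: two windows of a stream, each a small decision table
# with condition attributes "a", "b" and decision attribute "d".
window_a = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 1, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 1},
]
window_b = [
    {"a": 0, "b": 0, "d": 0},
    {"a": 0, "b": 0, "d": 1},
    {"a": 1, "b": 0, "d": 1},
    {"a": 1, "b": 1, "d": 0},
]

print(dependence(window_a, ["a", "b"], "d"),
      dependence(window_b, ["a", "b"], "d"))
print(conditional_entropy(window_a, ["a", "b"], "d"),
      conditional_entropy(window_b, ["a", "b"], "d"))
print(drift_signal(window_a, window_b, ["a", "b"], "d"))
```

On these toy windows the dependence falls from 1.0 to 0.5 and the conditional entropy rises from 0 to 0.5 bits, so the assumed rule flags drift. Neither quantity needs a trained classifier, which is one plausible reading of why the abstract contrasts them with classification accuracy.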
Keywords/Search Tags: Rough Sets, F-Rough Sets, Attribute Reduction, Data Streams, Concept Drift, Various Positive Region