
Research On Uncertainty Measurement Of Rough Set Based On Hybrid Information System

Posted on: 2020-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: J Chen
Full Text: PDF
GTID: 2428330575965414
Subject: Engineering
Abstract/Summary:
Rough set theory, proposed by the Polish scholar Pawlak in 1982, is a mathematical tool for dealing with inaccurate, inconsistent, incomplete and fuzzy data. Its core idea is to partition the universe of an information system by an equivalence relation, which yields a collection of object sets. These object sets are called equivalence classes, and they represent the meaningful and valuable knowledge that people need. Because the classical rough set relies on an equivalence relation to partition the universe into knowledge, it can only process discrete or symbolic data. Numeric data must first be discretized. However, discretization changes the internal structure of the data and may lose important information, which reduces what can be mined from the data set. To overcome this limitation, scholars have extended and improved the classical rough set, and the neighborhood rough set model and the fuzzy rough set model have been proposed.

With the rapid development of science and technology, data have become large and complex. In practical applications, factors such as measurement error, the limitations of science and technology, and errors in understanding the data make the acquired data incomplete. Moreover, the data in current information systems are mostly mixed, that is, they contain at least two of the following: discrete values, numerical values, and missing values. This raises three questions: how to effectively extract valuable and meaningful knowledge from such complex and massive data, which has become a key topic in the current era of big data; how to improve the existing numerical measurement methods; and how to define a function suitable for measuring the uncertainty of mixed incomplete information systems. This paper addresses these questions in turn and proposes corresponding uncertainty measurement methods.

The main work in this paper is summarized as follows:

(1) The existing numerical measurement methods are improved. Accuracy, roughness, approximation accuracy and approximation roughness are four single numerical measurement methods that were proposed earlier, but they have some shortcomings when evaluating the uncertainty of information systems. To address these shortcomings, a combination uncertainty measurement method based on roughness and ambiguity is proposed. Furthermore, because the decision classes in an information system differ in size, ambiguity and roughness influence the uncertainty of the whole system to different degrees for each class. To account for this, a weight is assigned to each class, a weighted combination uncertainty measure is proposed, and its relevant properties are given. Finally, experimental results on UCI data sets show that the proposed weighted combination metric measures uncertainty more effectively.

(2) An uncertainty measure function suitable for evaluating mixed incomplete information systems is defined. In practical applications, the data in most information systems are of hybrid type. To measure the uncertainty of mixed incomplete information systems, a distance function with tolerance capability is defined that takes the distribution of the data into account, and an improved incomplete neighborhood rough set model is proposed. Based on this model, the concepts of mixed approximation accuracy and mixed approximation roughness are defined. These two single numerical metrics can only estimate the size of the boundary region of a set and cannot measure the size of the knowledge granularity. To solve this problem, the uncertainty of the information system is studied from the perspective of information theory and granularity, and the concept of neighborhood tolerance information entropy is defined. Finally, a combination metric is proposed that combines the advantages of the mixed approximation roughness and the neighborhood tolerance information entropy, and its related properties are studied. Experimental results on UCI data sets show that the proposed metric measures uncertainty more effectively, which demonstrates the advantage of the method, and its feasibility is also proved theoretically.

The innovations in this paper are summarized as follows:

(1) The proposed weighted combination metric not only overcomes the shortcomings of the single numerical metrics, but also takes into account that the uncertainty of each class influences the uncertainty of the whole information system to a different degree. It does so by introducing weights, which leads to the weighted combination metric.

(2) Because the data in current information systems are mostly of mixed type, different distance formulas are set for numerical attributes and symbolic attributes, taking the distribution characteristics of the data into account, so that the uncertainty of mixed incomplete information systems can be measured. On this basis, an improved incomplete neighborhood rough set model is established, and the concepts of mixed approximation roughness and neighborhood tolerance information entropy are defined. The mixed approximation roughness measures the size of the boundary region of a set well. The neighborhood tolerance information entropy measures the size of the knowledge granularity from the perspective of information theory and granularity. Finally, the advantages of these two single metrics are combined into a combination metric, which measures the uncertainty of an information system more effectively and yields better classification accuracy.
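For readers unfamiliar with the quantities named above, the following Python sketch may help. It shows the classical Pawlak lower and upper approximations with the accuracy and roughness derived from them, followed by a tolerant neighborhood for mixed data with missing values. The abstract does not give the thesis's formulas, so everything in the second part is an assumption made for illustration: the function names, the use of None for a missing value, the rule that a missing value is at distance 0 from anything, the maximum over attributes, and the radius delta. It is not the distance function or model defined in the thesis.

```python
from collections import defaultdict

def equivalence_classes(table, attrs):
    """Partition object ids by their values on attrs (indiscernibility)."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(blocks, target):
    """Classical Pawlak lower and upper approximations of target."""
    lower, upper = set(), set()
    for block in blocks:
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper

def accuracy(blocks, target):
    """|lower| / |upper|; taken as 1.0 for an empty upper approximation."""
    lower, upper = approximations(blocks, target)
    return len(lower) / len(upper) if upper else 1.0

def roughness(blocks, target):
    """Roughness is one minus accuracy."""
    return 1.0 - accuracy(blocks, target)

# Hypothetical toy table with symbolic attributes only.
table = {
    1: {"color": "red", "size": "small"},
    2: {"color": "red", "size": "small"},
    3: {"color": "blue", "size": "large"},
    4: {"color": "blue", "size": "small"},
}
blocks = equivalence_classes(table, ["color", "size"])
target = {1, 3}
lower, upper = approximations(blocks, target)

def tolerant_distance(x, y, numeric_attrs, symbolic_attrs):
    """ASSUMED distance: the largest per-attribute difference, where a
    missing value (None) is treated as matching any other value."""
    dist = 0.0
    for a in numeric_attrs:      # numeric values assumed scaled to [0, 1]
        if x[a] is not None and y[a] is not None:
            dist = max(dist, abs(x[a] - y[a]))
    for a in symbolic_attrs:     # 0 if equal, 1 if different
        if x[a] is not None and y[a] is not None and x[a] != y[a]:
            dist = 1.0
    return dist

def neighborhood(table, obj, numeric_attrs, symbolic_attrs, delta):
    """All objects within the assumed radius delta of obj."""
    return {
        other for other, row in table.items()
        if tolerant_distance(table[obj], row, numeric_attrs, symbolic_attrs) <= delta
    }

# Hypothetical mixed table: one numeric attribute, one symbolic, one gap.
mixed = {
    1: {"temp": 0.10, "kind": "a"},
    2: {"temp": 0.15, "kind": "a"},
    3: {"temp": None, "kind": "a"},
    4: {"temp": 0.90, "kind": "b"},
}
near_1 = neighborhood(mixed, 1, ["temp"], ["kind"], delta=0.1)
near_4 = neighborhood(mixed, 4, ["temp"], ["kind"], delta=0.1)
```

In the first part, objects 1 and 2 are indiscernible, so the target set {1, 3} cannot be described exactly and its boundary region is non-empty. In the second part, neighborhoods replace equivalence classes, so numeric data need not be discretized and an object with a missing value can still be grouped with its neighbors.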
Keywords/Search Tags:uncertainty metric, neighborhood rough set, incomplete rough set, hybrid information system, mixed approximation roughness, neighborhood tolerance information entropy