
Research Of Incomplete Information Processing Method Based On Rough Set Theory

Posted on: 2013-10-30 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: L H Guan
GTID: 1228330395953436 | Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer and network technology, data and information are being generated rapidly in all fields. The explosive growth and uncertainty of data call for more powerful techniques that can convert huge and messy data into valuable information and knowledge. This is a challenge for research on intelligent information processing, and data mining has therefore become one of the key research fields in artificial intelligence.
Among the many methods of data mining, rough set theory is an effective method for handling complex systems. Compared with other theories such as probability theory, fuzzy set theory and evidence theory, a significant advantage of rough set theory is that it does not require any prior knowledge beyond the data set itself and can therefore describe problems objectively. It has found many interesting applications in machine learning, knowledge acquisition and pattern recognition.
Classical rough set theory can describe complete data sets, i.e., data sets in which all attribute values are known and certain. However, real-life data are often imperfect (erroneous, incomplete, uncertain and vague) for reasons such as limitations of data acquisition techniques, transmission medium errors and human factors. In such cases the applicability of classical rough set theory is limited, so research on how to extend the rough set model is very important for the development of rough set theory.
In this dissertation, theories and methods for processing incomplete information based on rough set theory are studied. The main contents include: extensions of the indiscernibility relation, definitions of approximations induced by any generalized indiscernibility relation, knowledge reduction methods based on generalized rough set models, and incremental updating methods for positive region reduction.
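To make the difference between complete and incomplete data concrete, here is a minimal Python sketch (not taken from the dissertation) of lower and upper approximations under the tolerance relation, one well-known generalized indiscernibility relation for incomplete data. It assumes missing values are written as `*` and uses element-based definitions: an object belongs to the lower approximation if its tolerance class lies inside the target set, and to the upper approximation if its class meets the target set. The toy table and the names `tolerant`, `tolerance_class` and `approximations` are hypothetical. When no value is missing, tolerance classes coincide with the equivalence classes of the classical indiscernibility relation, so the same code returns Pawlak's approximations.

```python
from typing import Dict, List, Set, Tuple

MISSING = "*"  # assumed marker for an unknown attribute value
Table = Dict[str, Dict[str, str]]  # object -> {attribute: value}

def tolerant(table: Table, x: str, y: str, attrs: List[str]) -> bool:
    """Tolerance relation: on every attribute the values are equal,
    or at least one of the two values is unknown."""
    return all(
        table[x][a] == table[y][a] or MISSING in (table[x][a], table[y][a])
        for a in attrs
    )

def tolerance_class(table: Table, x: str, attrs: List[str]) -> Set[str]:
    return {y for y in table if tolerant(table, x, y, attrs)}

def approximations(table: Table, target: Set[str],
                   attrs: List[str]) -> Tuple[Set[str], Set[str]]:
    """Element-based lower and upper approximations of `target`."""
    lower: Set[str] = set()
    upper: Set[str] = set()
    for x in table:
        cls = tolerance_class(table, x, attrs)
        if cls <= target:   # class contained in the target set
            lower.add(x)
        if cls & target:    # class meets the target set
            upper.add(x)
    return lower, upper

# Hypothetical incomplete information system.
TABLE: Table = {
    "x1": {"a": "1", "b": "0"},
    "x2": {"a": "1", "b": "*"},
    "x3": {"a": "0", "b": "1"},
    "x4": {"a": "*", "b": "1"},
}
lower, upper = approximations(TABLE, {"x1", "x2"}, ["a", "b"])
print(sorted(lower), sorted(upper))  # ['x1'] ['x1', 'x2', 'x4']
```

Here `x2` drops out of the lower approximation of `{x1, x2}` because its unknown `b` value, together with the unknown `a` value of `x4`, makes it tolerant to `x4`, which lies outside the target set.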
The main contributions of this dissertation are as follows.
(1) The approximations defined by any generalized indiscernibility relation are investigated, and suitable definitions of approximations are suggested for each class of generalized indiscernibility relations.
The classical upper and lower approximations are based on the indiscernibility relation, which is an equivalence relation; this requirement is too stringent and is not satisfied in many situations, so it is necessary to extend these concepts to more general relations. The definition of approximations has become one of the important research issues in generalized rough set models. First, the Pawlak approximation space and the generalized approximation space are reviewed, and twelve existing basic definitions of approximations are introduced. Second, the relationships among these twelve definitions are investigated for each class of generalized indiscernibility relations. Third, based on the ideas underlying rough set approximations, suitable definitions of approximations are suggested for each class of generalized indiscernibility relations. Finally, the approximations defined by the existing generalized indiscernibility relations are presented, and their properties are analyzed.
(2) Based on the idea of data-driven data mining, the calculation of the tolerance degree and the selection of the threshold in the valued tolerance relation are studied, and a data-driven valued tolerance relation is proposed.
The valued tolerance relation in incomplete information systems is an important extension of classical rough set theory. However, the usual method of calculating the tolerance degree requires the probability distribution of the information system to be known in advance, and it is also difficult to select a suitable threshold.
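The dependence on a prior distribution can be illustrated with a small Python sketch. It is an assumption-laden illustration, not the dissertation's formula: the tolerance degree on one attribute is taken as the probability that the two values coincide under a given value distribution, degrees are multiplied across attributes, and the objects whose degree with `x` reaches a hand-chosen threshold `lam` form the valued tolerance class of `x`. The `uniform` table stands for a distribution fixed in advance, while `empirical_dist` is a hypothetical relative-frequency estimate from the observed values, included only to suggest what a data-driven alternative might look like.

```python
from collections import Counter
from math import prod
from typing import Dict, List, Set

MISSING = "*"  # assumed marker for an unknown value
Table = Dict[str, Dict[str, str]]
Dist = Dict[str, float]  # value -> probability (assumption)

def attr_degree(u: str, v: str, dist: Dist) -> float:
    """Probability that two values of one attribute coincide under `dist`."""
    if u != MISSING and v != MISSING:
        return 1.0 if u == v else 0.0
    if u == MISSING and v == MISSING:
        return sum(p * p for p in dist.values())  # both drawn independently
    known = v if u == MISSING else u
    return dist.get(known, 0.0)  # the missing value must equal the known one

def tolerance_degree(table: Table, x: str, y: str, attrs: List[str],
                     dists: Dict[str, Dist]) -> float:
    if x == y:
        return 1.0  # an object is fully tolerant to itself
    return prod(attr_degree(table[x][a], table[y][a], dists[a]) for a in attrs)

def valued_class(table: Table, x: str, attrs: List[str],
                 dists: Dict[str, Dist], lam: float) -> Set[str]:
    """Objects whose tolerance degree with x is at least the threshold lam."""
    return {y for y in table if tolerance_degree(table, x, y, attrs, dists) >= lam}

def empirical_dist(table: Table, a: str) -> Dist:
    """Hypothetical data-driven estimate: relative frequencies of known values."""
    counts = Counter(row[a] for row in table.values() if row[a] != MISSING)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}

TABLE: Table = {
    "x1": {"a": "1", "b": "0"},
    "x2": {"a": "1", "b": "*"},
    "x3": {"a": "0", "b": "1"},
    "x4": {"a": "*", "b": "*"},
}
# Distribution assumed known in advance: uniform over each attribute domain.
uniform = {"a": {"0": 0.5, "1": 0.5}, "b": {"0": 1 / 3, "1": 1 / 3, "2": 1 / 3}}
empirical = {a: empirical_dist(TABLE, a) for a in ("a", "b")}

print(sorted(valued_class(TABLE, "x1", ["a", "b"], uniform, 0.3)))    # ['x1', 'x2']
print(sorted(valued_class(TABLE, "x1", ["a", "b"], uniform, 0.1)))    # ['x1', 'x2', 'x4']
print(sorted(valued_class(TABLE, "x1", ["a", "b"], empirical, 0.3)))  # ['x1', 'x2', 'x4']
```

The last two lines show both difficulties named in the text: the class of `x1` changes with the threshold, and at a fixed threshold it also changes with the distribution assumed for the missing values.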
In this dissertation, a data-driven valued tolerance relation is proposed. The new method for calculating the tolerance degree is based on the statistical characteristics of the attribute values in the incomplete information system, and it also takes into account the number of known and identical values shared by two objects. To avoid the limitations of choosing the threshold subjectively, an automatic threshold selection method is proposed. Neither method requires any prior domain knowledge beyond the data set. Experimental results show that the data-driven valued tolerance relation achieves better and more stable classification results than other extension models of classical rough set theory.
(3) Strategies and methods of knowledge reduction based on generalized indiscernibility relations are developed. A heuristic attribute reduction algorithm that keeps the classification capability unchanged and a hierarchical value reduction algorithm are proposed; both can be applied effectively to knowledge reduction in incomplete decision tables.
Knowledge reduction based on generalized rough set models in incomplete decision tables is investigated. First, according to the properties of the approximations induced by generalized indiscernibility relations, a knowledge reduction strategy that keeps the classification capability unchanged is proposed. Second, knowledge reduction is formally defined. Third, a heuristic attribute reduction algorithm for any generalized indiscernibility relation and a hierarchical value reduction algorithm are proposed. Simulation experiments show that the knowledge reduction methods proposed in this dissertation achieve high recognition accuracy.
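As a rough illustration of attribute reduction that keeps classification capability unchanged, the Python sketch below uses a common stand-in criterion, preservation of the positive region, with a greedy forward search followed by a backward pruning pass. The relation is passed in as a function so that a generalized indiscernibility relation can be plugged in; the tolerance relation used here, the toy decision table and all function names are assumptions for illustration. The dissertation's own heuristic and its hierarchical value reduction are not reproduced.

```python
from typing import Callable, Dict, List, Set

MISSING = "*"  # assumed marker for an unknown value
Table = Dict[str, Dict[str, str]]
Relation = Callable[[Table, str, str, List[str]], bool]

def tolerant(table: Table, x: str, y: str, attrs: List[str]) -> bool:
    """Tolerance relation: equal or at least one unknown on every attribute."""
    return all(table[x][a] == table[y][a] or MISSING in (table[x][a], table[y][a])
               for a in attrs)

def positive_region(table: Table, attrs: List[str], decision: str,
                    related: Relation) -> Set[str]:
    """Objects whose whole related class shares their decision value."""
    return {x for x in table
            if all(table[y][decision] == table[x][decision]
                   for y in table if related(table, x, y, attrs))}

def greedy_reduct(table: Table, conds: List[str], decision: str,
                  related: Relation = tolerant) -> List[str]:
    target = positive_region(table, conds, decision, related)
    red: List[str] = []
    # Forward pass: add the attribute that enlarges the positive region most;
    # at the latest this stops once every condition attribute has been added.
    while positive_region(table, red, decision, related) != target:
        best = max((a for a in conds if a not in red),
                   key=lambda a: len(positive_region(table, red + [a], decision, related)))
        red.append(best)
    # Backward pass: drop attributes that are no longer needed.
    for a in list(red):
        rest = [b for b in red if b != a]
        if positive_region(table, rest, decision, related) == target:
            red = rest
    return red

# Hypothetical incomplete decision table with decision attribute "d".
DT: Table = {
    "x1": {"a": "1", "b": "0", "c": "1", "d": "yes"},
    "x2": {"a": "1", "b": "*", "c": "1", "d": "yes"},
    "x3": {"a": "0", "b": "1", "c": "*", "d": "no"},
    "x4": {"a": "*", "b": "1", "c": "0", "d": "no"},
    "x5": {"a": "0", "b": "0", "c": "1", "d": "no"},
}
print(greedy_reduct(DT, ["a", "b", "c"], "d"))  # ['a', 'c']
```

Because `related` is a parameter, swapping in a different relation only requires a function with the same signature; whether positive-region preservation matches the dissertation's notion of classification capability is an assumption.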
These knowledge reduction methods are vital for accelerating the industrial application of rough set theory.
(4) Incremental methods for updating attribute reduction in incomplete decision tables when new objects are added are studied, and an incremental positive region reduction algorithm based on an attribute order is proposed.
In practical applications, new objects are inevitably added to incomplete decision tables, and different users have different requirements for knowledge acquisition. This dissertation therefore analyzes all kinds of newly added objects and, based on the discernibility matrix element set, proposes an incremental positive region reduction algorithm for a given attribute order. Simulation experiments confirm the correctness of the proposed algorithm and show that it is considerably more efficient than the non-incremental algorithm.
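The general shape of an incremental, order-based reduction can be sketched in Python as follows. This is a simplified stand-in, not the dissertation's algorithm: it keeps a set of decision-relative discernibility elements (for each pair of objects with different decisions, the attributes on which both values are known and differ), computes the attribute-order reduct by trying to delete attributes starting from the least preferred end of the order, and, when an object arrives, computes only the elements between the new object and the stored ones. If the current reduct already intersects every new element it is kept, since ordered deletion over the enlarged element set would yield the same reduct; otherwise the reduct is recomputed from the stored elements. The class name, the toy rows and the choice to skip empty elements (pairs that cannot be told apart) are assumptions, and the positive-region-specific element set of the dissertation is not modeled.

```python
from typing import Dict, FrozenSet, List, Set

MISSING = "*"  # assumed marker for an unknown value
Row = Dict[str, str]

def element(x: Row, y: Row, conds: List[str]) -> FrozenSet[str]:
    """Attributes on which both values are known and different."""
    return frozenset(a for a in conds
                     if MISSING not in (x[a], y[a]) and x[a] != y[a])

class IncrementalOrderedReduct:
    """Hypothetical helper: ordered reduct maintained as objects arrive."""

    def __init__(self, order: List[str], decision: str):
        self.order = list(order)          # most preferred attribute first
        self.decision = decision
        self.rows: List[Row] = []
        self.elements: Set[FrozenSet[str]] = set()  # discernibility element set
        self.reduct: List[str] = []

    def _hits_all(self, attrs: List[str]) -> bool:
        chosen = set(attrs)
        return all(e & chosen for e in self.elements)

    def _recompute(self) -> None:
        red = list(self.order)
        for a in reversed(self.order):    # try least preferred attributes first
            trial = [b for b in red if b != a]
            if self._hits_all(trial):
                red = trial
        self.reduct = red

    def add(self, row: Row) -> List[str]:
        new: Set[FrozenSet[str]] = set()
        for old in self.rows:             # only pairs involving the new object
            if old[self.decision] != row[self.decision]:
                e = element(old, row, self.order)
                if e:                     # empty: pair not discernible, skipped
                    new.add(e)
        self.rows.append(row)
        self.elements |= new
        if not all(e & set(self.reduct) for e in new):
            self._recompute()             # some new element is not covered
        return list(self.reduct)

rows: List[Row] = [
    {"a": "1", "b": "0", "c": "1", "d": "yes"},
    {"a": "1", "b": "*", "c": "1", "d": "yes"},
    {"a": "0", "b": "1", "c": "*", "d": "no"},
    {"a": "*", "b": "1", "c": "0", "d": "no"},
    {"a": "0", "b": "0", "c": "1", "d": "no"},
]
inc = IncrementalOrderedReduct(["a", "b", "c"], "d")
for r in rows:
    print(inc.add(r))  # [], [], ['a'], ['a', 'c'], ['a', 'c'] on separate lines
```

The last object adds only the element `{a}`, which the current reduct already covers, so no recomputation takes place; this is the kind of saving an incremental method aims for.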
Keywords/Search Tags: incomplete information system, rough set, generalized indiscernibility relation, knowledge reduction