Font Size: a A A

Research On Technology Of Rules Obtaining Based On Incomplete Decision Table

Posted on:2013-09-24Degree:MasterType:Thesis
Country:ChinaCandidate:S WangFull Text:PDF
GTID:2248330371988853Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Rough set theory is a theory of data analysis which was proposed in1982by polish science academicians whose name is Z.Pawlak, and it is a mathematical tool which was used to deal with imprecise, uncertain and inconsistent data. Rough set is based on the past data, it does not need to provide expert knowledge and it can directly discover intrinsic association from the data table, which lead to the association rules that has high credibility and coverage. So it can be widely used in many fields, such as artificial intelligence, intelligent control and data mining.Data mining methods which based on rough set mainly include two aspects:one is the reduction towards attribute or attributes value and the core obtaining and the other is rule obtaining from data table. In order to simplify the data table and provide support for the follow-up data mining, attribute reduction can get ride of the irrelevant or redundant attributes without changing the original contact and classification capacity. Because the core is defined as the intersection of all the attribute reduction, so attribute core is an indispensable attribute in the data table. Core can be used as heuristic information in many attribute reduction algorithm and with their help we can simply find the attribute reduction. In addition, the core has a unique role in constructing multivariate decision tree. Therefore, the study of attribute reduction and the core obtaining has an important role in the application of rough set. An important application direction of rough set is to obtain decision rules, usually, there are two steps to obtain the best rules from the data table, the first step is attribute reduction and the second is the reduction of attribute value, but it also has another method to obtain rules directly from the data table, so the study of efficient algorithm for rules acquisition is very meaningful.The traditional rough set theory is assumed that the decision table which is called complete decision table has no missing attribute value. But in real life, because of unknown phenomenon, some attribute values in many decision tables is unknown, so we call this decision table the incomplete decision table. There are variety reasons for causing the missing value. Therefore, how to use the traditional rough set theory deal with information in incomplete decision table has become a hot topic in current research. And there are many methods for handling missing values, so the academic community has a different understanding. At present, more scholars tend to expand the basic concept of the traditional rough set under incomplete decision table. Relations such as tolerance, non-symmetric similarity relation and limited tolerance relation, the contact model, the variable precision the dynamic model of tolerance. Among them, the tolerance relation proposed earlier, so it affects a wide range. After Professor Kryszkiewicz proposes the tolerance relation, in order to solving the problem of lack co-ordination in decision table, he proposes the concept of generalized decision. This paper mainly studies the relationship between tolerance attribute reduction, the core obtaining and rule acquisition.1) Through the many scholars’research algorithms for computing core have been more perfect in the incomplete decision table, but there are few method about computing core in the incomplete decision table. At present, there are algorithms computing core based on positive region in the tolerance relation. In this paper, by constructing the compatibility relation model, the definition for computing core based on discernibility matrix is introduced, then it proves theoretically that computing the core use the definition of new discernibility matrix is equal to the core of general decision, and the time complexity of the algorithm isO(|C‖U|2).This new algorithm for computing core based on discernibility matrix both use the intuitive clear advantage of discernibility matrix, and use the method of radix sort which avoid the fault that the discernibility matrix will take up larger space.2) Professor Zhao Yapeng and Zhou Haiyan put the binary difference matrix in complete decision table into the incomplete decision table. And Zhao Yapeng proposed the binary discernibility matrix under the similar relationship models, however, binary difference matrix in the computing process is proposed by Professor Zhou Haiyan need to compare multiple objects. In this paper, a new attribute reduction algorithm based on the positive region of the binary discernibility matrix is given to handle this problem, which complexity is max{O(|C‖U|2), O(|C|2|Upos‖Uneg|)}. And it is simply to compare the object Upos and Uneg in the computing process, and at the same time in the calculation of the discernibility matrix, combined with a bitmap technology, eliminating repetitive and invalid contrast line. Thus this new algorithm can reduced the search object and storage space, and can effectively reduce the time complexity of algorithm.3) In the incomplete decision table, Professor Huang Bing proposed a new tolerance matrix under tolerance relation, and given attribute reduction algorithm based on the tolerance matrix, which time complexity is O(|C|3|U|2). In order to reduce the time complexity, with deeply study of the previous algorithm of attribute reduction based on the tolerance matrix, we found it is not well used the former step in object classification when looking for attribute reduction by the a single condition attribute. A new algorithm uses the matrix distance as heuristic information which combines with the characteristics of the conjunctive matrix is designed. Because this new algorithm that can reduce the time complexity make full use of the effective information has been calculated, so the time complexity becomes O(|C|2|U|2) and the calculation result is the same with the result of original algorithms.4) Solving the best attributes of the reduction has been shown to be the NP Hard problem, so solving the suboptimal solutions attribute reduction is an important aspect of information systems data mining method. Professor Kryszkiewicz given the definitions and methods of generalized decision which were used to the attribute reduction under the tolerance relation in incomplete decision table. With learning the scholars of previous research and combing with the thought of discernibility object pair, the definition of discernibility object pair set and the corresponding definition of attribute reduction are introduced. And it is proved that the definition of attribute reduction is equivalent to the one based on generalized decision-making. Under this condition, an algorithm based on the definition of discernibility object pair set is proposed which uses discernibility degree K(ci) and complete degree P(ci) as heuristic information and combines with quick sorting, and the time complexity of the new algorithm is cut down to O(_C‖U|2).5) According to the incomplete decision table,the compatibility matrix of condition attribute and the matrix which use decision attributes to allocate decision-making were constructed respectively by Professor Huang Bing and were used for rule acquisition, but to construct two matrices, resulting in time complexity and space complexity is not ideal; Professor Cheng Yu sheng proposed a method with joint decision-making compatibility matrix which improve the time and space complexity to a certain extent; Through the analysis Ji Huai Meng found by introducing attribute attribute reduction and general decision-making function can effectively resolve the problems mentioned above, but the proposed algorithm should repeatedly check conflict between the attribute-value and the value of the generalized decision, and this will cause the algorithm time complexity is less than ideal. However, we introduce extended discernibility matrix and using radix sort to divide generalized decision value in this paper. Simply to compare the different objects of the generalized decision values and record the corresponding conditional attribute values, so the shortcomings of checking the property values and generalized decision value whether conflict or not in calculation process in the original literature was avoided and optimize the algorithm.
Keywords/Search Tags:incomplete decision table, attribute reduction, computing core, rules obtaining, datamining
PDF Full Text Request
Related items