A Multistage Rule Induction Algorithm Based On Rough Set And An Extension Of Flow Graph

Posted on: 2008-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: H W Liu
Full Text: PDF
GTID: 2178360212495884
Subject: Computer software and theory
Abstract/Summary:
We live in an Information Age, and information has become a critically important part of our lives. Driven by the success of the Internet, the amount of available information, including the immense volumes of data held in databases, is growing explosively, and people are often buried under masses of data. How to transform these huge volumes of data into useful knowledge has become a central topic in the intelligent information processing and decision support communities, and methods and tools to support this are urgently required. To cope with this problem, a knowledge processing technique called knowledge discovery in databases (KDD) has been introduced and has gained great attention in various fields. KDD is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in large volumes of data. Data mining (DM) is the most important component of KDD, and rough set theory is one of the effective and important methods for handling data in KDD and DM.
Rough set theory, proposed by Pawlak in 1982, is a mathematical tool for handling uncertain, vague and ambiguous data. On the basis of an indiscernibility relation, the theory represents knowledge as partitions and divides the universe into blocks. It therefore provides an effective mathematical technique for data analysis and decision-making, especially for imprecise, uncertain and incomplete data. Rough set methods differ from other methods in the way they process data. A rough set method discovers knowledge entirely from the given data and does not need the domain knowledge, prior knowledge or external models that most statistical and machine learning methods require. As a result, the results mined with a rough set approach are not affected by subjective factors. Moreover, redundant data can be discarded on the basis of the observed data alone.
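The partition induced by an indiscernibility relation, and the lower and upper approximations built on it, can be sketched in a few lines of Python. This is only an illustrative sketch of the standard rough set definitions, not code from the thesis; the toy table, its attribute names and the function names `partition` and `approximations` are assumptions made for the example.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object ids into blocks whose members agree on every attribute in attrs."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())

def approximations(table, attrs, target):
    """Return the (lower, upper) approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper

# Hypothetical toy decision table (not data from the thesis).
TABLE = {
    1: {"headache": "yes", "temp": "high", "flu": "yes"},
    2: {"headache": "yes", "temp": "high", "flu": "no"},
    3: {"headache": "no", "temp": "high", "flu": "yes"},
    4: {"headache": "no", "temp": "normal", "flu": "no"},
}
FLU = {obj for obj, row in TABLE.items() if row["flu"] == "yes"}

LOWER, UPPER = approximations(TABLE, ["headache", "temp"], FLU)
```

Objects 1 and 2 are indiscernible on the two condition attributes but differ on the decision, so the concept "flu" is only roughly definable here: its lower approximation contains object 3 alone, while its upper approximation contains objects 1, 2 and 3.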
Most data mining methods based on rough sets first compute reducts of the data and then extract rules from these reducts. Although this approach has many advantages, such as reducing time and space complexity, it also has drawbacks. For example, a single reduct cannot represent all the knowledge hidden in the original data. In addition, finding all reducts is an NP-hard problem. Moreover, a reduct cannot reflect experts' opinions, because it is only a symbolic construct.
The flow graph, first introduced by Pawlak in 2002, is a new and promising mathematical and graphical model for data analysis. Because it is a quantitative flow network over data, a flow graph is well suited to describing dependence relations among data. Based on this model, relevance analysis can be conducted and dependencies can be discovered easily. Since it is closely associated with rough set theory and probability theory, the flow graph has received much attention and has been applied successfully in decision-making and conflict resolution. However, its structure has serious shortcomings. For instance, it cannot represent the relations among data precisely or qualitatively.
To cope with the problems mentioned above, in this paper we first investigate a multistage rule induction algorithm based on rough sets. The main advantage of this algorithm is that rules are extracted over all attributes rather than from a single reduct. In addition, it integrates experts' opinions with the degree of dependence among attributes by adopting a special set of attributes. Because it builds on an extension of rough set theory, the algorithm can also handle noisy data effectively. The experimental results show that the algorithm is effective and efficient.
Because qualitative analysis is as important as quantitative analysis, the second purpose of this paper is to introduce an extension of the flow graph.
In this extension, the throughflow between nodes is no longer a quantity but a set of objects. The major advantage of this structure is that it can describe the throughflow between nodes not only quantitatively but also qualitatively. Moreover, the extension has the same capability as its corresponding decision table. For example, it supports checking consistency and generating level reducts or node reducts. Additionally, it is closely associated with granular computing. As a result, data reasoning and reform in the flow graph can be carried out easily with the decomposition and composition operations of granular computing. Furthermore, this paper provides several algorithms for the extension, including consistency checking, generation of level or node reducts, dynamic updating, and data reasoning and reform of the flow graph, and analyzes the complexity of each.
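The idea of labelling each edge with a set of objects instead of a number can be illustrated with a small Python sketch. The abstract does not give the thesis's formal definitions, so everything below is an assumption made for illustration: the layered construction from a decision table, the names `edge_sets` and `path_objects`, and the toy data.

```python
from collections import defaultdict
from functools import reduce

def edge_sets(table, layers):
    """Map each edge ((attr, value), (next_attr, next_value)) to the set of
    objects that flow through it. Assumed construction, one layer per attribute."""
    edges = defaultdict(set)
    for obj, row in table.items():
        for a, b in zip(layers, layers[1:]):
            edges[((a, row[a]), (b, row[b]))].add(obj)
    return dict(edges)

def path_objects(edges, path):
    """Objects that traverse every edge along `path`, found by set intersection."""
    sets = [edges.get((u, v), set()) for u, v in zip(path, path[1:])]
    return reduce(set.intersection, sets) if sets else set()

# Hypothetical toy decision table (not data from the thesis).
TABLE = {
    1: {"headache": "yes", "temp": "high", "flu": "yes"},
    2: {"headache": "yes", "temp": "high", "flu": "no"},
    3: {"headache": "no", "temp": "high", "flu": "yes"},
    4: {"headache": "no", "temp": "normal", "flu": "no"},
}
LAYERS = ["headache", "temp", "flu"]
EDGES = edge_sets(TABLE, LAYERS)

FIRST_EDGE = (("headache", "yes"), ("temp", "high"))
PATH = [("headache", "yes"), ("temp", "high"), ("flu", "yes")]

# The quantitative flow of a classical flow graph is recovered as a cardinality.
FLOW = len(EDGES[FIRST_EDGE])
# The object sets also say exactly which objects follow the whole path.
ON_PATH = path_objects(EDGES, PATH)
```

In this toy example both edges on the path carry two objects, so a purely numeric flow graph records 2 and 2 but cannot tell how many objects follow the complete path. Intersecting the object sets shows that only object 1 does, which is the kind of qualitative information the extension is meant to keep.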
Keywords/Search Tags:Multistage