Implementation Of Fuzzy Data Mining System

Posted on: 2006-12-29
Degree: Master
Type: Thesis
Country: China
Candidate: C Yang
Full Text: PDF
GTID: 2208360182986765
Subject: Control theory and control engineering
Abstract/Summary:
With the advent of computers and the information age, data processing problems have exploded in both size and complexity. Challenges in the area of data processing have led to the birth of a new field: data mining. Nowadays, vast amounts of data are being generated in many fields. One of the most important tasks ahead is to extract important patterns, identify the trends in data, and really understand "what the data says." This is the so-called learning from data.

Fuzzy systems can be considered reasonable models for data mining. They use "if-then" rules to establish qualitative relationships among the variables in the model. Fuzzy sets serve as a friendly interface between the qualitative variables in the rules and the numerical data at the inputs and outputs of the model. The rule-based nature of fuzzy models allows the use of information expressed as natural language statements and consequently makes the models transparent to interpretation and analysis. At the computational level, fuzzy systems can be regarded as flexible mathematical structures that, like other universal approximators, can approximate any complex nonlinear system to a desired degree of accuracy.

However, this does not mean that fuzzy systems are perfect models without flaws.

1. Most of the available modeling methods try to improve accuracy by function approximation, so that not much information can be obtained for a posteriori interpretation of the system's behavior, which misses the point of "data mining".

2. In high-dimensional data mining, the phenomenon known as "the curse of dimensionality" arises. Fuzzy systems have to face sparse data scattered in a high-dimensional space, together with an explosion in the number of rules.

In this thesis, we address these two problems as follows:

1. Our basic idea for dealing with rule explosion is to use a clustering method. Similar data can be grouped into the same cluster, and each cluster then needs only one rule to describe its characteristics, which prevents rule explosion.

2. Two new fuzzy modeling methods are proposed in this thesis, both of which involve the Minimum Cluster Volume (MCV) clustering algorithm. The characteristics of MCV as applied to fuzzy modeling are analyzed from the perspectives of both transparency and accuracy: the robustness of MCV, less overlap and larger core regions in the membership functions obtained with it, simplification of the rule base, less bias in the estimation of the consequent parameters, etc. The resulting model is easier to understand than models obtained with other clustering methods. Such cluster-based rule generation contributes to a more concise rule base (for description) with high accuracy (for prediction).

3. How should the high-dimension problem be dealt with? Input selection may be seen as a crucial step, which helps to build an interpretable model with less computation. Input selection has thus drawn great attention in recent years, but most available methods are model-based. Two model-free methods for input selection are therefore presented in the thesis: one is developed from sensitivity analysis, and the other is based on the common-sense notion of consistency. The relationship between the two methods is also discussed. Many experiments are carried out to test the proposed methods, and both achieve good performance.

Several well-known "data mining" problems are carefully discussed, such as MPG prediction, the Box-Jenkins gas furnace process, and Boston housing. With the proposed methods, many vivid and interesting results emerge, which serves the purposes of data mining very well.
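
The "one rule per cluster" idea described in the abstract can be illustrated with a minimal sketch. It is not the thesis's method: the MCV algorithm is not specified here, so plain k-means is used as a stand-in, and the Gaussian membership functions, the width heuristic, the linear rule consequents, and all function names (kmeans, memberships, fit_ts_model, predict) are assumptions made only for illustration.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain k-means; a stand-in for MCV clustering, which is NOT implemented here."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every sample to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each center to the mean of its members (skip empty clusters).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

def memberships(X, centers, widths):
    """Normalized Gaussian firing strength of each rule, one rule per cluster."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * widths ** 2))
    return w / (w.sum(axis=1, keepdims=True) + 1e-12)

def fit_ts_model(X, y, k=3):
    """Fit a rule base: IF x is near center_j THEN y = a_j . x + b_j."""
    centers, labels = kmeans(X, k)
    # Assumed heuristic: rule width = RMS distance of cluster members to the center.
    widths = np.array([
        np.sqrt(((X[labels == j] - centers[j]) ** 2).sum(axis=1).mean())
        if np.any(labels == j) else 1.0
        for j in range(k)
    ])
    widths = np.maximum(widths, 1e-3)
    W = memberships(X, centers, widths)
    Xa = np.hstack([X, np.ones((len(X), 1))])  # append 1 for the offset b_j
    # Global least squares over all consequent parameters at once.
    Phi = np.hstack([W[:, [j]] * Xa for j in range(k)])
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, widths, theta.reshape(k, -1)

def predict(X, centers, widths, theta):
    """Blend the local linear rule outputs by their firing strengths."""
    W = memberships(X, centers, widths)
    Xa = np.hstack([X, np.ones((len(X), 1))])
    return (W * (Xa @ theta.T)).sum(axis=1)

# Usage on synthetic data (not one of the thesis's benchmark problems).
X = np.linspace(0.0, 2.0 * np.pi, 200).reshape(-1, 1)
y = np.sin(X[:, 0])
centers, widths, theta = fit_ts_model(X, y, k=3)
rmse = np.sqrt(np.mean((predict(X, centers, widths, theta) - y) ** 2))
print("rules:", len(centers), "rmse:", rmse)
```

Each row of theta can be read as one rule, which is what keeps the rule base small: the number of rules equals the number of clusters rather than growing with a grid over every input dimension.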
Keywords/Search Tags: Implementation