
Granulation Mechanism And Data Modeling For Complex Data

Posted on: 2012-11-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Qian
Full Text: PDF
GTID: 1118330368989832
Subject: Computer application technology
Abstract/Summary:
With the rapid development of computer, network, and sensor technology, data acquisition and transmission in astronomy, the military, biology, medicine, management, and other fields have become much easier and faster. As data types grow more complex and data volumes keep increasing, many high-dimensional, large-scale data sets have been produced in which data types are varied and data forms are heterogeneous. Complex data includes numerical data, nominal data, interval data, missing data, and set-valued data, as well as their combinations. Modeling, analyzing, and applying complex data have become main tasks of knowledge discovery in many practical fields, and the complexity of data is one of the key challenges in knowledge discovery. In short, complex data has become the main data source for knowledge discovery in modern society.

Data modeling is the foundation of the analysis and application of complex data. In recent years, increasing attention has been paid to theories and methods of data modeling that draw on results from cognitive science. These studies are often classified into two directions: one seeks to understand and simulate perception mechanisms, and the other seeks to understand and simulate cognitive mechanisms. As one of the important characteristics of human cognition, granulation cognition plays a key role in complex data modeling. By introducing the human granulation cognition mechanism, we aim to develop novel theories and methods of data modeling.
To study complex data modeling based on the granulation mechanism, three key problems must be solved:
● How can complex data be granulated effectively?
● How can granulation uncertainty be analyzed?
● How can data modeling be performed via the granulation mechanism?

Based on these considerations, and targeting complex data including numerical data, nominal data, interval data, missing data, and set-valued data, this dissertation investigates these three key problems from four viewpoints (information granulation, granulation uncertainty, modeling strategy, and model selection) by drawing on the human granulation cognition mechanism. The main results are as follows.

1. We have further established methods and algorithms of information granulation for complex data and have revealed the granulation mechanism of complex data. These results provide the foundation for modeling complex data via the granulation mechanism.

We have presented a new clustering problem, namely how to effectively organize data with measurement errors, and have proposed a corresponding strategy. Experimental results show that: (a) clustering algorithms that account for measurement errors may come much closer to the real classes of data sets than those considering only the measured values; (b) the error-number distance provides an effective way to measure the difference between two objects with measurement errors.

We have developed a family of clustering algorithms based on selecting cluster representatives, called the κ-representative algorithm. In the context of semi-supervised clustering, the κ-representative algorithm shows advantages in accuracy, precision, recall, and number of iterations for clustering nominal data, set-valued data, and missing data. In particular, because the κ-representative algorithm does not analyze the spatial structure of a data set, it can effectively organize both single-type data and mixed-type data, including numerical, nominal, set-valued, missing, and other types.

2.
We have established operations on granular spaces and characterized the structural properties of granular spaces from the algebraic and geometric viewpoints, respectively; and we have revealed the essence of information granularity, which provides a constraining theory and guiding methods for studying granulation uncertainty.

To study the structure of granular spaces, we have given a uniform knowledge representation for various types of granular spaces and have presented intersection, union, complement, and difference operators, which can be used for composition, decomposition, and transformation among crisp/fuzzy granular spaces. It can be proved that all granular spaces on a universe, together with these four operators, form a complete complemented lattice, which reveals the hierarchical structure of granular spaces from the algebraic viewpoint. In addition, we have proposed a knowledge distance and a fuzzy knowledge distance; together with the crisp/fuzzy granular spaces, each distance forms a distance space, which reveals the geometric structure of granular spaces.

For information granulation, we have given several information granularity measures for crisp and fuzzy granular spaces and have established corresponding axiomatic approaches to crisp and fuzzy information granularity. These results unify the related measures of information granularity across various types of granular spaces and reveal the essence of crisp/fuzzy information granularity measures, which provides a constraining theory and guiding methods for studying granulation uncertainty.

3.
By drawing on human granulation cognitive ability, we have developed three kinds of modeling methods based on multigranulation, dynamic granulation, and ordered granulation, respectively, which have advanced data modeling based on the granulation mechanism.

By drawing on human multigranulation cognitive ability, we have established three kinds of multigranulation modeling methods, based respectively on the "Seeking common ground while reserving differences" (SCRD) strategy, the "Seeking common ground while eliminating differences" (SCED) strategy, and the "Concept description" strategy, which enrich the modeling theories and methods of rough set theory. The proposed multigranulation rough sets can be widely applied to data analysis in multigranulation contexts, such as distributive information systems and groups of intelligent agents.

By drawing on human dynamic granulation cognitive ability, we have given methods for approaching concepts and decisions under dynamic granulation, and have proposed a general accelerator for rough feature selection, which provides an efficient strategy for heuristic feature selection in rough set theory. From theoretical analysis and experimental results, one can draw the following conclusions: (a) each accelerated algorithm preserves the attribute reduct induced by the corresponding original algorithm; (b) each accelerated algorithm usually requires substantially less computing time than the corresponding original algorithm; and (c) the performance of the modified algorithms improves as data sets grow: the larger the data set, the greater the savings in computation. Furthermore, we have developed a structural dimensionality reduction strategy that combines feature reduction and sample reduction, and have designed a very efficient rule extraction algorithm based on this strategy.
Experimental results show that both computational time and decision performance are much better than those of existing methods, which provides an efficient method for knowledge discovery from large-scale data sets.

By drawing on human ordered granulation cognitive ability, we have given semantic descriptions of interval data, conjunctive set-valued data, and disjunctive set-valued data, have established rank decision and grading decision based on ordered granulation, and have proposed a feature selection method based on rank preservation, which can effectively select a feature subset from an ordered information system or an ordered decision information system. These results further improve the theories and methods of rank decision and grading decision, and also provide new viewpoints for ordered classification, ordered clustering, and related tasks.

4. We have established a model selection method based on entire decision performance evaluation, which provides a theoretical foundation and technical support for model selection in knowledge discovery.

We have established a class of model selection methods based on entire decision performance evaluation. In the context of complete information systems, we have proposed three decision performance parameters of a complete decision-rule set: the entire certainty measure, the entire consistency measure, and the entire support measure. In the context of incomplete information systems, we have first characterized incomplete decision rules using maximal consistent blocks and have then given the three corresponding parameters. To evaluate the entire decision performance of dominance rules from ordered decision information systems, we have also developed three evaluation parameters: the entire certainty measure, the entire consistency measure, and the cover measure.
These results show that all the proposed evaluation methods are much better than existing methods based on approximation accuracy and approximation quality, and they can provide a theoretical foundation and technical support for model selection and scientific decision-making for a specific problem.

In summary, from the viewpoint of the granulation cognitive mechanism, this dissertation has obtained a series of important results at four stages: information granulation, granulation uncertainty, modeling strategy, and model selection. These results preliminarily establish a framework of data modeling theories and methods based on the granulation mechanism, which is of theoretical significance for complex data modeling and of practical value for improving the efficiency of mass information processing.
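
As a rough illustration of some notions named in the abstract, the following Python sketch works on a tiny nominal data table. The abstract gives no formulas, so everything here is an assumption: the granularity measure is taken as the sum of squared block sizes divided by |U|^2, the knowledge distance as the averaged size of the symmetric difference between the blocks containing each object, and the two multigranulation lower approximations as "at least one granulation" versus "every granulation" variants. The function names and the toy table are hypothetical and are not taken from the dissertation.

```python
def partition(universe, table, attrs):
    """Granulate the universe: objects agreeing on all attrs share a block."""
    blocks = {}
    for x in universe:
        key = tuple(table[x][a] for a in attrs)
        blocks.setdefault(key, set()).add(x)
    return list(blocks.values())


def block_of(blocks, x):
    """Return the block (information granule) that contains object x."""
    return next(b for b in blocks if x in b)


def granularity(universe, blocks):
    """Assumed granularity measure: sum of squared block sizes over |U|^2."""
    return sum(len(b) ** 2 for b in blocks) / len(universe) ** 2


def knowledge_distance(universe, p, q):
    """Assumed distance: mean relative size of the blocks' symmetric difference."""
    n = len(universe)
    return sum(len(block_of(p, x) ^ block_of(q, x)) for x in universe) / n ** 2


def lower_any(universe, partitions, target):
    """Objects whose block lies inside target under at least one granulation."""
    return {x for x in universe
            if any(block_of(p, x) <= target for p in partitions)}


def lower_all(universe, partitions, target):
    """Objects whose block lies inside target under every granulation."""
    return {x for x in universe
            if all(block_of(p, x) <= target for p in partitions)}


# Hypothetical toy data: four objects described by two nominal attributes.
universe = [1, 2, 3, 4]
table = {
    1: {"a": 0, "b": 0},
    2: {"a": 0, "b": 1},
    3: {"a": 1, "b": 1},
    4: {"a": 1, "b": 1},
}
P = partition(universe, table, ["a"])  # blocks {1, 2} and {3, 4}
Q = partition(universe, table, ["b"])  # blocks {1} and {2, 3, 4}
X = {1, 2, 3}                          # target concept to approximate

print(granularity(universe, P), granularity(universe, Q))
print(knowledge_distance(universe, P, Q))
print(lower_any(universe, [P, Q], X), lower_all(universe, [P, Q], X))
```

In this sketch, requiring agreement under every granulation yields a smaller lower approximation than requiring it under only one; which of these variants corresponds to the SCRD or SCED strategy is not stated in the abstract and is left open here.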
Keywords/Search Tags: Complex data, Data modeling, Granular computing, Information granulation, Granular space, Information granularity, Multigranulation, Dynamic granulation, Ordered granulation, Model selection