Classification Knowledge Discovery Algorithms Based On Granular Computing And Its Applications

Posted on:2011-04-26

Degree:Doctor

Type:Dissertation

Country:China

Candidate:J H Luo

Full Text:PDF

GTID:1118330338473438

Subject:Chemical Process Information Engineering

Abstract/Summary:


As the world enters the era of the knowledge economy, the production and application of knowledge have become among its most important factors. Knowledge discovery, as the core of intelligent information processing technology, plays an increasingly important role in knowledge production. Classification is one of the most important tasks in knowledge discovery, involving data preprocessing, data mining, model evaluation and knowledge representation. In chemistry and chemical engineering, classification is widely used and very important, so studying classification methods can not only promote the development of data mining techniques but also greatly expand the applications of knowledge discovery.

At present, research on and technologies for classification knowledge discovery have made significant progress, and a variety of data mining methods are in use, but many prominent problems remain to be studied. In chemistry and chemical engineering in particular, the collected data are usually multi-factor, non-linear, noisy and imbalanced, so conventional data analysis and processing methods are not only time-consuming but also struggle to mine the hidden information effectively. Improving and developing the relevant methods of classification knowledge discovery can therefore promote the development of chemistry and chemical engineering, and is also of significant economic value.

Granular computing is a new concept and computing paradigm for information processing, covering all theories, methods, techniques and tools related to granularity.
When solving a complex problem, the basic idea of granular computing is to simulate characteristics of human intelligence: select an appropriate granularity and thereby reduce the complexity of the problem, which helps in finding a better solution. Granular computing thus provides a new approach to knowledge discovery research. However, current research on granular computing focuses mainly on theory, and its applications are rarely studied or reported, especially in chemistry and chemical engineering. In this dissertation, four basic principles for applying granular computing to knowledge discovery are summarized; these principles are then adapted to address several challenges in classification knowledge discovery; and finally solution strategies and methods are proposed for related problems in chemistry and chemical engineering. The major work and achievements of this dissertation can be summarized as follows:

1. Granulation and clustering is a way of summarizing classification knowledge, and the resulting clusters can reveal the hidden knowledge in the data. Clustering analysis, one of the important basic methods in soft science research, is an effective means to this end. The Adaptive Resonance Theory 2 (ART2) network has many advantages for clustering, but it also has disadvantages, such as insensitivity to gradual changes in the input patterns and limited noise resistance. Therefore, an improved ART2 with an Enhanced Triplex Matching mechanism (ETM-ART2) is proposed to improve the clustering capability of ART2 networks. Experiments on clustering the olive oil data sets show that ETM-ART2 has better clustering performance and is particularly suited to massive-data clustering problems.
ETM-ART2 can also be used to construct information granules for classification, which helps knowledge discovery and improves classification performance.

2. Constructing information granules is one of the basic steps in granular computing. Based on the principles of granularity knowledge discovery and granularity approximate solution summarized in this dissertation, a method of constructing information granules with an ART network is proposed so that the research data can be analyzed conveniently and rapidly. A classification knowledge discovery solution based on information granules is then proposed, following the principle that a problem can be simplified by Granular Computing (GrC). Two algorithms are developed: one is the Information Granulation based Fuzzy Classification Knowledge Discovery Method (IG-FCKDM), and the other is Key Feature Analysis based on Granulation (KFAG) for mining classification rules with C4.5, named KFAG-C4.5. IG-FCKDM targets imbalanced two-class problems and error-sensitive issues. It constructs the information granules with Fuzzy ART and extracts classification rules by fuzzy processing. Experiments with IG-FCKDM on a disease diagnosis problem show that it performs better on this kind of problem and that its prediction accuracy and credibility are of greater significance. KFAG-C4.5, which can be used for general and multi-class imbalanced classification problems, uses ETM-ART2 to construct good information granules, then analyzes the key features based on the granulation, and finally divides the data's attributes into a reasonable set of distinguishable sub-attributes, so that the number of sub-attributes does not become large. The information granules can then be represented by sub-attributes with discrete values of 0 or 1, so that classification rules can be mined with C4.5. Experiments on a two-class glass problem and an imbalanced multi-class problem show the good classification capability of KFAG-C4.5.
The knowledge mined by IG-FCKDM and KFAG-C4.5 takes different forms, but in both cases it is concise, comprehensible, easy for various users to analyze, and effective for solving imbalanced data classification problems.

3. Ensemble learning can usually improve on the performance of a single classifier, and selective ensemble learning has attracted deeper study. At present, however, selective ensemble algorithms based on stochastic optimization mainly take the generalization error as the objective and almost ignore the diversity of the individual classifiers, especially how that diversity is measured. Although good results can be achieved, the computation is more complex and the efficiency is low. To solve the problem of measuring the diversity of individual classifiers, the selective ensemble learning problem is transformed into a simple correlation space based on the GrC problem equivalence principle, and a simple, efficient selective ensemble mechanism is proposed. The resulting Correctness and Diversity based Selective Ensemble (CDSE) algorithm integrates the accuracy and the diversity of individual classifiers, based on knowledge granules. Experiments on toxicity classification show that CDSE has better classification performance than other ensemble algorithms such as Bagging and AdaBoost.M1, and than the single classifier C4.5. By selecting appropriate individual classifiers, CDSE provides an efficient way to improve the generalization performance and efficiency of an ensemble classifier.

4. Considering both the construction of ensemble classifiers and the determination of predictions, a new Correctness and Diversity based Adaptive Selective Ensemble (CDASE) learning algorithm is put forward. It extends CDSE into an adaptive ensemble learning algorithm in which the generalization performance of the ensemble classifiers is improved.
An appropriate ensemble classifier is adaptively generated for each category, and together these form a group of ensemble classifiers called the AE-Group. The ensemble classifiers share the same storage space, so the AE-Group requires fewer computational resources and less storage. The classification of test data is likewise decided adaptively by selecting an appropriate ensemble classifier from the AE-Group. Experiments on a multi-class problem show that CDASE achieves better ensemble learning results with fewer individual classifiers. Compared with other algorithms, CDASE has good generalization performance and is more efficient and stable. CDASE overcomes the narrow applicability of single-ensemble learning algorithms and provides a novel method for further improving the generalization capability of ensemble learning.
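
The selection idea behind items 3 and 4, keeping individual classifiers that are both accurate and different from one another, can be illustrated with a small sketch. This is not the CDSE algorithm itself, whose details the abstract does not give. It is a generic greedy selection, and the function names, the pairwise disagreement measure and the `weight` parameter are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(preds, labels):
    """Fraction of validation samples a classifier predicts correctly."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def disagreement(preds_a, preds_b):
    """Fraction of samples on which two classifiers give different outputs."""
    return sum(a != b for a, b in zip(preds_a, preds_b)) / len(preds_a)


def select_ensemble(all_preds, labels, k, weight=0.5):
    """Greedily pick up to k classifiers (hypothetical scoring rule).

    all_preds[i] holds classifier i's predictions on a validation set.
    Each step adds the classifier that maximizes
    weight * accuracy + (1 - weight) * mean disagreement with those
    already selected. Returns the indices of the chosen classifiers.
    """
    acc = [accuracy(p, labels) for p in all_preds]
    remaining = list(range(len(all_preds)))
    # Seed the ensemble with the most accurate classifier.
    first = max(remaining, key=lambda i: acc[i])
    selected = [first]
    remaining.remove(first)

    while remaining and len(selected) < k:
        def score(i):
            diversity = sum(
                disagreement(all_preds[i], all_preds[j]) for j in selected
            ) / len(selected)
            return weight * acc[i] + (1 - weight) * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def majority_vote(all_preds, selected):
    """Combine the selected classifiers by plain majority voting."""
    n_samples = len(all_preds[0])
    return [
        Counter(all_preds[i][t] for i in selected).most_common(1)[0][0]
        for t in range(n_samples)
    ]


if __name__ == "__main__":
    # Toy validation data: classifier 1 duplicates classifier 0.
    labels = [1, 0, 1, 1, 0, 0, 1, 0]
    all_preds = [
        [1, 0, 1, 1, 0, 0, 1, 1],
        [1, 0, 1, 1, 0, 0, 1, 1],
        [0, 0, 1, 1, 0, 0, 1, 0],
        [1, 0, 0, 1, 0, 0, 1, 0],
    ]
    chosen = select_ensemble(all_preds, labels, k=3)
    print(sorted(chosen), majority_vote(all_preds, chosen))
```

In this toy data all four classifiers are equally accurate, so the diversity term alone decides the outcome: the duplicate classifier adds no new information and is left out, while the three retained classifiers err on different samples and their majority vote recovers every label.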

Keywords/Search Tags:

granular computing, classification, knowledge discovery, information granule, imbalanced data, adaptive resonance theory network, key feature analysis, knowledge granule, diversity, selective ensemble, adaptive ensemble learning


Related items

1	Research On Several Key Problems Of Knowledge Discovery Based On Rough Set Theory
2	Research Of Granular Computing And Extension Of Variable Precision Rough Set Theory Based On Pansystems Theory
3	Two-class Imbalanced Big Data Classification Based On Data Reduction And Ensemble Learning
4	Research On Structural Diversity Of Ensemble Learning
5	Research And Application Of Imbalanced Data Classification Algorithm Based On Ensemble Learning
6	Study On Data Mining Model Based On Theory Of Granular Computing
7	Research On Imbalanced Data Classification Based On Sampling Method And Ensemble Learning
8	Study On Approaches To Ontology Learning Oriented Granular Computing
9	Research On Imbalanced Dataset Classification Based On Ensemble Learning
10	Research On Theory And Application Of The Full Covering GrC Model