Research On Several Key Problems Of Knowledge Discovery Based On Rough Set Theory

Posted on: 2017-05-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Hu
GTID: 1318330518999283
Subject: Computer application technology
Abstract/Summary:
With the rapid generation of data that are huge in volume, diverse in type and continuously changing, modern society has entered the era of big data. Such rapidly emerging and changing data contain a great deal of uncertain, vague and inconsistent information. How to perform data mining and knowledge discovery efficiently and effectively on dynamic, uncertain data has become a hot research topic in the information sciences. Granular computing, one of the core techniques in computational intelligence for simulating human thinking to solve complex problems, provides theories, methodologies, techniques and tools for solving complex problems under uncertainty. Rough set theory is an important granular computing model for processing uncertain data. Taking granular computing and rough set theory as the basis and incremental learning as the means, and integrating cluster ensemble analysis techniques, this dissertation studies several key issues of knowledge discovery for dynamic uncertain data. The main research work and innovations are as follows:

(1) For information processing in information systems over two universes with object variation, incremental approaches for updating approximations based on the rough set model over two universes are proposed. First, the different updating patterns of the equivalence classes and of the target concept are investigated when one data object is added or removed. Then, the principles by which the approximations of rough sets over two universes vary when an object is added or deleted are presented. Finally, incremental algorithms for dynamically updating the approximations of an information system over two universes when one object is added or deleted are designed. Experimental evaluations on recommender system datasets, UCI Machine Learning Repository datasets and synthetic datasets verify the effectiveness of the proposed algorithms. (Chapter 3)

(2) For knowledge acquisition in fuzzy information systems over two universes in which multiple objects vary, incremental approaches for updating approximations based on the fuzzy probabilistic rough set model over two universes are proposed. First, according to the different dynamic variations of the approximation space over two universes, the mechanisms for updating the fuzzy probabilistic approximations when multiple objects are added or deleted are investigated. Then, incremental algorithms for dynamically updating the approximations of a fuzzy information system over two universes when several objects are added or deleted are designed. Experimental evaluations on recommender system datasets and synthetic datasets verify the effectiveness of the proposed algorithms. (Chapter 4)

(3) To combine overlapping and uncertain base clustering results in cluster ensemble analysis, a hierarchical cluster ensemble model based on knowledge granulation and rough distance is proposed by combining granular computing theory with rough set methods. First, the cluster ensemble problem is formulated as a search for the partition with minimum rough knowledge granularity, and a novel cluster ensemble objective function within the granular computing framework is proposed. Then, a novel rough distance is introduced to measure the dissimilarity among base partitions, and the notion of knowledge granulation is improved to measure the degree of agglomeration of a given granule. Finally, a hierarchical cluster ensemble algorithm based on knowledge granulation is designed. Experimental evaluations on UCI Machine Learning Repository datasets and Microsoft Research Asia Multimedia image datasets verify the effectiveness of the proposed method. The results also show that the quality of the final solution is only weakly correlated with the diversity among ensemble members. (Chapter 5)

(4) To address the non-ideal assignment of some data points in soft clustering, a novel fuzzy cluster ensemble method based on rough set theory is proposed. First, based on the soft clustering results, the positive, boundary and negative regions of the cluster ensemble are obtained by applying the rough set principle of approximation acquisition, and more accurate category information for the data points in the positive region is then obtained with a novel fuzzy cluster ensemble method. Next, by incorporating a supervised ensemble learning method from machine learning (random forest), the obtained category information is used to train a random forest classifier, which then predicts the classes of the data points in the boundary region. Finally, a similar method is used to extract new classification knowledge from all the cluster information obtained so far, and this knowledge is used to predict the classes of the data points in the negative region, yielding the final result. Experimental evaluations on UCI Machine Learning Repository datasets verify the effectiveness of the proposed method. The results also show that the quality of the final solution is only weakly correlated with the number of cluster members, that the threshold setting for approximation acquisition is effective, and that the algorithm is robust to the diversity of the hard clustering members. (Chapter 6)

The research in this dissertation takes full advantage of granular computing and rough set theory to address the problem of uncertainty.
The research results not only help to improve the efficiency of dynamic knowledge updating for uncertain big data, enhance the ability to understand the uncertain cluster structures hidden in big data, and provide a new theoretical framework and computational methods for data mining and knowledge discovery on dynamic uncertain data in big data environments, but also contribute to realizing value-added data services and improving people's decision-making. They are therefore of significant theoretical and practical value.
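Item (1) relies on lower and upper approximations over two universes. The sketch below is illustrative only and is not the dissertation's algorithm: it assumes one common formulation in which a compatibility relation R ⊆ U × V assigns each x ∈ U the set F(x) = {y ∈ V : (x, y) ∈ R}, the lower approximation of a target concept Y ⊆ V is {x ∈ U : F(x) ⊆ Y}, and the upper approximation is {x ∈ U : F(x) ∩ Y ≠ ∅}. Under this assumed definition, adding or deleting one object of U cannot change the membership of any other object, so an update only has to examine that single object. The function names are hypothetical.

```python
from typing import Dict, Hashable, Set, Tuple

# Assumed formulation: F maps each x in U to F(x) = {y in V : (x, y) in R}.
Relation = Dict[Hashable, Set[Hashable]]


def approximations(F: Relation, Y: Set[Hashable]) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Batch computation of the lower and upper approximations of Y (a subset of V) in U."""
    lower = {x for x, fx in F.items() if fx <= Y}  # F(x) is contained in Y
    upper = {x for x, fx in F.items() if fx & Y}   # F(x) intersects Y
    return lower, upper


def add_object(F: Relation, Y: Set[Hashable], lower: Set[Hashable],
               upper: Set[Hashable], x_new: Hashable, fx_new: Set[Hashable]) -> None:
    """Incrementally add a new object x_new to U; only x_new needs to be tested."""
    F[x_new] = set(fx_new)
    if F[x_new] <= Y:
        lower.add(x_new)
    if F[x_new] & Y:
        upper.add(x_new)


def delete_object(F: Relation, lower: Set[Hashable], upper: Set[Hashable],
                  x_old: Hashable) -> None:
    """Incrementally delete x_old from U by dropping it from both approximations."""
    F.pop(x_old, None)
    lower.discard(x_old)
    upper.discard(x_old)


# Usage: users form U, items form V, and Y is a hypothetical target set of items.
F = {"u1": {"i1"}, "u2": {"i1", "i3"}, "u3": {"i4"}}
Y = {"i1", "i2"}
lower, upper = approximations(F, Y)            # lower == {"u1"}, upper == {"u1", "u2"}
add_object(F, Y, lower, upper, "u4", {"i2"})
delete_object(F, lower, upper, "u1")
assert (lower, upper) == approximations(F, Y)  # incremental result matches batch recomputation
print(sorted(lower), sorted(upper))            # ['u4'] ['u2', 'u4']
```

Changes to V or to the target concept Y can alter F(x) or the test for many objects at once, so they are not covered by this simple sketch.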
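Item (2) extends the setting to fuzzy relations and probabilistic thresholds, with several objects changing at once. As a hedged sketch, assume a fuzzy relation R: U × V → [0, 1], a fuzzy target concept Y on V, the conditional probability P(Y | x) = Σ_y min(R(x, y), Y(y)) / Σ_y R(x, y), and thresholds 0 ≤ β < α ≤ 1 giving the lower approximation {x : P(Y | x) ≥ α} and the upper approximation {x : P(Y | x) > β}. These definitions and the class `FuzzyProbApprox` are assumptions for illustration, not necessarily the model used in Chapter 4. Caching the numerator and denominator for each x means that adding or deleting objects of V only adjusts those cached sums instead of recomputing them over all of V.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

Row = Dict[Hashable, float]  # y -> R(x, y), a membership degree in [0, 1]


class FuzzyProbApprox:
    """Hypothetical sketch: fuzzy probabilistic approximations over U x V with cached sums.

    Assumed model: P(Y | x) = sum_y min(R(x, y), Y(y)) / sum_y R(x, y),
    lower = {x : P >= alpha}, upper = {x : P > beta}, with 0 <= beta < alpha <= 1.
    """

    def __init__(self, alpha: float, beta: float, Y: Row):
        if not 0.0 <= beta < alpha <= 1.0:
            raise ValueError("thresholds must satisfy 0 <= beta < alpha <= 1")
        self.alpha, self.beta = alpha, beta
        self.Y = dict(Y)                      # fuzzy target concept on V
        self.num: Dict[Hashable, float] = {}  # cached sum_y min(R(x, y), Y(y))
        self.den: Dict[Hashable, float] = {}  # cached sum_y R(x, y)

    def add_u_objects(self, rows: Dict[Hashable, Row]) -> None:
        """Add new objects to U; rows[x] gives R(x, y) for the y in V related to x."""
        for x, row in rows.items():
            self.num[x] = sum(min(r, self.Y[y]) for y, r in row.items())
            self.den[x] = sum(row.values())

    def delete_u_objects(self, xs: Iterable[Hashable]) -> None:
        """Delete objects from U; the cached sums of the other objects are untouched."""
        for x in xs:
            self.num.pop(x, None)
            self.den.pop(x, None)

    def add_v_objects(self, Y_new: Row, cols: Dict[Hashable, Row]) -> None:
        """Add new objects to V; cols[x][y] = R(x, y) for existing x and new y."""
        self.Y.update(Y_new)
        for x, col in cols.items():
            for y, r in col.items():
                self.num[x] += min(r, self.Y[y])
                self.den[x] += r

    def delete_v_objects(self, cols: Dict[Hashable, Row]) -> None:
        """Delete objects from V; cols must list every nonzero R(x, y) of the removed y."""
        removed = set()
        for x, col in cols.items():
            for y, r in col.items():
                self.num[x] -= min(r, self.Y[y])
                self.den[x] -= r
                removed.add(y)
        for y in removed:
            self.Y.pop(y, None)

    def prob(self, x: Hashable) -> float:
        # Guard against an empty (or numerically vanished) row of R.
        return self.num[x] / self.den[x] if self.den[x] > 1e-12 else 0.0

    def approximations(self) -> Tuple[Set[Hashable], Set[Hashable]]:
        lower = {x for x in self.den if self.prob(x) >= self.alpha}
        upper = {x for x in self.den if self.prob(x) > self.beta}
        return lower, upper


m = FuzzyProbApprox(alpha=0.6, beta=0.3, Y={"y1": 1.0, "y2": 0.0})
m.add_u_objects({"x1": {"y1": 1.0}, "x2": {"y1": 0.5, "y2": 0.5}, "x3": {"y2": 1.0}})
before = m.approximations()  # ({"x1"}, {"x1", "x2"})
m.add_v_objects({"y3": 1.0}, {"x2": {"y3": 1.0}, "x3": {"y3": 0.5}})
after = m.approximations()   # ({"x1", "x2"}, {"x1", "x2", "x3"})
```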
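Item (3) treats the cluster ensemble as a search for a partition with minimum rough knowledge granularity. The dissertation's improved knowledge granulation and its rough distance are not specified in this abstract, so the sketch below only illustrates standard ingredients under stated assumptions: each base clustering is treated as one attribute, objects with identical labels in every base clustering form one granule, and the knowledge granulation of a partition {X_1, ..., X_m} of U is taken as GK = (1/|U|²) Σ_i |X_i|², a commonly used definition that ranges from 1/|U| (all singletons) to 1 (a single block). The function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Sequence, Tuple


def indiscernibility_partition(labelings: Sequence[Sequence[int]]) -> List[List[int]]:
    """Group objects whose labels agree in every base clustering.

    labelings[k][i] is the label of object i in base clustering k; treating each
    base clustering as an attribute is an assumption of this sketch.
    """
    blocks: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for i in range(len(labelings[0])):
        blocks[tuple(lab[i] for lab in labelings)].append(i)
    return list(blocks.values())


def knowledge_granulation(partition: Sequence[Sequence[int]]) -> Fraction:
    """GK = (1/|U|^2) * sum_i |X_i|^2, computed exactly as a fraction."""
    n = sum(len(block) for block in partition)
    return Fraction(sum(len(block) ** 2 for block in partition), n * n)


base = [
    [0, 0, 1, 1, 2, 2],  # base clustering 1
    [0, 0, 0, 1, 1, 1],  # base clustering 2
    [1, 1, 0, 0, 0, 0],  # base clustering 3
]
granules = indiscernibility_partition(base)             # [[0, 1], [2], [3], [4, 5]]
print(knowledge_granulation(granules))                  # 5/18
print(knowledge_granulation([[0, 1], [2, 3], [4, 5]]))  # 1/3 (base clustering 1 alone)
```

Intersecting the base clusterings yields a finer partition, and hence a smaller GK, than any single base clustering; a hierarchical ensemble would then merge granules step by step, which the dissertation does with its own improved granulation and rough distance (not reproduced here).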
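Item (4) begins by splitting objects into positive, boundary and negative regions from soft clustering results. The exact rule is not given in the abstract, so the following is a hypothetical sketch: it assumes the membership matrices of the ensemble members have already been aligned (cluster j denotes the same cluster in every member), averages them, and applies thresholds β < α to each object's largest averaged membership. The random forest steps for the boundary and negative regions are not shown.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]  # rows = objects, columns = clusters


def average_memberships(members: Sequence[Matrix]) -> List[List[float]]:
    """Average aligned soft partitions; members[k][i][j] = membership of object i in cluster j."""
    n_members = len(members)
    n_objects, n_clusters = len(members[0]), len(members[0][0])
    return [[sum(m[i][j] for m in members) / n_members for j in range(n_clusters)]
            for i in range(n_objects)]


def three_regions(memberships: Matrix, alpha: float, beta: float
                  ) -> Tuple[List[Tuple[int, int]], List[int], List[int]]:
    """Split objects by their largest membership (hypothetical rule, requires beta < alpha).

    positive: max >= alpha (kept with its cluster index);
    boundary: beta < max < alpha; negative: max <= beta.
    """
    positive, boundary, negative = [], [], []
    for i, row in enumerate(memberships):
        j, best = max(enumerate(row), key=lambda t: t[1])  # first cluster with the top membership
        if best >= alpha:
            positive.append((i, j))
        elif best > beta:
            boundary.append(i)
        else:
            negative.append(i)
    return positive, boundary, negative


members = [
    [[0.9, 0.1], [0.6, 0.4], [0.5, 0.5]],  # soft clustering member 1
    [[0.8, 0.2], [0.5, 0.5], [0.5, 0.5]],  # soft clustering member 2
]
avg = average_memberships(members)
pos, bnd, neg = three_regions(avg, alpha=0.7, beta=0.5)
print(pos, bnd, neg)  # [(0, 0)] [1] [2]
```

In the pipeline described above, the labelled positive-region objects would then serve as training data for the classifier that labels the boundary region.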
Keywords/Search Tags: Granular Computing, Rough Set, Knowledge Discovery, Information System, Approximations, Incremental Updating, Cluster Ensemble