
Research On Information Granulation Algorithm For High-Dimensional Mixed And Class Overlapping Data

Posted on: 2024-07-08
Degree: Master
Type: Thesis
Country: China
Candidate: Y D Zhang
GTID: 2568307136989649
Subject: Control Science and Engineering
Abstract/Summary:
As a new paradigm in computational intelligence for simulating human thinking and problem solving, granular computing has become an effective tool for massive data mining and intelligent information processing. Information granulation, a basic problem of granular computing, is a constructive process in the problem-solving space, and its granulation criteria directly determine the effect of granular computing. The two-stage granulation framework proposed by Pedrycz is still one of the most commonly used information granulation methods. In this framework, the first phase preprocesses the raw data with a clustering algorithm and produces a basic prototype of the granular structure. The second phase then captures the core structure of the data using the principle of justifiable granularity and constructs representative and specific information granules. However, when dealing with high-dimensional mixed and class-overlapping data, traditional granulation methods still rely on basic fuzzy clustering algorithms and granulation functions, which are susceptible to interference from redundant features and yield information granules with poor semantics.

This thesis builds on Pedrycz's information granulation method and follows this main line: Improved Neighborhood Space based Feature Selection Algorithm for High-Dimensional Mixed Data → Rough Fuzzy K-Means Clustering based on Decision-Theoretic Shadowed Set → Information Granulation Algorithm Based on Local Boundary Fuzzified Metric. It explores a new information granulation framework for high-dimensional mixed and class-overlapping data in order to obtain information granules with better overall performance. The main research contents are as follows:

(1) Improved Neighborhood Space based Feature Selection Algorithm for High-Dimensional Mixed Data
As an important data preprocessing technique applied before information granulation, feature selection can effectively deal with the "curse of dimensionality" caused by high-dimensional data, thereby improving the results of both unsupervised clustering and supervised granulation. Nonetheless, how to perform feature selection on high-dimensional mixed data remains one of the focuses and difficulties of current research. In this thesis, an improved construction method for the neighborhood space is proposed on the basis of the neighborhood rough set. Taking into account boundary-overlapping data and the size of the neighborhood space, an evaluation function is designed to characterize the discrimination ability of the neighborhood space. On this basis, a heuristic feature selection algorithm for high-dimensional mixed data is proposed. The validity and superiority of the proposed algorithm are verified on UCI standard datasets.

(2) Rough Fuzzy K-Means Clustering based on Decision-Theoretic Shadowed Set
Unsupervised clustering is the foundation on which information granules are formed, so the clustering results directly affect the justifiability and specificity of the final information granules. Because it introduces the upper and lower approximations of rough set theory, Rough Fuzzy K-Means Clustering can effectively characterize the uncertainty of class-overlapping data and is therefore well suited to information granulation algorithms. However, the definition of class-overlapping regions in this algorithm depends only on a fixed distance threshold ζ. When the data overlap heavily, the partition into approximation regions varies widely and is hard to interpret. This thesis proposes an improved rough fuzzy K-Means clustering based on the decision-theoretic shadowed set, in which the three-way approximation is implemented by incorporating a novel fuzzy entropy for class-overlapping regions. This yields reasonable and interpretable approximation regions and thereby improves the clustering accuracy and stability of the algorithm.

(3) Information Granulation Algorithm Based on Local Boundary Fuzzified Metric
The information granulation framework based on Fuzzy C-Means under the principle of justifiable granularity is currently the most commonly used granulation method. However, this design process tends to produce information granules that intersect one another at cluster boundaries, which degrades their semantic expression. In this thesis, an improved granulation principle based on local boundary fuzzified enhancement is introduced into the information granulation framework based on Rough Fuzzy K-Means Clustering, strengthening the supporting and inhibiting effects of class-overlapping data. Comparative experiments on synthetic datasets and multiple UCI standard datasets show that the information granules generated by the proposed algorithm are more compact and representative, with clearer boundaries, when the data exhibit severe class overlap.
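
To make the second phase of the two-stage framework more concrete, the following is a minimal sketch of the generic principle of justifiable granularity on one-dimensional data. It is not the local boundary fuzzified metric proposed in the thesis, whose details the abstract does not give. The function names, the choice of the median as the numeric representative, and the linear specificity term are all assumptions made for illustration.

```python
from statistics import median


def _best_bound(values, med, side):
    """Pick one bound of the interval (side=+1 for upper, -1 for lower).

    Candidate bounds are the data points on that side of the median.
    Each candidate is scored by coverage * specificity (both assumed forms):
      coverage    = share of all points lying between the median and the bound
      specificity = 1 - (distance of bound from median) / (largest such distance)
    """
    pts = [x for x in values if (x - med) * side >= 0]
    span = max(abs(x - med) for x in pts)
    if span == 0:
        # All points on this side coincide with the median.
        return med
    best, best_score = med, -1.0
    for b in pts:
        covered = sum(1 for x in pts if abs(x - med) <= abs(b - med))
        coverage = covered / len(values)
        specificity = 1.0 - abs(b - med) / span
        score = coverage * specificity
        if score > best_score:
            best, best_score = b, score
    return best


def justifiable_interval(values):
    """Return (lower, upper) of an interval granule built around the median."""
    med = median(values)
    return _best_bound(values, med, -1), _best_bound(values, med, +1)
```

For example, `justifiable_interval([1.0, 4.0, 4.5, 5.0, 5.5, 6.0, 20.0])` returns `(4.0, 6.0)`: the interval covers the dense core of the data and leaves out the two distant points, which is the trade-off between coverage and specificity that the principle is meant to capture.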
Keywords/Search Tags: Information granulation, High-dimensional mixed data, Class overlapping data, Feature selection, Rough clustering, Granulation function