Research On Fuzzy Clustering Algorithms Based On Shadowed Sets And Rough Sets And Their Applications

Posted on: 2017-04-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L N Wang
Full Text: PDF
GTID: 1318330536468175
Subject: Computer application technology
Abstract/Summary:
Because real-world data environments are complex, data analysis increasingly relies on integrated methods to accomplish tasks that no single method can solve. Integrating several theories to construct a data mining model suited to practical data analysis has therefore attracted significant interest. In data mining, fuzzy clustering algorithms are widely used in both research and applications. Current fuzzy clustering algorithms nevertheless have disadvantages: they are sensitive to noisy data, and they are effective mainly for partitioning spherical clusters of similar size. Recently, with progress in the theories of shadowed sets, rough sets and fuzzy sets, some scholars have applied rough sets and shadowed sets to fuzzy clustering in order to detect noisy data and outliers.

By combining the theories of shadowed sets and rough sets, this thesis systematically improves traditional fuzzy clustering algorithms. The emphasis is on methods for optimizing the fuzzy clustering objective function and on improved fuzzy clustering algorithms applicable to multiple data types and to randomly distributed datasets. In addition, a novel fuzzy clustering validity function is developed for data mining. Experimental results illustrate the effectiveness and improved performance of the proposed methods, and several of the algorithms are demonstrated in the prediction of airport noise time series and in related fields. The main contributions and innovations of this thesis are as follows:

(1) An improved validity index for evaluating fuzzy clustering algorithms is proposed, building on previously reported indices. Defined as the ratio between a separation measure and a compactness measure, the index captures not only data membership degrees and data structure but also the characteristics of the data distribution. Experimental results show that the new index is reliable and effective on overlapping datasets.

(2) Partition-based clustering with weighted features is developed in the framework of shadowed sets. Using the optimization theory of shadowed sets, core and exclusion regions are generated from the fuzzy memberships, which facilitates the discovery of noisy data. Because feature vectors contribute differently to pattern classification, a weighting method that integrates shadowed sets with fuzzy clustering is introduced. The proposed algorithm partitions overlapping clusters effectively and is more robust in the presence of outliers.

(3) A novel feature-weighted fuzzy clustering algorithm is developed in the framework of granular computing, incorporating fuzzy sets, rough sets and shadowed sets. By associating features with weights and combining these theories, the algorithm handles overlap among clusters more effectively and is more robust to noisy data and outliers.

(4) A new, more flexible fuzzy clustering approach for data with categorical attributes is proposed. For data comprising both numeric and categorical attributes, fuzzy clustering is introduced that, under a hypothesis about the probability distribution within clusters, employs a fully probabilistic dissimilarity function and integrates shadowed sets and rough sets to detect noise and abnormal data points effectively. The resulting characterization leads to an efficient description of the information granules obtained through clustering, including their overlap regions, outliers and boundary regions. For data with categorical attributes, the objective function of fuzzy k-modes is modified by adding between-cluster information, which simultaneously minimizes the within-cluster dispersion and enhances the between-cluster separation. In addition, to reduce the misclassification caused by hard centroids, a new fuzzy centroids clustering algorithm with between-cluster information for categorical data is presented.

(5) Fuzzy clustering algorithms integrating shadowed sets and rough sets are applied in several fields. First, a prediction model for airport noise time series is constructed from the shadowed rough-fuzzy clustering algorithm followed by support vector regression. Second, given the current attention to network intrusion detection, sampled data from the KDD CUP 1999 dataset are analyzed; to detect intrusion data effectively, a new "two-step" fuzzy clustering method integrating shadowed sets and rough sets is proposed to improve detection performance. Lastly, in view of the practical significance of local outlier detection, a novel algorithm is proposed that integrates rough sets and shadowed sets into feature-weighted fuzzy clustering to reduce the computational effort of detecting local outliers.
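
As a rough illustration of how core and exclusion regions can be derived from fuzzy memberships, as described in contribution (2), the sketch below uses the commonly cited shadowed-set threshold criterion attributed to Pedrycz. This is an assumption, since the abstract does not state the exact objective the thesis optimizes. The function names `shadowed_threshold` and `partition`, the grid search, and the sample memberships are hypothetical choices for illustration, not the thesis's implementation.

```python
def shadowed_threshold(memberships, steps=500):
    """Grid-search a threshold alpha in (0, 0.5) for one cluster.

    Assumed criterion (standard shadowed-set balance, not taken from the thesis):
    minimize | sum of memberships reduced to 0
             + sum of (1 - membership) elevated to 1
             - number of points left in the shadow |.
    """
    best_alpha, best_cost = None, float("inf")
    for i in range(1, steps):
        alpha = 0.5 * i / steps
        # Membership mass lost when low memberships are reduced to 0.
        reduced = sum(u for u in memberships if u <= alpha)
        # Membership mass gained when high memberships are elevated to 1.
        elevated = sum(1.0 - u for u in memberships if u >= 1.0 - alpha)
        # Number of points whose membership stays uncertain (the shadow).
        shadow = sum(1 for u in memberships if alpha < u < 1.0 - alpha)
        cost = abs(reduced + elevated - shadow)
        if cost < best_cost:
            best_alpha, best_cost = alpha, cost
    return best_alpha


def partition(memberships, alpha):
    """Split point indices into core, shadow and exclusion regions."""
    core = [k for k, u in enumerate(memberships) if u >= 1.0 - alpha]
    exclusion = [k for k, u in enumerate(memberships) if u <= alpha]
    shadow = [k for k, u in enumerate(memberships) if alpha < u < 1.0 - alpha]
    return core, shadow, exclusion


# Hypothetical memberships of five points in one cluster.
memberships = [0.95, 0.9, 0.5, 0.1, 0.05]
alpha = shadowed_threshold(memberships)
core, shadow, exclusion = partition(memberships, alpha)
```

Under this criterion, points in the exclusion region of every cluster are natural candidates for noise or outliers, while points in a shadow lie in the overlap between clusters.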
Keywords/Search Tags: shadowed sets, rough sets, fuzzy clustering algorithm, feature weighting, clustering validity index, prediction of airport noise time series, network intrusion detection, local outlier detection