
The Research Of Possibilistic And Fuzzy Co-Clustering Approach With Multisource Features

Posted on: 2020-08-15
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Ren
GTID: 2428330602952277
Subject: Applied Mathematics
Abstract/Summary:
As data volumes grow and data processing systems continue to improve, the demand for powerful and effective data processing and data mining algorithms increases. Clustering algorithms are widely used as basic tools of data mining. In practical clustering problems, two factors strongly affect the results: improper extraction of the feature set and interference from outliers. Feature set extraction faces two problems. First, the features extracted from the samples may carry little information useful for clustering, i.e. the features are weak. Second, feature vectors are usually high-dimensional and drawn from multiple sources, so the clusters in feature space have complex structures. Outliers are a difficulty for many existing clustering algorithms: even a small number of outliers can cause erroneous or inaccurate partitions. This thesis studies these two problems for both precise data and imprecise data.

First, for precise data sets, this thesis proposes a multitask possibilistic and fuzzy co-clustering algorithm (MPFC). The algorithm first accounts for structural differences in the feature space and allocates the feature sources by measuring the contribution of each feature source to each cluster. It then uses information sharing between tasks to mine the information carried by the data features from different aspects and to make better use of the effective information. Finally, to reduce the interference of outliers and increase robustness, the algorithm uses the properties of typicality to identify outliers and weakens their influence during clustering. At the same time, to prevent centroids from coinciding under the influence of typicality, this thesis proposes a new parameter selection index, which uses the properties of typicality to correct the parameters and controls the movement of the centroids through the corrected parameters. To test the performance of MPFC, experiments are conducted on several data sets against corresponding clustering algorithms. The results show that MPFC not only improves clustering accuracy but also greatly reduces the interference of outliers on the clustering results.

Second, in many practical problems the measurement results are imprecise real numbers or vectors. Such data is called imprecise data, and it is very common, for example in gas content surveys. Most current clustering algorithms are designed for precise data sets. To improve clustering performance on imprecise data, this thesis extends the improved MFC algorithm and the newly proposed MPFC algorithm to imprecise data sets. Since imprecision is usually handled with fuzzy sets, the thesis represents imprecise data as LR-type fuzzy sets and gives a corresponding distance measure. To improve the clustering accuracy of the MFC algorithm, the improved MFC algorithm initializes the membership degrees randomly. To keep the random initialization from drawing the centroids close together, the new parameter selection index adds a repulsive force between the cluster centers that constrains their movement. Experiments show that the MFC-F and MPFC-F algorithms achieve better results than the other algorithms compared for processing fuzzy data.

Finally, the research content and results are summarized, laying a foundation for subsequent research on high-performance clustering algorithms.
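The role of typicality in weakening outliers can be illustrated with a minimal sketch. It does not reproduce MPFC, whose update equations are not given in this abstract. It assumes the typicality formula of standard possibilistic c-means, t = 1 / (1 + (d^2 / eta)^(1/(m-1))), for a single cluster; the function names, the scale parameter eta, the fuzzifier m, and the sample points are all hypothetical choices made for illustration.

```python
def squared_distance(x, c):
    """Squared Euclidean distance between a point and a cluster center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def typicality(x, center, eta, m=2.0):
    """Standard possibilistic c-means typicality (assumed here, not MPFC's own).

    Equals 1 at the center and decays toward 0 as the point moves away,
    independently of any other cluster.
    """
    d2 = squared_distance(x, center)
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))


def weighted_centroid(points, center, eta, m=2.0):
    """One centroid update in which each point is weighted by typicality**m."""
    weights = [typicality(p, center, eta, m) ** m for p in points]
    total = sum(weights)
    dim = len(center)
    return [
        sum(w * p[j] for w, p in zip(weights, points)) / total
        for j in range(dim)
    ]


# Four points around (0.5, 0.5) plus one far outlier (hypothetical data).
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (10.0, 10.0)]
center = [0.5, 0.5]

plain_mean = [sum(p[j] for p in points) / len(points) for j in range(2)]
new_center = weighted_centroid(points, center, eta=1.0)
outlier_typicality = typicality(points[-1], center, eta=1.0)
```

In this example the unweighted mean is pulled toward the outlier, while the typicality-weighted centroid stays close to the four inliers because the outlier receives a weight near zero. Since typicality does not couple the clusters to one another, several centroids can drift to the same location, which is the coincidence problem the parameter selection index described above is meant to prevent.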
Keywords/Search Tags:Multitask clustering, Possibilistic clustering, Fuzzy co-clustering, Robustness, Clustering accuracy