Research On Soft Subspace Incremental Clustering For Incomplete Data

Posted on: 2021-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: S W Lan
GTID: 2518306350983589
Subject: Mathematics
Abstract/Summary:
In recent years, rapid advances in science and technology have made collected data both large in scale and highly complex. Traditional clustering algorithms perform much worse on such data, and their results often differ greatly from what is expected. Clustering methods for large-scale data have therefore become particularly important and have prompted a wave of research in data mining. Because subspace clustering has achieved relatively satisfactory results on large-scale data, it has gradually become a focus of attention. A soft subspace clustering algorithm determines the subspace in which each cluster lies through degrees of membership. Although soft subspace clustering has been improved continuously, it still falls short in handling incomplete data sets and dynamic data.

For incomplete data, this thesis uses the q-nearest-neighbor method: the minimum and maximum of the corresponding attribute values among the nearest data objects serve as the left and right endpoints of an interval for each missing attribute value, and the data objects are then constructed with the q-nearest-neighbor method. Combined with the optimal completion strategy, an entropy weighted soft subspace clustering algorithm based on interval neighborhoods (INEWSC) is proposed. INEWSC was compared experimentally with fuzzy C-means clustering under the complete-data strategy, the partial distance strategy, the optimal completion strategy, and the nearest-neighbor strategy. The experimental results show that the proposed INEWSC method not only estimates the missing attribute values accurately but also greatly improves clustering accuracy.

For dynamic data, building on entropy weighted soft subspace clustering, this thesis proposes an enhanced entropy weighted possibilistic soft subspace incremental clustering method (EEWPSSIC) by combining the enhanced possibilistic fuzzy C-means algorithm with an incremental learning strategy. The method processes dynamic data on top of the original clustering structure and uses the original clustering information to update the clustering after new data is added.

For large-scale data that is both incomplete and dynamically updated, the interval-neighborhood entropy weighted soft subspace clustering algorithm and the enhanced entropy weighted soft subspace incremental clustering algorithm are combined into an interval neighborhood-based enhanced entropy weighted possibilistic soft subspace incremental clustering algorithm (INEEWPSSIC). This algorithm estimates missing attribute values accurately while using the original clustering information to update the clustering after data is added. INEEWPSSIC was compared experimentally with fuzzy C-means clustering, possibilistic C-means clustering, fuzzy weighted soft subspace clustering, and entropy weighted soft subspace clustering. The experimental results show the effectiveness and advantages of the INEEWPSSIC algorithm.
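The interval construction for missing values described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say how neighbors are ranked, so the sketch assumes a partial distance over the attributes that both rows have observed. The function names `partial_distance` and `neighbor_intervals` and the sample data are hypothetical.

```python
import math

def partial_distance(a, b):
    """Squared Euclidean distance over attributes observed in both rows,
    rescaled by the fraction of attributes used (assumed distance)."""
    shared = [(x, y) for x, y in zip(a, b)
              if not (math.isnan(x) or math.isnan(y))]
    if not shared:
        return math.inf
    total = sum((x - y) ** 2 for x, y in shared)
    return total * len(a) / len(shared)

def neighbor_intervals(data, q):
    """Return a (low, high) interval for every cell of `data`.

    Observed values become degenerate intervals (v, v). A missing value
    (NaN) becomes (min, max) of that attribute over the q nearest rows
    in which the attribute is observed.
    """
    intervals = []
    for i, row in enumerate(data):
        out = []
        for j, value in enumerate(row):
            if not math.isnan(value):
                out.append((value, value))
                continue
            # Candidate neighbors: other rows with attribute j observed.
            candidates = sorted(
                (partial_distance(row, other), k)
                for k, other in enumerate(data)
                if k != i and not math.isnan(other[j])
            )
            nearest = [data[k][j] for _, k in candidates[:q]]
            if nearest:
                out.append((min(nearest), max(nearest)))
            else:
                out.append((math.nan, math.nan))
        intervals.append(out)
    return intervals

# Illustrative data: the second row is missing its middle attribute.
data = [
    [1.0, 2.0, 3.0],
    [1.1, math.nan, 3.2],
    [0.9, 2.4, 2.9],
    [5.0, 9.0, 8.0],
]
result = neighbor_intervals(data, q=2)
print(result[1][1])  # (2.0, 2.4)
```

With q = 2, the two rows closest to the incomplete row supply the values 2.0 and 2.4, so the missing attribute is represented by the interval [2.0, 2.4] and is not filled with a single imputed number. The distant fourth row does not widen the interval.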
Keywords/Search Tags: Soft subspace clustering, Incomplete data, Interval neighborhood, Incremental learning