Research On Stream Clustering Algorithm Based On Importance Sampling

Posted on: 2020-02-29  Degree: Master  Type: Thesis
Country: China  Candidate: Z Zhang
GTID: 2428330623465364  Subject: Software engineering
Abstract/Summary:
In recent years, stream data analysis has become a research hotspot in data mining and has developed rapidly. However, most current stream clustering algorithms are linear, and on real-world data such linear algorithms cannot achieve satisfactory clustering quality. Obtaining higher clustering quality on real-world data is therefore an urgent problem. To address it, an efficient stream clustering algorithm based on the kernel method is proposed.

First, importance sampling is used to draw a subset of points from the data stream, and a kernel matrix is constructed from the sampled points. Second, the points of the kernel matrix are clustered in real time using the cosine similarity between sample points, which yields a labeled sample kernel matrix. This matrix is then used to partition the points of the data stream, which are projected into the high-dimensional space spanned by the top eigenvectors. Finally, kernel fuzzy c-means maps the data points from the high-dimensional space to a low-dimensional space, and the clustering results are updated with a fading clustering mechanism.

Experimental results on the test data sets show that, compared with traditional clustering algorithms, the proposed algorithm obtains markedly lower SSE and higher ARI and NMI, achieves real-time clustering, and avoids the curse of dimensionality during data processing. Only a small number of points need to be sampled from the data stream, and the resulting approximation error is well bounded. The kernel method makes the data points linearly separable in the feature space. Moreover, the algorithm does not require tuning complex parameters, and under the same conditions it achieves a notable speedup and higher efficiency than other traditional kernel-based clustering algorithms. The thesis contains 33 figures, 13 tables, and 54 references.
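The abstract does not state which importance weights are used to sample the stream. As an illustration only, the sketch below assumes length-squared sampling (each buffered point is drawn with probability proportional to its squared norm, a standard importance-sampling scheme for matrix approximation) and a Gaussian (RBF) kernel for the sample kernel matrix. The function names, `gamma`, and the sample size `s` are hypothetical.

```python
import math
import random

def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def importance_sample(points, s, rng):
    """Draw s indices with replacement, with probability proportional to ||x||^2
    (length-squared sampling). Falls back to uniform if every point is zero."""
    weights = [sum(v * v for v in p) for p in points]
    total = sum(weights)
    if total == 0:
        probs = [1.0 / len(points)] * len(points)
    else:
        probs = [w / total for w in weights]
    idx = rng.choices(range(len(points)), weights=probs, k=s)
    return idx, probs

def sample_kernel_matrix(points, idx, gamma=1.0):
    """s x s kernel matrix over the sampled points: K[i][j] = k(x_idx[i], x_idx[j])."""
    sample = [points[i] for i in idx]
    return [[rbf_kernel(a, b, gamma) for b in sample] for a in sample]

if __name__ == "__main__":
    rng = random.Random(0)
    window = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, 0.5]]  # one buffered chunk of the stream
    idx, probs = importance_sample(window, s=3, rng=rng)
    K = sample_kernel_matrix(window, idx)
    print(idx, [round(p, 3) for p in probs])
```

Because sampling is with replacement, an index can repeat; that only duplicates a row and column of `K`. Keeping `s` small is what keeps the kernel matrix cheap to build per window.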
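How the sample points are clustered by cosine similarity is not described beyond the abstract. One simple reading, assumed here and not confirmed by the source, treats each row of the sample kernel matrix as a feature vector and applies a leader-style rule: a row joins the most cosine-similar existing cluster leader if the similarity reaches a threshold, otherwise it starts a new cluster. A new stream point, represented by its kernel values against the sampled points, then takes the label of the most similar row. The `threshold` value and all names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def label_rows(K, threshold=0.9):
    """Leader-style clustering of kernel-matrix rows: each row joins the most similar
    existing leader if cosine >= threshold, otherwise it becomes a new leader."""
    leaders, labels = [], []
    for row in K:
        best, best_sim = None, -1.0
        for c, lead in enumerate(leaders):
            sim = cosine(row, lead)
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            labels.append(best)
        else:
            leaders.append(row)
            labels.append(len(leaders) - 1)
    return labels

def assign(kernel_vec, K, labels):
    """Label a new stream point x by the sample row most cosine-similar to
    kernel_vec = [k(x, s_1), ..., k(x, s_m)]."""
    sims = [cosine(kernel_vec, row) for row in K]
    return labels[max(range(len(K)), key=sims.__getitem__)]
```

The rows of a kernel matrix describe how similar each sample is to all the others, so two samples from the same group have nearly parallel rows; comparing rows by cosine ignores their overall magnitude and looks only at that pattern.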
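The projection onto the top eigenvector can be illustrated with two standard tools, both assumed here rather than taken from the thesis: power iteration to find the largest eigenpair (lam, v) of the sample kernel matrix, and the Nyström out-of-sample formula z(x) = (1 / sqrt(lam)) * sum_j v_j * k(x, s_j), which gives the coordinate of any stream point x along that eigenvector using only its kernel values against the m sampled points s_j. The function names are hypothetical.

```python
import math

def top_eigenpair(K, iters=200):
    """Largest eigenvalue and unit eigenvector of a symmetric PSD matrix by power iteration.
    The uniform start vector is not orthogonal to the top eigenvector when K has
    positive entries (as a Gaussian kernel matrix does)."""
    n = len(K)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(K[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        lam = norm  # ||K v|| for unit v converges to the top eigenvalue
    return lam, v

def project(kernel_vec, lam, v):
    """Nystrom-style coordinate along the top eigenvector for a point x, given
    kernel_vec = [k(x, s_1), ..., k(x, s_m)] against the m sampled points."""
    return sum(vj * kj for vj, kj in zip(v, kernel_vec)) / math.sqrt(lam)
```

For a sampled point itself, `kernel_vec` is its row of `K`, and the formula returns sqrt(lam) * v_i, which matches the embedding implied by K = V diag(lam) V^T. Only the m x m sample matrix is decomposed, which is where the saving from sampling comes from.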
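The abstract names kernel fuzzy c-means but not the variant. The sketch below follows a common Gaussian-kernel formulation in which the prototypes stay in input space and the kernel-induced distance is ||phi(x) - phi(v)||^2 = 2(1 - K(x, v)); memberships and prototypes are then updated alternately. The farthest-point initialization, the fuzzifier `m`, the bandwidth `sigma`, and the iteration count are assumptions for illustration.

```python
import math

def gauss(x, y, sigma):
    """Gaussian kernel K(x, y) = exp(-||x - y||^2 / sigma^2)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)

def farthest_point_init(X, c):
    """Deterministic start: X[0], then repeatedly the point farthest from the chosen ones."""
    V = [list(X[0])]
    while len(V) < c:
        far = max(X, key=lambda x: min(sum((a - b) ** 2 for a, b in zip(x, v)) for v in V))
        V.append(list(far))
    return V

def kfcm(X, c, m=2.0, sigma=1.0, iters=100, eps=1e-12):
    """Kernel fuzzy c-means with a Gaussian kernel and prototypes kept in input space.
    Kernel-induced distance: ||phi(x) - phi(v)||^2 = 2 * (1 - K(x, v)).
    Returns (prototypes V, memberships U) where U[k][i] is point k's degree in cluster i."""
    V = farthest_point_init(X, c)
    n, dim = len(X), len(X[0])
    U = []
    for _ in range(iters):
        # Memberships: u_ki proportional to (1 / d_ki^2) ** (1 / (m - 1)); each row sums to 1.
        U = []
        for x in X:
            inv = [(1.0 / max(2.0 * (1.0 - gauss(x, v, sigma)), eps)) ** (1.0 / (m - 1))
                   for v in V]
            total = sum(inv)
            U.append([w / total for w in inv])
        # Prototypes: mean of the points weighted by u_ki^m * K(x_k, v_i).
        for i in range(c):
            w = [U[k][i] ** m * gauss(X[k], V[i], sigma) for k in range(n)]
            tot = sum(w)
            if tot > 0:
                V[i] = [sum(w[k] * X[k][d] for k in range(n)) / tot for d in range(dim)]
    return V, U
```

With a Gaussian kernel, the distance to any far-away prototype saturates near 2, so a prototype that starts far from every point of a group barely moves toward it. That is why the sketch seeds prototypes by farthest-point selection instead of at random; hard labels can be read off as the column with the largest membership in each row of `U`.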
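The fading clustering mechanism is not detailed in the abstract. A common form, used in damped-window stream clustering algorithms such as DenStream, multiplies a cluster's weight and summary statistics by 2^(-lam * dt) as time advances, so older points count less and clusters that stop receiving points eventually fall below a weight threshold. The class below is a minimal sketch of that idea; its name, the decay rate `lam`, and `min_weight` are hypothetical.

```python
class FadingCluster:
    """Cluster summary whose weight and linear sum fade by 2 ** (-lam * dt)."""

    def __init__(self, dim, lam=0.5, t0=0.0):
        self.lam = lam              # decay rate (hypothetical default)
        self.weight = 0.0           # faded count of absorbed points
        self.ls = [0.0] * dim       # faded linear sum of absorbed points
        self.t_last = t0            # time of the last update

    def decay_to(self, t):
        """Fade the summary from t_last to time t."""
        f = 2.0 ** (-self.lam * (t - self.t_last))
        self.weight *= f
        self.ls = [s * f for s in self.ls]
        self.t_last = t

    def absorb(self, x, t):
        """Fade to time t, then add point x with unit weight."""
        self.decay_to(t)
        self.weight += 1.0
        self.ls = [s + v for s, v in zip(self.ls, x)]

    def center(self):
        """Weighted mean of the absorbed points (unchanged by pure decay)."""
        return [s / self.weight for s in self.ls]

    def is_stale(self, t, min_weight):
        """True if the faded weight at time t is below min_weight (candidate for removal)."""
        return self.weight * 2.0 ** (-self.lam * (t - self.t_last)) < min_weight
```

Because the weight and the linear sum fade by the same factor, decay alone does not move the center; it only shifts influence toward newer points the next time a point is absorbed.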
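Two of the reported measures can be computed directly: SSE sums the squared distances from each point to its assigned cluster center (lower is tighter), and the adjusted Rand index (ARI) compares a clustering with ground-truth labels, corrected for chance (1.0 for identical partitions, around 0 for random ones). The sketch below uses the standard definitions; the function names are illustrative, and it assumes at least two points.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(a, b):
    """ARI between two labelings of the same points (1.0 = identical partitions)."""
    n = len(a)
    pairs = sum(comb(v, 2) for v in Counter(zip(a, b)).values())  # pairs together in both
    rows = sum(comb(v, 2) for v in Counter(a).values())
    cols = sum(comb(v, 2) for v in Counter(b).values())
    expected = rows * cols / comb(n, 2)
    max_index = (rows + cols) / 2
    if max_index == expected:  # degenerate: both all-in-one cluster or both all singletons
        return 1.0
    return (pairs - expected) / (max_index - expected)

def sse(X, labels, centers):
    """Sum of squared Euclidean distances from each point to its assigned center."""
    return sum(sum((a - b) ** 2 for a, b in zip(x, centers[lab]))
               for x, lab in zip(X, labels))
```

ARI depends only on which points are grouped together, so renaming cluster labels does not change it; for example, `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0.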
Keywords/Search Tags: importance sampling, kernel fuzzy c-means, stream clustering, kernel matrix, fading clustering mechanism