Research And Application Of Clustering Algorithms For Large Scale Data Sets

Posted on:2018-03-16

Degree:Master

Type:Thesis

Country:China

Candidate:J C Yi

Full Text:PDF

GTID:2348330542477855

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development of information technology and the Internet, all kinds of activity in production and daily life are being digitized, and people deal with many kinds of data every day. This data is not only diverse in form but also explosive in volume. How to use data mining methods to extract hidden information from this vast ocean of data, so that it better serves our lives, has become a main topic of current research. Clustering is an important data mining task. Its purpose is to divide the data into several subsets according to certain criteria, so that data within a subset are similar to one another while data in different subsets differ more. Clustering is an unsupervised learning task and is widely used in information retrieval, image segmentation, bioinformatics and other fields.

With the rapid development of storage technology, the cost of storing data keeps falling, and the scale of available data is increasing in every industry. Classic clustering algorithms achieve very good results on small-scale data sets, but when faced with today's large-scale data they are unable to complete the cluster analysis task. It is therefore of great value and significance to study large-scale data clustering algorithms that can meet these challenges. This thesis summarizes the current large-scale data clustering algorithms and proposes two clustering algorithms for large-scale data sets:

(1) A semi-supervised single-pass kernel fuzzy c-means algorithm, which uses a small number of labeled seeds to guide the clustering process. We apply this algorithm to the cluster analysis of stellar spectra. We find that using the seeds not only improves the quality of the final clustering results, but also reduces the number of iterations and speeds up the algorithm.

(2) A large-scale multi-view clustering algorithm based on bipartite graph integration, together with a newly defined multi-view normalized cut. The use of bipartite graphs greatly accelerates the computation of spectral clustering. Constructing representative samples in each view separately works better than constructing them in the concatenated feature space, so a higher-quality bipartite graph can be built. Fusing the multiple bipartite graphs into a single graph avoids iteratively weighting each view and reduces the number of parameters of the algorithm.
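To illustrate why a bipartite graph speeds up spectral clustering, the following is a minimal single-view sketch, not the thesis's multi-view algorithm. It assumes the representative samples are k-means centers, that each sample is linked to its few nearest representatives with Gaussian weights, and that the embedding comes from a thin SVD of the normalized n x m sample-to-representative matrix. That SVD costs on the order of n·m² operations rather than requiring an eigendecomposition of a full n x n affinity matrix. All function and parameter names (`kmeans`, `bipartite_spectral_clustering`, `n_anchors`, `n_neighbors`) are hypothetical and chosen for illustration only.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Next initial center: the point farthest from all chosen centers.
        centers.append(X[np.argmax(d2)])
        d2 = np.minimum(d2, ((X - centers[-1]) ** 2).sum(axis=1))
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers


def bipartite_spectral_clustering(X, n_clusters, n_anchors=20,
                                  n_neighbors=3, seed=0):
    """Single-view sketch: cluster X through a sample-to-anchor bipartite graph."""
    n = X.shape[0]
    # Assumption: representative samples (anchors) are k-means centers.
    _, anchors = kmeans(X, n_anchors, seed=seed)

    # Squared distances from every sample to every anchor (n x m).
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)

    # Link each sample only to its nearest anchors, with Gaussian weights.
    nearest = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]
    sigma2 = d2[rows, nearest].mean() + 1e-12
    Z = np.zeros((n, n_anchors))
    Z[rows, nearest] = np.exp(-d2[rows, nearest] / (2.0 * sigma2))
    Z /= Z.sum(axis=1, keepdims=True)  # every sample has degree 1

    # Normalise by anchor degree; samples already have unit degree.
    anchor_degree = np.maximum(Z.sum(axis=0), 1e-12)
    Z_hat = Z / np.sqrt(anchor_degree)

    # Thin SVD of the n x m matrix replaces an n x n eigendecomposition.
    U, _, _ = np.linalg.svd(Z_hat, full_matrices=False)
    emb = U[:, :n_clusters]
    emb = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)

    labels, _ = kmeans(emb, n_clusters, seed=seed)
    return labels
```

Calling `bipartite_spectral_clustering(X, n_clusters=2)` on an `(n, d)` NumPy array returns one integer label per row. Extending the sketch to several views would require building one such matrix per view and fusing them, which is the step the thesis addresses and which is not reproduced here.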

Keywords/Search Tags:

Large scale, Clustering, Multi-view, Spectral clustering, Bipartite graph

Related items

1	Research And Application Of Clustering Algorithms For Large Scale Data Sets
2	Research On Fast Graph Clustering Algorithm On Large-Scale Data
3	Research On Multi-View Clustering Algorithm Based On Bipartite Graph Learning
4	Research On Clustering With Multi-view Data
5	Research On Multi-view Clustering Algorithms Based On Graph Learning
6	Research On Spectral Clustering Algorithm And Its Application
7	Multi-view Clustering Via Graph Learning
8	Research On Multi-view Clustering Algorithm Based On Graph Learning
9	Research On Spectral Clustering Methods For Large Scale Datasets
10	High Resolution Remote Sensing Image Multi-scale Segmentation Support By Spectral Graph Theory