
Research And Application Of Clustering Algorithms For Large Scale Data Sets

Posted on: 2018-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: J C Yi
Full Text: PDF
GTID: 2348330542477855
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology and the Internet, all kinds of activities in production and daily life are being digitized. People deal with data every day, and that data is not only diverse in form but also growing explosively in volume. How to use data mining methods to extract hidden information from this vast ocean of data, so as to better serve daily life, has become a major topic of current research. Clustering is an important data mining task. Its purpose is to divide the data into several subsets according to certain criteria, so that data within the same subset are more similar to one another and data in different subsets differ more. Clustering is an unsupervised learning method that is widely used in information retrieval, image segmentation, bioinformatics and other fields. With the rapid development of storage technology, the cost of storing data keeps falling, and the scale of available data is increasing in every industry. Classic clustering algorithms achieve very good results on small-scale data sets, but when faced with today's large-scale data they are unable to complete the task of cluster analysis. It is therefore of great value and significance to study large-scale data clustering algorithms that can meet these challenges.

In this thesis, we summarize current large-scale data clustering algorithms and propose two clustering algorithms for large-scale data sets:

(1) A semi-supervised single-pass kernel fuzzy c-means algorithm, which uses a small number of labeled seeds to guide the clustering process. We apply this algorithm to the cluster analysis of stellar spectra. We find that using the seeds not only improves the quality of the final clustering results, but also reduces the number of iterations and speeds up the algorithm.

(2) A large-scale multi-view clustering algorithm based on bipartite graph integration, together with a newly defined multi-view normalized cut. The use of bipartite graphs greatly accelerates the computation of spectral clustering. Constructing representative samples in each view works better than constructing them in the concatenated feature space, so a higher-quality bipartite graph can be built. Fusing the multiple bipartite graphs into a single graph avoids iteratively weighting each view and reduces the number of parameters of the algorithm.
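As a rough illustration of how labeled seeds can guide a fuzzy c-means run, as described in contribution (1), the sketch below initializes each cluster center from the mean of its labeled seeds and then applies the standard fuzzy c-means updates. This is not the algorithm proposed in the thesis: it uses plain Euclidean distance instead of a kernel, and it processes all data in one batch rather than in a single pass over chunks. The function name `seeded_fcm`, its parameters, and the toy data are assumptions made for this example only.

```python
import math

def seeded_fcm(points, seeds, m=2.0, tol=1e-6, max_iter=100):
    """Plain (non-kernel, batch) fuzzy c-means with seed-initialized centers.

    points: list of equal-length tuples of floats
    seeds:  dict mapping a cluster label to a list of labeled points
    m:      fuzzifier, must be greater than 1
    """
    labels = sorted(seeds)
    dim = len(points[0])
    # Assumption: the initial center of each cluster is the mean of its seeds.
    centers = [
        tuple(sum(p[d] for p in seeds[lab]) / len(seeds[lab]) for d in range(dim))
        for lab in labels
    ]
    exponent = 1.0 / (m - 1.0)
    for iteration in range(1, max_iter + 1):
        # Membership update: u_ik is proportional to (1 / d_ik^2)^(1/(m-1)),
        # normalized so that each point's memberships sum to 1.
        memberships = []
        for x in points:
            d2 = [max(sum((a - b) ** 2 for a, b in zip(x, c)), 1e-12)
                  for c in centers]
            inv = [(1.0 / d) ** exponent for d in d2]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: mean of all points weighted by u_ik^m.
        new_centers = []
        for k in range(len(centers)):
            w = [u[k] ** m for u in memberships]
            wsum = sum(w)
            new_centers.append(tuple(
                sum(wi * x[d] for wi, x in zip(w, points)) / wsum
                for d in range(dim)
            ))
        # Stop once no center moved more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    # memberships are the ones computed in the last iteration that ran.
    return labels, centers, memberships, iteration

# Toy data (hypothetical): two well-separated groups, one labeled seed each.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
seeds = {"a": [(0.0, 0.1)], "b": [(5.0, 5.1)]}
labels, centers, memberships, n_iter = seeded_fcm(points, seeds)
# Hard assignment: the label with the largest membership for each point.
hard = [labels[max(range(len(labels)), key=u.__getitem__)] for u in memberships]
```

Starting from seed-derived centers instead of random ones is one simple way that labels can shorten the iteration; a fuller semi-supervised variant would presumably also constrain the memberships of the seed points themselves, which this sketch does not do.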
Keywords/Search Tags:Large scale, Clustering, Multi-view, Spectral clustering, Bipartite graph