
Research on a Weighted Incremental Multiple-Medoids Clustering Algorithm for Big Data

Posted on: 2018-12-23
Degree: Master
Type: Thesis
Country: China
Candidate: D X Gao
Full Text: PDF
GTID: 2348330542459768
Subject: Software engineering
Abstract/Summary:
Clustering is an important data analysis technique for discovering latent pattern structure in unlabeled data. Traditional clustering methods require the entire dataset to be loaded into memory. However, Internet-related enterprises have grown rapidly and now generate data continuously, with volumes at the terabyte (TB) or petabyte (PB) scale, so loading all of the data into memory is practically impossible. Sampling is a natural remedy for data at this scale, and it does solve the memory problem, but some of the information in the data is lost when sampling is adopted. In addition, traditional clustering algorithms cannot handle nonspherical and unbalanced datasets. In this work, a new incremental clustering approach is proposed: incremental fuzzy clustering with multiple medoids based on a weighting method (wIMMFC). It extends IMMFC and is designed to deal with nonspherical and unbalanced datasets. Four basic strategies are adopted. First, the entire dataset is divided into s chunks. Second, multiple medoids are selected for each class in each chunk. Third, the relationships among the medoids identified in each chunk are used to guide the final clustering. Fourth, the candidate medoids from all chunks are weighted. wIMMFC differs from the original IMMFC in two respects. First, an improved max-min distance algorithm (maximizing the minimal distance) is used to initialize the medoids, which helps avoid local optima. Second, weighting the candidate medoids makes full use of the intermediate clustering results and makes the final clustering more effective. In the experiments, the proposed approach is compared with three other algorithms on five datasets, two of which are real-world datasets. The results show that the proposed approach is more effective.
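The chunk-then-weight idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's wIMMFC implementation; it rests on several simplifying assumptions: hard (non-fuzzy) assignment, a single medoid per cluster per chunk rather than multiple medoids, plain farthest-first traversal standing in for the "improved" max-min initialization, and every chunk containing at least k distinct points. The function names `maxmin_init`, `weighted_kmedoids`, and `incremental_medoids` are hypothetical.

```python
import math


def maxmin_init(points, k, first=0):
    """Farthest-first (max-min distance) choice of k initial medoid indices.

    Starts from points[first]; each further medoid is the point whose
    distance to its nearest already-chosen medoid is largest.
    """
    chosen = [first]
    nearest = [math.dist(p, points[first]) for p in points]
    while len(chosen) < k:
        idx = max(range(len(points)), key=nearest.__getitem__)
        chosen.append(idx)
        nearest = [min(d, math.dist(p, points[idx]))
                   for d, p in zip(nearest, points)]
    return chosen


def assign(points, medoids):
    """Label each point with the position of its nearest medoid."""
    return [min(range(len(medoids)),
                key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def weighted_kmedoids(points, weights, k, max_iter=50):
    """Hard-assignment weighted k-medoids (assumption: not the fuzzy variant).

    Returns (medoid_indices, medoid_weights); a medoid's weight is the
    total weight of the points assigned to it.
    """
    medoids = maxmin_init(points, k)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        updated = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                updated.append(medoids[c])
                continue

            def cost(i, members=members):
                # Weighted sum of distances from candidate i to the cluster.
                return sum(weights[j] * math.dist(points[i], points[j])
                           for j in members)

            updated.append(min(members, key=cost))
        if updated == medoids:
            break
        medoids = updated
    labels = assign(points, medoids)
    mass = [0.0] * k
    for lab, w in zip(labels, weights):
        mass[lab] += w
    return medoids, mass


def incremental_medoids(data, k, chunk_size):
    """Cluster chunk by chunk, then cluster the weighted candidate medoids."""
    cand_points, cand_weights = [], []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        meds, mass = weighted_kmedoids(chunk, [1.0] * len(chunk), k)
        cand_points.extend(chunk[m] for m in meds)
        cand_weights.extend(mass)  # weight = number of points represented
    final, _ = weighted_kmedoids(cand_points, cand_weights, k)
    return [cand_points[m] for m in final]
```

In this sketch only one chunk and the small list of candidate medoids are held in memory at a time. Each candidate carries the number of points it represents, so a medoid summarizing a large cluster pulls the final medoid choice more strongly than one summarizing a small cluster, which is one plausible reading of how weighting lets the intermediate results inform the final clustering.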
Keywords/Search Tags:multiple medoids, big data, clustering algorithm, IMMFC, nonspherical, unbalanced, weight