
Large Data Set Incremental Fuzzy Clustering Algorithm

Posted on: 2019-12-25
Degree: Master
Type: Thesis
Country: China
Candidate: B G Hu
Full Text: PDF
GTID: 2370330545473719
Subject: Software engineering
Abstract/Summary:
Clustering is a common data mining technique. It divides data into multiple clusters so that elements in the same cluster are highly similar to one another while elements in different clusters are less similar, which allows useful information to be extracted from the data.

First, this thesis reviews the state of domestic and international research on cluster analysis and describes the main problems of incremental fuzzy clustering methods, focusing on how cluster center points are selected. To address the problems with center point selection in earlier algorithms, it proposes an incremental fuzzy clustering algorithm based on a minimum weight threshold, built on the incremental fuzzy clustering algorithm IMMFC, to improve accuracy. The algorithm works in three steps. First, it divides the data into multiple data blocks and performs fuzzy clustering on each block. Second, it selects multiple center points from each cluster in each data block; the number of center points is the minimum number of objects whose sum of weights in the cluster is greater than a given threshold. Finally, all selected center points are treated as a last data block, and fuzzy clustering is performed on it to obtain the final center points. The accuracy and F-value of the algorithm were tested in two sets of experiments. The results show that the algorithm performs better than IMMFC when the data block size is greater than 10% of the total data.

The proposed algorithm has the following features. First, it divides the data into multiple small data blocks, which solves the problem that a large data set cannot be loaded into memory all at once. Second, it determines the number of center points of each cluster in each data block flexibly: the sum of the weights of the center points in a cluster is not less than a certain threshold, which avoids the situation where the weights of the elements in a cluster are generally low and the selected center points are insufficient to represent the cluster. The algorithm also has a drawback: when the data block size is 10% of all data, it does not perform as well as IMMFC.

Second, this thesis studies data preprocessing in detail and, based on the characteristics of the proposed incremental clustering algorithm, proposes a distance matrix generation algorithm suited to it.

Finally, this thesis applies the proposed algorithm to a practical case, the mining of hot topics on Twitter. The case presents the general steps and solution approach for using the incremental fuzzy clustering algorithm in Twitter hot topic mining, and can serve as a reference for applying the algorithm.
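
The center point selection rule described in the abstract (take the fewest objects in a cluster whose weights add up to the threshold) might be sketched as follows. This is only an illustration, not the thesis's implementation: the function name `select_centers` and its parameters are hypothetical, the per-cluster weights are assumed to have been computed already by the fuzzy clustering step, and because the abstract states the condition both as "greater than" and as "not less than" the threshold, the sketch assumes "not less than".

```python
from typing import List, Sequence


def select_centers(weights: Sequence[float], threshold: float) -> List[int]:
    """Return indices of the fewest objects whose weights sum to at least `threshold`.

    `weights` holds the (assumed precomputed) weight of each object in one
    cluster of one data block. Hypothetical helper for illustration only.
    """
    # Visit objects from highest weight to lowest; taking the heaviest
    # objects first minimizes how many are needed to reach the threshold.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

    chosen: List[int] = []
    total = 0.0
    for i in order:
        chosen.append(i)
        total += weights[i]
        if total >= threshold:  # assumption: "not less than" the threshold
            break
    # If the threshold is never reached, every object is returned.
    return chosen


if __name__ == "__main__":
    # One dominant object: two centers are enough to reach 0.7.
    print(select_centers([0.5, 0.1, 0.3, 0.1], 0.7))
    # Uniformly low weights: more centers are needed to represent the cluster.
    print(select_centers([0.25, 0.25, 0.25, 0.25], 0.6))
```

With a fixed number of centers per cluster, the second case would be represented by objects carrying only a small share of the cluster's total weight; a threshold on the cumulative weight lets the number of centers grow in exactly that situation.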
Keywords/Search Tags: incremental clustering, fuzzy clustering, incremental fuzzy clustering, multi-center, large dataset