Large Data Set Incremental Fuzzy Clustering Algorithm

Posted on:2019-12-25

Degree:Master

Type:Thesis

Country:China

Candidate:B G Hu

GTID:2370330545473719

Subject:Software engineering

Abstract/Summary:

Clustering is a common data mining technique. It divides data into multiple clusters so that elements in the same cluster are highly similar while elements in different clusters are less similar, which allows useful information in the data to be extracted. This thesis first reviews the state of domestic and international research on cluster analysis and describes the main problems of incremental fuzzy clustering methods, focusing on how cluster center points are selected. To address the shortcomings of center point selection in earlier algorithms, an incremental fuzzy clustering algorithm based on a minimum weight threshold is proposed, building on the incremental fuzzy clustering algorithm IMMFC, to improve accuracy.

The algorithm works in three steps. First, it divides the data into multiple data blocks and performs fuzzy clustering on each block. Second, it selects multiple center points from each cluster in each data block; the number of center points is the minimum number of objects whose weights in the cluster sum to more than a given threshold. Finally, all selected center points are treated as a final data block, and fuzzy clustering is performed on it to obtain the final center points. The accuracy and F-value of the algorithm were tested in two sets of experiments. The results show that the algorithm performs better than IMMFC when the data block size is greater than 10% of the total data.

The proposed algorithm has the following features. First, dividing the data into multiple small blocks solves the problem that a large data set cannot be loaded into memory at once. Second, the number of center points of each cluster in each data block is determined flexibly: the sum of the weights of the center points in a cluster is not less than a certain threshold, which avoids the situation in which, when the weights of the elements in a cluster are generally low, the selected center points are insufficient to represent the cluster. The algorithm also has a drawback: when the data block size is 10% of all data, it does not perform as well as IMMFC.

In addition, the thesis studies data preprocessing in detail and, based on the characteristics of the proposed incremental clustering algorithm, proposes a distance matrix generation algorithm suited to it. Finally, the proposed algorithm is applied to a practical case, the mining of hot topics on Twitter. This case presents the general steps and solution ideas for applying incremental fuzzy clustering to Twitter hot topic mining and can serve as a reference for applications of the algorithm.
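
The center point selection rule described in the abstract (take the minimum number of objects in a cluster whose weights sum to more than a threshold) can be illustrated with a short Python sketch. This is not the thesis's implementation: the function name `select_centers`, the representation of a cluster as a plain list of per-object weights, and the strict "greater than" comparison are assumptions made for illustration. Picking objects in descending order of weight yields the smallest possible set, because the k heaviest objects have the largest sum of any k objects.

```python
from typing import List, Sequence


def select_centers(weights: Sequence[float], threshold: float) -> List[int]:
    """Return indices of the fewest objects whose weights sum to more than threshold.

    `weights` holds the (hypothetical) per-object weights within one cluster
    of one data block. If the threshold is never exceeded, every object in
    the cluster is returned.
    """
    # Visit objects from heaviest to lightest; ties keep their original order.
    order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)

    chosen: List[int] = []
    total = 0.0
    for i in order:
        chosen.append(i)
        total += weights[i]
        if total > threshold:  # assumption: strict comparison
            break
    return chosen


if __name__ == "__main__":
    # One dominant object is enough to pass the threshold.
    print(select_centers([0.9, 0.1], 0.5))
    # Generally low weights: more center points are needed to represent the cluster.
    print(select_centers([0.25, 0.25, 0.25, 0.25], 0.5))
```

With a dominant weight the rule selects a single center point, while a cluster whose weights are generally low yields several, which is the behavior the abstract describes as avoiding under-represented clusters.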

Keywords/Search Tags:

incremental clustering, fuzzy clustering, incremental fuzzy clustering, multi-center, large dataset

Related items

1	Studies On New Fuzzy Clustering Algorithms
2	Research On Incremental Fuzzy Clustering Algorithm Of Time Series Data
3	Fuzzy Clustering And Its Applied Research In The Chinese Text Clustering
4	RCP-based GIS System And Clustering Technology Research
5	Improvement Of Fuzzy Time Series Forecasting Model And Its Application
6	Research On Optical Remote Sensing Image Change Detection Based On Fuzzy Clustering Algorithm
7	Study On Several Problems In Fuzzy Clustering
8	The Application Of Fuzzy Clustering In The Optimization Decision
9	Fuzzy Clustering Approaches Based On AFS Fuzzy Logic
10	Study On Gray Correlation Fuzzy Clustering Method And Its Application