
A Study Of Large-scale Data Clustering Based On Fuzzy Clustering And Its Application

Posted on: 2015-03-09
Degree: Master
Type: Thesis
Country: China
Candidate: G L Yang
GTID: 2308330464968656
Subject: Circuits and Systems
Abstract/Summary:
With the rapid development of modern science, we have entered the information age. Information in many industries is growing explosively, and how to find truly useful information in these massive data sets has become a focus of attention. Data mining is an important tool for decision support and knowledge discovery in databases, and clustering is one of its important techniques. The purpose of clustering is to divide a large number of samples or abstract data into several subsets according to their mutual similarity, to discover the structure of the data, and to help us understand the information hidden in it. As technology develops, databases keep growing in scale and the features they contain become more and more complicated. Many traditional clustering algorithms cannot handle such massive data, so researchers have begun to look for new algorithms that can handle large-scale databases. In this thesis we introduce stream clustering into fuzzy clustering and propose a stream-clustering model aimed at massive data, so that traditional algorithms can handle large-scale data. The main contributions of this thesis are as follows:

(1) We propose an online clustering algorithm based on Fuzzy C-Means (FCM) weighted by sample density (OWFCM), which is suited to large-scale handwritten digit images. We process the data in the manner of stream clustering: in each loop one data point is read in and its memberships with respect to the current cluster centers are calculated, and the sample is then handled differently depending on the maximum of these memberships. The key point of our algorithm is a clustering structure based on stream clustering that combines the cluster-center update method of the online k-means algorithm with the density-weighted Fuzzy C-Means algorithm. By processing the images one by one, it recognizes handwritten digit images in an unsupervised manner.
Because our algorithm does not process all the data at once, it does not require a powerful computer. Compared with applying the WFCM algorithm chunk by chunk, OWFCM reduces the time complexity, because most samples update the cluster centers directly, which reduces the number of times WFCM is run. It is therefore more suitable for recognizing large-scale handwritten digit images.

(2) Building on the algorithm of the previous section, we propose another improved stream clustering algorithm. The previous algorithm was based on Fuzzy C-Means weighted by sample density, where the weight of a sample is calculated from the density of the samples around it. This method does improve the speed of convergence, but calculating the weight of every sample takes a long time and increases the time complexity of the algorithm. To avoid this, we propose an improved stream clustering algorithm based on Single-Pass FCM (SPFCM), which we call strWFCM; SPFCM was used as a comparison algorithm in the previous section. In strWFCM every sample in the database is also weighted, but every sample weight is set to 1, and only the weights of the cluster centers increase: when a sample is assigned to a center, the weight of that center grows. Samples with larger weights are then more likely to be chosen as cluster centers.

(3) After studying current community detection algorithms, we find that community detection and clustering have many similarities, and that several community detection methods are based on cluster analysis. Because the feature matrix of a complex network has a complex structure, many clustering algorithms do not work efficiently on it. We therefore introduce the quantum clustering algorithm into the clustering stage. Neighborhood information is also introduced into the clustering stage, and this improvement does reduce the time complexity.
We then compare our method with three other algorithms on the benchmark network and the real-world network.
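
The per-sample processing described in contributions (1) and (2) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the membership formula is the standard FCM one, while the `threshold` value, the `buffer` of ambiguous samples, and the function names `fcm_memberships`, `stream_step`, and `weighted_center_update` are assumptions introduced only for illustration. The density weighting of OWFCM is not reproduced here; every sample counts with weight 1 and only the center weights grow, in the spirit of the strWFCM description.

```python
def fcm_memberships(x, centers, m=2.0, eps=1e-12):
    """Standard FCM membership of sample x in each cluster (values sum to 1)."""
    # Squared Euclidean distance to every center; eps avoids division by zero.
    d2 = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) + eps for c in centers]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def stream_step(x, centers, weights, buffer, threshold=0.6, m=2.0):
    """Process one incoming sample; return True if a center absorbed it.

    `threshold` and `buffer` are hypothetical: the abstract only says the
    sample is handled according to its maximum membership.
    """
    u = fcm_memberships(x, centers, m)
    j = max(range(len(u)), key=u.__getitem__)
    if u[j] >= threshold:
        # Confident assignment: the center's weight grows by the sample
        # weight (1) and the center moves as in online k-means.
        weights[j] += 1.0
        step = 1.0 / weights[j]
        centers[j] = [ci + step * (xi - ci) for xi, ci in zip(x, centers[j])]
        return True
    # Ambiguous sample: keep it for a later weighted-FCM pass.
    buffer.append(list(x))
    return False


def weighted_center_update(points, point_weights, centers, m=2.0):
    """One weighted-FCM center update over a non-empty batch of points."""
    dim = len(centers[0])
    num = [[0.0] * dim for _ in centers]
    den = [0.0] * len(centers)
    for x, w in zip(points, point_weights):
        u = fcm_memberships(x, centers, m)
        for j, uj in enumerate(u):
            g = w * uj ** m  # weight times fuzzified membership
            den[j] += g
            for k in range(dim):
                num[j][k] += g * x[k]
    return [[n / d for n in row] for row, d in zip(num, den)]
```

In this sketch most samples would take the cheap branch of `stream_step`, and `weighted_center_update` would be run only occasionally on the buffered samples, which mirrors the abstract's argument that updating centers directly reduces how often the weighted FCM has to be run.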
Keywords/Search Tags: Stream clustering, Fuzzy clustering, Handwritten digit images, Quantum clustering, Community detection