Research On Fuzzy Clustering Algorithm For Data Stream

Posted on: 2017-03-29  Degree: Master  Type: Thesis
Country: China  Candidate: X D Chen  Full Text: PDF
GTID: 2308330488997122  Subject: Computer application technology
Abstract/Summary:
Data streams are a frontier topic in data mining. They arrive continuously and at high speed, and their distribution changes over time. They play an important role in many fields, such as wireless sensor networks, financial market analysis, and network intrusion detection. Clustering analysis, a major part of data mining, discovers cluster structure in data and helps users make accurate decisions. Concept drift detection, one branch of data stream research, helps determine when and why the underlying data distribution changes, and helps predict the trend of the stream. However, current clustering algorithms operate on the whole data set and are therefore hard to apply to streams. It is thus necessary to design a new clustering method that can both cluster the stream and detect concept drift effectively.

The work on data stream clustering in this thesis is divided into three parts. First, it reviews the state of research on data streams and its shortcomings through an analysis of the relevant literature. Second, to cope with the time and space limits of stream processing, it proposes a new fuzzy clustering algorithm for data streams. The algorithm divides the stream into parts, and each part is processed by a modified weighted fuzzy c-means algorithm. The micro-cluster structure and weight decay help to improve clustering quality. Experimental results show that the algorithm is more accurate than the SWFCM and StreamKM++ algorithms. Finally, to detect concept drift in the stream, the thesis proposes measuring the difference between the clusters of adjacent windows to determine whether concept drift has occurred within a variable sliding window. The results show that the algorithm detects concept drift in data streams effectively and performs well in both clustering quality and running time.
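
The abstract gives no formulas, so the following is only a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes a standard weighted fuzzy c-means update with fuzzifier m = 2, carries each chunk's centers forward as summary points whose weights are multiplied by a hypothetical `decay` factor, and flags drift when a new center lies farther than a hypothetical `drift_threshold` from every previous center. Fixed-size chunks stand in for the variable sliding window, and all function names and parameters are illustrative assumptions.

```python
import math


def memberships(points, centers, m=2.0, eps=1e-9):
    """Fuzzy membership of each point in each cluster (standard FCM formula)."""
    rows = []
    for p in points:
        # Floor the distance at eps so a point sitting on a center is safe.
        inv = [max(math.dist(p, c), eps) ** (-2.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        rows.append([v / total for v in inv])
    return rows


def weighted_fcm(points, weights, centers, m=2.0, iters=50):
    """Weighted fuzzy c-means; returns final centers and per-cluster mass."""
    dim = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for k in range(len(centers)):
            coef = [w * row[k] ** m for w, row in zip(weights, u)]
            total = sum(coef)
            new_centers.append(tuple(
                sum(cf * p[j] for cf, p in zip(coef, points)) / total
                for j in range(dim)))
        centers = new_centers
    u = memberships(points, centers, m)
    mass = [sum(w * row[k] for w, row in zip(weights, u))
            for k in range(len(centers))]
    return centers, mass


def center_shift(old_centers, new_centers):
    """Largest distance from a new center to its nearest old center."""
    return max(min(math.dist(n, o) for o in old_centers) for n in new_centers)


def cluster_stream(chunks, c, decay=0.5, drift_threshold=3.0):
    """Yield (centers, shift, drifted) after each chunk of the stream."""
    centers, mass = None, None
    for chunk in chunks:
        points = [tuple(p) for p in chunk]
        weights = [1.0] * len(points)
        if centers is None:
            init = points[:c]  # naive initialization (an assumption)
        else:
            # Carry history forward as decayed, weighted summary points.
            points += centers
            weights += [decay * w for w in mass]
            init = centers
        new_centers, mass = weighted_fcm(points, weights, init)
        shift = 0.0 if centers is None else center_shift(centers, new_centers)
        centers = new_centers
        yield centers, shift, shift > drift_threshold
```

Each call to `next` on `cluster_stream(...)` consumes one chunk, so memory use depends on the chunk size and the number of clusters rather than on the length of the stream. A smaller `decay` forgets old chunks faster, which makes the centers follow a drifting distribution more quickly at the cost of stability.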
Keywords/Search Tags: Data Stream, Fuzzy Clustering Means, Weight Decay, Concept Drift, Variable Sliding Window