Research On Clustering Of Stream Data Based On Sliding Window

Posted on: 2010-04-21
Degree: Master
Type: Thesis
Country: China
Candidate: L K Wang
Full Text: PDF
GTID: 2178360278959398
Subject: Computer application technology
Abstract/Summary:
The rapid development of computer network and sensor network technology has led to a new class of data with broad application prospects: the data stream. Typical applications include environmental and astronomical monitoring, computer network monitoring, financial stock trading, and the analysis of web search logs. Compared with the static data held in a traditional database, stream data is continuous, arrives rapidly, and is unbounded in volume. Because the volume is unbounded, data stream mining usually cannot retain all of the original data; only summary information about the original data can be kept in memory, and the final result is generated from that summary. The result of data stream mining is therefore often an approximation within a certain allowable error.

A growing number of researchers in China and abroad have studied clustering over data streams and have proposed a number of models and methods for processing stream data, such as the landmark model and the sliding window model. This thesis proposes a new clustering method based on the sliding window model and also applies kernel density estimation to the density estimation of stream data under that model. The main work is as follows:

Clustering of the data stream under the sliding window model is implemented with exponential histograms. The exponential histogram serves as the data structure that stores the summary information of the stream data, and it supports both the insertion of new data and the deletion of expired data under the sliding window model. The on-line layer mainly performs micro-clustering and the incremental maintenance of the data. Whether a newly arrived data point is absorbed by an existing EHCF or placed in a newly created EHCF is determined by calculating the distance between the new point and the existing EHCFs.
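
The absorb-or-create rule of the on-line layer can be illustrated with a minimal Python sketch. It is not the thesis's implementation: it reads EHCF as an exponential histogram of cluster-feature buckets (the abstract does not spell the acronym out), stores only a count and a linear sum per bucket, and uses hypothetical parameters `radius` (the absorption threshold) and `max_same` (how many buckets of equal size are tolerated before merging).

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Bucket:
    n: int      # number of points summarised by this bucket
    ls: list    # per-dimension linear sum of those points
    t: int      # timestamp of the most recent point in the bucket


@dataclass
class EHCF:
    """Assumed structure: buckets kept newest first, sizes 1, 2, 4, ..."""
    buckets: list = field(default_factory=list)

    def insert(self, x, t, max_same=2):
        self.buckets.insert(0, Bucket(1, list(x), t))
        size = 1
        while True:
            same = [i for i, b in enumerate(self.buckets) if b.n == size]
            if len(same) <= max_same:
                break
            i, j = same[-2], same[-1]          # the two oldest buckets of this size
            a, b = self.buckets[i], self.buckets[j]
            # The merged bucket keeps the newer timestamp (a is newer than b).
            self.buckets[i] = Bucket(a.n + b.n,
                                     [p + q for p, q in zip(a.ls, b.ls)], a.t)
            del self.buckets[j]
            size *= 2                          # the merge may cascade upwards

    def expire(self, now, window):
        # Drop buckets whose newest point has left the sliding window.
        self.buckets = [b for b in self.buckets if b.t > now - window]

    def count(self):
        return sum(b.n for b in self.buckets)

    def center(self):
        n = self.count()
        dims = len(self.buckets[0].ls)
        return [sum(b.ls[d] for b in self.buckets) / n for d in range(dims)]


def update(clusters, x, t, window, radius):
    """Absorb x into the nearest micro-cluster, or start a new one."""
    for c in clusters:
        c.expire(t, window)
    clusters[:] = [c for c in clusters if c.buckets]   # discard emptied clusters
    nearest = min(clusters, key=lambda c: dist(c.center(), x), default=None)
    if nearest is not None and dist(nearest.center(), x) <= radius:
        nearest.insert(x, t)
    else:
        fresh = EHCF()
        fresh.insert(x, t)
        clusters.append(fresh)
```

Calling `update(clusters, point, t, window=10, radius=1.0)` for each arriving point keeps `clusters` as the current set of micro-clusters. Because whole buckets expire at once, the counts inside the window are approximate, which matches the allowable-error setting described above.
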
The off-line layer can use a more mature algorithm to perform the clustering: it takes the on-line result as input and carries out the macro-clustering, which ultimately yields a more accurate clustering result.

The density estimation of the stream data uses the basic window technique, which divides the sliding window equally into a number of basic windows. The data information of each basic window is stored in a vector with four elements. Kernel density estimation is applied to the density estimation of the data stream under the sliding window model: the density functions are accumulated to approximate the density distribution of the original data. The accuracy of the result depends only on the number of basic windows. Experiments indicate that, as the number of basic windows increases, the method reaches a certain accuracy.
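
The basic-window idea can be sketched in Python as follows. The abstract does not say what the four elements of each vector are, so this sketch assumes, purely for illustration, a tuple of (window index, point count, mean, standard deviation) for one-dimensional data. It also assumes a Gaussian kernel whose bandwidth is the window's standard deviation. The names `WindowedDensity`, `num_basic`, `basic_size` and `min_bandwidth` are hypothetical.

```python
from collections import deque
from math import exp, pi, sqrt


def summarise(points, index):
    """Compress one basic window into an assumed four-element summary."""
    n = len(points)
    mean = sum(points) / n
    var = sum((p - mean) ** 2 for p in points) / n
    return (index, n, mean, sqrt(var))


class WindowedDensity:
    def __init__(self, num_basic, basic_size, min_bandwidth=1e-3):
        # A bounded deque drops the oldest basic window automatically,
        # which is how the sliding window moves forward in this sketch.
        self.summaries = deque(maxlen=num_basic)
        self.basic_size = basic_size
        self.min_bandwidth = min_bandwidth
        self.buffer = []
        self.next_index = 0

    def add(self, x):
        self.buffer.append(x)
        if len(self.buffer) == self.basic_size:
            self.summaries.append(summarise(self.buffer, self.next_index))
            self.next_index += 1
            self.buffer = []

    def density(self, x):
        """Accumulate one weighted Gaussian per basic window."""
        total = sum(n for _, n, _, _ in self.summaries)
        if total == 0:
            return 0.0
        acc = 0.0
        for _, n, mean, std in self.summaries:
            h = max(std, self.min_bandwidth)   # avoid a zero bandwidth
            u = (x - mean) / h
            acc += n * exp(-0.5 * u * u) / (h * sqrt(2 * pi))
        return acc / total
```

With a fixed sliding-window length, a larger `num_basic` means smaller basic windows and more kernels, so the accumulated estimate can follow the original data more closely. This is one way to read the abstract's statement that accuracy depends on the number of basic windows.
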
Keywords/Search Tags: stream data, exponential histogram, kernel density estimation