
Data Stream Clustering Algorithm And Its Application

Posted on: 2010-07-09 | Degree: Master | Type: Thesis
Country: China | Candidate: H Z Yang | Full Text: PDF
GTID: 2208360278467469 | Subject: Computer application technology
Abstract/Summary:
In recent years, the rapid development of computers and their applications has greatly increased our ability to collect data. Data streams are an important type of data source and are attracting growing attention, and data mining techniques and algorithms based on the data stream model have become an important applied research topic. A data stream is an ordered, high-speed, high-volume sequence of data that arrives continuously and whose content is not known in advance. A data stream can usually be regarded as a dynamic data set whose size grows without bound over time. As a result, random access to a point in the stream is too expensive, so requiring only a single scan over the stream has become a goal of clustering algorithms. Clustering streaming data poses challenges to traditional clustering algorithms, for example obtaining high-quality clusters in only one pass over the data, and supporting time-window analysis over an arbitrary period of the stream.
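To make the single-scan constraint concrete, the following is a minimal sketch of a generic one-pass sequential (leader-style) clustering scheme. It is an illustration only and is not the improved algorithm proposed in this thesis, whose details are not given in the abstract. The function name `sequential_cluster`, the Euclidean distance, and the `threshold` parameter are all assumptions made for this example.

```python
import math


def sequential_cluster(stream, threshold):
    """Cluster points in one pass; each point is read once and never revisited.

    Hypothetical illustration: a point joins the nearest existing cluster if it
    lies within `threshold` of that cluster's centroid, otherwise it starts a
    new cluster. Only centroids and counts are kept, not the points themselves.
    """
    centroids = []  # running mean of each cluster
    counts = []     # number of points absorbed by each cluster
    labels = []     # cluster index assigned to each point, in arrival order

    for point in stream:
        # Find the nearest existing centroid (Euclidean distance is an assumption).
        best, best_dist = -1, math.inf
        for i, centroid in enumerate(centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d

        if best == -1 or best_dist > threshold:
            # No cluster is close enough: open a new one seeded with this point.
            centroids.append(list(point))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            # Update the running mean incrementally, without storing past points.
            counts[best] += 1
            n = counts[best]
            centroids[best] = [m + (x - m) / n for m, x in zip(centroids[best], point)]
            labels.append(best)

    return centroids, counts, labels


if __name__ == "__main__":
    demo_stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
    centroids, counts, labels = sequential_cluster(demo_stream, threshold=1.0)
    print(labels, counts)
```

Memory use in this sketch grows with the number of clusters rather than with the length of the stream, which is the property a one-pass method needs. The result depends on arrival order and on the chosen threshold, one reason single-scan methods can lose cluster quality.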
At present, stream clustering algorithms still face several problems, for example poor cluster quality due to the loss of global information caused by dividing the stream, and high time complexity. This thesis studies a clustering algorithm for data streams and its application in depth, proposes an improved sequence algorithm for clustering data streams, and applies it to two domains, e-mail filtering and intrusion detection. The work comprises three parts. First, an improved sequence algorithm for clustering data streams is proposed. It aims to preserve clustering precision while improving processing speed, and uses a standard tool set to resolve the difficulties that arise. Experiments show that the algorithm achieves these goals. Second, based on the characteristics of spam, a spam-filtering model is built that combines the sequence algorithm for clustering data streams with a support vector machine. According to the experiments, the model achieves higher classification precision and processing speed, and has better self-learning and self-adaptive ability. Finally, given the complexity of networks and the continual emergence of new intrusion methods, an intrusion detection model is designed that combines the sequence algorithm for clustering data streams with a support vector machine. In the experiments, it improves detection performance and strengthens self-learning ability; it can also be trained offline and is more self-adaptive than other methods.
Keywords/Search Tags: Data Stream, Clustering Analysis, Support Vector Machine, Intrusion Detection, E-mail Filtering