
Data Stream Mining Technology Research

Posted on: 2006-09-25
Degree: Master
Type: Thesis
Country: China
Candidate: X K Guo
Full Text: PDF
GTID: 2208360155461446
Subject: Computer application technology
Abstract/Summary:
With the rapid development of communication and network technology, data streams now arise in many fields, such as performance measurements in network monitoring and traffic management, call detail records in telecommunications, transactions in retail chains, ATM operations in banks, and sensor network data. Research on data stream applications is therefore becoming very important, and mining data streams is a new and active area of database research. This thesis studies algorithms for mining association rules, for classification and prediction, and for clustering over streaming data.

Data mining algorithms over data streams have three characteristics. (1) The data is scanned in a single pass. A mining algorithm can read each data element only once, so the influence between old data and new data must be kept small. (2) The mining result changes over time. Because a data stream arrives in order, a mining algorithm cannot be run over the entire dataset and can only work on the data within a period of time. The result is therefore the newest one for that period, and it may change in the next period. (3) The characteristics of a class can describe the information of all the data in that class. Because a data stream is massive, many existing results are not suited to mining such large volumes of data, but the characteristics of a class and their percentage in the entire dataset can still be evaluated.

The thesis introduces the following algorithms. MARODS is an algorithm for mining association rules in a data stream. It obtains the largest frequent itemset in a single scan of the dataset, needs less memory, and is not constrained by the volume of data. CODS is a classification algorithm based on frequent patterns. It is not a decision tree, so it needs no pruning and runs fast. When a user requests the mining result, CODS uses the data in the window to find frequent patterns for classifying the data and tests the classification rules on new data elements. If the accuracy falls below a user-specified threshold, it classifies again using the data in the window. CDSC is an algorithm that obtains the characteristics of a class of data in a data stream by clustering the stream. It can also cluster large datasets and obtain valuable information based on the percentage of data in the entire dataset.

All the algorithms presented in this thesis use a sliding window to process the data. We divide the data into two parts: online data, which is the streaming data itself, and offline data, which is a sample of the data stream. Apart from the traditional algorithms, the algorithms presented here for the first time have been proved correct. The algorithms are simple and use little memory. Theoretical analysis and experiments on practical data show that they mine data streams efficiently.
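
The window-based classification loop described for CODS (build rules from the data in the window, test them on new elements, rebuild when accuracy drops below a user-specified threshold) can be illustrated with a minimal Python sketch. This is not the thesis's CODS algorithm: the rule learner here is a simple stand-in that maps each item to its most frequent label in the window, and the class name `WindowedRuleClassifier` and the parameters `window_size`, `check_every`, and `min_accuracy` are assumptions made for illustration only.

```python
from collections import Counter, defaultdict, deque


class WindowedRuleClassifier:
    """Hypothetical stand-in for a window-based stream classifier (not CODS itself)."""

    def __init__(self, window_size=100, check_every=20, min_accuracy=0.7):
        # Sliding window holding the most recent (items, label) pairs.
        self.window = deque(maxlen=window_size)
        self.check_every = check_every    # elements between accuracy checks
        self.min_accuracy = min_accuracy  # user-specified threshold
        self.rules = {}                   # item -> label predicted for that item
        self.default = None               # fallback label when no rule matches
        self.hits = 0
        self.seen = 0
        self.rebuilds = 0

    def _rebuild(self):
        # Stand-in rule learner: each item votes for its most frequent label
        # among the elements currently in the window.
        votes = defaultdict(Counter)
        labels = Counter()
        for items, label in self.window:
            labels[label] += 1
            for item in items:
                votes[item][label] += 1
        self.rules = {i: c.most_common(1)[0][0] for i, c in votes.items()}
        self.default = labels.most_common(1)[0][0] if labels else None
        self.rebuilds += 1

    def predict(self, items):
        tally = Counter(self.rules[i] for i in items if i in self.rules)
        return tally.most_common(1)[0][0] if tally else self.default

    def observe(self, items, label):
        # Test the current rules on the new element before storing it.
        if self.predict(items) == label:
            self.hits += 1
        self.seen += 1
        self.window.append((frozenset(items), label))
        # Periodically compare accuracy with the threshold and rebuild if needed.
        if self.seen == self.check_every:
            if self.hits / self.seen < self.min_accuracy:
                self._rebuild()
            self.hits = self.seen = 0


clf = WindowedRuleClassifier(window_size=20, check_every=20, min_accuracy=0.7)
# First concept: item "a" means label "x", item "b" means label "y".
for i in range(40):
    clf.observe({"a"} if i % 2 == 0 else {"b"}, "x" if i % 2 == 0 else "y")
rebuilds_before_drift = clf.rebuilds
# The concept changes: the labels are swapped.
for i in range(40):
    clf.observe({"a"} if i % 2 == 0 else {"b"}, "y" if i % 2 == 0 else "x")
```

Because the window only keeps recent elements, a rebuild after the labels change relies on new data alone, which is one way to limit the influence of old data on the current result.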
Keywords/Search Tags: Data Stream, Data Mining, Association Rules, Classifying, Clustering