Theoretical Analysis And Algorithm Study On Improvement Of Finding Frequent Items In Data Streams

Posted on:2010-09-21

Degree:Master

Type:Thesis

Country:China

Candidate:X Y Wang

GTID:2178360278462173

Subject:Management Science and Engineering

Abstract/Summary:

Finding frequent items in data streams is a fundamental task in data stream mining with a wide range of applications, and it is gradually becoming one of the main focuses of data mining research. By studying the features of existing algorithms and theory, we identified a new way to further improve the performance of frequent item mining: trading the guarantee on the upper error bound for better accuracy of the monitored frequencies. More precisely, we accept a few false-negative errors in exchange for a large improvement. We built several models, proved a number of their properties, and designed corresponding algorithms. Among these, SS_Complete and SS_Complete_K show that the goal is attainable, SS_Random_r is completely randomized, and SS_COB_w, SS_lbCount_c and SS_lbCountV take the features of the data structure into account. We also constructed a series of new indexes to evaluate the performance of these algorithms at a finer level. Experiments show that, with the same number of counters, all the new algorithms outperform Space Saving, which has been shown to be one of the best and most effective methods. SS_Random_r and SS_COB_w even outperform Space Saving run with twice as many counters. At the same time, the new indexes show that the new methods usually come with slightly underestimated frequencies, which is what our theory predicts. Finally, we present new methods for calculating tight error bounds for Space Saving, which serve as an important complement to the original algorithm.
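
For readers unfamiliar with the baseline, the following is a minimal sketch of Space Saving as it is commonly described in the literature. It is not the thesis's code and it does not implement any of the SS_* variants named above. The function name `space_saving`, the parameter `k` (number of counters) and the dictionary-based layout are assumptions made for illustration.

```python
def space_saving(stream, k):
    """Baseline Space Saving with at most k counters (illustrative sketch).

    Returns two dicts keyed by monitored item:
      counts[item] - estimated frequency (never below the true frequency)
      errors[item] - maximum possible overestimation of that estimate
    """
    counts = {}
    errors = {}
    for item in stream:
        if item in counts:
            # Item is already monitored: increment its counter.
            counts[item] += 1
        elif len(counts) < k:
            # A free counter exists: start monitoring with an exact count.
            counts[item] = 1
            errors[item] = 0
        else:
            # No free counter: evict the item with the smallest count and
            # let the new item inherit that count plus one.
            victim = min(counts, key=counts.get)
            min_count = counts.pop(victim)
            errors.pop(victim)
            counts[item] = min_count + 1
            errors[item] = min_count
    return counts, errors


if __name__ == "__main__":
    counts, errors = space_saving(["a", "a", "b", "a", "c", "a", "b", "a"], k=2)
    print(counts)  # {'a': 5, 'b': 3}
    print(errors)  # {'a': 0, 'b': 2}
```

In this small run the true frequency of `b` is 2, while its estimate is 3 with a recorded error of 2. This illustrates why the baseline never underestimates a monitored item: an evicted counter's value is inherited rather than discarded. The abstract describes relaxing exactly this upper-bound guarantee in exchange for more accurate monitored frequencies.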

Keywords/Search Tags:

computing technique, data mining, data streams, frequent items

Related items

1	Research On Algorithms For Mining Frequent Patterns In Data Streams
2	Research On The Algorithm For Mining Frequent Items From Data Streams
3	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
4	The Research Of Frequent Itemsets Mining Algorithm Over Data Streams
5	Research On Sketch-Based Data Streams Mining Of Frequent Itemsets
6	Research And Application Of Mining Frequent Items Algorithm In Data Streams
7	Research On Frequent Items Mining Technology Based On Trajectory Data
8	Research On Count-based Algorithm For Mining Frequent Items Over Data Stream
9	Study On Frequent Data Mining From Uncertain Data Streams
10	Research On Frequent Pattern Mining Algorithms In Uncertain Data Streams