Font Size: a A A

Theoretical Analysis And Algorithm Study On Improvement Of Finding Frequent Items In Data Streams

Posted on:2010-09-21Degree:MasterType:Thesis
Country:ChinaCandidate:X Y WangFull Text:PDF
GTID:2178360278462173Subject:Management Science and Engineering
Abstract/Summary:PDF Full Text Request
Finding frequent items in data streams is a fundamental task within data stream mining, which has various applications. It's gradually becoming one of the main focuses in the field of data mining. We have discovered new possibility to further improve the performance of frequent item mining according to study on the features of existing algorithms and theory, which is to make exchange between the guarantee of upper error bound and accuracy of monitored frequency. Exactly speaking, we manage to exchange a few false-negative errors for a big improvement. We built several models, proved some detailed properties and designed corresponding algorithms, among which SS_Complete and SS_Complete_K prove our target to be tangible, SS_Random_r is completely randomized, while SS_COB_w, SS_lbCount_c and SS_lbCountV take the features of data structure into consideration. Besides, we constructed a series of new indexes in order to subtly valuate the performance of these new algorithms. Experiments show that using the same number of counters, all new algorithms outperform Space Saving which has been proved to be one of the best and most effective methods. SS_Random_r and SS_COB_w are even better than Space Saving with twice amount of counters. Simultaneously the new index shows that the new methods are usually companied by slightly underestimated frequency, which just proves our theory to be correct and successful. We also present new methods to calculate tight error bounds for Space Saving after deep study, which would serve as an important complement to the original algorithm.
Keywords/Search Tags:computing technique, data mining, data streams, frequent items
PDF Full Text Request
Related items