Research On Frequent Items Problem Using Lower Bound In Massive Data Stream

Posted on:2019-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:W W Tan

GTID:2428330545986962

Subject:Computer software and theory

Abstract/Summary:

The emergence and development of the Internet and wireless communication networks have made it possible to generate large amounts of detailed data automatically and continuously. As a result, a new kind of streaming data, different from traditional static data, has come into being. A data stream is continuous, fast, unbounded, and wide-area. An algorithm can process each data item only when it first arrives; the data cannot all be stored, so it is difficult to access an item again later. The essential goals of "datafication" are "intellectualization" and "informatization", so extracting implicit knowledge and information from various types of raw data is of great importance. Designing data mining algorithms for the data stream environment is therefore a challenging and rewarding task.

This thesis studies the mining of frequent items and frequent item sets in data streams. After studying and analyzing the classic counter-based algorithms "Frequent" and "Space Saving", it identifies the shortcomings of both and proposes an improved algorithm model. The core of the new algorithm is that, when counters are insufficient while mining frequent item sets from a data stream, the current "counter frequency" and "counter error" are taken into account at the same time. This distinguishes the new algorithm from "Frequent", which never maintains a counter error, and from "Space Saving", which never reduces a counter frequency. Using both frequency and error ensures that low-frequency counters in the counter sequence can always be released over an extended period of time. Because a data stream mining algorithm only needs to store high-frequency item sets accurately, the accuracy of the new algorithm is largely preserved. In theory, the global error of the new algorithm is smaller than that of "Frequent" and "Space Saving". Finally, experiments show that the improved algorithm has a smaller overall error rate.
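For readers unfamiliar with the two baseline algorithms named above, the following is a minimal Python sketch of their commonly described textbook forms. It is not the thesis's improved algorithm, whose exact update rule the abstract does not give. The function names `frequent` and `space_saving` and the parameter `k` are illustrative assumptions.

```python
def frequent(stream, k):
    """Textbook 'Frequent' (Misra-Gries) sketch: at most k - 1 counters.

    No per-counter error is stored; counts are only ever underestimates.
    """
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k - 1:
            counters[item] = 1
        else:
            # No free counter: decrement every counter and drop zeros.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def space_saving(stream, k):
    """Textbook 'Space Saving' sketch: k counters, each with an error bound.

    Returns item -> (count, error). Counts never decrease; when no counter
    is free, the minimum counter is reassigned to the new item.
    """
    counters = {}
    for item in stream:
        if item in counters:
            count, error = counters[item]
            counters[item] = (count + 1, error)
        elif len(counters) < k:
            counters[item] = (1, 0)
        else:
            # Evict the item with the smallest count; the new item inherits
            # that count (plus one) and records it as its possible error.
            victim = min(counters, key=lambda x: counters[x][0])
            min_count, _ = counters.pop(victim)
            counters[item] = (min_count + 1, min_count)
    return counters


stream = ["a", "b", "a", "c", "a", "d", "a"]
print(frequent(stream, 3))
print(space_saving(stream, 2))
```

The contrast the abstract draws is visible here: `frequent` lowers counts but keeps no error field, while `space_saving` keeps an error field but never lowers a count.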

Keywords/Search Tags:

Lower bounds of error, Frequent, Space Saving, Frequent Items, Data Stream Mining

Related items

1	Research On Algorithms For Mining Frequent Patterns In Data Streams
2	Research On Count-based Algorithm For Mining Frequent Items Over Data Stream
3	Algorithm Research Based On Counting For Mining Frequent Items Over Network Traffic Measurement
4	The Research Of Frequent Itemsets Mining Algorithm Over Data Streams
5	A Method Based On The Vertical Division Of Data Stream Frequent Itemset Mining Algorithm
6	Research On Frequent Items Mining And Clustering Algorithms Of Data Stream
7	Methods For Mining Frequent Items Over Data Stream Base On Time Windows
8	Study On Frequent Pattern Mining Algorithms And Pruning Strategies
9	Research On The Algorithm For Mining Frequent Items From Data Streams
10	Research And Application Of Mining Frequent Items Algorithm In Data Streams