Research On Frequent Items Problem Using Lower Bound In Massive Data Stream

Posted on: 2019-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: W W Tan
GTID: 2428330545986962
Subject: Computer software and theory
Abstract/Summary:
The emergence and development of the Internet and wireless communication networks make it possible to generate large amounts of detailed data automatically and continuously. As a result, streaming data, which differs from traditional static data, has come into being. A data stream is continuous, fast, unbounded, and widely distributed. An algorithm has only one opportunity to process each item, when it first arrives; it is difficult to access the data again later, and the data cannot all be stored. The essential goals of "datalization" are "intellectualization" and "informationization", so the central question is how to extract implicit knowledge and information from the many kinds of raw data. Designing data mining algorithms for the data stream environment is therefore a challenging and rewarding task.

This thesis focuses on mining frequent items and frequent itemsets in data streams. After studying and analyzing the classic counter-based algorithms "Frequent" and "Space Saving", it identifies their shortcomings and proposes innovations and an improved algorithm model. The core of the new algorithm is that, when mining frequent itemsets from a data stream and there are not enough counters, it takes into account both the current "counter frequency" and the current "counter error". Unlike "Frequent", which never maintains a counter error, and "Space Saving", which never reduces a counter's frequency, the new algorithm relies on both frequency and error to ensure that low-frequency counters in the counter sequence can always be released over an extended period. Because a data stream mining algorithm only needs to store the high-frequency itemsets accurately, the accuracy of the new algorithm is essentially guaranteed. Theoretically, the global error of the new algorithm is smaller than that of "Frequent" and "Space Saving". Finally, experiments show that the improved algorithm has a lower overall error rate.
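The two counter-based baselines the abstract contrasts can be illustrated with a minimal Python sketch of their textbook forms. It assumes "Frequent" is the Misra-Gries algorithm, as it is commonly identified in the literature, and shows the standard Space Saving algorithm. This is not the thesis's improved algorithm, whose details the abstract does not give; the function names `frequent` and `space_saving` and the toy stream are hypothetical choices for illustration.

```python
"""Textbook sketches of the "Frequent" (Misra-Gries) and "Space Saving"
frequent-item summaries, each using at most k counters.
Illustrative only; this is NOT the thesis's improved algorithm."""

from collections import Counter


def frequent(stream, k):
    """Misra-Gries "Frequent": returns {item: count}.

    Stored counts never exceed the true frequency and undercount it by at
    most n / (k + 1), where n is the stream length. No per-counter error
    is maintained.
    """
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement every counter, release those that
            # drop to zero, and discard the arriving item.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def space_saving(stream, k):
    """Space Saving: returns {item: (count, error)}.

    count never underestimates the true frequency, and count - error is a
    guaranteed lower bound on it. Counts are never decremented.
    """
    counters = {}  # item -> [count, error]
    for x in stream:
        if x in counters:
            counters[x][0] += 1
        elif len(counters) < k:
            counters[x] = [1, 0]
        else:
            # No free counter: evict an item with the minimum count; the
            # newcomer inherits that count, which is recorded as its error.
            victim = min(counters, key=lambda key: counters[key][0])
            min_count = counters.pop(victim)[0]
            counters[x] = [min_count + 1, min_count]
    return {key: (c, e) for key, (c, e) in counters.items()}


if __name__ == "__main__":
    stream = list("abacabadabacabae")  # hypothetical toy stream, n = 16
    print(Counter(stream))  # true frequencies: a=8, b=4, c=2, d=1, e=1
    print(frequent(stream, 3))  # {'a': 6, 'b': 2}
    ss = space_saving(stream, 3)
    print(ss)  # {'a': (8, 0), 'b': (4, 0), 'e': (4, 3)}
    print({key: c - e for key, (c, e) in ss.items()})  # {'a': 8, 'b': 4, 'e': 1}
```

On this toy stream, Frequent keeps `a` and `b` with underestimated counts, while Space Saving keeps `e` with an inflated count of 4 but an error of 3, so its lower bound `count - error = 1` is exact. This mirrors the contrast drawn above: Frequent releases counters by decrementing but tracks no error, whereas Space Saving tracks error but never decrements a count.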
Keywords/Search Tags: Lower bounds of error, Frequent, Space Saving, Frequent Items, Data Stream Mining