Mining Frequent Itemsets Of Data Streams Algorithm Research Based On Storm

Posted on: 2019-03-07
Degree: Master
Type: Thesis
Country: China
Candidate: C C Niu
Full Text: PDF
GTID: 2428330548970212
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of science and technology, the volume of data has grown to an unprecedented scale, even reaching the yottabyte (YB) level. At the same time, data streams have entered people's daily lives, for example real-time trading data in the stock market, real-time traffic monitoring data, and real-time access logs of website users. These data have characteristics that static data do not: they arrive continuously, without bound, at high speed, and they change over time. Moreover, data arriving in real time cannot all be stored. These features make it difficult to apply algorithms designed for static data mining. How to mine useful information from such large-scale real-time data is therefore an important question, and frequent itemset mining is an important research topic in data stream mining.

This thesis introduces several classical frequent itemset mining algorithms, distributed frequent itemset mining algorithms, and data stream frequent itemset mining algorithms. To find frequent itemsets in data streams more effectively, it presents a sliding-window-based algorithm for Distributed Mining Frequent Itemsets on Data Streams (DMFIDS). The algorithm is implemented on Storm, a real-time distributed data stream processing platform. Data are read through the message middleware Kafka to simulate a data stream. The frequent itemsets in the stream are then mined using a sliding window method based on byte sequences, and all frequent itemsets are obtained by merging the mining results.

The main work of this thesis is as follows:
(1) In the distributed mining stage, the frequent itemsets in the data stream are mined using a sliding window method based on byte sequences.
(2) In the aggregation stage, the frequent itemsets generated in the distributed mining stage are counted with an LTWT Tree (Trie with Logarithmic Tilted-Time Window), which reduces both the number of traversals and the space complexity. In the same stage, frequent itemsets are stored and counted using a logarithmic tilted-time window, which reduces the number of windows that need to be constructed. The mining results can also show how frequent itemsets differ between historical and current time periods, so the extracted frequent itemsets carry better temporal information.
(3) The proposed DMFIDS algorithm is implemented on the real-time distributed data stream processing platform Storm.
(4) Finally, a prototype system for mining frequent itemsets in data streams is built on Storm. Three important components of the system are introduced: the data collection module, the data processing module, and the data storage module.
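The logarithmic tilted-time window mentioned in the abstract can be illustrated with a small sketch. It is not the thesis's LTWT Tree implementation: the class name `LogTiltedWindow`, the two-slots-per-level merge rule, and the per-batch counts are assumptions chosen for illustration. The idea shown is that recent batches are kept at fine granularity while older batches are merged into windows that double in span, so roughly 2*log2(N) windows cover N batches.

```python
class LogTiltedWindow:
    """Illustrative logarithmic tilted-time window for one itemset's counts.

    Hypothetical sketch: levels[i] holds at most two counts, newest first,
    and each count at level i summarizes 2**i consecutive batches.
    """

    def __init__(self):
        self.levels = []

    def add(self, count):
        """Record the itemset's support count for the newest batch."""
        carry = count
        i = 0
        while carry is not None:
            if i == len(self.levels):
                self.levels.append([])
            level = self.levels[i]
            level.insert(0, carry)
            if len(level) > 2:
                # Level overflow: merge the two oldest counts into one
                # coarser window and push it to the next level.
                oldest = level.pop()
                second_oldest = level.pop()
                carry = oldest + second_oldest
            else:
                carry = None
            i += 1

    def windows(self):
        """Return (span_in_batches, count) pairs, newest window first."""
        return [(2 ** i, c)
                for i, level in enumerate(self.levels)
                for c in level]

    def total(self):
        """Total count over every batch seen so far."""
        return sum(c for _, c in self.windows())


if __name__ == "__main__":
    w = LogTiltedWindow()
    for _ in range(8):
        w.add(1)  # the itemset appears once in each of 8 batches
    print(w.windows())
```

In this sketch, merging never discards counts, so the total over all batches is preserved while the number of stored windows grows only logarithmically. Recent history stays visible per batch and older history only in aggregate, which matches the abstract's point about distinguishing historical and current time periods.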
Keywords/Search Tags:Data mining, Data streams, Frequent itemsets mining, Storm