Research On Large Data Streams Mining Technology Applied In Network Automation Management

Posted on: 2017-04-14
Degree: Master
Type: Thesis
Country: China
Candidate: S H Wu
Full Text: PDF
GTID: 2308330485498913
Subject: Software engineering
Abstract/Summary:
Frequent itemset mining is an active research direction in data mining. With the rapid development of the Internet, more and more data generated by real-life applications arrive in the form of streams, and real-time online mining of data streams has become a research hotspot in recent years. Current research on mining closed frequent itemsets in data streams is mainly based on three window models: the landmark window model, the damped window model and the sliding window model. This thesis studies the mining of closed frequent itemsets in data streams under the sliding window model. After an analysis of the important existing algorithms, it proposes the EMoment algorithm, which uses the sliding window model to mine closed frequent itemsets in data streams. In addition, because it is difficult to set a reasonable minimum support threshold when mining closed frequent itemsets, it presents the ETopK algorithm, which uses the sliding window model to mine Top-K closed frequent itemsets.

The main contents of this thesis are as follows:

(1) The forms that data streams take and the common processing models are introduced in detail. The important algorithms for mining closed frequent itemsets in data streams under the sliding window model are surveyed, and their advantages and disadvantages are analyzed, which provides the theoretical basis for this study.

(2) The Moment algorithm, which mines closed frequent itemsets in data streams under the sliding window model, has to add and delete transactions in two separate steps. The EMoment algorithm is therefore proposed to speed up window sliding by finding the relevance between the new itemsets and the oldest itemsets. The summary data structure used by the algorithm is also described in detail. Finally, the two algorithms are compared experimentally in terms of window sliding time and memory consumption. The experimental results show that EMoment performs better than Moment in both window sliding speed and memory consumption.

(3) Considering the difficulty of presetting the minimum support threshold, the ETopK algorithm is proposed, which uses the sliding window model to mine Top-K closed frequent itemsets. The main idea of the algorithm is described with a concrete example. The performance results show that ETopK mines Top-K closed frequent itemsets effectively and performs better in time cost and memory consumption than other existing algorithms.
Keywords/Search Tags: Data mining, Data stream, Sliding window, Closed frequent itemsets, Top-K closed itemsets
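
To make the terms in the abstract concrete, the following is a minimal brute-force sketch of closed frequent itemset mining and Top-K selection over a sliding window. It is not the EMoment or ETopK algorithm, whose data structures are not given in this record. The class and method names (`SlidingWindowMiner`, `closed_frequent`, `top_k_closed`) are hypothetical, and the sketch recounts every subset on each query, so it is only practical for short transactions.

```python
from collections import deque
from itertools import combinations


class SlidingWindowMiner:
    """Naive baseline (hypothetical helper, not EMoment/ETopK)."""

    def __init__(self, window_size):
        # A deque with maxlen evicts the oldest transaction automatically,
        # which models one slide of a fixed-size window.
        self.window = deque(maxlen=window_size)

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def support_counts(self):
        # Count every non-empty subset of every transaction in the window.
        counts = {}
        for transaction in self.window:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                for combo in combinations(items, size):
                    key = frozenset(combo)
                    counts[key] = counts.get(key, 0) + 1
        return counts

    def closed_frequent(self, min_support):
        counts = self.support_counts()
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        closed = {}
        for itemset, count in frequent.items():
            # An itemset is closed if no proper superset has the same support.
            has_equal_superset = any(
                itemset < other and count == other_count
                for other, other_count in frequent.items()
            )
            if not has_equal_superset:
                closed[itemset] = count
        return closed

    def top_k_closed(self, k):
        # No support threshold is needed: rank all closed itemsets by support
        # (ties broken alphabetically) and keep the first k.
        closed = self.closed_frequent(min_support=1)
        ranked = sorted(closed.items(), key=lambda kv: (-kv[1], sorted(kv[0])))
        return ranked[:k]


miner = SlidingWindowMiner(window_size=3)
for t in (["a", "b", "c"], ["a", "b"], ["a", "c"]):
    miner.add(t)
before_slide = miner.closed_frequent(min_support=2)
top_two = miner.top_k_closed(2)

miner.add(["b", "c"])  # window slides: the oldest transaction is dropped
after_slide = miner.closed_frequent(min_support=2)
```

In this sketch the Top-K variant replaces the minimum support threshold with a count `k`, which is the motivation the abstract gives for ETopK. An incremental algorithm would update a summary structure on each slide instead of recounting the whole window.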