Research On Frequent Pattern Mining Methods For Large-scale Data Stream

Posted on:2019-04-03

Degree:Master

Type:Thesis

Country:China

Candidate:X Fu

Full Text:PDF

GTID:2428330542994222

Subject:Computer software and theory

Abstract/Summary:


With the rapid growth of internet technology, massive data streams are being produced at unprecedented speed in many application fields, such as e-commerce, social networks, intelligent transportation, and sensor networks. Compared with traditional static, offline data mining, mining over data streams places new demands on the processing model, state management, and load balancing. The requirements include, but are not limited to, the following:

1. Scalability: the algorithm should adapt to fluctuations in data traffic and scale out automatically.
2. State management: stream information requires an efficient method for maintenance and update.
3. Load balance: the algorithm needs a load balancing strategy to keep its performance stable.

In this thesis, we study frequent pattern mining over massive data streams, an area in which existing methods have several shortcomings. We propose a new algorithm, Balanced Parallel Frequent Pattern Mining over Massive Data Stream (BPFPMS), to address these problems, and we run experiments to evaluate the improvement. Our main contributions are as follows:

1. To solve the problem of state management, we propose a DPTS-Tree model to represent a massive data stream, on which maintenance and update can be performed efficiently. Compared with traditional methods, DPST-Tree takes full advantage of historical information to reduce the cost of maintenance and update. Experiments show that it has an advantage in memory usage and state update.
2. To solve the problem of load imbalance that often occurs in traditional parallel frequent pattern mining algorithms, we propose a new dynamic load balancing strategy. Compared with traditional methods that set a static threshold, the dynamic strategy performs better over massive data streams because it takes full advantage of the load information within sub-tasks. It also copes with the uncertainty introduced by real-time changes in the state of the data stream. Experiments show that it achieves load balance to a certain extent and keeps the performance of BPFPMS stable in large-scale data stream scenarios.

We implement BPFPMS on the popular big data processing platform Spark and on Kafka, a distributed messaging system. We test its performance in terms of throughput, speedup, load balancing, and latency. Experiments show that BPFPMS achieves good results in state management and load balancing.
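The abstract does not spell out how the dynamic load balancing strategy works, so the following is only a minimal sketch of the general idea of reassigning sub-tasks from observed load rather than from a fixed threshold. It uses a standard greedy heuristic (heaviest sub-task first, always to the currently least-loaded worker) as a stand-in, not the BPFPMS strategy itself. The names `assign_groups`, `worker_totals`, and the example load figures are hypothetical.

```python
import heapq


def assign_groups(group_loads, num_workers):
    """Greedy assignment of sub-tasks (groups) to workers.

    group_loads maps a group id to its load observed in the most recent
    batch (an assumption: any non-negative cost estimate would do).
    Groups are placed heaviest first, each on the least-loaded worker.
    """
    # Min-heap of (accumulated load, worker id).
    heap = [(0, w) for w in range(num_workers)]
    heapq.heapify(heap)
    assignment = {}
    # Sort by descending load; ties are broken by group id for determinism.
    for group, load in sorted(group_loads.items(), key=lambda kv: (-kv[1], kv[0])):
        total, worker = heapq.heappop(heap)
        assignment[group] = worker
        heapq.heappush(heap, (total + load, worker))
    return assignment


def worker_totals(group_loads, assignment, num_workers):
    """Sum the load placed on each worker under a given assignment."""
    totals = [0] * num_workers
    for group, worker in assignment.items():
        totals[worker] += group_loads[group]
    return totals


# Hypothetical per-group loads measured in one batch of the stream.
loads = {"a": 8, "b": 7, "c": 6, "d": 5, "e": 4}
plan = assign_groups(loads, num_workers=2)
totals = worker_totals(loads, plan, num_workers=2)
```

In a streaming setting, such a function would be called again whenever the measured loads drift, which is what distinguishes it from a partitioning fixed once by a static threshold.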

Keywords/Search Tags:

analysis of data stream, frequent pattern mining, state management, load balance, Kafka&Spark Streaming


Related items

1	Research And Implementation Of Sequential Pattern Mining Algorithm Over Data Streams Based On Spark Streaming
2	Design And Implementation Of Kafka-based Full-Link Stream Data Processing Platform
3	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
4	The Study On Frequent Patterns Mining And Data Predicting Over Data Streams
5	Design And Implementation Of Log Stream Analysis Of Computer Room Security Equipment Based On Spark On Yarn
6	A Multi-flow Streaming Data Frequent Pattern Mining Adaptive Algorithm
7	Research On Frequent Pattern Mining Algorithm Oriented To Data Stream
8	Frequent Pattern Mining Algorithm Research For Data Stream
9	Research On The Algorithm Of Data Stream Frequent Itemsets Mining
10	Research On Key Algorithms For Mining Frequent Patterns In Data Streams And Their Application In Simulation System