
A Multi-flow Streaming Data Frequent Pattern Mining Adaptive Algorithm

Posted on: 2018-10-09
Degree: Master
Type: Thesis
Country: China
Candidate: F Feng
GTID: 2348330563452491
Subject: Master of Engineering / Software Engineering
Abstract/Summary:
With the development and application of Internet and communication technologies, more and more data are being accumulated in economics, the natural sciences, and engineering. One important kind of data is time series data, which captures how data relate to one another over time. Frequent pattern mining on time series data can reveal periodic changes and frequently recurring patterns in temporal order, helping decision makers reach better-informed decisions. In practice, many fields continuously generate large volumes of time series data that are difficult to store, for example interaction data on networks, real-time stock exchange data, and real-time satellite communication data. Such dynamic, fast, and massive data are known as streaming data. Although frequent pattern mining over static data is relatively mature, frequent pattern mining over streaming data is still at an immature stage. Because streaming data can be scanned only once and cannot be stored in full, traditional serial frequent pattern mining algorithms struggle to meet the requirements of stream processing. Improving algorithm efficiency so that frequent pattern mining can be applied to streaming data has therefore become an active research direction.

Based on an analysis of existing time series data mining algorithms, this thesis proposes Parallel-Pisa, a parallel algorithm for mining frequent patterns over multiple data streams. The original algorithm processes time series data serially, so its efficiency is low and it cannot meet the requirements of stream processing. This thesis therefore optimizes the algorithm with a parallel processing mechanism, using multi-core resources to process streaming data and thereby raising efficiency so that the algorithm is better suited to frequent pattern mining over streams.

Because streaming data are unpredictable, their arrival rate may vary greatly between time periods. If the stream grows explosively and the algorithm does not increase the resources it uses, the system may become overloaded and unable to handle the large volume of incoming data. If the stream returns to a lower rate and resource usage is not adjusted, the algorithm may hold too many resources for a long time, wasting system resources. To address this problem, this thesis studies three aspects, namely the Parallel-Pisa data structure, the internal parallel principle of Parallel-Pisa, and the characteristics of the data streams, and then designs and implements an adaptive parallel strategy for Parallel-Pisa. With this strategy, the algorithm can adjust its own parallel mode while it runs, which makes stream processing more stable and improves the utilization of system resources. Finally, Parallel-Pisa and its adaptive strategy are integrated into a system, which is used to test the performance of both. The experimental results show that Parallel-Pisa achieves the expected optimization effect compared with other algorithms, and that the adaptive strategy is usable to a certain degree.
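The idea of scaling parallelism with the stream's arrival rate can be illustrated with a small sketch. It is not Parallel-Pisa and does not reproduce its data structure or parallel principle; it is a minimal example under stated assumptions. It assumes a hypothetical per-worker throughput (`per_worker_rate`), sizes a worker pool from the observed arrival rate, and applies a hysteresis margin (`tolerance`) so small fluctuations do not trigger constant resizing. As a stand-in for sequential pattern mining, it counts ordered item pairs across a batch split among workers. All names (`AdaptiveParallelism`, `mine_batch`, `count_chunk`, `count_ordered_pairs`) are hypothetical.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


class AdaptiveParallelism:
    """Hypothetical controller: sizes a worker pool from the observed arrival rate."""

    def __init__(self, per_worker_rate, min_workers=1, max_workers=8, tolerance=2):
        self.per_worker_rate = per_worker_rate  # assumed items/second one worker can sustain
        self.min_workers = min_workers
        self.max_workers = max_workers
        self.tolerance = tolerance  # hysteresis: ignore target changes smaller than this
        self.workers = min_workers

    def update(self, arrival_rate):
        # Workers needed to keep up with the current rate, clamped to the allowed range.
        target = math.ceil(arrival_rate / self.per_worker_rate)
        target = max(self.min_workers, min(self.max_workers, target))
        # Scale up on bursts and down when the stream slows, but only for large enough changes.
        if abs(target - self.workers) >= self.tolerance:
            self.workers = target
        return self.workers


def count_ordered_pairs(sequence):
    """Stand-in pattern counter: each ordered pair (a before b) counts once per sequence."""
    pairs = {(sequence[i], sequence[j]) for i, j in combinations(range(len(sequence)), 2)}
    return Counter(pairs)


def count_chunk(chunk):
    """Partial support counts for one worker's share of the batch."""
    total = Counter()
    for sequence in chunk:
        total.update(count_ordered_pairs(sequence))
    return total


def mine_batch(batch, workers, min_support):
    """Split a batch across `workers` threads, merge partial counts, keep frequent pairs."""
    chunks = [batch[k::workers] for k in range(workers)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_chunk, chunks):
            total.update(partial)
    return {pair: n for pair, n in total.items() if n >= min_support}


if __name__ == "__main__":
    controller = AdaptiveParallelism(per_worker_rate=1000, max_workers=8)
    batch = [["a", "b", "c"], ["a", "c"], ["b", "c"]]
    for rate in [500, 4200, 4800, 900]:  # simulated arrival rates (items/second)
        workers = controller.update(rate)
        print(rate, workers, mine_batch(batch, workers, min_support=2))
```

With these assumed numbers the worker count goes 1, 5, 5, 1 as the rate rises and falls, while the frequent pairs found, `('a', 'c')` and `('b', 'c')` with support 2, do not depend on how many workers share the batch. `ThreadPoolExecutor` keeps the sketch short; in CPython, CPU-bound counting would need `ProcessPoolExecutor` (which can use the top-level `count_chunk` unchanged) to actually spread work across cores.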
Keywords: Frequent sequential pattern mining, Stream data, Parallel processing, Self-adaptation