A Batch-based Algorithm For Mining Weighted Sequential Patterns In Data Streams

Posted on: 2014-05-30
Degree: Master
Type: Thesis
Country: China
Candidate: Y B Li
Full Text: PDF
GTID: 2268330422466716
Subject: Computer system architecture
Abstract/Summary:
Sequential pattern mining is widely applied to real-world sequence data sets, and traditional sequential pattern mining has produced many fruitful results in recent years. With the continuous development of information technology and networks, the scale of data is growing rapidly. At the same time, a new type of data has emerged that differs from traditional data types: the data stream, a continuous, large-volume sequence of data. Because of the characteristics of data streams, many traditional data mining algorithms are not suitable for them. In this thesis, based on dividing the data stream into batches, we have completed the following tasks.

Firstly, we present the latest research on sequential pattern mining in data streams. Because most existing work uses a sliding window and does not consider item weights, we improve the lexicographic tree and propose the idea of dividing the data stream into batches.

Secondly, because the data stream differs from traditional data types, we mine its sequential patterns using the divide-and-conquer property. We also assign different weights to the items in the data stream and propose a new mining algorithm, WSPD. In this algorithm, the data stream is divided into batches of the same size, and item weights are taken into account. The algorithm outputs accurate mining results quickly, and it can also satisfy a user's dynamic requests to mine weighted sequential patterns from any batches.

Finally, to take advantage of hardware resources, we propose a parallel algorithm, P-WSPD, based on the WSPD algorithm above. P-WSPD improves the efficiency of WSPD and the speed of data processing.

The experiments are implemented in Java using MyEclipse 8.5 and performed on a number of synthetic data sets. The experimental results show that the proposed algorithms solve their problems efficiently: the running time of the two algorithms is lower than that of the other algorithms compared, and the accuracy of the results is improved to some extent. Our work achieves the goals set beforehand.
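
The batch-and-weight idea in the abstract can be illustrated with a small Java sketch. It is not the WSPD algorithm itself, whose details the abstract does not give. The class name, the method names, and in particular the weighted-support formula (mean item weight of the pattern multiplied by the number of sequences in the batch that contain it) are assumptions chosen for illustration only, and each sequence is simplified to a list of single items.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: all names and the weighted-support formula are
// assumptions, not taken from the thesis.
class BatchWeightedSupportSketch {

    // Split the stream into consecutive batches of batchSize elements
    // (the last batch may be shorter).
    static <T> List<List<T>> toBatches(List<T> stream, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int i = 0; i < stream.size(); i += batchSize) {
            int end = Math.min(i + batchSize, stream.size());
            batches.add(new ArrayList<>(stream.subList(i, end)));
        }
        return batches;
    }

    // True if pattern occurs in sequence as an order-preserving,
    // not necessarily contiguous, subsequence.
    static boolean isSubsequence(List<String> pattern, List<String> sequence) {
        int matched = 0;
        for (String item : sequence) {
            if (matched < pattern.size() && pattern.get(matched).equals(item)) {
                matched++;
            }
        }
        return matched == pattern.size();
    }

    // Assumed definition: mean weight of the pattern's items multiplied by
    // the number of sequences in the batch that contain the pattern.
    // Items without a weight entry count as weight 0.
    static double weightedSupport(List<String> pattern,
                                  List<List<String>> batch,
                                  Map<String, Double> weights) {
        if (pattern.isEmpty()) {
            return 0.0;
        }
        double weightSum = 0.0;
        for (String item : pattern) {
            weightSum += weights.getOrDefault(item, 0.0);
        }
        int count = 0;
        for (List<String> sequence : batch) {
            if (isSubsequence(pattern, sequence)) {
                count++;
            }
        }
        return (weightSum / pattern.size()) * count;
    }
}
```

Under these assumptions, a pattern would be reported for a batch when its weighted support reaches a user-chosen minimum weighted support. Because each batch is evaluated independently, a request for "any batches" reduces to selecting the corresponding entries of the list returned by `toBatches`.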
Keywords/Search Tags: data stream, weighted sequential patterns, closed sequential pattern, parallel computing, minimum weighted support