Font Size: a A A

Research And Implementation Of Frequent Pattern Mining Algorithms Over Data Streams

Posted on:2014-01-04Degree:MasterType:Thesis
Country:ChinaCandidate:L ChangFull Text:PDF
GTID:2248330395998124Subject:Software engineering
Abstract/Summary:PDF Full Text Request
With the large-scale application of database management system,many fields have accumulated mass data. The requirement of how toeffectively use these data promotes the appearance and rapiddevelopment of Data Mining technology. However, along with the rise ofsome fields, such as computer network monitoring, weather monitoring,financial quotations and sensor network, a new data processing modelwhich is called data streams is proposed. The data in this data processingmodel continuously arrive at a high speed, and the mining algorithm candeal with the data only once, so the design of frequent itemsets miningalgorithm in the environment of data streams is a challenging task. The limitlessness and high-speed of data streams determine themining algorithm which must be approximate. Specific to the problem ofhow to design a effective synopsis data structure, this thesis analyzes theclassical frequent pattern tree, FP-tree structure, combines the improvedsliding window model which updates with basic window, designs atwo-dimensional array structure to store the frequent itemsets, andproposes the special assignment method of this two-dimensional array.Based on the proposed two-dimensional array structure, this thesisdesigns a frequent itemsets mining algorithm, MFIBA(Mining FrequentItemsets based on Bitwise AND) algorithm, which moves towards datastreams. This algorithm is a batch processing program, first of all, it setsup a fixed-size sliding window in the arriving data streams, then dividesthe sliding window into some basic windows with the same width, andfinally utilizes the two-dimensional array structure to store the frequent itemsets information of every basic window. When there are user requests,the mining algorithm generates all the frequent itemsets by implementingbitwise AND operation between every two lines of the array, andcalculates the support counts of every frequent itemsets according to thevalue stored in the array. As an approximate mining algorithm, MFIBAalgorithm introduces a permitted error parameter, so it can effectivelydelete the infrequent itemsets, and save the memory resources.By comparing the performance of MFIBA algorithm with the Apriorialgorithm, which is a classical algorithm of data mining, it is obvious thatthe algorithm proposed in this thesis is better than Apriori algorithminterms of running time and memory consumption, so it is applicable tothe data streams mining.
Keywords/Search Tags:data streams mining, sliding window, synopsis datastructure, bitwise operation
PDF Full Text Request
Related items