Research And Implementation Of Frequent Pattern Mining Algorithms Over Data Streams

Posted on:2014-01-04

Degree:Master

Type:Thesis

Country:China

Candidate:L Chang

Full Text:PDF

GTID:2248330395998124

Subject:Software engineering

Abstract/Summary:

PDF Full Text Request

With the large-scale application of database management system,many fields have accumulated mass data. The requirement of how toeffectively use these data promotes the appearance and rapiddevelopment of Data Mining technology. However, along with the rise ofsome fields, such as computer network monitoring, weather monitoring,financial quotations and sensor network, a new data processing modelwhich is called data streams is proposed. The data in this data processingmodel continuously arrive at a high speed, and the mining algorithm candeal with the data only once, so the design of frequent itemsets miningalgorithm in the environment of data streams is a challenging task. The limitlessness and high-speed of data streams determine themining algorithm which must be approximate. Specific to the problem ofhow to design a effective synopsis data structure, this thesis analyzes theclassical frequent pattern tree, FP-tree structure, combines the improvedsliding window model which updates with basic window, designs atwo-dimensional array structure to store the frequent itemsets, andproposes the special assignment method of this two-dimensional array.Based on the proposed two-dimensional array structure, this thesisdesigns a frequent itemsets mining algorithm, MFIBA(Mining FrequentItemsets based on Bitwise AND) algorithm, which moves towards datastreams. This algorithm is a batch processing program, first of all, it setsup a fixed-size sliding window in the arriving data streams, then dividesthe sliding window into some basic windows with the same width, andfinally utilizes the two-dimensional array structure to store the frequent itemsets information of every basic window. When there are user requests,the mining algorithm generates all the frequent itemsets by implementingbitwise AND operation between every two lines of the array, andcalculates the support counts of every frequent itemsets according to thevalue stored in the array. As an approximate mining algorithm, MFIBAalgorithm introduces a permitted error parameter, so it can effectivelydelete the infrequent itemsets, and save the memory resources.By comparing the performance of MFIBA algorithm with the Apriorialgorithm, which is a classical algorithm of data mining, it is obvious thatthe algorithm proposed in this thesis is better than Apriori algorithminterms of running time and memory consumption, so it is applicable tothe data streams mining.

Keywords/Search Tags:

data streams mining, sliding window, synopsis datastructure, bitwise operation

PDF Full Text Request

Related items

1	Optimal Data Streams Clustering Algorithm Based On N-δ Sliding Window Model
2	The Research On The Related Problems Of Association Rule Mining Over Data Streams
3	Research On Mining Algorithms Over Data Streams
4	Research On Algorithm For Mining Frequent Closed Itemsets Over Data Streams
5	The Research On The Algorithm Of Mining Frequent Patterns Over Data Streams
6	The Research And Realization Of Clustering Algorithm In Data Streams Mining
7	Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams
8	Research On Technique And Application Of Mining Data Streams
9	Research On Algorithm For Mining Frequent Patterns Of Data Streams
10	Research On Algorithms For Mining Top-K Frequent Patterns Over Data Streams