A Schema Feature Based Frequent Pattern Mining Algorithm For Semi-structured Data Stream

Posted on:2018-11-19

Degree:Master

Type:Thesis

Country:China

Candidate:W Q Fu

Full Text:PDF

GTID:2348330563452503

Subject:Master of Engineering / Software Engineering

Abstract/Summary:

With the development of information technology, massive amounts of data are constantly being generated, and analyzing this data is no longer a task that can be completed manually. To solve this problem, data mining technology was proposed to discover useful information in massive data. Frequent pattern mining is an important task in data mining. A frequent pattern is a data fragment that is repeated in the data, and frequent pattern mining is the task of finding these patterns in massive data. Within this field, research on frequent pattern mining for semi-structured data has made some progress, and frequent pattern mining for data streams has also received considerable attention. However, only a few studies address data that is both semi-structured and streaming. Therefore, how to mine frequent patterns from semi-structured data streams efficiently and accurately is the focus of this thesis. A semi-structured data stream is real-time, ordered, infinite, and continuous, and it also has a tree structure. This thesis proposes a mining model based on time windows for semi-structured data streams. The model first serializes and segments the stream, then mines each segment with the SPrefixTreeISpan algorithm proposed in this thesis. Finally, all mining results are maintained in a structure called patternTree. To solve the problem of incorrect mining results caused by segmentation, this thesis proposes a structure called checkStack and an accompanying mining strategy. XML data streams are used as the mining object. Since there is usually a Schema document describing the structure of the XML data, analyzing the Schema makes it possible to extract the inevitable parent-child relationships and the inevitable child-parent relationships, which are then used to optimize the SPrefixTreeISpan algorithm. Experiments show that the algorithm has better performance and that the optimization strategy based on Schema features is effective.
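
The windowed mining model described in the abstract can be illustrated with a minimal sketch. This is not the SPrefixTreeISpan algorithm, patternTree, or checkStack, none of which are specified in the abstract. It makes several simplifying assumptions: windows are count-based rather than time-based, each stream item is a complete XML document (so no pattern is cut by a segment boundary), a pattern is a single parent-child tag pair, and a plain counter stands in for the structure that maintains results. The names `edge_patterns` and `mine_windows` are hypothetical.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def edge_patterns(xml_text):
    """Return the set of (parent_tag, child_tag) pairs found in one XML document."""
    root = ET.fromstring(xml_text)
    edges = set()
    stack = [root]
    while stack:
        node = stack.pop()
        for child in node:
            edges.add((node.tag, child.tag))
            stack.append(child)
    return edges


def _mine_segment(segment, min_support, totals):
    """Count edge patterns in one segment and keep those meeting min_support."""
    counts = Counter()
    for doc in segment:
        # Each document contributes at most 1 to the support of a pattern.
        counts.update(edge_patterns(doc))
    frequent = {edge: n for edge, n in counts.items() if n >= min_support}
    # Assumption: a Counter stands in for the result-maintenance structure.
    totals.update(frequent)
    return frequent


def mine_windows(stream, window_size, min_support):
    """Split a stream of XML documents into fixed-size segments and mine each one.

    Returns (per_window, totals): the frequent patterns of every segment and
    the accumulated support of patterns that were frequent in some segment.
    """
    totals = Counter()
    per_window = []
    segment = []
    for doc in stream:
        segment.append(doc)
        if len(segment) == window_size:
            per_window.append(_mine_segment(segment, min_support, totals))
            segment = []
    if segment:
        # Mine the final, possibly shorter, segment.
        per_window.append(_mine_segment(segment, min_support, totals))
    return per_window, totals


docs = ["<a><b/><c/></a>", "<a><b/></a>", "<a><c><d/></c></a>", "<a><b/></a>"]
per_window, totals = mine_windows(iter(docs), window_size=2, min_support=2)
```

With these four sample documents, only the pair `("a", "b")` reaches the support threshold, and only in the first segment. A real stream would be segmented at arbitrary points, which is the situation the thesis's checkStack structure is said to handle and which this sketch deliberately avoids.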

Keywords/Search Tags:

Frequent Pattern Mining, Semi-Structured Data Stream, Schema Feature

Related items

1	A Real-time Frequent Pattern Mining Algorithm For Semi Structured Data Streams
2	Study Of Mining Data Streams Based On Semi-Structured Data
3	Study On Semi-structured Data Mining
4	Research On Related Technology Of Frequent Pattern Mining For Semi-structured Data
5	Research On Frequent Pattern Mining In XML
6	Uncertain Data Frequent Pattern Mining Algorithms
7	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
8	The Study On Frequent Patterns Mining And Data Predicting Over Data Streams
9	Research On The Data Model And The Approaches To Data Mining In The Semi-structured Data
10	Research Of Schema Extraction Algorithm Of Semi-structured Data Based On OEM Model