
A Real-time Frequent Pattern Mining Algorithm For Semi Structured Data Streams

Posted on: 2018-09-29
Degree: Master
Type: Thesis
Country: China
Candidate: Z Q Tong
GTID: 2348330563952318
Subject: Software engineering
Abstract/Summary:
With the advent of the big data era, semi-structured data such as graph data, tree data, and sequence data is widely used in semantic networks, social relationship analysis, and macromolecular information mining. Mining the relationships contained in this data is the key task of semi-structured data mining. However, semi-structured data has a complex structure and is difficult to store, which makes it hard to mine. Meanwhile, data streams are becoming increasingly common: streams from real-time systems such as social networks, financial management, and information monitoring play an ever larger role in daily life. These streams arrive in large volumes and at high speed, users usually care most about the current data, and they place high demands on timeliness. Efficiently extracting the information people want, in real time, from massive and complex stream data is a problem that big data technology must solve.

As big data technology develops, application requirements grow more complicated and more semi-structured data streams need to be processed. However, existing mining methods for semi-structured data cannot meet the requirements of stream processing. To address this, this thesis adapts existing semi-structured data mining methods and proposes an algorithm for frequent pattern mining on semi-structured data streams based on a time degradation (decay) model and a batch update mode. The time degradation model increases the weight of new data and reduces the weight of historical data, which limits the influence of expired data on the mining process and yields real-time results. The batch update mode retains the useful information from data that has already been mined, which avoids the repeated scans and reprocessing of the database required by traditional data stream mining algorithms and so prevents wasted resources. Comparative experiments indicate that the algorithm implements semi-structured data stream mining, effectively processes complex, high-speed, massive semi-structured data streams, and reduces the influence of historical data on obtaining frequent patterns in real time, thus meeting current requirements for semi-structured data stream mining.
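The abstract does not give the decay function or the data structure behind the batch update mode, so the following Python sketch only illustrates the two ideas under stated assumptions; it is not the thesis's algorithm. It assumes a hypothetical per-batch exponential decay factor `decay`, and it assumes each transaction has already been reduced to the set of candidate patterns it contains (for tree data, for example, canonical subtree strings); pattern enumeration itself is omitted. `DecayedPatternCounter`, `update`, and `frequent` are hypothetical names.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, List, Set


class DecayedPatternCounter:
    """Hypothetical sketch: time-decayed pattern support, updated one batch at a time."""

    def __init__(self, decay: float = 0.9, prune_below: float = 1e-3) -> None:
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay              # assumed per-batch decay factor d
        self.prune_below = prune_below  # assumed cutoff for forgetting faded patterns
        self.weights: Dict[Hashable, float] = {}
        self.total = 0.0                # decayed count of transactions seen so far

    def update(self, batch: Iterable[Set[Hashable]]) -> None:
        """Fold one batch into the retained summary; old batches are never rescanned."""
        # 1. Age the history: every stored weight (and the total) is multiplied by d.
        for pattern in self.weights:
            self.weights[pattern] *= self.decay
        self.total *= self.decay
        # 2. Count patterns in the new batch at full weight (1 per transaction).
        counts: Counter = Counter()
        n = 0
        for transaction in batch:
            counts.update(transaction)
            n += 1
        for pattern, c in counts.items():
            self.weights[pattern] = self.weights.get(pattern, 0.0) + c
        self.total += n
        # 3. Drop patterns whose decayed weight has become negligible.
        self.weights = {p: w for p, w in self.weights.items() if w >= self.prune_below}

    def frequent(self, min_support: float) -> List[Hashable]:
        """Patterns whose decayed relative support is at least min_support."""
        if self.total == 0:
            return []
        hits = (p for p, w in self.weights.items() if w / self.total >= min_support)
        return sorted(hits, key=str)


if __name__ == "__main__":
    counter = DecayedPatternCounter(decay=0.5)
    counter.update([{"A", "A-B"}, {"A"}])  # older batch
    counter.update([{"C"}, {"C"}, {"A"}])  # newer batch
    print(counter.frequent(0.5))  # ['A', 'C']
```

With `decay=1.0` (no decay) the same two batches report only `['A']`, because `C` occurs only in the newest batch; the decay lets recently frequent patterns surface without rescanning earlier batches.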
Keywords/Search Tags: semi-structured data, data stream, frequent pattern mining