Research On Frequent Item Mining And Correlation Analysis In Data Streams

Posted on:2018-04-23

Degree:Doctor

Type:Dissertation

Country:China

Candidate:S S Wu

Full Text:PDF

GTID:1318330518473525

Subject:Computer Science and Technology

Abstract/Summary:

Data stream applications first appeared in the financial field (e.g., traditional banks and stock exchanges) and then spread to geological surveying, meteorology, astronomical observation, traffic, medical treatment, and other areas. With the emergence of the Internet (real-time network monitoring, click streams) and wireless communication networks (call logs), analyzing and mining data streams has become a necessity. For example, frequent item mining and correlation analysis over data streams can be applied to smart healthcare and to detecting suspicious behavior. Both tasks are therefore valuable in their own right, and they also serve as basic building blocks for other data stream mining techniques. Data mining techniques that have been applied to data streams include frequent item (itemset) mining, correlation analysis, clustering, classification, and sequential pattern analysis.

Any data stream mining algorithm must solve two problems. The first is query response time, i.e., how to process data in real time so as to keep up with the arrival rate of the stream; technically, this requires proposing new data structures and pruning strategies or improving existing ones. The second is how to compress storage space; technically, this requires a sketch structure that uses little memory and provides approximate results.

Following this analysis, this thesis addresses query response time and storage compression for the frequent item mining and correlation analysis problems in data streams. Building on existing data stream mining technologies, it aims to develop data structures and sketch structures that process data efficiently and improve mining accuracy. The main contributions are as follows:

Finding frequent items in time-decayed data streams. This problem is revisited under a new streaming model based on time decay, in which the importance of each arriving item decreases over time. To handle these changes in importance, an innovative heap structure that maintains the item order is designed to improve the efficiency of frequent item mining. To estimate frequencies more accurately, the thesis studies a new sketch structure that can estimate the count of an item with almost no error, improving the accuracy of frequent item mining.

Finding the hottest item in a data stream. A wide variety of query requirements, such as monitoring peak sales records, cannot be met by existing algorithms. The thesis therefore explores a new data stream mining problem: the hottest item. To discover the hottest item, it proposes an algorithm with an efficient data structure and several pruning strategies that progressively reduce the search space.

Ranking lag correlations with flexible sliding windows. Existing work on lag correlation analysis focuses on two aspects: computing lag correlations over the entire data stream and setting a proper sliding window length. However, the sliding window length is hard to set, since it depends on the characteristics of the data streams, the applications, the time, and the queries. The thesis therefore analyzes lag correlations computed over flexible sliding windows and, to speed up the computation, employs an efficient data structure to facilitate query processing.

In summary, this thesis studies the counting problem (mining frequent items in data streams), the frequency problem (finding the hottest items in data streams), and the lag correlation of data streams (ranking lag correlations with flexible sliding windows). This research is only a preliminary attempt and exploration, and many questions remain to be explored, for example data stream mining with a changing rate and data stream processing with Hadoop or Spark.
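The time-decay model described in the abstract can be illustrated with a small sketch. The abstract does not specify the decay function, the heap structure, or the sketch structure used in the thesis, so the code below is only an assumption-laden illustration: it assumes exponential decay, keeps one exact counter per item (so it is not memory-bounded), and uses hypothetical names (`DecayedCounter`, `decay_rate`, `phi`) that do not come from the thesis.

```python
import math


class DecayedCounter:
    """Exact per-item counts under exponential time decay.

    Illustrative only: an arrival of age `a` contributes exp(-decay_rate * a).
    This is NOT the heap or sketch structure proposed in the thesis.
    """

    def __init__(self, decay_rate):
        self.decay_rate = decay_rate  # hypothetical parameter (lambda)
        self.counts = {}              # item -> (decayed count, time of last update)
        self.total = (0.0, 0.0)       # decayed size of the whole stream, same layout

    def _decay(self, entry, now):
        # Age a stored value from its last update time to `now`.
        value, last = entry
        return value * math.exp(-self.decay_rate * (now - last))

    def add(self, item, now):
        # Decay the old value lazily, then add weight 1 for the new arrival.
        old = self.counts.get(item, (0.0, now))
        self.counts[item] = (self._decay(old, now) + 1.0, now)
        self.total = (self._decay(self.total, now) + 1.0, now)

    def count(self, item, now):
        entry = self.counts.get(item)
        return 0.0 if entry is None else self._decay(entry, now)

    def frequent_items(self, phi, now):
        # Items whose decayed count is at least a fraction `phi`
        # of the decayed stream size at time `now`.
        threshold = phi * self._decay(self.total, now)
        return sorted(i for i in self.counts if self.count(i, now) >= threshold)
```

As a usage example, with `decay_rate = math.log(2)` (a half-life of one time unit), adding "a" twice at time 0 and "b" once at time 2 gives "a" a decayed count of about 0.5 and "b" a count of 1.0 at time 2, so `frequent_items(0.5, 2)` reports only "b" even though "a" has more raw arrivals. A memory-bounded version would replace the dictionary with a sketch, which is the direction the abstract describes.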

Keywords/Search Tags:

Data Stream Mining, Frequent Item, Frequency Calculation, Lag Correlation, Flexible Sliding Window

Related items

1	Mining Frequent Itemsets Over Recent Data Stream
2	Study On Probabilistic Frequent Pattern Mining Over Uncertain Data Stream
3	Research On Multi-stream Frequent Item Set Mining Algorithm
4	Research On The Algorithm For Mining Frequent Items From Data Streams
5	Frequent Itemsets Mining Algorithm And Its Application In Data Flow
6	Study On Key Technologies Of Frequent Items Mining And Clustering On Data Streams
7	Research On Frequent Patterns Mining Algorithm Based Sliding Window In Data Streams
8	Research On Optimization Of Data Stream Frequent Itemsets Mining Algorithm Based On Sliding Window
9	Research On Frequent Pattern Mining Algorithm Of Data Stream Based On Sliding Window
10	Research On Frequent Pattern Mining Algorithm Oriented To Data Stream