
Research On Frequent Item Mining And Correlation Analysis In Data Streams

Posted on: 2018-04-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S S Wu
GTID: 1318330518473525
Subject: Computer Science and Technology
Abstract/Summary:
Data stream applications first appeared in the financial sector (e.g., traditional banks and stock exchanges) and later in geological surveying, meteorology, astronomical observation, traffic, medical treatment, and other fields. With the emergence of the Internet (real-time network monitoring, click streams) and wireless communication networks (call logs), analyzing and mining data streams has become necessary. For example, frequent item mining and correlation analysis on data streams can be applied to smart healthcare and to detecting suspicious behavior. Hence, mining frequent items and analyzing correlations in data streams is valuable work. Moreover, both serve as an important foundation for other data stream mining techniques.

Many data mining techniques have been adapted to data streams, including frequent item (itemset) mining, correlation analysis, clustering, classification, and sequential pattern analysis. Any data stream mining algorithm must solve two problems. The first is query response time, i.e., how to process data in real time so as to keep up with the arrival rate of the stream. At the technical level, this requires proposing a new data structure or improving an existing one, together with pruning strategies. The second is how to compress storage space. At the technical level, this requires a sketch structure that uses little memory and returns approximate results.

Following this analysis, this thesis addresses query response time and storage compression in the frequent item mining and correlation analysis problems on data streams. Building on existing data stream mining technologies, it aims to develop data structures and sketch structures that process the data efficiently and improve mining accuracy. The main contributions are as follows:

Finding frequent items in time-decayed data streams. The frequent item problem is revisited under a streaming model based on time decay, in which the importance of every arriving item decreases over time. To handle this change in importance over time, an innovative heap structure that maintains the order of items is designed, which improves the efficiency of frequent item mining. To achieve better accuracy in frequency estimation, this thesis studies a new sketch structure that can estimate the count of an item with almost no error, which improves the accuracy of frequent item mining.

Finding the hottest item in a data stream. Applications pose a wide variety of query requirements, such as monitoring peak sales records, and existing algorithms cannot be applied to these new requirements. Hence, this thesis explores a new data stream mining problem: finding the hottest item. Discovering the hottest item requires an algorithm with an efficient data structure and several pruning strategies that progressively reduce the search space.

Ranking lag correlations with flexible sliding windows. Existing work on lag correlation analysis focuses on two aspects: computing lag correlations over the entire data stream, and setting a proper sliding window length. However, the sliding window length is hard to set, because it depends on the characteristics of the data streams, the application, the time, and the queries. Hence, this thesis analyzes lag correlations computed over flexible sliding windows. To speed up the computation, it employs an efficient data structure to facilitate query processing.

In summary, this thesis studies the counting problem (mining frequent items in data streams), the frequency problem (finding the hottest items in data streams), and the lag correlation of data streams (ranking lag correlations with flexible sliding windows). The research in this thesis is only a preliminary attempt and exploration, and many questions remain to be explored further, for example data stream mining with a changing rate and data stream processing with Hadoop or Spark.
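
Two of the notions above, time-decayed item counts and lag correlation over a sliding window, can be illustrated with a minimal Python sketch. It is not the thesis's method: the abstract does not specify the decay function, the heap or sketch structures, or the ranking procedure. The sketch assumes exponential decay, exact (uncompressed) counting, and Pearson correlation over the most recent `window` aligned pairs. All function and parameter names (`decayed_counts`, `lag_correlation`, `rank_lags`, `decay_rate`, `max_lag`) are hypothetical.

```python
import math
from collections import defaultdict


def decayed_counts(arrivals, decay_rate, now):
    """Exact time-decayed count per item (assumed exponential decay).

    An arrival at time t contributes exp(-decay_rate * (now - t))
    instead of 1, so older arrivals matter less.
    """
    counts = defaultdict(float)
    for timestamp, item in arrivals:
        counts[item] += math.exp(-decay_rate * (now - timestamp))
    return dict(counts)


def lag_correlation(x, y, lag, window):
    """Pearson correlation of the last `window` pairs (x[t], y[t + lag]).

    lag >= 0 means series y trails series x by `lag` steps.
    """
    n = min(len(x), len(y) - lag)  # number of aligned pairs available
    if lag < 0 or window < 2 or window > n:
        raise ValueError("need lag >= 0 and 2 <= window <= aligned pairs")
    xs = x[n - window:n]
    ys = y[n - window + lag:n + lag]
    mean_x = sum(xs) / window
    mean_y = sum(ys) / window
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # convention chosen here for constant windows
    return cov / math.sqrt(var_x * var_y)


def rank_lags(x, y, max_lag, window):
    """Rank lags 0..max_lag by absolute correlation, strongest first."""
    scores = [(lag, lag_correlation(x, y, lag, window))
              for lag in range(max_lag + 1)]
    return sorted(scores, key=lambda pair: abs(pair[1]), reverse=True)
```

Calling `rank_lags` with different `window` values on the same pair of series shows why a fixed window length is hard to choose: the top-ranked lag can change with the window. The sketch recomputes every correlation from scratch, which is exactly the cost an efficient data structure would aim to avoid.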
Keywords/Search Tags:Data Stream Mining, Frequent Item, Frequency Calculation, Lag Correlation, Flexible Sliding Window