Data Density Description Based Data Stream Frequent Pattern Mining

Posted on: 2014-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: Q Gao
GTID: 2268330392969068
Subject: Computer Science and Technology
Abstract/Summary:
Many current data stream mining methods are derived from methods designed for static datasets. They inherit the same basic idea: store the data in easily controlled memory and mine it there. Accordingly, many stream mining methods keep part of the stream on the local machine and mine only that part, the so-called window mechanism. This idea is not entirely suitable for data stream mining. Most methods based on sliding windows or landmark windows share an inherent disadvantage: they depend only on the current window, so they inevitably ignore the fluctuation characteristics of the stream. A second disadvantage is that storage limits restrict the size of the window, and even the recession (decay) window mechanism, which was introduced to address this problem, cannot solve it thoroughly.

To address these shortcomings, an original method better suited to data stream mining is proposed: a mining method based on the statistical density distribution characteristics of the data, called PDB-FIM. The main contents of this thesis are as follows:

First, how PDB-FIM stores and processes the information of each stream data item arriving at high speed.

Second, a method for keeping the main store of PDB-FIM balanced by cutting sets according to probability density information and support information.

Third, the concepts of the complete information tree and the incomplete information tree are proposed, and a strategy of maintaining one incomplete information tree and one complete information tree is adopted to solve the storage problem.

Last, the method of processing streaming data and generating probability information from it.

The method has the following advantages: it requires less memory, takes historical data into consideration, can detect data that is frequent now but was not frequent historically, and is sensitive to changes in the streaming data.
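
The window mechanism criticized in the abstract can be illustrated with a minimal sketch. This is not PDB-FIM, whose details the abstract does not give; it is a generic sliding-window counter for single frequent items, and the names `SlidingWindowCounter`, `window_size`, `min_support`, and `demo` are assumptions introduced only for illustration.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Generic sliding-window counter (illustrative, not the thesis method).

    Only the last `window_size` transactions are remembered, so anything
    that happened before the window is invisible to the miner.
    """

    def __init__(self, window_size, min_support):
        self.window_size = window_size
        self.min_support = min_support  # fraction of transactions in the window
        self.window = deque()
        self.counts = Counter()

    def add(self, transaction):
        items = set(transaction)
        self.window.append(items)
        self.counts.update(items)
        if len(self.window) > self.window_size:
            # The oldest transaction leaves the window and is forgotten.
            expired = self.window.popleft()
            self.counts.subtract(expired)
            for item in expired:
                if self.counts[item] <= 0:
                    del self.counts[item]

    def frequent_items(self):
        threshold = self.min_support * len(self.window)
        return {item for item, c in self.counts.items() if c >= threshold}


def demo():
    counter = SlidingWindowCounter(window_size=3, min_support=0.6)
    stream = [["a", "b"], ["a"], ["a", "c"], ["c"], ["c"], ["c", "b"]]
    history = []
    for transaction in stream:
        counter.add(transaction)
        history.append(sorted(counter.frequent_items()))
    return history
```

In this toy stream, item `a` occurs in three of the six transactions, yet after the last transaction the counter reports only `c`, because every transaction containing `a` has already left the window. This is the dependence on the current window that the abstract identifies as a weakness.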
Keywords/Search Tags: data density description, data stream mining, frequent item-set