Related Mining Methods Research Based On Data Streams

Posted on: 2016-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: F F Zhou
Full Text: PDF
GTID: 2308330509450934
Subject: Computer technology
Abstract/Summary:
Advances in technology have made data easier to generate and collect, and extracting useful information from large volumes of data to guide production processes and human behavior has become an important research topic in data analysis. As an effective means of data analysis, data mining can discover interesting knowledge in large amounts of data, but the emergence of data streams places higher demands on the related processing techniques. Unlike traditional data, a data stream is continuously flowing, unbounded, and fast-arriving, so a mining algorithm must be fast and incremental and must produce results within a given error range using limited memory. Based on a broad literature review, this thesis studies several basic problems in data stream analysis and processing. The main contents are as follows.

First, an effective algorithm, DSM-Miner, is proposed for mining maximal frequent patterns in a data stream environment. It uses a transaction sliding window to fix the number of transactions handled in each processing step and applies decay to distinguish old transactions from new ones. It also proposes a sliding window maximal frequent pattern tree, the SWM-Tree, which improves on the classical FP-Tree structure, and it maintains and stores patterns incrementally by updating the SWM-Tree dynamically. When mining maximal frequent patterns, the algorithm takes the corresponding node of the SWM-Tree as the root of an enumeration tree and uses that enumeration tree as the search space. It also adopts appropriate pruning operations, a bit-based itemset computation method, and a depth-first search strategy.
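
The sliding-window and decay ideas described above can be illustrated with a minimal Python sketch. It is not the DSM-Miner implementation: the class name `DecayedSlidingWindow`, the `decay` parameter, the exponential weighting by transaction age, and the helper `maximal` are all assumptions made for illustration, and the SWM-Tree is replaced by a plain list scan.

```python
from collections import deque


class DecayedSlidingWindow:
    """Hypothetical transaction sliding window with exponential decay."""

    def __init__(self, window_size, decay):
        # deque(maxlen=...) drops the oldest transaction automatically.
        self.window = deque(maxlen=window_size)
        self.decay = decay  # assumed decay factor in (0, 1]

    def add(self, transaction):
        self.window.append(frozenset(transaction))

    def support(self, itemset):
        """Decayed support: the newest transaction weighs 1, older ones less."""
        itemset = frozenset(itemset)
        newest = len(self.window) - 1
        return sum(
            self.decay ** (newest - i)
            for i, t in enumerate(self.window)
            if itemset <= t
        )


def maximal(frequent):
    """Keep only itemsets that have no proper superset in the collection."""
    sets = [frozenset(s) for s in frequent]
    return [s for s in sets if not any(s < other for other in sets)]
```

With a window of three transactions and a decay factor of 0.5, an itemset that appears only in older transactions scores lower than one that appears in the newest transaction, which is the effect the decay mechanism is meant to achieve.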
Tests on several different data sets demonstrate the time efficiency, space efficiency, and scalability of the DSM-Miner algorithm.

Second, an effective algorithm, DSCBDG, is proposed for clustering over data streams using a density grid. DSCBDG borrows the two-tier structure of the classical CluStream algorithm and divides the clustering process into two sub-processes: online and offline. The online part uses the density grid to maintain summary information about the data stream in real time and updates the grid periodically through dynamic meshing. It then uses a modified pyramidal time frame to store the grid information at each snapshot moment. The offline part first retrieves the grid table from the corresponding snapshots, then evaluates the grids against a variable density threshold and merges adjacent grids to obtain the initial micro-clusters. Finally, it builds a connected graph with the clusters as vertices and the distances between clusters as edge weights, and generates a minimum spanning tree. The final clustering result is obtained by retaining the shortest edges.
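
The last offline step can be sketched as follows, under stated assumptions. The abstract does not say how many edges are retained, so this sketch assumes a target number of clusters `k` and keeps only the shortest minimum-spanning-tree edges until `k` components remain, which is equivalent to cutting the `k - 1` longest tree edges. The function name `mst_clusters`, the parameter `k`, and the use of Euclidean distance between cluster centers are hypothetical choices, not details from the thesis.

```python
import math
from itertools import combinations


def mst_clusters(centers, k):
    """Group micro-cluster centers by keeping only the shortest MST edges.

    Kruskal's algorithm adds edges in increasing length; stopping once k
    components remain leaves out the k - 1 longest MST edges.
    """
    parent = list(range(len(centers)))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Complete graph: vertices are centers, edge weights are distances.
    edges = sorted(
        (math.dist(centers[i], centers[j]), i, j)
        for i, j in combinations(range(len(centers)), 2)
    )

    components = len(centers)
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    labels = {}
    return [labels.setdefault(find(i), len(labels)) for i in range(len(centers))]
```

For example, calling `mst_clusters` on three centers near the origin and two centers near (10, 10) with `k = 2` places the first three in one cluster and the last two in another.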
Keywords/Search Tags: data mining, data stream, maximal frequent patterns, density grid, clustering