
Research On Frequent Pattern Mining Algorithms For Uncertain Data

Posted on: 2017-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: H R Liu
Full Text: PDF
GTID: 2358330482491376
Subject: Computer application technology
Abstract/Summary:
With the rapid development of Internet technologies, networked applications generate many kinds of data: for example, retail data generated by online stores and supermarkets, readings collected by wireless sensor networks, and geographic information obtained from GPS positioning systems. Faced with this volume, the key question is how to process and exploit the huge amounts of data stored in databases, and the theories and technologies of data mining were introduced to answer it. Much of this data, however, is incomplete or uncertain, so discovering interesting knowledge in uncertain data has become a new research hotspot.

This thesis takes frequent pattern mining algorithms for uncertain data as its research object. First, it introduces the key data processing technologies, such as database technology and data mining technology, and summarizes frequent pattern mining. Second, it introduces the theory and techniques of uncertain data mining, including uncertain data, the theoretical models of uncertain data, and existing frequent pattern mining algorithms for uncertain data. Finally, it proposes two effective frequent pattern mining algorithms for uncertain data.

The main work covers three aspects:

(1) A study of the data structures used by frequent pattern mining algorithms for uncertain data, and the design of an improved data structure for tree-based algorithms. A data structure is the way a computer stores and organizes data. Data must be stored in the computer, and the storage structure is the concrete realization of the data structure.
A rigorous, well-designed logical structure therefore directly affects the efficiency of an algorithm. Based on the characteristics and forms of uncertain data, this thesis optimizes the existing data structure used by tree-based frequent pattern mining algorithms for uncertain data and designs an improved structure, called the spanning tree structure of the head table. It adds a variable-length dynamic array to the header table, which reduces memory usage while the frequent pattern tree is being constructed.

(2) A study of tree-based algorithms for mining frequent patterns from uncertain data, and the design of an efficient frequent pattern growth algorithm for uncertain data. An algorithm is shaped by the specific data structure it works on. Building on existing research into frequent pattern growth algorithms for uncertain data, this thesis proposes an improved growth algorithm. While constructing the frequent pattern tree for uncertain data, the algorithm continually updates the header table, which stores every node together with its corresponding array of expectations. Once the tree is built, the expected probabilities of the frequent itemsets are obtained by traversing the arrays rather than the tree. As a result, the algorithm both reduces the memory it occupies and mines frequent itemsets from uncertain data more efficiently.

(3) A study of tree-based frequent pattern mining over uncertain data streams, and the design of a sliding-window frequent pattern growth algorithm for uncertain data streams. A data stream is real-time and unbounded.
As a stream arrives, outdated data must be discarded as soon as possible because computer memory is limited; otherwise memory will overflow. In addition, as data keeps flowing into memory, itemsets that were infrequent may become frequent. In view of these characteristics, this thesis adopts the sliding window model and proposes a frequent pattern growth algorithm for uncertain data streams based on it. When the newly arrived data reaches a certain size, the algorithm mines it incrementally in batch mode and stores the intermediate mining results in the summary data structure of the header table. The contents of the window change as data keeps arriving: new transactions are continually added while old transactions are removed from the sliding window. Finally, all probabilistic frequent itemsets in the data stream are obtained from the arrays maintained for the uncertain data stream.
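
The sliding-window idea described in aspect (3) can be illustrated with a minimal Python sketch. It is not the tree-based algorithm of the thesis: it assumes the common expected-support model, in which each item in a transaction carries an existential probability and the expected support of an itemset is the sum, over all transactions, of the product of its items' probabilities, and it enumerates candidates by brute force instead of using a header table with arrays. The names `expected_support`, `SlidingWindow`, `max_batches`, `min_esup` and `max_size` are hypothetical and chosen only for this illustration.

```python
from collections import deque
from itertools import combinations
from math import prod


def expected_support(itemset, transactions):
    """Expected support under the assumed model: for every transaction that
    contains all items, add the product of their existential probabilities."""
    total = 0.0
    for txn in transactions:
        if all(item in txn for item in itemset):
            total += prod(txn[item] for item in itemset)
    return total


class SlidingWindow:
    """Keeps only the most recent `max_batches` batches of transactions.

    Each transaction is a dict mapping an item to its existential probability.
    """

    def __init__(self, max_batches):
        # A bounded deque drops the oldest batch when a new one is appended.
        self.batches = deque(maxlen=max_batches)

    def add_batch(self, batch):
        self.batches.append(list(batch))

    def transactions(self):
        return [txn for batch in self.batches for txn in batch]

    def frequent_itemsets(self, min_esup, max_size=2):
        """Brute-force search (illustration only): return every itemset of up
        to `max_size` items whose expected support is at least `min_esup`."""
        txns = self.transactions()
        items = sorted({item for txn in txns for item in txn})
        result = {}
        for size in range(1, max_size + 1):
            for candidate in combinations(items, size):
                esup = expected_support(candidate, txns)
                if esup >= min_esup:
                    result[candidate] = esup
        return result


if __name__ == "__main__":
    window = SlidingWindow(max_batches=2)
    window.add_batch([{"a": 0.9, "b": 0.5}, {"a": 0.6}])
    window.add_batch([{"a": 0.5, "b": 0.8}])
    print(sorted(window.frequent_itemsets(min_esup=1.0)))  # [('a',), ('b',)]

    # A third batch pushes the oldest batch out of the window.
    window.add_batch([{"b": 1.0, "c": 0.4}])
    print(sorted(window.frequent_itemsets(min_esup=1.0)))  # [('b',)]
```

In this toy run, item `a` stops being frequent once the oldest batch leaves the window, which is the kind of change the sliding window model is meant to track. A tree-based method would avoid recomputing every candidate from scratch for each window.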
Keywords/Search Tags: data mining, uncertain data, possible world model, probability of frequent itemsets, sliding window model