Uncertain Data Frequent Pattern Mining Algorithms

Posted on: 2013-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Yang
Full Text: PDF
GTID: 2248330377953563
Subject: Computer application technology
Abstract/Summary:
With the rapid development and wide use of technologies such as sensor networks, RFID, and privacy protection, a large amount of uncertain data is being produced. In the past, uncertain data was usually processed with methods designed for certain data, which normally does not give correct results, so finding data mining algorithms suited to uncertain data has become increasingly urgent.

The thesis first summarizes the causes and representation formats of uncertain data, briefly introduces the uncertainty methods on which uncertain data models are built, and summarizes several common algorithms for uncertain data. Uncertain data is more complex than certain data, so it requires special handling methods. So far, some existing algorithms are limited in both scope of use and efficiency. The thesis therefore divides data into structured, semi-structured, and unstructured data. Each of these three kinds of data calls for different methods; structured and semi-structured data are the focus of the thesis.

The major contents of the thesis are as follows:

(1) Relational data is a typical kind of structured data and is widely used in work and daily life. It is more intuitive than other kinds of data and much easier to process. Uncertain relational data is common in daily life, but traditional data mining algorithms cannot handle it, and users usually want to mine only the information that meets their requirements. U-FPS, the classic constraint-based algorithm for mining frequent patterns from uncertain data, has already been proposed. However, U-FPS needs to build a frequent-pattern tree, which may consume a lot of memory when the data set is large, and it completes the mining through a large number of recursive calls, which degrades performance. To overcome these drawbacks, we propose an improved constraint-based algorithm for uncertain data, called the UC-Eclat mining algorithm. It does not need to build a frequent-pattern tree; instead, it calculates support by intersecting the entries of the database in its vertical format. The experiments show it to be more effective and efficient.

(2) Graph data is a kind of semi-structured data. Because graphs are well suited to describing complex data and the relationships within it, more and more fields of science and technology describe complex data with graph structures. The existing classic algorithm for frequent subgraph mining, DFS, was proposed by earlier researchers, but it explores an enormous search space, which makes it inefficient. To address this drawback, we propose a pruning strategy for the subgraph search space that greatly reduces the space to be searched. We also propose a theory based on database partitioning, from which the EDFS algorithm is derived. EDFS prunes the search space further than the original DFS. The experiments show EDFS to be more effective and efficient.
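
As an illustration of the vertical-format support calculation described in item (1), the following Python sketch computes expected support by intersecting per-item transaction lists instead of building a tree. It is not the UC-Eclat implementation from the thesis. The function names, the `allowed` item predicate standing in for a user constraint, the assumption that item existence probabilities are independent, and the toy database are all assumptions made for this example.

```python
from collections import defaultdict


def to_vertical(transactions, allowed=None):
    """Build the vertical format: item -> {transaction id: existence probability}.

    `allowed` is a hypothetical item-level constraint (a predicate on items);
    items that fail it are dropped before mining starts.
    """
    vertical = defaultdict(dict)
    for tid, items in enumerate(transactions):
        for item, prob in items.items():
            if allowed is None or allowed(item):
                vertical[item][tid] = prob
    return dict(vertical)


def intersect(a, b):
    """Intersect two tid-lists, multiplying probabilities (independence assumed)."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter list
    return {tid: p * b[tid] for tid, p in a.items() if tid in b}


def mine_expected_frequent(transactions, min_esup, allowed=None):
    """Return {itemset: expected support} for itemsets with support >= min_esup."""
    vertical = to_vertical(transactions, allowed)
    items = sorted(vertical)
    frequent = {}

    def extend(prefix, prefix_tids, start):
        for i in range(start, len(items)):
            item = items[i]
            # Tid-list of prefix + item: the prefix list joined with the
            # single-item list, so each probability is multiplied in once.
            tids = vertical[item] if prefix_tids is None else intersect(
                prefix_tids, vertical[item])
            esup = sum(tids.values())
            # Expected support never grows when an item is added, so an
            # infrequent itemset cannot have a frequent superset.
            if esup < min_esup:
                continue
            itemset = prefix + (item,)
            frequent[itemset] = esup
            extend(itemset, tids, i + 1)

    extend((), None, 0)
    return frequent


if __name__ == "__main__":
    # Toy uncertain database: each transaction maps item -> probability.
    db = [
        {"a": 0.9, "b": 0.8},
        {"a": 0.5, "c": 1.0},
        {"a": 1.0, "b": 0.5, "c": 0.4},
    ]
    for itemset, esup in mine_expected_frequent(db, min_esup=1.0).items():
        print(itemset, round(esup, 2))
```

In this toy database, the itemset {a, b} appears in the first and third transactions, so its expected support is 0.9 × 0.8 + 1.0 × 0.5 = 1.22. Only the per-item lists and the current prefix list are kept in memory, which is the property that motivates a vertical layout over a tree.
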
Keywords/Search Tags: data mining, uncertain data, expected support, frequent pattern, structured, semi-structured