Font Size: a A A

Research And Implementation On Algorithms Of Frequent Subgraph Patterns Mining Upon Uncertain Graph Data

Posted on:2015-12-10Degree:MasterType:Thesis
Country:ChinaCandidate:S S AiFull Text:PDF
GTID:2348330482952456Subject:Computer technology
Abstract/Summary:PDF Full Text Request
With the continuous development of information technology, many walks of business have accumulated vast amounts of data information, which contains a great deal of knowledge. Graph, as one of the most common data structures, has significant advantages in describing the properties and structures of data. As a result, more and more application areas start to use the graph to describe and store the relationships between data objects. Graph mining has also become a hot issue in the field of database and data mining, which has an important significance in theoretical research and practical application for the reason that it can find the structure characteristics, the patterns and the formation laws of the relationships between data objects from the graph data.However, in practical applications, a large amount of data is uncertain due to the random error and the measurement error in the process of data acquisition, the failure and the delay in the process of data transmission, the inconsistency and the incompleteness in multi-source data integration, data privacy protection and so on. The graph data with uncertainty is called "uncertain graph data", such as the protein interaction network in bio informatics. The protein interaction (the edge in the protein interaction network) may not be real because the reliability of biological experimental technology is not high. The uncertainty of the protein interaction reflects the practical probability of the protein interactions.With the dramatic increase of uncertain graph data, the study of excavating useful knowledge from the rich structures and the semantic information implied in the graph data has attracted more and more attention. However, due to the presence of uncertainty, the traditional mining algorithms upon graphs cannot be directed applied to uncertain graphs, which makes graph mining face enormous challenges. Therefore, several relevant works aimed on frequent subgraph patterns mining in uncertain graph data has been done in this paper, and the main works are as follows:Firstly, on the basis of learning the concept, the function and the procedure of data mining, analyze and summarize the classical algorithms of mining frequent subgraph patterns, laying the foundation for the next step of research.Secondly, in view of the existence of uncertainty in uncertain graph data, we put forward a new storage structure named UADI and preprocess the uncertain graph database. On the basis above, a new algorithm of mining frequent subgraph patterns is proposed. That is to say, use the improved gSpan algorithm to mine frequent subgraph patterns preliminarily under the condition of ignoring the uncertainty in uncertain graphs. Then, in the process of mining frequent subgraph patterns under uncertain graphs, we put forward the probabilistic and priori pruning technologies to further reduce the search space of subgraph patterns.Thirdly, as for the problem of mining frequent subgraph patterns under the large size of uncertain graph database, we propose a horizontal portioning framework and an algorithm for mining frequent subgraph patterns based on partition technology.Finally, a lot of experiments have been carried out to verify the proposed algorithms and the experimental results prove the effectiveness and the efficiency of them.
Keywords/Search Tags:uncertain graph data, data mining, frequent subgraph patterns mining, partition
PDF Full Text Request
Related items