Font Size: a A A

Research On The Frequent Graph Mining Algorithm For Graph Classification

Posted on:2014-09-08Degree:MasterType:Thesis
Country:ChinaCandidate:X ZhangFull Text:PDF
GTID:2268330392964277Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
As a large number application datas generated from the practical applications,peoples are more and more interested in the informations hidden in the datas.The data mining has become a hot problem people is researched on because that the storage and analusis technologies for the large data set is developed rapidly.As a simple data structure, the graph can be used to describe things and the relationship betiten them.Due to the graph applied on lots of application fields,such as bioinformatics,chemical and the social network,many people pay attention to the graph-based data mining named graph mining.The main content of this article isto integrate all aspects of the theory knowledge to analyse and research on the pretreatment of the graph data set and the frequant subgraph mining mathod.Firstly,in this article it describes the principle of the graph sampling algorithm and introduces the Markov Chain Monte Carlo Methods,the Metropolis algorithm and the simulated annealing the graph sampling algorithm is involved in.According to the feature of the frequent graph,it combines the graph sampling algorithm with the mathod deleting the infrequent edges.In this process the graph scale threshold is proposed and it also deletes infrequent edges from the graph in the graph data set in the first time and samples the graph in the second time.It gets subgraph samples that the size of the graph data set is compressed and the storage space is saved.Secondly, merging the isomorphic graphs in the pretreatment process of the graph data set is proposed.In a large size graph data set,there must be a lot of isomorphic graphs.The existence of these isomorphic graphs constrains the efficiency of the frequent graph mining seriously.In this article,it merge the isomorphic graphs after the graph sampling.This strategy saves the storage space of the graph date set.And to reach this purpose,it improve the storage structure of the graph.The new storage structure record the graph frequence in the whole graph data set.Finally,it reports a new frequent subgraph mining algorithm named FBS algorithm.It analyse on the characteristic of the feature selection result set in detail.Based on the analysis it inserts constraint conditions related to the graph frequence and the size of the graph into the gSpan algorithm.So the FBS(Feature based frequent Subgraph mining) algorithm is proposed based on the improvement of the gSpan algorithm.The purpose of this algorithm is to get a be frequent subgraph result set that can applyon the feature selection better.In this article,it tests the experiments and prove that the methods it proposes are effective and practicable.
Keywords/Search Tags:data mining, graph mining, pretreatment, graph sampling, frequen subgraph, feature selection
PDF Full Text Request
Related items