
Application Research On Label Null Model And Subgraph Distribution Algorithm

Posted on: 2018-07-02  Degree: Master  Type: Thesis
Country: China  Candidate: S Q He  Full Text: PDF
GTID: 2348330533463656  Subject: Engineering
Abstract/Summary:
Graph mining is an important research area within data mining. With the growing demand for structured data analysis, graph classification has become a major research topic in the field. At present, subgraph distribution algorithms for graph classification are used mainly in biology and chemistry, for example to classify substances as carcinogenic or toxic. With the development of information technology, they also have broad application prospects in information science, network intrusion detection, social network analysis, and other fields. The central problem for subgraph distribution algorithms is how to extract more classification features so as to improve the accuracy of graph classification. Based on an analysis of the current state of research on subgraph distribution algorithms and the problems that remain, this thesis makes the following contributions.

First, when the distribution used for graph classification is computed according to the Graphlet model, the labels of the nodes in the graph are ignored, which leaves too few classification features and reduces classification accuracy. This thesis therefore proposes a label null model, built on the null model, to increase the number of features available for graph classification, and proves that using the label null model for graph classification is reasonable. In addition, to quantify the information in the label subgraph distribution and determine the sample size, the thesis uses the concept of information entropy to define an information extraction ratio and gives a method for calculating its reliability. The sample size is then determined by using the information extraction ratio as the termination condition of the algorithm.

Second, computing the label subgraph distribution directly requires repeated graph isomorphism tests, which leads to high time complexity. To address this, the thesis proposes two algorithms based on the label null model. The BGLI algorithm constructs a graph index that reduces graph searching, and the ESGS algorithm, which builds on BGLI, estimates the label subgraph distribution and is implemented on Spark.

Finally, experiments verify that the label null model is reasonable for graph classification. They show that the number of samples can be determined indirectly from the information extraction ratio, which reduces redundant computation, and that the label subgraph distribution extracted by the ESGS algorithm can serve as graph classification features for accurate classification.
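As a rough illustration of the general idea of estimating a label subgraph distribution by sampling and stopping once an entropy-based criterion settles, the following Python sketch may help. It is not the thesis's BGLI or ESGS algorithm, and it does not reproduce the thesis's information extraction ratio, whose definition is not given in this abstract. Everything here is an assumption made for illustration: the restriction to connected 3-node subgraphs, the simple non-uniform sampler, the function names (`canonical_key`, `sample_labeled_triple`, `estimate_distribution`), and the stopping rule, which halts when the Shannon entropy of the empirical distribution changes by less than a tolerance between batches.

```python
import math
import random
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (in bits) of the empirical distribution in `counts`."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def canonical_key(adj, labels, nodes):
    """Isomorphism-invariant key for a connected, node-labeled 3-node subgraph.

    A triangle is identified by its sorted labels; a path is identified by
    the label of its center node plus the sorted labels of its two ends.
    """
    a, b, c = nodes
    present = [(x, y) for x, y in ((a, b), (a, c), (b, c)) if y in adj[x]]
    if len(present) == 3:
        return ("triangle", tuple(sorted(labels[n] for n in nodes)))
    degree = Counter()
    for x, y in present:
        degree[x] += 1
        degree[y] += 1
    center = max(nodes, key=lambda n: degree[n])
    ends = tuple(sorted(labels[n] for n in nodes if n != center))
    return ("path", labels[center], ends)


def sample_labeled_triple(adj, labels, rng):
    """Draw one connected 3-node subgraph (simple, non-uniform sampler)."""
    u = rng.choice(sorted(adj))
    if not adj[u]:
        return None
    v = rng.choice(sorted(adj[u]))
    candidates = sorted((adj[u] | adj[v]) - {u, v})
    if not candidates:
        return None
    w = rng.choice(candidates)
    return canonical_key(adj, labels, (u, v, w))


def estimate_distribution(adj, labels, batch=200, tol=1e-3,
                          max_samples=20000, seed=0):
    """Sample in batches until the entropy estimate changes by less than `tol`.

    The stopping rule is a hypothetical stand-in, not the thesis's
    information extraction ratio.
    """
    rng = random.Random(seed)
    counts = Counter()
    previous = None
    drawn = 0
    while drawn < max_samples:
        for _ in range(batch):
            key = sample_labeled_triple(adj, labels, rng)
            drawn += 1
            if key is not None:
                counts[key] += 1
        if counts:
            current = shannon_entropy(counts)
            if previous is not None and abs(current - previous) < tol:
                break
            previous = current
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}, drawn


# Toy graph (hypothetical data): a triangle 0-1-2 with a pendant node 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
labels = {0: "A", 1: "A", 2: "B", 3: "B"}
distribution, samples_used = estimate_distribution(adj, labels)
```

The resulting `distribution` maps each labeled subgraph type to its estimated frequency and could be used as a feature vector for a graph classifier. Using canonical keys avoids pairwise isomorphism tests for these tiny subgraphs, which is only a loose analogue of the indexing role the abstract attributes to BGLI.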
Keywords/Search Tags:data mining, graph mining, graph classification, null model, subgraph distribution