Font Size: a A A

Research On The Key Techniques Of Graph Similarity Measure

Posted on:2018-01-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:X WangFull Text:PDF
GTID:1318330542452728Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the development of science and technology,a mass of graph data comes into being form in the fields of text mining,computational biology,bioinformatics and chemistry.The graph mining method can effectively analyze and mine these graph data,and then understand the valuable information in it.As an important branch of graph mining,the classification of graphs achieve the automatic classification of unknown class chart data via the classification prediction model established by learning the known category graph data.Graph similarity measure plays a significant role in precisely classifying,and is a research emphasis of graph data classification.Meanwhile,the more graph information is measured,the higher its computational and time complexity increase.Therefore,how to involve more graph information,improve the accuracy,and decrease the computational complexity becomes the key point in measuring graph similarity.Currently,the main methods of graph similarity measure are: the Random Walk Graph Kernel(RWGK),the Shortest-Path Graph Kernel(SPGK),the Common Paths Graph Kernel(CTGK)and the Weisfeiler-Lehman Subtree Kernel(WLSK).These methods measure the graph similarity by calculating the number of common paths of two graphs and the similarity of common subtrees.In above methods,the vertex label information are measured,while the connectivity of a vertex with all neighbors vertex are not involed,and the time complexities reach up to O(n3)~ O(n4).Morever,only the similarity between two graphs is measured.Herein,we have performed the researches on improving the classification precision,decreasing the time complexity,and multiplegraphs similarity measurement in graph similarity measurement,the specific works are as follows:The researches on similarity measures between two graphs are as follows:(I)In order to avoid the lots of redundant operations,we propose a method to simplify the calculation in the process of measuring graph similarity.In this simplified method,the redundant operations of matrix multiplication and matrix addition are avoided and thus contributing to quickly obtaining of the initial tickets matrices of the sparse and dense graphs.Experiments show that,compared to the Common Paths Graph Kernel(CTGK),our simplified method avoid the redundancy calculation of element values in initial tickets matrix.Meanwhile,the classification precision will be significantly improved when the data set classification level of the graph is large.(II)With the purpose of measuring more information of edges and improving the accuracy of the graph data classification,two new graph similarity measuring methods based on vertical dimension sequence and hierarchical sequence are proposed.They all convert graph into the generalized tree,and regard this tree as the vertical dimensional sequences or hierarchical sequence,respectively.The in-degree and out-degree informations are involed and the relationship between the neighboring vertex edges is reflected.Experiments show that the vertical dimension sequence method can preserve the path information of the graph,and reflects the superincumbent vertical structure characteristics of the graph,thus improving the accuracy of the graph data classification.As for the hierarchical sequence method,it reflects the hierarchical characteristics of the graph and process the measure with a shorter running time than that of the existing graph similarity measuring methods.(III)Aiming at measuring more path information and improving the accuracy of the graph data classification,a new method converting a graph to multidimensional sequence is proposed.The multidimensional sequence preserves the relationship between the vertices in the graph,and reflects all the path information in the graph.And we can measure the graph similarity by calculating the similarity of the number of all the common subsequences of the multidimensional sequence and the length of the longest common subsequence.Experiments show that this method measure more path information than existing graph kernel methods,contributing to the higher classification accuracy.The research on measuring similarity of multiplegraphs is as follows:(IV)For the high complexity of measuring similarity of multiplegraphs,a method that can measure similarity of multiplegraphs is proposed.In this method,the multiple graphs are converted into multiple sequences,and the matching point of the multiple sequence is calculated in the multidimensional matrix.Meanwhile,the heuristic algorithm is adopted to calculate the number of all common subsequences on the matching point,and then the multiplegraphs similarity measurement is completed.Through algorithmic analysis and theoretical proof,this method not only can measure the similarity of any number of graphs but also reduce the time complexity from O(n~3)to O(n~2).
Keywords/Search Tags:Graph similarity measure, Generalized tree, All common subsequences, Multiple sequences, Heuristic algorithm, Dynamic time warping, Time complexity
PDF Full Text Request
Related items