Font Size: a A A

Research On Graph Kernel Based On Unsupervised Semantic Encoding Scheme

Posted on:2016-06-30Degree:MasterType:Thesis
Country:ChinaCandidate:L PengFull Text:PDF
GTID:2308330461467711Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
There are generally two major ways to represent objects in a formal way, statistical and structural approach, respectively. Objects are represented as feature vectors in the statistical representations and graphs in the structural representations. Mathematical operations in vector space are formally defined, so they can be computed conveniently. However, the vector conveys little inherent structure information and all vectors in a given application have to preserve the same length. The graph-based approach can describe the properties of an object as well as the relationships between different parts. Besides, graphs are not constrained to a fixed size. However, mathematical wealth of operations available in vector space is unreachable in the graph structure. The complexity of many algorithms will significantly increase when using structural representations. For instance, the computation of similarity between two feature vectors takes only linear time with respect to the length of vectors, but the analogous operation on graphs usually needs more than exponential time, because the isomorphism cannot be completed in polynomial time.Graph-structured data has increasingly started to emerge in various application areas where graphs are the natural data structure to model networks. As a result, comparison of graphs becomes problematic. Traditional approaches for graph comparison face the same problem:either increase the runtime for large graphs or simplify the representation of graphs that ignore part of their topological information. Graph kernels are one of the most recent approaches to graph comparison. Kernel methods allow one to extend basic linear algorithms to complex non-linear ones so that algorithms available in vector space are also appropriate for graphs.This paper studies the relevant graph kernels defined on different substructures of the graph firstly. They can compare two graphs effectively. However, there are a few limitations in the traditional graph kernels:(1) each node usually has a single label in existing graph kernels. Nonetheless, most time a node has multiple attributes or belongs to several categories; (2) semantic information of the link is often ignored. The links basically describe the structural information of protein or the bonds between the atoms of a chemical compound, without semantic tags; (3) most existing graph kernels need cubic time at least, such as the random walk kernel. They cannot be scaled up to large graphs with many nodes; (4) when the number of node types becomes huge, the similarity between two graphs tends to be zero. This phenomenon is paid little attention.Based on the deficiencies, this paper then proposes two kinds of semantic-based graph kernels, graph kernel based on the topic model and neighborhood hashing, graph kernel based on statistical language model and Weisfeiler-Lehman testing, respectively. The former utilizes topic model to estimate the topic distribution under documents and the word distribution under topics. The topics indicate the latent semantic features in the documents. Graph characterizes the spatial relationship between words. And the neighborhood hashing ensures that the similarity between two graphs can be calculated efficiently. The latter is based on statistical language model which connects two words according to similar contexts so as to obtain semantic word embeddings. Analogously, graph structure represents the spatial relationships in the documents. The computed kernel matrix between graphs reflects the similarities between the documents, eventually. The effectiveness and efficiency of these approaches are evaluated by the text categorization task on two public corpora. Empirical results demonstrate that the proposed methods can achieve better performance than the traditional topic model-based classification approach and can be performed faster than the shortest-path graph kernel...
Keywords/Search Tags:Knowledge Representation, Topic Model, Langnage Model, Graph Kernel
PDF Full Text Request
Related items