Font Size: a A A

Research On Key Technologies Of Multi-Relational Graph Mining Based On Tensors

Posted on:2014-05-05Degree:DoctorType:Dissertation
Country:ChinaCandidate:X T LiFull Text:PDF
GTID:1108330422490350Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
With the rapid development of technologies, the interactions or relationship between entities from different areas can be measured or recorded, e.g., the protein-protein inter-actions in biology, the communications between users in social networks, the connections between computers in the Internet, and water pollution dispersion in environmental pro-tection. As these interactions between entities are able to be represented as graphs, it leads to a stirring research interest on graph mining technologies in academic. Currently, researchers mainly concentrate on single-relational graph, which models the interactions between one type of entities with only one relationship. However, the interactions be-tween entities in reality are generally very complex, including entities of different types, interactions of different types and even hyper-interactions between more than two entities. For example, in a social media network, it may include users, stories and comments as different types of entities, where friend relationship, comment relationship, and group re-lationship are involved as different types of interactions. The group relationship indicates the hyper-interactions between more than two users. Complex networks of such kind need to be characterized using multi-relational graphs, where different types of nodes and edges are included and an edge can connect more than two nodes.Multi-relational graph mining is very meaningful, which may be able to help people discovery the valuable information hidden in complex interaction systems. For example, mining the importance of nodes in multi-relational graph may be beneficial to Web link analysis, discovering core members or hub members in social networks and evaluating the influence of users in Twitter etc; mining communities in academic multi-relational graph may help to discovery authors, papers and keywords that interact strongly; mining modules in biological multi-relational graph may help to discovery the organism in charge of a particular biological function. In the literature, however, there are rarely research results and mining technologies regarding to multi-relational graph.Therefore this dissertation focuses on the research problem in multi-relational graph-s. In this dissertation, two interesting problems have been considered, i.e., the evaluation of node importance and community discovery. Based on tensors, a series of algorithms are proposed to solve these two problems. The main contributions of this dissertation are as follows: 1. A tensor-based Markov chain model is proposed, which approximates the limiting probability distribution of a higher-order Markov chain by using tensor product and transfers its solving problem into a tensor equation problem. An iterative algorithm similar to Power method is proposed for solving it. Theoretical analysis about the existence and uniqueness of its solution have been given, as well as the convergence of the iterative algorithm. Experimental results have validated the effectiveness of this tensor-based Markov chain model. This model servers as a theoretical foundation in this dissertation, which connects all research problems concerned.2. MultiRank algorithm and HAR algorithm are proposed for computing query in-dependent node importance in multi-relational graph. Both algorithms originate from the proposed tensor-based Markov chain model, extending respectively the ideas of PageRank algorithm and HITS algorithm to multi-relational graph for e-valuating node importance. It has been shown the existence and uniqueness of so-lutions of MultiRank and HAR, as well as their convergence. Experimental results on DBLP data set have demonstrated that MultiRank and HAR provide reasonable and effective importance to nodes and relations in multi-relational graph.3. MultiVCRank algorithm is proposed for computing query dependent node impor-tance in multi-relational graph. This algorithm combines the idea of random walk with restart with the tensor-based Markov chain model and incorporates the query input into Markov chain in a smart way. Theoretical analysis have shown the existence of its solution, and shown that MultiVCRank algorithm converges to a unique solution when restarting probability satisfy some suitable condition. Ex-perimental results on TREC data set have shown that MultiVCRank outperforms state-of-the-art methods significantly in retrieval performance.4. MultiComm algorithm is proposed for finding community structures in time inde-pendent multi-relational graph. With a multi-relational graph represented as sev-eral tensors, MultiComm establishes a Markov chain based on them and considers its limiting probability distribution as proximity between nodes. The community structure is discovered by iterative adjoining new node with largest proximity val-ues to current community. Theoretic analysis results have shown that MultiComm converges to a unique solution under suitable condition. Experimental results on synthetic data and SIAM/DBLP data have shown the performance of MultiComm is better than that of state-of-the-art methods.5. MultiFacTV algorithm is proposed for finding community structures in time de-pendent multi-relational graph. The algorithm extends the objective function of tensor factorization by introducing a TV regularization term on decomposition vectors for time dimension, so as to preserve the time properties of communi-ty structures. Theoretic analysis results have shown MultiFacTV converges to a local minimum solution. Experiment results have shown that MultiFacTV out-performs Multifac and EDISA algorithms, and discoveries meaningful biological modules in time-dependent multi-relational graphs constructed from Arabidopsis thaliana data, Yeast data and Homo sapiens data respectively.In summary, this dissertation concentrates on the problems of computing node im-portance and discovering community structures. It establishes a tensor-based Markov chain model as a theoretic foundation and a main line of all research content. The research in this dissertation may improve the multi-relational graph mining techniques further, and at the same time it is promising to bring new research areas in the fields of tensors and Markov chain.
Keywords/Search Tags:multi-relational graph, transition probability tensors, Markov chain, tensordecomposition, node centrality, community discovery
PDF Full Text Request
Related items