Font Size: a A A

Multi-layer Graph Analytics

Posted on:2020-10-02Degree:DoctorType:Dissertation
Country:ChinaCandidate:R ZhuFull Text:PDF
GTID:1360330590972972Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
In recent years,graph has been used to represent and manage data in more and more domains.Data represented by graphs is called graph data.Graph data analytics are able to discover useful knowledge such as structural features,frequent patterns and evolving principles and thus of great significance in both academic research and real-world applications.With the in-depth research,people realizes that graph data often contains multiple types of relationships between data objects.For example,social network data contains networks in different social medias,and traffic network data is composed by networks formed by different traffic means.This type of graph is called multi-layer graph,where each layer contains all relationships between data objects in the same type.Multi-layer graph analytics can find accurate and reliable knowledge with higher value.However,multi-layer graph analytics face two challenges: On one side,the computation semantic defined on single-layer graphs is not appliable on multi-layer graphs,which makes the computation semantic on multi-layer graphs to be more complex;On the other side,analyzing multi-layer graphs involves the computation tasks on multiple layers,which makes the inherent computation complexity of analyzing problems to be higher.Existing work on analyzing multi-layer graphs have inherent limitations in both computation semantic and algorithm design,which do not provide good solutions to address some problems in analyzing multi-layer graphs.In this thesis,we conduct a systemic research on multi-layer graph analytics by applying the theories,techniques and methods in data analysis.This thesis considers the multi-layer graphs with and without probablities at the same time.From the four important properties of graph structures,namely density,reliability,diffusibility and similarity,this thesis conduct an in-depth research on a series of important problems in analyzing multi-layer graphs.The main contributions of this thesis are summarized as follows:1.This thesis investigates the diversified dense subgraphs extraction problem on multi-layer graphs.This problem is important in protein complex examination and community detection.Based on the common multi-layer graph model without probabilties,this thesis proposes a new dense subgraph notion d-Coherent-Core(dCC for short),and designs two efficient search algorithms with 1/4-approximation ratio to address this NP-Hard problem.Our proposed algorithms are better than existing quasi-clique-based algorithms in terms of both result quality and execution time.The d-CC notion simultaneously considers the density measure and support measure,which attains the uniqueness property,the hierarchy property and the containment property.The bottom-up and top-down search algorithm apply efficient search strategies and pruning methods,which are suitable for small and large support parameter,respectively.The experimental results on real-world datasets show that bothe the bottom-up algorithm and the top-down algorithm are efficient and accurate.2.This thesis investigates the top-k vertex reliability search problem on multilayer graphs.The problem is of great importance in communication networks,which is more adaptive than the threshold-based reliability search problem.This thesis presents a multi-layer graph model with probabities on layers and a new computation framework for analyzing multi-layer graphs,called sharing computation,which can utilize the overlaps among different layers in the multi-layer graphs to reduce search cost and improve the algorithm efficiency.Based on this,this thesis designs the exact sharing BFS algorithm and the randomized sharing BFS algorithm to address the top-k vertex reliability search problem.The experimental results on real-world datasets show that the exact sharing BFS algorithm is very efficient and scalable and the randomized sharing BFS algorithm is very accurate.3.This thesis investigates the influence maximization on multi-layer graphs,which is widely used in viral marketing and public opinions control.To characterize the graph data in the influece maximization problem,this thesis presents a multi-layer graph model with probabities,which is able to represent multi-layer graphs generated by the uncertainties on edges.Upon the drawbacks of existing algorithms,this thesis designs an influence maximization algorithm which simultaneously attains high time efficiency,high result quality,low memory footprint and high probability robustness.This algorithm has linear time and space cost.It utilizes a score estimation function with high quality and a score updating method with high efficiency.On real-world social networks,the proposed algorithm exhibits good performance and high scalability.4.This thesis investigates the SimRank vertex similarity measurement on multi-layer graphs.This problem is the foundation of many applications such as recommedation system and entity resolution.Based on the multi-layer graph model with probabities,this thesis proposes a rigorous definition of SimRank by following its the possible world semantic,and design efficient and accurate algorithms to compute the SimRank similarity value between vertices.To lay foundation of the SimRank vertex similarity measure,this thesis proposes the definition of random walks on multi-layer graphs,which is rigorously proved to satisfy the Markov's property.Meanwhile,this thesis designs efficient algorithms the compute the random walk probabilties.The experimental results on real-world datasets show that the proposed SimRank computation algorithms are efficient and accurate,and the proposed SimRank vertex similarity measure can obtain better results than tranditional measures in real-world applications.
Keywords/Search Tags:multi-layer graph analytics, diversified dense subgraphs extraction, top-k reliability search, influence maximization, SimRank vertex similarity measurement
PDF Full Text Request
Related items