Font Size: a A A

Top-k Selection Theory And Its Applications In Graph Data Processing

Posted on:2019-12-16Degree:DoctorType:Dissertation
Country:ChinaCandidate:Z Y XueFull Text:PDF
GTID:1368330548955213Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
Sorting is a basic computational task in computer science.As the deformation of sorting problem,the Top-k Selection Problem aims to find out k objects that most meet the requirements from large-scale objects.The Top-k Selection Problem has a wide range of applications,such as recommendation system,database,information retrieval,anomaly detection,network analysis,and so on.In different application scenarios,Top-k selection methods are different,but most of them follow two steps:(1)Quantization,and(2)Sorting.In other words,we first quantify the objects in the problem(i.e.,the priority of the object to be selected),and then select the best k solution according to the quantized index.In different application scenarios,it is difficult for us to model them uniformly and to solve the Top-k problem.In the theoretical study,generally we can only focus on the latter(Sorting stage).That is,assuming that the object has been quantified,then the problem is abstracted as a Top-k maximums selection problem.In specific applications,we tend to focus on the former(Quantization stage).That is,in view of the characteristics of the application,the degree of correlation between the object and the target problem is calculated by a specific method.In theory,we make thorough theoretical analysis of the common problems in large-scale Top-k selection,and put forward a new algorithm.In application,based on the typical social network data,we put forward a new Top-k recommendation algorithm to improve the recommendation performance.Moreover,for typical large graph data,we put forward a new Top-k selection problem and the corresponding algorithm.Specifically,this article mainly has three contributions:(1)In the theory of Top-k selection algorithm,the best algorithm at present is Partial Quicksort,but it is difficult to be effectively parallelized,and the efficiency of the algorithm is not stable.Based on Divide-and-Conquer,we propose a new Top-k selection algorithm,DC-Top-k and its parallel version.DC-Top-k decomposes the Top-k problem into k top-1 problems.Based on that,the difference between the k top-1 results and the top-k result is analyzed,and the final top-k result is further obtained.The algorithmeffectively improves performance and alleviates defects of existing algorithms.Moreover,we further put forward the optimized version of the algorithm,which makes top-k selection result closer to the practical application.(2)We study a typical application of the Top-k selection problem ?? the Top-k recommendation in social networks.Specifically,we propose a new algorithm FRFB(Following Recommendation by Following Behaviors),which is based on the following behaviors themselves to predict the following behaviors in the directed social networks.FRFB only needs to make use of the topology structure of the social network,that is,the“follower-followee” relationship,to get the potential followees of the target users.Experiments on real social network data sets show that the efficiency of the algorithm is outstanding and the recommendation effect is obviously better than the mainstream followee recommendation method.(3)In view of an important problem in the field of big data ?? sub-graph matching problem,we propose the problem and its method for the top-k key node query w.r.t.the sub-graph matching.Unlike the general top-k query problem,we aim to find out k nodes that make the matching sub-graphs in G which are covered by the k nodes as more as possible.This is a problem of the maximum coverage of sub-graph matching,which belongs to the NP-Hard problem.We study the problem based on greedy algorithm and give an intuitive solution.Considering the characteristics of top-k problem,we propose an improved and more efficient greedy algorithm.Experiments on real social network graph data set(Twitter)show that the related results represent the key nodes that can better reveal the essential characteristics of the query graph in the data graph G.
Keywords/Search Tags:graph data, sub-graph matching, key nodes, followee recommendation, top-k selection, Partial Quicksort
PDF Full Text Request
Related items