
Multi-object KNN Query In Heterogeneous Information Network

Posted on: 2021-04-10 | Degree: Master | Type: Thesis
Country: China | Candidate: M Y Liu | Full Text: PDF
GTID: 2518306476953399 | Subject: Computer technology
Abstract/Summary:
The kNN query over heterogeneous information networks has a wide range of applications in data mining, such as link prediction and personalized recommendation. Existing algorithms find the k nearest neighbors of a single query object; however, many tasks involve multiple query objects, that is, a multi-object kNN query. This thesis therefore focuses on multi-object kNN queries in heterogeneous information networks. It proposes two frameworks: Specific Semantic based Multi-object k-Nearest Neighbor Query (SSM-kNN) and Fusion Semantic based Multi-object k-Nearest Neighbor Query (FSM-kNN). According to the query semantics, SSM-kNN selects a single meta path to calculate similarity and relevance on the original network, whereas FSM-kNN uses both explicit connections and implicit relationships. Because of its semantic restrictions, SSM-kNN requires the query objects to be of the same type, while FSM-kNN has no such restriction.

For SSM-kNN, the query meta path can be given explicitly or, when the network is too complex to describe, obtained by analyzing the query objects. First, an association-analysis-based meta path determination algorithm is proposed to select the most appropriate query meta path. Second, a meta-path-based object influence calculation method is proposed to distinguish the importance of the query objects. Third, the similarity measure m-PathSim and the relevance measure m-AvgSim are defined. Then, to improve the efficiency of kNN query processing, an adjacency-matrix-based filtering algorithm and an upper-bound-based filtering algorithm are proposed. Finally, the experimental results demonstrate the effectiveness of SSM-kNN and show that the filtering algorithms improve efficiency by 30% to 90%.

For FSM-kNN, this thesis first proposes a network embedding algorithm named RepeatRandom2Vec, which uses meta-path-based repeat random walks and the heterogeneous Skip-gram algorithm to map the network into a low-dimensional vector space. Then, a semantic analysis method is proposed to determine the meta path bias for the query objects. In the low-dimensional vector space, the feature vectors of the query objects are converted into a centroid vector for distance measurement. To improve query efficiency in the vector space, a Ball-tree-based algorithm and a Voronoi-based approximate query algorithm are proposed. The experimental results demonstrate the effectiveness of RepeatRandom2Vec and FSM-kNN, and the optimization algorithms increase efficiency by 88% to 90%.

Finally, a multi-object kNN query system is designed and implemented, verifying the practical feasibility of the proposed algorithms.
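The centroid-based query in the vector space, as described for FSM-kNN, can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not specify the distance function or any weighting, so this example assumes an unweighted centroid and Euclidean distance, and it uses a brute-force scan in place of the Ball-tree and Voronoi optimizations. The function names, the dictionary layout, and the toy embeddings are all hypothetical.

```python
import heapq
import math


def centroid(vectors):
    """Component-wise mean of equally long vectors (unweighted: an assumption)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def multi_object_knn(embeddings, query_ids, k):
    """Return the k objects closest to the centroid of the query objects.

    embeddings: dict mapping object id -> embedding vector (hypothetical layout).
    Uses a brute-force scan with Euclidean distance (an assumption);
    the query objects themselves are excluded from the result.
    """
    center = centroid([embeddings[q] for q in query_ids])
    excluded = set(query_ids)
    candidates = (
        (math.dist(center, vec), obj)
        for obj, vec in embeddings.items()
        if obj not in excluded
    )
    # nsmallest yields (distance, id) pairs in ascending distance order.
    return [obj for _, obj in heapq.nsmallest(k, candidates)]


# Toy 2-dimensional embeddings, invented purely for illustration.
embeddings = {
    "author_a": [0.0, 0.0],
    "author_b": [2.0, 0.0],
    "paper_1": [1.0, 0.1],
    "paper_2": [1.0, 2.0],
    "venue_x": [5.0, 5.0],
}
result = multi_object_knn(embeddings, ["author_a", "author_b"], k=2)
print(result)
```

Because the query objects only contribute through their centroid, they may be of different types, which is consistent with the abstract's statement that FSM-kNN places no type restriction on query objects. A spatial index such as a Ball tree would replace the linear scan without changing the result of an exact query.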
Keywords/Search Tags:Heterogeneous Information Network, Multi-object kNN Query, Meta Path, Specific Semantic, Fusion Semantic