
Research On Similarity Search Algorithm Of Heterogeneous Information Network Based On Meta-graph

Posted on: 2020-10-29  Degree: Master  Type: Thesis
Country: China  Candidate: Y H Jiang
GTID: 2428330623951388  Subject: Computer technology
Abstract/Summary:
Data mining in information networks has been studied extensively. Most earlier methods were designed for networks with a single type of object and a single type of link, known as homogeneous information networks. These methods do not carry over to heterogeneous information networks (HINs), which consist of multiple types of objects and links, even though most real-world networks are heterogeneous. Similarity measurement is a basic building block for other important mining tasks in HINs and is widely used in similarity search, information retrieval, and machine learning algorithms. In semantically rich HINs, similarity measures are computed from relational semantics, yet most existing measures rely on a single semantics. Capturing the complex semantic relationships in these networks is therefore one of the most challenging tasks, and it directly affects system performance. Designing a similarity measure between objects that is based on complex semantics is an equally important task. Because applications built on similarity measures are meant for real-world use, user satisfaction must also be considered. It is therefore necessary, while fully accounting for semantic relationships, to fuse other information in the network when measuring the similarity between objects. This thesis focuses on these issues. The main work is as follows:

(1) Most similarity search algorithms for HINs consider only a single relational semantics. To address this shortcoming, a meta-graph based similarity algorithm, GraphSim, is proposed to capture complex relational semantics. The algorithm measures the similarity between objects in the network by matrix multiplication. First, for a given query object, the algorithm computes the relation matrix online to obtain the graphcount of the candidate objects, that is, the objects of the same type as the query object that are connected to it by the meta-graph. The graphcount is the total number of meta-graph instances corresponding to the meta-graph. Next, the similarity values between objects are calculated with the GraphSim metric. Finally, the top k objects most similar to the query object are returned. Because the relation matrix is computed online, the computation time is large. A pruning algorithm, GraphSim-pruning, is therefore proposed, which uses the fact that some objects in the matrix are unconnected to improve computational performance. Experiments verify that GraphSim performs better than the meta-path based similarity search algorithm.

(2) The meta-graph based similarity search algorithm considers only the complex relational semantics between objects and ignores other information in HINs. This thesis therefore further proposes GraphSimExt, a meta-graph based similarity search algorithm that incorporates external support information. The algorithm considers not only the number of meta-graph instances between objects of the same type but also the external support information of the meta-graph, namely the information of the objects themselves. The algorithm first calculates the graphcount of an object under a given meta-graph and then calculates the similarity between objects from their own information. By integrating the two similarities, GraphSimExt obtains a final similarity that reflects both semantic information and object feature information. Because the two similarities emphasize different aspects of the objects, experiments show that GraphSimExt performs better than GraphSim in ranking quality and clustering accuracy, and that it is also superior to the meta-path based similarity measure.
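
The pipeline described above (count meta-graph instances by matrix multiplication, normalize the counts into a similarity, return the top k, then blend in object attributes) can be illustrated with a small Python sketch. The abstract does not state the GraphSim or GraphSimExt formulas, so everything specific here is an assumption: the toy author/paper/venue/topic network, the meta-graph "author, paper, (venue and topic), paper, author", the PathSim-style normalization 2*C[x,y] / (C[x,x] + C[y,y]), the cosine similarity on attribute vectors, and the weight `alpha`. All function names are hypothetical.

```python
import numpy as np

def graph_count(AP, PV, PT):
    """Count meta-graph instances between authors (assumed meta-graph:
    author - paper - (venue AND topic) - paper - author)."""
    # Two papers are linked by the branching part only if they share both a
    # venue and a topic; the element-wise product multiplies the two counts.
    M = (PV @ PV.T) * (PT @ PT.T)
    return AP @ M @ AP.T

def normalized_similarity(C):
    """Assumed PathSim-style normalization of a symmetric count matrix."""
    diag = np.diag(C).astype(float)
    denom = diag[:, None] + diag[None, :]
    S = np.zeros_like(denom)
    np.divide(2.0 * C, denom, out=S, where=denom > 0)
    return S

def cosine_similarity(X):
    """Cosine similarity between rows of an attribute matrix."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = np.divide(X, norms, out=np.zeros_like(X), where=norms > 0)
    return unit @ unit.T

def combined_similarity(S_graph, S_attr, alpha=0.5):
    """Assumed linear blend of structural and attribute similarity."""
    return alpha * S_graph + (1.0 - alpha) * S_attr

def top_k(S, query, k):
    """Indices of the k objects most similar to `query`, excluding itself."""
    order = np.argsort(-S[query], kind="stable")
    return [int(i) for i in order if i != query][:k]

# Toy network (hypothetical): 3 authors, 4 papers, 2 venues, 2 topics.
AP = np.array([[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])  # author x paper
PV = np.array([[1, 0], [1, 0], [1, 0], [0, 1]])            # paper x venue
PT = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])            # paper x topic
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])         # author attributes

C = graph_count(AP, PV, PT)
S = normalized_similarity(C)
S_ext = combined_similarity(S, cosine_similarity(X), alpha=0.5)
print(top_k(S, query=0, k=2))
print(top_k(S_ext, query=0, k=2))
```

In this sketch the element-wise product is what distinguishes a meta-graph from a meta-path: a single chain of matrix products can express only one relation sequence, whereas the product of the venue and topic matrices requires both relations to hold at once. Rows of `C` that are all zero apart from the diagonal correspond to objects with no meta-graph connection to any other object, which is the kind of entry a pruning step could skip.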
Keywords/Search Tags: heterogeneous information network, similarity search, external support information, meta-graph