
Similarity Search On Weighted Heterogeneous Information Networks

Posted on: 2021-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: J X Wu
Full Text: PDF
GTID: 2480306476953419
Subject: Computer technology
Abstract/Summary:
A Weighted Heterogeneous Information Network (WHIN) is an extension of the graph model that can express heterogeneous semantics in many complex applications. In the era of big data, similarity search is one of the most active research topics in data management. This thesis explored how to search for similar WHINs.

Graph Edit Distance (GED) is commonly used to measure the similarity of graphs. However, GED focuses only on unweighted graphs, and computing GED is an NP-hard problem. We first extended the GED notion to measure the similarity between WHINs. Then, the star structure mapping distance and the weighted metapath mapping distance were proposed to approximate the WHIN edit distance. The upper and lower bounds were used as filters when searching for similar WHINs. The experimental results verified that the filtering algorithm could improve the efficiency and effectiveness of similarity search on network data with explicit structural features.

For a large-scale WHIN, this thesis proposed representing it with a feature vector composed of a structure vector and a content vector. The similarity between two WHINs was then measured by the distance between their feature vectors. The experiments showed that the feature-vector-based similarity search performed well in terms of accuracy, efficiency, and scalability.

In some practical applications, neither the edit-distance-based similarity measure nor the feature-vector-based similarity measure can fully express the semantic similarity between two WHINs. This thesis therefore proposed a feature-structure-based similarity measure, which transforms a WHIN into a sequence of feature structures. A feature structure was defined as a basic structural semantic unit within a given WHIN. A weighted-matching-based feature structure sequence similarity algorithm was proposed to measure the similarity between WHINs. Experiments showed that the feature structures proposed in this thesis expressed the real semantics well, and that the feature-structure-based similarity search algorithm performed better on WHINs with strong semantic characteristics.
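
The abstract does not say how the structure vector and the content vector are built, so the following is only a minimal sketch of the feature vector idea under assumptions of our own. Here the structure vector is assumed to hold node-type and edge-type counts, the content vector is assumed to hold the total edge weight per edge type, and the distance is assumed to be Euclidean. The function names `feature_vector` and `euclidean_distance`, the tuple layout of the edges, and the toy author/paper networks are all hypothetical and are not taken from the thesis.

```python
from collections import Counter
from math import sqrt

# Assumed (hypothetical) WHIN encoding:
#   nodes: dict mapping node id -> node type
#   edges: list of (source id, target id, edge type, weight)


def feature_vector(nodes, edges, node_types, edge_types):
    """Concatenate an assumed structure vector and an assumed content vector."""
    node_counts = Counter(nodes.values())
    edge_counts = Counter(etype for _, _, etype, _ in edges)

    # Total weight carried by each edge type (assumed content feature).
    weight_sums = Counter()
    for _, _, etype, weight in edges:
        weight_sums[etype] += weight

    structure = [node_counts[t] for t in node_types]
    structure += [edge_counts[t] for t in edge_types]
    content = [weight_sums[t] for t in edge_types]
    return structure + content


def euclidean_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Two toy WHINs over the same type vocabulary.
node_types = ["author", "paper"]
edge_types = ["writes"]

nodes_1 = {"a1": "author", "p1": "paper"}
edges_1 = [("a1", "p1", "writes", 0.5)]

nodes_2 = {"a1": "author", "p1": "paper", "p2": "paper"}
edges_2 = [("a1", "p1", "writes", 0.5), ("a1", "p2", "writes", 1.0)]

v1 = feature_vector(nodes_1, edges_1, node_types, edge_types)
v2 = feature_vector(nodes_2, edges_2, node_types, edge_types)
distance = euclidean_distance(v1, v2)
```

A smaller `distance` means the two networks are more similar under these assumed features. Because each network is reduced to a fixed-length vector once, comparing a query against a large collection costs one vector distance per candidate rather than one graph matching per candidate, which is consistent with the scalability motivation stated in the abstract.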
Keywords/Search Tags: Weighted Heterogeneous Information Network, Graph Edit Distance, Mapping Distance, Feature Vector, Feature Structure, Sequence Similarity