Finding The K Nearest Neighbours In Heterogeneous Information Networks

Posted on: 2019-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2428330596960911
Subject: Computer technology
Abstract/Summary:
With the rapid development of Internet technologies and applications, a vast amount of complex data is being generated. The data are often multi-typed, and there are diverse relationships among the objects. Heterogeneous information networks (HINs) are widely used as a data model for such data. The k nearest neighbours (kNN) query is an important method for data analysis. This thesis investigated some key issues in answering kNN queries on HINs effectively and efficiently.

Due to the heterogeneity of objects, HIN kNN queries are categorized into two types according to whether the query object and the target objects are of the same type: monochromatic k nearest neighbours (MkNN) and bichromatic k nearest neighbours (BkNN). MkNN finds the k objects that are most similar to the query object. BkNN finds the k objects that have the highest correlations with the query object; in BkNN, the results are usually of a specified type that differs from the query type. Existing MkNN and BkNN algorithms use structural connectivity to measure the similarities and correlations of objects, so the similarity or correlation between two objects is 0 when they are not connected. Therefore, in many scenarios, structural connectivity alone is not enough to express the semantics of object similarity or object correlation.

This thesis proposed several techniques and algorithms for comprehensive, flexible and efficient kNN query processing on HINs, in which both structural connectivity and entity similarity are taken into account. An augmented heterogeneous information network (A-HIN) was defined by incorporating entity similarities into the HIN. An algorithm for A-HIN construction was introduced, in which the related entities are selected according to the user-specified query semantics.

For MkNN queries, the Entity Similarity Extended PathSim, abbreviated as ES-PathSim, was proposed to measure object similarities. An iterative algorithm for similarity measurement was proposed to measure object similarities effectively even when the objects are not connected. An upper-bound filtering algorithm was developed to improve the performance of ES-PathSim-based MkNN querying. The experimental results showed the effectiveness of ES-PathSim and the iterative algorithm; the efficiency of upper-bound filtering was evaluated as well.

For BkNN queries, the Entity Similarity Extended AvgSim, abbreviated as ES-AvgSim, was proposed to measure object correlations effectively by combining structural information with entity similarity semantics. A threshold-based optimization algorithm was proposed to speed up ES-AvgSim-based BkNN queries. The experimental results showed that ES-AvgSim is effective in expressing rich object correlation semantics and that the threshold-based BkNN algorithm is efficient in online query processing.
Keywords/Search Tags: Heterogeneous Information Network, kNN Query, Similarity Measure, Correlation Measure