Research On Top-K Relevant Search In Heterogeneous Information Network

Posted on: 2015-01-18
Degree: Master
Type: Thesis
Country: China
Candidate: S L Bu
GTID: 2250330431457203
Subject: Computer software and theory
Abstract/Summary:
The world we live in is interconnected. Most data objects, such as individuals, organizations, and groups, are linked to and interact with one another, forming a huge, interconnected, and sophisticated network. Without loss of generality, such a system can be modeled as an information network. Information networks are everywhere in the real world and have become an important component of modern information infrastructure. Mining information networks, and specific kinds of them such as social networks and e-commerce networks, has attracted wide attention from researchers in computer science, biology, and the social sciences.

Depending on the kind of network, current research on information networks can be divided into research on homogeneous networks and research on heterogeneous networks. In a homogeneous network all nodes are of the same entity type, so all edges carry the same meaning. Many influential algorithms, such as PageRank and community detection methods, were developed for homogeneous networks. However, most real-world networks are heterogeneous: their nodes and links are of multiple types. For example, the network generated from Renren consists of persons, photos, movies, groups, and so on. Besides friendship between persons, it contains relationships of other types, such as person-movie reviewing and person-photo tagging. Heterogeneous information networks are therefore well suited to representing the interactions between different kinds of real-world entities.

Research on heterogeneous information networks has produced many results, and relevance search is a basic and crucial operation in such networks, commonly used in recommendation, clustering, and anomaly detection. Existing relevance search methods, however, focus on objects in homogeneous information networks. In this thesis, we propose a method to find the Top-k objects most relevant to a given query object in a heterogeneous network.
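To make the idea of typed nodes and typed links concrete, the following is a minimal sketch of how a Renren-style heterogeneous network could be stored in Python. The class HeteroNetwork, its methods, and the node identifiers are hypothetical illustrations rather than the thesis's data structure; the relation names simply mirror the friendship, reviewing, and tagging relationships mentioned above. The schema() method lists the distinct (source type, relation, target type) triples, which is what distinguishes a heterogeneous network from a homogeneous one.

```python
from collections import defaultdict


class HeteroNetwork:
    """Minimal heterogeneous information network (hypothetical sketch): every node
    has an entity type, every edge has a relation type, edges are indexed per relation."""

    def __init__(self):
        self.node_type = {}              # node id -> entity type
        self.edges = defaultdict(set)    # (relation, source node) -> set of target nodes

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, src, relation, dst):
        # Both endpoints must be registered first so the edge's endpoint types are known.
        if src not in self.node_type or dst not in self.node_type:
            raise KeyError("register both endpoints with add_node first")
        self.edges[(relation, src)].add(dst)

    def neighbors(self, node, relation):
        # .get() avoids creating empty entries in the defaultdict.
        return sorted(self.edges.get((relation, node), set()))

    def schema(self):
        """Network schema: distinct (source type, relation, target type) triples."""
        return {
            (self.node_type[src], relation, self.node_type[dst])
            for (relation, src), dsts in self.edges.items()
            for dst in dsts
        }


# Hypothetical Renren-style example: two persons, one movie, one photo.
net = HeteroNetwork()
for node, ntype in [("alice", "person"), ("bob", "person"), ("m1", "movie"), ("p1", "photo")]:
    net.add_node(node, ntype)
net.add_edge("alice", "friend", "bob")
net.add_edge("bob", "friend", "alice")
net.add_edge("alice", "reviews", "m1")
net.add_edge("bob", "tags", "p1")

print(net.neighbors("alice", "reviews"))  # ['m1']
print(sorted(net.schema()))
# [('person', 'friend', 'person'), ('person', 'reviews', 'movie'), ('person', 'tags', 'photo')]
```

Storing edges per relation keeps "which photos did this person tag" a single lookup, which is the kind of typed traversal that meta-path-based relevance relies on.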
The proposed method works in two phases. First, an initial relevance score is computed by a pair-wise random walk along given meta-paths; a meta-path is a meta-level description of the path instances in a heterogeneous information network, that is, a sequence of object types connected by relation types. Second, user preference is taken into account to compute the combination weights of the meta-paths; this is modeled as a multi-objective linear programming problem, which is solved with a genetic algorithm. In addition, to ensure efficiency, we use graph partitioning and distributed computing to accelerate the search process. Experiments on the IMDB and DBLP datasets show that the method achieves better accuracy and efficiency.
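The first phase, together with the weighted combination used in the second phase, can be sketched as follows. This is a hedged illustration that assumes a HeteSim-style measure, one published form of pair-wise random walk: the query object walks the first half of a symmetric, even-length meta-path, the candidate walks the second half in reverse, and the two walkers' reach probabilities over the middle object type are compared by cosine similarity. The thesis's exact scoring formula may differ. The toy author/paper/venue data, the meta-path names APA and APVPA, the weights, and all function names are hypothetical. In the thesis the weights come from user preference via the genetic algorithm, which is not reproduced here; neither are graph partitioning and distributed execution.

```python
import heapq
import math
from collections import defaultdict


def walk(start, steps):
    """Reach probabilities of a uniform random walk from `start` that follows
    one adjacency map (one relation) per step."""
    dist = {start: 1.0}
    for adj in steps:
        nxt = defaultdict(float)
        for node, p in dist.items():
            nbrs = adj.get(node, [])
            for n in nbrs:
                nxt[n] += p / len(nbrs)  # uniform transition over typed neighbours
        dist = nxt
    return dist


def hetesim(src, dst, left, right):
    """Pair-wise walk relevance: src walks the left half, dst walks the reversed
    right half; the two distributions over the middle type are cosine-compared."""
    dl, dr = walk(src, left), walk(dst, right)
    dot = sum(p * dr.get(n, 0.0) for n, p in dl.items())
    norm = math.sqrt(sum(p * p for p in dl.values())) * math.sqrt(sum(p * p for p in dr.values()))
    return dot / norm if norm else 0.0


def combined_score(src, dst, metapaths, weights):
    """Weighted sum of per-meta-path relevance scores (weights assumed given)."""
    return sum(weights[name] * hetesim(src, dst, left, right)
               for name, (left, right) in metapaths.items())


def top_k(query, candidates, metapaths, weights, k):
    """The k candidates (excluding the query itself) with the highest combined score."""
    pool = [c for c in candidates if c != query]
    return heapq.nlargest(k, pool, key=lambda c: combined_score(query, c, metapaths, weights))


# Hypothetical toy bibliographic network: author -> papers, paper -> venue.
author_paper = {"a1": ["p1"], "a2": ["p1", "p2"], "a3": ["p3"], "a4": ["p4"]}
paper_venue = {"p1": ["v1"], "p2": ["v1"], "p3": ["v2"], "p4": ["v1"]}

# Symmetric meta-paths, so the reversed right half equals the left half.
metapaths = {
    "APA": ([author_paper], [author_paper]),
    "APVPA": ([author_paper, paper_venue], [author_paper, paper_venue]),
}
weights = {"APA": 0.7, "APVPA": 0.3}  # assumed values; the thesis learns these

print(round(hetesim("a1", "a2", *metapaths["APA"]), 4))   # 0.7071
print(top_k("a1", author_paper, metapaths, weights, k=2))  # ['a2', 'a4']
```

With these toy weights, a2 ranks first because it shares both a paper and a venue with a1, while a4 shares only a venue. Odd-length meta-paths would need the middle relation split through an intermediate edge object, which this sketch omits.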
Keywords/Search Tags: Heterogeneous information network, relevance search, user preference, graph partitioning, distributed computing