Research On Relevance Measure In Heterogeneous Information Network

Posted on: 2016-10-07
Degree: Master
Type: Thesis
Country: China
Candidate: X F Meng
GTID: 2298330467495223
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer science and network technology, social network analysis has gradually become a mainstream direction in the field of data mining. Currently, social network analysis is mainly based on homogeneous information networks, in which all nodes share one type and all edges share one type. However, with the emergence of a large number of online social media and cyber-physical systems, the complex networks formed by interrelated objects can no longer be described adequately as homogeneous information networks; they are heterogeneous information networks with multiple types of nodes or edges. Compared with a homogeneous information network, a heterogeneous information network has a more complex structure and carries richer semantic information, so social network analysis on heterogeneous networks has the potential to discover more accurate hidden knowledge. Relevance measure, namely the evaluation of how relevant two objects are to each other, is the basis of cluster analysis and many other data mining tasks. The main research object of this thesis is relevance measure in heterogeneous information networks. Its purpose is an in-depth analysis of heterogeneous object processing and semantic relation mining methods in heterogeneous information networks, carried out through the study of relevance measure and its related tasks.

Firstly, this thesis proposes a novel meta-path based relevance measure algorithm, AvgSim, derived from an analysis of the strengths and weaknesses of existing similarity measure algorithms. The algorithm can measure the relevance between arbitrary node pairs in a heterogeneous information network, and the measure is symmetric. In experiments on real data sets, AvgSim performs better than the other measures compared, in both effectiveness and efficiency.

Secondly, this thesis proposes a fast computation strategy for the AvgSim algorithm on massive data. By applying a dynamic programming strategy and a parallel block matrix multiplication method, a parallel version of AvgSim is implemented on the Hadoop platform. Experiments on a large-scale data set verify the efficiency of parallel AvgSim.

Finally, this thesis proposes an automatic meta-path discovery strategy. Given a node pair in a heterogeneous network, the strategy automatically discovers the meta-paths that link the two nodes and measures the weight of each path. Based on this meta-path discovery strategy, the thesis also puts forward a strategy for relation prediction in knowledge graphs and verifies its effectiveness.
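
The abstract does not state the AvgSim formula, so the following is only a minimal sketch of one way a symmetric, meta-path based relevance measure can be built: it averages the random-walk probability of reaching the target from the source along the meta-path with the probability of reaching the source from the target along the reversed path. This averaging scheme, the function names (`walk_matrix`, `symmetric_relevance`), and the toy author-paper-venue network are all assumptions made for illustration and may differ from the definition used in the thesis.

```python
def row_normalize(m):
    """Turn each row into transition probabilities (all-zero rows stay zero)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def walk_matrix(adjacencies):
    """Random-walk reachability along a meta-path.

    `adjacencies` holds one adjacency matrix per step of the meta-path,
    e.g. [author x paper, paper x venue] for the path author-paper-venue.
    """
    result = row_normalize(adjacencies[0])
    for adj in adjacencies[1:]:
        result = matmul(result, row_normalize(adj))
    return result


def symmetric_relevance(adjacencies):
    """Assumed scheme: average the forward walk and the reversed-path walk."""
    forward = walk_matrix(adjacencies)
    backward = walk_matrix([transpose(a) for a in reversed(adjacencies)])
    return [
        [0.5 * (forward[s][t] + backward[t][s]) for t in range(len(backward))]
        for s in range(len(forward))
    ]


# Hypothetical toy network: 2 authors, 3 papers, 2 venues.
AP = [[1, 1, 0],   # author 0 wrote papers 0 and 1
      [0, 0, 1]]   # author 1 wrote paper 2
PV = [[1, 0],      # paper 0 appeared in venue 0
      [0, 1],      # papers 1 and 2 appeared in venue 1
      [0, 1]]

scores = symmetric_relevance([AP, PV])               # author-to-venue relevance
reverse_scores = symmetric_relevance([transpose(PV), transpose(AP)])
```

In this sketch, the score of a pair is the same whether it is computed from the author side or the venue side, which is the symmetry property the abstract mentions. The chain of matrix products in `walk_matrix` is also the kind of computation that a block matrix multiplication scheme could distribute across workers, although the sketch itself runs on a single machine.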
Keywords/Search Tags: heterogeneous information network, meta-path, relevance measure, MapReduce, knowledge graph