The Research And Implementation On Paralleling Relevance Measure In Heterogeneous Information Network

Posted on: 2018-10-23  Degree: Master  Type: Thesis
Country: China  Candidate: J W Hu  Full Text: PDF
GTID: 2348330518996428  Subject: Computer Science and Technology
Abstract/Summary:
Research on heterogeneous information networks (HINs) has recently attracted growing attention, and many tasks, such as clustering, classification, and recommendation, have been studied on HINs. An HIN is an information network that consists of multiple types of nodes or edges. Its complex network structure and abundant semantic information make it a better way to model the real world. Relevance measurement on HINs is the basis of cluster analysis and many other data mining tasks, and many methods have been proposed to measure the similarity between nodes in an HIN. HeteSim, a pair-wise random walk (RW) model, is one of the most representative methods in this field. At present, all computation methods for HeteSim run on a single node. With the explosive growth of information networks, traditional single-node computation can no longer support fast computation of HeteSim on big data, so there is a pressing demand for parallelizing HeteSim. In this paper, we study parallel algorithms for HeteSim and implement them on Spark. First, we introduce parallel algorithms for HeteSim based on matrix multiplication, the key to which is the parallel method for matrix multiplication. Traditional parallel methods require large amounts of memory and network communication. To alleviate these problems, we design three improved methods for matrix multiplication and implement parallel algorithms for HeteSim based on them. Experiments on real datasets validate the effectiveness of our method for HeteSim compared with state-of-the-art parallel methods. Second, taking the approximate calculation of HeteSim into consideration, we design a parallel method for HeteSim based on Monte Carlo simulation. Experiments show that this method can achieve better performance than the previous parallel algorithms for HeteSim with little loss in accuracy. Finally, based on the parallel algorithms for HeteSim, we implement parallel algorithms for semantic recommendation methods in HINs. Furthermore, we design a prototype semantic recommendation system for HINs. It provides different recommendations based on different algorithms, and users can obtain personalized recommendation explanations based on the rich semantic information in the HIN.
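To make the two computation styles named in the abstract concrete, here is a minimal single-machine sketch in plain Python. It is not the thesis's Spark implementation and does not reproduce its three improved matrix multiplication methods. It assumes the standard HeteSim definition for an even-length relevance path: the path is split in the middle, reachable-probability matrices are built as products of row-normalized adjacency matrices for each half, and the score is the cosine of the two resulting rows. The Monte Carlo variant shown here (sampling random walks to estimate those rows) is one plausible reading of "based on Monte Carlo simulation", not the author's algorithm. All function names and the toy author-paper-conference data are hypothetical.

```python
import math
import random


def row_normalize(adj):
    """Turn an adjacency matrix into a row-stochastic transition matrix."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out


def matmul(a, b):
    """Dense matrix product on nested lists (the step a cluster would parallelize)."""
    b_cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


def reach_matrix(adjs):
    """Reachable-probability matrix: product of transition matrices along a half path."""
    result = row_normalize(adjs[0])
    for adj in adjs[1:]:
        result = matmul(result, row_normalize(adj))
    return result


def cosine(u, v):
    """Cosine similarity, defined as 0 when either vector is all zeros."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)


def hetesim_exact(left_adjs, right_adjs_reversed):
    """All-pairs HeteSim for an even-length path, given the two half paths.

    right_adjs_reversed lists the right half's adjacency matrices walking
    backwards from the target type to the middle type.
    """
    left = reach_matrix(left_adjs)
    right = reach_matrix(right_adjs_reversed)
    return [[cosine(lu, rv) for rv in right] for lu in left]


def walk_distribution(start, adjs, n_walks, rng):
    """Estimate the endpoint distribution of random walks from `start`."""
    counts = [0] * len(adjs[-1][0])
    for _ in range(n_walks):
        node = start
        finished = True
        for adj in adjs:
            weights = adj[node]
            if sum(weights) == 0:  # dead end: this walk contributes nothing
                finished = False
                break
            node = rng.choices(range(len(weights)), weights=weights)[0]
        if finished:
            counts[node] += 1
    return [c / n_walks for c in counts]


def hetesim_monte_carlo(a, b, left_adjs, right_adjs_reversed, n_walks=2000, seed=0):
    """Approximate HeteSim(a, b) from sampled walks instead of matrix products."""
    rng = random.Random(seed)
    left = walk_distribution(a, left_adjs, n_walks, rng)
    right = walk_distribution(b, right_adjs_reversed, n_walks, rng)
    return cosine(left, right)


# Toy network (hypothetical): 2 authors, 3 papers, 2 conferences.
AP = [[1, 1, 0], [0, 1, 1]]  # author x paper: who wrote what
CP = [[1, 1, 0], [0, 0, 1]]  # conference x paper: where each paper appeared

# Path Author-Paper-Conference: both halves meet at the paper type.
exact = hetesim_exact([AP], [CP])
approx = hetesim_monte_carlo(1, 1, [AP], [CP])
```

On this toy data, author 0 and conference 0 share exactly the same papers, so their exact score is approximately 1, while author 0 and conference 1 share none and score 0. The sampled estimate for author 1 and conference 1 should land close to the exact value, with the gap shrinking as `n_walks` grows; that is the accuracy-for-speed trade the abstract refers to.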
Keywords/Search Tags: heterogeneous information network, relevance measure, parallelization, HeteSim, Spark