Research On User Relevance Measure Method Combining LDA And Meta-path In Heterogeneous Information Network

Posted on: 2021-03-26
Degree: Master
Type: Thesis
Country: China
Candidate: D Wang
GTID: 2428330611953107
Subject: Computer software and theory
Abstract/Summary:
With the rapid development of large-scale heterogeneous networks composed of multiple types of interrelated objects, such as social networks, new media networks, and document networks, the analysis of heterogeneous information networks (HIN) has become one of the most important and popular research directions in data mining. As one of the key topics in heterogeneous network mining, relevance search has attracted growing attention from scholars. Relevance search aims to mine relevant peer entities from large-scale heterogeneous information networks, providing a basis for heterogeneous network research and, in particular, laying a foundation for relevant-user recommendation. However, as the amount of data in heterogeneous information networks grows exponentially, users often find it difficult to mine interesting information from so much data. This is the so-called "information overload" problem, and it greatly reduces the efficiency of information use.

Different meta-paths can be used to define different relationships between objects, which gave rise to meta-path based entity relevance measure methods. Such a method can find the objects most relevant to a query object in a large data set and, as the foundation and core of research on information networks, it has been widely used in many practical scenarios. At present, there are two main problems in the study of meta-path based measure methods. First, in large-scale complex heterogeneous information networks, the large number of nodes and the complex types of edges make it impossible to define or enumerate all meta-paths, which severely limits the efficiency and accuracy of entity relevance measurement. Second, existing user relevance measure methods still leave room for improvement because they make insufficient use of multi-dimensional analysis and link analysis.

In response to these problems, this thesis proposes a user relevance measure method combining LDA and meta-path in heterogeneous information networks. On the one hand, important meta-paths are obtained automatically in complex heterogeneous networks; on the other hand, node semantic information is mined within the selected meta-path set to improve the accuracy of the meta-path based user relevance measure. The main work is divided into the following two parts:

(1) An extended tree based meta-path generation method (Extended Tree based Meta Path Generation, ETMPG) is proposed. First, the knowledge graph is modeled as a heterogeneous information network. Then, meta-paths are automatically extracted from the schema-rich HIN in order of node connection probability. Finally, meta-path weights are trained with a weight learning method that maximizes a log-likelihood function, and the most important meta-path set is selected.

(2) A user relevance measure method combining LDA and meta-path analysis (User Relevance Measure Method Combining LDA and Meta Path Analysis, LPUSim) is proposed. The method first uses LDA for topic modeling and measures the relevance of nodes by analyzing the node content in the network. Then, meta-paths are introduced to describe the relationship types between nodes, and the relevance of users along the important meta-paths is measured with a relevance measure method (DPRel). Finally, the node relevance is incorporated into the user relevance measure so that node semantic information is fully considered, thereby improving the accuracy of the user relevance measure.

The proposed method is evaluated in detail through extensive experiments on the Yago database and the IMDB movie data set. The results of the link prediction experiment on the knowledge graph and the user relevance measure experiment on the movie data set show that, compared with current mainstream methods, this method significantly improves the time efficiency and accuracy of meta-path mining, and that it can overcome the drawback of data sparsity to improve the accuracy of the user relevance measure. The analysis of the experimental results indicates that the method proposed in this thesis is more efficient and more stable.
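
The abstract does not give the formulas behind DPRel or LPUSim. As a rough illustration of the general idea only, the Python sketch below substitutes PathSim, a well-known meta-path similarity, for DPRel and blends it with the cosine similarity of per-user topic vectors of the kind an LDA model might produce. The toy data, the function names, the user-movie-user meta-path, and the mixing weight alpha are all assumptions made for this example and are not taken from the thesis.

```python
from math import sqrt

# Toy heterogeneous network: users (U) linked to movies (M).
# All data here is illustrative and not from the thesis.
user_movie = {
    "u1": {"m1", "m2"},
    "u2": {"m1", "m2"},
    "u3": {"m3"},
}

# Hypothetical per-user topic distributions, standing in for LDA output.
user_topics = {
    "u1": [0.8, 0.2],
    "u2": [0.6, 0.4],
    "u3": [0.1, 0.9],
}


def path_count(a, b):
    """Count instances of the meta-path U-M-U between users a and b."""
    return len(user_movie[a] & user_movie[b])


def pathsim(a, b):
    """PathSim over the symmetric meta-path U-M-U (a stand-in for DPRel)."""
    denom = path_count(a, a) + path_count(b, b)
    return 2 * path_count(a, b) / denom if denom else 0.0


def cosine(p, q):
    """Cosine similarity between two topic vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = sqrt(sum(x * x for x in p)) * sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def combined_relevance(a, b, alpha=0.5):
    """Blend structural and topical relevance; alpha is an assumed weight."""
    topical = cosine(user_topics[a], user_topics[b])
    return alpha * pathsim(a, b) + (1 - alpha) * topical


if __name__ == "__main__":
    for pair in [("u1", "u2"), ("u1", "u3")]:
        print(pair, round(combined_relevance(*pair), 3))
```

In this sketch, users u1 and u2 share every movie and have similar topic vectors, so they score higher than u1 and u3, who share no movies. A linear blend is only one possible way to fold content relevance into a path-based score.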
Keywords/Search Tags:heterogeneous information network, knowledge graph, link prediction, user relevance, LDA, meta-path, measure