Mining Significant Meta-path In Schema-rich Heterogeneous Information Networks

Posted on: 2020-02-06 | Degree: Master | Type: Thesis
Country: China | Candidate: C Sun
GTID: 2370330602952531 | Subject: Computer Science and Technology
Abstract:
In recent years, heterogeneous information networks (HINs) have attracted growing attention in data mining research. Most graph data extracted from the real world are heterogeneous, so researchers increasingly focus on the diversity of node and edge types in such data. The meta-path, a sequence of node types linked by relation types, is therefore a key concept in HINs and an important tool for analyzing schema-rich HINs. In a schema-rich HIN, however, meta-paths are difficult to design by hand. Motivated by this, we study the meta-path mining problem and propose an efficient algorithm for solving it in schema-rich HINs. Specifically, this thesis explores methods that take a set of node pairs as input and generate the meta-paths under which those pairs are most similar.

First, we propose a network-schema-based algorithm for mining sets of short meta-paths. It consists of two processes, generation and verification. The generation module enumerates all short meta-paths. The verification module then uses a path-limited random walk to select the meta-path set with the highest correlation to the input pairs. To improve efficiency, we integrate the two processes. Each meta-path's correlation is verified as soon as the meta-path is generated, and the generation process is pruned according to that correlation. This integration yields better time efficiency.

In complex HINs, however, the network schema is often too large for meta-path mining. To address this failure of the network schema, we construct a new local network schema. Like the network schema of a simple HIN, this structure is small and efficient to work with. It also retains as much as possible of the rich information contained in the network schema of a complex HIN. Replacing the large-scale network schema of a complex HIN with this new data structure is key to the efficiency of our algorithm.

We also address type selection for nodes that carry multiple types. To do so, we summarize the shortcomings of common similarity measures for HINs and design a novel type evaluation function. We then propose a meta-path similarity measure based on this type evaluation function. The measure takes into account the specificity and support of the input pair set, together with the structural characteristics of the local network schema described above.

Building on this type selection method and similarity measure, we design the fast special path mining algorithm (FSPM) on the new network schema. FSPM automatically extracts relevant meta-paths from complex HINs. The local network schema and automatic type selection together greatly reduce the time complexity of the problem. The mined meta-paths can in turn support further research on similarity measurement, clustering, classification, link prediction, ranking, recommendation, information fusion, and related tasks.

The algorithm consists of three parts:
(1) fast generation of local hierarchical graphs;
(2) merging multiple local hierarchical graphs into the new network schema;
(3) meta-path mining on the new network schema.
Combining these three processes gives the maximum-similarity path mining algorithm presented in this thesis.

The proposed algorithm is evaluated in detail through extensive experiments on the YAGO and DBpedia knowledge bases. Compared with other meta-path mining algorithms, our algorithm greatly improves time efficiency in link prediction on each knowledge graph. It also improves accuracy to varying degrees. The experiments indicate that the algorithm is both efficient and stable.
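The integrated generate-and-verify idea described above (enumerate short meta-paths, score them by random walks from the input pairs, prune while generating) can be sketched in Python. This is a minimal illustration under stated assumptions, not the thesis's FSPM implementation. Everything in the sketch is hypothetical:

- the toy graph;
- all function names (`build_adjacency`, `local_schema`, `walk_step`, `mine_meta_paths`);
- the k-hop BFS used as a stand-in for the "local network schema";
- the use of a path-constrained random walk (PCRW) to play the role of the "path-limited random walk";
- pruning by pair support instead of the thesis's correlation-based criterion.

```python
from collections import defaultdict

# Hypothetical toy HIN: node -> type, plus typed directed edges (src, relation, dst).
NODE_TYPE = {
    "alice": "Person", "bob": "Person", "carol": "Person",
    "paris": "City", "lyon": "City", "france": "Country",
}
EDGES = [
    ("alice", "bornIn", "paris"), ("bob", "bornIn", "lyon"),
    ("carol", "bornIn", "paris"), ("paris", "locatedIn", "france"),
    ("lyon", "locatedIn", "france"), ("alice", "citizenOf", "france"),
    ("bob", "citizenOf", "france"),
]


def build_adjacency(edges):
    """Map each node to its outgoing (relation, neighbour) pairs."""
    adj = defaultdict(list)
    for src, rel, dst in edges:
        adj[src].append((rel, dst))
    return adj


def local_schema(adj, node_type, seeds, depth):
    """Assumed stand-in for a local schema: the (src_type, relation, dst_type)
    triples observed within `depth` hops of the seed nodes (BFS)."""
    triples, seen, frontier = set(), set(seeds), set(seeds)
    for _ in range(depth):
        nxt = set()
        for u in frontier:
            for rel, v in adj[u]:
                triples.add((node_type[u], rel, node_type[v]))
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        frontier = nxt
    return triples


def walk_step(dist, adj, node_type, rel, dst_type):
    """One path-constrained random-walk step: each node splits its mass evenly
    over its `rel` neighbours of type `dst_type`; mass at dead ends is dropped."""
    out = defaultdict(float)
    for u, mass in dist.items():
        nbrs = [v for r, v in adj[u] if r == rel and node_type[v] == dst_type]
        for v in nbrs:
            out[v] += mass / len(nbrs)
    return out


def mine_meta_paths(adj, node_type, pairs, max_len=3, min_support=0.5, top_k=3):
    """Generate meta-paths over the local schema and score them as they are built.

    A prefix is pruned when fewer than `min_support` of the pairs still carry walk
    mass (an assumed proxy for correlation-based pruning). Support never increases
    as a path grows, so no pruned extension could have reached the threshold."""
    src_type, dst_type = node_type[pairs[0][0]], node_type[pairs[0][1]]
    schema = local_schema(adj, node_type, [s for s, _ in pairs], max_len)
    results = []
    # Stack entries: (steps so far, current node type, one walk distribution per pair).
    stack = [((), src_type, [{s: 1.0} for s, _ in pairs])]
    while stack:
        path, cur_type, dists = stack.pop()
        if path and cur_type == dst_type:
            # Score = mean probability that the walk from s ends at t.
            score = sum(d.get(t, 0.0) for d, (_, t) in zip(dists, pairs)) / len(pairs)
            if score > 0:
                results.append((score, path))
        if len(path) == max_len:
            continue
        for a, rel, b in schema:
            if a != cur_type:
                continue
            new = [walk_step(d, adj, node_type, rel, b) for d in dists]
            support = sum(1 for d in new if d) / len(pairs)
            if support >= min_support:
                stack.append((path + ((rel, b),), b, new))
    results.sort(key=lambda item: (-item[0], item[1]))
    return results[:top_k]


if __name__ == "__main__":
    adj = build_adjacency(EDGES)
    for score, path in mine_meta_paths(adj, NODE_TYPE, [("alice", "france"), ("carol", "france")]):
        print(score, " -> ".join(f"{rel}:{t}" for rel, t in path))
    # 1.0 bornIn:City -> locatedIn:Country
    # 0.5 citizenOf:Country
```

In this toy run the `citizenOf` path scores only 0.5 because `carol` has no `citizenOf` edge. Raising `min_support` to 1.0 would prune that path during generation instead of scoring it afterwards. Scoring each prefix while it is being built is the kind of saving the integrated generation and verification aims for.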
Keywords: heterogeneous information network, meta-path, network schema, data mining