
Research On Heterogeneous Information Networks Mining Method Based On Meta-Path

Posted on: 2016-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: P J Ding
Full Text: PDF
GTID: 2428330473964916
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of heterogeneous information networks, of which social networks are a representative example, the demand for analyzing them keeps rising. Heterogeneous information network analysis has become an important and active topic in data mining, with wide use in social networks, web data management, protein structure prediction, and other areas. Its purpose is to extract useful knowledge from a heterogeneous network, that is, a network in which multiple types of objects are interconnected by multiple types of links. The main tasks include object classification, information retrieval, link prediction, and ranking. Object classification and similarity search are central to data mining in heterogeneous networks and have attracted extensive attention from researchers. Although many related algorithms have been proposed, user satisfaction still needs to improve. For instance, existing similarity search algorithms do not consider how objects in a network change dynamically, and existing classification algorithms, which require a large amount of iterative computation, are not suitable for managing data in heterogeneous networks where the number of objects grows dynamically.

This thesis studies these issues in depth. Existing similarity search and classification algorithms for heterogeneous networks are analyzed in detail, and the role of the meta-path in heterogeneous network analysis is examined. Because objects in a heterogeneous network change continuously over time, a meta-path-based similarity search algorithm that takes time into account is proposed. Because the number of objects in a heterogeneous network increases over time, an inductive classification algorithm that can directly predict the labels of new samples is proposed. The main research results are as follows:

(1) Recent research on similarity search and object classification in heterogeneous networks is summarized and analyzed. The open problems and application areas of this research are then identified, which clarifies the direction of further work on similarity search and classification in heterogeneous networks.

(2) Existing similarity search algorithms consider only the number of links among objects and ignore the fact that links change over time. To address this, a dynamic meta-path-based similarity search algorithm named PDSim is proposed. First, the meta-path instance ratio among objects is obtained from the number of links among them. Second, the time difference degree is obtained from the times at which the links were established. Finally, the similarity among objects is computed from the meta-path instance ratio and the time difference degree. In multiple similarity search examples, PDSim tracked how the interests of objects changed over time. When applied to clustering, it improved clustering quality, measured by Normalized Mutual Information (NMI), by 0.17% to 9.24% compared with the traditional algorithm.

(3) Almost all current classification algorithms are transductive and cannot directly predict the labels of new samples in a heterogeneous network. To address this, an inductive meta-path-based classification algorithm named Hic is proposed. First, an inductive classification model is built from the paths among labeled objects. Second, the labels of target objects are predicted from the links related to those objects. Compared with existing classification methods, Hic achieves higher classification accuracy and normalized mutual information, with a smaller variance in classification accuracy.
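
The three steps described for PDSim in result (2) can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the instance ratio uses a PathSim-style normalization over an author-venue-author meta-path, the time difference degree is an exponential decay over the gap between mean link times, and the two are combined by multiplication. The data, function names, and the `decay` parameter are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy data: (author, venue, year) links in a bibliographic network.
LINKS = [
    ("alice", "KDD", 2010), ("alice", "KDD", 2011),
    ("bob", "KDD", 2010), ("bob", "KDD", 2011),
    ("carol", "KDD", 2000), ("carol", "KDD", 2001),
]

def venue_counts(links, author):
    """Number of links from `author` to each venue."""
    return Counter(v for a, v, _ in links if a == author)

def path_count(links, x, y):
    """Count author-venue-author meta-path instances between x and y."""
    cx, cy = venue_counts(links, x), venue_counts(links, y)
    return sum(cx[v] * cy[v] for v in cx)

def instance_ratio(links, x, y):
    """Assumed PathSim-style ratio; not necessarily the thesis's definition."""
    denom = path_count(links, x, x) + path_count(links, y, y)
    return 2 * path_count(links, x, y) / denom if denom else 0.0

def mean_time(links, author):
    """Mean establishment time of the author's links."""
    years = [t for a, _, t in links if a == author]
    return sum(years) / len(years)

def time_difference_degree(links, x, y, decay=0.1):
    """Assumed exponential decay over the gap between mean link times."""
    gap = abs(mean_time(links, x) - mean_time(links, y))
    return math.exp(-decay * gap)

def time_aware_similarity(links, x, y, decay=0.1):
    """Combine the two factors by multiplication (an assumption)."""
    ratio = instance_ratio(links, x, y)
    if ratio == 0.0:
        return 0.0  # no shared meta-path instances, so no similarity
    return ratio * time_difference_degree(links, x, y, decay)

if __name__ == "__main__":
    print(time_aware_similarity(LINKS, "alice", "bob"))
    print(time_aware_similarity(LINKS, "alice", "carol"))
```

In this toy data all three authors have the same link counts to the same venue, so a count-only measure cannot tell them apart. Under the assumed time factor, alice is scored as more similar to bob, who was active in the same years, than to carol, who was active a decade earlier.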
Keywords/Search Tags: heterogeneous information network, meta-path, dynamic similarity search, inductive classification, link