
Key Technologies Of Heterogeneous Information Networks Data Mining

Posted on: 2020-05-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J C Li
GTID: 1368330611993036
Subject: Management Science and Engineering
Abstract/Summary:
With the rapid development of information technology and intelligent technology, humanity has moved from the IT (Information Technology) era to the DT (Data Technology) era. Leveraging the Internet, big data technology is now entering a period of accelerated development and is being applied across all walks of life, with the total amount of data growing by 50% every year. Data on the Internet interact and interrelate to form large and complex heterogeneous information networks (HINs). HINs are everywhere: e-commerce networks, social media networks, mobile communication networks, transportation networks, scientific citation networks, medical and health networks, and so on. Scientifically sound data mining of HINs therefore has important theoretical value and practical application value. A HIN is composed of nodes of different types that are connected to each other through complex relationships. These characteristics pose complex and intriguing challenges for modeling and analyzing HINs, and well-known approaches for traditional homogeneous networks cannot be applied to them directly. How to model and analyze HINs scientifically, taking their characteristics into account, is therefore a major research topic worth exploring.

Based on practical application and theoretical requirements, this thesis summarizes several key technical issues that must be solved in HIN research, from the viewpoints of “data preprocessing”, “network modeling”, “key node analysis”, “link prediction”, and “weighted link score recommendation”. These issues are entity disambiguation in multi-source data fusion, standardized modeling of HINs, key node identification in HINs, HIN link prediction, and user-item score prediction in weighted HINs. For each issue, we propose a solution that integrates multiple sources of information, including structural information, semantic information, and the attribute information of network nodes and edges. The main research points and innovations of the thesis are summarized as follows:

(1) We propose an entity disambiguation method based on multidimensional feature similarity, which makes full use of the attribute information of entities and their associated information within a HIN. First, we extract multidimensional features of entities and named objects, including basic attribute features and association relationship features. Next, we calculate the similarity for each type of extracted feature between entities and named objects. Then, all feature similarities are aggregated by an integration algorithm. Finally, the reliability and effectiveness of the method are illustrated with an empirical case study. The method not only uses the attribute features of entities and named objects but also fully exploits the relationship information in HINs for entity disambiguation.

(2) We propose a standardized modeling method for HINs. Targeting the characteristics of HINs, the method covers two aspects: static structure and dynamic timing. For static HINs, the concept of a network schema is introduced to describe the meta-structure of a HIN, and the concept of a meta-path is used to portray the relationships between entities from both the structural and the semantic aspect. For temporal HINs, we first define the concept of an event to describe an information transfer process between two entities. Then, a temporal HIN model is established by considering the chronological order in which all events occur among the entities of the HIN. Finally, the proposed static and temporal HIN models are applied to several examples, and the results show that the proposed standardized HIN modeling method has good feasibility and scalability.

(3) We propose a semantic-based node importance evaluation method for HINs, built from the perspective of network capability. First, a meta-path-based HIN capability comprehensive evaluation index (CCEI) is proposed to measure network capability. The CCEI is not a simple sum of the capabilities of individual nodes; instead, it fully accounts for the rich semantics hidden in meta-paths. Next, taking the dynamic characteristics of temporal HINs into account, we extend the CCEI to temporal HINs. Then, based on the CCEI, we measure the contribution of each node by comparing the HIN's CCEI before and after removing that node. Finally, the reliability and effectiveness of the method are demonstrated with a case study. The results show that the proposed method can effectively identify key nodes in HINs.

(4) We propose a HIN link prediction method based on a BP (backpropagation) neural network. Taking full advantage of the structural information and rich semantic information in HINs, we put forward a novel integrated framework, the meta-path feature-based BP neural network model (MPBP), to predict multiple types of links in HINs. Specifically, the concept of a meta-path is introduced and meta-path features are extracted from the HIN. Next, using the extracted meta-path features, a supervised link prediction model is built with a three-layer BP neural network. Then, a solution algorithm for the model is given, which obtains predictions by iteratively training the network. Finally, numerical experiments on several example datasets verify the effectiveness and feasibility of MPBP. The results show that, taking meta-path features as input, the proposed method achieves very good performance in predicting multiple types of links in HINs.

(5) We propose a HIN user-item score prediction method based on meta-path similarity. For weighted HINs, a meta-path similarity-based user-item score prediction model is built to solve the score prediction problem. Specifically, meta-path features in the HIN are first extracted under the guidance of actual requirements and the latent semantics contained in meta-paths. Next, a meta-path similarity model is established to calculate the similarities of user-item pairs and predict rating scores. Then, a user-item recommendation model is presented that assigns each meta-path a different preference weight and combines all meta-path similarities through an optimization process. Finally, we conduct a series of experiments on a HIN case to demonstrate the feasibility and effectiveness of the method. The results show that it achieves very good performance by making full use of the rich semantics of meta-paths in HINs.
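The entity disambiguation step in point (1) (extract features, score each feature type, aggregate) can be sketched as follows. The abstract does not give the thesis's feature set or its integrating algorithm, so this minimal sketch assumes just two feature types (agreement of shared attribute values, and Jaccard overlap of HIN neighbours), a weighted-sum aggregation, and a fixed decision threshold; `Record`, the weights, and the threshold are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """An entity or named object: attribute values plus its HIN neighbours (hypothetical)."""
    name: str
    attributes: dict = field(default_factory=dict)
    neighbours: set = field(default_factory=set)


def attribute_similarity(a: Record, b: Record) -> float:
    """Fraction of attributes present in both records whose values agree."""
    shared = a.attributes.keys() & b.attributes.keys()
    if not shared:
        return 0.0
    return sum(a.attributes[k] == b.attributes[k] for k in shared) / len(shared)


def relation_similarity(a: Record, b: Record) -> float:
    """Jaccard overlap of the two records' neighbour sets in the HIN."""
    union = a.neighbours | b.neighbours
    if not union:
        return 0.0
    return len(a.neighbours & b.neighbours) / len(union)


def combined_similarity(a: Record, b: Record, weights=(0.5, 0.5)) -> float:
    """Aggregate the per-feature similarities with a weighted sum (assumed aggregation)."""
    w_attr, w_rel = weights
    return w_attr * attribute_similarity(a, b) + w_rel * relation_similarity(a, b)


def disambiguate(mention: Record, candidates: list, threshold: float = 0.5):
    """Return the name of the best-matching candidate, or None if none clears the threshold."""
    best = max(candidates, key=lambda c: combined_similarity(mention, c), default=None)
    if best is None or combined_similarity(mention, best) < threshold:
        return None
    return best.name


if __name__ == "__main__":
    mention = Record("J. Li", {"affiliation": "Univ A", "field": "HIN"}, {"paper1", "paper2"})
    candidates = [
        Record("Jia Li", {"affiliation": "Univ B", "field": "vision"}, {"paper9"}),
        Record("Jun Li", {"affiliation": "Univ A", "field": "HIN"}, {"paper1", "paper3"}),
    ]
    print(disambiguate(mention, candidates))  # Jun Li
```

The relational term is what lets two records with identical attributes still be told apart by the HIN objects they connect to, which is the point (1) makes about exploiting relationship information.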
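The static and temporal modeling ideas in point (2) can be illustrated with a small sketch. It assumes a schema given as a set of allowed (source type, target type) relations, treats a meta-path as a sequence of node types, and counts meta-path instances by chaining relation matrices (the commuting-matrix construction common in the HIN literature). The bibliographic schema, the `Event` record, and the `snapshot` helper are hypothetical simplifications, not the thesis's own definitions.

```python
from dataclasses import dataclass

# Hypothetical bibliographic network schema: allowed (source type, target type) relations.
SCHEMA = {("Author", "Paper"), ("Paper", "Author"), ("Paper", "Venue"), ("Venue", "Paper")}


def is_valid_meta_path(types, schema=SCHEMA):
    """A meta-path is a node-type sequence whose every step is a relation in the schema."""
    return len(types) >= 2 and all((s, t) in schema for s, t in zip(types, types[1:]))


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Plain-Python matrix product for lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def meta_path_counts(adjacency, types):
    """Count meta-path instances between every pair of end nodes.

    adjacency[(S, T)] is the |S| x |T| relation matrix; multiplying the matrices
    along the meta-path yields the instance count for each (start, end) pair.
    """
    if not is_valid_meta_path(types):
        raise ValueError(f"{types} is not permitted by the schema")
    result = adjacency[(types[0], types[1])]
    for s, t in zip(types[1:], types[2:]):
        result = matmul(result, adjacency[(s, t)])
    return result


@dataclass(frozen=True)
class Event:
    """Information transfer from `source` to `target` at time `t` (simplified, hypothetical)."""
    source: str
    target: str
    t: float


def snapshot(events, until):
    """Events that occurred no later than `until`, in chronological order."""
    return sorted((e for e in events if e.t <= until), key=lambda e: e.t)


if __name__ == "__main__":
    w_ap = [[1, 1], [0, 1]]  # rows: authors a1, a2; columns: papers p1, p2
    adjacency = {("Author", "Paper"): w_ap, ("Paper", "Author"): transpose(w_ap)}
    print(meta_path_counts(adjacency, ["Author", "Paper", "Author"]))  # [[2, 1], [1, 1]]
```

Entry [0][1] of the Author-Paper-Author result counts the papers a1 and a2 share (here one), which is the kind of structural and semantic relationship a meta-path is meant to capture.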
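The removal-based importance measure in point (3) can be sketched as below. The abstract does not give the CCEI formula, so `capability` here is a hypothetical stand-in: a weighted sum of meta-path instance counts (typed walks, matching the commuting-matrix count). Only the static case is shown; the temporal extension is omitted. What the sketch does reflect is the removal step: a node's importance is how much the network-level index drops when that node is deleted.

```python
def count_instances(node_type, edges, meta_path, removed=frozenset()):
    """Count meta-path instances (typed walks) in an undirected HIN, skipping removed nodes.

    node_type maps node -> type; edges is an iterable of (u, v) pairs.
    """
    adj = {}
    for u, v in edges:
        if u in removed or v in removed:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def walk(node, depth):
        # A complete instance has matched every type in the meta-path.
        if depth == len(meta_path) - 1:
            return 1
        next_type = meta_path[depth + 1]
        return sum(walk(n, depth + 1) for n in adj.get(node, ()) if node_type[n] == next_type)

    return sum(walk(n, 0) for n, t in node_type.items() if t == meta_path[0] and n not in removed)


def capability(node_type, edges, weighted_paths, removed=frozenset()):
    """Hypothetical stand-in for the CCEI: weighted sum of meta-path instance counts."""
    return sum(w * count_instances(node_type, edges, p, removed) for p, w in weighted_paths)


def node_importance(node_type, edges, weighted_paths):
    """Importance of each node = drop in the capability index when that node is removed."""
    base = capability(node_type, edges, weighted_paths)
    return {v: base - capability(node_type, edges, weighted_paths, frozenset({v})) for v in node_type}


if __name__ == "__main__":
    node_type = {"a1": "Author", "a2": "Author", "p1": "Paper", "p2": "Paper", "v1": "Venue"}
    edges = [("a1", "p1"), ("a1", "p2"), ("a2", "p2"), ("p1", "v1"), ("p2", "v1")]
    paths = [(("Author", "Paper", "Author"), 1.0), (("Author", "Paper", "Venue"), 1.0)]
    print(node_importance(node_type, edges, paths))
    # {'a1': 6.0, 'a2': 4.0, 'p1': 2.0, 'p2': 6.0, 'v1': 3.0}
```

Because the index is built from meta-path instances rather than per-node scores, removing the shared paper p2 costs far more than removing p1, even though both are papers.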
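The three-layer BP network in point (4) can be sketched in plain Python. This is a generic backpropagation network (input layer, one sigmoid hidden layer, one sigmoid output unit) trained by per-sample gradient descent on squared error, not the MPBP implementation itself. The training vectors in the example are hypothetical normalized meta-path counts for node pairs, and the hidden-layer size, learning rate, and epoch count are assumptions.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ThreeLayerBP:
    """Input layer -> one sigmoid hidden layer -> one sigmoid output unit (link score)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def train(self, samples, epochs=2000, lr=0.5):
        """Per-sample gradient descent on squared error, gradients from backpropagation."""
        for _ in range(epochs):
            for x, target in samples:
                h, y = self.forward(x)
                d_out = (y - target) * y * (1.0 - y)
                # Hidden deltas use the output weights *before* they are updated.
                d_hid = [d_out * w * hj * (1.0 - hj) for w, hj in zip(self.w2, h)]
                self.w2 = [w - lr * d_out * hj for w, hj in zip(self.w2, h)]
                self.b2 -= lr * d_out
                for j, dj in enumerate(d_hid):
                    self.w1[j] = [w - lr * dj * xi for w, xi in zip(self.w1[j], x)]
                    self.b1[j] -= lr * dj

    def predict(self, x):
        """Estimated probability that the node pair described by x is linked."""
        return self.forward(x)[1]


if __name__ == "__main__":
    # Hypothetical normalised meta-path counts for node pairs (three meta-paths); 1 = linked.
    samples = [
        ([0.9, 0.8, 0.7], 1), ([0.8, 0.9, 0.6], 1), ([0.7, 0.6, 0.9], 1),
        ([0.1, 0.0, 0.2], 0), ([0.0, 0.2, 0.1], 0), ([0.2, 0.1, 0.0], 0),
    ]
    net = ThreeLayerBP(n_in=3, n_hidden=4)
    net.train(samples)
    print(round(net.predict([0.85, 0.75, 0.8]), 3))
```

Predicting a different link type would, under this sketch, mean extracting features along meta-paths that end at that target type and training on labelled pairs of that type.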
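Point (5) combines per-meta-path similarities with preference weights. The sketch below assumes symmetric meta-paths of the form user-X-user and uses PathSim on the weighted commuting matrix as the similarity measure. PathSim is swapped in here because the abstract does not give the thesis's own formula. A rating is predicted as the similarity-weighted average of other users' ratings, and the meta-paths are blended with fixed, hypothetical preference weights in place of the thesis's optimization step. The `user_genre` relation is also hypothetical.

```python
def commuting_matrix(w):
    """M = W W^T for a symmetric meta-path user-X-user; W has one row per user (weights allowed)."""
    return [[sum(a * b for a, b in zip(ru, rv)) for rv in w] for ru in w]


def path_sim(m, x, y):
    """PathSim: 2 M[x][y] / (M[x][x] + M[y][y])."""
    denom = m[x][x] + m[y][y]
    return 2.0 * m[x][y] / denom if denom else 0.0


def predict_single(m, ratings, u, i):
    """Similarity-weighted average of other users' ratings of item i (0 means unrated)."""
    num = den = 0.0
    for v, row in enumerate(ratings):
        if v != u and row[i] > 0:
            s = path_sim(m, u, v)
            num += s * row[i]
            den += s
    return num / den if den else 0.0


def predict(meta_path_matrices, prefs, ratings, u, i):
    """Blend per-meta-path predictions with preference weights (fixed here, not optimised)."""
    total = sum(prefs)
    return sum(p * predict_single(commuting_matrix(w), ratings, u, i)
               for w, p in zip(meta_path_matrices, prefs)) / total


if __name__ == "__main__":
    ratings = [[5, 3, 0], [4, 3, 4], [1, 5, 2]]  # users x items, 0 = unrated
    user_genre = [[1, 0], [1, 0], [0, 1]]          # hypothetical user x genre links
    # Meta-path 1: user-item-user (weighted by ratings); meta-path 2: user-genre-user.
    print(round(predict([ratings, user_genre], [0.5, 0.5], ratings, u=0, i=2), 3))  # 3.553
```

Under the user-genre meta-path, user 0 resembles only user 1 and the prediction is 4.0. The rating-weighted user-item-user meta-path also gives user 2 some weight and pulls the estimate down. The preference weights decide how much each of these semantics counts.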
Keywords/Search Tags: Heterogeneous information network, entity disambiguation, normalized modeling, key node identification, link prediction, score prediction