
The Representation And Alignment Method Of Heterogeneous Data And Its Application

Posted on: 2021-05-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Wu
GTID: 1368330602493438
Subject: Computer applications engineering
Abstract/Summary:
In real information systems, data commonly differ in both type and abstract structure; such data are called heterogeneous data. Representing and aligning the objects associated with heterogeneous data is challenging because of this diversity of types and structures. According to their characteristics, heterogeneous data can be divided into structurally heterogeneous data and type-heterogeneous data. Structurally heterogeneous data take many forms, and the correspondence between different data is not one-to-one. Type-heterogeneous data differ in their characteristics: discrete data take only a finite number of values between any two data points, which makes processing more flexible and concise, whereas continuous data can be subdivided into infinitely many values that follow regular patterns. Fusing these two completely different types of data is therefore difficult. An effective way to represent heterogeneous data is to map them into a low-dimensional embedding space through representation learning. However, the elements of heterogeneous data have their own distributions and characteristics in each application scenario, so they must be represented and aligned according to the characteristics of that scenario. This dissertation studies representation and alignment methods for heterogeneous data in two typical application scenarios: knowledge base question answering, and recommendation with heterogeneous environment information. The main contributions are as follows.

1. For knowledge base question answering, representation learning and alignment methods for structurally heterogeneous data are studied. We focus on representing and aligning sequence data (text) with graph-structured data (knowledge graphs), and study a subgraph-representation-based alignment method for answering questions over knowledge bases. The main challenges are: (1) how to represent complex text containing multiple components (entities) and align it with the structured data; (2) how to handle the complex inference process when a question involves reasoning over multiple relations. Our solution recasts the problem as the alignment of a subgraph structure with a text sequence, and proposes a knowledge base subgraph built on a directed acyclic graph structure. This subgraph contains the information related to the multiple entities and relations in the question. We design a deep framework, named DAG-SCHEMA, that combines a long short-term memory network and a key-value memory network over the directed acyclic graph, and use it for low-dimensional representation learning and alignment of heterogeneous data. Experimental results show that this method outperforms other methods on widely used datasets, especially on complex questions with multiple entities, which verifies that it can effectively learn and align representations of structurally heterogeneous data.

2. For recommender systems with multiple types of heterogeneous auxiliary information, representation learning and alignment methods for type-heterogeneous data are studied. The challenges are: (1) how to integrate and represent multiple types of heterogeneous environment information; (2) how to design alignment loss functions that suit different task characteristics. Our solution combines the characteristics of the different attribute types in the heterogeneous auxiliary information and transforms the problem into node representation on a heterogeneous network, which yields a fused representation of the heterogeneous environment information. We then study an alignment loss function based on the inner product, where data alignment is achieved by factorizing the user-item interactions. Because different recommendation scenarios emphasize different tasks, we also study an alignment loss function based on Bayesian personalized ranking, where the representation and alignment of users and items take into account the differences in a user's preferences among items. Results on recommendation datasets from three real scenarios with heterogeneous auxiliary information show that the proposed algorithms, named UserItem2vec and COIR, handle this problem more effectively than existing benchmark algorithms, especially for data nodes that are rare in the heterogeneous data.

3. For the special case of implicit feedback, the alignment loss and the learning algorithm of the representation learning and alignment method are improved. The challenges of implicit feedback are the absence of negative samples and the sparsity of the supervision signal; in a special class of cases, the feedback itself is also diverse. Our solution constructs a heterogeneous network, containing different types of nodes and edges, to model the various types of positive feedback, and designs a self-attention mechanism based on implicit negative feedback, named INA. INA automatically learns and adjusts the influence of negative samples on the alignment supervision signal, which addresses the parameter-setting problem of negative sampling. Experimental results on several datasets verify that the proposed alignment loss function can effectively learn and align data representations.
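
The two alignment losses named in part 2 can be illustrated with a minimal sketch. This is not the dissertation's implementation of UserItem2vec or COIR: the function names, the squared-error form of the inner-product loss, and the toy two-dimensional embeddings are all assumptions chosen for illustration. The ranking loss uses the standard Bayesian personalized ranking form, -log(sigmoid(score_pos - score_neg)).

```python
import math


def dot(u, v):
    """Inner product of two equal-length embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def inner_product_loss(user_vec, item_vec, observed):
    """Squared error between the predicted score (inner product) and an
    observed interaction value. The squared-error form is an assumption."""
    return (observed - dot(user_vec, item_vec)) ** 2


def bpr_loss(user_vec, pos_item_vec, neg_item_vec):
    """Pairwise ranking loss: -log(sigmoid(score_pos - score_neg)).

    Written as a numerically stable softplus of the negated margin.
    """
    margin = dot(user_vec, pos_item_vec) - dot(user_vec, neg_item_vec)
    return max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))


# Hypothetical embeddings: the user is aligned with `liked`, not `unseen`.
user = [1.0, 0.0]
liked = [1.0, 0.0]
unseen = [0.0, 1.0]

pointwise = inner_product_loss(user, liked, 1.0)
ranked_correctly = bpr_loss(user, liked, unseen)
ranked_wrongly = bpr_loss(user, unseen, liked)
```

The pointwise loss fits each observed interaction value directly, while the pairwise loss only asks that an item the user interacted with scores higher than one they did not, which is why `ranked_correctly` comes out smaller than `ranked_wrongly`.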
Keywords/Search Tags: Heterogeneous data, Representation learning, Deep learning, Implicit feedback, Supervised learning