
Research On Natural Language Question Generation Over Knowledge Graphs

Posted on: 2024-04-02    Degree: Doctor    Type: Dissertation
Country: China    Candidate: S Bi    Full Text: PDF
GTID: 1528307364468004    Subject: Software engineering
Abstract/Summary:
Knowledge graphs (KGs), as an important way of representing knowledge, have received increasing attention in recent years. By representing knowledge as a graph, KGs make it more convenient to organize, query, and exploit knowledge. With the advancement of KG technology, attention has turned to enabling computers to better understand and respond to natural language queries, to mine information from KGs through questioning, and to provide more personalized knowledge services. KG-based question generation (KGQG) aims to generate answerable natural language questions from a given subgraph, i.e., a set of connected triples. It is widely used in artificial intelligence systems such as question answering (QA), dialogue, and online education.

Traditional methods based on rules and templates are labor-intensive and generalize poorly. With the dramatic increase in available data and computing power, deep-learning-based approaches have gained favor among researchers. Typically, automatic question generation is cast as a sequence prediction task and modeled with an encoder-decoder framework, which has made impressive progress over traditional methods. However, the correctness, generalization, and controllability of the questions generated by existing methods still leave much room for improvement. First, triples, as highly condensed knowledge carriers, are not informative enough to generate well-expressed and grammatically correct natural language questions. Second, generating complex questions requires compositional generalization, which existing models lack, making it hard to capture the intrinsic relationships among multiple triples. Finally, there is no practical complexity estimator for generating questions of controllable difficulty; moreover, the simplistic difficulty modeling in existing work prevents satisfactory controllability. Motivated by these observations, this paper is devoted to alleviating three
limitations in KGQG: (1) how to guarantee the correctness of generated questions given limited input information; (2) how to improve the compositional generalization of the model and strengthen the mapping between generated questions and multi-hop facts; and (3) how to build an automatic difficulty estimator and relieve the low diversity of generated results caused by uniform difficulty modeling and poor controllability. Specifically, this paper conducts research in the following three areas.

1. For simple question generation, a knowledge-enhanced and grammar-guided model is proposed. External knowledge is employed to augment the entities and relations in the triples, compensating for the limited input information. A global relation encoder is designed to facilitate understanding of the answer and alleviate the semantic-drift phenomenon. At the decoding stage, each word is given a type prediction, and the type distribution is incorporated into the generation of the current word. Additionally, syntactic-tree and semantic-dependency evaluators are trained by mask prediction and autoencoders, respectively; these evaluators use the syntactic and semantic dependency information of the generated sequence to guide the current generation step. After decoding, the generated questions are evaluated to obtain reinforcement learning (RL) rewards, mitigating the exposure bias caused by teacher forcing during training. Experiments demonstrate that external knowledge makes the generated results more explicit and diverse, and that the type constraints and RL improve semantic and syntactic correctness. The proposed model comprehensively outperforms the baselines on public datasets.

2. For complex question generation, a modular dual-learning-based model is proposed. The model leverages the intrinsic connection between KGQA and KGQG and designs a unified dual-learning framework. First, for a given subgraph, the model generates a shared layer organization via a discrete hidden variable, i.e., dynamic
routing. Second, a network bank contains multiple structurally identical neural network layers that can be placed anywhere in the modular structure according to the routing and can be reused. To exploit the sharing mechanism between the dual tasks effectively, a parameter transmission criterion based on the loss ratio is implemented to determine whether the current module accepts the inductive bias shared by the peer task. Experiments illustrate that the dual-learning framework improves the performance of KGQA and KGQG simultaneously, and that the modular shared network significantly enhances the model's compositional generalization and transfers the learned knowledge.

3. For difficulty-controllable question generation, a method based on soft templates (STs) and counterfactual reasoning is proposed to address uniform difficulty modeling and the lack of a stable causal effect between complexity labels and generated results. First, an ST construction method based on a pre-trained language model is designed; the ST is a group of learnable parameters that require no manual annotation. In parallel, a discrete dynamic ST selector is introduced to maximize the diversity of templates across question types. Then, a subgraph-representation disentanglement module decouples the input triples into parts relevant and irrelevant to the current question, reducing noise interference in the generation results. Subsequently, a counterfactual reasoning module is designed that combines STs and the disentangled fact representations to optimize the decoder in a continuous prompt-learning style. Counterfactual reasoning explores the differences between actual samples and counterfactual samples shaped by modifying specific attributes, thereby learning asking patterns of varying difficulty. For datasets without available complexity annotations, an explainable complexity estimation method is proposed that quantifies entities, interrogatives, and QA. Experimental results show
that the proposed model significantly outperforms the baselines, especially in the controllability and diversity of the generated questions.
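The loss-ratio-based parameter transmission criterion mentioned for the dual-learning model can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the function names, the threshold, and the linear blending scheme are hypothetical, not the dissertation's actual implementation.

```python
# Hypothetical sketch of a loss-ratio parameter-transmission criterion
# between two dual tasks (e.g., KGQA and KGQG). All names and the
# threshold/blending choices are illustrative assumptions.

def should_accept_shared_params(own_loss, peer_loss, threshold=1.2):
    """Accept the peer task's inductive bias only when the peer is
    currently doing sufficiently better (its loss is markedly lower)."""
    if peer_loss <= 0:
        return False
    return own_loss / peer_loss >= threshold

def transfer(own_params, peer_params, own_loss, peer_loss, alpha=0.5):
    """Blend this module's parameters toward the peer task's parameters
    when the loss-ratio criterion fires; otherwise keep them unchanged."""
    if not should_accept_shared_params(own_loss, peer_loss):
        return own_params
    return [(1 - alpha) * w + alpha * p for w, p in zip(own_params, peer_params)]
```

The point of gating on the loss ratio is that a struggling module borrows from its better-performing peer, while a module that is already ahead is left untouched.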
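The explainable complexity estimation based on quantifying entities, interrogatives, and QA could take a shape like the following sketch. The scoring formula, token sets, and hop count are hypothetical assumptions chosen to make the idea concrete, not the method proposed in this work.

```python
# Hypothetical sketch of an explainable difficulty estimator for generated
# questions: the score is a transparent sum of interpretable counts.
INTERROGATIVES = {"what", "who", "where", "when", "which", "how", "why"}

def estimate_difficulty(question_tokens, entity_tokens, num_hops):
    """Return a coarse difficulty score: more entity mentions, more
    interrogative words, and more hops in the supporting subgraph all
    make the question harder. Each term is individually inspectable."""
    n_entities = sum(1 for t in question_tokens if t in entity_tokens)
    n_interrog = sum(1 for t in question_tokens if t.lower() in INTERROGATIVES)
    return n_entities + n_interrog + num_hops
```

Because the score decomposes into named counts, each contribution to a question's estimated difficulty can be explained to a user rather than being an opaque model output.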
Keywords/Search Tags:Knowledge Graph, Question Generation, Reinforcement Learning, Dual Learning, Counterfactual Reasoning