Automatic Commonsense Knowledge Base Construction And Completion For Chinese

Posted on: 2022-12-09
Degree: Master
Type: Thesis
Country: China
Candidate: X W Shi
GTID: 2518306776992879
Subject: Computer Software and Application of Computer
Abstract/Summary:
Understanding natural language usually requires knowledge beyond what is explicitly stated in the text, that is, what people call commonsense. A commonsense knowledge graph represents this knowledge formally, which helps applications understand the meaning behind people's choice of words and improves natural language understanding tasks. However, because commonsense knowledge is usually implicit in natural language text, it is difficult to extract explicitly, so the automatic construction of commonsense knowledge graphs has long been a challenge in artificial intelligence. In addition, most existing large-scale commonsense knowledge bases are built in English. Although Chinese is an equally large language system, research results and commonsense resources for it are scarce, which hinders the further development of Chinese natural language processing. This thesis focuses on generative methods for automatically constructing a commonsense knowledge base for Chinese. To address the lack of Chinese resources, it also explores multilingual knowledge distillation for cross-lingual commonsense knowledge generation, and uses human-in-the-loop rule verification to complete and correct the generated knowledge base, yielding a more complete Chinese commonsense knowledge base. The main contributions are as follows:

(1) A commonsense knowledge generation model for Chinese based on the attention mechanism. Existing Chinese commonsense knowledge triples serve as a seed set, on which a Chinese pre-trained language model is fine-tuned so that it learns to generate commonsense knowledge. The new triples it generates are then used to construct the Chinese commonsense knowledge base automatically. The attention mechanism is also modified to incorporate node information, which further improves generation quality. Experiments on two datasets, ATOMIC and ConceptNet, show that 44.73% of the generated triples for ATOMIC and 69.23% for ConceptNet are correct.

(2) A cross-lingual commonsense knowledge generation model based on multilingual knowledge distillation. The fine-tuned English commonsense generation model COMET serves as the teacher model, and a Chinese commonsense generation model based on Chinese GPT is trained as the student model. An MSE loss pulls the output sequence distribution of the student model as close as possible to the output of the teacher model. In this way, the student model acquires the ability to generate commonsense knowledge without training on a large-scale Chinese commonsense corpus, and a Chinese commonsense generation model can be obtained more quickly. Experiments show that multilingual knowledge distillation improves the Chinese commonsense generation model: on the same two datasets, manual evaluation finds that 50.29% of the generated triples for ATOMIC and 78.35% for ConceptNet are correct.

(3) A human-in-the-loop rule verification method for correcting and completing the commonsense graph. Because verifying every commonsense triple manually is unrealistic, the constructed graph is corrected and completed with declarative rules, and a crowdsourcing system is used to verify whether the generated rules are correct. To make the rules easier for crowdsourcing workers to understand and to improve the accuracy of the crowdsourcing tasks, the thesis also designs three forms of explanation as aids, and reports experiments showing that the proposed explanation forms help crowdsourcing workers verify rules.
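
The MSE objective in contribution (2) can be illustrated with a small sketch. The abstract does not say which outputs are compared (logits, hidden states, or sequence embeddings), so the code below simply assumes that the teacher and the student each produce one vector per position and that the vectors are already aligned. The function name `mse_distillation_loss` and the toy vectors are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def mse_distillation_loss(
    student: Sequence[Sequence[float]],
    teacher: Sequence[Sequence[float]],
) -> float:
    """Mean squared error between aligned student and teacher output vectors.

    Assumption (not from the thesis): both models emit one vector per
    position, with matching lengths and dimensions.
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must have the same number of positions")

    total = 0.0
    count = 0
    for s_vec, t_vec in zip(student, teacher):
        if len(s_vec) != len(t_vec):
            raise ValueError("vectors at the same position must have the same dimension")
        for s, t in zip(s_vec, t_vec):
            total += (s - t) ** 2  # squared difference for one component
            count += 1

    if count == 0:
        raise ValueError("inputs must not be empty")
    return total / count  # average over every position and dimension


# Toy example: teacher output for an English input, student output for
# its Chinese counterpart (values are made up for illustration).
teacher_out = [[1.0, 0.0], [0.0, 1.0]]
student_out = [[1.0, 0.0], [1.0, 0.0]]
loss = mse_distillation_loss(student_out, teacher_out)
```

Minimizing this quantity with respect to the student's parameters moves the student's outputs toward the teacher's. In practice the loss would be computed on tensors inside a deep learning framework rather than on Python lists.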
Keywords/Search Tags: automatic commonsense knowledge base construction, commonsense knowledge base completion, multilingual knowledge distillation, human-in-the-loop