Automatic Commonsense Knowledge Base Construction And Completion For Chinese

Posted on:2022-12-09

Degree:Master

Type:Thesis

Country:China

Candidate:X W Shi

GTID:2518306776992879

Subject:Computer Software and Application of Computer

Abstract/Summary:

Understanding natural language usually requires knowledge beyond what is explicitly stated in the text, that is, what people call commonsense. A commonsense knowledge graph represents commonsense knowledge formally, helping applications understand the meaning behind people’s use of words and improving tasks related to natural language understanding. However, because commonsense knowledge is usually implicit in natural language text, it is difficult to extract explicitly. Automatic construction of commonsense knowledge graphs has therefore long been a challenge in artificial intelligence. In addition, most existing large-scale commonsense knowledge bases are built in English. For Chinese, a language of comparable scale, research results and commonsense resources are scarce, which hinders the further development of natural language processing for Chinese.

This thesis focuses on generative methods for automatically constructing a commonsense knowledge base for Chinese. To address the lack of Chinese resources, it explores how multilingual knowledge distillation can be used for cross-lingual commonsense knowledge generation, and it uses human-in-the-loop rule verification to complete and correct the generated knowledge base, yielding a more complete Chinese commonsense knowledge base. The main work includes the following:

(1) A commonsense knowledge generation model for Chinese based on the attention mechanism. Existing Chinese commonsense knowledge triples serve as a seed set, on which a Chinese pre-trained language model is fine-tuned so that it learns to generate commonsense knowledge. The model then generates new commonsense triples, from which a commonsense knowledge base for Chinese is constructed automatically. In addition, the attention mechanism is modified to incorporate node information, which further improves generation quality. Experiments are carried out on two datasets, ATOMIC and ConceptNet. The results show that 44.73% of the generated triples on ATOMIC and 69.23% on ConceptNet are correct.

(2) A cross-lingual commonsense knowledge generation model based on multilingual knowledge distillation. The fine-tuned English commonsense generation model COMET serves as the teacher model, and a Chinese commonsense generation model based on Chinese GPT serves as the student model. An MSE loss is used to bring the output sequence distribution of the student model as close as possible to the output of the teacher model. In this way, the student model acquires the ability to generate commonsense knowledge without training on a large-scale Chinese commonsense corpus, and a Chinese commonsense generation model can be obtained more quickly. Experiments show that the Chinese commonsense generation model obtained by multilingual knowledge distillation performs better: on the same two datasets, manual evaluation shows that 50.29% of the generated triples on ATOMIC and 78.35% on ConceptNet are correct.

(3) A human-in-the-loop rule verification method for error correction and completion of the commonsense graph. Because it is unrealistic to verify every commonsense triple manually, declarative rules are used to correct and complete the constructed graph, and a crowdsourcing system is used to verify whether the generated rules are correct. In addition, to make the rules easier for crowdsourcing workers to understand and to improve the accuracy of the crowdsourcing tasks, three forms of explanation are designed as aids, and experiments are designed to show that the proposed explanation forms help crowdsourcing workers verify rules.
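
The distillation objective described in (2) can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes, purely for illustration, that teacher and student outputs are already aligned position by position and share one vocabulary index space; the abstract does not say how the English teacher and the Chinese student are aligned. The names `softmax` and `mse_distillation_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Convert one position's raw scores into a probability distribution."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mse_distillation_loss(teacher_logits, student_logits):
    """Mean squared error between teacher and student output distributions.

    Both arguments are lists of per-position logit vectors.
    Assumption (not from the abstract): positions and vocabulary
    indices of the two models are already aligned.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("sequences must have the same length")
    squared_error = 0.0
    count = 0
    for t_step, s_step in zip(teacher_logits, student_logits):
        if len(t_step) != len(s_step):
            raise ValueError("vocabulary sizes must match")
        t_probs = softmax(t_step)
        s_probs = softmax(s_step)
        for t, s in zip(t_probs, s_probs):
            squared_error += (t - s) ** 2
            count += 1
    return squared_error / count


# Toy example: two positions, vocabulary of three tokens.
teacher = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
student = [[1.0, 1.0, 1.0], [0.2, 1.5, 0.3]]
loss = mse_distillation_loss(teacher, student)
```

During training, the student's parameters would be updated to reduce this value, so that its output distribution moves toward the teacher's; the loss is zero only when the two distributions match at every position.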

Keywords/Search Tags:

Automatic commonsense knowledge base construction, commonsense knowledge base completion, multilingual knowledge distillation, human-in-the-loop

Related items

1	Research On Visual Relationship In The Construction Of Graphic-Text Commonsense Base
2	Construction Of Commonsense Causal Knowledge Base
3	Research And Implement Of Web-based CommonSense Knowledge Share Platform
4	Research And Implementation Of Commonsense Reasoning Technology Based On Knowledge Fusion
5	Research On External Knowledge Integrated Reasoning For Commonsense Question Answering
6	Research And Implementation Of Answer Acquisition Algorithm For Commonsense Questions
7	Inferential Commonsense Knowledge from Text
8	Some Discussions About Knowledge Structures In A Knowledge Base
9	Knowledge Base Construction And Knowledge Discovery In Pharmaceutical Industry
10	Research Of Automatic Knowledge Base Construction Based On Hierarchical Multi-labels