Research On Unsupervised Commonsense Question Answering Model Based On Data Enhancement

Posted on: 2024-01-26
Degree: Master
Type: Thesis
Country: China
Candidate: W Li
Full Text: PDF
GTID: 2558307181954439
Subject: Electronic Information (in the field of computer technology) (professional degree)

Abstract/Summary:
Unsupervised commonsense question answering uses the knowledge in a knowledge base to automatically generate question answering data, which is then used to train a model. Generating the data automatically not only reduces the training cost of commonsense question answering models, but also keeps the model from paying too much attention to the label bias in manually annotated data. Research on unsupervised commonsense question answering therefore has both theoretical significance and application value. Existing research focuses mainly on introducing more external knowledge when generating question answering datasets, and lacks in-depth study of how to improve the quality of the datasets themselves. To address this gap, this thesis proposes an unsupervised commonsense question answering framework built around three elements: distractor enhancement, data filtering, and a curriculum learning strategy. The main contributions of this thesis are as follows:

1) A question sub-graph is constructed for each question to limit the candidate range of its distractors. This strengthens the correlation between the distractors and both the question and the correct answer, and improves the distractors' ability to mislead. The sub-graph uses keywords as queries to retrieve relevant knowledge triples from the knowledge base as candidate triples for generating distractors. The correlation between each candidate triple and the question and correct answer is computed, and the tail entities of the candidate triples that meet the criteria are used as distractors for the question.

2) A data filtering method is proposed that combines a knowledge diversity evaluation with a fluency evaluation to remove noisy data from the question answering dataset and improve data quality. Compared with existing methods, it considers not only the fluency of each question answering item but also the knowledge diversity of the dataset as a whole. Filtering on the combined diversity and fluency results yields a dataset that accounts for both the fluency of the data and the diversity of its knowledge.

3) A curriculum learning strategy is proposed to plan the order of the data in the question answering dataset, making it easier for the model to learn from the dataset. The strategy mimics the way human learning progresses from easy to difficult: it uses the similarity between the distractors and the correct answer as the difficulty criterion for a question, and arranges the data from easy to difficult accordingly.

Finally, this thesis evaluates the model on five test tasks. The experimental results show that it outperforms other unsupervised commonsense question answering models, indicating that data enhancement of the question answering dataset can effectively improve the performance of a commonsense question answering model.
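The easy-to-difficult ordering described above can be illustrated with a minimal sketch. The abstract does not say which similarity measure the thesis uses, so the token-level Jaccard overlap below is only a stand-in. The assumption that higher similarity between distractors and the correct answer means a harder question is also an inference from the text. The `QAItem` type, the function names, and the toy data are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QAItem:
    """One generated multiple-choice item (hypothetical structure)."""
    question: str
    answer: str
    distractors: List[str]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity in [0, 1].

    Assumption: a placeholder for whatever similarity the thesis uses.
    """
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def difficulty(item: QAItem) -> float:
    """Mean similarity between each distractor and the correct answer.

    Assumption: distractors closer to the answer make the question harder.
    """
    if not item.distractors:
        return 0.0
    total = sum(jaccard(d, item.answer) for d in item.distractors)
    return total / len(item.distractors)


def curriculum_order(items: List[QAItem]) -> List[QAItem]:
    """Return the items sorted from easiest to most difficult."""
    return sorted(items, key=difficulty)


# Toy data, made up for illustration only.
items = [
    QAItem("Where do you keep milk?", "in the fridge",
           ["in the freezer", "in the oven"]),
    QAItem("What do you use to write?", "a pen",
           ["the moon", "blue sky"]),
]

for item in curriculum_order(items):
    print(round(difficulty(item), 2), item.question)
```

In this toy run the second item sorts first because its distractors share no tokens with the answer, while the first item's distractors each share two of four tokens. An embedding-based similarity could be swapped in for `jaccard` without changing the ordering logic.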
Keywords/Search Tags: unsupervised commonsense question answering, negative sample enhancement, data filtering, curriculum learning