
Research On Chinese Open-Domain Question Answering Based On Discrete Prompt Learning Representation

Posted on: 2024-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: Z Yan
GTID: 2568306941488604
Subject: Electronic Science and Technology
Abstract/Summary:
As textual information on the Internet grows more complex, it becomes increasingly difficult for people to find the information they need, even with the help of search engines. In recent years, rapid progress in machine learning has made open-domain question answering feasible. Open-domain question answering means answering a natural-language question without being given a specific context. Because existing research targets English and lacks designs specific to Chinese, accuracy drops when these systems are applied to Chinese directly. Research on open-domain question answering centers on two-stage models. However, the generalization of existing dense passage retrieval models and the robustness of existing extractive reading models both need improvement. To address these problems, the thesis focuses on the generalization and robustness of Chinese open-domain question answering, combined with discrete prompt learning methods. The following work is carried out:

(1) To address the problem that existing related datasets cannot meet the requirements of open-domain question answering, a complete Chinese open-domain question answering dataset, DYKzh, is constructed. Statistical analysis shows that the dataset's higher retrieval complexity helps evaluate the generalization of retrieval models, and that its more complex reading comprehension task helps evaluate the robustness of reading models.

(2) To address the generalization challenge of the retrieval model, a Chinese dense vector retrieval model based on a span mask prompt template, Span Prompt Dense Passage Retrieval (SPDPR), is proposed. The model introduces a span mask prompt template better suited to Chinese lexical structure and adopts a weight-sharing strategy in the Siamese model. Its retrieval accuracy exceeds that of several existing methods on the DYKzh dataset.

(3) To address the robustness challenge of the reading model, a Chinese extractive reading model that integrates multi-passage relationships, Multiple Passage Relation Reader (MPRR), is proposed. The model fuses multi-passage relation layers into the reading model and adopts a balanced training strategy. Its reading accuracy is better than that of existing reading models on the DYKzh dataset.

(4) Finally, a Chinese open-domain question answering system is designed. The system demonstrates the above models intuitively in a practical application. The application results can be used to evaluate the models effectively and help them iterate quickly, providing a useful reference for large-scale deployment of the system.
Keywords/Search Tags: Natural Language Processing, Open-domain Question Answering, Dense Passage Retrieval, Extractive Machine Reading Comprehension, Discrete Prompt Tuning
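The retrieval stage described in the abstract can be illustrated with a minimal sketch. It is not the SPDPR model: the trained neural encoder is replaced by a toy hashed character n-gram encoder, and the names `encode`, `retrieve`, and `DIM` are hypothetical. The sketch shows only the Siamese weight-sharing idea, in which one and the same encoder embeds both questions and passages, and passages are ranked by inner product.

```python
import math
import zlib
from typing import List, Tuple

DIM = 1024  # hypothetical embedding size for this toy example


def encode(text: str) -> List[float]:
    """Toy stand-in for a trained encoder (assumption, not the thesis model).

    Hashes character unigrams and bigrams into a fixed-length vector and
    L2-normalises it. The same function is used for questions and passages,
    which mirrors weight sharing in a Siamese retrieval model.
    """
    chars = [c for c in text if not c.isspace()]
    grams = chars + [a + b for a, b in zip(chars, chars[1:])]
    vec = [0.0] * DIM
    for gram in grams:
        # crc32 is deterministic across runs, unlike the built-in hash()
        vec[zlib.crc32(gram.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def retrieve(question: str, passages: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Return the k passages with the highest inner product to the question."""
    q_vec = encode(question)
    scored = [
        (sum(a * b for a, b in zip(q_vec, encode(p))), p) for p in passages
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    corpus = [
        "北京是中华人民共和国的首都。",
        "长江是中国最长的河流。",
        "珠穆朗玛峰是世界最高的山峰。",
    ]
    for score, passage in retrieve("中国最长的河流是哪条?", corpus):
        print(round(score, 3), passage)
```

In a full two-stage system, the top-ranked passages would then be passed to an extractive reading model that selects an answer span; that stage is omitted here.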