
Research On Semantic Matching Method Of Chinese Text Based On BERT

Posted on: 2022-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: J M Gu
GTID: 2518306614959939
Subject: Computer Software and Application of Computer

Abstract:
With the advent of the digital age, demand is growing in artificial intelligence fields such as automatic question answering, information retrieval, and machine translation. Text semantic matching is a basic task underlying these applications, so research on it is necessary. To improve both the accuracy and the speed of the model, this thesis proposes two BERT-based semantic matching models for Chinese text.

(1) To improve the accuracy of BERT, two weaknesses are addressed: BERT's position encoding misses the relative position information between words, and BERT generalizes poorly. This thesis proposes RPER, a Chinese text semantic matching model that augments BERT with relative position encoding and a regularization strategy. First, RPER replaces the absolute position encoding with relative position encoding, so that the model can perceive the relative positional relationship between words at different positions; the output text representation is then closer to the true semantics, which improves accuracy. Second, a regularization strategy based on Dropout and a symmetric KL divergence is proposed, which improves the generalization ability of the model and, with it, the accuracy.

(2) To improve the speed of BERT, two problems are addressed: the parameters obtained by BERT's generic pre-training tasks are not targeted at the downstream task, which slows convergence during fine-tuning, and BERT's roughly 100 million parameters make data processing slow. This thesis proposes FMBERT, a fast BERT-based semantic matching model built on an improved pre-trained MLM task and model compression. First, the MLM pre-training method is improved so that the pre-trained model is better oriented toward the downstream task: starting from BERT's pre-trained parameters, the Chinese text semantic matching dataset is pre-processed and pre-training is continued with the MLM task, which accelerates convergence in the fine-tuning stage. Second, the FMBERT obtained after fine-tuning matches the performance of BERT while using significantly fewer parameters and processing samples faster.

Experiments are conducted on four Chinese text semantic matching datasets: ATEC, BQ, LCQMC, and PAWSX. Compared with BERT, RPER improves test-set accuracy by 2.19% and F1 by 3.83% on average across the four datasets. FMBERT finally matches BERT's prediction performance with 58% of BERT's parameters and predicts 1.8 times faster.
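To make the relative position encoding idea concrete, below is a minimal sketch of single-head self-attention with learned relative position embeddings in the spirit of Shaw et al. The module name, the clipping distance, and the exact way the position term enters the attention scores are illustrative assumptions, not the thesis' formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosSelfAttention(nn.Module):
    def __init__(self, d_model, max_rel_dist=16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # One embedding per clipped relative offset in [-max_rel_dist, max_rel_dist]
        self.rel_emb = nn.Embedding(2 * max_rel_dist + 1, d_model)
        self.max_rel_dist = max_rel_dist
        self.scale = d_model ** -0.5

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        B, L, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Content-based attention scores
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale   # (B, L, L)
        # Relative offsets j - i, clipped to the learnable range
        pos = torch.arange(L, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        r = self.rel_emb(rel + self.max_rel_dist)                    # (L, L, D)
        # Position-aware term: dot product of each query with its relative embeddings
        scores = scores + torch.einsum('bid,ijd->bij', q, r) * self.scale
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)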
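The Dropout-plus-symmetric-KL regularization can be sketched as follows, assuming an R-Drop-style setup with a HuggingFace-style sequence classifier: the same batch is forwarded twice so that each pass sees a different dropout mask, and the two predicted distributions are pulled toward each other. The function name and the weight alpha are illustrative.

import torch.nn.functional as F

def symmetric_kl_loss(model, input_ids, attention_mask, labels, alpha=1.0):
    # Two stochastic forward passes; dropout gives two different distributions
    logits1 = model(input_ids=input_ids, attention_mask=attention_mask).logits
    logits2 = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Task loss averaged over both passes
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    p = F.log_softmax(logits1, dim=-1)
    q = F.log_softmax(logits2, dim=-1)
    # Symmetric KL: KL(P||Q) + KL(Q||P), averaged
    kl = 0.5 * (F.kl_div(q, p, log_target=True, reduction='batchmean')
              + F.kl_div(p, q, log_target=True, reduction='batchmean'))
    return ce + alpha * kl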
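For the continued MLM pre-training on the matching corpus, the standard BERT masking recipe (corrupt 15% of tokens: 80% become [MASK], 10% a random token, 10% unchanged) is sketched below. The thesis' improved MLM variant is not specified in this abstract, so only the baseline recipe is reconstructed; all names are illustrative.

import torch

def mask_tokens(input_ids, mask_id, vocab_size, special_mask, p=0.15):
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, p)
    probs.masked_fill_(special_mask, 0.0)        # never corrupt [CLS]/[SEP]/[PAD]
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100                       # loss is computed only on masked positions
    # 80% of masked positions: replace with [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_id
    # 10% of masked positions: replace with a random vocabulary token
    random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[random] = torch.randint(vocab_size, input_ids.shape)[random]
    # remaining 10%: left unchanged
    return input_ids, labels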
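The abstract does not state which compression technique yields FMBERT's 58% parameter count; knowledge distillation into a shallower student is one common choice and is sketched here purely under that assumption, with T the softmax temperature and alpha the loss weight.

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label cross-entropy on the matching task
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL between temperature-scaled teacher and student distributions
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction='batchmean') * (T * T)
    return alpha * ce + (1 - alpha) * kd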
Keywords: text semantic matching, BERT, regularization strategy, model compression