
Research On Semantic Matching Method Of Chinese Text Based On BERT

Posted on: 2022-12-13
Degree: Master
Type: Thesis
Country: China
Candidate: J M Gu
GTID: 2518306614959939
Subject: Computer Software and Application of Computer

Abstract:
With the advent of the digital age, demand is growing in artificial intelligence fields such as automatic question answering, information retrieval, and machine translation. Text semantic matching is a basic task underlying these applications, so research on it is necessary. To improve both the accuracy and the speed of the model, this thesis proposes two BERT-based semantic matching models for Chinese text.

(1) To improve the accuracy of BERT, two weaknesses are addressed: BERT's position encoding misses the relative position information between words, and BERT generalizes poorly. This thesis proposes RPER, a Chinese text semantic matching model that augments BERT with relative position encoding and a regularization strategy. First, RPER replaces the absolute position encoding with relative position encoding, so that the model can perceive the relative positional relationship between words at different positions; the output text representation is then closer to the true semantics, which improves accuracy. Second, a regularization strategy based on Dropout and a symmetric KL divergence is proposed, which improves the generalization ability of the model and, with it, the accuracy.

(2) To improve the speed of BERT, two problems are addressed: the parameters obtained by BERT's generic pre-training tasks are not targeted at the downstream task, which slows convergence during fine-tuning, and BERT's roughly 100 million parameters make data processing slow. This thesis proposes FMBERT, a fast BERT-based semantic matching model built on an improved pre-trained MLM task and model compression. First, the MLM pre-training method is improved so that the pre-trained model is better oriented toward the downstream task: starting from BERT's pre-trained parameters, the Chinese text semantic matching dataset is pre-processed and pre-training is continued with the MLM task, which accelerates convergence in the fine-tuning stage. Second, the FMBERT obtained after fine-tuning matches the performance of BERT while using significantly fewer parameters and processing samples faster.

Experiments are conducted on four Chinese text semantic matching datasets: ATEC, BQ, LCQMC, and PAWSX. Compared with BERT, RPER improves test-set accuracy by 2.19% and F1 by 3.83% on average across the four datasets. FMBERT finally matches BERT's prediction performance with 58% of BERT's parameters and predicts 1.8 times faster.
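To make the relative position encoding idea concrete, below is a minimal sketch of single-head self-attention with learned relative position embeddings in the spirit of Shaw et al. The module name, the clipping distance, and the exact way the position term enters the attention scores are illustrative assumptions, not the thesis' formulation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosSelfAttention(nn.Module):
    def __init__(self, d_model, max_rel_dist=16):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # One embedding per clipped relative offset in [-max_rel_dist, max_rel_dist]
        self.rel_emb = nn.Embedding(2 * max_rel_dist + 1, d_model)
        self.max_rel_dist = max_rel_dist
        self.scale = d_model ** -0.5

    def forward(self, x):                       # x: (batch, seq_len, d_model)
        B, L, D = x.shape
        q, k, v = self.q(x), self.k(x), self.v(x)
        # Content-based attention scores
        scores = torch.matmul(q, k.transpose(-1, -2)) * self.scale   # (B, L, L)
        # Relative offsets j - i, clipped to the learnable range
        pos = torch.arange(L, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist, self.max_rel_dist)
        r = self.rel_emb(rel + self.max_rel_dist)                    # (L, L, D)
        # Position-aware term: dot product of each query with its relative embeddings
        scores = scores + torch.einsum('bid,ijd->bij', q, r) * self.scale
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)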
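The Dropout-plus-symmetric-KL regularization can be sketched as follows, assuming an R-Drop-style setup with a HuggingFace-style sequence classifier: the same batch is forwarded twice so that each pass sees a different dropout mask, and the two predicted distributions are pulled toward each other. The function name and the weight alpha are illustrative.

import torch.nn.functional as F

def symmetric_kl_loss(model, input_ids, attention_mask, labels, alpha=1.0):
    # Two stochastic forward passes; dropout gives two different distributions
    logits1 = model(input_ids=input_ids, attention_mask=attention_mask).logits
    logits2 = model(input_ids=input_ids, attention_mask=attention_mask).logits
    # Task loss averaged over both passes
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    p = F.log_softmax(logits1, dim=-1)
    q = F.log_softmax(logits2, dim=-1)
    # Symmetric KL: KL(P||Q) + KL(Q||P), averaged
    kl = 0.5 * (F.kl_div(q, p, log_target=True, reduction='batchmean')
              + F.kl_div(p, q, log_target=True, reduction='batchmean'))
    return ce + alpha * kl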
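For the continued MLM pre-training on the matching corpus, the standard BERT masking recipe (corrupt 15% of tokens: 80% become [MASK], 10% a random token, 10% unchanged) is sketched below. The thesis' improved MLM variant is not specified in this abstract, so only the baseline recipe is reconstructed; all names are illustrative.

import torch

def mask_tokens(input_ids, mask_id, vocab_size, special_mask, p=0.15):
    labels = input_ids.clone()
    probs = torch.full(input_ids.shape, p)
    probs.masked_fill_(special_mask, 0.0)        # never corrupt [CLS]/[SEP]/[PAD]
    masked = torch.bernoulli(probs).bool()
    labels[~masked] = -100                       # loss is computed only on masked positions
    # 80% of masked positions: replace with [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_id
    # 10% of masked positions: replace with a random vocabulary token
    random = torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool() & masked & ~replace
    input_ids[random] = torch.randint(vocab_size, input_ids.shape)[random]
    # remaining 10%: left unchanged
    return input_ids, labels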
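The abstract does not state which compression technique yields FMBERT's 58% parameter count; knowledge distillation into a shallower student is one common choice and is sketched here purely under that assumption, with T the softmax temperature and alpha the loss weight.

import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Hard-label cross-entropy on the matching task
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label KL between temperature-scaled teacher and student distributions
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction='batchmean') * (T * T)
    return alpha * ce + (1 - alpha) * kd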
Keywords: text semantic matching, BERT, regularization strategy, model compression