
Research and Application of Similar Text Matching Technology Based on Deep Learning

Posted on: 2022-07-21    Degree: Master    Type: Thesis
Country: China    Candidate: C Ji    Full Text: PDF
GTID: 2518306524489454    Subject: Master of Engineering
Abstract/Summary:
As China's technological strength grows, content in cyberspace is expanding rapidly, and a range of data problems has emerged: search results on blog sites are highly repetitive because some authors plagiarize others' work; intelligent customer-service dialogue systems must match the most relevant answer to a user's question from a massive database; and duplicate files in network cloud disks waste enormous physical storage space. Addressing the problems caused by this explosive growth of data is an important line of research, and text semantic similarity calculation is the technology on which these solutions depend. Text semantic similarity calculation is therefore the research topic of this thesis, which carries out extensive work in an attempt to broaden the boundaries of the field.

The classic text representation model, Siamese LSTM, converts multiple pieces of text into vectors in the same semantic space and measures their similarity with cosine similarity (see the first sketch below). The classic text interaction model, ESIM, improves accuracy by extracting interaction information between texts and adding subtraction and element-wise multiplication features (second sketch below). In 2018, the popular pre-trained model BERT raised similarity calculation to a new level of accuracy by using the Transformer network structure to extract contextual features in both directions and by pre-training on many large corpora.

This thesis proposes a RoBERTa fine-tuning model based on the attention mechanism, which extracts text semantic features and inter-text interaction features along different directions, levels, and scales. The model adopts RoBERTa together with WWM (Whole Word Masking) technology to overcome BERT's poor support for Chinese: it learns richer semantic representations of the text at the phrase level rather than the single-character level, which better matches the characteristics of Chinese (third sketch below). Vectors produced by the RoBERTa network enter an interaction layer to obtain semantic interaction information, then pass through a feature enhancement module and a pooling module to produce deep feature vectors. A fully connected network combines the wide feature vectors produced by the wide network with these deep feature vectors to calculate the semantic similarity between the two texts.

On three collected Chinese text-similarity data sets, this thesis arrives at the optimal model through multiple comparative experiments, for example replacing the loss function, adjusting the hyperparameters related to label smoothing (fourth sketch below), and selecting the best-fitting attention mechanism. The proposed model is mainly compared with Siamese LSTM, ESIM, BERT, and RoBERTa. Its accuracy improves on the classic Siamese LSTM and ESIM models by 5.23% and 3.99%, respectively, and it still shows a stable improvement of about 1.5% and 0.56% over the pre-trained models BERT and RoBERTa. These experimental results show that the RoBERTa fine-tuning model based on the attention mechanism proposed in this thesis achieves better results in semantic similarity calculation.
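To make the Siamese LSTM baseline concrete, here is a minimal PyTorch sketch of the idea described above: one shared LSTM encodes both texts into the same semantic space, and similarity is the cosine of the two encodings. The vocabulary size and dimensions are illustrative placeholders, not the thesis's settings.

```python
import torch
import torch.nn as nn

# Minimal Siamese-style encoder: both sentences share one embedding
# and one LSTM, and similarity is the cosine of the final hidden states.
class SiameseLSTM(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode(self, token_ids):
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]                           # (batch, hidden_dim)

    def forward(self, ids_a, ids_b):
        va, vb = self.encode(ids_a), self.encode(ids_b)
        return nn.functional.cosine_similarity(va, vb, dim=-1)

model = SiameseLSTM()
a = torch.randint(0, 5000, (2, 10))            # two dummy tokenized sentences
b = torch.randint(0, 5000, (2, 10))
print(model(a, b))                             # similarity scores in [-1, 1]
```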
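The "subtraction and element-wise multiplication features" attributed to ESIM refer to its local inference enhancement step. A minimal sketch, assuming `a` is a sentence encoding and `a_hat` its soft-aligned counterpart produced by cross-attention:

```python
import torch

def esim_enhance(a, a_hat):
    # ESIM-style enhancement: concatenate the encoding, its soft-aligned
    # counterpart, their difference ("sub"), and their element-wise
    # product ("mul") to expose interaction information to later layers.
    return torch.cat([a, a_hat, a - a_hat, a * a_hat], dim=-1)

a = torch.randn(2, 10, 128)       # (batch, seq_len, hidden)
a_hat = torch.randn(2, 10, 128)   # soft alignment from cross-attention
print(esim_enhance(a, a_hat).shape)  # torch.Size([2, 10, 512])
```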
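The thesis's full architecture (interaction layer, feature enhancement, wide network) is not spelled out in this abstract, but a minimal sentence-pair fine-tuning skeleton on a Chinese RoBERTa-wwm checkpoint might look as follows. The checkpoint name `hfl/chinese-roberta-wwm-ext` is one publicly available Chinese RoBERTa-wwm model and is an assumption here, not necessarily the checkpoint used in the thesis.

```python
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

# Sentence-pair similarity head on a Chinese RoBERTa-wwm encoder;
# the model name below is an assumed public checkpoint.
NAME = "hfl/chinese-roberta-wwm-ext"
tokenizer = AutoTokenizer.from_pretrained(NAME)
encoder = AutoModel.from_pretrained(NAME)
classifier = nn.Linear(encoder.config.hidden_size, 2)  # similar / not similar

batch = tokenizer(["今天天气不错"], ["今天是个好天气"],
                  padding=True, truncation=True, return_tensors="pt")
cls = encoder(**batch).last_hidden_state[:, 0]  # [CLS] representation
logits = classifier(cls)
print(torch.softmax(logits, dim=-1))            # similarity probabilities
```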
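Finally, the label-smoothing hyperparameter mentioned in the experiments can be swept directly through PyTorch's built-in cross-entropy loss. The smoothing factor 0.1 below is illustrative, not the thesis's chosen value.

```python
import torch
import torch.nn as nn

# Label smoothing softens the one-hot targets; the smoothing factor is
# one of the hyperparameters tuned in the comparative experiments.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

logits = torch.randn(4, 2)            # batch of pair-classification logits
labels = torch.tensor([0, 1, 1, 0])   # 1 = similar, 0 = not similar
print(criterion(logits, labels))
```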
Keywords/Search Tags: Similarity, deep learning, attention mechanism, RoBERTa