Research Of Short Text Representation And Similarity Judgment In Deep Learning

Posted on:2022-03-23

Degree:Master

Type:Thesis

Country:China

Candidate:Z Q Fang

Full Text:PDF

GTID:2518306515485664

Subject:Computer technology

Abstract/Summary:

With the continuous development of information technology, more and more information is presented as electronic text. Faced with explosive growth in data volume, extracting the required information automatically, efficiently, and quickly has become a hot topic in natural language processing research. The popularity of the mobile Internet has made short texts the dominant form of electronic text, so semantic similarity research on short-text corpora has abundant research material and broad application value. The short text semantic similarity task is, given a set of sentence pairs, to judge whether the two sentences in each pair express similar meanings at the semantic level; it can be treated as a binary classification problem (similar or dissimilar). Current research on algorithms for this task focuses mainly on text representation methods. Among them, the deep-learning-based BERT pre-trained model has been widely studied and applied in many tasks thanks to its flexible training methods and powerful representation capabilities. To improve the short text representation ability of the BERT pre-trained model, this thesis proposes two models, BERT_RF_S and Topic_BERT_S. The main improvements are as follows:

1. When the BERT pre-trained model is applied to short text similarity judgment, an insufficient number of samples limits its ability to represent text. To address this problem, this thesis proposes the BERT_RF_S model, which uses the fast gradient method to generate noise samples as additional training input, enhancing the representation and improving the model's representation ability.

2. The BERT pre-trained model encodes only the contextual semantic information of the text and lacks topic information that summarizes the text as a whole. To form a more comprehensive text representation, this thesis proposes a topic model based on a variational autoencoder. The model can be trained without supervision, and the topic representation it generates can be fused with the semantic representation to make up for the lack of topic information at the word level.

3. When the semantic representation is fused with the topic representation generated by the topic model, the result is not ideal because the models converge at different speeds. To solve this problem, this thesis proposes the Topic_BERT_S model based on multi-task learning, which combines the supervised model and the unsupervised model and learns the semantic information and topic information of the text at the same time. As a result, the BERT_RF_S model and the topic model can reach their optima together, generating a text feature representation with more comprehensive information for short text semantic similarity judgment.

Finally, the Topic_BERT_S model is applied to standard Chinese and English datasets in the news, finance, and medical domains. The results show that, owing to its improvements in representation enhancement, the introduction of topic information, and multi-task learning, the Topic_BERT_S model achieves clearly better similarity judgment accuracy than the model before the improvements, and it is at the mainstream level compared with current advanced algorithms.
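
The fast gradient method mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say where the perturbation is applied or which classifier is used, so everything below is an assumption made for illustration: a tiny logistic classifier stands in for the similarity model, the vector `x` stands in for a sentence-pair representation, and the names `loss_and_input_grad`, `fgm_perturb`, and `epsilon` are hypothetical. The idea shown is only the core step: take the gradient of the loss with respect to the input, normalize it, and move the input a fixed distance along it to obtain a noise sample.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_input_grad(x, y, w, b):
    """Binary cross-entropy of a logistic classifier and its gradient w.r.t. the input x.

    The logistic classifier is a stand-in (assumption), not the thesis model.
    """
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    loss = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    grad_x = [(p - y) * wi for wi in w]  # dL/dx for logistic regression
    return loss, grad_x


def fgm_perturb(x, grad_x, epsilon=0.5):
    """Fast gradient method: take a step of L2 length epsilon along the loss gradient."""
    norm = math.sqrt(sum(g * g for g in grad_x))
    if norm == 0.0:
        return list(x)  # no gradient signal, leave the input unchanged
    return [xi + epsilon * g / norm for xi, g in zip(x, grad_x)]


# Hypothetical pair representation, label (1 = similar), and classifier weights.
x = [0.2, -0.1, 0.4]
y = 1
w = [1.0, -2.0, 0.5]
b = 0.0

clean_loss, grad_x = loss_and_input_grad(x, y, w, b)
x_adv = fgm_perturb(x, grad_x, epsilon=0.5)
adv_loss, _ = loss_and_input_grad(x_adv, y, w, b)

# The perturbed sample is harder for the classifier than the clean one.
print(clean_loss < adv_loss)
```

In adversarial training of this kind, the loss on the perturbed sample is typically added to the loss on the clean sample before the parameter update, so the model sees both the original input and its noisy counterpart.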

Keywords/Search Tags:

natural language processing, text similarity judgment, deep learning, representation learning, multi-task learning

Related items

1	Research On Automatic Personality Detection Method Based On Multi-task Learning
2	Modeling And Learning Of Representations For Natural Language Sentence-level Structures
3	Research On Automatic Summarization Algorithm For Meeting Speech Transcribed Text
4	Sentence Semantic Similarity Learning Based On Deep Learning
5	A Neural Network Based On WXLNet And Multi-Task Label Embedding For Sentiment Analysis
6	Research On Conversational Emotion Detection Based On Deep Learning
7	Research On Text Similarity Based On Bert
8	A Research On Deep Multi-label Learning Techniques For Text Semantic Indexing
9	Joint Learning Methods For Distributed Representations Of Natural Language
10	Research On Deep Learning-Based Representation Learning Algorithms