
Research On Text Representation Optimization Method Based On Pre-Trained Models

Posted on: 2023-04-16  Degree: Master  Type: Thesis
Country: China  Candidate: Y X Liang  Full Text: PDF
GTID: 2568306845456074  Subject: Software engineering
Abstract/Summary:
Representation learning of text underlies many natural language processing tasks, and the quality of text representations directly affects the performance of downstream tasks such as text classification and text generation. Text representation methods have evolved substantially, from simple one-hot and bag-of-words representations to static word embeddings and supervised text representation learning. In recent years, Transformer-based pre-trained language models, represented by BERT, have made remarkable progress on a wide range of downstream natural language tasks and have become the default choice for extracting text representations. However, recent studies have shown that the text representations extracted by these pre-trained models suffer from anisotropy: the representations are distributed extremely non-uniformly across the directions of the embedding space, which lowers their accuracy on word-similarity and text-similarity calculations and degrades their performance on downstream tasks. This thesis investigates and improves the word representations and the sentence representations of pre-trained language models, respectively.

1. To address the anisotropy of pre-trained word representations, we propose a weight-based method for removing the dominant directions of word embeddings. We first measure geometric properties of the BERT model's pre-trained word embeddings, such as their average pairwise similarity. Then, by analyzing the principal component projections and the singular value distribution, we find that the non-uniform distribution of BERT's pre-trained word embeddings across directions impairs the expressive capability of the word vectors. We therefore propose a weighted removal of the dominant directions of the BERT word embeddings, in which each dominant direction is assigned a learnable weight that determines what proportion of that direction is removed; these weights are learned on a word similarity task (sketched in the first code example below). Experiments show that the method makes the word vectors more isotropic and improves performance on three standard evaluation tasks: word similarity, word analogy, and textual semantic similarity.

2. To address the anisotropy of the sentence representations of pre-trained models, we propose a combination of Prompt techniques and contrastive learning. We first improve the performance of BERT's pre-trained sentence representations on the semantic textual similarity task through training-free Prompt engineering, analyze the effect of different Prompt templates on sentence representations, and find that punctuation usage has a large impact on BERT sentence representations. Based on this analysis, we propose an unsupervised contrastive learning model built on Prompt-based data augmentation. The method takes the output vector of the [MASK] token in the Prompt template as the sentence representation and trains with the normalized temperature-scaled cross-entropy loss (NT-Xent) on a Wikipedia corpus for unsupervised contrastive learning (sketched in the second code example below). The model is evaluated on several publicly available textual semantic similarity datasets, and the results verify the effectiveness of the approach for improving BERT sentence representations.
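The weighted removal of dominant directions described in point 1 can be illustrated with a minimal NumPy sketch. This is an assumption-laden illustration, not the thesis's implementation: the dominant directions are taken from an SVD of the mean-centered embedding matrix (in the style of all-but-the-top post-processing), and the per-direction weights are supplied as fixed stand-ins rather than learned on a word similarity task as in the thesis.

import numpy as np

def remove_dominant_directions(embeddings, weights):
    # embeddings: (V, d) matrix of pre-trained word vectors.
    # weights:    length-k sequence in [0, 1]; weights[i] is the proportion of the
    #             i-th dominant direction to remove (learned in the thesis, fixed here).
    weights = np.asarray(weights)
    k = len(weights)
    centered = embeddings - embeddings.mean(axis=0)
    # Top-k principal directions of the centered embedding matrix via SVD.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    directions = vt[:k]                                    # (k, d)
    projections = centered @ directions.T                  # (V, k) coordinates along each direction
    # Subtract each dominant direction in proportion to its weight.
    return centered - (projections * weights) @ directions

# Hypothetical usage with random stand-in embeddings and hand-picked weights.
emb = np.random.randn(30000, 768).astype(np.float32)
emb_iso = remove_dominant_directions(emb, weights=[1.0, 0.8, 0.5])

Learning the weights end-to-end against a word similarity objective is the part specific to the thesis and is not shown here.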
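For point 2, the following PyTorch/transformers sketch shows one way the two components might fit together: a [MASK]-based prompt representation and an in-batch NT-Xent contrastive loss. The prompt templates, checkpoint name, and temperature are illustrative assumptions, and the loss is the simplified one-directional in-batch form; the thesis's exact templates, augmentation scheme, and training details are not reproduced here.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")   # stand-in checkpoint
encoder = AutoModel.from_pretrained("bert-base-uncased")

def prompt_sentence_embedding(sentences, template='This sentence : "{}" means [MASK] .'):
    # Fill each sentence into the prompt template and use the hidden state at the
    # [MASK] position as its sentence representation.
    texts = [template.format(s) for s in sentences]
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state              # (B, L, H)
    mask_positions = batch["input_ids"] == tokenizer.mask_token_id
    return hidden[mask_positions]                             # (B, H), one [MASK] per sentence

def nt_xent_loss(z1, z2, temperature=0.05):
    # In-batch NT-Xent: the two views of the same sentence are positives, and the
    # other sentences in the batch serve as negatives.
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    similarities = z1 @ z2.t() / temperature                  # (B, B) cosine similarities
    targets = torch.arange(z1.size(0))                        # positives lie on the diagonal
    return F.cross_entropy(similarities, targets)

# Hypothetical training step: two different prompt templates act as the data augmentation.
sentences = ["A man is playing a guitar.", "The weather is nice today."]
z1 = prompt_sentence_embedding(sentences)
z2 = prompt_sentence_embedding(sentences, template='"{}" , it means [MASK] .')
loss = nt_xent_loss(z1, z2)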
Keywords/Search Tags: Text representation, pre-trained language models, BERT, contrastive learning, prompt learning