
Unsupervised Sentence Embedding With Prompt Learning And Sample Filter

Posted on: 2024-03-09    Degree: Master    Type: Thesis
Country: China    Candidate: B Li    Full Text: PDF
GTID: 2568307094484314    Subject: Computer technology
Abstract/Summary:
With the development of natural language processing, sentence embedding has attracted increasing attention as a way to convert natural language into a mathematical representation. A sentence embedding encodes a sentence as a low-dimensional vector that is convenient for computers to process and analyze. Because learning sentence embeddings usually requires a large amount of labeled data, unsupervised contrastive learning methods for sentence embedding have emerged in recent years. These methods map sentences into a low-dimensional vector space and compare the similarities between different sentences, so that sentence embeddings can be learned without supervision by minimizing a contrastive loss. However, most current contrastive learning methods rely on random sampling, which can lead to an uneven distribution of samples in the feature space and to false-negative samples. Based on an analysis of these problems, this thesis builds an unsupervised sentence embedding model based on prompt learning and sample filtering and applies the proposed model in a practical software system. The main work of this thesis is as follows:

(1) To alleviate the uneven feature distribution in the unsupervised sentence embedding space, a novel method for supplementing negative samples is designed to optimize the feature space: the original samples are augmented with prompt learning to expand the data. The method designs two prompt templates that augment positive and negative samples respectively; the templates generate samples whose features differ from those of the original input, so that the samples cover the feature space as fully as possible. The resulting sentence embedding space has better alignment and uniformity. Extensive comparison and parameter experiments show that the proposed model outperforms common unsupervised sentence embedding methods on seven STS tasks, demonstrating that the proposed prompt-learning-based contrastive model has clear advantages for learning sentence embeddings.

(2) Although the above model supplements samples effectively, the sentence embeddings generated by prompt learning carry a certain randomness and may produce false-negative samples, that is, samples whose embeddings are similar to those of positive samples, which prevents the model from learning to distinguish different categories well. In addition, the original method uses random sampling, treating the other samples in the same batch as negatives, which also produces false negatives. To address these problems, an unsupervised sentence embedding model based on sample filtering and momentum contrast is proposed. The model uses unsupervised auxiliary information to compute the cosine similarity between samples, so that newly generated negative samples differ sufficiently from the positive samples, and only high-confidence negatives are kept, reducing the number of false negatives. At the same time, momentum contrast is used to reuse samples from previous batches, which lowers the model's sensitivity to false negatives and further improves performance.
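The following Python sketch illustrates the two ideas above under simple assumptions: prompt templates supply augmented positive and negative views of each sentence, candidate negatives that are too similar to the anchor are filtered out as likely false negatives, and a memory queue reuses embeddings from earlier batches (the momentum-updated key encoder of full momentum contrast is omitted for brevity). The template strings, threshold, model name, and function names are illustrative assumptions, not the thesis's exact design.

# Minimal sketch: prompt-based augmentation, false-negative filtering, and a
# previous-batch memory queue feeding an InfoNCE-style contrastive loss.
# All names, templates, and hyper-parameters here are assumptions for illustration.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

POS_TEMPLATE = 'This sentence : "{}" means [MASK] .'          # assumed positive-view prompt
NEG_TEMPLATE = 'This sentence : "{}" does not mean [MASK] .'  # assumed negative-view prompt

def encode(sentences, template=None):
    # Encode raw or prompt-wrapped sentences into L2-normalized embeddings (mean pooling).
    texts = [template.format(s) for s in sentences] if template else list(sentences)
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state                  # (B, L, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()
    emb = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return F.normalize(emb, dim=-1)

def filter_false_negatives(anchor, candidates, threshold=0.8):
    # Keep only high-confidence negatives: drop candidates whose cosine similarity
    # to any anchor exceeds the threshold, since they are likely false negatives.
    sim = anchor @ candidates.t()                                # cosine, embeddings are normalized
    keep = sim.max(dim=0).values < threshold
    return candidates[keep]

queue = []                                                       # memory of previous-batch negatives
QUEUE_SIZE = 1024

def training_step(sentences, tau=0.05):
    anchor = encode(sentences)                                   # original view
    positive = encode(sentences, POS_TEMPLATE)                   # prompt-augmented positive
    negative = encode(sentences, NEG_TEMPLATE)                   # prompt-augmented negative

    # Combine fresh negatives with reused embeddings from earlier batches, then filter.
    candidates = torch.cat([negative] + ([torch.cat(queue)] if queue else []), dim=0)
    negatives = filter_false_negatives(anchor, candidates)

    pos_sim = (anchor * positive).sum(-1, keepdim=True) / tau    # (B, 1)
    neg_sim = anchor @ negatives.t() / tau                       # (B, N)
    logits = torch.cat([pos_sim, neg_sim], dim=1)
    labels = torch.zeros(anchor.size(0), dtype=torch.long)       # index 0 is the positive
    loss = F.cross_entropy(logits, labels)

    # Enqueue detached negatives for reuse and cap the queue length.
    queue.append(negative.detach())
    while sum(q.size(0) for q in queue) > QUEUE_SIZE:
        queue.pop(0)
    return loss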
(3) This thesis designs a Python-based semantic similarity computation system that uses the proposed model as its back-end technology. The back end is developed with the python-flask framework, and the front-end pages are built with HTML, CSS, jQuery and other technologies. The system provides user registration and login, text preprocessing, model selection, text feature extraction, and text semantic similarity calculation; users can choose among different models, including BERT, SimCSE, and OURS. All of these functions were verified by running tests on the system.
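As a rough illustration of how such a model could back the described system, the sketch below exposes a single similarity endpoint with python-flask. The route name, request fields, and the reuse of the encode helper from the earlier sketch are assumptions, not the system's actual API.

# Minimal sketch of a back-end similarity endpoint (assumed route and payload).
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/similarity", methods=["POST"])
def similarity():
    # encode(...) is the helper defined in the sketch above.
    data = request.get_json()
    emb = encode([data["text_a"], data["text_b"]])
    score = float((emb[0] * emb[1]).sum())          # cosine similarity of normalized embeddings
    return jsonify({"similarity": score})

if __name__ == "__main__":
    app.run()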
Keywords/Search Tags: Sentence embedding, Contrastive learning, Prompt learning, Sample filter