Research On Text Clustering Based On Self-Supervised Contrastive Learning

Posted on:2024-03-03

Degree:Master

Type:Thesis

Country:China

Candidate:M Z Wang

GTID:2568306941960469

Subject:Master of Electronic Information (Professional Degree)

Abstract/Summary:

Text clustering is one of the most fundamental challenges in unsupervised learning. Its purpose is to group semantically similar text segments without relying on human annotations. As information technology develops, the amount of data keeps expanding, and the relationship between data features and the data itself has become increasingly complex. Text clustering has become correspondingly harder, and traditional clustering methods can no longer handle high-dimensional, complex data. The main reason is that text feature representation and the clustering process are separated from each other, so the two cannot reinforce one another; complex relationships between samples are therefore not captured well, which limits clustering performance. With the rapid development of deep learning, deep clustering has shown significant advantages over traditional clustering methods. Although good results have been achieved, most existing deep text clustering methods rely on representations pre-trained on general-domain data, which may not be the most appropriate choice for clustering in a specific target domain. In addition, most existing deep text clustering methods require a clustering scheme designed for a specific task, so they may not be general-purpose and are hard to apply widely.

To address these issues, this thesis proposes a self-supervised learning framework for text clustering, CEIL, which iteratively improves the feature representation by introducing a classification objective, thereby improving overall clustering performance. In each iteration, we first use the language model to obtain the initial text representations, and then apply our proposed clustering algorithm, which combines class separation with contrastive learning, to obtain clustering results. Next, through a strict data filtering and aggregation process, we retrieve samples with clean class labels, which serve as supervision, and update the language model with the classification objective through prompt learning. Finally, the updated language model, with its improved representation ability, is used to enhance clustering in the next iteration.

In addition, this thesis proposes CDCC, a deep text clustering method based on contrastive learning, which serves as a component of the CEIL framework. The basic idea of CDCC is to improve feature representation through contrastive learning and to promote better separation between categories through a specific category loss, so as to achieve better clustering results. Extensive experiments show that the proposed framework significantly improves clustering performance over the course of the iterations, and that it is suitable for both traditional and deep clustering algorithms. Moreover, by incorporating the proposed deep clustering method CDCC into the proposed framework CEIL, our model achieves advanced clustering performance on a wide range of text clustering benchmarks.
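
The iteration described in the abstract (represent, cluster, filter, update, repeat) can be illustrated with a small Python sketch. This is a toy under stated assumptions, not the thesis's implementation: plain k-means stands in for the clustering step, "clean" samples are assumed to be those closest to their cluster centroid, and the language-model update is replaced by a hypothetical `toy_update` that simply pulls each vector toward the nearest pseudo-label prototype. All function names and parameters here are illustrative.

```python
import math


def kmeans(points, k, iters=10):
    """Lloyd's k-means with deterministic farthest-point seeding."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return labels, centroids


def select_confident(points, labels, centroids, keep_ratio=0.5):
    """Assumed filter: keep the samples nearest to their own centroid."""
    kept = []
    for c, centre in enumerate(centroids):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        idx.sort(key=lambda i: math.dist(points[i], centre))
        if idx:
            kept.extend(idx[:max(1, int(len(idx) * keep_ratio))])
    return sorted(kept)


def toy_update(points, labels, kept, step=0.5):
    """Toy stand-in for fine-tuning on pseudo-labels (NOT the thesis method).

    Builds one prototype per pseudo-label from the kept samples, then moves
    every vector a fraction `step` toward its nearest prototype.
    """
    groups = {}
    for i in kept:
        groups.setdefault(labels[i], []).append(points[i])
    protos = [[sum(x) / len(g) for x in zip(*g)] for g in groups.values()]
    updated = []
    for p in points:
        target = min(protos, key=lambda q: math.dist(p, q))
        updated.append([a + step * (b - a) for a, b in zip(p, target)])
    return updated


def iterate(points, k, rounds=3):
    """Run cluster -> filter -> update for a fixed number of rounds."""
    reps = [list(p) for p in points]
    for _ in range(rounds):
        labels, centroids = kmeans(reps, k)
        kept = select_confident(reps, labels, centroids)
        reps = toy_update(reps, labels, kept)
    labels, _ = kmeans(reps, k)  # final clustering on the updated vectors
    return labels, reps
```

Calling `iterate(vectors, k=2)` on a list of 2-D points returns the final cluster labels and the updated vectors. In the framework the abstract describes, the update step would instead retrain the language model on the filtered samples and re-embed the texts; the sketch only shows how each round's clustering output feeds the next round's representation.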

Keywords/Search Tags:

Text clustering, Contrastive learning, Prompt learning, Self-supervised learning, Natural language processing

Related items

1	Research On Text Classification And Short Text Clustering Technology Based On Contrastive Learning
2	Improved Sentence Embedding Based On BERT And Prompt-learning
3	Research On Key Technologies Of Sentence Text Matching And Recognition
4	Modeling And Learning Of Representations For Natural Language Sentence-level Structures
5	Research And Implementation Of Submission Recommendation System Based On Improved Contrastive Learning
6	Research On Chinese Text Summarization Based On Deep Learning
7	Chinese Language Identification Based On Transfer Learning
8	Research On The Application Of Semi-supervised Learning In Natural Language Processing
9	Large-Scale Semi-Supervised Learning for Natural Language Processing
10	The Study Of Few-shot Slot Filling Algorithms Based On Self-supervised Learning