Research On Deep Text Clustering Method Based On Semantic Information Enhancement

Posted on:2024-09-15

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Zheng

Full Text:PDF

GTID:2568307130973919

Subject:Software engineering

Abstract/Summary:

With the development of the Internet, text clustering has been widely used as an important unsupervised data analysis technique. In recent years, deep learning has demonstrated a powerful feature learning ability for processing high-dimensional data. How to apply deep neural networks to clustering tasks, that is, deep clustering, has become an active research direction. Although existing deep clustering algorithms have achieved good results on text datasets, they still face many challenges arising from two characteristics of text data: semantic sparsity and semantic ambiguity. To address these two problems, this thesis proposes a Deep Document Clustering method via Key Semantic Information Complementation (DCKSC) and the Semantic to Structural deep document Clustering algorithm (Sq2St).

To address the semantic sparsity that classic deep clustering methods face in text clustering, DCKSC first augments the original text data by extracting keywords, and designs a key semantic information completion module that improves the traditional autoencoder by compensating for the key semantic information lost in the mapping process. Second, by combining a clustering loss with the reconstruction loss of the keyword semantic autoencoder, the model becomes better suited to the clustering task. Experiments show that the proposed algorithm outperforms current advanced clustering methods on five real-world datasets. The clustering results demonstrate the importance of key semantic information completion and text data augmentation for deep text clustering.

To address semantic ambiguity in text clustering, we propose a novel, lightweight model called the Semantic to Structural deep document Clustering algorithm (Sq2St). Specifically, we design a semantic-to-structural autoencoder that maps semantic information to structural information for more comprehensive representation learning. With this autoencoder, a structure-enhanced semantic representation that combines semantic and structural information can be learned. We then use a self-training clustering objective to iteratively improve the clustering results. By integrating self-training and the reconstruction of the semantic-to-structural autoencoder into a unified framework, the model jointly optimizes the cluster label assignments and learns embeddings suitable for clustering. Experiments on several datasets validate the effectiveness of the model.
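
The abstract does not give the loss formulas, so the following is only a minimal sketch of how a reconstruction loss and a self-training clustering loss might be combined into one objective. It assumes the widely used DEC-style formulation (Student's t soft assignments, squared-and-normalized target distribution, KL divergence) and a mean squared error reconstruction term; the thesis may use different choices. All function names and the weight `gamma` are hypothetical.

```python
import math

def soft_assign(z, centers, alpha=1.0):
    """Student's t soft assignment q[i][j] of embedding i to cluster j (assumed form)."""
    q = []
    for zi in z:
        row = []
        for mu in centers:
            d2 = sum((a - b) ** 2 for a, b in zip(zi, mu))  # squared distance
            row.append((1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0))
        total = sum(row)
        q.append([v / total for v in row])  # each row sums to 1
    return q

def target_distribution(q):
    """Sharpened self-training targets: p[i][j] proportional to q[i][j]^2 / f[j]."""
    k = len(q[0])
    f = [sum(row[j] for row in q) for j in range(k)]  # soft cluster sizes
    p = []
    for row in q:
        w = [row[j] ** 2 / f[j] for j in range(k)]
        total = sum(w)
        p.append([v / total for v in w])
    return p

def kl_clustering_loss(p, q):
    """KL(P || Q), averaged over samples."""
    total = 0.0
    for p_row, q_row in zip(p, q):
        for pij, qij in zip(p_row, q_row):
            if pij > 0.0:
                total += pij * math.log(pij / qij)
    return total / len(p)

def reconstruction_loss(x, x_hat):
    """Mean squared error between inputs and autoencoder reconstructions."""
    n = sum(len(row) for row in x)
    return sum((a - b) ** 2 for xr, hr in zip(x, x_hat) for a, b in zip(xr, hr)) / n

def joint_loss(x, x_hat, z, centers, gamma=0.1):
    """Reconstruction loss plus gamma times the clustering loss (gamma is hypothetical)."""
    q = soft_assign(z, centers)
    p = target_distribution(q)
    return reconstruction_loss(x, x_hat) + gamma * kl_clustering_loss(p, q)

# Toy usage: four 2-D embeddings around two cluster centers.
z = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
centers = [[0.0, 0.0], [5.0, 5.0]]
q = soft_assign(z, centers)
p = target_distribution(q)
```

In a real training loop the embeddings `z` and reconstructions `x_hat` would come from the autoencoder, the targets `p` would be refreshed periodically, and the loss would be minimized by gradient descent in a deep learning framework; this sketch only illustrates the arithmetic of the combined objective.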

Keywords/Search Tags:

Document clustering, Semantic mapping, Deep clustering, Semantic enhanced, Representation learning

Related items

1	Study And Implementation On Latent Semantic Space Analysis And Web Document Clustering Based On LDA
2	Semantic Hierarchical Clustering Based Multi-document Summarization Research
3	Research On Indoor Simultaneous Localization And Semantic Mapping
4	Research On Document Clustering Technology Based On Latent Semantic Indexing
5	Research On Document Clustering Based On Semantic Similarity Of Hownet
6	Research And Application Of Clustering Algorithm For Multi-view Text
7	Research On Semantic Similarity Computation And Applications
8	Incorporating semantic and syntactic information into document representation for document clustering
9	The Research Of Index Technology Based On Semantic Web Document
10	The Research Of Trust Algorithm Base On Ontology In Semantic Web