
Research On Text Clustering Algorithm Based On Deep Learning Feature Extraction

Posted on: 2022-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: D L Liu
Full Text: PDF
GTID: 2518306554470874
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of information technology, the benefits of network globalization have become increasingly abundant. Large volumes of English text carry important information and appear in a wide variety of resources. This massive body of text data is worth mining in depth and is of considerable significance for social development. Text mining technology emerged to allow information to be captured more accurately, and within it clustering algorithms show good application prospects for filtering information and integrating text. Clustering network information data appropriately makes it easier to follow current development trends around the world. However, English text is high-dimensional, has sparse features, and contains many synonyms. Existing text feature extraction models and text clustering algorithms usually cannot learn the relevance between words or the continuity of contextual semantics, which lowers the accuracy and efficiency of text clustering. To address text feature sparseness and improve clustering quality, this thesis improves both the text feature extraction model and the clustering algorithm. The main work is as follows:

1. A text clustering algorithm is proposed that builds on the Word2vec model and improves the traditional convolutional neural network. The algorithm uses an English news data set collected from the web as experimental data, uses the Word2vec model to learn the implicit semantic connections between words, and converts the text into original word vectors. The convolutional neural network is then improved by adding a dilated ("hole") convolution layer and adjusting the number of convolution kernels. Features are extracted from the original word vectors with this network, which enlarges the receptive field during feature extraction and yields a more representative, lower-dimensional text feature vector. Finally, an optimized K-means clustering algorithm performs the cluster analysis of the text features. Experiments demonstrate the effectiveness and superiority of the algorithm.

2. An LSTM-CNN text feature extraction algorithm combined with the ECA (Efficient Channel Attention) mechanism is proposed to extract text feature vectors. The algorithm first uses the gating mechanism of the LSTM network to control the network output, retaining the information that needs to be remembered over long spans and discarding the information that does not need to be passed on. The ECA attention mechanism is then added so that key text features receive higher weights and are expressed more strongly. A one-dimensional sparse matrix replaces the fully connected layers of the traditional attention mechanism, which greatly reduces the number of parameters in the network and avoids the adverse effect that dimensionality reduction has on prediction in the traditional attention mechanism. Finally, after a second round of feature extraction through a convolutional neural network, the K-means clustering algorithm combined with the RWMD (Relaxed Word Mover's Distance) similarity distance is used to cluster the extracted features. Experiments show that the clustering effect is effectively improved.
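
The channel-attention step in the second contribution can be illustrated with a minimal pure-Python sketch. It is not the thesis implementation: the function names `eca_attention`, `eca_param_count`, and `se_param_count`, the kernel values, and the reduction ratio of 16 are all assumptions chosen for illustration. The sketch pools each feature channel to a single value, applies one small 1D convolution across neighbouring channels in place of fully connected layers, and rescales the channels by the resulting sigmoid weights. The parameter comparison assumes a bias-free squeeze-and-excitation style block with two fully connected layers as the "traditional" attention baseline.

```python
import math


def eca_attention(features, kernel):
    """Illustrative ECA-style channel attention (hypothetical helper).

    features: list of channels, each a list of floats (one value per position).
    kernel:   odd-length list of 1D convolution weights shared by all channels.
    Returns (weights, scaled_features).
    """
    k = len(kernel)
    if k % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = k // 2

    # 1) Squeeze: global average pooling gives one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in features]

    # 2) Local cross-channel interaction: a 1D convolution over the channel
    #    descriptors with zero padding, so no dimensionality reduction occurs.
    padded = [0.0] * pad + pooled + [0.0] * pad
    scores = [
        sum(kernel[j] * padded[i + j] for j in range(k))
        for i in range(len(pooled))
    ]

    # 3) Sigmoid gate turns each score into a weight in (0, 1).
    weights = [1.0 / (1.0 + math.exp(-s)) for s in scores]

    # 4) Rescale every channel by its attention weight.
    scaled = [[w * x for x in ch] for w, ch in zip(weights, features)]
    return weights, scaled


def eca_param_count(kernel_size):
    """The 1D convolution has kernel_size learnable weights in total."""
    return kernel_size


def se_param_count(channels, reduction=16):
    """Two bias-free fully connected layers: C x C/r and C/r x C (assumed baseline)."""
    return 2 * channels * channels // reduction


if __name__ == "__main__":
    feats = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0]]  # 3 channels, 2 positions
    w, out = eca_attention(feats, kernel=[0.0, 1.0, 0.0])
    print("channel weights:", w)
    print("ECA params:", eca_param_count(3), "vs FC params:", se_param_count(64))
```

Under these assumptions the attention module needs only as many weights as the kernel is long, whereas the fully connected baseline grows with the square of the channel count, which is consistent with the parameter reduction the abstract describes.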
Keywords/Search Tags: Text clustering, Word2vec model, Convolutional neural network, LSTM network, Attention mechanism, K-means, Deep learning