Research and Application of PCA-PSO-FCM in Short Text Clustering

Posted on: 2021-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: C Chen
Full Text: PDF
GTID: 2428330602991429
Subject: Computer technology
Abstract/Summary:
With the rapid development of social networks, the volume of information of all kinds is growing explosively, and people increasingly depend on social networks for daily life, socializing, entertainment, and reading. To meet the growing demand for fast access to information, media platforms such as Weibo, Douban, Zhihu, and Toutiao (Today's Headlines) have emerged on the Internet. These platforms push short text messages to users, who can learn about social issues, breaking news, important events, and other everyday information in a short time. As these short texts accumulate, the valuable information they contain has a considerable influence on people's daily life, work, and study. Short text analysis therefore has research significance for fields such as economics, culture, and politics, and has application value in public opinion monitoring, advertising, sentiment analysis, and text classification.

Compared with long text, short text is shorter and contains fewer words, so each word must carry more information and be more general if the overall amount of information is to be preserved. Traditional word-vector feature space models convert short text into a sparse word feature matrix; when processing large amounts of data, they suffer from high space complexity, poor noise resistance, and low robustness. To address the space complexity of short texts, this paper combines word2vec with a text convolutional neural network (Text-CNN) to compress the text information, which reduces the sparseness of the word vectors while preserving the key features of the text data to a great extent.

Short text is also polysemous and may belong to several classes. Traditional clustering algorithms assign each short text to exactly one category, which loses the information carried by boundary texts that belong to multiple classes and fails to reflect the true content of the text; in addition, clustering accuracy is low and the cluster centers are offset. To handle ambiguity and multi-class membership in short text clustering, this paper proposes a PCA-PSO-FCM short text clustering algorithm supported by Text-CNN. A word2vec model is trained on the corpus to obtain word vectors, and the one-dimensional convolutional layer of Text-CNN learns features and maps the word vectors from a high-dimensional space to a low-dimensional one. PCA is then used to compute the principal component contribution rate of each dimension, which limits the movement of text particles in that dimension. Finally, experiments on short texts verify the effectiveness of the PCA-PSO-FCM algorithm and assess its overall performance. The results show that the algorithm clusters short text significantly better than traditional clustering algorithms.
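
The abstract does not give the update rules, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes that the "principal component contribution rate" is the usual explained-variance ratio, that particles move in the PCA-projected space, and that the limit on movement is a per-dimension velocity clamp proportional to that ratio. The membership update is the standard fuzzy c-means (FCM) formula, which gives each text a degree of membership in every cluster instead of a single label. The function names `pca_project`, `fcm_memberships`, and `clamp_velocity`, the parameter `v_max`, and the toy data are hypothetical and not taken from the thesis.

```python
import numpy as np


def pca_project(X):
    """Project X (n_samples, n_dims) onto its principal axes.

    Returns the projected data and the contribution rate
    (explained-variance ratio) of each principal component.
    """
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; singular values come back in descending order.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt.T
    rates = s ** 2 / np.sum(s ** 2)
    return Z, rates


def fcm_memberships(X, centers, m=2.0, eps=1e-12):
    """Standard FCM membership update; every row sums to 1."""
    # d[i, k] is the distance from sample i to center k (eps avoids division by zero).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def clamp_velocity(v, rates, v_max=1.0):
    """Assumed rule: cap the speed in each dimension in proportion to its rate."""
    limit = v_max * rates / rates.max()
    return np.clip(v, -limit, limit)


# Toy data standing in for low-dimensional text features (assumption).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (20, 4)), rng.normal(3.0, 0.5, (20, 4))])

Z, rates = pca_project(X)
centers = Z[[0, 20]]                      # one particle = one set of cluster centers
U = fcm_memberships(Z, centers)           # soft assignment of each text to each cluster
v = clamp_velocity(rng.normal(0.0, 5.0, centers.shape), rates)
```

With this clamp, a particle can move furthest along the component that explains the most variance and barely moves along low-variance components. Whether the thesis applies the contribution rate in exactly this way is not stated in the abstract.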
Keywords/Search Tags: Principal-component-analysis, word2vec, Text-CNN, PCA-PSO-FCM, Short text clustering