Research On Short Text Clustering Of Social Networks Based On Word2vec

Posted on: 2021-03-14
Degree: Master
Type: Thesis
Country: China
Candidate: X J Yang
GTID: 2518306452464444
Subject: Master of Engineering
Abstract/Summary:
The development of the Internet has been accompanied by the emergence of a large number of social applications, and the way people communicate has changed dramatically. Social media platforms have become an important channel for information dissemination and interpersonal communication, and text has gradually become an effective medium for communication on the Internet. Fast-paced modern life produces an overwhelming volume of short text. How to mine hidden information from massive short texts is a very challenging task in the field of natural language processing. Text clustering, as a basic and important method in this field, has long attracted the attention of researchers. In recent years, research on deep learning has grown steadily, providing new ideas for this clustering task. Building on previous theoretical research, this paper constructs a short text clustering model with the help of the Word2Vec deep learning framework and progressively improves the short text clustering effect. First, to address the sparseness of short text, this paper proposes using the topic similarity of texts to expand the content of short texts. During the text pre-processing stage, the emoticons in the text are converted to text, which enhances the semantics of the text and makes its clustering features more distinct. Second, a topic model, At-BTM, is designed that integrates the attention mechanism. This model is used for preliminary screening and classification of short text data; with the addition of the attention mechanism, its ability to extract sentence information and associate related texts is improved. Finally, based on At-BTM, this paper applies the Word Mover's Distance (WMD) to the similarity algorithm, uses this distance as the basis for clustering, and proposes the At K-BTM model. To verify the effectiveness of the improved short text methods and model, comparative experiments are set up in which different methods and models are applied to four types of network short text data sets. These include a comparison between the topic-expanded data set and the unexpanded data set, a comparison of clustering with textualized emoticons and general clustering, and a comparison of ordinary clustering algorithms with the model proposed in this paper. The experimental results show that the proposed clustering model is feasible.
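As a rough illustration of how a WMD-style distance can serve as the basis for comparing short texts, the following minimal Python sketch computes the relaxed Word Mover's Distance, a well-known lower bound on the exact WMD in which each word simply moves all of its weight to the nearest word in the other text. It is not the thesis's At K-BTM implementation: the names `TOY_VECTORS`, `nbow`, `one_sided_cost`, and `relaxed_wmd` are hypothetical, and the two-dimensional vectors are hand-made stand-ins for trained Word2Vec embeddings.

```python
import math
from collections import Counter

# Hypothetical 2-D "embeddings" standing in for trained Word2Vec vectors.
TOY_VECTORS = {
    "happy": (1.0, 0.0),
    "glad": (0.9, 0.1),
    "sad": (-1.0, 0.0),
    "movie": (0.0, 1.0),
    "film": (0.1, 0.9),
}


def nbow(tokens):
    """Return normalized bag-of-words weights for tokens that have a vector."""
    counts = Counter(t for t in tokens if t in TOY_VECTORS)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst):
    """Cost when every source word sends all its weight to its nearest target word."""
    return sum(
        weight * min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed WMD: the larger of the two one-sided nearest-word costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        raise ValueError("both texts need at least one word with a known vector")
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


if __name__ == "__main__":
    close = relaxed_wmd(["happy", "movie"], ["glad", "film"])
    far = relaxed_wmd(["happy", "movie"], ["sad", "movie"])
    print(f"similar texts: {close:.3f}, dissimilar texts: {far:.3f}")
```

Under these toy vectors, the two texts that share no words but have similar meanings come out closer than the pair that shares a word but differs in sentiment. This is the property that makes an embedding-based distance attractive for sparse short texts. A clustering step could then group texts by this pairwise distance. The exact WMD would instead require solving an optimal transport problem.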
Keywords/Search Tags:Word2Vec, topic extraction, short text, clustering