
Social Media Short Text Clustering And Its Applications

Posted on: 2019-08-06   Degree: Master   Type: Thesis
Country: China   Candidate: L Liu   Full Text: PDF
GTID: 2428330545952601   Subject: Computer Science and Technology
Abstract/Summary:
With the advancement of network technology, people's cognitive habits and behaviors have quietly changed, from reading books and newspapers to communicating widely through social media. Social media platforms have become an important channel for disseminating information and maintaining relationships. Since the arrival of the Web 2.0 era, these platforms have developed rapidly, and chat, shopping, and video platforms are now integrated into every aspect of daily life. At the same time, online data has grown explosively, and text is an important carrier of social networking. Faced with massive amounts of text and a fast-paced life, people increasingly prefer to browse short, concise texts in their fragmented time. Social media platforms have accordingly produced large amounts of short text data, such as Weibo posts, tweets, video and news titles, Taobao comments, and question-answer pairs. The need to organize and manage these data has given rise to short text clustering methods, which can be applied to topic discovery, personalized recommendation, video classification, and information retrieval.

In recent years, work on short text clustering has made great progress, and researchers have tried many methods to improve the semantic analysis of short texts. However, short texts differ from ordinary long texts: they carry little information because of their length, about 10 words on average or even fewer. This causes high dimensionality and sparseness when traditional text representations such as the bag-of-words (BOW) model are used. Besides lacking rich context, short texts on social media use words in arbitrary ways. Understanding the semantics of short texts is therefore a great challenge, which makes it difficult to design an effective short text clustering algorithm. To alleviate the sparseness problem and improve the understanding of short text semantics, this thesis makes the following contributions:

(1) To enrich the information in short texts, we propose a video clustering method that fuses multiple text resources related to videos, including video titles, related query terms, and the titles of co-clicked videos. Using real data from the Youku video website, experiments with different text clustering algorithms demonstrate the effectiveness of this multi-source text data fusion method.

(2) To reduce the dimensionality of short texts and make full use of their contextual information, we propose NESTC, a short text clustering method based on network embedding. The method first uses network embedding to learn the semantic relationships between words from a term correlation network. To overcome the "lexical gap" problem, words are represented as low-dimensional, dense, continuous real-valued vectors, which neatly avoids the dependence on a large-scale corpus seen in traditional word embedding methods. The distance between short texts is then learned from these word representations, and a distance-based clustering method performs the cluster analysis. Results on multiple social media short text datasets show that NESTC can effectively improve the accuracy of short text clustering.
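The four stages named in contribution (2), a term correlation network, low-dimensional word vectors, a distance between short texts, and distance-based clustering, can be illustrated with a rough sketch. This is not the NESTC implementation: the abstract does not say which network embedding, distance, or clustering algorithm the thesis uses. The sketch assumes a co-occurrence network, swaps in a truncated SVD of the normalized adjacency matrix as a stand-in for network embedding, uses Euclidean distance between averaged word vectors, and clusters with a small k-medoids routine. All function names and the toy texts are hypothetical.

```python
from itertools import combinations

import numpy as np


def build_term_network(texts):
    """Co-occurrence network: adj[i, j] counts texts containing both terms."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    adj = np.zeros((len(vocab), len(vocab)))
    for toks in tokenized:
        for a, b in combinations(sorted(set(toks)), 2):
            adj[index[a], index[b]] += 1
            adj[index[b], index[a]] += 1
    return vocab, index, adj


def embed_terms(adj, dim=2):
    """Assumed stand-in for network embedding: truncated SVD of the
    row-normalized adjacency matrix (self-loops added so no row is empty)."""
    a = adj + np.eye(len(adj))
    a = a / a.sum(axis=1, keepdims=True)
    u, s, _ = np.linalg.svd(a)
    return u[:, :dim] * s[:dim]


def text_vector(text, index, emb):
    """Represent a short text as the mean of its known word vectors."""
    rows = [emb[index[w]] for w in text.lower().split() if w in index]
    if not rows:
        return np.zeros(emb.shape[1])
    return np.mean(rows, axis=0)


def pairwise_distance(vectors):
    """Euclidean distance between every pair of text vectors."""
    diff = vectors[:, None, :] - vectors[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def k_medoids(dist, k, iters=20):
    """Minimal k-medoids on a precomputed distance matrix."""
    medoids = [0]
    while len(medoids) < k:
        # Farthest-first initialization: take the point farthest from all medoids.
        medoids.append(int(np.argmax(dist[:, medoids].min(axis=1))))
    for _ in range(iters):
        labels = np.argmin(dist[:, medoids], axis=1)
        new_medoids = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:
                new_medoids.append(medoids[c])  # keep the old medoid
                continue
            # New medoid: the member with the smallest total distance to its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return np.argmin(dist[:, medoids], axis=1)


# Hypothetical toy data: two topics with no shared vocabulary.
texts = [
    "cheap phone case",
    "phone screen protector",
    "cheap screen cover",
    "football match tonight",
    "live football score",
    "match score update",
]

vocab, index, adj = build_term_network(texts)
emb = embed_terms(adj, dim=2)
vectors = np.array([text_vector(t, index, emb) for t in texts])
dist = pairwise_distance(vectors)
labels = k_medoids(dist, k=2)
print(labels)
```

In this toy input the two topics form disconnected components of the term network, so related words are linked through shared neighbors even when two texts have no word in common. That is the intuition behind using the network, rather than raw BOW overlap, to compare short texts.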
Keywords/Search Tags: short text clustering, network representation learning, term correlation network, word embedding, short text distance, short text similarity