Hierarchical Topic Modeling For Social Media Data

Posted on: 2019-04-16
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Ni
Full Text: PDF
GTID: 2428330548476321
Subject: Computer technology
Abstract/Summary:
Nowadays, internet-based social media has become one of the most widely used internet information services. As one of the world's major social media platforms, Twitter generates a large volume of original user-generated content, retweets, and comments every day. These data reveal public concerns and interests, and are of great significance for studying users' group interests and emotional inclinations, with the aim of accurate personalized recommendation. In recent years, many scholars have devoted themselves to models that recognize and mine the topics of social media data. However, these models can only recognize and mine single-layer topics rather than a hierarchical topic tree.

Online Analytical Processing (OLAP), on the other hand, has proven very effective for analyzing multidimensional structured data. It lets analysts examine data along multiple dimensions consistently, quickly, and interactively. OLAP can also be applied to text data such as tweets, in which case it is called text OLAP. One of the key techniques for realizing text OLAP is mining and constructing the dimension hierarchy from unstructured text content. However, in contrast to the plain text that text OLAP usually handles, social media data contains not only a large number of short social texts but also abundant social relationship information. How to mine and use this social relationship information to achieve effective dimension hierarchy mining and construction is one of the challenges in applying traditional text OLAP to social data analysis.

Taking Twitter data as an example, we propose approaches for preprocessing short social text and for modeling hierarchical topics in social data, which together support OLAP on Twitter. In preprocessing, we first propose a word-weighting algorithm based on short-text clustering analysis and weight each word in every tweet. We then propose a word-scoring algorithm based on LDA and a weighted word graph model and score each word in every tweet. Finally, we define the hotness of a tweet and compute each tweet's score by combining its hotness with the scores of its words. In this way, we eliminate low-scoring tweets and retain high-value ones.

For hierarchical topic modeling, we propose an algorithm called thLDA, which automatically mines and constructs the dimension hierarchy of tweets' topics; this hierarchy can then be used in text OLAP analysis. thLDA integrates the abundant social relationship information in Twitter data into the formal modeling process. We conduct extensive experiments on a large volume of real Twitter data to evaluate the effectiveness of thLDA. The experimental results demonstrate that thLDA outperforms other current topic models in mining and constructing the dimension hierarchy of tweets' topics.
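
The tweet-filtering step of the preprocessing stage can be illustrated with a minimal Python sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: the `Tweet` fields, the `hotness` definition (a log-damped count of retweets and replies), the linear mix controlled by `alpha`, and the `threshold` are all hypothetical and are not the thesis's method. The per-word scores are taken as a precomputed dictionary rather than derived from LDA and a weighted word graph.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Tweet:
    text: str
    retweets: int = 0  # hypothetical engagement fields
    replies: int = 0


def hotness(tweet: Tweet) -> float:
    """Assumed hotness: log-damped count of retweets and replies."""
    return math.log1p(tweet.retweets + tweet.replies)


def tweet_score(tweet: Tweet, word_scores: Dict[str, float],
                alpha: float = 0.5) -> float:
    """Combine the mean word score with hotness (assumed linear mix)."""
    words = tweet.text.lower().split()
    if not words:
        return 0.0
    mean_word = sum(word_scores.get(w, 0.0) for w in words) / len(words)
    return alpha * mean_word + (1.0 - alpha) * hotness(tweet)


def filter_tweets(tweets: List[Tweet], word_scores: Dict[str, float],
                  threshold: float) -> List[Tweet]:
    """Keep only tweets whose combined score reaches the threshold."""
    return [t for t in tweets if tweet_score(t, word_scores) >= threshold]
```

With word scores such as `{"olap": 1.0, "topic": 0.8, "model": 0.6}`, a frequently retweeted tweet containing those words scores well above a tweet with no scored words and no engagement, so only the former survives a moderate threshold.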
Keywords/Search Tags: Twitter, Online Analytical Processing, Social Short Text, Dimension Hierarchy of Tweets' Topics, thLDA