
Research On Key Issues On Topic Detection And Topic Diffusion In Social Media

Posted on: 2014-07-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Tian
GTID: 1228330467964335
Subject: Computer Science and Technology
Abstract/Summary:
The widespread popularity of social media has given rise to a massive amount of User Generated Content (UGC). On the one hand, organizing and managing the ever-increasing volume of unstructured and heterogeneous UGC has become a serious problem; on the other hand, mining the valuable hidden knowledge in massive UGC, together with how that knowledge evolves, has important social and economic value. UGC is topic-centric: whether it records personal life or reports public events, it is centered on some topic. Detecting the hidden topics in massive UGC is therefore of considerable practical significance. First, topic-based information organization can improve the efficiency of information acquisition. Second, social media has evolved into an important venue for public opinion, so automatic detection of emerging topics can support the collection of public-opinion information. Third, given precise topic detection results, predicting topic propagation trends can provide a basis for decisions on policies aimed at easing public sentiment. Grounded in social media, this dissertation therefore makes a preliminary exploration of topic-related issues at both the content level and the behavior level. The main research covers three aspects: topic-based organization of UGC, emerging topic detection, and topic propagation trend prediction. The original contributions are as follows:

1. A conversation detection algorithm for interactive short-text streams is proposed. First, the short-text stream is clustered with a hierarchical clustering method. Because communication habits and conversation topics vary, no universal clustering parameter can be defined, so we introduce a clustering quality evaluation strategy based on intra-cluster compactness and inter-cluster separation that automatically determines the optimal combination of candidate conversations without any prior knowledge. Furthermore, we propose an LDA-based short-text similarity algorithm to measure the topic relevancy between different conversations. Finally, the combination of candidate conversations is optimized by integrating temporal relevancy and topic relevancy. Experiments on a real SMS dataset show that the proposed method outperforms the compared algorithms in both recall and precision.

2. An event-based personal photo recommendation algorithm is proposed. Traditional photo recommendation systems rely mainly on image-content similarity or keyword matching, whereas event-based recommendation better meets users' higher demands for information acquisition. Our framework exploits temporal and spatial context together with textual features to order the recommendation results, so event relevancy is decomposed into three dimensions: temporal, spatial and semantic. First, the photo gallery is clustered into collections by time, using hierarchical clustering with clustering quality evaluation to determine the optimal number of clusters. Second, a point-density detection algorithm is introduced to optimize the initial cluster centers, which helps the spatial clustering avoid local optima. Third, a WordNet-based short-text similarity algorithm is proposed to measure semantic relevancy. Then, to avoid information bias, a multi-criteria ranking method, the Preference Ranking Organization Method for Enrichment of Evaluations (PROMETHEE), is used to sort the recommendations, and a new method based on the idea of maximizing deviations is proposed to determine the weights of the temporal, spatial and semantic dimensions. Applying this approach to a real dataset produced satisfactory results.

3. An emerging topic detection model for microblogging is proposed. We first propose a Promulgating Value Evaluation (PVE) model to filter out noisy posts about trivial everyday matters from the huge volume of user-generated Weibo posts, and then extract emerging terms from the high-promulgating-value microblog stream with an improved aging theory model. A graph is built over the emerging terms using the mutual information between them, and a node-similarity-based graph partitioning algorithm is then applied to it to detect emerging topics. Empirical results on a real streaming dataset collected from Sina Weibo, a popular Chinese microblogging site, show the effectiveness of the proposed approach.

4. Because topic propagation is influenced by many factors, its trend is difficult to predict with a precise mathematical model. This work takes the trend in the number of topic-related posts as its research target and, to our knowledge, is the first to introduce chaos theory into social media analysis. We first compute the Lyapunov exponent of the time series of topic-related post counts with the Wolf method to verify that topic propagation exhibits chaotic dynamics. We then reconstruct the phase space of this time series and obtain the chaotic attractor. Finally, based on the largest Lyapunov exponent, we predict the number of topic-related posts. Experimental results verify the feasibility of predicting topic propagation in social media with chaos theory. This preliminary study explores a potential approach to in-depth research on the laws of topic propagation in complex social network environments.
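The automatic choice of a clustering cut in contributions 1 and 2 can be illustrated with a small sketch. The abstract does not name its compactness/separation index, so this example swaps in the silhouette coefficient, one standard index that rewards clusters that are tight and far from their neighbours. It also assumes each message or photo is reduced to a one-dimensional timestamp (say, in minutes) and uses single-linkage agglomerative clustering, which in one dimension amounts to cutting the sorted timeline at its k - 1 largest gaps. The function names are illustrative only, not the author's.

```python
from typing import List, Sequence


def single_linkage_1d(times: Sequence[float], k: int) -> List[List[float]]:
    """Single-linkage clustering of 1-D points into k clusters.

    In one dimension, single linkage merges the closest neighbours first, so
    stopping at k clusters is the same as cutting the sorted points at their
    k - 1 largest gaps (ties go to the earliest gap).
    """
    pts = sorted(times)
    # Gap i lies between pts[i - 1] and pts[i]; sorted() stays stable with reverse=True.
    by_size = sorted(range(1, len(pts)), key=lambda i: pts[i] - pts[i - 1], reverse=True)
    bounds = [0] + sorted(by_size[:k - 1]) + [len(pts)]
    return [pts[a:b] for a, b in zip(bounds, bounds[1:])]


def silhouette(clusters: List[List[float]]) -> float:
    """Mean silhouette (b - a) / max(a, b): a measures compactness, b separation."""
    scores = []
    for ci, cluster in enumerate(clusters):
        for x in cluster:
            if len(cluster) == 1:
                scores.append(0.0)  # conventional value for a singleton cluster
                continue
            # a: mean distance to the other members of x's own cluster
            a = sum(abs(x - y) for y in cluster) / (len(cluster) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(abs(x - y) for y in other) / len(other)
                for oj, other in enumerate(clusters)
                if oj != ci
            )
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def best_partition(times: Sequence[float], k_max: int) -> List[List[float]]:
    """Try k = 2 .. k_max and keep the partition with the highest silhouette.

    k is capped below the number of distinct timestamps, so every cut falls on a
    positive gap and the trivial all-singleton partition is never considered.
    """
    k_limit = min(k_max, len(set(times)) - 1)
    if k_limit < 2:
        raise ValueError("need at least three distinct timestamps")
    candidates = (single_linkage_1d(times, k) for k in range(2, k_limit + 1))
    return max(candidates, key=silhouette)
```

For example, `best_partition([0, 1, 2, 10, 11, 12, 30, 31, 32], k_max=5)` returns the three bursts `[[0, 1, 2], [10, 11, 12], [30, 31, 32]]`, with no cluster count supplied in advance. Splitting further only creates singletons, which the silhouette penalizes.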
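The LDA-based short-text similarity in contribution 1 compares conversations through their topic distributions. The abstract does not give the exact measure; a common choice, assumed here, is one minus the Jensen-Shannon divergence computed in bits, which keeps the result in [0, 1]. The topic vectors are assumed to come from an already trained LDA model (inference is not shown), and the function names are hypothetical.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded by 1."""
    # The mixture is positive wherever p or q is, so the KL terms are finite.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def topic_similarity(theta_a: Sequence[float], theta_b: Sequence[float]) -> float:
    """Topic relevancy of two conversations from their LDA topic distributions."""
    return 1.0 - js_divergence(theta_a, theta_b)
```

Identical topic mixtures score 1.0 and mixtures with disjoint topics score 0.0, so the value can be combined directly with a temporal relevancy score on the same scale.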
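Contribution 2 ranks candidate photos with PROMETHEE and weights the temporal, spatial and semantic dimensions by maximizing deviations. A minimal sketch follows, assuming each candidate already has a relevancy score in [0, 1] on each dimension and that candidates differ on at least one dimension. It uses the usual closed form of the maximizing-deviation model, in which each weight is proportional to the total pairwise spread of scores on that dimension, and PROMETHEE II net flows with the "usual" (step) preference function. The thesis may use a different preference function, and the names and scores here are hypothetical.

```python
from typing import List, Sequence

Scores = Sequence[Sequence[float]]  # scores[i][j]: candidate i on dimension j


def max_deviation_weights(scores: Scores) -> List[float]:
    """Weight each dimension by how strongly it separates the candidates."""
    dims = len(scores[0])
    spread = [
        sum(abs(a[j] - b[j]) for a in scores for b in scores) for j in range(dims)
    ]
    total = sum(spread)
    return [s / total for s in spread]


def promethee_ii(scores: Scores, weights: Sequence[float]) -> List[float]:
    """PROMETHEE II net outranking flow of every candidate (higher ranks first)."""
    n = len(scores)

    def pi(a: int, b: int) -> float:
        # Aggregated preference of a over b with the "usual" criterion:
        # full preference on dimension j whenever a scores strictly higher.
        return sum(w for w, x, y in zip(weights, scores[a], scores[b]) if x > y)

    flows = []
    for a in range(n):
        leaving = sum(pi(a, b) for b in range(n) if b != a) / (n - 1)
        entering = sum(pi(b, a) for b in range(n) if b != a) / (n - 1)
        flows.append(leaving - entering)
    return flows
```

With three hypothetical photos scored `(0.9, 0.2, 0.5)`, `(0.4, 0.8, 0.6)` and `(0.1, 0.1, 0.2)`, the temporal dimension spreads the candidates most and so receives the largest weight, and sorting by net flow ranks the photos as `[1, 0, 2]`. Net flows always sum to zero, which makes a handy sanity check.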
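For contribution 3, the term graph can be sketched as follows. It assumes the emerging terms have already been extracted (the PVE filter and the improved aging theory model are not shown) and weights each co-occurring term pair by pointwise mutual information over posts, a common stand-in for the mutual-information weighting the abstract mentions. The abstract does not detail its node-similarity partitioning, so connected components of the thresholded graph serve here as a deliberately simplified substitute; the thresholds and function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def pmi_graph(posts: Iterable[Iterable[str]], min_count: int = 2, min_pmi: float = 0.0) -> Dict[Edge, float]:
    """Weight each co-occurring term pair by pointwise mutual information over posts."""
    docs = [set(p) for p in posts]
    n = len(docs)
    df = Counter(t for d in docs for t in d)  # number of posts containing each term
    co = Counter(pair for d in docs for pair in combinations(sorted(d), 2))
    edges = {}
    for (a, b), c in co.items():
        if c < min_count:
            continue  # too rare to trust the estimate
        pmi = math.log((c / n) / ((df[a] / n) * (df[b] / n)))
        if pmi > min_pmi:
            edges[(a, b)] = pmi
    return edges


def topic_groups(edges: Dict[Edge, float]) -> List[Set[str]]:
    """Simplified partitioning: connected components of the thresholded term graph."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen: Set[str] = set()
    groups = []
    for start in sorted(adj):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # iterative depth-first search
            term = stack.pop()
            if term not in group:
                group.add(term)
                stack.extend(adj[term] - group)
        seen |= group
        groups.append(group)
    return groups
```

On a toy stream where "quake", "rescue" and "aid" co-occur in some posts and "final", "goal" and "fans" in others, the two term sets come out as separate groups, each a candidate emerging topic.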
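Contribution 4 rests on phase-space reconstruction and the largest Lyapunov exponent. The sketch below embeds a scalar series with time delays and estimates the largest exponent with a simplified Wolf-style procedure: follow a reference trajectory, measure how fast its nearest neighbour diverges over a short evolution step, then pick a fresh neighbour. Unlike the full Wolf method, it re-selects the nearest neighbour at every step without Wolf's orientation constraint, and the embedding dimension, delay and Theiler window are assumed values. The logistic map x -> 4x(1 - x), whose largest exponent is known to be ln 2 (about 0.693), stands in for a series of topic-related post counts.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def delay_embed(series: Sequence[float], m: int, tau: int) -> List[Point]:
    """Phase-space reconstruction: y_t = (x_t, x_{t+tau}, ..., x_{t+(m-1)tau})."""
    count = len(series) - (m - 1) * tau
    return [tuple(series[t + k * tau] for k in range(m)) for t in range(count)]


def largest_lyapunov(points: Sequence[Point], evolve: int = 1,
                     theiler: int = 10, dt: float = 1.0) -> float:
    """Simplified Wolf-style estimate of the largest Lyapunov exponent (per unit time)."""
    last = len(points) - evolve  # indices that can still be evolved by `evolve` steps

    def nearest(i: int):
        # Closest distinct point that is not a temporal neighbour of i.
        best, best_d = None, math.inf
        for j in range(last):
            if abs(i - j) > theiler:
                d = math.dist(points[i], points[j])
                if 0 < d < best_d:
                    best, best_d = j, d
        return best, best_d

    log_growth, elapsed = 0.0, 0.0
    for i in range(0, last, evolve):
        j, d0 = nearest(i)
        if j is None:
            continue
        d1 = math.dist(points[i + evolve], points[j + evolve])
        if d1 > 0:
            log_growth += math.log(d1 / d0)  # natural log: exponent in nats per unit time
            elapsed += evolve * dt
    return log_growth / elapsed
```

Embedding 1,000 iterates of the logistic map with `m=2` and `tau=1` should give a positive estimate close to ln 2; a positive largest exponent is the signature of the sensitive dependence on initial conditions that the chaos-based analysis relies on.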
Keywords/Search Tags: Social Media, User Generated Content, Topic Detection, Topic Propagation, Trend Prediction