Emerging Topic Detection From Social Media

Posted on: 2018-08-25
Degree: Master
Type: Thesis
Country: China
Candidate: H P Huang
GTID: 2428330542476901
Subject: Computer technology
Abstract/Summary:
In recent years, social networks have become one of the most important channels for communicating information. Enterprises, governments and individuals want to detect topics in social network data streams as quickly as possible and then analyze how those topics evolve in real time, so that they can respond more effectively. In this context, the task of emerging topic detection from social media has attracted a great deal of attention. Online social media data is massive, rich and uneven in quality, which brings both challenges and opportunities. Research on detecting emerging topics in social networks has already produced remarkable results. However, existing methods still have shortcomings in how they describe topic trends, in their detection models and in their data processing rate. To address these problems, this thesis studies the topic in the following three parts.

First, when existing models track the trend of a topic, they often calculate the growth rate and the amount of change separately, and they underutilize the rich features of social media. To address this problem, we propose a method for emerging topic detection based on incremental clustering and a momentum model (CMM). At a given point in time, we first use the Z-score to calculate the relative degree of change in the number of documents for each topic, and then use the normal distribution to map that degree to a probability, which is regarded as the mass of the topic. Meanwhile, the growth rates of different aspects of the topic are regarded as its velocity. We then use the momentum formula to characterize the topic trend. Finally, a classification algorithm is used to detect emerging topics. Experimental results show that our method improves the F1 value by nearly 6% or more compared with the benchmark algorithm.

Second, classifying emerging topics with the CMM requires data to be labeled in advance, which is time-consuming, and the quality of the labeling directly affects the experimental results. We therefore combine unsupervised outlier detection with the CMM to discover emerging topics. We try a normal distribution method and the DBSCAN clustering method separately to detect outliers, which are taken as emerging topics. We also investigate how different combinations of emerging features affect F1, and how the radius parameter should be selected for the CMM + DBSCAN model. The experimental results show that the F1 of the CMM combined with anomaly detection is better than that of the baseline approaches, and that the CMM + DBSCAN model, with its best F1 value at the middle time point, comes close to the supervised CMM + EN model proposed earlier.

Third, as the scale of the data increases, the computational performance of the cluster momentum model degrades seriously. We use Spark to implement the computation of the CMM in parallel, which eases the load on a single machine and reduces data processing time. The experimental results show that the Spark-based cluster momentum model achieves an approximately linear speedup, which indicates that the method is scalable and has practical application value.
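
The momentum characterization described in the first part can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the aspect names ("documents", "users"), the handling of a flat history and of a zero previous count are all assumptions made for illustration. The sketch maps the Z-score of the current document count to a probability with the standard normal CDF (the mass), takes the relative growth of each aspect as the velocity, and multiplies the two.

```python
from statistics import NormalDist, mean, pstdev


def topic_mass(history, current):
    """Map the Z-score of the current document count to a probability in [0, 1]."""
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Assumption: with a flat history the Z-score is undefined, so return
        # a neutral 0.5 for no change and saturate otherwise.
        if current == mu:
            return 0.5
        return 1.0 if current > mu else 0.0
    z = (current - mu) / sigma
    return NormalDist().cdf(z)  # standard normal CDF


def growth_rate(previous, current):
    """Relative growth of one aspect between two consecutive time windows."""
    if previous == 0:
        # Assumption: fall back to the absolute increase when there is no baseline.
        return float(current)
    return (current - previous) / previous


def momentum_features(doc_history, doc_current, aspects):
    """Return one momentum value (mass * velocity) per aspect of a topic.

    `aspects` maps a hypothetical aspect name to a (previous, current) pair.
    """
    mass = topic_mass(doc_history, doc_current)
    return {
        name: mass * growth_rate(prev, cur)
        for name, (prev, cur) in aspects.items()
    }


if __name__ == "__main__":
    features = momentum_features(
        doc_history=[10, 12, 11, 9, 10],
        doc_current=30,
        aspects={"documents": (10, 30), "users": (8, 12)},
    )
    print(features)
```

The resulting per-aspect momentum values could then serve as the feature vector passed to a classifier, or to an outlier detector as in the second part of the abstract.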
Keywords/Search Tags: Incremental Clustering, Momentum, Emerging Topic Detection, Anomaly Detection, Spark