Research On Hotspot Detection And Tracking In Social Medium

Posted on: 2017-02-18 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: P L Liu | Full Text: PDF
GTID: 1368330569998471 | Subject: Computer Science and Technology
Abstract/Summary:
Social media has become increasingly intertwined with real life. More and more people use social media to get news and publish information. This user-generated data contains a large number of messages related to hot events or hot topics, but information overload makes it hard for people to find these hotspots. Automatically detecting and tracking hotspots in such large volumes of social data has therefore become an important problem. On the one hand, the short, noisy and real-time nature of social data poses a great challenge to traditional hotspot detection and tracking methods. On the other hand, the rich geographic, temporal and social-relation information contained in social data can benefit research. Based on these characteristics of social media, this dissertation explores the detection, extraction and tracking of hotspots in social media. It makes the following contributions:

(1) Analysis of spatio-temporal patterns in retweeting. Social behaviors influence the structure of social networks and the propagation of information within them, and retweeting is one of the most important social behaviors. By analyzing Twitter data, we explore the factors that influence retweeting, such as a message's language, location and time, and its publisher's follower count, list count and verification status. Experiments on a real Twitter dataset demonstrate that location and time have an important influence. We therefore study the spatio-temporal patterns of information propagation and find that users usually propagate new messages published by users who are not far away. There are exceptions, however: when a hot event happens, relevant messages are spread by users far away. Based on this finding, we put forward a method for detecting hot events by analyzing the spatio-temporal patterns of information flows.

(2) A method for detecting hot events of unspecified types. Existing hotspot detection methods mainly target specified events and depend on clues such as the event's type, keywords or description. In the unspecified event detection task, however, such clues are missing. To address this problem, we first explore the patterns of social behaviors and information propagation, and monitor information flows from a geographic viewpoint. Through comparison with newspapers and news websites, we find that hot events cause abnormal changes in information flows. On this basis, we put forward a method for detecting unspecified events by monitoring information flows. Experiments on a real dataset demonstrate that this method is effective for event detection. In addition, we find that Retweets are more effective than ordinary Tweets for extracting event content: the volume of Retweet text is smaller, and it contains less noise. Since users tend to spread messages closely related to hot events rather than those related to daily life, retweeting has a filtering effect, and the social network in Twitter can be viewed as a filter.

(3) Extracting hotspot content from a set of short messages. Social messages on a given date often contain multiple hotspots at once. The algorithm should separate these hotspots and extract detailed information for each one. Traditional methods mainly target well-formed long documents such as web pages, news or blogs. They often extract keywords based on TF*IDF and seldom consider the semantic association between words. Since social messages are short and noisy, they are challenging for traditional methods to process. Given these characteristics, we put forward a new method for extracting hotspots based on semantic clustering of word vectors. This method analyzes data at the word level and uses word vectors to measure the semantic association between words. Experiments show that this method is effective in topic division and that the keywords it extracts have stronger semantic associations. On most metrics, it outperforms the traditional method. To evaluate the effect of word vectors themselves, we replace the vector space model in traditional methods with a word-vector-based model. Experimental results demonstrate that the latter model is better, especially when the vector size is large.

(4) A new method for tracking hotspots. Hotspot tracking follows a hotspot's dynamics and trend. The key task is finding the specific hotspot with which a new message should be associated, which is formulated as a classification problem over social messages. On this basis, keywords are extracted from newly generated messages to update existing hotspots. Short and noisy Tweets cause a "sparsity problem" in the feature vectors that represent messages. To address this problem, we put forward a new feature vector model based on word vectors. To improve the performance of message classification, a deep learning algorithm is used. Experiments on real data demonstrate that the DBN algorithm is more effective than traditional machine learning algorithms, especially when the dimension of the feature vector is small. However, its advantage lies mainly in processing short messages, and it is not as good as traditional feature vectors in processing long documents. In addition, the experiments yield some empirical rules for tuning the parameters of word embedding models and the DBN algorithm.

(5) A new incremental learning algorithm for hotspot detection and tracking. Machine learning algorithms play an important role in hotspot detection and tracking. However, most existing machine learning algorithms are batch supervised learning algorithms, which are poorly suited to processing social data. To address this problem, we put forward a new incremental learning algorithm and apply it to hotspot detection and tracking. The algorithm is based on neuroscience evidence and statistical theory. Its network is formed through the self-organization of neurons, and it uses novel neuron and synapse models. Experimental results show that this algorithm is comparable to classic clustering algorithms such as k-means while being faster. More importantly, this incremental unsupervised learning algorithm provides a unified framework for hotspot detection and hotspot tracking.

In conclusion, this thesis explores the detection, extraction and tracking of hotspots and proposes a series of novel methods accordingly. These methods effectively address the problem of hotspot detection and tracking. On this basis, we put forward an adaptive statistical neural network model for detecting and tracking hotspots in a unified way.
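
As a rough illustration of why an incremental unsupervised learner can serve both detection and tracking, as contribution (5) claims, the sketch below uses a simple threshold-based online clustering scheme. This is not the dissertation's neural algorithm: the class name `IncrementalClusterer`, the cosine threshold of 0.8, and the toy two-dimensional message vectors are all assumptions made for the example. A message that matches no existing cluster opens a new one (a stand-in for detection), while a message that matches an existing cluster updates its centroid (a stand-in for tracking).

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class IncrementalClusterer:
    """Hypothetical one-pass clusterer; an illustration, not the thesis model."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed similarity cut-off
        self.centroids = []         # one centroid per hotspot seen so far
        self.counts = []            # number of messages absorbed by each hotspot

    def update(self, vec):
        """Process one message vector and return (cluster_id, is_new)."""
        best_id, best_sim = -1, -1.0
        for i, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_id, best_sim = i, sim
        if best_id >= 0 and best_sim >= self.threshold:
            # "Tracking": fold the message into the running mean of its hotspot.
            n = self.counts[best_id] + 1
            old = self.centroids[best_id]
            self.centroids[best_id] = [c + (v - c) / n for c, v in zip(old, vec)]
            self.counts[best_id] = n
            return best_id, False
        # "Detection": nothing is similar enough, so start a new hotspot.
        self.centroids.append(list(vec))
        self.counts.append(1)
        return len(self.centroids) - 1, True


# Toy stream: two messages about one topic, then two about another.
clusterer = IncrementalClusterer(threshold=0.8)
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
results = [clusterer.update(v) for v in stream]
print(results)
```

Each message is handled once and then discarded, so the model never needs a batch pass over the whole dataset. In practice the input vectors would come from a message representation such as the word-vector features described in contribution (4).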
Keywords/Search Tags: social medium, hotspot detection, hotspot extraction, hotspot tracking, text mining, word embedding, deep learning, incremental learning