Research On The Twitter Event Detection Method Based On Users' Behaviors

Posted on: 2019-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: H L Zhao
Full Text: PDF
GTID: 2348330569487726
Subject: Information and Communication Engineering
Abstract/Summary:
In the Web 2.0 era, social media platforms have flourished and become another world in which people are active. Twitter is one of the most active social networking platforms and a popular target for data mining because of the wealth of data it provides. Thanks to Twitter's real-time data acquisition API, users can easily access tweets as they are posted. Twitter-based event detection extracts valuable events from this real-time data. It can be used for public opinion monitoring, real-time warning, disaster reduction, and real-time intelligent decision making, and it can serve as a source for news media, speeding up news release and reducing costs.

Many methods for event detection on Twitter have been studied, the most important of which are based on text clustering. Earlier clustering-based methods considered only the problem that short texts aggregate poorly, and addressed it by introducing additional information such as timestamps and hashtags. However, because tweets are very noisy, these methods tend to mix more noise tweets into the clusters, which degrades event detection. Focusing on streaming tweet data, this thesis proposes a new event detection method designed for the high noise level of tweets. The main research contents and innovations are as follows:

(1) An event detection method for noisy environments is proposed to address the high noise level of tweet streams. The method first aggregates tweets into event clusters through incremental clustering, and then removes redundant clusters that describe the same event. Batch-Pass incremental clustering is proposed to address the problems of Single-Pass incremental clustering. It introduces a pre-clustering step before Single-Pass incremental clustering. Pre-clustering uses a batch clustering method such as hierarchical clustering, which effectively reduces the sensitivity of Single-Pass incremental clustering to input order and, to some extent, improves the poor aggregation of short texts. To address the duplication of event clusters caused by the poor aggregation of short social texts, a semantic SimHash-based event deduplication method is proposed. This method ensures that duplicate events are removed and can be applied to large-scale real-time data.

(2) An event determination method based on users' behaviors is proposed to address the problem that event detection results do not fully correspond to real-world events. Research and analysis of the statistical characteristics of Twitter users' behaviors show that different user behaviors have different effects on dissemination in the social network. Statistical behavior features and burst features are extracted from the candidate event clusters, and supervised machine learning is used to perform event determination. The method extracts users' statistical behavior features from the tweet text and metadata of the candidate event clusters, and combines them with burst features derived from the Kleinberg state sequence of the tweets to train a classifier that classifies the candidate event clusters.

The experimental results show that Batch-Pass performs better than Single-Pass. The event detection method detects all predefined events in the data set in the presence of noise, while producing fewer redundant events than the Single-Pass-based event detection method. The event determination method based on user behavior is more accurate than the Word2Vec semantic-based method, with accuracy 6.88% higher.
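The abstract names two building blocks, Single-Pass incremental clustering and SimHash-based deduplication, without giving details. The following is a minimal sketch of the generic versions of both techniques, not the thesis's Batch-Pass or semantic SimHash variants. It assumes whitespace tokenization, term-frequency vectors with cosine similarity, and an MD5-derived 64-bit token hash; the function names (`tokenize`, `cosine`, `single_pass`, `simhash`, `hamming`) and the threshold of 0.5 are illustrative choices, not values taken from the thesis.

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for illustration)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def single_pass(tweets, threshold=0.5):
    """Generic Single-Pass clustering: each tweet, in arrival order, joins the
    most similar existing cluster, or opens a new cluster when no similarity
    reaches the (assumed) threshold."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [str]}
    for text in tweets:
        vec = Counter(tokenize(text))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # summed term counts act as centroid
            best["members"].append(text)
        else:
            clusters.append({"centroid": vec, "members": [text]})
    return clusters


def simhash(tokens, bits=64):
    """Plain token-level SimHash fingerprint (not the semantic variant)."""
    weights = [0] * bits
    for token, count in Counter(tokens).items():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")  # 64-bit hash of the token
        for i in range(bits):
            weights[i] += count if (h >> i) & 1 else -count
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

In a sketch like this, two event clusters could be treated as duplicates when the Hamming distance between the SimHash fingerprints of their pooled tokens falls below a chosen cutoff. Because each tweet is assigned as it arrives, the result depends on arrival order, which is the weakness the pre-clustering step of Batch-Pass is meant to reduce.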
Keywords/Search Tags: Twitter, event detection, incremental clustering, user behavior