
Research On Specific Event Detection In Twitter

Posted on: 2018-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: X J Lu
Full Text: PDF
GTID: 2348330512488081
Subject: Engineering
Abstract/Summary:
In recent years, detecting and tracking sensitive topics and events has become a goal for many government agencies and corporations around the world. Knowing that an event has occurred supports decision-making and allows appropriate measures to be taken to avoid major losses, and may even bring benefits. The rapid development of the Internet has given rise to many social networking platforms. Twitter is one of the world's largest, with hundreds of millions of users who produce massive amounts of tweet data every day, and many events are implicit in this data. Analyzing Twitter tweet data is therefore highly significant for event detection and tracking.

Event detection is divided into unspecified event detection and specified event detection. This thesis studies specified event detection on Twitter, where specified (specific) events are a class of events for which prior information is available. Most traditional specified event detection methods use a threshold to decide whether an event has occurred, and they cannot achieve high detection accuracy and high recall at the same time. In addition, most text representations use only a single feature, which limits text classification performance. To address these problems, the thesis proposes a specified event detection method that builds on previous research. The main work is summarized as follows:

(1) The thesis proposes a text filtering method based on a combination of text vectors. To filter tweets by the topic of a specified event, the method represents the short text of a tweet by combining two vectors: a bag-of-words text vector whose features are selected by information gain (the IG text vector), and a text vector obtained by equal-weight superposition of word2vec word vectors (the word2vec text vector). This representation achieved good classification results in the experiments. Because bag-of-words vectors are generally high-dimensional and may cause the curse of dimensionality, the thesis uses PCA to reduce the dimension of the IG text vector.

(2) The thesis proposes a specified event detection method based on the wavelet transform. The method uses the wavelet transform to extract features from the waveform signal of the specified event's time series, and treats event detection as a classification problem. First, the tweets related to the specified event, obtained by classification and filtering, are used to build the time-series waveform signal. Next, the signal is divided into a series of short waveform segments according to a set waveform window, and the wavelet transform is used to extract features from each segment. Finally, a trained waveform classifier classifies these segments in order to detect events.

Through the work above, the thesis implements specified event detection, and data crawled from Twitter is used to test and verify the effectiveness and validity of the method.
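The two feature-building steps summarized above can be illustrated with a minimal sketch. It is not the thesis's implementation, and it rests on several assumptions the abstract does not confirm: the two text vectors are combined by concatenation, "equal-weight superposition" is taken as a plain average of word vectors, the wavelet is the Haar wavelet, and the per-window features are coefficient energies. All function names are hypothetical. PCA and the trained classifiers are omitted.

```python
import math


def average_word_vectors(tokens, word_vectors, dim):
    """Equal-weight mean of the word vectors of in-vocabulary tokens.

    Assumption: this stands in for the abstract's "equal-weight superposition".
    """
    known = [word_vectors[t] for t in tokens if t in word_vectors]
    if not known:
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def combine_text_vectors(ig_vector, w2v_vector):
    """Combine the IG vector and the word2vec vector (assumed: concatenation)."""
    return list(ig_vector) + list(w2v_vector)


def sliding_windows(series, size, step):
    """Cut a tweet-count time series into fixed-size windows."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, step)]


def haar_dwt(signal):
    """One level of the orthonormal Haar transform (even-length input)."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def wavelet_energy_features(window):
    """Detail-coefficient energy per level, then the final approximation energy.

    The window length must be a power of two.
    """
    features = []
    approx = list(window)
    while len(approx) > 1:
        approx, detail = haar_dwt(approx)
        features.append(sum(d * d for d in detail))
    features.append(approx[0] ** 2)
    return features


# Hypothetical usage: hourly counts of filtered tweets, windows of 8 hours.
counts = [0, 1, 0, 2, 9, 14, 6, 1, 0, 1, 0, 0]
window_features = [wavelet_energy_features(w) for w in sliding_windows(counts, 8, 4)]
```

Each entry of `window_features` would then be passed to a waveform classifier trained on labeled windows. Because the Haar transform used here is orthonormal, the features of a window sum to the window's total energy, which gives a quick sanity check.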
Keywords/Search Tags:Twitter, classification, information gain, word2vec, PCA, wavelet transform