Research On Joint Detection And Extraction Techniques For Social Events On Twitter

Posted on: 2020-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: M Xu
Full Text: PDF
GTID: 2518306548995459
Subject: Management Science and Engineering
Abstract/Summary:
With the development and popularization of Twitter, it has become possible to discover meaningful social events by monitoring tweets in real time, which is of great significance to the security and stability of society. This thesis focuses on techniques for extracting social events from Twitter: it constructs an annotated Twitter dataset for event extraction, proposes a joint event detection and extraction technique based on deep learning, and applies event extraction to practical problems. The main research contents and contributions are as follows:

1) Construction of a Twitter annotation dataset. As no annotated dataset for Twitter event extraction is currently available, this thesis constructs one for running experiments and training models. We use the official Twitter API to collect tweets related to social events based on keywords, use the simhash algorithm to remove duplicate tweets, and use syntactic structure to remove tweets that are obviously unrelated to events; the remaining tweets are then manually labelled according to the annotation criteria. Finally, we randomly select 3000 social-event-related tweets and 3000 irrelevant tweets from the annotated data to form a balanced annotation dataset.

2) A joint event detection and extraction model based on Bidirectional Long Short-Term Memory networks (Bi-LSTM) and Conditional Random Fields (CRF). In this thesis, event detection and element extraction are modeled as a sequence classification task and a sequence labelling task, respectively. The semantic information extracted by the Bi-LSTM serves as the shared features of the two tasks. A joint loss function is defined for joint training, and a gating mechanism is added between event detection and element extraction so that the detection results inform the extraction of elements. A CRF is further used to jointly model the sequence labelling results to improve the overall results. Experiments show that the joint model proposed in this thesis can effectively improve the accuracy of the results.

3) Application research on event extraction technology. This thesis defines a general framework for event extraction application systems, comprising data acquisition, data preprocessing, event extraction, event fusion, and event analysis. Fusion analysis of event extraction results can be used to solve different practical problems. This thesis focuses on two applications of Twitter event extraction: specific entity tracking and planned event forecasting. The former analyzes the events related to a specific entity, extracts event elements including time and location from the related event tweets, and correlates them by time to obtain the entity's activity trajectory, thereby tracking the entity. The latter compares the extracted time with the tweet's posting time to determine whether the tweet describes a planned event, and further applies machine learning methods to estimate the probability that the event will occur, thereby forecasting it.

In summary, this thesis studies social event extraction techniques for Twitter, proposes a joint event detection and extraction method based on Bi-LSTM-CRF, and explores its applications. The work has theoretical significance for the development of event extraction techniques as well as practical significance.
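The simhash-based removal of duplicate tweets mentioned in the dataset construction step can be illustrated with a minimal sketch. The abstract does not give the tokenization, the fingerprint width, or the distance threshold, so the word-level tokens, the 64-bit fingerprint, the BLAKE2b token hash, and the `max_distance=3` threshold below are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import re

BITS = 64  # assumed fingerprint width; the thesis does not state one


def simhash(text: str) -> int:
    """Return a 64-bit SimHash fingerprint built from lowercase word tokens."""
    weights = [0] * BITS
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to 64 bits (8 bytes) with BLAKE2b.
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=BITS // 8).digest()
        h = int.from_bytes(digest, "big")
        # Each bit of the token hash votes +1 or -1 for that bit position.
        for i in range(BITS):
            weights[i] += 1 if (h >> i) & 1 else -1
    # A fingerprint bit is set where the positive votes outnumber the negative ones.
    return sum(1 << i for i in range(BITS) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def deduplicate(tweets, max_distance=3):
    """Keep the first tweet of every group of near-duplicates.

    max_distance is an assumed threshold, not a value from the thesis.
    """
    kept, fingerprints = [], []
    for tweet in tweets:
        fp = simhash(tweet)
        if all(hamming(fp, seen) > max_distance for seen in fingerprints):
            kept.append(tweet)
            fingerprints.append(fp)
    return kept
```

This version compares every new fingerprint against all kept ones, which is quadratic in the number of tweets; a larger collection would normally index fingerprints by bit blocks instead.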
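The abstract states that a joint loss is defined and that a gate passes the detection result to element extraction, but it does not give the formulas. One plausible form, written here purely as an assumption, treats detection as binary classification with predicted probability $p$ and label $y$, scales each Bi-LSTM hidden state $h_t$ by the detection probability, and scores the tag sequence $z$ with a CRF. The weight $\lambda$ and the multiplicative gate are hypothetical choices, not the thesis's definitions.

```latex
% Assumed detection loss (binary cross-entropy)
\mathcal{L}_{\text{det}} = -\bigl[\, y \log p + (1 - y) \log (1 - p) \,\bigr]

% Assumed gate: detection probability scales the shared Bi-LSTM features
\tilde{h}_t = p \cdot h_t

% Extraction loss: CRF negative log-likelihood of the tag sequence z
\mathcal{L}_{\text{ext}} = -\log P_{\text{CRF}}\bigl(z \mid \tilde{h}_1, \dots, \tilde{h}_T\bigr)

% Assumed joint objective with a balancing weight \lambda
\mathcal{L} = \mathcal{L}_{\text{det}} + \lambda \, \mathcal{L}_{\text{ext}}
```

Under this form, a tweet judged unlikely to describe an event contributes weakened features to the labelling layer, which is one way the detection result could inform element extraction.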
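The two applications described in the abstract can be sketched in a few lines: ordering an entity's extracted events by time yields its activity trajectory, and comparing the extracted event time with the posting time flags planned events. The record layout, the function names, and the `min_lead` parameter are hypothetical, and the later step of estimating the probability that the event occurs is not covered here.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Tuple


class ExtractedEvent(NamedTuple):
    """Hypothetical record for one tweet's extraction result."""
    entity: str
    event_time: datetime  # time element extracted from the tweet text
    location: str         # location element extracted from the tweet text
    posted_at: datetime   # time the tweet was posted


def activity_trajectory(events: List[ExtractedEvent], entity: str) -> List[Tuple[datetime, str]]:
    """Return the entity's (time, location) pairs in chronological order."""
    own = [e for e in events if e.entity == entity]
    return [(e.event_time, e.location) for e in sorted(own, key=lambda e: e.event_time)]


def is_planned_event(event: ExtractedEvent, min_lead: timedelta = timedelta(0)) -> bool:
    """True if the extracted time lies more than min_lead after the posting time."""
    return event.event_time - event.posted_at > min_lead
```

Both timestamps are assumed to use the same clock (both naive or both timezone-aware); mixing the two raises a `TypeError` in Python.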
Keywords/Search Tags: Twitter, Event Extraction, Social event, Joint model, Long Short-Term Memory (LSTM), Conditional Random Field (CRF)