
Event Extraction From Twitter

Posted on: 2018-12-11
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhang
GTID: 2348330542953045
Subject: Artificial intelligence and its applications
Abstract/Summary:
Event extraction is an important task in information extraction that focuses on extracting the information users are interested in. Twitter is a microblogging service, a social network platform where brief, real-time messages are shared through a follow mechanism. A large volume of text is shared on this platform, and much of it contains event information of interest to users. Compared with official news text, Twitter text is massive, informative, and timely, which helps in extracting abundant and precise event information. However, Twitter text is also casual, short, and redundant. Traditional unsupervised approaches to extracting event information from Twitter require a preset hyperparameter, the number of events in the corpus, and its value affects the quality of the extracted event information. In addition, traditional approaches do not take the relations between entities into account: when extracting event information, they rely on the surface form of words.

The main contributions of this thesis are:

(1) To remove the need to preset this hyperparameter, a framework based on the Dirichlet Process Event Mixture Model (DPEMM) is proposed, consisting of DPEMM and frequency-based post-processing. On three datasets, the performance of this framework is 6.1%, 7.7%, and 6.00% higher than the baseline. An analysis of the experimental results explains why the framework performs better than traditional approaches.

(2) To capture the relations between entities, a new framework called Dirichlet Process Event Mixture Model with Word Embedding (DPEMM-WE) is proposed, consisting of DPEMM-WE and co-occurrence-based post-processing. On two datasets, the performance of this framework is 1.5% and 3.5% higher than that of the DPEMM-based framework. An analysis of cluster structure shows that the clusters produced by DPEMM-WE contain more information and that co-occurrence-based post-processing can filter noisy information out of the clusters.

The thesis has five chapters. Chapter 1 introduces the state of research on event extraction from Twitter and related work. Chapter 2 introduces the related techniques. Chapter 3 presents the DPEMM-based event extraction framework and its experiments. Chapter 4 presents the DPEMM-WE-based event extraction framework and its experiments. Chapter 5 concludes the thesis and outlines future work on event extraction from Twitter.
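
To illustrate why a Dirichlet process prior avoids presetting the number of events, the sketch below samples cluster assignments from a Chinese restaurant process, one standard construction of the Dirichlet process. It is not the thesis's DPEMM: the function name `crp_assignments`, the concentration parameter `alpha`, and the seed are assumptions chosen for illustration, and the sketch models only the prior over cluster assignments, not tweet text or entities.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels from a Chinese restaurant process.

    Illustrative only: item i joins existing cluster k with probability
    proportional to that cluster's size, or opens a new cluster with
    probability proportional to alpha (assumed > 0). The number of
    clusters is never specified in advance.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items currently in cluster k
    labels = []  # labels[i] = cluster index assigned to item i
    for _ in range(n_items):
        # One weight per existing cluster, plus alpha for a new cluster.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1    # join an existing cluster
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = crp_assignments(n_items=200, alpha=1.0)
    print("clusters discovered:", len(counts))
    print("cluster sizes:", counts)
```

In this sketch the number of clusters grows with the data and with `alpha`, so the only tuning knob is the concentration parameter rather than a fixed event count.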
Keywords/Search Tags:Dirichlet Process, Event information extraction, Mixture model, Word embedding