Event Extraction From Twitter

Posted on:2016-12-08

Degree:Master

Type:Thesis

Country:China

Candidate:L Y Chen

Full Text:PDF

GTID:2308330503976712

Subject:Computer application technology

Abstract/Summary:

As a representative medium of the new era, Twitter has become a popular platform for expressing ideas, sharing information, and exchanging views online, and it has a considerable impact on society. Event extraction is a core problem in information extraction: its main purpose is to extract the information people are interested in from unstructured text describing events and to present it in structured form rather than natural language. Studying event extraction on Twitter therefore has considerable practical significance and application value.

Compared with formal text, tweets are trivial and noisy: their expressions are informal and their information is redundant. These features make event extraction challenging. In addition, the performance of supervised methods depends closely on the quantity and quality of the annotated corpus, and producing large amounts of labeled data usually requires a great deal of labor. This thesis therefore focuses on unsupervised event extraction on Twitter. The main contributions are summarized as follows.

1. We study tweet filtering. Tweets are noisy and tweets that describe events are rare, so we first filter tweets in order to improve event extraction performance. We propose two filtering methods, run experiments on an annotated dataset, and compare the results.

2. We study event extraction on Twitter and propose an approach based on the Latent Event & Category Model (LECM). LECM is an unsupervised Bayesian latent variable model that extends LDA and applies it to event extraction on Twitter. The thesis describes in detail the system framework, the extraction process, the LECM model, and the parameter estimation method.

3. Two datasets are used to evaluate the effectiveness of our approach: a small annotated dataset containing 2,468 tweets and a large unannotated dataset containing 60,000,000 tweets, which together are used to evaluate extraction and classification performance. On both datasets, our method outperforms the state-of-the-art approach.

4. Observing that most tweets do not express time information clearly, we improve the original model and propose an event extraction approach based on LECM-d. The new framework adds a preprocessing step and a postprocessing step and modifies the Bayesian model. We evaluate it on the large dataset of 60,000,000 tweets. Precision increases by 13.55% compared with the baseline and by 9.76% compared with the LECM-based approach.

This thesis consists of five chapters. The first chapter introduces the research background and significance, the motivation, and the main research content. The second chapter describes the related theories and existing techniques for event extraction on Twitter. The third chapter presents the proposed LECM-based approach and the related experiments. The fourth chapter presents the proposed LECM-d-based approach and the related experiments. The fifth chapter summarizes the work and outlines future directions.

Keywords/Search Tags:

Event Extraction in Twitter, LDA, Graphical model, Bayesian model, Unsupervised Learning

Related items

1	Jointly Event Extraction And Visualization On Twitter
2	Research On Joint Detection And Extraction Techniques For Social Events On Twitter
3	Event Detection And Extraction For Financial News Texts
4	Event Extraction From Twitter
5	Investigation on Bayesian Ying-Yang learning for model selection in unsupervised learning
6	Research On Document-level Event Extraction Methods In Chinese Financial Domain
7	Research On Bayesian Learning Theory And Its Application
8	Research On Addressing Data Sparseness In English Event Extraction
9	Research On Key Technology Of Open Domain Event Extraction
10	Research On Few-shot Event Extraction Methods