Event Extraction From Twitter

Posted on: 2016-12-08
Degree: Master
Type: Thesis
Country: China
Candidate: L Y Chen
Full Text: PDF
GTID: 2308330503976712
Subject: Computer application technology
Abstract/Summary:
As a representative medium of the new era, Twitter has become a popular platform for expressing ideas, sharing information and exchanging views online, and it has a great impact on society. Event extraction is a core problem in information extraction. Its main purpose is to extract the information people are interested in from unstructured text that describes events, and to express that information in structured form instead of natural language. Studying event extraction in Twitter therefore has considerable practical significance and application value.

Compared with formal text, tweets are trivial and noisy: their expressions are informal and their information is redundant. These features make event extraction very challenging. In addition, the performance of supervised methods depends closely on the quantity and quality of the annotated corpus, and producing large amounts of labeled data requires a lot of labor. We therefore focus on unsupervised event extraction in Twitter. Our main contributions are summarized as follows.

1. We study the filtering of tweets. Tweets are noisy, and tweets that describe events are rare, so we try to improve the performance of event extraction by filtering tweets first. We propose two filtering methods, run experiments on an annotated dataset and compare the results.

2. We study event extraction in Twitter and propose an event extraction approach based on the Latent Event & Category Model (LECM). LECM is an unsupervised Bayesian latent variable model that extends LDA and applies it to the problem of event extraction in Twitter. This paper describes in detail the system framework, the extraction process, the LECM model and its parameter estimation method.

3. We use two datasets to evaluate the effectiveness of our approach: a small annotated dataset containing 2468 tweets and a large unannotated dataset containing 60,000,000 tweets. Both are used to evaluate extraction and classification performance. On both datasets, our method outperforms the state-of-the-art approach.

4. Observing that most tweets do not express time information clearly, we improve the original model and propose an event extraction approach based on LECM-d. The new framework adds a preprocessing step and a postprocessing step, and modifies the Bayesian model. We evaluate it on the large dataset containing 60,000,000 tweets. Precision increases by 13.55% compared to the baseline and by 9.76% compared to the approach based on LECM.

This paper consists of five chapters. The first chapter introduces the research background and significance, the motivation and the main research content. The second chapter describes the related theories and existing technologies of event extraction in Twitter. The third chapter introduces the proposed approach based on LECM and the related experiments. The fourth chapter introduces the proposed approach based on LECM-d and the related experiments. The fifth chapter summarizes this work and outlines future work.
Keywords/Search Tags: Event Extraction in Twitter, LDA, Graphical model, Bayesian model, Unsupervised Learning