Open Domain Event Extraction From Microblogs

Posted on: 2016-09-07
Degree: Master
Type: Thesis
Country: China
Candidate: X X Chen
Full Text: PDF
GTID: 2308330503950622
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the communications industry, short-text social networking services such as microblogs and WeChat have become important sources of real-time information. A microblog, for example, is a platform on which users obtain and deliver messages through their relationships with other users. Users can post anything at any time and share it in real time. In recent years, microblogs have taken a leading role in reporting and spreading events, so event extraction from microblogs has become a research focus.
Early studies on event extraction focused mainly on extracting hot topics from news. Because news text uses standard vocabulary and grammar, event extraction from news achieves high precision. Microblogs, by contrast, are informal and colloquial, and users frequently mention mundane events from their daily lives that interest only their immediate social network. As a result, event extraction methods designed for ordinary text are not effective on microblogs.
Building on this research, we built a system for open-domain event extraction and categorization from microblogs. To identify events, we recognize named entities and extract event-referring phrases with a sequence labeling method. An unsupervised clustering method is then used to categorize the events. Finally, the system measures the strength of association between each entity and date and displays the significant events on a calendar. Conditional Random Fields (CRFs) are well suited to sequence labeling: they make full use of context information and model the probability of the entire label sequence rather than labeling each token independently. We therefore use CRFs for event extraction. To handle the diverse categories found in open-domain microblogs, we apply a Latent Dirichlet Allocation (LDA) model.
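The calendar step ranks entity-date pairs by how strongly they are associated. The abstract does not say which association measure the system uses, so the sketch below assumes pointwise mutual information (PMI) as a stand-in. The function name `pmi_scores` and the sample pairs are hypothetical and serve only as illustration.

```python
import math
from collections import Counter


def pmi_scores(pairs):
    """Score each (entity, date) pair by pointwise mutual information.

    `pairs` holds one (entity, date) tuple per extracted mention.
    PMI is an assumed measure; the abstract does not name the one used.
    """
    pair_counts = Counter(pairs)
    entity_counts = Counter(entity for entity, _ in pairs)
    date_counts = Counter(date for _, date in pairs)
    total = len(pairs)

    scores = {}
    for (entity, date), count in pair_counts.items():
        p_joint = count / total
        p_entity = entity_counts[entity] / total
        p_date = date_counts[date] / total
        # Positive PMI: the pair co-occurs more often than chance predicts.
        scores[(entity, date)] = math.log2(p_joint / (p_entity * p_date))
    return scores


# Hypothetical mentions: (named entity, date referenced in the post).
sample_pairs = [
    ("EntityA", "2016-05-01"),
    ("EntityA", "2016-05-01"),
    ("EntityB", "2016-05-02"),
    ("EntityB", "2016-05-01"),
]
scores = pmi_scores(sample_pairs)

# List pairs from most to least strongly associated.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Pairs with a high score would be candidates for display on the calendar. Raw PMI tends to overrate rare pairs, so a real system would likely add a minimum-count threshold or use a different statistic.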
We built a corpus of microblogs, and the experimental results show that the adopted methodology extracts events effectively. In addition, the resulting micro-calendar system is of practical value.
Keywords/Search Tags: Event Extraction, Named Entity Recognition, Conditional Random Fields, Text Categorization, Latent Dirichlet Allocation (LDA)