
Research On Microblog’s Event Extraction

Posted on: 2016-03-23
Degree: Master
Type: Thesis
Country: China
Candidate: Y Lv
GTID: 2308330473464459
Subject: Computer application technology
Abstract/Summary:
With the spread of Internet technology, micro-blogs have become social platforms on which information is published and shared instantly, and their status and influence continue to grow. Micro-blog platforms generate a massive volume of content every day, mixing event information with noise, and this volume leads to “information overload”. It is therefore important to extract valuable events from micro-blog content and present them in a structured form, so that users can grasp events more intuitively.

First, this thesis discusses acquiring data from micro-blog platforms through both the available APIs and web crawlers. It then analyzes the characteristics of micro-blog text in order to assess post quality and eliminate noise. After word segmentation and part-of-speech tagging, named entities are disambiguated using statistics and rules.

Second, this thesis builds a trigger word library from the training corpus, in which each trigger word indicates an event type and subtype. The library is then expanded and disambiguated. Each trigger word serves as an event trigger, and any text containing an event trigger is treated as a candidate event. A support vector machine (SVM) identifies the type of each candidate event, after which the time and other event elements are extracted according to that type.

Experiments show that the proposed methods can effectively extract events from micro-blogs, which facilitates knowledge inference, automatic abstract extraction, and other natural language processing tasks.
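The noise-elimination step can be sketched with a few regular expressions. This is a minimal sketch under stated assumptions: the abstract does not list the thesis's actual quality rules, so the patterns for URLs, @mentions, Weibo-style `#topic#` tags and `//@user:` repost markers, the `MIN_CHARS` length threshold, and the names `clean_post` and `filter_posts` are all hypothetical.

```python
import re

# Hypothetical noise patterns for Weibo-style posts (not the thesis's exact rules).
REPOST_RE = re.compile(r"//@[^:：]+[:：]")  # repost-chain marker such as "//@user:"
URL_RE = re.compile(r"https?://\S+")         # links, e.g. shortened t.cn URLs
MENTION_RE = re.compile(r"@[\w\-]+")         # @mentions; assumes a space or punctuation ends the name
HASHTAG_RE = re.compile(r"#([^#]+)#")        # "#topic#" tags; the topic words are kept
MIN_CHARS = 10                               # hypothetical minimum length of an informative post


def clean_post(text: str) -> str:
    """Strip repost markers, URLs, mentions and hashtag delimiters, then normalize spaces."""
    text = REPOST_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = HASHTAG_RE.sub(r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


def filter_posts(posts: list[str]) -> list[str]:
    """Clean every post and drop those too short to describe an event."""
    cleaned = (clean_post(p) for p in posts)
    return [c for c in cleaned if len(c) >= MIN_CHARS]


if __name__ == "__main__":
    posts = [
        "四川雅安发生7.0级地震 http://t.cn/abc123 @新闻 ",
        "哈哈",  # "haha": too short, treated as noise
        "#雅安地震# 救援队已经抵达灾区 //@某用户:",
    ]
    for post in filter_posts(posts):
        print(post)
```

Repost markers are removed before mentions so that the `@` inside `//@user:` does not leave a dangling colon behind; a length threshold is only one possible quality signal, and a real filter would likely combine several.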
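The trigger word library and candidate-event detection can be illustrated roughly as follows, assuming posts have already been segmented into words. The input format (a list of `(trigger, type, subtype)` triples from an annotated corpus), the synonym table used for expansion, the English type labels, and the names `build_trigger_library`, `expand_library` and `find_candidates` are hypothetical. Neither the library disambiguation nor the SVM classifier is reproduced: `find_candidates` only lists the types each trigger was seen with, most frequent first, as the options such a classifier would choose between.

```python
from collections import Counter, defaultdict

# Hypothetical format: each annotated example is (trigger word, event type, event subtype).
Annotation = tuple[str, str, str]


def build_trigger_library(annotated: list[Annotation]) -> dict[str, Counter]:
    """Count how often each trigger word was labelled with each (type, subtype) pair."""
    library: dict[str, Counter] = defaultdict(Counter)
    for trigger, etype, subtype in annotated:
        library[trigger][(etype, subtype)] += 1
    return dict(library)


def expand_library(library: dict[str, Counter], synonyms: dict[str, list[str]]) -> dict[str, Counter]:
    """Give each synonym of a known trigger the same type counts (hypothetical expansion rule)."""
    expanded: dict[str, Counter] = defaultdict(Counter)
    for trigger, counts in library.items():
        expanded[trigger].update(counts)
        for syn in synonyms.get(trigger, []):
            expanded[syn].update(counts)
    return dict(expanded)


def find_candidates(tokens: list[str], library: dict[str, Counter]) -> list[tuple[int, str, list[tuple[str, str]]]]:
    """Mark every segmented word that is a known trigger as a candidate event.

    The possible (type, subtype) pairs are returned most frequent first; picking
    one of them is left to a downstream classifier.
    """
    candidates = []
    for i, tok in enumerate(tokens):
        if tok in library:
            types = [t for t, _ in library[tok].most_common()]
            candidates.append((i, tok, types))
    return candidates


if __name__ == "__main__":
    annotated = [
        ("地震", "Disaster", "Earthquake"),  # 地震 = earthquake
        ("爆炸", "Disaster", "Accident"),    # 爆炸 = explosion
        ("爆炸", "Disaster", "Accident"),
        ("爆炸", "Conflict", "Attack"),
        ("逮捕", "Justice", "Arrest"),       # 逮捕 = arrest
    ]
    synonyms = {"逮捕": ["抓获"]}  # 抓获 = capture (hypothetical synonym entry)
    library = expand_library(build_trigger_library(annotated), synonyms)
    tokens = ["警方", "抓获", "爆炸", "嫌疑人"]  # police / capture / explosion / suspect
    for cand in find_candidates(tokens, library):
        print(cand)
```

Keeping full type counts rather than a single label per trigger preserves the ambiguity (here 爆炸 may signal an accident or an attack) for the classification stage to resolve.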
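Extracting the time element can be approximated with rules over Chinese date expressions, resolved against the post's publication date. This sketch is type-agnostic and covers time only, not the other type-specific event elements. The patterns (`2013年4月20日`, `4月20日`, and relative words such as `昨天`, "yesterday"), the rule that a month-day date later than the post date belongs to the previous year, and the name `extract_event_date` are assumptions for illustration, not the rules used in the thesis.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical time rules for Chinese posts (not the thesis's actual rule set).
FULL_DATE_RE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")  # e.g. 2013年4月20日
MONTH_DAY_RE = re.compile(r"(\d{1,2})月(\d{1,2})日")           # e.g. 4月20日 (year omitted)
RELATIVE_DAYS = {"前天": -2, "昨天": -1, "今天": 0}             # day before yesterday, yesterday, today


def _safe_date(year: int, month: int, day: int) -> Optional[date]:
    """Build a date, returning None for impossible values such as 2月30日."""
    try:
        return date(year, month, day)
    except ValueError:
        return None


def extract_event_date(text: str, posted: date) -> Optional[date]:
    """Resolve the event date mentioned in a post, trying the most explicit rule first."""
    m = FULL_DATE_RE.search(text)
    if m:
        return _safe_date(*map(int, m.groups()))
    m = MONTH_DAY_RE.search(text)
    if m:
        month, day = map(int, m.groups())
        resolved = _safe_date(posted.year, month, day)
        # Assume posts report past events: a month-day after the post date belongs to last year.
        if resolved is not None and resolved > posted:
            resolved = _safe_date(posted.year - 1, month, day)
        return resolved
    for word, offset in RELATIVE_DAYS.items():
        if word in text:
            return posted + timedelta(days=offset)
    return None  # no time expression found


if __name__ == "__main__":
    posted = date(2013, 4, 21)
    for post in ["4月20日雅安地震", "昨天雅安发生地震", "雅安发生地震"]:
        print(post, extract_event_date(post, posted))
```

Explicit dates are tried before relative words because they do not depend on the publication timestamp, which may lag the event it reports.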
Keywords/Search Tags: Chinese micro-blog, event extraction, named entity recognition, trigger word recognition