
Civil Transportation Event Extraction From Chinese Microblogs Based On CRF

Posted on: 2015-12-04
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Xiong
Full Text: PDF
GTID: 2298330452464137
Subject: Computer Science
Abstract/Summary:
In Natural Language Processing (NLP), event extraction plays an important role. Its central problem is how to extract event information of interest from a massive, unordered and noisy information source.

In this project, we use text from Sina Weibo as the source for event extraction. Microblogging is a broadcast medium that allows users to exchange small pieces of information in the form of short blog posts. People produce hundreds of millions of microblogs every day. With its 140-character messages, microblogging has yielded an enormous corpus of information that is noisy but informative. Civil transportation information, such as accidents, traffic jams and road construction, is a topic often mentioned in microblogs. This kind of information is time-sensitive. If it can be collected promptly, then after noise elimination and event element extraction we obtain a real-time source of information about traffic conditions.

However, standard NLP tools for event extraction perform poorly on Chinese microblogs. In this thesis, I describe the system we have constructed to extract event elements from Chinese microblogs, including microblog crawling, noise elimination, topic selection, text segmentation, named entity recognition (NER), event extraction and demonstration. In particular, we crawl posts from Sina Weibo to extract civil transportation-related information. We mainly adopt the Conditional Random Fields (CRF) probabilistic model for this task, and we also use regular expressions to improve the results.

According to the experimental results, our method of eliminating noise from microblog text effectively improves the precision and recall of event extraction. The resulting system works well in demonstrating real-time traffic-related information and achieves a precision of 83%.
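The noise elimination and regular-expression steps mentioned above might look roughly like the following minimal sketch. The abstract does not list the actual rules, so every pattern and function name here (`clean_microblog`, `extract_times`) is a hypothetical illustration rather than the thesis's method; the CRF stage is not shown.

```python
import re

# Hypothetical noise patterns for Weibo-style text (assumptions, not the thesis's rules).
URL = re.compile(r"https?://\S+")           # links, typically short URLs
RETWEET = re.compile(r"//@[\w\-]+[::]?")    # retweet chain markers such as "//@user:"
MENTION = re.compile(r"@[\w\-]+")           # user mentions
EMOTICON = re.compile(r"\[[^\[\]]{1,8}\]")  # bracketed emoticon codes
HASHTAG = re.compile(r"#([^#]+)#")          # "#topic#" markers; the topic text is kept
WHITESPACE = re.compile(r"\s+")

# Hypothetical pattern for one event element: a clock time such as 08:30.
TIME = re.compile(r"\d{1,2}[::]\d{2}")


def clean_microblog(text):
    """Strip common microblog noise and collapse whitespace."""
    text = URL.sub(" ", text)
    text = RETWEET.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = EMOTICON.sub(" ", text)
    text = HASHTAG.sub(r"\1", text)  # drop the '#' delimiters, keep the topic
    return WHITESPACE.sub(" ", text).strip()


def extract_times(text):
    """Return all clock-time strings found in the text."""
    return TIME.findall(text)


if __name__ == "__main__":
    raw = "#路况# 08:30 G6 accident near exit 12 [泪] http://t.cn/abc //@traffic: @bob"
    cleaned = clean_microblog(raw)
    print(cleaned)
    print(extract_times(cleaned))
```

Because `\w` matches Chinese characters in Python 3, the mention pattern assumes that a user name is followed by whitespace or punctuation; real posts would likely need more careful rules.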
Keywords/Search Tags: Event Extraction, Microblog, NLP, CRF, NER