Research On Time Standardization In Chinese

Posted on: 2011-06-09
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Wen
Full Text: PDF
GTID: 2178360305495574
Subject: Computer application technology
Abstract/Summary:
In natural language, and especially in news text, which almost always describes events, readers care mainly about what happened. Time, however, is an essential component of every event. With the Web now flooded with news, readers must spend a long time working out what happened, when it happened, and how the events relate to one another. Inference over temporal expressions and events has therefore attracted considerable research in natural language processing (NLP). Processing time information is also important for core NLP tasks such as entity recognition and natural language understanding, and it is widely applied in question answering, summarization, information extraction, and data mining.

This thesis focuses on time standardization for Chinese news text. The task is to place the time information that appears in a news report on the timeline and to express it in a standard format. Time standardization supports further study of event-time mapping relations and of succession relations between events. The work consists of three steps:

1. Determining which kinds of expressions to standardize. With reference to the ACE Chinese Annotation Guidelines for Timex2 (Summary), we define and classify Chinese time information and then fix the types of temporal expression to be standardized.
2. Recognizing temporal expressions. Based on the characteristics of temporal expressions in the corpus, we summarize a set of extraction patterns: patterns for the publication date and report date, and patterns for the expression types that occur within sentences. These patterns are then used to recognize the time information that needs to be standardized.
3. Time standardization. The system is divided into many small modules, each of which groups closely related functions. Taking one report as the processing unit, the temporal expressions in it are converted to a standard format. For instance, "today" becomes "xxxx/xx/xx" and "3 days" becomes "P3D".

The results show that these methods perform well in both temporal expression recognition and time standardization. Finally, we analyze the erroneous results in detail and propose approaches for resolving some of the problems. We plan to improve the present techniques in future work.
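The recognition and standardization steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the actual pattern set, so the three regular expressions, the `RELATIVE_DAYS` lexicon, and the `standardize` function below are all hypothetical. The sketch only assumes what the abstract states, namely that relative expressions such as "today" are resolved against the report date and written as "xxxx/xx/xx", and that durations such as "3 days" are written as "P3D".

```python
import re
from datetime import date, timedelta

# Hypothetical lexicon: relative day words and their offset from the report date.
RELATIVE_DAYS = {"前天": -2, "昨天": -1, "今天": 0, "明天": 1, "后天": 2}

# Hypothetical patterns; the thesis's real patterns are not listed in the abstract.
ABSOLUTE_DATE = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日")
DURATION_DAYS = re.compile(r"(\d+)天")
RELATIVE_DAY = re.compile("|".join(RELATIVE_DAYS))


def standardize(text, report_date):
    """Return (expression, standard form) pairs in order of appearance."""
    found = []
    # Explicit calendar dates are rewritten as yyyy/mm/dd.
    for m in ABSOLUTE_DATE.finditer(text):
        year, month, day = (int(g) for g in m.groups())
        found.append((m.start(), m.group(), date(year, month, day).strftime("%Y/%m/%d")))
    # Relative day words are anchored to the report date.
    for m in RELATIVE_DAY.finditer(text):
        resolved = report_date + timedelta(days=RELATIVE_DAYS[m.group()])
        found.append((m.start(), m.group(), resolved.strftime("%Y/%m/%d")))
    # Durations in days are written in the "PnD" form.
    for m in DURATION_DAYS.finditer(text):
        found.append((m.start(), m.group(), f"P{int(m.group(1))}D"))
    found.sort()  # order by position in the text
    return [(expr, std) for _, expr, std in found]
```

With a report date of 2010-06-09, calling `standardize("记者今天获悉,会议将持续3天,于2010年6月12日结束。", date(2010, 6, 9))` would yield the pairs ("今天", "2010/06/09"), ("3天", "P3D"), and ("2010年6月12日", "2010/06/12"). Processing one report at a time, as the abstract describes, is what makes the report date available as the anchor for relative expressions.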
Keywords/Search Tags: Time standardization, Pattern matching, Modularization