
On The Information Extraction Of The Sudden Events

Posted on: 2006-10-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: E H Yang
GTID: 1118360152488973
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
With the rapid development of the Internet, we are surrounded by an immense sea of information. Obtaining accurate and valid information from this vast sea is the goal of Information Extraction (IE). In other words, Information Extraction extracts the information of interest from a large volume of text and represents it in a structured format. Its basic objectives are to increase the speed and improve the quality of information processing, and ultimately to relieve people of the burden of intensive and inefficient text reading.

Information Extraction, Information Retrieval and Text Summarization belong to the same research area, text information processing, within Natural Language Processing (NLP). Since the late 1980s, Information Extraction has been an active research topic in NLP. It has been driven to a remarkable degree by large-scale text processing programs in the U.S. and Europe, in which Information Extraction technology and its evaluation are important components. Research on Chinese Information Extraction started later and is still in an exploratory phase.

The world has been experiencing an increasing number of "sudden events". One test of an efficient government is how its organizations respond to such events. The exponential growth of textual information held in digital archives has fuelled growing government interest in computer-assisted Information Extraction techniques. Handling sudden events is a multifaceted effort, and one of its important tasks is collecting, categorizing, processing and disseminating event information.
The key requirements for improving the ability to respond to sudden events are: collecting information in a timely, objective and accurate way; extracting information efficiently and quickly; and providing complete and accurate reference data.

This thesis focuses on extracting information about sudden events, i.e. Event Information Extraction, based on analyses of various press reports. The study consists of the following tasks: analyzing the texts concerning an event to obtain its relevant characteristics; applying Named Entity Recognition methods; examining methods for the automatic acquisition of extraction patterns; and exploring feasible Event Information Extraction models that acquire the essential information structure and the specific information.

Information Extraction is an organic combination of resources and techniques, customized to practical uses. This research is built primarily on Part-of-Speech (POS) tagging. Since less extraction work has been done for Chinese than for English, there are undoubtedly wide gaps between the two languages in extraction accuracy, the accumulation of knowledge resources, and so on. Therefore, at each processing step, we carefully analyze the strengths and weaknesses of the existing resources and the accuracy of the extraction process, in order to lay a foundation for further research and to find ways to close the gaps in Event Information Extraction.

This research will attempt:

1. To propose a practical Information Extraction model for sudden events. Starting with a careful analysis of the raw data, then using the related information that different media provide about the same event, and finally observing how the event develops over time, we will explore a feasible Event Information Extraction model.
This model is grounded in an analysis of text characteristics: it applies clustering techniques to extract the event information structure automatically, and then calculates property values to obtain the specific information. The method is comparatively robust and can be applied to any collection of texts about a sudden event.

2. To implement an unsupervised pattern acquisition method with strong adaptability, and further identify predetermined relevant information in text...
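The first research goal describes the model only as "cluster, then calculate property values". A minimal sketch of that two-stage idea might look as follows, under assumptions that are not taken from the thesis: sentences from several reports on one event are already word-segmented (English tokens stand in for segmented Chinese, and numbers are assumed to be Arabic digits); sentences are grouped by single-pass clustering over bag-of-words cosine similarity, a common topic-detection technique swapped in here for illustration; each cluster's most frequent word serves as a hypothetical slot name of the information frame; and the slot value is chosen by majority vote over the numbers the reports give. The function names `single_pass_cluster` and `frame_from_clusters` and the 0.3 threshold are hypothetical.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass_cluster(sentences, threshold=0.3):
    """Single-pass clustering (illustrative choice, not the author's method):
    each tokenized sentence joins the most similar existing cluster if the
    cosine similarity to that cluster's summed term vector reaches
    `threshold`; otherwise it starts a new cluster."""
    clusters = []  # each cluster: {"centroid": Counter, "members": [token lists]}
    for tokens in sentences:
        vec = Counter(tokens)
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(tokens)
            best["centroid"].update(vec)  # summed vector; cosine ignores scale
        else:
            clusters.append({"centroid": Counter(vec), "members": [tokens]})
    return clusters


def frame_from_clusters(clusters, stopwords=frozenset()):
    """Turn clusters into a flat information frame {slot: value}.
    Hypothetical rules: the slot name is the cluster's most frequent
    non-numeric, non-stopword token; the value is the number reported most
    often by the cluster's sentences (majority vote across reports)."""
    frame = {}
    for cluster in clusters:
        words = [t for t, _ in cluster["centroid"].most_common()
                 if not t.isdigit() and t not in stopwords]
        numbers = Counter(t for tokens in cluster["members"]
                          for t in tokens if t.isdigit())
        if words and numbers:
            value, _ = numbers.most_common(1)[0]
            frame[words[0]] = int(value)
    return frame


if __name__ == "__main__":
    # Hypothetical sentences from several reports on the same event.
    reports = [
        ["death", "toll", "rose", "to", "12"],
        ["death", "toll", "reached", "12"],
        ["death", "toll", "is", "15"],
        ["injured", "people", "number", "40"],
        ["injured", "people", "total", "40"],
    ]
    print(frame_from_clusters(single_pass_cluster(reports)))
    # → {'death': 12, 'injured': 40}
```

Majority voting is only one possible rule. Because the abstract stresses how an event develops, a rule that prefers the most recent report (for a rising death toll, say) may be closer to what the model intends.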
Keywords/Search Tags: Information Extraction, sudden event, Named Entity Recognition, pattern acquisition, information frame, specific information, characteristics analysis