
Research On Value Extraction From Chinese Text

Posted on: 2011-04-23   Degree: Master   Type: Thesis
Country: China   Candidate: F Wu
GTID: 2178360305995573   Subject: Computer application technology
Abstract/Summary:
A value is additional information attached to an event or entity in Chinese text, similar to an attribute of that entity. There are two types of value. The first describes physical characteristics, such as percentages, amounts of money, phone numbers, and URLs. The second describes characteristics of an event, such as a crime, a criminal sentence, or a job title; these values are extracted whenever the corresponding event occurs. Value extraction is an important and fundamental research topic in Chinese information extraction, and it matters for many areas of natural language processing, including machine translation, question answering, and information retrieval. Domestic research, however, has concentrated on event extraction and named entity extraction, and value extraction has rarely been studied. Existing extraction methods fall into two families. The first is rule-based: rules are formulated from the characteristics of the value itself and of its context, combining internal and external features. This approach achieves high accuracy, but it does not port well to new domains. The second family is statistical; the most common models are Hidden Markov Models, maximum entropy models, and conditional random fields.
Statistical methods are model-based; they port well to new domains at relatively low cost, and they are widely used in natural language processing.
This thesis focuses on the following work. Using the January 1998 issues of the People's Daily as the test corpus, we collected the characteristics of the first type of value, selected suitable features, and built rule sets. For the second type of value, we identified trigger words that explicitly mark the corresponding event and extracted the features of each trigger's context. A decision tree then identifies the sentences that contain target words. These sentences are preprocessed, the word segmentation results are retained, and a text set is built. The text set is parsed with the Stanford Parser to generate syntax trees, the features for value extraction are identified from the syntax trees, and the values are extracted and the experimental results analyzed.
In short, the two types of value are handled with different methods. Entity-characterizing values have clear features, so we use rules; event-argument values do not follow strong rules, so we use a decision tree combined with syntactic analysis. Experimental results show that the approach is feasible and performs well: in a closed test, precision and recall are about 70%. Finally, we analyze erroneous examples, identify the problems, and propose corresponding solutions. In future work, we will expand the corpus and study value extraction further.
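The sentence-identification step for event-argument values can be sketched as a small decision tree over trigger-context features. The abstract does not name the decision-tree algorithm or list the features, so this sketch assumes ID3 (information gain) implemented from scratch. The trigger words (`判处` "sentenced to", `任命` "appointed", `涉嫌` "suspected of"), the `left_context` feature, and the six training sentences are hypothetical toy data, not taken from the thesis corpus. The Stanford Parser step that follows is not shown. A precision/recall helper mirrors the closed-test evaluation.

```python
import math
from collections import Counter

# Hypothetical trigger words: 判处 "sentenced to", 任命 "appointed", 涉嫌 "suspected of".
TRIGGERS = ("判处", "任命", "涉嫌")


def sentence_features(sentence):
    """Hypothetical context features around the first trigger word found."""
    for trigger in TRIGGERS:
        pos = sentence.find(trigger)
        if pos >= 0:
            # left_context: the trigger is preceded by other text (e.g. a subject).
            return {"trigger": trigger, "left_context": pos > 0}
    return {"trigger": "NONE", "left_context": False}


def entropy(labels):
    total = len(labels)
    return -sum(c / total * math.log2(c / total) for c in Counter(labels).values())


def split_remainder(rows, labels, feature):
    """Weighted entropy left after splitting on feature (lower = more information gain)."""
    remainder = 0.0
    for value in {row[feature] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[feature] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return remainder


def build_tree(rows, labels, features):
    """ID3: recursively split on the feature with the highest information gain."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: a class label
    best = min(features, key=lambda f: split_remainder(rows, labels, f))
    rest = [f for f in features if f != best]
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], rest)
    return (best, branches, majority)


def classify(tree, row):
    while isinstance(tree, tuple):
        feature, branches, majority = tree
        if row[feature] not in branches:
            return majority  # unseen feature value: fall back to the node's majority
        tree = branches[row[feature]]
    return tree


def precision_recall(predicted, gold):
    tp = sum(p and g for p, g in zip(predicted, gold))
    precision = tp / sum(predicted) if any(predicted) else 0.0
    recall = tp / sum(gold) if any(gold) else 0.0
    return precision, recall


# Toy, hand-labelled data (illustrative only, not from the thesis corpus).
# True = the sentence contains an event-argument value.
TRAIN = [
    ("法院判处被告有期徒刑三年。", True),
    ("他被任命为公司总经理。", True),
    ("张某涉嫌盗窃被逮捕。", True),
    ("今天北京天气晴朗。", False),
    ("判处结果尚未公布。", False),
    ("会议在上海举行。", False),
]

rows = [sentence_features(s) for s, _ in TRAIN]
labels = [label for _, label in TRAIN]
tree = build_tree(rows, labels, ["trigger", "left_context"])

if __name__ == "__main__":
    predicted = [classify(tree, row) for row in rows]
    print(precision_recall(predicted, labels))  # closed test: evaluated on training data
    print(classify(tree, sentence_features("法院判处张某死刑。")))
```

On this toy set the tree splits only on `left_context`, because that split already separates the classes. Real trigger-context features would give a deeper tree. In practice a library learner such as scikit-learn's `DecisionTreeClassifier` with `criterion="entropy"` could replace the hand-written ID3.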
Keywords/Search Tags:Value, Entity Characterizing Values, Event Argument Values, Decision Tree, Stanford Parser