A Study On Chinese Functional Chunk Parsing

Posted on: 2012-01-10  Degree: Master  Type: Thesis
Country: China  Candidate: H X Liu  Full Text: PDF
GTID: 2218330368487806  Subject: Computer application technology
Abstract/Summary:
This paper casts the automatic parsing of Chinese functional chunks as a sequence labeling problem. We build a sequence labeling model for Chinese functional chunks based on Conditional Random Fields (CRFs), a conditional probability model defined on an undirected graph. Arbitrary effective features can be added to a CRF model. The model can express long-distance dependencies and overlapping features, and because it is normalized globally over all features, it avoids the label bias problem and can find the globally optimal solution. Unlike the Hidden Markov Model, the CRF model does not impose strong assumptions on the probability distribution of the input or output, so it is well suited to sequence labeling, and we choose it for labeling Chinese functional chunks.
We focus on building a system that labels Chinese functional chunks by detecting chunk boundaries and assigning functional labels within sentences that have already been correctly word-segmented and POS-tagged. This paper proposes an approach that combines a feature template optimization strategy with the CRF model for automatic labeling of Chinese functional chunks.
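To make the sequence labeling formulation concrete, the sketch below shows one way a chunked sentence could be turned into per-token labels and window features for a CRF toolkit. It is only an illustration under stated assumptions: the BIO encoding, the tag names (SUBJ, PRED, OBJ, ADV), the POS tags, the window size and the helper names `chunks_to_bio` and `token_features` are all hypothetical and are not taken from the thesis, whose actual tag set and feature templates are not given in this abstract.

```python
def chunks_to_bio(tokens, chunks):
    """Convert chunk spans to one label per token (assumed BIO scheme).

    chunks is a list of (start, end, tag) with end exclusive.
    Tokens outside every chunk receive the label "O".
    """
    labels = ["O"] * len(tokens)
    for start, end, tag in chunks:
        labels[start] = "B-" + tag          # chunk boundary
        for i in range(start + 1, end):
            labels[i] = "I-" + tag          # continuation of the same chunk
    return labels


def token_features(tokens, pos_tags, i, window=1):
    """Build word and POS features in a small window around position i.

    This mimics what a feature template expands to; the window size
    is an assumption chosen for the example.
    """
    feats = {}
    for offset in range(-window, window + 1):
        j = i + offset
        if j < 0:
            word, pos = "<BOS>", "<BOS>"    # padding before the sentence
        elif j >= len(tokens):
            word, pos = "<EOS>", "<EOS>"    # padding after the sentence
        else:
            word, pos = tokens[j], pos_tags[j]
        feats[f"w[{offset}]"] = word
        feats[f"p[{offset}]"] = pos
    return feats


# Toy sentence, already word-segmented and POS-tagged (hypothetical tags).
tokens = ["我们", "认真", "研究", "汉语", "句法"]
pos_tags = ["r", "ad", "v", "n", "n"]
chunks = [(0, 1, "SUBJ"), (1, 2, "ADV"), (2, 3, "PRED"), (3, 5, "OBJ")]

labels = chunks_to_bio(tokens, chunks)
features = [token_features(tokens, pos_tags, i) for i in range(len(tokens))]

for tok, lab in zip(tokens, labels):
    print(tok, lab)
```

Each sentence then becomes a pair of equal-length sequences (features, labels), which is the input shape most CRF toolkits expect; chunk boundaries and functional categories are recovered by decoding the predicted label sequence.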
On the test data set, the precision, recall and F-1 measure for Chinese functional chunks reach 85.84%, 85.07% and 85.45% respectively, and the F-1 measures for subject, predicate, object and adverbial functional chunks reach 85.16%, 88.22%, 81.75% and 91.98% respectively. The system ranked first in the closed test of CIPS-ParsEval-2009 task3 (Function Chunk).
Building on the combination of the feature template optimization strategy with the CRF model, we introduce an existing language resource, the Chinese thesaurus "Tongyici Cilin", into the processing module and add its semantic information to the feature templates. This alleviates data sparseness and ambiguity, raising the three measures to 86.21%, 85.31% and 85.76% respectively, which is better than the previous method based on the CRF model alone.
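As a quick consistency check, the reported F-1 values can be recomputed from the reported precision and recall using the standard harmonic-mean definition. The function name `f1_score` is only a label for this sketch, and the definition F1 = 2PR / (P + R) is the usual one, assumed here to be the one used in the evaluation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Overall scores reported for the CRF model with optimized feature templates.
baseline_f1 = f1_score(85.84, 85.07)

# Overall scores reported after adding "Tongyici Cilin" semantic features.
semantic_f1 = f1_score(86.21, 85.31)

print(f"{baseline_f1:.2f}")   # 85.45
print(f"{semantic_f1:.2f}")   # 85.76
```

Both recomputed values agree with the figures quoted in the abstract to two decimal places.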
Keywords/Search Tags:Chinese functional chunk, Conditional Random Fields (CRFs), Semantic information, Ambiguous structure