Research On Confidence Measure In Dependency Parsing

Posted on: 2013-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: J Guo
GTID: 2268330392467957 | Subject: Computer Science and Technology
Abstract/Summary:
Syntactic parsing is a core problem in natural language processing. It supports many applications, such as information extraction, information retrieval, and machine translation. Dependency parsing, with its simple grammatical formalism, ease of annotation, and convenience for downstream applications, has recently attracted wide interest. Although dependency parsing has made some progress in recent years, its accuracy still falls short of what practical applications need. Rather than focusing on further improving parsing accuracy, this thesis proposes a new natural language processing task: confidence measure in dependency parsing. By computing a confidence value for each arc in a dependency tree, we can pass only the highly confident arcs to a particular application, which can improve that application's performance.

Practically all data-driven models proposed for dependency parsing in recent years can be described as either graph-based or transition-based. In this thesis, we propose several confidence measure methods for both kinds of model. In transition-based parsing, we learn a model that scores transitions from one parser state to the next, conditioned on the parse history, and parse by greedily taking the highest-scoring transition out of every parser state until a complete dependency tree has been derived. Transition-based models can be trained with two kinds of learning algorithm: local learning and global learning. For locally learned models, we propose two confidence measure methods: a likelihood-based method and a resampling-based method. For globally learned models, we propose a Weighted K-Best Voting method, which makes use of the parser's K-best outputs. In graph-based parsing, we instead learn a model that scores possible dependency graphs for a given sentence, typically by factoring each graph into its component arcs, and parse by searching for the highest-scoring graph.
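The Weighted K-Best Voting method for globally learned transition-based models is only named in the abstract. The Python sketch below shows one plausible reading of it, under assumptions that are ours rather than the thesis's: each of the K-best trees receives a weight from a softmax over the parser's scores, and an arc's confidence is the total normalized weight of the trees that contain it. The tree encoding (a tuple of head indices, with 0 for ROOT), the `temperature` parameter, the function names `kbest_arc_confidence` and `confident_arcs`, and the example scores are all hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical encoding: a tree is a tuple of head indices, where tree[i]
# is the head of token i + 1 and 0 stands for the artificial ROOT.
Tree = tuple[int, ...]
Arc = tuple[int, int]  # (head, dependent)


def kbest_arc_confidence(kbest: list[tuple[Tree, float]],
                         temperature: float = 1.0) -> dict[Arc, float]:
    """Weighted K-best voting (sketch).

    Assumption: each tree is weighted by a softmax over the parser scores;
    an arc's confidence is the normalized weight of the trees containing it.
    """
    top = max(score for _, score in kbest)  # subtract the max for numerical stability
    weights = [math.exp((score - top) / temperature) for _, score in kbest]
    total = sum(weights)
    confidence: dict[Arc, float] = defaultdict(float)
    for (tree, _), weight in zip(kbest, weights):
        for dep, head in enumerate(tree, start=1):
            confidence[(head, dep)] += weight / total
    return dict(confidence)


def confident_arcs(tree: Tree, confidence: dict[Arc, float],
                   threshold: float) -> list[Arc]:
    """Keep only the arcs of `tree` whose confidence reaches `threshold`."""
    return [(head, dep) for dep, head in enumerate(tree, start=1)
            if confidence.get((head, dep), 0.0) >= threshold]


# Made-up K-best list for a three-token sentence: (tree, parser score).
kbest = [((2, 0, 2), 10.0), ((2, 0, 1), 9.0), ((0, 1, 2), 7.0)]
conf = kbest_arc_confidence(kbest)
best_tree = max(kbest, key=lambda item: item[1])[0]
print(confident_arcs(best_tree, conf, threshold=0.9))  # [(2, 1), (0, 2)]
```

With these invented scores, the arcs (2, 1) and (0, 2) appear in the two highest-scoring trees and receive a confidence of about 0.96, while (2, 3) receives about 0.74 and is dropped at a threshold of 0.9. Raising `temperature` flattens the weights toward uniform voting over the K-best list; lowering it concentrates the vote on the top tree.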
Graph-based dependency parsers are usually trained with an online learning algorithm that yields a discriminative linear model, so the probability of an individual arc is not directly available. To address this, we propose a method that approximately estimates the marginal probability of a dependency arc, and we use that marginal probability as the arc's confidence value. In addition, we propose a novel supervised algorithm based on logistic regression for estimating confidence. This algorithm can exploit additional features, whose weights are learned automatically, yielding a state-of-the-art confidence measure system.

Furthermore, we propose several evaluation methods for confidence measures. They can serve not only as evaluation metrics but also as optimization objectives for confidence computation.

Finally, we apply the confidence measure methods to two practical applications to test their effectiveness: document-level sentiment analysis and semi-supervised dependency parsing. Experimental results show that using confidence information improves the performance of both applications.
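The logistic-regression confidence estimator described above can be sketched as follows. This is a minimal, pure-Python illustration of the general technique, not the thesis's implementation: each arc the parser outputs becomes a training example labeled 1 if it matches the gold tree and 0 otherwise, and the model is fit by batch gradient descent on L2-regularized log loss. The two features (an approximate arc marginal and a normalized arc length), the training data, the hyperparameters, and the function names are hypothetical, since the abstract does not list the feature set. The `precision_above` helper is likewise an assumed evaluation, measuring how many of the arcs kept at a threshold are correct.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_confidence_model(features: list[list[float]], labels: list[int],
                           epochs: int = 500, lr: float = 0.5,
                           l2: float = 1e-3) -> tuple[list[float], float]:
    """Fit weights w and bias b by batch gradient descent on L2-regularized log loss.

    labels[i] is 1 if arc i matches the gold tree and 0 otherwise.
    """
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        # The L2 penalty is applied to the weights only, not to the bias.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def arc_confidence(w: list[float], b: float, x: list[float]) -> float:
    """Estimated probability that the arc described by feature vector x is correct."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def precision_above(confidences: list[float], labels: list[int],
                    threshold: float) -> float:
    """Fraction of correct arcs among those whose confidence reaches the threshold."""
    kept = [y for c, y in zip(confidences, labels) if c >= threshold]
    return sum(kept) / len(kept) if kept else 0.0


# Hypothetical features per arc: [approximate marginal, arc length / sentence length].
X = [[0.95, 0.1], [0.9, 0.2], [0.85, 0.1], [0.3, 0.8], [0.2, 0.6], [0.4, 0.9]]
y = [1, 1, 1, 0, 0, 0]
w, b = train_confidence_model(X, y)
scores = [arc_confidence(w, b, x) for x in X]
print(precision_above(scores, y, threshold=0.5))
```

Batch gradient descent keeps the sketch dependency-free; in practice a library logistic-regression solver would be the natural choice, and further arc features can be appended to each vector without changing the training code.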
Keywords/Search Tags: Dependency Parsing, Transition-based Model, Graph-based Model, Confidence