Research On Reranking Technology For Chinese Syntactic Parsing

Posted on: 2013-12-09  Degree: Master  Type: Thesis
Country: China  Candidate: Y Cheng  Full Text: PDF
GTID: 2268330392967826  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, the growth of networks has made the demand for information communication and processing increasingly urgent, which has driven rapid development in natural language processing (NLP) and its applications. Because parsing occupies a key position in NLP and has bright prospects in a range of applications, this thesis studies Chinese syntactic parsing in depth. Re-ranking is the focus and main thread of the study because of its effectiveness in improving parsing performance. The specific work is as follows.

First, we introduce the mainstream statistical parsing models and run experiments on Penn Chinese Treebank 5.0 to compare their performance. Based on the experimental results, we analyze the different methods of model construction and their effect on parsing, for example the information each model contains, its training data requirements, and its performance and efficiency.

Second, we apply different parsing models within the re-ranking method. Specifically, we use a probabilistic context-free grammar (PCFG) model, the Stanford-1 model, and the Berkeley model in turn as the initial model. Two factors strongly affect performance: feature selection and parameter training. In this work, we use the feature sets given in Collins's article with slight changes, and we train the parameters with the maximum entropy method. We further explore how features of different types affect the final results, and the results show that the choice of features should take the initial model into account.

Finally, building on the study of the traditional re-ranking method, we find that it does not take full advantage of the information contained in the N-best candidate trees. We therefore make some improvements to the model and verify them experimentally. Specifically, traditional re-ranking is often treated as a classification problem, with the cost function set to maximize the margin between the rank-1 parse tree and the other parse trees. In reality, however, the parse trees are ranked according to their similarity to the standard tree; that is, trees with different ranks differ not in "quality" but only in "quantity", and the traditional method often ignores this information. In contrast, we propose two improved models, the segmentation model based on relative distance and the multi-class-merge model, and we conduct the corresponding experiments with both. The results show that the improved models further improve parsing performance: with PCFG as the initial model, there is a 0.9 percent increase over the traditional re-ranking method.

In addition, we implement a parse tree visualization system with three display models: "phrase model", "dependency model" and "frame model". With the help of this system, we can display parse trees with different structures and also support feature selection for the re-ranking method.
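The contrast between a classification-style objective and one that uses the graded ranks of all N-best candidates can be illustrated with a small sketch. This is not the thesis's segmentation or multi-class-merge model, whose details the abstract does not give. It assumes a linear scoring function over hand-made feature vectors, a hinge loss, and a per-candidate similarity score (`similarity`, for example an F1 score against the standard tree); all function names and numbers are hypothetical.

```python
from typing import List, Sequence


def score(weights: Sequence[float], features: Sequence[float]) -> float:
    """Linear score of one candidate parse tree (assumed model form)."""
    return sum(w * f for w, f in zip(weights, features))


def rerank(weights: Sequence[float], candidates: List[Sequence[float]]) -> int:
    """Return the index of the highest-scoring candidate in the N-best list."""
    return max(range(len(candidates)), key=lambda i: score(weights, candidates[i]))


def top1_hinge_loss(weights, candidates, similarity) -> float:
    """Classification-style loss: the most similar tree must beat every
    other candidate by a fixed margin of 1, whatever their similarity."""
    best = max(range(len(similarity)), key=lambda i: similarity[i])
    s_best = score(weights, candidates[best])
    return sum(
        max(0.0, 1.0 - (s_best - score(weights, candidates[i])))
        for i in range(len(candidates))
        if i != best
    )


def graded_pairwise_loss(weights, candidates, similarity) -> float:
    """Graded loss (illustrative assumption): every pair of candidates is
    compared, and the required margin equals their similarity difference."""
    scores = [score(weights, c) for c in candidates]
    loss = 0.0
    for i in range(len(candidates)):
        for j in range(len(candidates)):
            if similarity[i] > similarity[j]:
                margin = similarity[i] - similarity[j]
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss


# Hypothetical 3-best list: feature vectors and similarity to the standard tree.
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
similarity = [0.9, 0.6, 0.8]
weights = [1.0, 0.0]

best_index = rerank(weights, candidates)
loss_top1 = top1_hinge_loss(weights, candidates, similarity)
loss_graded = graded_pairwise_loss(weights, candidates, similarity)
```

With these made-up numbers, the fixed-margin loss still penalizes the third candidate even though it is nearly as close to the standard tree as the first, while the graded loss treats the current ordering as already satisfactory.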
Keywords/Search Tags: Chinese syntactic parsing, PCFG, re-rank, visualization