
Research On Re-ranking Technology For Kazakh Syntactic Parsing

Posted on: 2019-08-12  Degree: Master  Type: Thesis
Country: China  Candidate: J L Liang  Full Text: PDF
GTID: 2428330566967006  Subject: Computer application technology
Abstract/Summary:
Research in natural language processing includes part-of-speech (POS) tagging, syntactic parsing, and semantic parsing. Syntactic parsing is the current focus of our work on Kazakh; before this, we completed research on stemming, POS tagging, and chunking, and our earlier work on syntactic parsing also achieved good results. This thesis focuses on Kazakh syntactic parsing.

The PCFG model rests on a very strong independence assumption. It captures coarse-grained information well, that is, it is good at capturing the structure of a sentence, but language is complex and the grammatical structures within a sentence are not fully independent of one another. Lexical information is an important factor in syntactic parsing, and because the PCFG model ignores it, the model's ability to resolve ambiguity is limited. To address this, the thesis proposes re-ranking with a perceptron. The perceptron can capture fine-grained lexical information about the sentence and thus compensates for this weakness of the PCFG model. We therefore use the perceptron to re-rank the candidate parse trees.

The specific work is as follows. First, a maximum entropy model is used for POS tagging of Kazakh. Experiments measure the influence of different features on tagging, and the best feature template is selected. Second, the research on re-ranking for Kazakh syntactic parsing is divided into two stages. In the first stage, basic syntactic parsing is performed with the PCFG model and with a lexicalized model, and the parsing performance of the two base models is compared. The candidate parse trees generated at this stage are the input to the second stage. The second stage is the re-ranking stage, which uses the perceptron algorithm to reorder the candidate trees produced by the base model. Because fine-grained lexical information is drawn from the sentence at this stage, it compensates for the limited disambiguation ability of the first stage. The main idea of perceptron re-ranking is to add lexical information to the candidate trees obtained from the PCFG model and to recompute the candidate tree node scores with fine-grained features. This combines a coarse-grained parsing method with a fine-grained re-ranking method to obtain the best parsing result. Experiments show that re-ranking is feasible and effective for Kazakh syntactic parsing.
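The two-stage idea in the abstract can be illustrated with a minimal sketch of perceptron re-ranking. It is not the implementation used in the thesis: the tree representation, the feature names (`base_logprob`, `rule:…`, `lex:…`), the function names, and the toy tokens `w1` to `w3` are all assumptions made for illustration. The sketch assumes that a base parser has already produced candidate trees with log-probabilities and that the index of the best candidate is known for each training sentence.

```python
from collections import Counter, defaultdict

# Assumed tree format: (label, children). For a preterminal, children is the
# word (a str); otherwise it is a list of subtrees.

def tree_features(tree, base_logprob):
    """Hypothetical feature map: base-model score, rule counts, tag-word counts."""
    feats = Counter()
    feats["base_logprob"] = base_logprob

    def visit(node):
        label, children = node
        if isinstance(children, str):
            # Fine-grained lexical feature: which word carries which tag.
            feats["lex:" + label + "->" + children] += 1
            return
        feats["rule:" + label + "->" + " ".join(c[0] for c in children)] += 1
        for child in children:
            visit(child)

    visit(tree)
    return feats

def score(weights, feats):
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())

def rerank(weights, candidates):
    """Return the index of the highest-scoring (tree, logprob) candidate."""
    scores = [score(weights, tree_features(t, lp)) for t, lp in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)

def train(data, epochs=5):
    """Perceptron training over (candidates, gold_index) pairs."""
    weights = defaultdict(float)
    weights["base_logprob"] = 1.0  # start from the base model's own ranking
    for _ in range(epochs):
        for candidates, gold in data:
            pred = rerank(weights, candidates)
            if pred != gold:
                gold_f = tree_features(*candidates[gold])
                pred_f = tree_features(*candidates[pred])
                # Move weights toward the gold tree, away from the wrong pick.
                for name in set(gold_f) | set(pred_f):
                    weights[name] += gold_f[name] - pred_f[name]
    return weights

# Toy data with placeholder tokens: the base model prefers the second tree,
# but the first tree is taken to be the correct one.
tree_a = ("S", [("NP", [("N", "w1")]),
                ("VP", [("V", "w2"), ("NP", [("N", "w3")])])])
tree_b = ("S", [("NP", [("N", "w1"), ("N", "w2")]),
                ("VP", [("V", "w3")])])
candidates = [(tree_a, -5.0), (tree_b, -4.0)]
weights = train([(candidates, 0)])
best = rerank(weights, candidates)
```

Initialising the `base_logprob` weight to 1.0 means the untrained re-ranker reproduces the base model's ordering, and training then adjusts the ranking only where lexical and rule features disagree with it.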
Keywords/Search Tags: Kazakh Language, PCFG model, Perceptron, Re-ranking, Syntactic Parsing