
Research On Identification Of Kazakh Base Phrase

Posted on: 2015-07-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y Wang
Full Text: PDF
GTID: 2298330431492084
Subject: Computer application technology
Abstract/Summary:
Base phrase recognition is an important step in shallow parsing and the groundwork for building a Kazakh phrase chunk bank; its results affect research on, and applications of, syntactic analysis and machine translation. Some results have already been achieved in this area. Establishing a comprehensive, large-scale standard Kazakh corpus and completing base phrase identification for shallow parsing are of great significance for subsequent work on the Kazakh language.

Based on the current state of Kazakh language research, this thesis designs and implements a Kazakh base phrase identification system. The system is divided into a feature extraction module, an identification module and a correction module.

The first step lays the groundwork for base phrase recognition. The tagging standard for Kazakh base phrases is studied and, to reduce the workload of manual tagging, the corpus is cross-tagged on the basis of single-type base phrase libraries, producing a tagged corpus of Kazakh base phrases.

The second step describes how the conditional random field (CRF) model is applied to Kazakh base phrase recognition. First, features are extracted according to the characteristics of Kazakh base phrases to establish the feature space, and an incremental feature selection algorithm is used to select the feature templates. Second, the L-BFGS algorithm is used to estimate the feature parameters, and the Viterbi algorithm is used to output the optimal label sequence during recognition.
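The decoding stage of the second step can be illustrated with a minimal Viterbi sketch for a linear-chain model. This is not the thesis implementation: the B-NP/I-NP/O tag set, the score values and the function name `viterbi` are assumptions chosen for illustration, and in a trained CRF the scores would come from the feature weights estimated with L-BFGS.

```python
from typing import Dict, List, Tuple


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: List[str],
) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model.

    emissions[i][t] is the score of tag t at position i, and
    transitions[(p, t)] is the score of moving from tag p to tag t.
    Scores are additive (log-space); missing entries count as 0.0.
    """
    if not emissions:
        return []
    # best[i][t]: score of the best path that ends in tag t at position i
    best = [{t: emissions[0].get(t, 0.0) for t in tags}]
    back: List[Dict[str, str]] = []
    for i in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for t in tags:
            # Pick the predecessor tag that maximises the path score into t
            prev = max(tags, key=lambda p: best[-1][p] + transitions.get((p, t), 0.0))
            scores[t] = (
                best[-1][prev]
                + transitions.get((prev, t), 0.0)
                + emissions[i].get(t, 0.0)
            )
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token sentence (illustration only)
TAGS = ["B-NP", "I-NP", "O"]
TRANSITIONS = {
    ("B-NP", "I-NP"): 1.0,
    ("I-NP", "I-NP"): 0.5,
    ("O", "I-NP"): -100.0,  # a phrase cannot continue right after O
}
EMISSIONS = [
    {"B-NP": 2.0, "I-NP": 0.5, "O": 0.1},
    {"B-NP": 0.2, "I-NP": 1.5, "O": 0.3},
    {"B-NP": 0.1, "I-NP": 0.2, "O": 2.0},
]

print(viterbi(EMISSIONS, TRANSITIONS, TAGS))  # ['B-NP', 'I-NP', 'O']
```

The large negative transition score from O to I-NP shows how sequence-level constraints keep the decoder from emitting an ill-formed chunk, even when a single token's own score favours it.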
In the third step, a transformation-based error-driven learning algorithm is applied to the preliminary identification output to correct the recognition results; the evaluation standard is also improved, and the effect of this second identification pass is studied.

To test the feasibility of the method, experiments were run on a corpus from the Kazakh edition of Xinjiang Daily under different test conditions. The results show that the method improves identification accuracy and generalizes well, so Kazakh base phrase recognition with this approach is effective. This work prepares the ground for building the Kazakh phrase chunk bank and for syntactic analysis in the next step.
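The correction step can be sketched as a small Brill-style transformation-based learner that runs over the first-pass output. The abstract does not give the thesis's rule templates, so the single template used here ("change tag X to Y when the previous tag is Z"), the tag names and the function names are all assumptions for illustration.

```python
from typing import List, Tuple

# Hypothetical rule template: (from_tag, to_tag, previous_tag)
Rule = Tuple[str, str, str]


def apply_rule(tags: List[str], rule: Rule) -> List[str]:
    """Rewrite from_tag to to_tag wherever the preceding tag matches."""
    from_tag, to_tag, prev_tag = rule
    out = list(tags)
    for i in range(1, len(tags)):
        # Conditions are read from the input sequence, not the partly
        # rewritten one, so a single pass does not depend on order
        if tags[i] == from_tag and tags[i - 1] == prev_tag:
            out[i] = to_tag
    return out


def net_gain(rule: Rule, predicted: List[List[str]], gold: List[List[str]]) -> int:
    """Errors fixed minus errors introduced when the rule is applied."""
    gain = 0
    for pred, ref in zip(predicted, gold):
        new = apply_rule(pred, rule)
        for old_t, new_t, ref_t in zip(pred, new, ref):
            if old_t != new_t:
                gain += (new_t == ref_t) - (old_t == ref_t)
    return gain


def learn_rules(
    predicted: List[List[str]], gold: List[List[str]], max_rules: int = 10
) -> List[Rule]:
    """Greedily learn correction rules from first-pass errors."""
    current = [list(p) for p in predicted]
    rules: List[Rule] = []
    for _ in range(max_rules):
        # Propose candidate rules only at positions that are still wrong
        # (position 0 has no previous tag, so this template skips it)
        candidates = set()
        for pred, ref in zip(current, gold):
            for i in range(1, len(pred)):
                if pred[i] != ref[i]:
                    candidates.add((pred[i], ref[i], pred[i - 1]))
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: net_gain(r, current, gold))
        if net_gain(best, current, gold) <= 0:
            break  # no remaining rule reduces the error count
        rules.append(best)
        current = [apply_rule(p, best) for p in current]
    return rules


# Hypothetical first-pass output and reference tags (illustration only)
FIRST_PASS = [["O", "I-NP", "O"], ["B-NP", "I-NP", "O", "I-NP"]]
GOLD = [["O", "B-NP", "O"], ["B-NP", "I-NP", "O", "B-NP"]]

print(learn_rules(FIRST_PASS, GOLD))  # [('I-NP', 'B-NP', 'O')]
```

At prediction time the learned rules would be applied, in the order learned, to the output of the first-pass recognizer. A real system would typically use several richer templates, for example ones conditioned on neighbouring words or part-of-speech tags.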
Keywords/Search Tags: Conditional Random Field, Kazakh, Basic Phrase, Incremental feature selection algorithm, Transformation-Based error-driven Learning algorithm