
Research On Transition-Based Analysis Technology For Kazakh Sentences

Posted on: 2018-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: H Wu
Full Text: PDF
GTID: 2348330533956506
Subject: Computer application technology
Abstract/Summary:
Sentence-level Kazakh language processing includes POS tagging, chunking, syntactic parsing, and semantic parsing, and work on Kazakh has so far progressed to constituent parsing. In this paper, we mainly study Kazakh POS tagging, chunking, syntactic parsing, and semantic parsing.

Traditional methods handle these tasks independently, each with its own state-of-the-art model. Such approaches are usually called pipeline methods, and they have two major drawbacks. First, they suffer from error propagation: errors in lower-layer tasks spread to higher-layer tasks. Second, because each model is optimized locally, lower-layer tasks cannot use information from higher-layer tasks. Because of these two problems, many researchers have turned to joint models, which process multiple adjacent tasks with a single model, so that both problems can be avoided and performance can improve. A further advantage is that joint models help language researchers understand the relations between different tasks. Since the search space of a joint model is the product of the search spaces of the individual tasks, we join at most two tasks.

Statistical analysis methods are generally divided into transition-based methods and graph-based methods. Transition-based methods perform slightly worse than graph-based methods, but their decoding efficiency is clearly better. There are two possible ways to improve the performance of transition-based analysis. First, improve the performance of lower-level tasks, thereby improving overall task performance. Second, build a joint model that uses the interaction between tasks to enhance overall performance.

In this paper, our work covers four aspects across the three tasks of POS tagging, chunking, and parsing:

1. We design a joint analysis of POS tagging and chunking, in which POS tags and chunks interact to improve each other's accuracy. The result is then used as the input to syntactic parsing, which improves the accuracy of syntactic analysis.
2. We design a hybrid model that combines pipelined POS tagging with joint analysis. It addresses the pipeline model's problems of error propagation and of lower-level tasks being unable to use higher-level task information, and it also addresses the feature selection problem of the joint model. This improves the overall accuracy of POS tagging and chunking.
3. We improve the beam-search decoding algorithm by replacing the fixed beam width B with a dynamic beam width: the score of each candidate in the candidate set is compared with the maximum score in the set, and a fixed threshold is used to prune the lower-scoring candidates. This improves the quality of the search space and, in turn, the accuracy of the results.
4. We design a reward function that reduces the risk of the beam-search decoding algorithm pruning the best result during search, so that decoding is more accurate and overall accuracy improves.
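The dynamic beam width described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a candidate is compared with the maximum score, so this sketch assumes the comparison is a score difference. The names `prune_dynamic_beam`, `threshold`, and `max_width` are hypothetical and are not taken from the thesis.

```python
from typing import List, Optional, Tuple

# A candidate is a (score, state) pair; `state` stands in for a parser state.
Candidate = Tuple[float, str]


def prune_dynamic_beam(
    candidates: List[Candidate],
    threshold: float,
    max_width: Optional[int] = None,
) -> List[Candidate]:
    """Keep the candidates whose score is within `threshold` of the best score.

    Assumption: the comparison is the difference (best - score). The beam
    width is therefore not fixed; it depends on how closely the candidates
    are clustered around the best one.
    """
    if not candidates:
        return []
    best = max(score for score, _ in candidates)
    kept = [c for c in candidates if best - c[0] <= threshold]
    # Sort from highest to lowest score so the best candidate comes first.
    kept.sort(key=lambda c: c[0], reverse=True)
    # Optional hard cap, an added safeguard that the abstract does not mention.
    if max_width is not None:
        kept = kept[:max_width]
    return kept


step_candidates = [(-4.0, "shift"), (-1.0, "reduce-left"), (-1.5, "reduce-right")]
beam = prune_dynamic_beam(step_candidates, threshold=1.0)
print(beam)
```

With a threshold of 1.0, the candidate scoring -4.0 is dropped because it is 3.0 below the best score, while the two close candidates are both kept. A fixed beam width of 1 would have discarded the second of those as well.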
Keywords/Search Tags: Pipeline mode, Joint model, Transition-Based parsing, Beam-Search decoding algorithm