A Study Of The Shallow Syntactic Analysis Methods In Vietnamese

Posted on: 2018-04-09, Degree: Master, Type: Thesis
Country: China, Candidate: Y C Liu
GTID: 2358330518461970, Subject: Computer application technology
Abstract/Summary:
As contacts and cooperation between China and Vietnam grow more frequent and deeper in the political, economic, and cultural fields, language exchange is becoming increasingly important. Language differences create barriers to communication and have become an obstacle to the development of relations between the two countries. Within Vietnamese natural language processing, lexical analysis plays a central role, and shallow syntactic analysis is likewise a foundation and prerequisite: it affects all follow-up work and serves upper-level applications. Solving the language problem is therefore imperative, and Chinese-Vietnamese machine translation is becoming more and more important. This thesis studies shallow parsing of Vietnamese and completes the following work:

1. Collection, sorting, and preprocessing of the Multi-Category Words, entity, and chunk corpora. The corpus is the foundation of any natural language processing project, so its construction is particularly important. The corpus is drawn mainly from the limited published data and from manual annotation with proofreading.

2. A conditional random field (CRF) method for disambiguating Vietnamese Multi-Category Words. First, the thesis analyzes the characteristics of Multi-Category Words, selects effective features, and builds the feature template. Second, it uses the conditional random field for statistical modeling to obtain a Vietnamese disambiguation model. Solving the Multi-Category Words problem helps improve the accuracy of POS tagging and the quality of the POS corpus, limits as far as possible the accumulation and propagation of errors into follow-up work, and provides a foundation and support for Vietnamese NER.

3. A hybrid method for Vietnamese named entity recognition that integrates entity features. First, based on the characteristics of the Vietnamese language and of its entities, global and local features are selected as the effective features and an entity recognition model is constructed. Second, rules are written using these characteristics. Finally, the maximum entropy (ME) model is combined with the rule set. The entity features can in turn be used as effective features for chunking.

4. A method combining CRFs and transformation-based learning (TBL) for Vietnamese chunk analysis. First, based on the characteristics of Vietnamese chunks and of the language, effective features are selected and a CRFs model is trained to produce chunks. Second, TBL and an evaluation function are used to obtain rules. Used as an effective feature for entity recognition, chunks can improve recognition accuracy.
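
The feature-template step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual template, so everything here is an assumption made for illustration: the function names `window_features` and `sentence_features`, the feature keys, the boundary markers, and the window size. It only shows the general idea of turning each token and its neighbours into features that a sequence model such as a CRF could consume.

```python
def window_features(tokens, i, window=1):
    """Build a feature dict for position i from a symmetric context window.

    Hypothetical template: the feature names and window size are
    illustrative assumptions, not the template used in the thesis.
    """
    feats = {
        "w[0]": tokens[i].lower(),         # current word, lowercased
        "is_title": tokens[i].istitle(),   # capitalization cue
    }
    for off in range(1, window + 1):
        # Use boundary markers when the window runs past the sentence edge.
        left = tokens[i - off].lower() if i - off >= 0 else "<BOS>"
        right = tokens[i + off].lower() if i + off < len(tokens) else "<EOS>"
        feats[f"w[-{off}]"] = left
        feats[f"w[+{off}]"] = right
    return feats


def sentence_features(tokens, window=1):
    """Apply the template to every position of a word-segmented sentence."""
    return [window_features(tokens, i, window) for i in range(len(tokens))]


if __name__ == "__main__":
    # A word-segmented Vietnamese sentence; syllables of one word are joined by "_".
    for feats in sentence_features(["Tôi", "yêu", "Việt_Nam"]):
        print(feats)
```

Each resulting dict corresponds to one token; a list of such dicts per sentence is the kind of input that typical CRF toolkits accept, paired with one label per token.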
Keywords/Search Tags: Vietnamese, Multi-Category Words, NER, Chunk, Maximum Entropy, CRFs, Transformation-Based Error-Driven Learning, Entity Library, Rules