
Research On Disambiguation Methods In Vietnamese Dependency Syntax Analysis

Posted on: 2020-10-20
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhao
GTID: 2438330599455736
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Syntactic analysis is one of the key underlying technologies in natural language processing. Its basic task is to determine the syntactic structure of a sentence or the dependency relations between the words in a sentence. Research on parsing has now shifted from phrase-structure analysis to dependency parsing. Dependency parsing for major languages such as Chinese and English is relatively mature, but for a low-resource language such as Vietnamese, the scarcity of publicly available corpora keeps experimental performance low. Moreover, existing Vietnamese parsing methods all ignore the influence of ambiguity on parsing. To build a better Vietnamese parser, this thesis focuses on the impact of ambiguity on Vietnamese sentence analysis and analyzes and summarizes the ambiguity phenomena involved. Based on this analysis, it first performs part-of-speech tagging and noun phrase chunking on a Vietnamese corpus and then uses the results as features for Vietnamese parsing. The work covers the following three aspects.

For Vietnamese part-of-speech tagging, a method combining part-of-speech disambiguation with a part-of-speech dictionary is proposed. The part-of-speech dictionary is used to generate a dictionary of multifunctional words (words that can belong to more than one part of speech) and a dictionary of non-multifunctional words, and the corpus is annotated for the subsequent steps. After an analysis of Vietnamese multifunctional words, contextual word features, contextual part-of-speech features and prepositional features are integrated into a CRF model to build a disambiguation model for multifunctional words. During tagging, the two word classes are handled separately: non-multifunctional words are tagged with the non-multifunctional dictionary, while multifunctional words and unregistered (out-of-vocabulary) words are tagged with the disambiguation model. The two results are then merged into the final tagging, which reached an experimental precision of 95.73%.

For Vietnamese noun phrase chunking, a method based on a BiLSTM-CRF model and constraint rules is proposed. The part-of-speech tags obtained above are used as a feature and concatenated into the model's input vector. The BiLSTM-CRF model, which performs well on sequence labeling problems, carries out the chunking. Finally, constraint rules derived from an analysis of Vietnamese noun phrases are applied to the model's output to refine it further. The experimental precision, recall and F-value were 88.08%, 88.73% and 88.40%, respectively.

For Vietnamese dependency parsing, a method that combines part-of-speech and noun phrase chunk features is proposed. Based on an analysis of ambiguity in parsing, the part-of-speech tags and noun phrase chunk tags described above are used as features. To integrate these features into the model more effectively, a sequence-labeling formulation of parsing is adopted together with an Attention-BiLSTM model. The model's input vector combines the two feature vectors with a word vector. The experimental unlabeled and labeled attachment scores (UAS and LAS) reached 85.76% and 85.18%, respectively.
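The dictionary-plus-model tagging scheme for part-of-speech tagging can be sketched as follows. This is a minimal illustration under stated assumptions: the toy dictionary, the tag names (`P`, `V`, `N`, `E`, `A`), the helper names and `toy_disambiguator` are all hypothetical, and the stand-in rule replaces the thesis's CRF disambiguation model only to keep the example runnable.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

# Hypothetical toy POS dictionary: word -> set of possible tags.
# "bằng" has two tags, so it counts as a multifunctional word here.
POS_DICT: Dict[str, Set[str]] = {
    "tôi": {"P"},
    "ăn": {"V"},
    "cơm": {"N"},
    "đũa": {"N"},
    "bằng": {"E", "A"},
}


def build_dictionaries(
    pos_dict: Dict[str, Set[str]],
) -> Tuple[Dict[str, Set[str]], Dict[str, str]]:
    """Split the POS dictionary into multifunctional and non-multifunctional parts."""
    multi = {w: tags for w, tags in pos_dict.items() if len(tags) > 1}
    single = {w: next(iter(tags)) for w, tags in pos_dict.items() if len(tags) == 1}
    return multi, single


def toy_disambiguator(words: List[str], partial: List[Optional[str]]) -> List[str]:
    """Hypothetical stand-in for the CRF model: 'E' after a noun, otherwise 'N'."""
    return ["E" if i > 0 and partial[i - 1] == "N" else "N" for i in range(len(words))]


def tag_sentence(
    words: List[str],
    single: Dict[str, str],
    disambiguate: Callable[[List[str], List[Optional[str]]], List[str]],
) -> List[str]:
    # Step 1: non-multifunctional words take their unique dictionary tag;
    # multifunctional and unregistered words are left as None.
    dict_tags = [single.get(w) for w in words]
    # Step 2: only call the model if some word still needs a tag.
    needs_model = any(t is None for t in dict_tags)
    model_tags = disambiguate(words, dict_tags) if needs_model else [""] * len(words)
    # Step 3: merge dictionary tags with model predictions.
    return [t if t is not None else m for t, m in zip(dict_tags, model_tags)]
```

Passing the model in as a function keeps the dictionary path and the model path separate, mirroring the split-then-merge design; a trained CRF would take the place of `toy_disambiguator`.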
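The CRF features named for the multifunctional-word disambiguation model (contextual words, contextual part-of-speech tags and prepositional features) could be extracted per token roughly as follows. This is a hypothetical sketch: the window size, the feature names, the `AMB` placeholder for neighbours whose tag is still undecided, and the three-word preposition list are assumptions, not details from the thesis. The output, one feature dictionary per token, is the input shape that CRFsuite-style toolkits such as `sklearn-crfsuite` accept.

```python
from typing import Dict, List, Optional, Union

Feature = Union[str, bool]

# Hypothetical, deliberately tiny preposition list for illustration.
PREPOSITIONS = {"của", "trong", "với"}


def token_features(
    words: List[str], tags: List[Optional[str]], i: int, window: int = 1
) -> Dict[str, Feature]:
    """Context-word, context-POS and preposition features for token i."""
    feats: Dict[str, Feature] = {"w[0]": words[i].lower()}
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(words):
            neighbour = words[j].lower()
            feats[f"w[{off}]"] = neighbour
            # Undecided neighbours (multifunctional or unknown) get a placeholder tag.
            feats[f"pos[{off}]"] = tags[j] or "AMB"
            feats[f"prep[{off}]"] = neighbour in PREPOSITIONS
        else:
            # Positions outside the sentence are padded.
            feats[f"w[{off}]"] = "<PAD>"
            feats[f"pos[{off}]"] = "<PAD>"
            feats[f"prep[{off}]"] = False
    return feats


def sentence_features(
    words: List[str], tags: List[Optional[str]]
) -> List[Dict[str, Feature]]:
    return [token_features(words, tags, i) for i in range(len(words))]
```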
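Applying constraint rules to the BiLSTM-CRF chunker output amounts to a post-processing pass over the predicted tags. The thesis's actual Vietnamese noun phrase rules are not listed in this abstract, so the sketch below uses two assumed rules over an assumed BIO tag set (`B-NP`, `I-NP`, `O`): a hypothetical rule that a preposition (POS tag `E`) cannot open a noun phrase, followed by the generic BIO requirement that an `I-NP` must continue an existing chunk.

```python
from typing import List


def apply_constraints(chunk_tags: List[str], pos_tags: List[str]) -> List[str]:
    """Return a corrected copy of the predicted chunk tags."""
    fixed = list(chunk_tags)
    # Hypothetical rule: a preposition (POS tag "E") cannot open a noun phrase.
    for i, pos in enumerate(pos_tags):
        if pos == "E" and fixed[i] == "B-NP":
            fixed[i] = "O"
    # Generic BIO rule: I-NP must continue a chunk; otherwise it opens one.
    for i, tag in enumerate(fixed):
        if tag == "I-NP" and (i == 0 or fixed[i - 1] == "O"):
            fixed[i] = "B-NP"
    return fixed
```

Running the rules after decoding, rather than inside the CRF, keeps them easy to add or remove; the thesis's own rule set would replace the hypothetical preposition rule.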
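The reported chunking F-value is the harmonic mean of precision and recall, F = 2PR / (P + R). The sketch below computes chunk-level scores by exact span matching over BIO sequences; counting a chunk as correct only when both of its boundaries match is a common convention, assumed here rather than taken from the thesis, and the BIO tag names are likewise assumptions.

```python
from typing import List, Optional, Set, Tuple

Span = Tuple[int, int]


def chunk_spans(tags: List[str]) -> Set[Span]:
    """Return (start, end) spans of NP chunks, with end exclusive."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    for i, t in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        if t in ("B-NP", "O") and start is not None:
            spans.add((start, i))
            start = None
        if t == "B-NP" or (t == "I-NP" and start is None):
            start = i  # an ill-formed I-NP is treated as opening a chunk
    return spans


def f_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def chunk_prf(gold: List[List[str]], pred: List[List[str]]) -> Tuple[float, float, float]:
    """Corpus-level precision, recall and F over all sentences."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = chunk_spans(g_tags), chunk_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return precision, recall, f_score(precision, recall)
```

As a consistency check, `f_score(88.08, 88.73)` evaluates to about 88.40, matching the F-value reported in the abstract.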
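Casting dependency parsing as sequence labeling means giving each word one label from which its head and relation can be recovered. The abstract does not specify the thesis's encoding; the sketch below assumes the simplest one, a relative head offset joined to the relation name (for example `-1|obj` means the head is the previous word). Word positions are 1-based with 0 for the root, and the relation names are illustrative.

```python
from typing import List, Tuple


def encode(heads: List[int], rels: List[str]) -> List[str]:
    """One label per word: relative head offset and relation, e.g. '-1|obj'."""
    return [f"{h - (i + 1)}|{r}" for i, (h, r) in enumerate(zip(heads, rels))]


def decode(labels: List[str]) -> Tuple[List[int], List[str]]:
    """Recover absolute heads (0 = root) and relations from the labels."""
    heads: List[int] = []
    rels: List[str] = []
    n = len(labels)
    for i, label in enumerate(labels):
        offset, rel = label.split("|", 1)
        head = (i + 1) + int(offset)
        # Offsets pointing outside the sentence are clamped to the root.
        heads.append(head if 0 <= head <= n else 0)
        rels.append(rel)
    return heads, rels
```

Clamping out-of-range offsets to the root is only a minimal repair; it does not by itself guarantee a well-formed tree (a single root, no cycles), which a full decoder would need to enforce.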
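The parser's input combines a word vector with the part-of-speech and noun phrase chunk feature vectors. A common way to do this, assumed here, is to concatenate per-token embeddings; the dimensions, vocabularies, `<UNK>` fallback and random initialisation below are hypothetical stand-ins for trained embedding tables.

```python
import random
from typing import Dict, List

# Hypothetical embedding sizes.
WORD_DIM, POS_DIM, NP_DIM = 4, 2, 2


def make_table(keys: List[str], dim: int, seed: int) -> Dict[str, List[float]]:
    """Random lookup table standing in for a trained embedding matrix."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


WORD_EMB = make_table(["<UNK>", "tôi", "ăn", "cơm"], WORD_DIM, seed=0)
POS_EMB = make_table(["P", "V", "N"], POS_DIM, seed=1)
NP_EMB = make_table(["B-NP", "I-NP", "O"], NP_DIM, seed=2)


def input_vectors(
    words: List[str], pos_tags: List[str], np_tags: List[str]
) -> List[List[float]]:
    """Concatenate word, POS and NP-chunk vectors for every token."""
    return [
        WORD_EMB.get(w, WORD_EMB["<UNK>"]) + POS_EMB[p] + NP_EMB[c]
        for w, p, c in zip(words, pos_tags, np_tags)
    ]
```

Concatenation keeps each feature in its own dimensions, so the downstream Attention-BiLSTM receives the word, part-of-speech and chunk information side by side.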
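UAS is the share of words whose predicted head is correct, and LAS is the share whose head and relation label are both correct, so LAS can never exceed UAS (consistent with the reported 85.18% versus 85.76%). A minimal scorer follows, assuming gold and predicted analyses are aligned word by word and that every token is scored; evaluation setups often exclude punctuation, and the thesis's choice is not stated in this abstract.

```python
from typing import List, Tuple

Arc = Tuple[int, str]  # (head position, relation label)


def attachment_scores(gold: List[List[Arc]], pred: List[List[Arc]]) -> Tuple[float, float]:
    """Return (UAS, LAS) over all aligned tokens."""
    total = head_ok = both_ok = 0
    for g_sent, p_sent in zip(gold, pred):
        for (g_head, g_rel), (p_head, p_rel) in zip(g_sent, p_sent):
            total += 1
            if g_head == p_head:
                head_ok += 1
                # A labeled hit also requires the correct head.
                if g_rel == p_rel:
                    both_ok += 1
    return (head_ok / total, both_ok / total) if total else (0.0, 0.0)
```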
Keywords/Search Tags: Vietnamese, Syntactic Analysis, Part-of-Speech Tagging, Noun Phrase Chunking