
Research On The Vietnamese Noun Phrase Extraction With The Support Of Linguistic Features

Posted on: 2020-02-16
Degree: Master
Type: Thesis
Country: China
Candidate: W H Wang
GTID: 2415330620953210
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Noun phrases are an important part of text information and carry the core meaning of sentences. As an intermediate structure above the word but below the sentence, a noun phrase has a more complete and clearer meaning than a single word, which helps to resolve language ambiguity. In addition, correctly identifying noun phrases helps to capture the main structure of a sentence and reduces the difficulty and complexity of parsing. This paper takes Vietnamese noun phrase recognition as its task. Based on a corpus tagged with Vietnamese noun phrases, the internal structure and boundary features of Vietnamese noun phrases are analyzed statistically, and the resulting linguistic features are incorporated into the recognition model to identify Vietnamese noun phrases. The main research contents and innovations are as follows:

(1) This paper conducts a corpus-based statistical survey of the linguistic features of Vietnamese noun phrases. The survey reveals the strong correlation between Vietnamese noun phrases and part of speech (POS), the characteristics of Vietnamese noun phrase POS patterns, and the boundary features of Vietnamese noun phrases. The paper then compares these features with ordinary linguistic features and analyzes the differences. This part is the linguistic basis of the study and also enriches and supplements existing linguistic research on Vietnamese noun phrases.

(2) Supported by a rule base of Vietnamese noun phrase POS patterns and a dictionary of words adjacent to Vietnamese noun phrases, both built from the linguistic investigation, this paper transforms the boundary features and POS patterns of Vietnamese noun phrases into binary features and integrates them into a CRF (Conditional Random Fields) model to identify Vietnamese noun phrases. The experimental results show that the proposed methods effectively improve the recognition performance of the CRF model.

(3) Considering that word vectors cannot represent phrase-level information, this paper models the similarity between each word vector and a noun-phrase vector, obtains vectorized representations of noun phrase boundary information, and incorporates them into deep learning models to recognize Vietnamese noun phrases.

(4) Considering the characteristics of the POS patterns of Vietnamese noun phrases, this paper adds a multi-head attention mechanism to Bi-LSTM (Bidirectional Long Short-Term Memory)+CRF, which enables the model to pay more attention to the combinational relationships within the input word sequence. Considering the strong correlation between Vietnamese noun phrases and POS, this paper also incorporates an attention mechanism into the input layer of Bi-LSTM+CRF, which enables the model to adjust the weights of word vectors and POS feature vectors in the input layer flexibly according to the input. The experimental results show that both improvements to Bi-LSTM+CRF effectively improve the recognition performance of the model.

(5) Based on the Vietnamese noun phrase recognition experiments with the CRF model and the deep learning models, this paper compares the recognition performance of these models and analyzes how the two kinds of model differ in their use of the linguistic features of Vietnamese noun phrases. The best architecture found for Vietnamese noun phrase recognition is the Attention-over-Input-Layer+Bi-LSTM+CRF model with word vectors, POS feature vectors and boundary vectors as its inputs. The recognition accuracy of this method reaches 91.65% and the recall rate reaches 92.48%.

(6) The experiments verify the validity of the linguistic features of Vietnamese noun phrases for the noun phrase recognition task. Because the method used to incorporate boundary information into the deep learning model is indirect, this paper additionally explains the validity of this method through visualization, which enhances its interpretability.

This paper makes a comprehensive study of the linguistic features and automatic recognition technologies of Vietnamese noun phrases and improves the recognition model according to those linguistic features. The results of this study improve the recognition of Vietnamese noun phrases, and the overall research ideas and methods can serve as a reference for related research.
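As an illustration of contribution (2), the sketch below shows one way POS patterns and adjacency words could be turned into binary per-token features for a CRF. It is not the thesis's implementation: the pattern set, the adjacency word lists, the POS tags, the feature names and the example sentence are all hypothetical placeholders, chosen only to show the shape of the transformation.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical stand-ins for the thesis's rule base and adjacency dictionary.
# The real resources are derived from a tagged corpus and are not given in the abstract.
NP_POS_PATTERNS: Set[Tuple[str, ...]] = {("M", "Nc", "N", "A"), ("N", "A")}
LEFT_ADJACENT_WORDS: Set[str] = {"của"}   # assumed to often precede a noun phrase
RIGHT_ADJACENT_WORDS: Set[str] = {"ngủ"}  # assumed to often follow a noun phrase

Feature = Union[str, bool]


def pattern_coverage(tags: List[str]) -> List[bool]:
    """Mark each position that lies inside a window matching some POS pattern."""
    covered = [False] * len(tags)
    for pattern in NP_POS_PATTERNS:
        n = len(pattern)
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == pattern:
                for k in range(start, start + n):
                    covered[k] = True
    return covered


def sentence_features(words: List[str], tags: List[str]) -> List[Dict[str, Feature]]:
    """Build one feature dictionary per token, including the binary features."""
    covered = pattern_coverage(tags)
    features = []
    for i, (word, tag) in enumerate(zip(words, tags)):
        prev_word = words[i - 1] if i > 0 else ""
        next_word = words[i + 1] if i + 1 < len(words) else ""
        features.append({
            "word": word.lower(),
            "pos": tag,
            # Binary feature: token is covered by a known noun phrase POS pattern.
            "in_np_pattern": covered[i],
            # Binary boundary features from the adjacency word lists.
            "after_left_adjacent": prev_word in LEFT_ADJACENT_WORDS,
            "before_right_adjacent": next_word in RIGHT_ADJACENT_WORDS,
        })
    return features


if __name__ == "__main__":
    # Toy sentence with assumed POS tags (numeral, classifier, noun, adjective, verb).
    words = ["hai", "con", "mèo", "đen", "ngủ"]
    tags = ["M", "Nc", "N", "A", "V"]
    for token_features in sentence_features(words, tags):
        print(token_features)
```

Per-token feature dictionaries of this kind are the input format accepted by common CRF toolkits such as sklearn-crfsuite; whether the thesis used such a toolkit is not stated in the abstract.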
Keywords/Search Tags: Vietnamese Noun Phrase, Automatic Recognition, Linguistic Features, CRF, Deep Learning Methods