A Study Of Chinese Hierarchical Syntactic Boundaries Based On Phrase Structure

Posted on: 2022-05-02 | Degree: Master | Type: Thesis
Country: China | Candidate: C J Yang
GTID: 2518306524951849 | Subject: Electronics and Communications Engineering
Abstract/Summary:
In recent years, artificial intelligence technology has been widely applied, and deep natural language analysis such as syntactic analysis has attracted increasing attention. Syntactic analysis is a classic task in Natural Language Processing (NLP). Its main task is to analyze the composition of a sentence and transform it into a syntax tree. By identifying the building blocks of a sentence and the relationships between its words, syntactic analysis helps machines understand natural language. It is used in semantic understanding applications such as machine translation, automatic question answering, and text summarization.

This thesis studies the boundary problem in Chinese hierarchical syntactic analysis. The first step analyzes the hierarchical nature of phrase-structure parsing and the structural characteristics of Chinese. On that basis, the thesis proposes a syntactic structure tree in which each lexical chunk is represented by its core word, and chunks are combined layer by layer. Syntactic boundary analysis is then divided into two subproblems: chunk recognition and core word extraction. Each is discussed separately and evaluated experimentally with different models, as follows:

1. Core word extraction module. Extracting the core word of a chunk is treated as the problem of ranking the importance of each word in the chunk. The word with the highest importance score is taken as the chunk's core word. Word importance is computed with an improved TextRank ranking algorithm. Its accuracy is raised by adding word similarity information, position information, and part-of-speech information.

2. Chunk recognition module. Chunk recognition is treated as a sequence labeling problem. Three models are used to identify the chunk boundary tags: a Bi-directional Long Short-Term Memory (BiLSTM) model, a Conditional Random Field (CRF) model, and their combination (BiLSTM+CRF). The CRF learns transition features between output tags, that is, which adjacent tag pairs are likely to occur together in the predicted sequence. As a result, the whole tag sequence is decoded jointly rather than tag by tag. The BiLSTM learns contextual features, which helps address long-distance dependencies in sequence prediction. The BiLSTM+CRF model combines the strengths of both and improves sequence recognition.

The proposed models were compared on the Chinese Penn Treebank (CTB 8.0) corpus. Three models were used for chunk recognition: CRF, BiLSTM, and BiLSTM+CRF. The TextRank model was improved with each of the three kinds of information (word similarity, position, and part of speech) separately, and then with all three combined, to measure the effect on syntactic boundary analysis. The recognition performance of each model was also tested across different sentence lengths. The best results are obtained by the BiLSTM+CRF model together with the TextRank model improved with all three kinds of information. Compared with baseline methods such as the LR method, the F1 score increased by 6.58% and the sentence-level Overall Accuracy increased by 3.68%. The experiments verify the effectiveness and stability of the model.
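The layer-by-layer structure described above can be illustrated with a small Python sketch, though it is not the author's implementation. At each layer, a chunker tags the current sequence of core words, and each multi-word chunk is replaced by a node labelled with its core word. The process repeats until one node remains. The sketch assumes the chunker emits only B (begin chunk) and I (inside chunk) tags. The `ORACLE` lookup table and the `PRIORITY`-based head function are hypothetical stand-ins for a trained chunk recognizer and core word extractor.

```python
from typing import Callable, List, Tuple, Union

# A tree node is either a word (leaf) or (core_word, children).
Node = Union[str, Tuple[str, list]]


def head_of(node: Node) -> str:
    """Return the word that represents a node at the next layer."""
    return node if isinstance(node, str) else node[0]


def build_tree(words: List[str],
               chunk_fn: Callable[[List[str]], List[str]],
               head_fn: Callable[[List[str]], str],
               max_layers: int = 50) -> Node:
    nodes: List[Node] = list(words)
    for _ in range(max_layers):
        if len(nodes) == 1:
            break
        # Chunk the current layer, seen as a sequence of core words.
        tags = chunk_fn([head_of(n) for n in nodes])
        groups: List[List[Node]] = []
        for node, tag in zip(nodes, tags):
            if tag == "B" or not groups:
                groups.append([node])    # start a new chunk
            else:
                groups[-1].append(node)  # extend the current chunk
        if len(groups) == len(nodes):
            break  # nothing was merged; stop instead of looping
        # Replace each multi-node chunk with a node labelled by its core word.
        nodes = [g[0] if len(g) == 1 else (head_fn([head_of(n) for n in g]), g)
                 for g in groups]
    return nodes[0] if len(nodes) == 1 else ("ROOT", nodes)


# Hypothetical oracle chunker: precomputed B/I tags for each layer.
ORACLE = {
    ("我", "喜欢", "中国", "经济"): ["B", "B", "B", "I"],
    ("我", "喜欢", "经济"): ["B", "B", "I"],
    ("我", "喜欢"): ["B", "I"],
}
# Hypothetical head priorities standing in for a core word extractor.
PRIORITY = {"喜欢": 3, "经济": 2, "中国": 1, "我": 1}

tree = build_tree(["我", "喜欢", "中国", "经济"],
                  chunk_fn=lambda heads: ORACLE[tuple(heads)],
                  head_fn=lambda heads: max(heads, key=lambda h: PRIORITY.get(h, 0)))
print(tree)  # ('喜欢', ['我', ('喜欢', ['喜欢', ('经济', ['中国', '经济'])])])
```

Passing the chunker and head selector as functions keeps the tree-building loop independent of whichever models fill those roles.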
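Here is a minimal Python sketch of TextRank-style core word extraction inside a single chunk. The abstract does not give the exact formula, so the way the three kinds of information are combined here is an assumption:

- Word similarity (cosine similarity of word vectors) weights the graph edges.
- Position and part-of-speech weights form a normalized prior that replaces TextRank's uniform (1 - d) term, as in personalized PageRank.

The toy vectors, the `POS_WEIGHT` table, and the bias toward chunk-final positions are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical POS weights (CTB tags): nouns and verbs are likelier heads.
POS_WEIGHT = {"NN": 1.0, "NR": 1.0, "VV": 0.9, "JJ": 0.4, "AD": 0.3, "DEG": 0.1}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def core_word(words: List[str], tags: List[str],
              vectors: Dict[str, Sequence[float]],
              d: float = 0.85, iters: int = 50) -> str:
    n = len(words)
    if n == 1:
        return words[0]
    # Similarity information: edge weights of a fully connected chunk graph.
    w = [[max(cosine(vectors[a], vectors[b]), 0.0) if i != j else 0.0
          for j, b in enumerate(words)] for i, a in enumerate(words)]
    out_sum = [sum(row) for row in w]
    # Position + POS information as a normalized prior. Assumption: Chinese
    # chunks tend to be head-final, so later positions get more weight.
    prior = [(i + 1) / n * POS_WEIGHT.get(t, 0.2) for i, t in enumerate(tags)]
    total = sum(prior)
    prior = [p / total for p in prior]
    # Personalized-PageRank-style iteration over the weighted graph.
    score = [1.0 / n] * n
    for _ in range(iters):
        score = [(1 - d) * prior[i]
                 + d * sum(w[j][i] / out_sum[j] * score[j]
                           for j in range(n) if out_sum[j] > 0)
                 for i in range(n)]
    return words[max(range(n), key=lambda i: score[i])]


vectors = {"高速": [1.0, 0.0], "经济": [0.6, 0.8], "增长": [0.8, 0.6]}  # toy vectors
print(core_word(["高速", "经济", "增长"], ["JJ", "NN", "NN"], vectors))  # 增长
```

In this toy chunk, "增长" has both the largest total similarity to the other words and the largest prior, so it is ranked as the core word.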
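The joint decoding that a CRF layer provides can be sketched with the Viterbi algorithm over B/I/O chunk tags. In a BiLSTM+CRF tagger, the BiLSTM supplies the per-token emission scores and the CRF learns the tag-to-tag transition scores. Here both are hypothetical hand-set numbers, and CRF training (the forward algorithm and its loss) is omitted. The example shows greedy per-token decoding producing an invalid sequence, with an I tag directly after O. Viterbi decoding avoids that sequence because of the transition scores.

```python
from typing import Dict, List

TAGS = ["B", "I", "O"]
NEG = -1e4  # effectively forbids a transition

# Hypothetical scores; a trained CRF would learn START and TRANS[prev][cur].
START = {"B": 0.5, "I": NEG, "O": 0.0}
TRANS = {
    "B": {"B": 0.0, "I": 1.0, "O": 0.0},
    "I": {"B": 0.0, "I": 0.5, "O": 0.0},
    "O": {"B": 0.5, "I": NEG, "O": 0.5},  # O followed by I is invalid
}


def greedy(emissions: List[Dict[str, float]]) -> List[str]:
    """Pick the best tag for each token independently (no transitions)."""
    return [max(TAGS, key=lambda t: e[t]) for e in emissions]


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Best tag sequence under emission + transition scores."""
    best = [{t: START[t] + emissions[0][t] for t in TAGS}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(emissions)):
        scores, ptrs = {}, {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + TRANS[p][cur])
            scores[cur] = best[-1][prev] + TRANS[prev][cur] + emissions[i][cur]
            ptrs[cur] = prev
        best.append(scores)
        back.append(ptrs)
    # Trace back from the best final tag.
    path = [max(TAGS, key=lambda t: best[-1][t])]
    for i in range(len(emissions) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


# Hypothetical per-token emission scores (e.g. BiLSTM outputs) for 3 tokens.
em = [{"B": 0.2, "I": 0.1, "O": 1.0},
      {"B": 0.8, "I": 0.9, "O": 0.1},
      {"B": 0.1, "I": 1.0, "O": 0.3}]
print(greedy(em))   # ['O', 'I', 'I']  (invalid: a chunk starts with I)
print(viterbi(em))  # ['O', 'B', 'I']
```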
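A sketch of how chunk-level precision, recall, and F1 can be computed from B/I/O tags follows. The abstract does not define sentence Overall Accuracy. This sketch assumes it is the fraction of sentences whose predicted chunk spans exactly match the gold spans. The tag sequences in the usage example are made up.

```python
from typing import List, Set, Tuple


def chunks(tags: List[str]) -> Set[Tuple[int, int]]:
    """Convert B/I/O tags to chunk spans (start, end), end exclusive."""
    spans, start = set(), None
    for i, t in enumerate(tags):
        if t == "B" or (t == "I" and start is None):
            if start is not None:
                spans.add((start, i))  # close the previous chunk
            start = i                  # a stray I also opens a chunk
        elif t == "O":
            if start is not None:
                spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def evaluate(gold: List[List[str]], pred: List[List[str]]):
    """Chunk precision, recall, F1 and (assumed) sentence overall accuracy."""
    tp = n_pred = n_gold = exact = 0
    for g, p in zip(gold, pred):
        gs, ps = chunks(g), chunks(p)
        tp += len(gs & ps)
        n_pred += len(ps)
        n_gold += len(gs)
        exact += int(gs == ps)  # every chunk boundary in the sentence correct
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = exact / len(gold) if gold else 0.0
    return precision, recall, f1, accuracy


gold = [["B", "I", "O", "B"], ["B", "B", "I"]]
pred = [["B", "I", "O", "B"], ["B", "I", "I"]]
p, r, f1, acc = evaluate(gold, pred)
print(round(p, 4), round(r, 4), round(f1, 4), acc)  # 0.6667 0.5 0.5714 0.5
```

The chunk spans are counted as correct only when both boundaries match. This is why merging two gold chunks into one, as in the second sentence, costs both precision and recall.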
Keywords/Search Tags: Core Word Extraction, Chunk Recognition, Similarity Information, Location Information, Part-of-Speech Information