Research On Chinese Syntax Analysis Based On Graphs

Posted on: 2020-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Guo
Full Text: PDF
GTID: 2438330596497513
Subject: Electronic and communication engineering
Abstract/Summary:
Dependency parsing is a method for the automatic grammatical analysis of natural language. Its main goal is to analyze the dependency relations between the components of a complete sentence. This approach to syntactic analysis is inspired by dependency grammar in linguistics. It sits between basic research and practical application, so improving the accuracy of dependency parsing is of great importance to natural language processing. Traditional syntactic analysis relies on linguists to summarize the characteristics of each language and to derive a set of rules for analyzing sentences. This places high demands on linguistic experts, and the grammar rules must be worked out separately for each language, so no uniform standard can be established across languages. Over the past ten years, as corpora have matured and machine performance has improved, data-driven methods have attracted increasing attention. The classical approaches among them are transition-based and graph-based dependency parsing. Judging from previous studies, the accuracy of transition-based dependency parsing is not very high and its implementation is complicated, so this thesis takes graph-based dependency parsing as its research focus.

First, the graph-based dependency parsing process is studied at the theoretical level in three respects: (1) The representation of the dependency structure of a sentence. This thesis uses three methods to represent the dependency structure, each with its own advantages and disadvantages, which show the structure at different levels. (2) The construction of the analytical model, including the initialization of the model and the weight training process. (3) The parsing algorithm: two common parsing algorithms are compared. Because dependency parsing is independent of the language, the Chu-Liu-Edmonds algorithm, which supports non-projective parsing, is finally selected in order to improve the scalability of the system.

A graph-based dependency parsing system is then designed and implemented in Python in three parts: attribute information extraction and model construction, weight learning, and sentence analysis. Finally, the algorithm and model are tested. Based on an analysis of the test results, the learning algorithm is improved and the storage structure of the analytical model is optimized.

This thesis uses Tsinghua University's semantic dependency network corpus. The training set has twenty thousand sentences and the test set has two thousand sentences. First, an analytical model is trained using a small amount of attribute information about dependency pairs, the model is evaluated on the test set, and the experimental results are analyzed. More attribute information is then added to train the analytical model, the model is evaluated on the test set again, and the results of the two experiments are compared. The results show that adding attribute information about dependency pairs improves the accuracy of the system. As the research progressed, the experimental system also improved considerably over the original version. This thesis also determines the optimal number of training iterations for models trained with different attribute information, to prevent over-fitting. Finally, it compares statistics-based and rule-based analysis of Chinese sentences containing a single verb. Together, these results verify that the proposed improvements are effective.
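
To make the decoding step concrete, the following is a minimal sketch of the Chu-Liu-Edmonds algorithm that the abstract names as the selected parsing algorithm. It is not the thesis's implementation: the function names `find_cycle` and `chu_liu_edmonds`, the dictionary-based arc-score format, and the toy scores are all assumptions made for illustration. The sketch assumes that node 0 is an artificial root and that every non-root word has at least one candidate head.

```python
def find_cycle(head, root):
    """Return the nodes of one cycle in the head map, or None if there is none."""
    done = set()
    for start in head:
        path, on_path = [], set()
        v = start
        while v != root and v not in done and v not in on_path:
            path.append(v)
            on_path.add(v)
            v = head[v]
        if v in on_path:                      # walked back onto the current path
            return path[path.index(v):]
        done.update(path)
    return None


def chu_liu_edmonds(nodes, scores, root=0):
    """Highest-scoring dependency tree as a dict {dependent: head}.

    nodes  -- set of integer node ids (hypothetical format, root included)
    scores -- dict {(head, dependent): score} of candidate arcs
    """
    # Step 1: every non-root node greedily picks its best incoming arc.
    best = {}
    for d in nodes:
        if d != root:
            best[d] = max((s, h) for (h, dd), s in scores.items()
                          if dd == d and h != d)[1]
    cycle = find_cycle(best, root)
    if cycle is None:
        return best                           # greedy choice is already a tree

    # Step 2: contract the cycle into a fresh node c and rescore crossing arcs.
    cyc, c = set(cycle), max(nodes) + 1
    new_scores, enter, leave = {}, {}, {}
    for (h, d), s in scores.items():
        if h in cyc and d in cyc:
            continue                          # arc inside the cycle
        if d in cyc:                          # arc entering the cycle
            adj = s - scores[best[d], d]      # gain from replacing d's cycle arc
            if adj > new_scores.get((h, c), float("-inf")):
                new_scores[h, c] = adj
                enter[h] = d
        elif h in cyc:                        # arc leaving the cycle
            if s > new_scores.get((c, d), float("-inf")):
                new_scores[c, d] = s
                leave[d] = h
        else:
            new_scores[h, d] = s
    sub = chu_liu_edmonds((nodes - cyc) | {c}, new_scores, root)

    # Step 3: expand c back into the original cycle nodes.
    result = {}
    for d, h in sub.items():
        if d == c:                            # arc that breaks the cycle
            result[enter[h]] = h
            for v in cycle:
                if v != enter[h]:
                    result[v] = best[v]
        elif h == c:
            result[d] = leave[d]
        else:
            result[d] = h
    return result


# Toy example with made-up scores: 0 = root, 1 = "John", 2 = "saw", 3 = "Mary".
toy_scores = {(0, 1): 9, (0, 2): 10, (0, 3): 9, (1, 2): 20, (2, 1): 30,
              (2, 3): 30, (3, 2): 0, (1, 3): 3, (3, 1): 11}
heads = chu_liu_edmonds({0, 1, 2, 3}, toy_scores)
print(heads)  # {1: 2, 2: 0, 3: 2} (key order may differ)
```

In the toy example, the greedy step first creates a cycle between nodes 1 and 2. Contracting that cycle and rescoring the arcs that enter it lets the root attach to node 2, which yields a valid tree. Because the algorithm searches over all spanning trees of the arc graph rather than only nested ones, crossing (non-projective) arcs are allowed. This plain version does not restrict the root to a single child.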
Keywords/Search Tags:dependency parsing, learning algorithm, analytical algorithm, analytical model