
Research On The Identification Of The Core Components Of Typical Chinese Sentence Patterns

Posted on: 2019-11-12
Degree: Master
Type: Thesis
Country: China
Candidate: M S Huang
Full Text: PDF
GTID: 2438330566483691
Subject: Communication and Information System
Abstract/Summary:
In recent years, with the rapid development of society, natural language processing has come to be needed in every field. Syntactic analysis, one of the key components of natural language processing, plays an increasingly important role in national defense and national economic construction. Typical sentence patterns are an important part of modern Chinese. Analyzing them not only enriches linguistic knowledge but also helps improve syntactic analysis, which makes their analysis particularly important. The models in this thesis can also be applied to other sentence patterns. To identify the core components of typical Chinese sentence patterns, this thesis focuses on building models that combine rules and statistics. The main tasks are as follows:

(1) We study typical sentence patterns and their core components, with particular attention to the “?” sentence pattern, and extract the rule template of “?” together with the positions of the core components in the sentence.

(2) Existing part-of-speech tagging tools are prone to errors when tagging typical sentence patterns. To improve tagging accuracy on these patterns, a transformation-based error learning method is proposed. The transformation templates are obtained from a large-scale corpus and from experiments. When a rewriting rule in a template and its activation environment are both matched, the part of speech is rewritten. The transformation template for “?” is given.

(3) We build a minimum spanning tree model to analyze sentences. A word segmentation tool segments the input sentence so that each word becomes a node. Every pair of word nodes is connected, and the cost of each connection is computed from the corpus. The spanning tree with the minimum total connection cost is then selected and used to analyze the core components of the sentence. The model is verified experimentally.

(4) We build a model that combines rules and statistics to analyze sentences. Rules are first matched against the typical sentence patterns, and sentences that match are analyzed directly. For typical sentence patterns that the rules cannot match, a hidden Markov model is built and the sentence is analyzed with the Viterbi algorithm. The hidden Markov model is further improved by building it on parts of speech, which simplifies the work and increases the efficiency and effectiveness of typical sentence analysis. The model is verified experimentally.

The syntactic analysis experiments on the two models show that the model combining statistics and rules extracts the core components of the syntactic structure with higher accuracy and validity. The improved hidden Markov model reduces the workload and improves the efficiency of the model.
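
The Viterbi decoding step named in task (4) can be illustrated with a minimal sketch. The abstract does not specify the state set, the observation alphabet, or any probabilities, so everything concrete below is an assumption made for illustration: the hidden states are hypothetical component labels (`SUBJ`, `PRED`, `OBJ`), the observations are part-of-speech tags (`n`, `v`), and the probability tables are toy values rather than figures from the thesis.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for a non-empty observation list."""
    floor = 1e-12  # assumed stand-in probability for unseen events

    def lp(p):
        # Work in log space to avoid underflow on long sentences.
        return math.log(max(p, floor))

    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: lp(start_p.get(s, 0)) + lp(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path

    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + lp(trans_p[p].get(s, 0))) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + lp(emit_p[s].get(observations[t], 0))
            back[t][s] = prev

    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path

# Toy model (hypothetical labels and probabilities, not taken from the thesis).
states = ["SUBJ", "PRED", "OBJ"]
start_p = {"SUBJ": 0.8, "PRED": 0.1, "OBJ": 0.1}
trans_p = {
    "SUBJ": {"SUBJ": 0.1, "PRED": 0.8, "OBJ": 0.1},
    "PRED": {"SUBJ": 0.1, "PRED": 0.1, "OBJ": 0.8},
    "OBJ": {"SUBJ": 0.1, "PRED": 0.2, "OBJ": 0.7},
}
emit_p = {
    "SUBJ": {"n": 0.9, "v": 0.1},
    "PRED": {"n": 0.1, "v": 0.9},
    "OBJ": {"n": 0.9, "v": 0.1},
}

if __name__ == "__main__":
    # Part-of-speech sequence noun, verb, noun
    print(viterbi(["n", "v", "n"], states, start_p, trans_p, emit_p))
    # → ['SUBJ', 'PRED', 'OBJ']
```

Using part-of-speech tags rather than individual words as observations keeps the emission table small, which is one plausible reading of how building the model on parts of speech simplifies the work. In practice the tables would be estimated from an annotated corpus.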
Keywords/Search Tags: Typical sentence pattern, Core components, Minimum spanning tree, Hidden Markov model, Viterbi algorithm