
Automatic Recognition Of Relation Words In Chinese Complex Sentence Based On Decision Tree

Posted on: 2019-05-15
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Zou
GTID: 2428330548472422
Subject: Computer application technology
Abstract/Summary:
Word processing in Chinese information processing has made good progress. Following word processing, sentence processing plays a critical role in the transition to text processing, and compound sentence processing is the basis of sentence processing. Relational words in a compound sentence not only connect its clauses but also help identify the sentence's hierarchical structure. The automatic recognition of relational words is therefore a key research topic in compound sentence processing. Current methods for identifying relational words are mainly rule-based, statistical, or a combination of the two. Statistical methods, which recognize relational words with a trained model, can achieve better accuracy than rules.

This paper analyzes the characteristics of relational words in compound sentences from the perspective of dependency relations and shows that the dependency characteristics of a word can reflect its role as a relational word. The compound sentences in the Corpus of Chinese Compound Sentence (CCCS), developed by the Institute of Language Research of Central China Normal University, were parsed into dependency structures with the Language Technology Platform (LTP) of Harbin Institute of Technology, and a dependency treebank of Chinese compound sentences was built from the results. After summarizing the dependency characteristics of relational words in the dependency trees, seven features were selected as the main basis for recognizing relational words in compound sentences. Finally, a feature extractor was built to extract and quantify these features, a relational word recognition model was constructed with the C4.5 decision tree algorithm, and the tree was pruned to improve its generalization.

The CCCS also serves as the experimental corpus. The original data set was randomly divided into four parts: three were used as the training set for the decision tree model, and the remaining one was used to test how well the model recognizes relational words. The experimental results show that the decision tree model recognizes relational words in Chinese compound sentences with high accuracy, which demonstrates the feasibility and effectiveness of the proposed method.
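The abstract names C4.5 as the learning algorithm but does not list the seven features or give any code. As a rough illustration only, here is a minimal Python sketch of a simplified C4.5-style learner. It chooses the split attribute by gain ratio, grows a tree over categorical features, and holds out one of four random parts for testing. The feature names (`pos`, `dep`, `head_pos`), the LTP-style tag values, the toy rows, and the helper names are hypothetical placeholders, not the thesis's features, data, or implementation. The sketch also omits parts of full C4.5: continuous and missing values, the restriction to attributes with above-average information gain, and pruning.

```python
import math
import random
from collections import Counter

# HYPOTHETICAL features and toy rows: made-up LTP-style POS / dependency labels
# for candidate words, labelled True if the word acts as a relational word.
# These are NOT the seven features or the data used in the thesis.
FEATURES = ["pos", "dep", "head_pos"]
DATA = [
    ({"pos": "c", "dep": "LAD", "head_pos": "v"}, True),
    ({"pos": "c", "dep": "ADV", "head_pos": "v"}, True),
    ({"pos": "d", "dep": "ADV", "head_pos": "v"}, True),
    ({"pos": "d", "dep": "ADV", "head_pos": "a"}, False),
    ({"pos": "n", "dep": "SBV", "head_pos": "v"}, False),
    ({"pos": "v", "dep": "VOB", "head_pos": "v"}, False),
    ({"pos": "c", "dep": "COO", "head_pos": "n"}, False),
    ({"pos": "p", "dep": "ADV", "head_pos": "v"}, False),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5 gain ratio: information gain divided by split information."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / total * math.log2(len(g) / total) for g in groups.values())
    if split_info == 0:  # only one value present: the split separates nothing
        return 0.0
    return (entropy(labels) - remainder) / split_info


def build_tree(rows, labels, features):
    """Grow an unpruned tree; internal nodes are dicts, leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority
    best = max(features, key=lambda f: gain_ratio(rows, labels, f))
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [f for f in features if f != best])
    # "default" handles feature values never seen during training
    return {"feature": best, "branches": branches, "default": majority}


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["feature"]], tree["default"])
    return tree


def four_way_split(n, seed=0):
    """Randomly partition indices 0..n-1 into four disjoint parts."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[k::4] for k in range(4)]


if __name__ == "__main__":
    folds = four_way_split(len(DATA), seed=42)
    train = [DATA[i] for k in range(3) for i in folds[k]]  # three parts train
    test = [DATA[i] for i in folds[3]]                      # one part tests
    tree = build_tree([r for r, _ in train], [y for _, y in train], FEATURES)
    correct = sum(predict(tree, r) == y for r, y in test)
    print(f"held-out accuracy: {correct}/{len(test)}")
```

Gain ratio is used here instead of raw information gain because it penalizes attributes with many distinct values, such as a fine-grained POS tag. That is the main way C4.5 differs from ID3 in split selection. On toy data this small, the held-out figure says nothing about the thesis's reported results.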
Keywords: dependency grammar, relational word identification, feature vector, decision tree