
Research On The Rule Excavation Method Based On Decision Tree In Automatic Identification Of Relation Words In Chinese Compound Sentences

Posted on: 2015-03-30
Degree: Master
Type: Thesis
Country: China
Candidate: L Xiang
Full Text: PDF
GTID: 2268330428968444
Subject: Computer software and theory
Abstract/Summary:
Automatic identification of relation words in Chinese compound sentences is a difficult problem in Chinese information processing. Our research group has already made some progress and developed a rule-based system for the automatic identification of relation words, but all the rules in this system were derived manually: researchers analyze a large corpus, draw conclusions about the relation words in compound sentences, and then summarize those conclusions into formalized rules. With the existing rules, the system can identify the relation words in only some compound sentences. Because the number of rules is limited and the kinds of compound sentences are varied, the system cannot identify all of them. Using the computer to excavate rules automatically, so that all compound sentences can be identified, therefore plays an important part in the automatic identification of relation words in Chinese compound sentences.

By analyzing the rules in the rule base, we find that a complete rule consists of a constraint part and a result part. To excavate a new rule from a kind of sentence, we need to identify all the relation words in the sentence. Identifying relation words is a classification problem: the rules are used to judge whether a word is a real relation word or not. This thesis therefore proposes a method for excavating new rules with a decision tree algorithm based on the rule base.

First, we use the relation word to query the rule base, preprocess the related rules, and arrange them into an array. We then fill the vacancies in the array using several methods and build a decision tree from the array. Finally, we simplify the decision tree with post-pruning.
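The array-to-tree steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation: the abstract does not say which filling methods, which decision tree algorithm, or which post-pruning method were used, so the sketch fills vacancies with the most common value, splits by information gain (ID3 style), and applies reduced-error pruning against a small held-out set. The attribute names `position` and `paired` and all data rows are hypothetical.

```python
import math
from collections import Counter

def fill_missing(rows, attrs):
    """Fill each vacancy (None) with the most common observed value (an assumed strategy)."""
    filled = [dict(r) for r in rows]
    for a in attrs:
        mode = Counter(r[a] for r in rows if r[a] is not None).most_common(1)[0][0]
        for r in filled:
            if r[a] is None:
                r[a] = mode
    return filled

def entropy(rows):
    total = len(rows)
    counts = Counter(r["label"] for r in rows)
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def build_tree(rows, attrs):
    """ID3-style tree: a leaf is a label, an inner node is a dict."""
    labels = Counter(r["label"] for r in rows)
    majority = labels.most_common(1)[0][0]
    if len(labels) == 1 or not attrs:
        return majority

    def gain(a):
        sizes = Counter(r[a] for r in rows)
        rest = sum(n / len(rows) * entropy([r for r in rows if r[a] == v])
                   for v, n in sizes.items())
        return entropy(rows) - rest

    best = max(attrs, key=gain)
    remaining = [a for a in attrs if a != best]
    children = {v: build_tree([r for r in rows if r[best] == v], remaining)
                for v in {r[best] for r in rows}}
    return {"attr": best, "majority": majority, "children": children}

def classify(tree, row):
    while isinstance(tree, dict):
        # Unseen attribute values fall back to the node's majority label.
        tree = tree["children"].get(row[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, rows):
    return sum(classify(tree, r) == r["label"] for r in rows) / len(rows)

def prune(tree, val_rows):
    """Reduced-error post-pruning: replace a subtree by its majority label
    when that does not lower accuracy on the validation rows reaching it."""
    if not isinstance(tree, dict) or not val_rows:
        return tree
    attr = tree["attr"]
    children = {v: prune(c, [r for r in val_rows if r[attr] == v])
                for v, c in tree["children"].items()}
    candidate = {"attr": attr, "majority": tree["majority"], "children": children}
    if accuracy(tree["majority"], val_rows) >= accuracy(candidate, val_rows):
        return tree["majority"]
    return candidate

# Hypothetical constraint array: one row per rule, label = "is a real relation word".
ATTRS = ["position", "paired"]
rules = [
    {"position": "start",  "paired": "yes", "label": True},
    {"position": "start",  "paired": "no",  "label": True},
    {"position": "start",  "paired": "no",  "label": True},
    {"position": "middle", "paired": "yes", "label": True},
    {"position": "middle", "paired": "no",  "label": False},
    {"position": "middle", "paired": None,  "label": False},
    {"position": None,     "paired": "no",  "label": False},
    {"position": "middle", "paired": "no",  "label": False},
]
validation = [
    {"position": "start",  "paired": "no",  "label": True},
    {"position": "middle", "paired": "yes", "label": False},
    {"position": "middle", "paired": "no",  "label": False},
]

tree = build_tree(fill_missing(rules, ATTRS), ATTRS)
pruned = prune(tree, validation)
print(tree)
print(pruned)
```

On this toy data the unpruned tree tests `position` first and then `paired`; pruning collapses the `paired` test because it does not help on the held-out rows. A real rule base would have many more constraint attributes, and a different filling or pruning strategy could be substituted without changing the overall shape of the pipeline.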
Using the constraints in the array, we extract from the compound sentence the information needed to identify the relation word, record the path taken through the decision tree when judging the relation word, and then combine the identification result with the recorded path into a new rule.

All the compound sentences used in the experiments come from the corpus of Chinese compound sentences built by the Research Center for Language and Language Education at Huazhong Normal University. Analysis of the experimental results shows that this method can accurately identify the relation words in compound sentences that the existing automatic identification system cannot handle, and that it can excavate new, effective rules to supplement the rule base.
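The step of recording the decision path and combining it with the identification result can be illustrated as follows. This is a minimal sketch under assumptions: the nested-dict tree layout, the attribute names `position` and `paired`, the function names, and the rule format are all hypothetical. The only detail taken from the abstract is that a rule has a constraint part and a result part.

```python
def classify_with_path(tree, features):
    """Walk the tree and return (label, path), where path lists the
    (attribute, value) tests passed on the way to the leaf."""
    path = []
    node = tree
    while isinstance(node, dict):
        attr = node["attr"]
        value = features[attr]
        path.append((attr, value))
        # Assumes every value seen here also occurred when the tree was built.
        node = node["children"][value]
    return node, path

def make_rule(word, path, is_relation_word):
    """Combine the recorded path (constraints) with the result into one rule."""
    return {"word": word, "constraints": dict(path), "result": is_relation_word}

# Hypothetical decision tree for one candidate relation word.
tree = {
    "attr": "position",
    "children": {
        "start": True,
        "middle": {"attr": "paired", "children": {"yes": True, "no": False}},
    },
}

# Hypothetical features extracted from one compound sentence.
features = {"position": "middle", "paired": "yes"}

label, path = classify_with_path(tree, features)
rule = make_rule("因为", path, label)  # "因为" ("because") used as an example word
print(rule)
```

Storing the path as attribute-value pairs keeps the new rule in the same constraint-plus-result shape as the rules already in the rule base, so it could be added back and used in later queries.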
Keywords/Search Tags: Chinese information processing, relation words in Chinese compound sentences, decision tree, automatic identification of relation words in Chinese compound sentences, automatic rule excavation