
Research On The Methods Of Relation Words Automatic Identification In Chinese Compound Sentences Based On Collocation Strength

Posted on: 2015-01-16
Degree: Master
Type: Thesis
Country: China
Candidate: L S Song
GTID: 2268330428968457
Subject: Computer software and theory
Abstract/Summary:
At present, the main problem in Chinese information processing is how to achieve automatic identification of Chinese sentences. Sentence processing is mainly divided into simple sentence processing and compound sentence processing, and current research focuses mainly on simple sentences. The compound sentence is the bridge linking the sentence and the discourse, so automatic identification of compound sentences is all the more important, yet few researchers have worked on it because of its difficulty. Relation words are an important component of a compound sentence, and the logical semantics of the sentence are mainly reflected in them; automatic identification of relation words is therefore the key to sentence recognition. However, a "quasi relation word" in a compound sentence sometimes acts as a relation word and sometimes does not, which makes automatic identification of relation words by computer very difficult.

Collocation strength refers to the degree of mutual attraction between two words. The greater the collocation strength, the greater the probability that the two words co-occur. Two words with high collocation strength and a high probability of co-occurrence are considered to have a collocation relationship. In this thesis, automatic identification of relation words in Chinese compound sentences is based on relation words that have a collocation relationship.

Generally, there are three approaches to automatic identification of relation words in Chinese compound sentences: rule-based methods, statistical methods, and methods that combine rules and statistics. The method proposed in this thesis is based on a large-scale corpus and is entirely statistical. First, we extract the quasi relation words that have been segmented and annotated, and then determine whether they form a collocation using a method based on collocation strength. The evaluation method gathers statistics from a large-scale corpus on the frequency of each single word, the collocation distance, and the co-occurrence frequency, and then calculates the collocation strength from these statistics. If the collocation strength is greater than the threshold, a collocation relationship exists between the two quasi relation words.

After the collocation relationship is determined, the system automatically identifies the two quasi relation words in context. First, the system forms two sequences from the quasi relation word and its context: in one the word is treated as a relation word, and in the other it is not. Then, for each sequence, it calculates the collocation strength with the collocation word using Relative Word Frequency (RWF) and compares the two values. Whether the quasi relation word is identified as a relation word is determined by which sequence has the greater collocation strength. This completes the process of automatically identifying quasi relation words.
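The statistics-then-threshold step described above can be illustrated with a minimal sketch. The abstract does not give the thesis's actual collocation strength formula, so the sketch substitutes pointwise mutual information (PMI) as a stand-in measure; the function names, the `max_distance` window, the threshold value, and the toy corpus are all assumptions made for illustration, not the author's implementation.

```python
import math
from collections import Counter


def collocation_counts(sentences, candidates, max_distance=10):
    """Collect single-word frequencies and windowed co-occurrence counts.

    sentences: list of already-segmented sentences (lists of tokens).
    candidates: set of quasi relation words to track (hypothetical list).
    max_distance: largest token distance counted as a co-occurrence (assumed).
    """
    word_freq = Counter()
    pair_freq = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_freq.update(tokens)
        positions = [(i, t) for i, t in enumerate(tokens) if t in candidates]
        for a in range(len(positions)):
            for b in range(a + 1, len(positions)):
                (i, w1), (j, w2) = positions[a], positions[b]
                if j - i <= max_distance:
                    pair_freq[(w1, w2)] += 1  # ordered pair: w1 precedes w2
    return word_freq, pair_freq, total


def collocation_strength(w1, w2, word_freq, pair_freq, total):
    """PMI of the ordered pair (w1, w2); a stand-in, not the thesis's formula."""
    joint = pair_freq[(w1, w2)]
    if joint == 0:
        return float("-inf")  # the pair never co-occurs within the window
    p_joint = joint / total
    p1 = word_freq[w1] / total
    p2 = word_freq[w2] / total
    return math.log2(p_joint / (p1 * p2))


def is_collocation(w1, w2, word_freq, pair_freq, total, threshold=1.0):
    """Apply the threshold test; the threshold value here is arbitrary."""
    return collocation_strength(w1, w2, word_freq, pair_freq, total) > threshold


# Toy segmented corpus, invented for illustration only.
sentences = [
    ["因为", "下雨", "所以", "比赛", "取消"],
    ["因为", "生病", "所以", "他", "请假"],
    ["虽然", "下雨", "但是", "比赛", "继续"],
    ["他", "因为", "生病", "请假"],
]
candidates = {"因为", "所以", "虽然", "但是"}
word_freq, pair_freq, total = collocation_counts(sentences, candidates)

print(is_collocation("因为", "所以", word_freq, pair_freq, total))
print(is_collocation("因为", "但是", word_freq, pair_freq, total))
```

A real system would estimate these counts from a large-scale corpus and tune the threshold empirically; the later RWF-based comparison of the two candidate sequences is not sketched here because the abstract does not define RWF.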
Keywords/Search Tags: Chinese Compound Sentences, Relation Words, Automatic Identification, Collocation Strength, Relative Word Frequency