Research On The Methods Of Relation Words Automatic Identification In Chinese Compound Sentences Based On Collocation Strength

Posted on:2015-01-16

Degree:Master

Type:Thesis

Country:China

Candidate:L S Song

Full Text:PDF

GTID:2268330428968457

Subject:Computer software and theory

Abstract/Summary:

At present, a central problem in Chinese information processing is the automatic identification of Chinese sentences. Sentence processing is mainly divided into simple-sentence processing and compound-sentence processing, and current research concentrates on simple sentences. The compound sentence is the bridge linking the sentence and the discourse, so its automatic identification is all the more important, yet few researchers work on it because of its difficulty. Relation words are an important component of a compound sentence, and the logical semantics of the sentence is reflected mainly in them; automatic identification of relation words is therefore the key to compound-sentence recognition. However, a "quasi relation word" in a compound sentence sometimes acts as a relation word and sometimes does not, which makes automatic identification by computer very difficult.

Collocation strength refers to the degree of mutual attraction between two words. A greater collocation strength means a higher probability that the two words co-occur, and two words with high collocation strength and high co-occurrence probability are said to stand in a collocation relationship. In this thesis, automatic identification of relation words in Chinese compound sentences is based on relation words that stand in such a relationship. Generally, three kinds of methods are used for this task: rule-based methods, statistical methods, and methods that combine rules and statistics. The method proposed here relies on a large-scale corpus and is entirely statistical.

First, we extract the quasi relation words from text that has been segmented and annotated, then decide whether they form a collocation using a method based on collocation strength. This evaluation gathers three statistics from a large-scale corpus: the frequency of each single word, the collocation distance, and the co-occurrence frequency. The collocation strength is then calculated from these statistics. If the collocation strength exceeds a threshold, the two quasi relation words are taken to stand in a collocation relationship.

After the collocation relationship is determined, the system identifies the two quasi relation words automatically in context. The system first divides the quasi relation word and its context into two sequences, one in which the word is identified as a relation word and one in which it is not. It then calculates the collocation strength between each sequence and its collocation word using Relative Word Frequency (RWF) and compares the two values. Whether the quasi relation word is identified as a relation word is determined by which sequence has the greater collocation strength. This completes the process of automatic identification of quasi relation words.
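The thresholding step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's collocation-strength formula, so the code below swaps in pointwise mutual information (PMI) as a stand-in measure. The function names, the `max_distance` window, the threshold of 1.0, and the toy corpus are all assumptions made for illustration, and the RWF comparison step is not covered.

```python
import math
from collections import Counter


def collocation_stats(sentences, max_distance=10):
    """Collect single-word frequencies and ordered co-occurrence counts.

    `sentences` is a list of already-segmented token lists. Two tokens
    co-occur when the second follows the first within `max_distance`
    tokens in the same sentence (an assumed reading of "collocation distance").
    """
    word_freq = Counter()
    pair_freq = Counter()
    total = 0
    for tokens in sentences:
        total += len(tokens)
        word_freq.update(tokens)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + max_distance]:
                pair_freq[(left, right)] += 1
    return word_freq, pair_freq, total


def pmi_strength(left, right, stats):
    """PMI as a stand-in for collocation strength (not the thesis's formula).

    Probabilities are rough estimates: every count is divided by the
    total number of tokens in the corpus.
    """
    word_freq, pair_freq, total = stats
    joint = pair_freq[(left, right)]
    if joint == 0:
        return float("-inf")  # never observed together
    p_joint = joint / total
    p_left = word_freq[left] / total
    p_right = word_freq[right] / total
    return math.log2(p_joint / (p_left * p_right))


def is_collocation(left, right, stats, threshold=1.0):
    """Accept the pair when its strength exceeds the (assumed) threshold."""
    return pmi_strength(left, right, stats) > threshold


# Toy, hand-segmented corpus (hypothetical data, far smaller than a real one).
corpus = [
    ["因为", "下雨", "所以", "取消"],  # because ... so ...
    ["因为", "生病", "所以", "请假"],
    ["他", "下雨", "取消"],
    ["我们", "请假"],
]
stats = collocation_stats(corpus)
pair_strength = pmi_strength("因为", "所以", stats)
accepted = is_collocation("因为", "所以", stats)
rejected = is_collocation("下雨", "请假", stats)
```

On this toy corpus the pair 因为 / 所以 clears the threshold, while 下雨 / 请假 never co-occur and are rejected. A real system would estimate the statistics from a large-scale corpus and tune the threshold empirically.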

Keywords/Search Tags:

Chinese Compound Sentences, Relation Words, Automatic Identification, Collocation Strength, Relative Word Frequency

Related items

1	Automatic Recognition And Rule Mining Of Chinese Relation Words In Compound Sentences Based On Dependencies
2	Automatic Establishment Of The Hierarchies Of The Dependency Relation Of Chinese Compound Sentence Based On Collocation Of Relation Words
3	Research On The Rule Excavation Method Based On Decision Tree In Automatic Identification Of Relation Words In Chinese Compound Sentences
4	Analysis Of Hierarchical Structure In The Marked Compound Sentences Based On Collocation Of Relation Words
5	Hierarchy Division Of A Compound Sentences With Non-saturated Relation Word Via Neural Network
6	Research Of Rule Parser In Relation Words Of Compound Sentences Automatic Identification System
7	The Analysis And Research On Deterministic Dependency Parsing Of Chinese Coordinative Relationship Compound Sentences
8	Modern Chinese Words With The Automatic Extraction Method
9	The Automatic Analysis Method Of Chinese Three-sentence Complex Sentences Based On Deep Learning
10	Relation Recognition Of Non-saturated Chinese Compound Sentences With Two Clauses Based On Deep Learning