
Eliminate The Ambiguity Of Relation Words In Compound Sentences Based On Rules And VSM

Posted on: 2018-08-31  Degree: Master  Type: Thesis
Country: China  Candidate: C Li  Full Text: PDF
GTID: 2428330518482358  Subject: Computer technology
Abstract/Summary:
In Chinese information processing, research on words and expressions has matured, and its results have been widely applied. Compound sentences occupy an important position in Chinese syntax, and relation words are an important means of connecting their clauses, so the study of compound sentences is inseparable from relation words and rests on their automatic recognition. The current mainstream approach combines rules with statistical models. Although automatic recognition of compound sentences has achieved some success, its accuracy still needs to be improved. One reason is that relation word identification depends on a word segmentation system: errors in automatic word segmentation and part-of-speech tagging skew the recognition results. To solve this problem, this paper proposes a method for eliminating the ambiguity of relation words in compound sentences based on rules and the vector space model (VSM). First, we used the NLPIR Chinese word segmentation system of the Chinese Academy of Sciences to perform automatic word segmentation and part-of-speech tagging on a Chinese compound sentence corpus, then counted and analyzed the pre-processed corpus. We summarized the scope, characteristics and distribution of compound sentence relation words, and induced and classified the segmentation-ambiguous fields according to the forms they take. Second, we quantitatively analyzed the segmentation-ambiguous fields, extracted their characteristic patterns, formalized these patterns into rules, and used the rules to identify the ambiguous fields in compound sentences. Then, according to the characteristics of the VSM, we constructed a training set and a test set. Using the training set, we determined the size of the part-of-speech matrix and used the CHI method to obtain the context that gives the best classification effect. Finally, we chose part of speech as the quantitative weighting strategy and constructed the part-of-speech matrix and the context vector of the ambiguous field to be disambiguated. We mapped them into a multi-dimensional space and calculated the distance from the context vector to each part-of-speech vector. The ambiguous field is then marked with the part-of-speech type of the nearest part-of-speech vector, which completes the disambiguation. Experiments show that the method based on rules and VSM achieves high accuracy for disambiguating relation words: the disambiguation accuracy reached 95.94%. This shows that the proposed method is feasible and effective.
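The nearest-vector step described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The tag set, the toy training contexts, the relative-frequency weighting, the Euclidean distance, and the function names `build_pos_matrix`, `context_vector` and `nearest_label` are all assumptions made for illustration; the abstract does not specify the exact weighting or distance measure.

```python
import math
from collections import Counter

# Hypothetical context features: part-of-speech tags seen around an ambiguous field.
FEATURES = ["n", "v", "d", "p"]


def context_vector(context_tags, features):
    """Turn the POS tags around an ambiguous field into a relative-frequency vector."""
    counts = Counter(tag for tag in context_tags if tag in features)
    total = sum(counts.values()) or 1  # avoid division by zero for empty contexts
    return [counts[f] / total for f in features]


def build_pos_matrix(training, features):
    """Build one vector per candidate part of speech from labelled contexts.

    training: list of (context_tags, label) pairs, where label is the correct
    part of speech of the ambiguous field in that context.
    """
    per_label = {}
    for context_tags, label in training:
        per_label.setdefault(label, Counter()).update(
            tag for tag in context_tags if tag in features
        )
    matrix = {}
    for label, counts in per_label.items():
        total = sum(counts.values()) or 1
        matrix[label] = [counts[f] / total for f in features]
    return matrix


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_label(vector, matrix):
    """Return the part-of-speech label whose vector is closest to the context vector."""
    return min(matrix, key=lambda label: euclidean(vector, matrix[label]))


# Toy training data (invented): "c" = relation word (conjunction) reading,
# "d" = adverb reading of the same ambiguous field.
TRAINING = [
    (["n", "v"], "c"),
    (["v", "n", "v"], "c"),
    (["d", "d"], "d"),
    (["d", "p"], "d"),
]

matrix = build_pos_matrix(TRAINING, FEATURES)
label = nearest_label(context_vector(["n", "v", "v"], FEATURES), matrix)
print(label)  # the context is closest to the "c" vector
```

In this sketch the ambiguous field simply takes the label of the closest part-of-speech vector; selecting which context features to keep (the role the abstract assigns to the CHI method) is left out.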
Keywords/Search Tags:Relation words, Segmentation disambiguation, VSM, Disambiguation rules