
Chinese Complex Sentences and Their Automatic Determination

Posted on: 2009-12-26
Degree: Master
Type: Thesis
Country: China
Candidate: L P Hong
GTID: 2205360245476630
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Chinese information processing has completed the character-processing stage and has largely solved the word-processing problem; it is now moving on to sentence processing. As a grammatical unit, the complex sentence sits between the clause below it and the paragraph above it, and links the two. It also has many distinctive grammatical, semantic, and pragmatic properties. Because complex sentences have a complicated structure of their own, handling them has become an urgent problem in sentence processing. At present, however, few studies are devoted to the computer processing of complex sentences.

By structure, Chinese sentences fall into two classes: simple sentences and complex sentences. Research on simple sentences concentrates on analyzing sentence constituents and the relations between them. That is not enough for complex sentences, which also require investigation of the relations between clauses, the direct base units of a complex sentence. Building on the definition of the complex sentence, the differences between simple and complex sentences, and the classification of complex sentences, this thesis proposes a "divide and rule" strategy: first segment the complex sentence, then, on that basis, automatically judge the relation it expresses.

The research has two main parts. The first is the correct segmentation of complex sentences: by determining the function of each comma, the cut-off points are identified and the complex sentence is divided into an ordered set of clauses. The second is judging the relations between clauses: by fully exploiting the internal links among the words and word tags in a sentence, the logical-semantic relation between clauses is revealed and then judged automatically.

According to the characteristics of each part, we select the most suitable statistical model for each, respectively: Support Vector Machine (SVM) and Conditional Random Fields (CRF). To improve classification accuracy, we observe and analyze a large number of related language phenomena and integrate linguistic knowledge into the statistical models, producing optimized models.

All the corpus data used in the experiments comes from TCT 973 (Tsinghua Chinese Treebank), a treebank of one million Chinese characters. All experiments achieve good results in both open and closed tests. Complex-sentence segmentation reaches an accuracy of about 84.70%, the automatic judgement of clause relations reaches 94.86%, and the integrated experiment reaches 83.26% (all figures reported here are open-test results). If the features were improved and more annotation of relational conjunctions were added, the system would be expected to perform better.
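The two-stage "divide and rule" idea can be illustrated with a small sketch. It is not the thesis's method: the abstract uses trained SVM and CRF models, whereas the code below swaps in hand-written rules so that it runs without any training data. The tag set, the verb-on-both-sides rule, the connective table, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List, Tuple

Token = Tuple[str, str]  # (word, POS tag); simplified, made-up tag set

COMMA = "，"

# Hypothetical connective table; the abstract does not list its cue words.
CONNECTIVE_RELATIONS: Dict[str, str] = {
    "因为": "causal",
    "所以": "causal",
    "虽然": "adversative",
    "但是": "adversative",
    "如果": "conditional",
}


def has_verb(tokens: List[Token]) -> bool:
    """True if any token carries a tag starting with 'v'."""
    return any(tag.startswith("v") for _, tag in tokens)


def comma_features(tokens: List[Token], i: int) -> Dict[str, bool]:
    """Features for the comma at index i, taken from the spans on both sides."""
    left_start = i
    while left_start > 0 and tokens[left_start - 1][0] != COMMA:
        left_start -= 1
    right_end = i + 1
    while right_end < len(tokens) and tokens[right_end][0] != COMMA:
        right_end += 1
    return {
        "left_has_verb": has_verb(tokens[left_start:i]),
        "right_has_verb": has_verb(tokens[i + 1:right_end]),
    }


def is_clause_boundary(features: Dict[str, bool]) -> bool:
    """Rule-based stand-in for a trained classifier (assumption)."""
    return features["left_has_verb"] and features["right_has_verb"]


def segment(tokens: List[Token]) -> List[List[Token]]:
    """Stage 1: split at commas judged to separate clauses."""
    clauses: List[List[Token]] = []
    current: List[Token] = []
    for i, tok in enumerate(tokens):
        if tok[0] == COMMA and is_clause_boundary(comma_features(tokens, i)):
            clauses.append(current)
            current = []
        else:
            current.append(tok)  # non-boundary commas stay inside the clause
    if current:
        clauses.append(current)
    return clauses


def judge_relation(clauses: List[List[Token]]) -> str:
    """Stage 2: label the relation from the first known connective found."""
    for clause in clauses:
        for word, _ in clause:
            if word in CONNECTIVE_RELATIONS:
                return CONNECTIVE_RELATIONS[word]
    return "unknown"


causal = [("因为", "c"), ("下雨", "v"), ("，", "w"), ("所以", "c"),
          ("我们", "r"), ("取消", "v"), ("了", "u"), ("比赛", "n")]
listing = [("苹果", "n"), ("，", "w"), ("香蕉", "n"), ("和", "c"),
           ("梨", "n"), ("都", "d"), ("很", "d"), ("甜", "a")]

print(len(segment(causal)), judge_relation(segment(causal)))
print(len(segment(listing)), judge_relation(segment(listing)))
```

In the first example the comma separates two spans that each contain a verb, so it is treated as a cut-off point; in the second it only separates items in a list, so the sentence stays whole. A statistical model would replace `is_clause_boundary` and `judge_relation` with learned decisions over richer word and tag features.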
Keywords/Search Tags: relation of complex sentence, segmentation of complex sentence, SVM, CRF, parsing