Analysis And Study Of The Characteristics Of Chinese Three-part Causative Complexes Based On Relational Word Collocation

Posted on: 2022-01-12 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Xiao | Full Text: PDF
GTID: 2505306350967079 | Subject: Computer technology
Abstract/Summary:
A Chinese complex sentence is a sentence composed of two or more clauses. The elements that connect the clauses are called relational markers or relational words, and they play an explicit or implicit role in identifying the semantic relationship between clauses. A complete and accurate understanding of the semantics of a whole complex sentence therefore requires a more in-depth study of its relational words. This thesis takes three-clause causal complex sentences as its research object and analyzes the dependence strength (dependence intensity) and span distribution of the relational words in such sentences. These two measures serve as important feature values that support the construction of a knowledge base of relational word collocations in complex sentences.

To improve the accuracy and reliability of the experiments, the thesis builds a more representative corpus of three-clause causal sentences, named diasc, from the three-clause sentences in the CCCS corpus, supplemented by three-clause sentences collected with a web crawler. On the basis of the diasc corpus, word frequency statistics are compiled for 48 causal relational words, and 28 of them are selected as research subjects.

The thesis first interprets the definition of Chinese complex sentences from two aspects and then describes their classification in detail. All 12 kinds of relational words are then listed to deepen the reader's understanding of them. Next, the thesis gives a new definition and extension of the dependence strength between relational words, proposes a new method for calculating it quantitatively, and defines the span of a relational word pair. It then implements a dual feature-value extraction method based on dependence strength and span distribution, including word segmentation, punctuation removal, word frequency statistics, co-occurrence statistics, dependence strength calculation, and statistical analysis of the span distribution.

Finally, 12 pairs of relational words with high dependence strength (and therefore high research value) are obtained: "why-because", "because-so", "because-then", "because-so", "because-so", "because-so", "because-so", "because-so", "because-so", "because-so", "because-so", "because-cause". The span between the two words of each pair is 2 to 15 words. Because the purpose of the thesis is to build a relational collocation knowledge base based on dependence strength, it also briefly describes the process of building such a knowledge base. The experimental results indicate that, when constructing the knowledge base, relational word pairs with high dependence strength should be selected, and that example sentences should be chosen in which the span of the pair falls within the range of 2 to 15 words. A relational collocation knowledge base created in this way will have high research value and lays a solid foundation for future in-depth learning.
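The extraction steps named in the abstract (punctuation removal, word frequency statistics, co-occurrence statistics, dependence strength calculation, span distribution) can be illustrated with a minimal Python sketch. Several things here are assumptions rather than the thesis's method: the abstract does not give the dependence strength formula, so the Dice coefficient is used as a stand-in; the span is taken to be the number of words between the two relational words; the input is assumed to be already segmented into words; and the function names and toy sentences are hypothetical.

```python
from collections import Counter

# Punctuation tokens to discard (assumed set; extend as needed).
PUNCTUATION = set(",。;:!?、,.;:!?")


def strip_punctuation(tokens):
    """Return the token list without punctuation tokens."""
    return [t for t in tokens if t not in PUNCTUATION]


def pair_statistics(sentences, pairs):
    """Collect sentence-level word frequencies, ordered co-occurrence
    counts, and span histograms for the given relational word pairs.

    `sentences` is a list of token lists (word segmentation already done).
    """
    word_freq = Counter()                      # sentences containing each word
    cooc = Counter()                           # sentences where first precedes second
    spans = {pair: Counter() for pair in pairs}
    for tokens in sentences:
        tokens = strip_punctuation(tokens)
        present = set(tokens)
        word_freq.update(present)              # count each word once per sentence
        for first, second in pairs:
            if first not in present or second not in present:
                continue
            i = tokens.index(first)
            try:
                j = tokens.index(second, i + 1)  # first match after `first`
            except ValueError:
                continue                       # `second` only occurs before `first`
            cooc[(first, second)] += 1
            spans[(first, second)][j - i - 1] += 1  # words strictly between
    return word_freq, cooc, spans


def dependence_strength(word_freq, cooc, pair):
    """Dice coefficient as a stand-in measure (NOT the thesis's formula)."""
    first, second = pair
    denom = word_freq[first] + word_freq[second]
    return 2 * cooc[pair] / denom if denom else 0.0


# Toy, hand-segmented three-clause sentences (hypothetical data).
sentences = [
    ["因为", "下雨", ",", "所以", "比赛", "取消", ",", "大家", "回家", "。"],
    ["因为", "他", "病", "了", ",", "所以", "没", "来", ",", "我们", "很", "担心", "。"],
    ["因为", "太", "晚", ",", "我", "走", "了", ",", "他", "留下", "。"],
    ["之所以", "迟到", ",", "是因为", "堵车", ",", "路", "很", "长", "。"],
]
pairs = [("因为", "所以"), ("之所以", "是因为")]

word_freq, cooc, spans = pair_statistics(sentences, pairs)
for pair in pairs:
    print(pair, dependence_strength(word_freq, cooc, pair), dict(spans[pair]))
```

In practice the token lists would come from a Chinese word segmenter run over the corpus, and pairs would be ranked by strength before their span histograms are examined.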
Keywords/Search Tags: Collocation of relational words, Chinese complex sentences, Dependence intensity, Span distribution, Knowledge base of collocation of relational words, Chinese information processing