Quantitative Study On English Complex Sentences

Posted on: 2022-12-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J L Liu
Full Text: PDF
GTID: 1485306782998359
Subject: Computer Software and Application of Computer
Abstract/Summary:
Sentences are the highest-level grammatical units in human language, and complex sentences are the most complex type of sentence. The complex sentence is widely recognized as an indispensable linguistic device for conveying more complete and complex information. Definitions of the English complex sentence vary, but the one given by Quirk et al. in A Comprehensive Grammar of the English Language (1985) is widely accepted: a sentence that contains at least one main clause with one or more subordinate clauses embedded as sentence constituents. Embedded subordinate clauses fall into two categories. The first is a clause that serves as an argument in the main clause, such as subject or object, and is called a complementation clause or nominal clause. The second is a subordinate clause that modifies the main clause or one of its components, i.e., an adjective (relative) clause or an adverbial clause.

Previous research on English complex sentences originated in studies of child language acquisition, developed in English language teaching in primary and secondary schools, extended to the measurement of the syntactic complexity of English complex sentences, and advanced further in machine translation (MT). The object of research has gradually shifted from individual cases in limited corpora to large-scale corpus data, while the research paradigm has shifted from introspection to empirical argumentation. In the era of big data, natural language processing (NLP), which is closely related to artificial intelligence, is developing rapidly, the quality requirements for machine translation of English complex sentences are rising, and research on multiple approaches to text simplification has emerged, further improving the efficiency and accuracy of machine processing of English complex sentences.

Although previous research on English complex sentences has been advancing, we are still far from a full picture of complex sentences. For instance, what are the characteristics of clause density, a typical marker of English complex sentences in written language? How is the embedding depth of subordinate clauses distributed? While such crucial questions remain unanswered, it is difficult to claim a clear, accurate and comprehensive macroscopic understanding of English complex sentences, and even more difficult to expect a substantial breakthrough in the simplification of English complex sentences for machine translation. To address these questions, we used the Brown and LOB corpora and a self-built 200,000-word news corpus drawn from the New York Times to investigate English complex sentences at both the macro and micro levels. Complex sentence frequency, complex sentence length, subordinate clause density, embedding depth, dependency distance, dependency direction, and hierarchical distance were each measured.

The frequency of complex sentences is neither irregular nor completely arbitrary; it is shaped by factors such as sentence length and the type of embedded subordinate clause. Regardless of differences in style and linguistic variety, the overall frequency of complex sentences is 39.41%. In short, about four out of every ten sentences produced in daily life are complex sentences. When linguistic variety is taken into account, the frequency of complex sentences is 36.78% in American English and 42.05% in British English; the difference between the two varieties is slight and not significant. From the perspective of style, the frequency of complex sentences fluctuates between 20.18% and 51.90%, and the effect of stylistic differences is significant. The highest frequencies are found in the genres of religion, literature, biography and prose, and the lowest in fiction. This may be closely related to the seriousness or formality of the text.

Another key factor affecting the frequency of complex sentences is sentence length. Because of the limits of human cognitive capacity, the length of complex sentences in a text does not increase arbitrarily. We found that the distribution of sentence length in English complex sentences conforms to the extended positive negative binomial distribution. Stylistic differences have a significant effect on these distributions, whereas the distributions do not differ significantly across linguistic varieties. Complex sentences may embed many subordinate clauses, and the number of subordinate clauses directly affects the frequency and length distributions of complex sentences. Two indicators, clause density and embedding depth, are closely related to the number of subordinate clauses. Interestingly, the distributions of clause density and embedding depth are not affected by differences in style. More importantly, the goodness-of-fit results for both clause density and embedding depth point to a tendency to minimize the syntactic complexity of English complex sentences.

The types of subordinate clauses embedded in complex sentences are diverse, and their structures are complex and variable. Different types of subordinate clauses may increase the syntactic complexity of the host sentence by different amounts. Our study found a significant difference among nominal clauses, relative clauses and adverbial clauses in the increase in syntactic complexity. The increase is largest for nominal clauses, which may be related to their status as arguments of the central verb in the main clause, and smallest for relative and adverbial clauses, whose syntactic complexity is approximately equal. Although nominal clauses have the highest syntactic complexity, they comprise many subcategories, which might affect the syntactic complexity of nominal clauses as a whole differently. Using dependency distance as an indicator, however, we found no significant differences in syntactic complexity among the subcategories of nominal clauses.

Of relative and adverbial clauses, which have similar syntactic complexity, relative clauses have attracted more attention from researchers because of their greater structural complexity and variety. We systematically explored the factors that affect the syntactic complexity of relative clauses in terms of dependency distance and dependency direction. The embedding position of the clause has a significant effect on the distribution of sentence length, while the dependency distance of relative clauses is not related to their embedding position. In actual usage of nominal, relative and adverbial clauses, we found that relative clauses show a high rate of ellipsis of the introducing word. Using three indicators, namely dependency distance, hierarchical distance and hierarchical number, we found that dependency distance is the main constraint on ellipsis of the introducing word, while hierarchical distance has little effect on it; ellipsis is, however, significantly influenced by hierarchical number. If and only if the hierarchical number is 1, i.e., the lowest hierarchical number for a relative clause, is it easier to elide the introducing word.

This study proposes and makes a first attempt at a new path for the study of English complex sentences, bringing potential new quantitative methods to the field and filling some gaps in research on English complex sentences from a quantitative, empirical perspective. The new approach will help us gain a clearer and more comprehensive macro-level understanding of English complex sentences.
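
The abstract names dependency distance and dependency direction as indicators but does not define them. As a minimal sketch, assuming the definition common in quantitative linguistics (the signed difference between the linear position of a word's head and the position of the word itself, with the root excluded), the two measures could be computed as below. The function names and the head-index encoding are illustrative assumptions, not the dissertation's own implementation.

```python
from typing import List


def dependency_distances(heads: List[int]) -> List[int]:
    """Signed dependency distances for one sentence.

    Assumed encoding: heads[i] is the 1-based position of the head of
    word i+1, and 0 marks the root, which has no distance.
    """
    return [head - pos for pos, head in enumerate(heads, start=1) if head != 0]


def mean_dependency_distance(heads: List[int]) -> float:
    """Mean of the absolute dependency distances (0.0 if there are none)."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0
    return sum(abs(d) for d in distances) / len(distances)


def head_final_share(heads: List[int]) -> float:
    """Share of dependencies whose head follows the dependent."""
    distances = dependency_distances(heads)
    if not distances:
        return 0.0
    return sum(1 for d in distances if d > 0) / len(distances)


# Hypothetical parse of "The boy who left smiled":
# The -> boy, boy -> smiled, who -> left, left -> boy, smiled = root
heads = [2, 5, 4, 2, 0]
mdd = mean_dependency_distance(heads)
share = head_final_share(heads)
```

For this hand-annotated example the absolute distances are 1, 3, 1 and 2. Only the relative clause verb "left" has its head to its left, so three of the four dependencies are head-final.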
Keywords/Search Tags:English complex sentences, Brown corpus, LOB corpus, syntactic complexity, quantitative research methods