A Study On The Vocabulary Distribution And Growth Pattern For Spoken English

Posted on: 2016-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: Z Wang
Full Text: PDF
GTID: 2285330470978496
Subject: Foreign Linguistics and Applied Linguistics
Abstract/Summary:
Based on the BNC spoken corpus, this study addresses three issues concerning vocabulary growth patterns: the inter-textual vocabulary distribution of the BNC spoken corpus, a comparison of lexical characteristics between the BNC spoken corpus and the BNC written corpus, and a test of mathematical models of vocabulary growth.

The sample drawn from the BNC spoken corpus contains 4,047,400 tokens and is divided into 4,000 individual texts of about 1,000 tokens each. The results show that the vocabulary sizes of the individual texts are approximately normally distributed. The type-token ratio (TTR) indicates that the BNC spoken corpus has a lower vocabulary density than the BNC written corpus, and the high-frequency words reflect the characteristics of spoken English. Using a tolerance interval at the 95% probability level, the theoretical upper and lower bounds of vocabulary size for a given sample size are calculated.

Three mathematical models, namely Brunet's model, Tuldava's model, and Herdan's model, are tested against the empirical vocabulary growth pattern of the BNC spoken corpus. Brunet's model gives the best goodness of fit. Brunet's model can also be extrapolated, and the parameters calculated in this study can be applied to texts of other sizes.
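The quantities named in the abstract can be illustrated with a minimal Python sketch. It computes the TTR of a token list, builds an empirical vocabulary growth curve, and fits Herdan's model in its commonly cited power-law form V = a * N^b by least squares on the log-log values. The function names are hypothetical, the abstract does not state the exact parameterization or fitting procedure used in the thesis, and Brunet's and Tuldava's models are left out because their forms are not given here.

```python
import math


def type_token_ratio(tokens):
    """TTR: number of distinct word types divided by number of tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def growth_curve(tokens, step):
    """Return (tokens read, vocabulary size) pairs sampled every `step` tokens."""
    seen = set()
    curve = []
    for i, token in enumerate(tokens, start=1):
        seen.add(token)
        if i % step == 0:
            curve.append((i, len(seen)))
    return curve


def fit_herdan(curve):
    """Fit V = a * N**b by ordinary least squares on ln V = ln a + b * ln N.

    Assumes the power-law form of Herdan's model; needs at least two
    points with distinct N and positive V.
    """
    xs = [math.log(n) for n, _ in curve]
    ys = [math.log(v) for _, v in curve]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


if __name__ == "__main__":
    # Toy data only; a real run would use the tokenized corpus sample.
    toy_tokens = "a b a c a b d a".split()
    print(type_token_ratio(toy_tokens))
    print(growth_curve(toy_tokens, step=2))
```

With fitted values of a and b, the predicted vocabulary size for another text length N is a * N**b, which is the sense in which a growth model can be extrapolated to texts of other sizes.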
Keywords/Search Tags: Corpus, Spoken English, Vocabulary Growth, Mathematical Models, Inter-textual Distribution