
Construction And Application Of The Northeastern Native Spoken Language Corpus

Posted on: 2010-01-29
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhang
Full Text: PDF
GTID: 2178360272479358
Subject: Computer application technology
Abstract/Summary:
In recent years, many linguists and computer scientists have come to recognize the value of corpora for processing and studying language, and corpora of many types have been built as a result. No corpus of the northeastern native spoken language, however, has yet been constructed. This thesis takes the construction of the Northeastern Native Spoken Language Corpus (NNSLC) as its research topic and draws on the annotation scheme of the "People's Daily" annotated corpus. It examines the characteristics of the northeastern native spoken language by comparing the NNSLC with the grammatical phenomena found in the "People's Daily" annotated corpus.

First, I surveyed the development and current state of corpus construction at home and abroad, of computational linguistics, and of spoken-language corpus building. I then chose the "People's Daily" annotated corpus as the knowledge source and northeastern native speech as the raw, not yet annotated base corpus. I discuss the basic techniques for annotating spoken language and explain the processing of the NNSLC in detail.

Second, I took Microsoft SQL Server as the technology platform for building the corpus. On this basis, I determined the overall design and basic framework of the NNSLC and discussed how to implement the corpus, how to load it into storage in the corpus management system, and how to provide access control interfaces for the processed corpus.

Finally, using this corpus, I studied the lexical and grammatical characteristics of the northeastern native spoken language and counted the high-frequency vocabulary with Northeastern characteristics.

The main contributions of this thesis are twofold:
1) The NNSLC is built for the first time and used to study the Northeastern spoken language, covering vocabulary and grammar as well as frequency and sentence-length statistics.
2) In keeping with the characteristics of the corpus itself, I provide a rich set of access control interfaces for the data that have been annotated or counted, which enhances the functionality of the corpus.

In this thesis, a spoken corpus of initial size has been built, and some conclusions have been drawn from comparative experiments. The corpus also provides a good platform for linguists and for the study of the northeastern native spoken language.
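
The frequency and sentence-length statistics mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation. It assumes each sentence is stored as one line of space-separated "word/tag" tokens, in the style of the "People's Daily" annotated corpus, and that the tag "w" marks punctuation. The function names and the sample sentences are hypothetical.

```python
from collections import Counter


def parse_tagged_line(line):
    """Split a 'word/tag word/tag ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        # Split on the last slash so a word containing '/' stays intact.
        word, sep, tag = token.rpartition("/")
        if sep:  # ignore tokens that carry no tag
            pairs.append((word, tag))
    return pairs


def corpus_statistics(lines, punctuation_tag="w"):
    """Return (word frequency Counter, mean sentence length in words).

    Assumption: one sentence per line, and tokens tagged with
    `punctuation_tag` are punctuation and are not counted as words.
    """
    word_freq = Counter()
    sentence_lengths = []
    for line in lines:
        words = [w for w, tag in parse_tagged_line(line) if tag != punctuation_tag]
        if not words:
            continue
        word_freq.update(words)
        sentence_lengths.append(len(words))
    mean_length = (
        sum(sentence_lengths) / len(sentence_lengths) if sentence_lengths else 0.0
    )
    return word_freq, mean_length


# Hypothetical sample sentences, invented for illustration only.
sample = ["你/r 干/v 啥/r 呢/y ？/w", "咱/r 干/v 活/n 。/w"]
freq, mean_len = corpus_statistics(sample)
print(freq.most_common(3), mean_len)
```

In a corpus held in a database, the same counts could instead be computed by a query over the table of annotated tokens; the in-memory version here is only meant to show what is being counted.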
Keywords: Northeastern native spoken language, Corpus, Data tables, Corpus annotation