
A Corpus-based Study of <Gtam Rgyud Gser Gyi Thang Ma>

Posted on: 2021-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: X Zha
GTID: 2415330623972954
Subject: Linguistics and Applied Linguistics
Abstract/Summary:
Language is a symbolic system for expressing thoughts and ideas, and it has also become a medium of communication between humans and machines. The rise of natural language processing (NLP) has improved human-computer interaction and advanced information-processing research in areas such as speech recognition, automatic retrieval, text error correction, and machine translation. Because large-scale corpora are a basic resource for the now widespread AI and cognitive computing technologies, future research on the Tibetan language is bound to be closely tied to informatization, and the prerequisite for informatizing Tibetan is the collection and construction of large-scale corpora. Processing natural language and text information efficiently benefits both society and research. Expanding the scope of language applications is therefore a major responsibility that society places on us, and making language research available to society in a timely and convenient way is a duty of NLP workers. For this reason, this thesis uses applied linguistic theory and computer technology to quantify natural language, that is, it uses quantitative and qualitative methods to identify the structure and function of vocabulary and related features.

The thesis takes as its text <gtam rgyud gser gyi thang ma>, written by a Tibetan monk during his travels in Southeast Asia in the 20th century. By constructing a corpus of the work, we study the writing style of Geng Dun Qun Pei through statistics on linguistic features. We also compare the similarities and differences among three versions of <gtam rgyud gser gyi thang ma> and summarize the textual features of the book.

Chapter 2 covers three points: (1) it briefly describes the origin, concept, and function of corpora; (2) drawing on grammatical theory and theories of language information processing, it explains that the classification of parts of speech must take into account the meaning, form, and function of words; (3) it briefly introduces the methods and background of Tibetan part-of-speech tagging.

Chapter 3 introduces the concepts and application fields of computational stylistics and presents a quantitative survey of the characters and syllables of <gtam rgyud gser gyi thang ma>. It then calculates vocabulary length, vocabulary richness, function words, high-frequency words, and high-frequency word classes, and interprets the quantitative results in light of grammatical theory. The final section of the chapter reports the word length and average word length of the text.

Chapter 4 presents a quantitative and statistical analysis based on the classification of function words and notional words, and uses grammar and modern linguistic theory to interpret the changes in the frequency of notional words and function words in <gtam rgyud gser gyi thang ma>.

Chapter 5 studies the named entities in <gtam rgyud gser gyi thang ma>. It analyzes the frequency of named entities in the book, such as people, places, countries, and tribes, and then examines the reasons for the observed frequency patterns. The thesis closes with a brief summary.
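The lexical statistics named in the abstract (high-frequency words, vocabulary richness, average word length) can be illustrated with a minimal Python sketch. It is not the thesis's actual procedure: it assumes the text has already been segmented into words, measures word length in syllables separated by the Tibetan tsheg (U+0F0B), and uses the type-token ratio as one possible richness measure. The function names `syllable_count` and `lexical_stats` are hypothetical.

```python
from collections import Counter

TSHEG = "\u0f0b"  # Tibetan syllable delimiter (tsheg)


def syllable_count(word):
    """Count the tsheg-separated syllables in one segmented word."""
    return len([s for s in word.split(TSHEG) if s])


def lexical_stats(words, top_n=3):
    """Return basic corpus statistics for a list of segmented words.

    Assumption: `words` is the output of a separate word-segmentation
    step; this sketch does not segment Tibetan text itself.
    """
    if not words:
        raise ValueError("words must not be empty")
    freq = Counter(words)
    lengths = [syllable_count(w) for w in words]
    return {
        "tokens": len(words),                       # running words
        "types": len(freq),                         # distinct words
        "type_token_ratio": len(freq) / len(words), # one richness measure
        "avg_word_length": sum(lengths) / len(words),
        "top_words": freq.most_common(top_n),       # high-frequency words
    }


# Toy input only, not data from the thesis corpus.
sample = ["གཏམ་རྒྱུད", "གསེར", "གྱི", "ཐང་མ", "གྱི"]
stats = lexical_stats(sample)
```

On a real corpus the same counts would be computed per version of the text, so that the three versions can be compared on identical measures.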
Keywords/Search Tags:Corpus, Quantitative, gtam rgyud gser gyi thang ma, Vocabulary