
Preliminary Study On Statistic Of The Kazak Word Based On Corpus

Posted on: 2011-03-16
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2178360305487270
Subject: Computer software and theory
Abstract/Summary:
Vocabulary is the most active and vibrant element of a language system. The general vocabulary of Kazakh is the core of the language's whole vocabulary and plays an important role in Kazakh language teaching and dictionary compilation. This thesis introduces Zipf's relation into Kazakh word segmentation, based on statistics of word frequency. With the system, a continuous Kazakh character string given as input is segmented and output as segmented word strings, usually strings of two Kazakh words, and a dictionary is obtained. The dictionary stores each Kazakh word together with the frequency with which it appears in the processed texts. Combined with a covariance experiment on Kazakh articles, the experimental results describe the frequency relation of Kazakh words: the resulting word-frequency distribution accords with Zipf's power law. We adopt both traditional and improved methods and develop a computing system called "the general of Kazakh words". Based on automatic recognition and extraction of Kazakh words, the system provides a quantitative way to determine a Kazakh basic vocabulary for language engineering. Finally, the thesis describes a new N-gram-based algorithm that can automatically recognize Kazakh phrases, and proposes a novel Kazakh lexicon construction approach for phrase segmentation.
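The word-frequency dictionary and the Zipf check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's system: the function names `word_frequencies` and `zipf_slope` are hypothetical, whitespace splitting stands in for real Kazakh word segmentation, and the toy tokens are placeholders. Under Zipf's law, frequency is roughly proportional to 1/rank, so a straight-line fit of log(frequency) against log(rank) should give a slope near -1.

```python
import math
from collections import Counter


def word_frequencies(text):
    """Build a word -> frequency dictionary.

    Assumption: tokens are separated by whitespace. A real Kazakh
    segmenter would replace this step.
    """
    return Counter(text.split())


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    A slope close to -1 is consistent with Zipf's power law.
    """
    ranked = sorted(freqs.values(), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    toy_text = "w1 w2 w1 w3 w1 w2 w4 w1 w2 w3 w1 w1"
    freqs = word_frequencies(toy_text)
    for word, count in freqs.most_common():
        print(word, count)
    print("fitted slope:", zipf_slope(freqs))
```

On a real corpus the fit is usually examined over a range of ranks, since the very rare words at the tail tend to deviate from the straight line.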
Keywords/Search Tags: law-words, Zipf, repetition frequency degree, the general usage of Kazakh words, N-Gram