
The Development Of Part-of-speech Tagging Software For Kazakh Language

Posted on: 2016-01-01   Degree: Master   Type: Thesis
Country: China   Candidate: E S A K A S H E Nu
GTID: 2308330476450609   Subject: Computer technology
Abstract/Summary:
Part-of-speech (POS) tagging plays an important role in natural language processing (NLP). Kazakh is one of the commonly used languages of the ethnic minorities in the Xinjiang region, and several basic problems in Kazakh NLP and information processing urgently need to be solved. In modern society, advanced technologies such as machine translation, search engines, and information security all depend on NLP research. Building a high-quality annotated corpus of modern Kazakh is foundational work, so the design and implementation of a Kazakh tagging system has important theoretical and practical significance.

Based on the distinctive features of the Kazakh language, this thesis builds an annotated corpus of Kazakh words. It first introduces the concept of natural language understanding. It then describes in detail the purpose and significance of the work, corpora in different languages, and related domestic and international research. Finally, it studies the design of a POS tagging system and the related techniques, and implements a basic Kazakh word tagging system.

This thesis not only systematically studies POS tagging theory for Kazakh but also handles unknown-word recognition in Kazakh. The research constructs, in turn, a manually annotated database, a dictionary database, and a POS database, and implements a basic Kazakh POS annotation system. In open-corpus testing the POS tagging accuracy was 74.32%, and in closed-corpus testing it was 76.4%. In addition to POS tagging, the system provides word-frequency and POS statistics, that is, it counts the occurrences of each word and each POS tag in the system.
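The abstract mentions dictionary-based tagging, unknown-word handling, an accuracy ("correct rate") figure, and word/POS frequency statistics. The sketch below illustrates those ideas in miniature. It is not the thesis's implementation. The toy lexicon, the tag names (`PRON`, `N`, `V`, `UNK`), and the functions `tag`, `accuracy`, and `statistics` are all hypothetical. It also assumes that "correct rate" means token-level accuracy: correctly tagged tokens divided by total tokens.

```python
from collections import Counter

# Hypothetical toy lexicon (word -> POS tag). A real system would load this
# from a dictionary database; these entries are illustrative only.
LEXICON = {"мен": "PRON", "кітап": "N", "оқыдым": "V"}
UNKNOWN_TAG = "UNK"  # hypothetical placeholder tag for out-of-lexicon words


def tag(tokens, lexicon=LEXICON):
    """Tag each token by dictionary lookup; words not in the lexicon get UNKNOWN_TAG."""
    return [(t, lexicon.get(t.lower(), UNKNOWN_TAG)) for t in tokens]


def accuracy(predicted, gold):
    """Token-level correct rate: matching tags / total tokens (assumed definition)."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences must have the same length")
    if not gold:
        return 0.0
    correct = sum(p == g for (_, p), (_, g) in zip(predicted, gold))
    return correct / len(gold)


def statistics(tagged):
    """Count how often each word and each POS tag occurs in a tagged corpus."""
    word_freq = Counter(word for word, _ in tagged)
    tag_freq = Counter(pos for _, pos in tagged)
    return word_freq, tag_freq


if __name__ == "__main__":
    tokens = "мен кітап оқыдым мен хат жаздым".split()
    # Hypothetical gold annotation for the toy sentence pair.
    gold = list(zip(tokens, ["PRON", "N", "V", "PRON", "N", "V"]))
    predicted = tag(tokens)
    print(predicted)
    print(f"accuracy: {accuracy(predicted, gold):.2%}")
    words, tags = statistics(predicted)
    print(words.most_common())
    print(tags.most_common())
```

Here the two words missing from the lexicon fall back to `UNK`, which lowers accuracy. A real unknown-word recognizer, such as the one the thesis describes, would instead try to assign them a likely tag.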
Keywords/Search Tags: Kazakh language, NLP, Part-of-speech tagging, Corpus