Research On Spelling Checker/Corrector For Kazakh Corpora

Posted on: 2009-07-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Y E J E M H M T Yi
GTID: 2178360245985756
Subject: Computer application technology
Abstract/Summary:
With the growth of the publishing industry, large numbers of e-books, electronic newspapers, e-mails and other documents are emerging, and guaranteeing the correctness of these texts is increasingly important. Developing an automatic text correction system for Kazakh has therefore become an urgent task.

Building on a study and analysis of English and Chinese text correction technology, this thesis offers a preliminary discussion of a Kazakh text correction system, explores the theory and technology of text correction, and proposes some initial methods for Kazakh text. Errors in Kazakh text fall into two categories: non-word errors and real-word errors.

Based on this classification, the thesis handles each category differently. For non-word errors, it draws on the characteristics of the Kazakh alphabet and a Kazakh lexicon of a certain scale, comparing stems and affixes against the lexicon to find non-word errors. It then uses the statistical probabilities of Kazakh syllables and characters to screen further for non-word errors. This direct form of error detection avoids heavy computation and reduces the complexity of the algorithm.

For real-word errors, detection relies on context. The thesis uses a 2-gram statistical model, which exploits local connections in the text and occurrence probabilities, together with the Winnow algorithm, which uses features drawn from both adjacent text and longer distances.

For correction, the thesis first proofreads certain special errors using the characteristics of Kazakh errors, then generates correction suggestions for non-word and real-word errors using minimum edit distance and the 2-gram model.
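The correction step described above (candidates by minimum edit distance, ranked with a 2-gram model) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the function names, the unit edit costs, the add-one smoothing, the `max_dist` threshold and the toy placeholder tokens are all assumptions introduced here.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance; insert, delete and substitute each cost 1 (assumed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def train_bigrams(sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split()  # "<s>" marks the sentence start
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def bigram_prob(prev, word, unigrams, bigrams, vocab_size):
    """P(word | prev) with add-one smoothing (a choice made for this sketch)."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)


def suggest(prev, word, lexicon, unigrams, bigrams, max_dist=2):
    """Return correction candidates for `word`, best first.

    A word found in the lexicon is treated as correct. Otherwise, lexicon
    entries within `max_dist` edits are ranked by edit distance, then by
    bigram probability given the previous word.
    """
    if word in lexicon:
        return [word]
    vocab_size = len(lexicon)
    cands = [(edit_distance(word, w), w) for w in lexicon]
    cands = [(d, w) for d, w in cands if d <= max_dist]
    cands.sort(key=lambda dw: (
        dw[0],
        -bigram_prob(prev, dw[1], unigrams, bigrams, vocab_size),
        dw[1],
    ))
    return [w for _, w in cands]


# Toy placeholder data, not drawn from any real Kazakh corpus.
corpus = ["men kitap oqydym", "men kitap aldym", "ol qalam aldym"]
lexicon = {w for s in corpus for w in s.split()}
unigrams, bigrams = train_bigrams(corpus)
print(suggest("men", "kitab", lexicon, unigrams, bigrams))
```

A real system for an agglutinative language such as Kazakh would likely look up stems and affixes separately rather than whole word forms, as the abstract indicates; the flat lexicon here is only the simplest stand-in.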
Keywords/Search Tags: Kazakh language, text correction, minimum edit distance algorithm, non-word errors, real-word errors, N-gram