
Research on Kazakh Sentence Similarity Computing

Posted on: 2013-01-30
Degree: Master
Type: Thesis
Country: China
Candidate: A G L H Y D E Jiang
Full Text: PDF
GTID: 2218330374466860
Subject: Computer technology
Abstract/Summary:
Sentence similarity computing has a wide range of applications in natural language processing, such as instance-based machine translation, automatic summarization, information retrieval, and question answering. Similarity calculation has therefore become a priority research issue in the field. This thesis focuses on computing the similarity between Kazakh sentences, and specifically on how to effectively extract, from a Kazakh-Chinese bilingual corpus for instance-based machine translation, the instance sentence most similar to the input sentence. The thesis covers the following.

First, a Kazakh stemming algorithm. Kazakh is an agglutinative language: affixes attached to the stem carry word formation and inflection, changing lexical and grammatical meaning. This characteristic makes Kazakh sentence similarity computing difficult. Accurately extracting the stem of each word, the smallest unit of a sentence, before computing similarity therefore benefits the calculation. This thesis describes a word stemming algorithm for Kazakh based on finite state automata.

Second, a Kazakh sentence similarity calculation method that combines word form, word order, sentence length, and the angle values of similar units. Instance-based machine translation (EBMT) works by analogy: given instances similar to the sentence to be translated, it can produce a fluent translation. How to retrieve the most similar instance from a large-scale instance library is therefore of great significance for the quality of a machine translation system. This thesis presents a Kazakh similar-instance retrieval method and designs a series of similarity measures for calculating the similarity between the input sentence and the instances in the training corpus, in order to improve the quality of retrieval and translation.

Finally, building on the Kazakh sentence similarity calculation, the method is applied in a multilingual instance-based machine translation system. The similarity computation finds the Kazakh instance most similar to the user's input sentence so that the appropriate translation can be carried out, improving the quality of translation between Kazakh and Chinese.
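
The abstract does not give the formulas behind the combined measure, so the following is only a minimal sketch of how word form, word order, and sentence length could be combined into one score. It assumes the sentences are already tokenized and stemmed, uses formulations that are common in the sentence similarity literature rather than the thesis's own definitions, omits the "angle values of similar units" component, and uses hypothetical weights (0.6, 0.2, 0.2) and hypothetical function names.

```python
from collections import Counter


def word_form_similarity(a, b):
    """Share of tokens the two sentences have in common (with multiplicity)."""
    if not a and not b:
        return 1.0
    common = sum((Counter(a) & Counter(b)).values())
    return 2 * common / (len(a) + len(b))


def length_similarity(a, b):
    """1.0 for equal lengths, approaching 0.0 as the lengths diverge."""
    if not a and not b:
        return 1.0
    return 1 - abs(len(a) - len(b)) / (len(a) + len(b))


def word_order_similarity(a, b):
    """Compare the relative order of tokens that occur exactly once in both."""
    once = [w for w in a if a.count(w) == 1 and b.count(w) == 1]
    if not once:
        return 0.0
    if len(once) == 1:
        return 1.0
    positions = [b.index(w) for w in once]
    # Count adjacent pairs whose order is reversed in the second sentence.
    inversions = sum(
        1 for i in range(len(positions) - 1) if positions[i] > positions[i + 1]
    )
    return 1 - inversions / (len(once) - 1)


def sentence_similarity(a, b, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three measures; the weights are assumptions."""
    w_form, w_order, w_len = weights
    return (
        w_form * word_form_similarity(a, b)
        + w_order * word_order_similarity(a, b)
        + w_len * length_similarity(a, b)
    )


def most_similar(query, instances):
    """Return the instance (a token list) with the highest score for the query."""
    return max(instances, key=lambda inst: sentence_similarity(query, inst))
```

In an instance-based translation setting, `most_similar` would be called with the stemmed input sentence and the stemmed source sides of the instance library; the placeholder weights would need tuning on real data.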
Keywords/Search Tags:Natural language processing, Kazakh language, machine translation, sentence similarity, stemmer