Research Into Chinese-English Sentences Alignment Based On Particle Swarm Optimization

Posted on: 2013-04-18
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2248330362974224
Subject: Computer software and theory
Abstract/Summary:
Construction of bilingual corpora plays an important role in areas such as Natural Language Processing, Data Mining, Machine Translation, Dictionary Compiling, and Multilingual Information Retrieval. The abundance of bilingual translated texts on the Internet raises a critical issue: how to build a bilingual corpus by processing these texts. The key to processing bilingual corpora is alignment, and the core of Chinese-English sentence alignment is to find, in the bilingual texts, each source segment together with its translation.

Alignment can be performed at several granularities: chapters, paragraphs, sentences, words, phrases, and so on. Because sentences are finer-grained than paragraphs, sentence alignment benefits paragraph alignment; it also provides the necessary precondition for aligning words and phrases.

Chinese and English are two of the most representative languages in the world, so Chinese-English sentence alignment is of significant research value. Moreover, it is an essential step toward the practical application of Chinese-English bilingual corpora. Based on a study of sentence alignment methods for constructing Chinese-English bilingual corpora, this thesis proposes that the Chinese-English sentence alignment problem can be transformed into the problem of searching for an optimal solution over the Chinese-English bilingual sentences. The main work of this thesis includes:

① It introduces the technologies related to Chinese-English sentence alignment, the preprocessing methods, and the evaluation criteria for sentence alignment. It then analyzes the difficulties of alignment and proposes a two-step iterative model that searches for an optimal solution over the bilingual sentences, thereby solving the Chinese-English sentence alignment problem.

② In view of the inconsistency between the feature spaces of Chinese and English sentences, it introduces Canonical Correlation Analysis (CCA) to find the canonical subspace and, by analyzing the alignment problem, formalizes a representation that serves as the fitness function for Chinese-English sentence alignment. The alignment problem is thereby transformed into searching for the optimal solution in that subspace.

③ It applies the Particle Swarm Optimization (PSO) algorithm to search for the optimal solution of the fitness function. Considering that the elementary PSO algorithm is prone to premature convergence and stagnation, and that the potential solution space is very large, it proposes an improved PSO to search for the optimal solution and thereby improves sentence alignment performance.

In summary, starting from a formal description of the Chinese-English sentence alignment problem, this thesis proposes a fitness function that transforms the problem into finding an optimal solution with PSO. The experimental results show that the proposed method achieves good alignment accuracy and is a feasible and effective way to solve Chinese-English sentence alignment.
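To make the search step concrete, the following is a minimal sketch of the elementary (global-best) PSO that the abstract names as its starting point. It is not the thesis's improved PSO, and it does not use the CCA-based alignment fitness function, neither of which is specified here. The function name `pso_minimize`, the parameter values, and the toy `sphere` fitness are assumptions chosen purely for illustration.

```python
import random


def pso_minimize(fitness, dim, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over a box with textbook global-best PSO.

    All parameter defaults are illustrative assumptions, not values
    taken from the thesis.
    """
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its own best position so far.
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    # The swarm shares the best position found by any particle.
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best
                #          + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the position to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = fitness(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy fitness (hypothetical stand-in): squared distance to (1, ..., 1)."""
    return sum((xi - 1.0) ** 2 for xi in x)


best, best_val = pso_minimize(sphere, dim=3, bounds=(-5.0, 5.0))
```

A fitness that should be maximized, as an alignment score presumably would be, can be passed in negated. The premature convergence mentioned in the abstract corresponds to all particles collapsing onto `gbest` before the search space has been explored; the thesis's remedy for that is not reproduced in this sketch.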
Keywords/Search Tags: Natural Language Processing, Bilingual corpus, Chinese-English Sentences Alignment, Canonical Correlation Analysis, Particle Swarm Optimization