
Chinese Uygur Kazak Kirgiz Bilingual Corpus Processing System Design And Implementation

Posted on: 2014-03-25 | Degree: Master | Type: Thesis
Country: China | Candidate: T L F M M Mu
GTID: 2268330425468073 | Subject: Software engineering
Abstract/Summary:
Language is unique to humans and closely tied to human life. Many peoples of the world have their own languages, each with a matching writing system. The informatization of languages and scripts has become a major concern. Research on it worldwide has steadily deepened, and the informatization of minority languages is now on the agenda. China has a variety of ethnic minority languages, and research on them dates back to the 1930s and 1940s. Even so, the level of minority-language processing and informatization is still rather low. As society develops rapidly, the demand for minority-language informatization has become increasingly urgent. A multilingual corpus processing system is one of the core components of a multilingual information sharing platform, and its importance grows as such platforms are built.

This project uses the Internet as its main platform for research and use. It takes into account the current state of minority-language informatization in China and proposes a service-oriented, distributed bilingual parallel corpus design based on a client/server (C/S) architecture. In the first phase, the project analyzed why a bilingual corpus system is needed and carefully designed its functionality. It built an identity authentication mechanism to protect the core data. It also implemented document import, auditing, deletion, and document-property viewing, among other functions. The project studied document alignment, sentence alignment, and word alignment techniques. Finally, it implemented word segmentation and word composition for Uighur. The system is implemented in C#.

The core application of the corpus processing system is deployed on the Internet and is available to all clients. This speeds up corpus construction and reduces its cost.
The system consists of a document management module, an alignment management module, and a query module. It has passed the relevant tests and currently works well.
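The abstract says the project studied sentence alignment but does not say which method it used. The sketch below illustrates what sentence alignment computes, using the well-known length-based approach of Gale and Church (1993), written in Python for brevity. It is not the thesis's own algorithm. The function names (`align`, `bead_cost`) and all parameter values are illustrative assumptions. The defaults `MEAN_RATIO = 1.0` and `VARIANCE = 6.8` come from European language pairs and would need to be re-estimated for Chinese paired with Uighur, Kazak, or Kirgiz.

```python
import math

# Assumed Gale & Church (1993) defaults, estimated on European language
# pairs; a Chinese-Uighur corpus would need its own estimates.
MEAN_RATIO = 1.0   # c: expected target characters per source character
VARIANCE = 6.8     # s^2: variance of that ratio per source character
# Prior probability of each bead type: (source sentences, target sentences).
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}


def bead_cost(src_len: int, tgt_len: int, bead: tuple[int, int]) -> float:
    """Negative log-probability of pairing spans with these character lengths."""
    mean = (src_len + tgt_len / MEAN_RATIO) / 2
    if mean == 0:
        return math.inf
    delta = (src_len * MEAN_RATIO - tgt_len) / math.sqrt(mean * VARIANCE)
    # Two-tailed normal tail P(|Z| >= |delta|) equals erfc(|delta| / sqrt(2)).
    tail = math.erfc(abs(delta) / math.sqrt(2))
    if tail == 0.0:
        return math.inf
    return -math.log(tail) - math.log(PRIORS[bead])


def align(src: list[str], tgt: list[str]) -> list[tuple[list[int], list[int]]]:
    """Return the cheapest bead sequence as (source indices, target indices)."""
    n, m = len(src), len(tgt)
    src_lens = [len(s) for s in src]
    tgt_lens = [len(t) for t in tgt]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order guarantees cost[i][j] is final before it is extended.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == math.inf:
                continue
            for di, dj in PRIORS:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                c = cost[i][j] + bead_cost(sum(src_lens[i:ni]),
                                           sum(tgt_lens[j:nj]), (di, dj))
                if c < cost[ni][nj]:
                    cost[ni][nj] = c
                    back[ni][nj] = (di, dj)
    # Follow back-pointers from (n, m) to recover the best path.
    beads = []
    i, j = n, m
    while (i, j) != (0, 0):
        di, dj = back[i][j]
        beads.append((list(range(i - di, i)), list(range(j - dj, j))))
        i, j = i - di, j - dj
    beads.reverse()
    return beads
```

The inputs would be Chinese and Uighur sentences produced by a sentence splitter. Character length is used because it requires no word segmentation. For example, `align(["a" * 20, "b" * 20, "c" * 30], ["x" * 40, "z" * 30])` merges the first two source sentences into a single 2-1 bead.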
Keywords: Corpus, sentence alignment, word alignment, C#