The Design and Implementation of a Uyghur-Chinese Parallel Corpus Processing System

Posted on: 2015-01-08
Degree: Master
Type: Thesis
Country: China
Candidate: S M L N Y Z Ai
Full Text: PDF
GTID: 2298330431491846
Subject: Computer technology
Abstract/Summary:
As an essential resource for machine translation research, the parallel corpus plays an important role in machine translation, cross-language information retrieval, cross-language public opinion monitoring, lexicography, and bilingual teaching. The traditional process of building a corpus is time-consuming and labor-intensive and cannot keep pace with how quickly language and information change, so a parallel corpus processing system has great practical value for corpus construction. Corpus construction is a complex project that requires the participation of professionals at different levels, and building a high-quality, highly consistent bilingual corpus is very challenging work. The research and design of the system proceed from three aspects. First, to cope with the flexibility of natural language and the subjectivity of the participants, a flexible personnel role and task management scheme is proposed, which allows roles and tasks to be assigned to every person involved in the construction. Through this scheme, the system provides a virtual organizational structure for assembling a corpus construction team. Second, extensive testing was carried out on data integrity, data table structure, database access, and concurrent database connections, resulting in stable and reliable database access algorithms. Because the system processes documents, sentences, and words, it must preserve both the containment relationships and the alignment relationships across these three levels, so the data tables were tested for data integrity. Third, a manual alignment architecture for documents, sentences, words, and affixes was designed and implemented, together with the design of an interface for automatic alignment.
At present, integration testing of the automatic sentence and word alignment components has been carried out. In summary, this thesis designs a comparatively comprehensive system for building bilingual corpora. The system has been used to complete document, sentence, and word alignment, as well as manual sentence processing, sentence alignment, and review tasks, for a Uyghur-Chinese corpus of news texts covering nearly three years, and to build a corpus of 120,000 Uyghur-Chinese sentences aligned at the sentence and word levels.
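The requirement that alignment be kept consistent across the document, sentence, and word levels can be illustrated with a small sketch. The abstract does not describe the thesis's actual table structure or access code, so every name below (`DocumentPair`, `SentencePair`, `WordAlignment`, `integrity_errors`) is a hypothetical stand-in, and the in-memory classes only approximate what database tables with foreign keys might enforce.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class WordAlignment:
    """One word-level link inside a sentence pair (hypothetical structure)."""
    src_index: int  # position of the token in the Uyghur sentence
    tgt_index: int  # position of the token in the Chinese sentence


@dataclass
class SentencePair:
    """An aligned Uyghur-Chinese sentence pair that belongs to one document pair."""
    pair_id: int
    doc_id: int
    src_tokens: list[str]
    tgt_tokens: list[str]
    word_links: list[WordAlignment] = field(default_factory=list)


@dataclass
class DocumentPair:
    """An aligned document pair holding its sentence pairs."""
    doc_id: int
    sentences: list[SentencePair] = field(default_factory=list)


def integrity_errors(doc: DocumentPair) -> list[str]:
    """Return a list of violations of the three-level containment and alignment rules."""
    errors: list[str] = []
    seen_ids: set[int] = set()
    for sent in doc.sentences:
        # Sentence level: each sentence pair must point back to its own document.
        if sent.doc_id != doc.doc_id:
            errors.append(f"sentence {sent.pair_id}: wrong document id {sent.doc_id}")
        if sent.pair_id in seen_ids:
            errors.append(f"sentence {sent.pair_id}: duplicate id")
        seen_ids.add(sent.pair_id)
        # Word level: each link must refer to tokens that exist in the pair.
        for link in sent.word_links:
            if not 0 <= link.src_index < len(sent.src_tokens):
                errors.append(f"sentence {sent.pair_id}: source index {link.src_index} out of range")
            if not 0 <= link.tgt_index < len(sent.tgt_tokens):
                errors.append(f"sentence {sent.pair_id}: target index {link.tgt_index} out of range")
    return errors
```

A check of this kind would return an empty list for a consistent document pair and one message per broken reference otherwise. In a relational database, the same rules would more typically be expressed as foreign-key and range constraints.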
Keywords/Search Tags: bilingual corpus, sentence alignment, word alignment