
Research On Web-based Chinese-English Bilingual Dictionary Generation

Posted on: 2015-11-14
Degree: Master
Type: Thesis
Country: China
Candidate: S Wu
GTID: 2298330422488493
Subject: Computer application technology
Abstract/Summary:
Bilingual translation is a bridge for cross-language cultural exchange, and the bilingual dictionary is both an important tool for translation and a major resource for foreign-language study. Compiling a dictionary takes a great deal of time and effort, and both Chinese and English are developing rapidly, with new words emerging constantly. As a result, updates to bilingual dictionaries fail to keep pace, which poses a major challenge for compilers. Using data collected from a Web corpus, this thesis analyzes Chinese-English translation extraction in depth from four perspectives: information extraction, information filtering, knowledge acquisition and knowledge verification. The aim is to advance the automatic compilation of Chinese-English bilingual dictionaries and the automatic acquisition of Chinese-English knowledge. The thesis consists of the following parts.
(1) The thesis examines the main problems and the current state of Chinese-English translation extraction and, to overcome the shortcomings of existing approaches, proposes a Web-based extraction method and describes its basic principle.
(2) An extraction technique based on Web information and regular expressions is used to collect a large amount of candidate material containing Chinese-English translations from the Web. For preprocessing this material, a filtering system based on the evolution of predicate expressions is proposed. Specifically, filtering rules are established first so that material can be filtered automatically, which lays a solid foundation for precise translation extraction in the later steps.
(3) Based on the properties of the translation material, the thesis proposes two modes of translation extraction: form-based and statistics-based. For the statistics-based mode, three extraction methods are introduced, based respectively on the change in the occurrence probability of Chinese characters, the information entropy of Chinese characters, and the cohesion of the phrase. For the few translations that none of these three methods can handle, an extraction method based on stop words is used to preserve the recall rate.
(4) After extraction, the thesis presents a method for correcting English words based on frequency of occurrence and edit distance. It also provides a new method for optimizing and integrating translations. The extracted translations are grouped, the accuracy, quantity and credibility of each group are calculated, and the optimal translations are ranked by credibility to generate a Chinese-English bilingual dictionary automatically.
Experimental results on a large amount of Web material show that the Web-based method for generating Chinese-English bilingual dictionaries proposed in the thesis is feasible and practical, and can greatly improve compilation efficiency.
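The abstract names frequency of occurrence and edit distance as the basis for correcting extracted English words but gives no further detail. The following is a minimal sketch of one way such a correction could work, not the author's actual method. The function names, the `max_dist` and `min_ratio` thresholds, and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import Counter


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def correct_word(word: str, freq: Counter,
                 max_dist: int = 2, min_ratio: int = 5) -> str:
    """Replace `word` with a much more frequent near neighbour, if one exists.

    Assumed rule (hypothetical, not from the thesis): a candidate qualifies
    when it is within `max_dist` edits and occurs at least `min_ratio` times
    as often as `word`. Among qualifying candidates, prefer the smallest
    distance, then the highest frequency.
    """
    own = freq.get(word, 0)
    best, best_key = word, None
    for cand, count in freq.items():
        if cand == word or count < min_ratio * max(own, 1):
            continue
        dist = edit_distance(word, cand)
        if dist > max_dist:
            continue
        key = (dist, -count)
        if best_key is None or key < best_key:
            best, best_key = cand, key
    return best


# Toy frequency table standing in for counts gathered from Web material.
freq = Counter({"dictionary": 120, "dictionery": 3,
                "translation": 80, "translate": 40})
corrected = correct_word("dictionery", freq)
```

With this toy table, the rare form "dictionery" is mapped to "dictionary", while "translation" and "translate" are left unchanged because neither has a neighbour that is frequent enough. The frequency ratio guards against rewriting a valid but less common word into a more common one.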
Keywords/Search Tags: Chinese-English translation, Web mining, Information filtering, Knowledge acquisition