
Study On Some Issues Of Bilingual Assisted Translation Search Engine

Posted on: 2010-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhou
GTID: 2178360275951804
Subject: Computer application technology
Abstract/Summary:
With the growth of China's economy and its international exchanges, the translation market is expanding rapidly, the use of computer translation tools grows year by year, and a variety of machine translation techniques and methods have quickly followed. There are two major approaches in machine translation: rule-based and corpus-based. The rule-based approach has great difficulty resolving linguistic ambiguity. The corpus-based approach uses translation memory: users build one or more corpora from source texts and their translations, and during translation the system automatically searches these corpora for identical or similar translation resources and returns the results.

This thesis proposes a new model of assisted translation, the Bilingual Assisted Translation Search Engine. Unlike traditional machine translation, it does not rely on fully automatic translation. Instead, the system returns a list of relevant translations, and a human translator derives the correct translation from that list. Compared with automatic machine translation, it yields better quality; compared with purely manual translation, it is more efficient. The translations offered to users can be accurate and well matched only when the corpus is large, so the core of the system is the construction of the bilingual corpus. The thesis uses web data mining and search engine technology to build a large-scale corpus automatically.

In this thesis, I completed the following work:
1. After analyzing the applications of information technology in the translation field and reviewing the development of translation technology, I proposed a new assisted translation method based on translation memory. The method uses web data mining to construct the corpus and, given the keywords a user submits, returns a list of relevant translations.
2. Using web data mining and search engine technology, I collected single-page and two-page texts containing bilingual information from the internet. After a series of identification and purification steps and an analysis of each web page's DOM structure, I obtained a Chinese-English parallel corpus and saved it to a database.
3. I indexed the large-scale corpus with Lucene and provided a search interface that returns a list of relevant translations, from which users can derive the correct translation.
4. Finally, I divided the system into four major modules (collection, corpus extraction, indexing, and interface) and gave a solution for their distributed integration.
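The search step described above (index a bilingual corpus, then return a ranked list of candidate translations for the user's keywords) can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis indexes with Lucene, whereas the sketch below uses a small in-memory inverted index as a stand-in. The class `BilingualIndex`, the function `tokenize`, the term-overlap ranking, and the sample sentence pairs are all assumptions made for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lowercased ASCII word tokens only (an assumption for this sketch);
    # querying by Chinese would additionally need word segmentation.
    return re.findall(r"[a-z0-9]+", text.lower())


class BilingualIndex:
    """Hypothetical in-memory stand-in for a Lucene index over sentence pairs."""

    def __init__(self):
        self.pairs = []                    # pair_id -> (english, chinese)
        self.postings = defaultdict(set)   # token -> set of pair_ids

    def add(self, english, chinese):
        pair_id = len(self.pairs)
        self.pairs.append((english, chinese))
        for token in set(tokenize(english)):
            self.postings[token].add(pair_id)

    def search(self, query, limit=5):
        # Score each pair by how many distinct query tokens it contains.
        scores = defaultdict(int)
        for token in set(tokenize(query)):
            for pair_id in self.postings.get(token, ()):
                scores[pair_id] += 1
        # Highest overlap first; ties broken by insertion order.
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [self.pairs[pair_id] for pair_id, _ in ranked[:limit]]


# Illustrative data only; not taken from the thesis corpus.
index = BilingualIndex()
index.add("search engine technology", "搜索引擎技术")
index.add("web data mining", "网络数据挖掘")
index.add("bilingual corpus", "双语语料库")

results = index.search("data mining technology")
for english, chinese in results:
    print(english, "->", chinese)
```

The system only proposes candidates, and choosing the final translation from the returned list is left to the human translator, as the abstract describes.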
Keywords/Search Tags: Aided Translation, Search Engine, Web Data Mining, Bilingual Corpus