
Research On Methods Of Intelligent Bilingual Search And Search Engine

Posted on: 2010-05-17
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D F Liu
GTID: 1118360275999050
Subject: Computer application technology
Abstract/Summary:
With the growth of economic and international exchange, the translation market is expanding quickly. Translation by computer software, called machine translation (MT), has become popular, and MT-based tools are attracting users' interest. Machine translation technologies fall mainly into two categories: (1) rule-based approaches and (2) corpus-based approaches. The former have difficulty resolving linguistic ambiguity, while the advantage of the latter is their use of translation memory. In corpus-based approaches, users build one or several corpora from original texts and their corresponding translations. When a translation task arrives, the system automatically searches the corpora for identical or similar originals and returns their translations.

However, the corpora of many translation tools that use translation memory are built manually. Their capacity is limited and they are updated slowly. With the development of the Internet, Web search engines have become an important and widely used means of information retrieval. We argue that combining translation technology with Web information retrieval can provide users with a satisfactory, real-time, and dynamic translation service.

As an object of study, the data on the Internet are large in volume, semi-structured, diverse, dynamic, distributed, and heterogeneous. The Chinese-English paired Web page resources stored on Internet sites are very large, especially after many years of accumulation. The value of bilingual corpora for machine translation, machine-aided translation, bilingual dictionary compilation, automatic bilingual term extraction, contrastive bilingual studies, and bilingual education is increasingly recognized. For machine translation and machine-aided translation, bilingual corpora are useful mainly in two ways.
On the one hand, a bilingual corpus can continuously supply translation examples to a translation engine based on translation memory. On the other hand, bilingual corpora are a buried treasure from which translation knowledge of various granularities can be mined for machine translation and machine-aided translation, and this knowledge can play a positive role in every stage of translation.

This thesis studies how to build a large-scale Web-based bilingual corpus, the retrieval mechanism for intelligent bilingual Web page search, and the implementation of the search system. With the help of general Web search engines, the bilingual corpus can be built automatically and machine translation can be realized. We make use of Internet robot technology, Web page filtering, sentence matching algorithms, data mining, word segmentation, bilingual matching, intelligent user interfaces, personalized search, meta-search, rank aggregation algorithms, text information retrieval, and Java programming. Our research can provide both bilingual Web pages for professionals and a high-quality translation service for general users. The system can improve translation quality and efficiency by avoiding manual translation, and thus has considerable commercial and social value.

The thesis is summarized as follows.

(1) Within large-scale Web information sources, we focus on finding Chinese-English bilingual Web pages, including single-page bilingual pages and paired double-page bilingual pages. For the latter, we propose a novel algorithm called DBWCM (Double Bilingual Webpage Corpus Mining); for the former, we design a step-by-step approach and propose two novel algorithms, IPSBW (Identification and Purification of the Single Bilingual Webpage) and BSMCM (Bilingual Sentences Matching and Corpus Mining). Using these algorithms, a large number of original texts and their corresponding translations are extracted from Web pages.
Thus we can build a large-scale bilingual corpus, which lays a solid foundation for aided translation.

(2) We investigate the current user interfaces of search engines. Based on concept search and latent semantic analysis (LSA), we build a bilingual synonym dictionary to expand search keywords, which suggests related keywords given the user's input keywords. In addition, our bilingual query expansion improves on traditional user interfaces and increases the intelligence and recall of bilingual aided search. To improve search precision, we also study how to learn and update users' preferences through explicit and implicit relevance analysis, and then personalize the search results. PEBK (Personalized Expansion of the Bilingual Keyword) and Personalized Sort of the Bilingual Results are two novel algorithms proposed for personalization that consider user context such as time, interests, and location.

(3) Meta-search is applied to bilingual translation to broaden search coverage. We analyze the rank aggregation algorithms of meta-search and point out problems with the PageRank algorithm. Moreover, we propose a novel algorithm to enhance rank aggregation, called RSBS (Results Sort of the Bilingual Search). Experimental results show that our algorithm is effective.

(4) Last, we build a bilingual search system on top of our bilingual corpus using Lucene and the Java programming language. Seven functional modules are devised in our retrieval system: the Internet robot module, the Web page identification and purification module, the bilingual Web page matching module, the index module, the retrieval module, the personalized search module, and the user interface module. Our bilingual search system can provide users with an aided translation service.

Intelligent bilingual aided translation search spans several research fields. It draws on knowledge from fields such as artificial intelligence, linguistics, machine translation, search engines, Web data mining, and databases.
This thesis proposes several effective Internet-based approaches to aided translation. Building an efficient and highly intelligent aided translation system still raises interesting topics that need further exploration.
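
The translation-memory lookup described in the abstract (search the corpus for the same or similar originals, then return their translations) can be sketched as follows. This is a minimal illustration and not the method of the thesis: the toy corpus, the function names `similarity` and `lookup`, the 0.6 threshold, and the use of Python's `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Hypothetical toy corpus of (original, translation) pairs. The thesis mines
# such pairs from bilingual Web pages; that step is not reproduced here.
CORPUS = [
    ("机器翻译", "machine translation"),
    ("搜索引擎", "search engine"),
    ("双语语料库", "bilingual corpus"),
]


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (an assumed stand-in measure)."""
    return SequenceMatcher(None, a, b).ratio()


def lookup(query, corpus=CORPUS, threshold=0.6):
    """Return (original, translation, score) tuples for corpus entries whose
    original is at least `threshold` similar to `query`, best match first."""
    scored = [(src, tgt, similarity(query, src)) for src, tgt in corpus]
    hits = [hit for hit in scored if hit[2] >= threshold]
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


if __name__ == "__main__":
    # A query that is similar, but not identical, to one stored original.
    for src, tgt, score in lookup("双语语料"):
        print(f"{src} -> {tgt} ({score:.2f})")
```

An exact match scores 1.0 and a near match scores lower, so the threshold decides how loosely "similar originals" is interpreted. A production system would more plausibly use an inverted index (the thesis names Lucene) than a linear scan.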
Keywords/Search Tags: Aided Translation, Search Engine, Bilingual Corpus, Translation Search