
Academic Bilingual Resource Research Based On Web Paper Library

Posted on: 2009-12-14 | Degree: Master | Type: Thesis
Country: China | Candidate: X L Liao | Full Text: PDF
GTID: 2178360272986744 | Subject: Computer application technology
Abstract/Summary:
Bilingual resources play a special role in computational linguistics research and provide strong support for research and applications such as machine translation, bilingual lexicography, terminology extraction, and cross-language information retrieval. Research on bilingual resources faces three questions: how to obtain bilingual resources, how to process them, and how to build applications with them. This thesis discusses and addresses these questions for academic bilingual resources.

A web paper library is a natural source of academic bilingual resources. This thesis designed and implemented a crawler for a web paper library. The crawler crawled the library incrementally and efficiently collected academic bilingual resources to build a dynamically updated academic bilingual resource corpus. The thesis then discussed how to align sentences in this corpus. It implemented the classic statistics-based sentence alignment algorithm and made a series of improvements, including not treating colons as sentence-ending punctuation, choosing a better sentence-pair evaluation function, incorporating the keyword information of academic bilingual resources, and increasing the number of match patterns processed. These improvements significantly increased the accuracy and recall of the sentence alignment algorithm. After sentence alignment, the academic bilingual resources were stored as XML files. Finally, the thesis built a phrase-based statistical machine translation system on the academic bilingual resource corpus, which confirmed the usability of the academic bilingual resources.

Research on academic bilingual resources based on a web paper library offers a new perspective on the three questions of acquiring, processing, and applying bilingual resources. Solving these questions more effectively is the direction of future research.
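
The abstract does not spell out the alignment algorithm, so the following is only a minimal sketch of what a length-based, dynamic-programming sentence aligner with several match patterns might look like. The pattern penalties, the `length_cost` function, and the `ratio` parameter (expected target length per source character) are all hypothetical values chosen for illustration; they are not the evaluation function or parameters used in the thesis.

```python
import math

# Allowed match patterns (source sentences, target sentences) mapped to
# hypothetical penalties. Increasing the set of patterns lets the aligner
# handle merged, split, inserted, or deleted sentences.
PATTERNS = {
    (1, 1): 0.0,
    (1, 0): 4.0,
    (0, 1): 4.0,
    (2, 1): 1.5,
    (1, 2): 1.5,
    (2, 2): 3.0,
}


def length_cost(src_len, tgt_len, ratio=1.0):
    """Illustrative cost: normalised gap between expected and actual target length."""
    expected = src_len * ratio
    return abs(expected - tgt_len) / math.sqrt(expected + tgt_len + 1)


def align(src, tgt, ratio=1.0):
    """Return a list of (source_sentences, target_sentences) pairs with minimal total cost."""
    n, m = len(src), len(tgt)
    inf = float("inf")
    best = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0

    for i in range(n + 1):
        for j in range(m + 1):
            if best[i][j] == inf:
                continue
            for (di, dj), penalty in PATTERNS.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s_len = sum(len(s) for s in src[i:ni])
                t_len = sum(len(t) for t in tgt[j:nj])
                cost = best[i][j] + penalty + length_cost(s_len, t_len, ratio)
                if cost < best[ni][nj]:
                    best[ni][nj] = cost
                    back[ni][nj] = (i, j)

    # Walk the back-pointers from the end to recover the aligned groups.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs
```

Calling `align(source_sentences, target_sentences, ratio=...)` returns groups of sentences that the sketch considers mutual translations. For a language pair such as Chinese and English, `ratio` would need to be estimated from data, and a keyword-based term could be added to the cost, but both are left out here because the abstract gives no details.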
Keywords/Search Tags: bilingual resources, crawler, sentence alignment, machine translation