
Research On Bilingual Lexicon Construction Between Chinese And English From Comparable Corpora

Posted on: 2013-03-27
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
Full Text: PDF
GTID: 2248330371993525
Subject: Computer application technology
Abstract/Summary:
Bilingual lexicons are a fundamental link in cross-language information processing. They are an important resource in natural language processing and play an important role in applications such as machine translation and cross-language information retrieval. After an in-depth study of existing bilingual lexicon construction methods, this paper proposes a dependency relationship mapping model to improve bilingual lexicon construction (BLC) between Chinese and English from comparable corpora. The contributions are as follows:

1) Implementing a baseline BLC system based on a dependency context model. Following the traditional dependency context model, this paper takes the words that lie within a fixed window of a word in the dependency tree as that word's context, and uses these contexts to construct lexicons between Chinese and English. The effects of window size, association strength metric, and similarity calculation method are also compared.

2) Proposing a dependency relationship mapping model for BLC. To overcome the disadvantages of the dependency context model, this paper proposes a dependency relationship mapping-based method, which constructs bilingual lexicons by mapping dependency context words and their dependency relationships simultaneously.

3) Investigating methods for automatically acquiring and optimizing the dependency relationship mapping. To avoid the limitations of manually derived dependency mapping rules, this paper proposes an automatic method for acquiring the rules, which are then filtered by an ablation testing algorithm. The weights of these rules are learned automatically with perceptron algorithms to improve performance and adaptability.

Experimental results on bilingual lexicon construction between Chinese and English show that the dependency relationship mapping model significantly improves BLC performance in both the Chinese-English and English-Chinese directions. In addition, the automatic acquisition and filtering methods effectively pick out the key rules from all dependency mapping rules. Weight learning with perceptron algorithms then further improves BLC performance and enhances the adaptability of the dependency mapping model.
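The idea of mapping context words and their dependency relationships together can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The relation labels (`VOB`, `ATT`, `dobj`, `amod`), the seed dictionary, the rule weights, and all function names are hypothetical. Raw co-occurrence counts stand in for an association strength metric, and cosine is just one possible similarity measure. The sketch assumes the corpora are already parsed into (word, relation, context word) triples.

```python
import math
from collections import Counter, defaultdict


def build_vectors(triples):
    """Count (relation, context_word) features for every word."""
    vectors = defaultdict(Counter)
    for word, relation, context_word in triples:
        vectors[word][(relation, context_word)] += 1
    return vectors


def project(src_vector, seed_dict, rel_map):
    """Map a source-language vector into the target feature space.

    A feature survives only if its context word has a seed translation
    AND its relation has a mapping rule; each rule carries a weight.
    """
    projected = Counter()
    for (relation, context_word), value in src_vector.items():
        for tgt_word in seed_dict.get(context_word, ()):
            for tgt_relation, weight in rel_map.get(relation, {}).items():
                projected[(tgt_relation, tgt_word)] += value * weight
    return projected


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(value * b.get(feature, 0.0) for feature, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_translations(word, src_vectors, tgt_vectors, seed_dict, rel_map):
    """Rank target words by similarity to the projected source vector."""
    projected = project(src_vectors[word], seed_dict, rel_map)
    scored = [(tgt, cosine(projected, vec)) for tgt, vec in tgt_vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy data, invented for illustration only.
zh_vectors = build_vectors([("苹果", "VOB", "吃"), ("苹果", "ATT", "红")])
en_vectors = build_vectors([
    ("apple", "dobj", "eat"), ("apple", "amod", "red"),
    ("car", "dobj", "drive"), ("car", "amod", "red"),
])
seed_dict = {"吃": ["eat"], "红": ["red"], "开": ["drive"]}
# Hypothetical mapping rules: source relation -> {target relation: weight}.
rel_map = {"VOB": {"dobj": 1.0}, "ATT": {"amod": 1.0}}

for candidate, score in rank_translations(
        "苹果", zh_vectors, en_vectors, seed_dict, rel_map):
    print(candidate, round(score, 3))
```

On this toy data "apple" ranks above "car", because both of its relation-tagged features match the projected vector while "car" shares only one. In a setup like the one the abstract describes, the weights in `rel_map` would be learned rather than fixed at 1.0.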
Keywords/Search Tags: Bilingual Lexicon Construction, Dependency Context, Dependency Relationship Mapping, Perceptron Algorithm