Research On Key Technologies Of Web-Based Construction Of Bilingual Vocabulary

Posted on:2010-10-31

Degree:Master

Type:Thesis

Country:China

Candidate:L Guo

Full Text:PDF

GTID:2178360275958663

Subject:Computer software and theory

Abstract/Summary:

Constructing bilingual vocabularies is foundational work in natural language processing (NLP). A bilingual vocabulary directly affects the performance of NLP systems such as machine translation (MT) and cross-language information retrieval (CLIR). Person names, place names, organization names, technical terms, and other new words appear frequently, and no bilingual dictionary, however large, can cover them all. Such words are called out-of-vocabulary (OOV) words, and new ones keep appearing over time. For an NLP system such as MT or CLIR to keep pace, its bilingual lexicon must be continually updated with translations of new OOV words. Broadly, constructing a bilingual vocabulary requires solving two problems: (1) acquiring OOV words and (2) acquiring their translations. This thesis studies both.

We acquire OOV words with a technique based on base phrase identification. Base phrases are first marked in the text, and any base phrase that does not appear in the dictionary is treated as a candidate OOV word. Base phrases are identified with a chunking model. Although our experiments use English, the method is language-independent.

Many OOV words are transliterations, and their translations can be acquired with dedicated methods such as a transliteration model. The first step, however, is to identify which words are transliterations. We propose two statistical models for identifying transliterated words; experiments show a precision above 97%. We also study how to distinguish literal translation words from free translation words, using a maximum entropy classifier with word-formation features, and draw several useful conclusions.

Finally, we study how to mine OOV translations from comparable corpora. By computing the similarity between the context of a source-language word and the context of a target-language word, we determine whether the two words form a translation pair. Context similarity is computed with two measures, respectively: the Dice coefficient and a document retrieval model.
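
The candidate-OOV step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes the chunking model has already labeled each token with CoNLL-2000-style IOB chunk tags (`B-NP`, `I-NP`, `O`, and so on), treats only noun-phrase chunks as base phrases, and uses a hypothetical lowercase Python set in place of the bilingual dictionary. The names `base_phrases` and `candidate_oov` are illustrative only.

```python
def base_phrases(tagged):
    """Group (token, IOB tag) pairs into base noun phrases.

    Assumption: chunker output uses B-NP to start and I-NP to continue
    a noun-phrase chunk; every other tag closes the current chunk.
    """
    phrases, current = [], []
    for token, tag in tagged:
        if tag == "B-NP":
            if current:
                phrases.append(" ".join(current))
            current = [token]
        elif tag == "I-NP":
            # Tolerate an I-NP with no preceding B-NP by starting a new chunk.
            current.append(token)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def candidate_oov(tagged, dictionary):
    """Return base phrases absent from the (hypothetical) dictionary, deduplicated."""
    seen, candidates = set(), []
    for phrase in base_phrases(tagged):
        key = phrase.lower()  # assumption: dictionary entries are lowercase
        if key not in dictionary and key not in seen:
            seen.add(key)
            candidates.append(phrase)
    return candidates


# Toy chunker output and dictionary, invented for illustration.
tagged = [("Researchers", "B-NP"), ("tested", "B-VP"), ("quantum", "B-NP"),
          ("annealing", "I-NP"), ("in", "B-PP"), ("Geneva", "B-NP"), (".", "O")]
dictionary = {"researchers", "geneva"}

print(candidate_oov(tagged, dictionary))  # ['quantum annealing']
```

Restricting candidates to noun-phrase chunks is a simplifying assumption, motivated by the fact that the OOV types listed above (names, terms) are typically nominal; a practical system might also strip determiners or filter candidates by frequency.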

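The Dice-based context comparison used for translation mining can be illustrated with a small sketch. Several details are assumptions rather than the thesis's settings: contexts are sets of words within a fixed window (`window=2`, hypothetical), source-language context words are mapped into the target language through a hypothetical seed dictionary `seed_dict`, and the Dice coefficient 2|A∩B| / (|A| + |B|) is computed over the resulting sets. The alternative document-retrieval similarity is not covered here.

```python
def context_words(tokens, word, window=2):
    """Collect the words within `window` positions of every occurrence of `word`."""
    ctx = set()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx.update(t for t in tokens[lo:hi] if t != word)
    return ctx


def translate_context(ctx, seed_dict):
    """Map source context words into the target language (hypothetical seed dictionary)."""
    out = set()
    for w in ctx:
        out.update(seed_dict.get(w, ()))  # words missing from seed_dict are dropped
    return out


def dice(a, b):
    """Dice coefficient between two sets: 2|A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(src_word, src_tokens, tgt_tokens, candidates, seed_dict, window=2):
    """Score each target candidate by context similarity to src_word, best first."""
    src_ctx = translate_context(context_words(src_tokens, src_word, window), seed_dict)
    scores = {c: dice(src_ctx, context_words(tgt_tokens, c, window)) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy comparable "corpora" and seed dictionary, invented for illustration.
src_tokens = ["president", "Obama", "visited", "Beijing", "today"]
tgt_tokens = ["总统", "奥巴马", "访问", "北京", "。", "布什", "访问", "东京"]
seed_dict = {"president": ["总统"], "visited": ["访问"], "Beijing": ["北京"], "today": ["今天"]}

ranked = rank_candidates("Obama", src_tokens, tgt_tokens, ["奥巴马", "布什"], seed_dict)
print(ranked)  # 奥巴马 scores 1.0; 布什 scores 4/7
```

A candidate whose context overlaps most with the translated source context is ranked highest; in practice a threshold on the score, rather than simply taking the top candidate, would decide whether a pair is accepted.
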
Keywords/Search Tags:

identification of transliteration word, identification of the literal translation word, translation mining, identification of base phrase, translation pairs extraction

Related items

1	The Research On English-Chinese Name Entity Translation
2	Word Pair Extraction And Web-based Mining Of OOV Translations
3	Research On Chinese Complex Noun Phrase Translation Extraction Based On Multi-strategy
4	Study On Several Key Problems In The Training Process Of Phrase-based Statistical Machine Translation
5	Research On Key Technologies Of English-Chinese Machine Translation System
6	Research And Implementation Of Hierarchical Phrase-based Translation Model In Statistical Machine Translation
7	Research And Implementation Of Hierarchical Phrase-Based Translation Model In Statistical Machine Translation
8	Research On Extraction Of Bilingual Multi-word Term Translation Pairs From Comparable Corpora
9	Research On Text Generation Technology Oriented To Template Based Machine Translation
10	Study On Word Alignment For Re-ordering Of Web-mined OOV Translation Candidates