Research On Key Technologies Of Web-Based Construction Of Bilingual Vocabulary

Posted on: 2010-10-31
Degree: Master
Type: Thesis
Country: China
Candidate: L Guo
Full Text: PDF
GTID: 2178360275958663
Subject: Computer software and theory
Abstract/Summary:
Constructing a bilingual vocabulary is foundational work in natural language processing. The bilingual vocabulary directly influences the performance of natural language processing (NLP) systems such as machine translation (MT) and cross-language information retrieval (CLIR). Words such as person names, place names, organization names, technical terms, and new words of all kinds appear frequently, and no bilingual dictionary, however large, can include them all. Such words are called out-of-vocabulary (OOV) words. New OOV words keep appearing over time, so the bilingual lexicon of an NLP system such as MT or CLIR needs to be constantly updated with new OOV translations. Generally speaking, constructing a bilingual vocabulary requires solving two problems: (1) acquiring OOV words, and (2) acquiring the translations of those OOV words. This thesis studies both.

We acquire OOV words by identifying base phrases. We first mark the base phrases in the text and then treat every base phrase that does not appear in the dictionary as a candidate OOV word. Base phrases are identified with a chunking model. We apply the method to English text, but the method itself is language-independent.

Many OOV words are transliterations, and their translations can be acquired with dedicated methods, for example a transliteration model. The first step, however, is to identify the transliterated words. We propose two statistical models for identifying transliterated words; experiments show a precision of more than 97%. We also study the identification of literally translated words and freely translated words, using a maximum entropy model as the classifier and word-formation as the feature, and draw some useful conclusions.

Finally, we study how to mine OOV translations from comparable corpora. After computing the similarity between the context of a source-language word and the context of a target-language word, we can decide whether the two words form a translation pair. We compute the context similarity with the Dice coefficient and with a document retrieval model, respectively.
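The context-similarity step can be illustrated with a minimal sketch. It is not the thesis's implementation: the window size, the set-based form of the Dice coefficient, the seed dictionary used to map source-context words into the target language, and all function names and toy data (including the romanized target tokens) are assumptions made for illustration. The document retrieval variant is not shown.

```python
def context_words(tokens, target, window=2):
    """Collect the set of words within `window` positions of any occurrence of `target`."""
    context = set()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        context.update(tokens[j] for j in range(lo, hi) if j != i)
    return context


def map_context(context, seed_dict):
    """Map source-context words into the target language (assumed seed dictionary).

    Words missing from the seed dictionary are dropped.
    """
    return {seed_dict[w] for w in context if w in seed_dict}


def dice(a, b):
    """Set-based Dice coefficient: 2 * |A & B| / (|A| + |B|)."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_candidates(src_tokens, src_word, tgt_tokens, candidates, seed_dict, window=2):
    """Score each target-language candidate against the source word's context."""
    src_ctx = map_context(context_words(src_tokens, src_word, window), seed_dict)
    scored = [(dice(src_ctx, context_words(tgt_tokens, c, window)), c) for c in candidates]
    return sorted(scored, reverse=True)


# Toy comparable "corpora" (hypothetical data; target tokens are romanized placeholders).
src_tokens = "the new firewall blocks network traffic".split()
tgt_tokens = "xin fanghuoqiang zuzhi wangluo liuliang".split()
seed_dict = {"new": "xin", "blocks": "zuzhi", "network": "wangluo", "traffic": "liuliang"}

ranking = rank_candidates(
    src_tokens, "firewall", tgt_tokens, ["fanghuoqiang", "liuliang"], seed_dict
)
print(ranking)
```

In this toy example the candidate whose context overlaps most with the mapped source context ranks first. A real system would aggregate contexts over many documents and would likely weight context words rather than treat them as plain sets.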
Keywords/Search Tags:identification of transliteration word, identification of the literal translation word, translation mining, identification of base phrase, translation pairs extraction