Construction of a bilingual vocabulary is a foundational task in natural language processing. A bilingual vocabulary directly affects the performance of NLP systems such as machine translation (MT) and cross-language information retrieval (CLIR). Person names, place names, organization names, technical terms and all kinds of new words appear frequently, and no bilingual dictionary, however large, can include them all. Such words are called out-of-vocabulary (OOV) words, and new ones keep appearing over time. For systems such as MT and CLIR to keep pace, their bilingual lexicons must be updated continually with translations of new OOV words. Generally speaking, building a bilingual vocabulary requires solving two problems: (1) acquiring OOV words and (2) acquiring their translations. This thesis studies both problems.

We acquire OOV words by identifying base phrases. We first mark the base phrases in the text and then treat every base phrase that does not appear in the dictionary as a candidate OOV word. Base phrases are identified with a chunking model. Our experiments use English text, but the method is language-independent.

Many OOV words are transliterations, and their translations can be acquired with dedicated methods such as a transliteration model. The first step, however, is to identify which words are transliterations. We propose two statistical models for transliteration identification, and experiments show that their precision exceeds 97%. We also study how to distinguish literally translated words from freely translated words, using a maximum entropy classifier with word-formation features, and draw several useful conclusions from the results.

Finally, we investigate how to mine OOV translations from comparable corpora. After computing the similarity
between the contexts of a source-language word and a target-language word, we determine whether the two words form a translation pair. Context similarity is computed in two ways: with the Dice coefficient and with a document retrieval model.
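The context-similarity step can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation. It assumes that a context is the set of words within a fixed window around each occurrence of a word. It also assumes that source-language context words are mapped into the target language through a seed bilingual dictionary before comparison. Only the Dice variant is shown; the document retrieval model is not sketched. All function names, the window size, the seed dictionary and the toy sentences are hypothetical.

```python
# Hypothetical sketch: rank target-language candidates for a source-language
# OOV word by the Dice coefficient of their context-word sets.

def context_words(tokens, word, window=3):
    """Set of words within `window` positions of every occurrence of `word`."""
    ctx = set()
    for i, tok in enumerate(tokens):
        if tok == word:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            # Exclude the occurrence itself; keep its neighbours.
            ctx.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return ctx

def translate_context(ctx, seed_dict):
    """Assumption: map source context words to target words via a seed
    dictionary; words without a dictionary entry are dropped."""
    out = set()
    for w in ctx:
        out.update(seed_dict.get(w, ()))
    return out

def dice(a, b):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|); 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))

def rank_candidates(src_tokens, src_word, tgt_tokens, candidates, seed_dict, window=3):
    """Return (candidate, score) pairs sorted by descending Dice similarity."""
    src_ctx = translate_context(context_words(src_tokens, src_word, window), seed_dict)
    scores = {c: dice(src_ctx, context_words(tgt_tokens, c, window)) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Toy data (hypothetical): an English sentence and a segmented Chinese sentence.
src = "yesterday obama visited china and met the president".split()
tgt = ["昨天", "奥巴马", "访问", "中国", "并", "会见", "主席"]
seed = {"yesterday": ["昨天"], "visited": ["访问"], "china": ["中国"]}

ranking = rank_candidates(src, "obama", tgt, ["奥巴马", "会见"], seed)
print(ranking)  # 奥巴马 scores 6/7, 会见 scores 4/7
```

In a real comparable-corpus setting, contexts would be aggregated over many documents and usually weighted (for example by frequency or association strength) rather than treated as plain sets. The set-based Dice score is kept here only because it is the simplest form of the measure.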