
Chinese-Slavic Mongolian Named Entity Translation Based on Word Alignment

Posted on: 2016-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: P Yang
GTID: 2308330461983058
Subject: Computer Science and Technology
Abstract:
Named entity recognition and translation are important factors affecting the performance of statistical machine translation. Research on Slavic Mongolian named entity recognition is still scarce, and existing work mostly relies on rule-based methods. Such methods require manually annotating Slavic Mongolian corpora and hand-writing rules, which is time-consuming and makes it difficult to cover all named entities. To address this problem, this thesis proposes a method for automatically extracting parallel Chinese-Slavic Mongolian named entities.

We implement Chinese named entity recognition based on conditional random fields (CRF), focusing on two key issues: the granularity of the recognition unit and the choice of features, both of which we determine through extensive experiments. The experiments show that word-based recognition performs better, and that the most useful features are the context words, word segmentation and part-of-speech features, and the prefix and suffix words characteristic of each entity type. The resulting model achieves an average F-score of 91.67 across the Chinese entity classes.

We also present a Chinese-Slavic Mongolian named entity translation framework that uses an asymmetric entity alignment strategy, in which named entities are recognized only on the Chinese side. First, the Chinese named entities are recognized. Next, candidate Chinese-Slavic Mongolian named entity pairs are extracted from the Chinese-Slavic Mongolian word alignment results using a sliding window. The confidence of each candidate translation is then estimated from the corpus using three kinds of features: word alignment consistency, lexical translation probability, and a language model score.
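As an illustration of the word-based CRF feature set described above, the following Python sketch builds one feature dictionary per token from segmented, POS-tagged Chinese text, in the list-of-dicts form that CRF toolkits such as sklearn-crfsuite accept. It is a sketch under assumptions, not the thesis's feature templates: the context window size, the feature names, and the `ENTITY_AFFIXES` word list are all hypothetical.

```python
from typing import Dict, List, Tuple, Union

Feature = Union[str, bool, float]

# Hypothetical indicator words that often precede or end Chinese entity names
# (e.g. titles before person names, "大学" ending organization names).
ENTITY_AFFIXES = {"先生", "总统", "省", "市", "县", "大学", "公司", "银行"}


def word_features(tokens: List[Tuple[str, str]], i: int, window: int = 2) -> Dict[str, Feature]:
    """Features for token i of a segmented sentence given as (word, POS) pairs."""
    word, pos = tokens[i]
    feats: Dict[str, Feature] = {
        "bias": 1.0,
        "w[0]": word,
        "pos[0]": pos,
        "affix[0]": word in ENTITY_AFFIXES,
    }
    # Context words, their POS tags, and affix flags within +/- `window` positions.
    for off in range(-window, window + 1):
        if off == 0:
            continue
        j = i + off
        if 0 <= j < len(tokens):
            w, p = tokens[j]
            feats[f"w[{off}]"] = w
            feats[f"pos[{off}]"] = p
            feats[f"affix[{off}]"] = w in ENTITY_AFFIXES
        else:
            feats[f"w[{off}]"] = "<PAD>"  # position falls outside the sentence
    return feats


def sentence_features(tokens: List[Tuple[str, str]]) -> List[Dict[str, Feature]]:
    """One feature dict per token, in sentence order."""
    return [word_features(tokens, i) for i in range(len(tokens))]
```

For example, `sentence_features([("北京", "ns"), ("大学", "n")])` flags 大学 with `affix[0] = True` and gives 北京 the right-context feature `w[1] = "大学"`. Sequences of such dicts, paired with per-token entity labels (for instance in a BIO scheme), are what a CRF trainer would consume.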
Finally, the high-confidence candidates are selected as the final extraction results. The extracted Chinese-Slavic Mongolian named entity translation pairs reach an accuracy of 81.54%.
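The alignment-based extraction and selection steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the sliding window is modeled as every target span whose boundaries lie within a hypothetical `slack` of the span projected through the word alignment, the three confidence features are combined with hypothetical linear `weights`, the acceptance `threshold` is hypothetical, and the lexical translation table and language model are supplied by the caller.

```python
from typing import Callable, Dict, List, Optional, Set, Tuple

Span = Tuple[int, int]            # half-open token span [start, end)
Alignment = Set[Tuple[int, int]]  # word alignment links (zh_index, mn_index)


def candidate_spans(zh_span: Span, alignment: Alignment, mn_len: int, slack: int = 2) -> List[Span]:
    """Sliding-window candidates around the Mongolian projection of a Chinese entity."""
    start, end = zh_span
    projected = [m for z, m in alignment if start <= z < end]
    if not projected:
        return []  # entity is unaligned: no candidates
    lo, hi = min(projected), max(projected) + 1
    spans = []
    # Let each boundary slide up to `slack` tokens around the projected span.
    for s in range(max(0, lo - slack), min(mn_len, lo + slack) + 1):
        for e in range(max(1, hi - slack), min(mn_len, hi + slack) + 1):
            if s < e:
                spans.append((s, e))
    return spans


def alignment_consistency(zh_span: Span, mn_span: Span, alignment: Alignment) -> float:
    """Fraction of links touching either span that fall inside both spans."""
    touching = inside = 0
    for z, m in alignment:
        in_zh = zh_span[0] <= z < zh_span[1]
        in_mn = mn_span[0] <= m < mn_span[1]
        if in_zh or in_mn:
            touching += 1
            if in_zh and in_mn:
                inside += 1
    return inside / touching if touching else 0.0


def lexical_score(zh_words: List[str], mn_words: List[str],
                  lex_prob: Dict[Tuple[str, str], float]) -> float:
    """Average over target words of the best p(mn | zh) from the entity's Chinese words."""
    total = sum(max(lex_prob.get((z, m), 0.0) for z in zh_words) for m in mn_words)
    return total / len(mn_words)


def translate_entity(zh_tokens: List[str], mn_tokens: List[str], alignment: Alignment,
                     zh_span: Span, lex_prob: Dict[Tuple[str, str], float],
                     lm_logprob: Callable[[List[str]], float],
                     weights=(1.0, 1.0, 0.1), threshold: float = 0.5,
                     slack: int = 2) -> Optional[Tuple[float, Span, str]]:
    """Return (confidence, span, text) of the best candidate, or None if below threshold."""
    zh_words = zh_tokens[zh_span[0]:zh_span[1]]
    best = None
    for s, e in candidate_spans(zh_span, alignment, len(mn_tokens), slack):
        mn_words = mn_tokens[s:e]
        feats = (alignment_consistency(zh_span, (s, e), alignment),
                 lexical_score(zh_words, mn_words, lex_prob),
                 lm_logprob(mn_words) / len(mn_words))  # per-word LM log-probability
        score = sum(w * f for w, f in zip(weights, feats))
        if best is None or score > best[0]:
            best = (score, (s, e), " ".join(mn_words))
    if best is None or best[0] < threshold:
        return None
    return best
```

With a toy sentence pair such as 我 在 北京 大学 / Би Бээжингийн их сургуульд байна, a small lexical table, and a stub language model that always returns 0.0, the span covering exactly the aligned words (Бээжингийн их сургуульд) gets perfect alignment consistency and wins. In practice the language model would be trained on monolingual Slavic Mongolian text and the weights tuned on held-out data.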
Keywords: Named entity recognition, Named entity translation, CRF, Word alignment