Research On The Automatic Obtainment Of Chinese Semantic Knowledge Via Cross-Lingual Projection

Posted on: 2017-08-17
Degree: Master
Type: Thesis
Country: China
Candidate: X Q Li
GTID: 2348330509957114
Subject: Computer science and technology
Abstract/Summary:
A semantic knowledge base is a structured database that stores the relationships between entities. Semantic knowledge bases have become an active research topic in both academia and industry, and they matter in practical applications such as semantic search and question answering. However, existing semantic knowledge bases are expressed in English only, and Chinese semantic knowledge bases remain small in scale. Building a Chinese semantic knowledge base is therefore very important for researchers working on Chinese natural language processing.

Machine translation provides a way to translate a sentence automatically from a source language to a target language, and it is generally used for the automatic translation of natural language. We present a statistical machine translation approach that translates existing knowledge bases not available in Chinese in order to build a Chinese knowledge base. Unlike sentence-level machine translation, our approach translates the labels of objects from the source language into Chinese.

Knowledge base translation is a very difficult task, because a knowledge base contains highly specific vocabulary that may be out of vocabulary in existing bilingual parallel corpora. We therefore propose an approach that uses the source-side labels of objects to extract a parallel corpus from the Internet and construct the training data. In addition, because Baidu Encyclopedia is a large Chinese knowledge base, we use its bilingual labels to augment knowledge base translation.

The hierarchical structure of a semantic knowledge base can be used to improve translation performance. We introduce a transliteration feature that uses the type and properties of an object to translate out-of-vocabulary (OOV) labels of person objects in the knowledge base. In addition, to make use of entity attributes in the knowledge base, we add a genre feature that improves the translation of OOV named entities.

The labels of objects in a knowledge base usually consist of only a few words and so cannot express enough contextual information, which makes it very difficult to translate the labels for a specific domain. Taking the graph structure of the knowledge base into account, we expand each label with its properties and the other objects connected to it. To improve translation accuracy for domain-specific knowledge bases, we present topic-adapted probabilistic phrase translation features, with the topic distribution of each label trained on the expanded label. We use Latent Dirichlet Allocation (LDA) to compute the topic distributions.
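The label expansion and topic-adapted translation feature described above can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so the code below assumes one common choice: a phrase translation probability conditioned on a topic, mixed by the label's topic distribution, p(e | f, label) = Σ_k p(e | f, k) · p(k | label). The function names, the toy graph, the topic-conditioned phrase table, and the topic distribution (which in the thesis would come from LDA) are all hypothetical and chosen only for illustration.

```python
def expand_label(entity, graph, labels):
    """Build a pseudo-document for an entity: its own label plus the
    property names and labels of the objects directly connected to it.
    (Hypothetical helper; the thesis does not specify this interface.)"""
    words = labels[entity].lower().split()
    for prop, neighbor in graph.get(entity, []):
        words.extend(prop.lower().split())
        words.extend(labels.get(neighbor, "").lower().split())
    return words


def topic_adapted_prob(source, target, topic_dist, phrase_table):
    """Assumed mixture: sum over topics k of p(target | source, k) * p(k | label)."""
    return sum(
        p_topic * phrase_table.get((source, target, k), 0.0)
        for k, p_topic in enumerate(topic_dist)
    )


# Toy knowledge graph: entity -> list of (property, connected entity).
graph = {"Q1": [("manufacturer", "Q2")]}
labels = {"Q1": "Jaguar", "Q2": "Tata Motors"}

# The expanded label is what a topic model would see instead of the bare label.
expanded = expand_label("Q1", graph, labels)

# Made-up topic-conditioned phrase table: topic 0 = animals, topic 1 = cars.
phrase_table = {
    ("jaguar", "美洲豹", 0): 0.9,
    ("jaguar", "捷豹", 0): 0.1,
    ("jaguar", "美洲豹", 1): 0.2,
    ("jaguar", "捷豹", 1): 0.8,
}

# Assumed topic distribution for the expanded label (would come from LDA).
topic_dist = [0.25, 0.75]

p_car = topic_adapted_prob("jaguar", "捷豹", topic_dist, phrase_table)
p_animal = topic_adapted_prob("jaguar", "美洲豹", topic_dist, phrase_table)
print(expanded, p_car, p_animal)
```

With these toy numbers the car-brand translation scores about 0.625 against about 0.375 for the animal sense, because the connected "manufacturer" context shifts the assumed topic distribution toward the car topic. A bare label "Jaguar" with a uniform topic distribution would not separate the two senses as clearly.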
Keywords/Search Tags: knowledge base, statistical machine translation, bilingual corpus, out of vocabulary, topic model