Research On The Construction Of Chinese-Vietnamese Bilingual Knowledge Graph Based On Wikipedia

Posted on: 2022-05-24    Degree: Master    Type: Thesis
Country: China    Candidate: P F Han    Full Text: PDF
GTID: 2518306524952189    Subject: Computer technology
Abstract/Summary:
Under the "One Belt, One Road" initiative, China and Vietnam are growing closer in political, economic, and other fields, and a shared knowledge base between the two countries would greatly promote bilateral cooperation. However, knowledge on the Internet is vast and non-standardized, and manually annotating knowledge in low-resource languages such as Vietnamese is expensive, so a public Chinese-Vietnamese knowledge base is still lacking. Because encyclopedia knowledge is clearly structured and rich in content, this thesis studies methods for constructing a Chinese-Vietnamese cross-language knowledge graph from encyclopedias. The goal is to build a unified classification system in which Chinese and Vietnamese knowledge are linked and complement each other, and on that basis to build a Chinese-Vietnamese knowledge base for knowledge sharing that supports online applications such as Chinese-Vietnamese information retrieval and Chinese-Vietnamese machine translation. The main research work is as follows:

(1) Most cross-language knowledge graph frameworks are built on Wikipedia, but Wikipedia contains few Chinese entities, which makes it difficult to build a large-scale cross-language knowledge graph with Chinese at its core. How to use an existing large-scale Chinese encyclopedia knowledge base such as Baidu Encyclopedia to assist the construction of a cross-language knowledge graph is therefore an urgent problem. However, Wikipedia and Baidu Encyclopedia use different classification systems, which widens the scope and raises the difficulty of cross-encyclopedia retrieval. To build a common classification system for Baidu Encyclopedia and Wikipedia, a variational semi-supervised classification method for Baidu Encyclopedia that integrates Wikipedia knowledge is proposed, which transfers and unifies the two classification systems. The body of an encyclopedia entry is long and structurally complex, while its abstract is simpler and more informative than the body, so the abstract of each entry is used for classification. Encyclopedia abstracts are largely similar in structure and repeated wording but differ in length, so deep semantic features and statistical features are fused to represent each abstract. The semantic features are extracted with word embeddings and an attention mechanism, and the statistical features are extracted with a bag-of-words model, which addresses the similar structure and varying length of the abstracts. Because the volume of encyclopedia data is huge, unsupervised methods have low classification accuracy, and manual annotation is costly, a small amount of annotated Wikipedia knowledge is used to perform semi-supervised classification of the massive Baidu Encyclopedia data. The experimental results show that this method can accurately establish a classification system for Baidu Encyclopedia and unify the classification systems of Wikipedia and Baidu Encyclopedia.

(2) Cross-language knowledge linking is the task of creating links between entities or articles in different languages, and it is the basis for building a cross-language knowledge graph. Current cross-language linking work is mostly based on Wikipedia, but cross-language links for low-resource languages are scarce. In particular, the small number of existing Chinese-Vietnamese cross-language links makes it difficult to build a Chinese-Vietnamese cross-language knowledge graph. To address this shortage, this thesis proposes an encyclopedia-based Chinese-Vietnamese bilingual knowledge linking method, which aligns and completes Chinese-Vietnamese bilingual encyclopedia knowledge and fills in data missing from Wikipedia with the massive data of Baidu Encyclopedia. The vectors trained in the classification task are reused to search for corresponding Chinese and Vietnamese entries across the encyclopedias, and equivalent entities in different languages are then linked by a learning-based method. Several features are defined on the link structure of Wikipedia to evaluate the similarity between two entities. The experimental results show that the method effectively improves the accuracy of cross-language knowledge alignment.

(3) A prototype retrieval system for the Chinese-Vietnamese bilingual knowledge graph is built. The system collects text from the encyclopedia websites, integrates the semi-supervised classification model and the knowledge linking model proposed in this thesis, automatically analyzes the data, and constructs the knowledge graph. Through its interface, it presents users with related same-language information for Chinese and Vietnamese knowledge, together with the corresponding cross-language entry information.
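
As an illustration of the link-structure features mentioned in (2), the following is a minimal sketch of one feature that could be defined this way: the Jaccard overlap between the outgoing links of a Chinese entry and a Vietnamese entry, compared after mapping Chinese link targets through cross-language links that are already known. The abstract does not say which features the thesis actually uses, so the choice of Jaccard overlap, the function name `link_overlap_similarity`, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, Set


def link_overlap_similarity(
    zh_links: Set[str],
    vi_links: Set[str],
    known_pairs: Dict[str, str],
) -> float:
    """Jaccard overlap of two entries' outgoing links, in Vietnamese title space.

    known_pairs maps a Chinese title to its already-linked Vietnamese title.
    Links without a known counterpart are ignored on both sides.
    (Hypothetical feature; not taken from the thesis.)
    """
    # Project the Chinese entry's links into Vietnamese titles via known pairs.
    mapped = {known_pairs[t] for t in zh_links if t in known_pairs}
    # Keep only Vietnamese links that some known pair points to.
    comparable = vi_links & set(known_pairs.values())
    union = mapped | comparable
    if not union:
        return 0.0  # no comparable evidence for this pair
    return len(mapped & comparable) / len(union)


# Toy data (illustrative only, not from the thesis).
known_pairs = {"越南": "Việt Nam", "亚洲": "Châu Á", "中国": "Trung Quốc"}
zh_hanoi = {"越南", "亚洲", "红河"}              # links of the Chinese entry 河内
vi_hanoi = {"Việt Nam", "Châu Á", "Sông Hồng"}  # links of the Vietnamese entry Hà Nội
vi_beijing = {"Trung Quốc", "Châu Á"}           # links of the Vietnamese entry Bắc Kinh

print(link_overlap_similarity(zh_hanoi, vi_hanoi, known_pairs))
print(link_overlap_similarity(zh_hanoi, vi_beijing, known_pairs))
```

In this toy case the matching pair scores higher than the mismatched one. A learned linker would typically combine several such scores, possibly with the embedding vectors from the classification step, rather than rely on any single feature.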
Keywords/Search Tags: Chinese-Vietnamese bilingual, semi-supervised classification, attention mechanism, transfer learning, knowledge linking, knowledge graph