Research On Approaches To Bilingual Knowledge Graph Construction From Social Web Sites

Posted on: 2019-06-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T X Wu
GTID: 1368330590475131
Subject: Software engineering
Abstract/Summary:
With the continuous development of the Semantic Web, many interlinked datasets containing tens of billions of RDF triples have been published on the World Wide Web (WWW). These datasets, also called knowledge graphs, are fundamental resources for semantic search, question answering, information analysis, and other intelligent applications. Constructing knowledge graphs has therefore become an important research topic in both academia and industry. Many methods for knowledge graph construction already exist, but they pay little attention to a very important source on the WWW: social web sites, including e-commerce, encyclopedia, question answering, blog, gaming, and travel sites.

Meanwhile, with the globalization of information, cross-lingual knowledge alignment has become a key technology for many cross-lingual applications (e.g., cross-lingual information retrieval and cross-lingual semantic annotation). However, because English is the most widely spoken language worldwide, English knowledge (including concepts, instances, and triples) dominates existing multilingual knowledge graphs. The relatively small amount of knowledge in other languages is one of the main obstacles to cross-lingual knowledge alignment. How to effectively construct a bilingual knowledge graph for any two given languages, i.e., construct a monolingual knowledge graph for each language and then align knowledge across the two, is therefore a research direction that urgently needs to be explored. Existing related work only studies how to construct a bilingual knowledge graph from online encyclopedias.

Based on the above discussion, this dissertation studies approaches to bilingual knowledge graph construction from social web sites. Since social web sites contain a large number of categories denoting concepts in taxonomies, as well as tags representing concepts in folksonomies, this dissertation adopts a top-down approach to
constructing a bilingual knowledge graph, proceeding from the schema level to the instance level. The first task is to mine relations among the concepts on social web sites. This task is called schema knowledge mining; existing methods for it rely on language-specific features and rules, so they do not generalize to other languages. Since cross-lingual knowledge alignment is one of the key tasks in bilingual knowledge graph construction, cross-lingual concept matching is chosen as the second task. Existing methods for it, however, depend heavily on string similarity after translation and on domain-specific information, so they do not generalize across domains and often give unsatisfactory matching performance. The third task uses instance type inference to add instance knowledge to the constructed bilingual knowledge graph, but existing methods again rely on language-specific rules and thus do not generalize to other languages.

To overcome the problems in these three tasks, this dissertation proposes the following solutions:

1) Schema knowledge mining: a new method that combines machine learning with rules, embedding the rules into the learning process. The method uses no language-specific features or rules, so it applies to any language. In experiments on English and Chinese social web sites, it outperforms the baselines in precision, recall, and F1-score on the benchmark, and it can help generate large-scale, high-quality English and Chinese schema knowledge.

2) Cross-lingual concept matching: a novel method based on bilingual topic models. It introduces two new bilingual topic models, each of which learns vector representations of concepts in different languages. The degree of similarity between two given
concepts in different languages is determined by the similarity of their vectors. The method uses no domain-specific information, so it can be applied to any domain. Experimental results show that it outperforms the baselines in both precision@1 and MRR (mean reciprocal rank) on two different kinds of English-Chinese taxonomies.

3) Instance type inference: a new method based on random walks. It performs random walks on a graph built from the extracted instances, attributes, and concepts, and computes the probability that a given concept is the type of a given instance. The method uses no language-specific rules, so it applies to any language. In experiments on the English and Chinese editions of Wikipedia, it outperforms the baselines in precision, recall, and F1-score on the benchmark, and it can help generate large-scale, high-quality English and Chinese type information.
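In solution 2), the similarity of two concepts in different languages is decided by the similarity of their vectors, and matching quality is reported as precision@1 and MRR. The sketch below illustrates only that ranking-and-evaluation step, assuming cosine similarity as the vector similarity measure. The bilingual topic models themselves are not reproduced: the concept vectors are hand-written hypothetical values standing in for learned representations, and the function names, concept names, and gold alignments are illustrative assumptions.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_candidates(source_vec, target_vecs):
    """Return target concept names sorted by descending similarity to source_vec."""
    return sorted(target_vecs,
                  key=lambda name: cosine(source_vec, target_vecs[name]),
                  reverse=True)

def precision_at_1_and_mrr(source_vecs, target_vecs, gold):
    """gold maps each source concept to its correct target concept."""
    hits, rr_sum = 0, 0.0
    for src, vec in source_vecs.items():
        ranking = rank_candidates(vec, target_vecs)
        rank = ranking.index(gold[src]) + 1  # 1-based rank of the correct match
        if rank == 1:
            hits += 1
        rr_sum += 1.0 / rank
    n = len(source_vecs)
    return hits / n, rr_sum / n

# Hypothetical 3-topic vectors for English and Chinese concepts
# (手机 = mobile phone, 小说 = novel, 旅游 = travel).
english = {
    "Mobile phones": [0.8, 0.1, 0.1],
    "Novels": [0.1, 0.7, 0.2],
    "Travel books": [0.1, 0.5, 0.4],
}
chinese = {
    "手机": [0.7, 0.2, 0.1],
    "小说": [0.2, 0.6, 0.2],
    "旅游": [0.1, 0.1, 0.8],
}
gold = {"Mobile phones": "手机", "Novels": "小说", "Travel books": "旅游"}

p1, mrr = precision_at_1_and_mrr(english, chinese, gold)
print(f"precision@1 = {p1:.2f}, MRR = {mrr:.2f}")  # prints: precision@1 = 0.67, MRR = 0.83
```

With these toy vectors, "Travel books" lies closer to 小说 than to its gold match 旅游, so precision@1 drops to 2/3, while MRR still gives half credit for the second-place correct match.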
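Solution 3) scores candidate types by random walks over a graph of instances, attributes, and concepts. The abstract does not specify how the graph is built or how the walk is parameterized, so the following is only a minimal sketch under stated assumptions: it uses a random walk with restart (personalized PageRank) from the untyped instance on a small hypothetical undirected graph, then normalizes the visiting probabilities over concept nodes into type scores. The node names, edges, restart probability of 0.15, and iteration count are all illustrative choices, not the dissertation's settings.

```python
from collections import defaultdict

def build_graph(edges):
    """Undirected adjacency lists from (node, node) pairs."""
    graph = defaultdict(list)
    for a, b in edges:
        graph[a].append(b)
        graph[b].append(a)
    return dict(graph)

def random_walk_with_restart(graph, start, restart=0.15, iterations=100):
    """Approximate visiting probabilities of a walk that restarts at `start`.

    At each step the walker jumps back to `start` with probability `restart`,
    otherwise it moves to a uniformly chosen neighbour.
    """
    prob = {n: 0.0 for n in graph}
    prob[start] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in graph}
        nxt[start] += restart
        for node, p in prob.items():
            neighbours = graph[node]
            if not neighbours:  # dangling node: send its mass back to start
                nxt[start] += (1 - restart) * p
                continue
            share = (1 - restart) * p / len(neighbours)
            for nb in neighbours:
                nxt[nb] += share
        prob = nxt
    return prob

def infer_types(graph, instance, concepts, **kwargs):
    """Rank candidate concepts for `instance` by normalized walk probability."""
    prob = random_walk_with_restart(graph, instance, **kwargs)
    total = sum(prob[c] for c in concepts)
    scores = {c: (prob[c] / total if total > 0 else 0.0) for c in concepts}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical graph: "Inception" is untyped; "The Matrix" is typed Film and
# "Dune" is typed Novel. Instances link to their attributes and known types.
edges = [
    ("Inception", "attr:director"), ("Inception", "attr:running time"),
    ("Inception", "attr:year"),
    ("The Matrix", "attr:director"), ("The Matrix", "attr:running time"),
    ("The Matrix", "attr:year"), ("The Matrix", "Film"),
    ("Dune", "attr:author"), ("Dune", "attr:pages"),
    ("Dune", "attr:year"), ("Dune", "Novel"),
]
graph = build_graph(edges)
print(infer_types(graph, "Inception", ["Film", "Novel"]))
```

Because "Inception" shares more attributes with the Film instance than with the Novel instance, more of the walk's probability mass reaches "Film", so it ranks first. Restarting at the query instance keeps the scores focused on that instance's neighbourhood rather than on globally central nodes.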
Keywords: Bilingual Knowledge Graph, Social Web Sites, Schema Knowledge Mining, Cross-Lingual Concept Matching, Instance Type Inference, Semantic Web