Large-Scale Chinese Taxonomy Construction And Semantic Search Service

Posted on: 2017-05-16
Degree: Master
Type: Thesis
Country: China
Candidate: J Y Li
Full Text: PDF
GTID: 2308330485472882
Subject: Software engineering
Abstract/Summary:
Chinese Wikipedia contains a large number of facts and is considered an important data source for constructing knowledge graphs. The taxonomy is a basic component of a knowledge graph: it specifies the types of entities and the relations between classes and entities. In existing knowledge graph systems, taxonomies are normally constructed either manually or automatically by language-dependent approaches. However, these automatic approaches are mostly designed for English and cannot be used directly to construct a Chinese taxonomy.

In this paper, we first analyze the titles and categories in Wikipedia and extract 12 effective features, from the perspectives of syntax, link structure and statistics, to train isA relation classification models. A Skip-gram model is used to learn word embeddings from the plain text of Wikipedia. We design Chinese patterns and heuristic rules to infer isA relations and high-level concepts, and we use a strategy based on association rule mining to establish isA relations between classes. We propose a bottom-up algorithm that constructs a Chinese taxonomy from individual isA relations, and we develop a Chinese taxonomy construction and search system (CTCS2) to provide semantic service.

We summarize the main research contributions of this paper as follows:

· We propose different methods to generate isA relations. A feature-based method is designed to extract isA relations from Chinese Wikipedia titles and categories. A two-stage learning method based on word embeddings is used to mine isA relations from Wikipedia articles. We also apply inference-based and mining-based approaches to generate isA relations that are not explicitly expressed in Wikipedia categories.

· We assemble these isA relations into a complete taxonomy by implementing the Chinese taxonomy construction algorithm. The algorithm uses three operations (node merging, cycle removal and subtree merging) to build the taxonomy in a bottom-up manner. We also develop a system (CTCS2) to provide semantic service.

· Our constructed Chinese taxonomy is large, containing 581,616 entities, 72,873 classes and 1,317,956 isA relations. We evaluate the taxonomy in terms of scale and accuracy. The experimental results show that it has high coverage with an accuracy of over 95%.
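The abstract names node merging, cycle removal and subtree merging but does not spell out how they work, so the following is only a minimal Python sketch of how isA pairs might be assembled into an acyclic graph. The function name `build_taxonomy`, the `(child, parent, score)` input format, the label normalization and the rule of skipping the lower-scored edge that would close a cycle are all assumptions made for illustration, not the thesis's actual algorithm.

```python
from collections import defaultdict


def build_taxonomy(isa_pairs):
    """Assemble (child, parent, score) triples into an acyclic isA graph.

    Hypothetical sketch: returns a dict mapping each parent label to the
    set of its direct children.
    """
    children = defaultdict(set)  # parent -> direct children
    parents = defaultdict(set)   # child  -> direct parents

    def reaches(start, target):
        # Walk upward along parent links and report whether target is found.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == target:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents[node])
        return False

    # Process higher-scored edges first (assumed policy), so that the edge
    # rejected for closing a cycle is a lower-scored one.
    for child, parent, _score in sorted(isa_pairs, key=lambda t: -t[2]):
        # Node merging (assumed): labels equal after stripping share a node.
        child, parent = child.strip(), parent.strip()
        # Cycle removal: skip the edge if child is already an ancestor of parent.
        if child == parent or reaches(parent, child):
            continue
        # Subtrees become linked implicitly once they share a node.
        children[parent].add(child)
        parents[child].add(parent)
    return dict(children)


pairs = [
    ("dog", "mammal", 0.9),
    ("mammal", "animal", 0.8),
    ("animal", "dog", 0.3),   # would close a cycle, so it is skipped
    (" dog ", "pet", 0.7),    # merged with the existing "dog" node
]
taxonomy = build_taxonomy(pairs)
```

In this sketch the result is a directed acyclic graph rather than a strict tree, since a node such as "dog" may keep more than one parent; whether the thesis's taxonomy permits multiple parents is not stated in the abstract.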
Keywords/Search Tags:Knowledge graph, Taxonomy, Wikipedia, isA relation mining