Distributed Queries On Massive Knowledge Graphs

Posted on: 2016-02-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J H Jin
GTID: 1108330488457741
Subject: Computer application technology
Abstract/Summary:
In the era of Big Data, the volume of collected data has reached the zettabyte scale. To improve the quality of query results, many search engines use knowledge graphs as data sources. A knowledge graph is a huge network of real-world entities such as places, people, cities, and movies. With knowledge graphs, search engines can infer the relationships among entities and return higher-quality results to users. Because current knowledge graphs are automatically extracted from online knowledge bases such as Wikipedia, they are both noisy and massive. These characteristics make it challenging to query knowledge graphs efficiently. This dissertation therefore aims to provide efficient and effective techniques for querying knowledge graphs.
Existing query techniques usually model a knowledge graph query as a subgraph matching problem. Although several recent works address this problem, they still have some deficiencies. First, most existing models look only for answers that exactly match the user's query. Because knowledge graphs are inherently incomplete and noisy, these models cannot be applied directly. Second, some recent works propose sophisticated indices to accelerate query processing. However, these graph indices are nearly infeasible for knowledge graphs because they require expensive preprocessing. Third, knowledge graphs are so massive that they must be stored and queried in a distributed manner, yet existing distributed graph computation platforms cannot perform knowledge graph queries efficiently. It is therefore important to develop new models, algorithms, and platforms that address these issues.
This dissertation studies the query model, the distributed query algorithm, and the platform, and designs effective and efficient query processing techniques for noisy, massive knowledge graphs.
Specifically, we first propose a query model that tolerates the noise in knowledge graphs by finding the subgraphs most similar to a given query graph. We then design a distributed top-k query algorithm for billion-node knowledge graphs, which accelerates query processing with a novel bounding technique. Next, we investigate optimizations for data storage and job scheduling that improve the efficiency of the query platform in real environments. Finally, we present a prototype knowledge graph search engine for bibliographic knowledge graphs and use it to evaluate the effectiveness of the approaches proposed in this dissertation.
Overall, this dissertation explores a series of efficient and effective distributed top-k query techniques for massive knowledge graphs. As knowledge graphs grow in popularity, the proposed techniques can be applied in areas such as business, finance, and the life sciences, where they may be of considerable value.
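The sketch below illustrates the general idea of top-k subgraph similarity matching with bound-based pruning on a toy bibliography graph. It is not the dissertation's algorithm: the triples, the star-shaped query, the edge-overlap similarity score, and the predicate-based upper bound are all hypothetical choices made for illustration, and everything runs on a single machine rather than a distributed platform. Candidates are visited in descending order of an optimistic upper bound, and the scan stops as soon as no remaining candidate can beat the current k-th best score.

```python
from collections import defaultdict
from fractions import Fraction
import heapq

# Hypothetical toy knowledge graph as (subject, predicate, object) triples.
TRIPLES = [
    ("paper1", "author", "Alice"),
    ("paper1", "venue", "VLDB"),
    ("paper1", "year", "2015"),
    ("paper2", "author", "Alice"),
    ("paper2", "venue", "SIGMOD"),
    ("paper3", "author", "Bob"),
    ("paper3", "venue", "VLDB"),
    ("paper3", "year", "2015"),
]

# Hypothetical star-shaped query: find ?x with these (predicate, object) edges.
QUERY = [("author", "Alice"), ("venue", "VLDB"), ("year", "2015")]


def build_index(triples):
    """Map each subject node to the set of its outgoing (predicate, object) edges."""
    edges = defaultdict(set)
    for s, p, o in triples:
        edges[s].add((p, o))
    return edges


def upper_bound(node_edges, query):
    """Optimistic score: query edges whose predicate the node has at all.

    An exactly matched (predicate, object) edge also matches on its predicate,
    so this never underestimates exact_score (an illustrative bound only).
    """
    preds = {p for p, _ in node_edges}
    return sum(1 for p, _ in query if p in preds)


def exact_score(node_edges, query):
    """Number of query edges matched exactly by the node's outgoing edges."""
    return sum(1 for edge in query if edge in node_edges)


def top_k_similar(triples, query, k):
    """Return the k most similar nodes and how many candidates were scored."""
    if k <= 0:
        return [], 0
    edges = build_index(triples)
    # Visit candidates in descending order of their upper bound.
    candidates = sorted(edges, key=lambda n: (-upper_bound(edges[n], query), n))
    heap = []  # min-heap of (score, node) holding the current top-k
    evaluated = 0
    for node in candidates:
        bound = upper_bound(edges[node], query)
        # Remaining bounds are no larger, so none can beat the k-th best score.
        if len(heap) == k and bound <= heap[0][0]:
            break
        score = exact_score(edges[node], query)
        evaluated += 1
        if len(heap) < k:
            heapq.heappush(heap, (score, node))
        elif score > heap[0][0]:
            heapq.heapreplace(heap, (score, node))
    ranked = sorted(heap, key=lambda t: (-t[0], t[1]))
    # Normalize match counts to a similarity in [0, 1].
    return [(node, Fraction(score, len(query))) for score, node in ranked], evaluated


if __name__ == "__main__":
    print(top_k_similar(TRIPLES, QUERY, 2))
```

With k = 2, the sketch returns paper1 (similarity 1) and paper3 (similarity 2/3) after scoring only two candidates; paper2 is pruned because its bound of 2/3 cannot exceed the current second-best score. An exact matcher would return only paper1, whereas the similarity ranking also surfaces paper3, whose author edge does not match, which is the kind of near-miss answer an incomplete or noisy graph calls for.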
Keywords/Search Tags: Knowledge Graph, Top-k Query, Subgraph Similarity Matching, Big Data Processing, Distributed System