Web-scale knowledge-base construction via statistical inference and learning

Posted on: 2013-12-31
Degree: Ph.D.
Type: Dissertation
University: The University of Wisconsin - Madison
Candidate: Niu, Feng
Full Text: PDF
GTID: 1458390008977052
Subject: Engineering
Abstract/Summary:
Knowledge-base construction (KBC) is the process of populating a knowledge base (KB) with facts (or assertions) extracted from text. Bearing the promise of being a key technology of next-generation information systems, KBC has garnered tremendous interest from both academia and industry. A general trend in state-of-the-art KBC systems is the use of statistical inference and learning, which allow a KBC system to combine a wide range of data resources and techniques. In particular, two general techniques have gained significant interest from KBC researchers: the distant supervision technique for statistical learning, and the Markov logic framework for statistical inference.

This dissertation examines the application of distant supervision and Markov logic to web-scale KBC. Specifically, to fill a gap in the literature, we perform a systematic study on distant supervision to evaluate the impact of input sizes on the quality of KBC, hence providing guidelines for KBC system builders. While Markov logic has been shown to be effective for many text-understanding applications including KBC, the scalability of statistical inference in Markov logic remains a critical challenge. Inspired by ideas from data management and optimization, we propose two novel approaches that scale up Markov logic by orders of magnitude. Furthermore, we encapsulate our research findings into a general-purpose KBC system called Elementary, and deploy it to build a demonstration called DeepDive that reads hundreds of millions of web pages to enhance Wikipedia. Based on the above contributions, this dissertation shows that the distant supervision technique for statistical learning and the Markov logic framework for statistical inference are indeed effective approaches to web-scale KBC.
Keywords/Search Tags: KBC, Statistical inference, Markov logic, Web-scale, Distant supervision