Logic Knowledge Base Refinement Using Unlabeled or Limited Labeled Data

Posted on:2011-10-09

Degree:Ph.D

Type:Dissertation

University:The Chinese University of Hong Kong (Hong Kong)

Candidate:Chan, Ki Cecia

Full Text:PDF

GTID:1448390002455530

Subject:Computer Science

Abstract/Summary:


In many text mining applications, knowledge bases incorporating expert knowledge are beneficial for intelligent decision making. Refining an existing knowledge base from a source domain for a different target domain that solves the same task would greatly reduce the effort required to prepare labeled training data for constructing a new knowledge base. We investigate a new framework for refining a kind of logic knowledge base known as Markov Logic Networks (MLN). One characteristic of this adaptation problem is that, since the data distributions of the two domains are different, each domain should have its own tailor-made MLN. On the other hand, the two knowledge bases should share a certain amount of similarity because they serve the same goal. We investigate the refinement in two situations, namely, using unlabeled target domain data, and using a limited amount of labeled target domain data.

When no manual labels are given for the target domain data, we refine an existing MLN via two components. The first component is logic formula weight adaptation, which jointly maximizes the likelihood of the observations in the unlabeled target domain data and considers the differences between the two domains. Two approaches are designed to capture these differences. One approach analyzes the distribution divergence between the two domains, and the other incorporates a penalized degree of difference. The second component is logic formula refinement, where logic formulae specific to the target domain are discovered to further capture the characteristics of the target domain.

When manual annotation of a limited amount of target domain data is possible, we explore how to actively select the data for annotation and develop two active learning approaches. The first approach is a pool-based active learning approach that takes into account the differences between the source and the target domains. A theoretical analysis of the sampling bound of the approach is conducted to demonstrate that informative data can be actively selected. The second approach is an error-driven approach designed to provide estimated labels for the target domain, so that the quality of the logic formulae captured for the target domain can be improved. An error analysis of the cluster-based active learning approach is presented. We have conducted extensive experiments on two different text mining tasks, namely, pronoun resolution and segmentation of citation records, showing consistent improvements both when using unlabeled target domain data and when using a limited amount of labeled target domain data.
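
As a rough illustration of the weight adaptation component, the standard MLN distribution and one possible penalized objective can be written out. This is a sketch under stated assumptions, not the dissertation's actual formulation. The first line is the usual definition of an MLN with formulae F_i and weights w_i, where n_i(x) counts the true groundings of F_i in a possible world x. The second line is only one plausible reading of "a penalized degree of difference": the quadratic penalty, the trade-off parameter \lambda, and the symbols w^S (source weights), w^T (adapted target weights), x_obs (observed target evidence), and y (unlabeled query variables) are all assumptions introduced here for illustration.

```latex
% Standard MLN distribution over possible worlds x
P_w(X = x) = \frac{1}{Z(w)} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z(w) = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)

% Assumed form of penalized weight adaptation on unlabeled target data:
% maximize the likelihood of the observations (hidden labels y summed out)
% while discouraging large departures from the source-domain weights w^S
w^{T} = \arg\max_{w} \; \log \sum_{y} P_w(x_{\mathrm{obs}}, y)
        \; - \; \lambda \, \lVert w - w^{S} \rVert_2^2
```

Under this assumed form, a larger \lambda keeps the target MLN closer to the source MLN, reflecting the shared task, while a smaller \lambda lets the weights follow the target domain's data distribution more freely.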

Keywords/Search Tags:

Data, Knowledge base, Target domain, Using unlabeled, Limited, Logic, Refinement, Approach


Related items

1	Research On The Methods Of Domain Semantic Knowledge Base Construction And Knowledge Service
2	An integrated approach to rule refinement for instructable knowledge-based agents
3	Design And Realization Of Domain Specific Knowledge Base Extraction Syste
4	The Research On Method Optimization Of Data Cleaning In The Construction Of Agricultural Domain Knowledge Base
5	Construction Of System Engineering Documents Based Domain Knowledge Base
6	Design And Building Of The Domain-specific Knowledge Base System For Internet Videos
7	Research On Integrating First-Order Logical Domain Knowledge With Machine Learning
8	Data Mining Techniques Guided By Domain Knowledge And Its Application In Extracting Of Traditional Chinese Pharmacy
9	A knowledge-based approach for generating target system specifications from a domain model
10	Partial Order Based Layout Strategy For Large-scale Knowledge Base Completion