Domain knowledge, uncertainty, and parameter constraints

Posted on: 2011-07-19
Degree: Ph.D
Type: Thesis
University: Georgia Institute of Technology
Candidate: Mao, Yi
Full Text: PDF
GTID: 2448390002458299
Subject: Computer Science
Abstract/Summary:
This thesis identifies three major issues in incorporating domain knowledge into supervised learning and discusses tentative attempts to solve them.

Domain knowledge is usually provided in a form that is not necessarily compatible with the supervised learning algorithm. In statistical modeling, it is common practice to assume that data are generated i.i.d. from some distribution p_θ parameterized by θ ∈ R^n. Learning algorithms then try to identify the true parameter θ_true given a set of examples. It is therefore natural to expect that domain knowledge must relate directly to the parameters θ to be useful. However, users tend to specify their knowledge as probabilities of certain events rather than as constraints on parameter values, and it is not obvious how to convert between the two, especially for conditional models such as conditional random fields (CRFs). To overcome this difficulty, Chapter 3 develops a systematic way to obtain domain-dependent priors through probability elicitation and to incorporate them through parameter space regularization for conditional models. This leads to isotonic CRFs, which are variants of CRFs with isotonic constraints over the parameter space. Chapter 3 applies isotonic CRFs to sentiment prediction and information extraction, and demonstrates their promise for modelling local sentiment flow, analyzing authors' writing styles, and summarizing document content.

Domain knowledge provided by humans often holds only with some degree of uncertainty. The uncertainty may arise when the knowledge of an expert changes over time. This has been demonstrated in a recent study showing that the ratings in the Netflix dataset are strongly affected by the time at which users provided them. The uncertainty can be even more pronounced when the domain knowledge is obtained implicitly by interpreting user feedback such as clicks. A click on a search result does not necessarily mean that the result is relevant to the query; the click can be random. In view of this, Chapter 4 proposes to model domain knowledge uncertainty explicitly by specifying the probability with which the knowledge is expected to hold, and to incorporate both the domain knowledge and its uncertainty into the learning process within a hierarchical Bayes framework. In contrast to hard parameter constraints, the approach is effective even when the domain knowledge is inaccurate, and it generally results in superior modelling accuracy. It therefore makes it possible to incorporate non-traditional types of knowledge, such as information from trained classifiers, whose use has been severely limited by their imperfect accuracy.

Standard approaches to incorporating domain knowledge admit information only at the initial stage and allow no user feedback afterwards. This contrasts with the belief that users may provide better knowledge if they are informed of intermediate learning results. For example, in web search, one may build a webpage ranking model based on users' click feedback. While the model is in operation, new click feedback is collected and the model should be refined accordingly. It is therefore essential to provide users with a visual summary of the available information and to allow them to provide feedback in real time. This requires both an efficient learning procedure and the ability to support effective user interactions. Chapter 5 addresses this problem in the context of metric learning for text documents, where users specify word similarity information on the fly. The problem is approached via learning techniques such as online updates and Bregman projection. The effort leads to an improved metric for documents and fosters better visual understanding of text corpora.
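
The abstract does not spell out how the online update and Bregman projection of Chapter 5 are carried out. As a rough illustration only, the sketch below shows one well-known instance of the idea: a LogDet Bregman projection of a symmetric positive definite metric matrix A onto a single "these two words should be close" constraint. The function names, the one-hot word vectors, and the threshold u are all assumptions made for this example; they are not taken from the thesis.

```python
def mat_vec(A, v):
    """Return the matrix-vector product A v for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def sq_dist(A, x, y):
    """Squared distance (x - y)^T A (x - y) under the metric matrix A."""
    v = [a - b for a, b in zip(x, y)]
    return sum(vi * wi for vi, wi in zip(v, mat_vec(A, v)))


def project_similar(A, x, y, u):
    """Project A onto {A : (x - y)^T A (x - y) <= u} under the LogDet divergence.

    Assumes A is symmetric positive definite and u > 0 (hypothetical setup).
    If the constraint already holds, A is returned unchanged (as a copy).
    Otherwise the closed-form rank-one update A + beta * (A v)(A v)^T is
    applied, with beta chosen so the new distance equals u exactly.
    """
    v = [a - b for a, b in zip(x, y)]
    Av = mat_vec(A, v)
    p = sum(vi * wi for vi, wi in zip(v, Av))  # current squared distance
    if p <= u:
        return [row[:] for row in A]
    beta = (u - p) / (p * p)  # negative here, so the distance shrinks
    n = len(A)
    return [[A[i][j] + beta * Av[i] * Av[j] for j in range(n)] for i in range(n)]


# Hypothetical three-word vocabulary with one-hot word vectors.
car, auto, tree = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # start from the identity

# One piece of user feedback: "car" and "auto" should be within distance 0.5.
A_new = project_similar(A, car, auto, u=0.5)
print(sq_dist(A, car, auto), sq_dist(A_new, car, auto))
```

Each new piece of feedback triggers one cheap rank-one update, which is why a projection of this kind suits interactive use. Handling many constraints at once would require cycling through them with correction terms, which this sketch omits.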
Keywords/Search Tags: Domain knowledge, Parameter, Uncertainty, Constraints