Research On Context-Based Statistical Relational Learning

Posted on: 2006-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y H Tian
Full Text: PDF
GTID: 1118360185995687
Subject: Computer application technology
Abstract/Summary:
The vast majority of work in statistical machine learning has focused on "flat" data: data consisting of identically structured entities, typically assumed to be independent and identically distributed (IID). However, many real-world datasets are innately relational: hypertext, web pages and sites, web images, scientific papers, e-books, educational resources, and more. Such semi-structured relational data consist of entities of different types, where each entity is characterized by a different set of attributes and generally has complex internal structure. Entities are related to each other via different types of relations. This relational structure is an important source of semantic information that traditional statistical learning methods often ignore. The dissertation therefore focuses on how to explicitly exploit such relational information in statistical learning tasks so as to build more effective and more robust models.

The main methodology stems from context-based modeling and analysis. Here, context is defined as a collection of relevant objects and surrounding influences that make the semantics of an object unique and comprehensible. Accordingly, contextual dependency can be regarded as a special relationship among related objects that conveys explicit semantic correlation. Starting with an in-depth discussion of related work on context analysis methods and statistical relational learning, the dissertation investigates several statistical contextual learning methods in different application domains. The innovations and contributions are as follows:

First, the dissertation proposes a novel web site representation and mining algorithm using multiscale semantic models. In general, a web site can be regarded as a hypertext document with complex internal structure. The dissertation uses a multiscale tree as the representation model of web sites and proposes four kinds of context models to characterize the topical correlation among nodes in the multiscale site tree. Using this model, it presents two classification algorithms for web sites: an HMT-based two-phase classification algorithm and a multiscale classification algorithm. Both employ the hidden Markov tree (HMT) model as the statistical model of the tree-based data structure and explicitly exploit the contextual topical correlation among nodes to improve the classification accuracy of web sites. To further improve performance while reducing classification overhead, a two-stage denoising procedure is adopted to remove noise within sites, and an entropy-based strategy is introduced to dynamically prune the...
Keywords/Search Tags:Statistical relational learning, context models, multiscale mining, contextual dependency network models, linkage semantic kernels, influence models