Data mining in tree-based models and large-scale contingency tables

Posted on:2006-06-16

Degree:Ph.D

Type:Thesis

University:Georgia Institute of Technology

Candidate:Kim, Seoung Bum

GTID:2458390008470759

Subject:Engineering

Abstract/Summary:

This thesis is composed of two parts. The first pertains to tree-based models; the second deals with multiple testing in large-scale contingency tables. Tree-based models have gained enormous popularity in statistical modeling and data mining. We propose a novel tree-pruning algorithm called the frontier-based tree-pruning algorithm (FBP). The new method has an order of computational complexity comparable to that of cost-complexity pruning (CCP), and it provides a full spectrum of information regarding tree pruning. A numerical study on real data sets reveals a surprise: in the complexity-penalization approach, most of the tree sizes are inadmissible. FBP facilitates a more faithful implementation of cross-validation, which is favored by simulations.

One of the most common test procedures using two-way contingency tables is the test of independence between two categorizations. Current test procedures such as chi-square or likelihood ratio tests assess overall independence but provide limited information about the nature of the association in contingency tables. We propose an approach for testing independence of categories in individual cells of contingency tables based on a multiple testing framework. We then employ the proposed method to identify the patterns of pair-wise associations between amino acids involved in beta-sheet bridges of proteins. We identify a number of amino acid pairs that exhibit either strong or weak association. These patterns provide useful information for algorithms that predict secondary and tertiary structures of proteins.
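To make the idea of cell-level testing concrete, here is a minimal sketch in Python. It is not the procedure developed in the thesis, which the abstract does not specify. As stand-ins, it assumes adjusted standardized residuals as the per-cell statistic, a normal approximation for the p-values, and the Benjamini-Hochberg step-up rule as the multiple testing correction. The function name `cellwise_independence_tests` and the `alpha` parameter are hypothetical.

```python
import math


def cellwise_independence_tests(table, alpha=0.05):
    """Test each cell of a two-way table for departure from independence.

    Assumptions (not taken from the thesis): adjusted standardized
    residuals as the test statistic, a normal approximation for the
    p-values, and Benjamini-Hochberg control of the false discovery rate.
    """
    n_rows, n_cols = len(table), len(table[0])
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(n_rows)) for j in range(n_cols)]
    n = sum(row_tot)

    results = []
    for i in range(n_rows):
        for j in range(n_cols):
            # Expected count under independence of rows and columns.
            expected = row_tot[i] * col_tot[j] / n
            # Variance of (observed - expected) for the adjusted residual.
            var = expected * (1 - row_tot[i] / n) * (1 - col_tot[j] / n)
            z = (table[i][j] - expected) / math.sqrt(var)
            # Two-sided p-value from the standard normal distribution.
            p = math.erfc(abs(z) / math.sqrt(2))
            results.append({"cell": (i, j), "z": z, "p": p})

    # Benjamini-Hochberg: find the largest rank k with p_(k) <= alpha * k / m
    # and reject the k smallest p-values.
    m = len(results)
    ranked = sorted(results, key=lambda r: r["p"])
    cutoff = 0
    for k, r in enumerate(ranked, start=1):
        if r["p"] <= alpha * k / m:
            cutoff = k
    for k, r in enumerate(ranked, start=1):
        r["reject"] = k <= cutoff
    return results


if __name__ == "__main__":
    # Toy counts, made up for illustration only.
    for r in cellwise_independence_tests([[30, 10], [10, 30]]):
        print(r["cell"], round(r["z"], 3), r["reject"])
```

A positive `z` marks a cell that occurs more often than independence predicts, and a negative `z` one that occurs less often. Residuals within a table are correlated, so the Benjamini-Hochberg guarantee holds only approximately here; a procedure designed for dependent tests may be more appropriate in practice.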

Keywords/Search Tags:

Tree-based models, Contingency tables, Data

Related items

1	Sampling contingency tables given sets of marginals and/or conditionals in the context of statistical disclosure limitation
2	Statistical tools for disclosure limitation in multi-way contingency tables
3	Latent tree models: An application and an extension
4	The Research And Application Of Nested Tables Modeling In Data Source Aggregation
5	Research On Foreign Key Detection Algorithm For Web Tables
6	Approach To Domain Ontology For Contingency Plan Based On Relational Data Model
7	Research On The Index Of Massive High Dimensional Data Via Multiple Hash Tables
8	Design And Implementation Of Emergency Command System For Digitizing Prison Based On Contingency Plans
9	Using Bayesian regression tree models and remotely sensed data to characterize recent environmental change in Alaska, USA
10	The Design And Implementation Of Information Extraction Engine On Web Tables