Class discovery via feature selection in unsupervised settings

Posted on:2017-11-28

Degree:Ph.D

Type:Dissertation

University:Boston University

Candidate:Curtis, Jessica

GTID:1458390008466239

Subject:Statistics

Abstract/Summary:

Identifying genes linked to the appearance of certain types of cancers and their phenotypes is a well-known and challenging problem in bioinformatics. Discovering marker genes that, upon mutation, drive the proliferation of different types and subtypes of cancer is critical for developing advanced tests and therapies that specifically identify, target, and treat certain cancers. It is therefore crucial to find methods that successfully recover "cancer-critical genes" from the (usually much larger) set of all genes in the human genome.

We approach this problem statistically as a feature (or variable) selection problem for clustering, in the case where the important features are typically few (rare) and the signal of each important feature is typically small (weak). Genetic datasets typically consist of hundreds of samples (n), each with tens of thousands of gene-level measurements (p), resulting in the well-known statistical "large p, small n" problem. The class or cluster identification is based on the clinical information associated with the type or subtype of cancer (either known or unknown) for each individual. We discuss and develop novel feature ranking methods that complement and build upon current methods in the field. These ranking methods are used to select the features that contain the most significant information for clustering. Retaining only a small set of useful features based on this ranking both reduces data dimensionality and identifies a set of genes that are crucial to understanding cancer subtypes.

In this paper, we present an outline of cutting-edge feature selection methods and provide a detailed explanation of our own contributions to the field. We explain both the practical properties and theoretical advantages of the new tools that we have developed. Additionally, we present a well-developed case study that applies these new feature selection methods to different levels of genetic data in order to explore their practical implementation within the field of bioinformatics.
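
The general setting described in the abstract (rank features without class labels, keep a small top-ranked set, then cluster) can be illustrated with a small simulation. The sketch below is not the dissertation's method, which the abstract does not specify. It assumes a simple stand-in: features are ranked by sample variance, and samples are split into two groups by the sign of the leading singular vector of the selected columns. All function names, the data dimensions, and the mean shift `mu` are hypothetical choices for illustration, and the signal is made deliberately stronger than the "weak" regime so that the toy example behaves stably.

```python
import numpy as np


def simulate(n=100, p=2000, n_informative=20, mu=1.5, seed=0):
    """Toy 'large p, small n' data: two hidden classes, few informative features."""
    rng = np.random.default_rng(seed)
    labels = rng.permutation(np.arange(n) % 2)   # balanced hidden classes (0/1)
    X = rng.standard_normal((n, p))              # standard normal noise in every column
    signs = 2 * labels - 1                       # map classes to -1 / +1
    X[:, :n_informative] += mu * signs[:, None]  # mean shift on the informative columns only
    return X, labels


def rank_features(X):
    """Rank features by sample variance, highest first (an assumed stand-in score)."""
    scores = X.var(axis=0, ddof=1)
    return np.argsort(scores)[::-1]


def cluster_two_groups(X_selected):
    """Split samples by the sign of the leading left singular vector."""
    centered = X_selected - X_selected.mean(axis=0)
    U, _, _ = np.linalg.svd(centered, full_matrices=False)
    return (U[:, 0] > 0).astype(int)


def agreement(pred, truth):
    """Fraction of matching labels, allowing for the arbitrary label swap."""
    acc = np.mean(pred == truth)
    return max(acc, 1 - acc)


X, truth = simulate()
top = rank_features(X)[:20]            # keep only the 20 top-ranked features
pred = cluster_two_groups(X[:, top])   # cluster using the reduced data
print("informative features recovered:", int(np.sum(top < 20)), "of 20")
print("clustering agreement:", agreement(pred, truth))
```

Variance is used here only because a feature whose mean differs between two hidden classes has inflated overall variance. In the rare-and-weak regime the abstract targets, such a crude score would be expected to degrade, which is the motivation for more careful ranking methods.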

Keywords/Search Tags:

Feature selection, Methods, Genes, Problem

Related items

1	Research On Prediction Of Drought-Resistant Genes In Arabidopsis Thaliana Based On Microarray Data
2	Several Studies On Feature Selection Algorithms That Incorporate Pathway Information To Identify Relevant Genes
3	Research On Significant Genes Selection Method Based On PSO Algorithm
4	Feature selection methods for intelligent systems classifiers in healthcare
5	Feature Selection Algorithm Using SAL Framework
6	Mixed Sparsity Regularized Multi-view Unsupervised Feature Selection
7	Nonparametric Methods for Classification and Related Feature Selection Procedures
8	Genetic Algorithm-based Mixed Feature Selection Methods Research
9	Feature Selection In High-Dimensional Statistical Learning Problem
10	Feature Selection Based On Feature Curve Of Subclass Problem