Data mining techniques to enable large-scale exploratory analysis of heterogeneous scientific data

Posted on:2010-05-22

Degree:Ph.D

Type:Thesis

University:North Carolina State University

Candidate:Chopra, Pankaj

GTID:2448390002988882

Subject:Biology

Abstract/Summary:

Recent advances in microarray technology have enabled scientists to simultaneously gather data on thousands of genes. However, due to the complexity of genetic interactions, the functions of many genes remain unclear. The cause and progression of many diseases, such as cancer and Alzheimer's, are increasingly being attributed to the deregulation of critical genetic pathways. Data mining is now used extensively on biological datasets to infer gene function and to identify genetic biomarkers for disease prognosis and treatment. There is a considerable need to design algorithms that explore and interpret the underlying microarray data from a biological perspective.

In this thesis, three areas of data mining in biological datasets are addressed. First, a new clustering algorithm has been designed that explores data from different biological perspectives. Most conventional clustering algorithms generate one set of clusters, irrespective of the biological context of the analysis. The new model generates multiple sets of clusters from a single dataset, each of which highlights a different biological context. Second, a new classification algorithm has been designed that uses gene pairings for cancer classification. This exploits the concept that gene pairs may be a better metric for cancer classification than single genes. Third, a meta-analysis of human and mouse cancer datasets is integrated with existing knowledge to highlight pathways that are closely associated with cancer.

Keywords/Search Tags:

Data, Gene, Cancer

Related items

1	Application Of Improved Biclustering Method To Cancer Gene Expression Data
2	A Research And Analysis On The Relationship Of Cancer Gene And Their Related MicroRNA
3	Clustering Analysis Of Cancer Gene Expression Data Based On Manifold Learning
4	Research Of Co-clustering Algorithms For Cancer Subtypes Discovery Based On Gene Expression Data
5	Research On Algorithms For The Cancer Differential Gene Expression In Gene Microarray
6	Dimensional Reduction Research And Application On The Cancer Gene Data
7	Gene Selection And Cancer Classification Based On Optimization Algorithm And Support Vector Machine
8	Data mining techniques to enable large-scale exploratory analysis of heterogeneous scientific data
9	Cancer Gene Expression Data Attribute Partial Order Representation And Knowledge Discovery
10	Tumor Gene Expression Data Analysis Based On Nonnegative Matrix Factorization