The Research On Imputation Algorithm Of Missing Values For Gene Expression Data

Posted on:2006-07-26

Degree:Master

Type:Thesis

Country:China

Candidate:T Yang

Full Text:PDF

GTID:2178360185465382

Subject:Computer software and theory

Abstract/Summary:

DNA microarray technology makes it possible to monitor the expression levels of thousands of genes simultaneously under different conditions, and how to analyse the resulting data is one of the most active problems in bioinformatics. For various reasons, however, gene expression microarray experiments often produce multiple missing values, which may affect downstream analysis. Many algorithms for gene expression analysis handle missing values poorly and may produce incorrect results because of even a few of them. Missing value estimation is therefore an important preprocessing step in bioinformatics data mining.

Weighted K-nearest-neighbor imputation is a classical algorithm for gene expression data, but it does not take correlations between genes into account. In this paper, a new imputation method based on the Mahalanobis distance is proposed to estimate missing values in gene expression data sets. The nearest neighbors are chosen on the basis of the Mahalanobis distances between genes, which exploit the correlations between genes, and their weight factors are then determined by the Shannon entropy. The algorithm can select nearest neighboring genes and the corresponding weight factors more accurately, so that it estimates missing microarray data more accurately under a variety of conditions.

The Fuzzy C-Means algorithm (FCM) is a widely used clustering algorithm that has recently been applied to the analysis of gene expression data. We apply it to missing value estimation because it may do well in dealing with co-expression and correlations among multiple genes. An imputation method based on the Fuzzy C-Means algorithm (FCMimpute) is developed to estimate missing values in microarray data. In FCMimpute, clustering parameters are determined adaptively for each data set, and the data are then clustered by FCM so that missing entries can be estimated from the clustering results.
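
The idea of estimating missing entries from fuzzy clustering results can be illustrated with a minimal sketch. It is not the thesis's FCMimpute: the abstract does not say how the clustering parameters are chosen adaptively, so the number of clusters `c` and the fuzzifier `m` are fixed arguments here, and the names `fcm_impute` and `memberships` are hypothetical. The sketch assumes a genes x conditions matrix with NaN marking missing entries, at least `c` complete genes, and at least one observed value per gene.

```python
import numpy as np

def memberships(points, centers, m):
    """Standard FCM membership of each point in each cluster (rows sum to 1)."""
    d = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)

def fcm_impute(X, c=3, m=2.0, n_iter=100, seed=0):
    """Fill NaNs using fuzzy C-means centroids fitted on the complete genes."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    complete = X[~miss.any(axis=1)]
    rng = np.random.default_rng(seed)
    # ASSUMPTION: centroids start from randomly chosen complete genes,
    # and c and m are fixed rather than chosen adaptively.
    centers = complete[rng.choice(len(complete), size=c, replace=False)]
    for _ in range(n_iter):
        um = memberships(complete, centers, m) ** m
        centers = (um.T @ complete) / um.sum(axis=0)[:, None]
    filled = X.copy()
    for i in np.where(miss.any(axis=1))[0]:
        obs = ~miss[i]
        # Memberships of gene i computed from its observed conditions only.
        u = memberships(X[i:i + 1, obs], centers[:, obs], m)[0]
        # Missing entries become the membership-weighted centroid values.
        filled[i, ~obs] = u @ centers[:, ~obs]
    return filled
```

Because each imputed value is a weighted average of centroid coordinates, a gene that belongs partly to several co-expression groups receives an estimate blended across those groups rather than one taken from a single hard cluster.
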
Experimental results show that FCMimpute is a feasible and efficient way to estimate missing values in gene expression data. In addition, because its clustering parameters are determined adaptively, FCMimpute can improve clustering correctness. FCMimpute is therefore an effective imputation method that generates reliable imputed values.

The Fuzzy C-Means clustering algorithm is very sensitive to the situation of the...
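
The Mahalanobis-distance neighbor selection described earlier in the abstract can also be sketched in a few lines. This is an illustration under stated assumptions, not the thesis's algorithm: the abstract does not specify how the Shannon entropy sets the weight factors, so plain inverse-distance weights are substituted here, and the function name `mahalanobis_knn_impute` is hypothetical. The sketch assumes a genes x conditions matrix with NaN for missing entries, at least two complete genes, and at least one observed value per gene.

```python
import numpy as np

def mahalanobis_knn_impute(X, k=3):
    """Fill NaNs from the k complete genes nearest in Mahalanobis distance."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]      # candidate neighbor genes
    cov = np.cov(complete, rowvar=False)        # covariance between conditions
    for i in np.where(np.isnan(X).any(axis=1))[0]:
        obs = ~np.isnan(X[i])                   # observed conditions of gene i
        # Inverse covariance restricted to observed conditions
        # (pseudo-inverse, in case the matrix is singular).
        vi = np.linalg.pinv(cov[np.ix_(obs, obs)])
        diff = complete[:, obs] - X[i, obs]
        sq = np.einsum("ij,jk,ik->i", diff, vi, diff)
        d = np.sqrt(np.maximum(sq, 0.0))        # guard against tiny negatives
        nn = np.argsort(d)[:k]
        # ASSUMPTION: inverse-distance weights stand in for the
        # entropy-based weight factors, which the abstract does not define.
        w = 1.0 / (d[nn] + 1e-12)
        w /= w.sum()
        filled[i, ~obs] = w @ complete[nn][:, ~obs]
    return filled
```

Unlike the Euclidean distance, the Mahalanobis distance rescales differences by the covariance between conditions, so two genes that differ along a strongly correlated direction are not penalized twice for what is effectively the same variation.
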

Keywords/Search Tags:

Gene Expression Data, Missing Value, Mahalanobis Distance, Fuzzy C-Means Algorithm, Iterated Local Search

Related items

1	Fuzzy C-means Clustering Algorithm Based On Mahalanobis Distance For Compositional Data
2	Application And Research Of The Fuzzy C-Means Clustering In Gene Express Data
3	Research On Generalized Mahalanobis Distances And Its Application In Data Mining
4	Research On Fuzzy Clustering Algorithm Of Gene Expression Data
5	Improvement Of Genetic Clustering Algorithm And Its Application In Gene Expression Data Analysis
6	Research On Facial Expression Recognition
7	Research On Relevant Problems Of DNA Microarray Expression Data Analysis
8	The Research On Fuzzy Clustering Algorithm Based On Mahalanobis Distance
9	Gene Microarray Data Analysis Based On Clustering Algorithms
10	Iterated Local Search Algorithms For The Resource Constrained Project Scheduling Problem