| DNA microarray technology enables computational analysis of gene expression data for the early diagnosis of diseases such as cancer. These data contain the expression values of thousands of genes in an organism’s genome. However, gene expression data are very high-dimensional: each dimension corresponds to one gene in the genome, and very few of these genes are associated with a given disease. At the same time, the number of available samples is very small compared with the number of genes. Selecting the genes that are relevant to the disease under study is therefore an important task that has been studied extensively. Existing feature gene selection algorithms for gene expression profile data focus mainly on the choice of gene measurement method and give little consideration to interactions between genes when accounting for redundancy between them. In this thesis, we propose PRFS, a gene selection method based on PageRank. We first use the maximal information coefficient (MIC) to build a gene network in which nodes are genes and directed edges represent redundancy relationships between genes. We then compute PageRank on the network and assign each gene a score that measures its redundancy with respect to a specific subset of genes. Finally, this redundancy measure is combined with the MRMR algorithm. Because the network and PageRank are global in nature, our method can better measure the relationship between candidate genes and the selected gene subset. Extensive experiments on five microarray datasets are conducted to verify the effectiveness of the proposed method. |
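
The three steps described above (gene network, PageRank-based redundancy score, combination with an MRMR-style criterion) can be illustrated with a minimal sketch. It is not the PRFS implementation: the abstract does not specify how edges are directed, how PageRank is conditioned on the selected subset, or how the score enters MRMR. The sketch therefore assumes that absolute Pearson correlation stands in for the maximal information coefficient, that each redundancy link is represented by two directed edges, that conditioning on the selected genes is done with a personalized PageRank restart vector, and that relevance and redundancy are combined by subtraction. The function names, the `threshold` parameter, and the damping factor are all hypothetical.

```python
import numpy as np


def dependence_matrix(X):
    """Pairwise gene-gene dependence for a (samples x genes) matrix.

    ASSUMPTION: |Pearson r| is a stand-in for the maximal information coefficient.
    """
    R = np.abs(np.corrcoef(X, rowvar=False))
    np.fill_diagonal(R, 0.0)  # no self-loops in the gene network
    return R


def relevance(X, y):
    """Gene-class relevance, again using |Pearson r| as a stand-in measure."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(Xc.T @ yc) / np.where(denom == 0, 1.0, denom)


def personalized_pagerank(W, seeds, damping=0.85, iters=100, tol=1e-12):
    """Power iteration for PageRank with restart mass placed on `seeds`.

    W[i, j] is the weight of the directed edge i -> j.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    restart = np.zeros(n)
    restart[list(seeds)] = 1.0 / len(seeds)
    out = W.sum(axis=1, keepdims=True)
    # Row-normalize; rows without outgoing edges stay zero (dangling nodes).
    P = np.divide(W, out, out=np.zeros_like(W), where=out > 0)
    dangling = out.ravel() == 0
    r = restart.copy()
    for _ in range(iters):
        # Dangling mass is sent back to the restart distribution.
        r_new = damping * (P.T @ r + r[dangling].sum() * restart)
        r_new += (1.0 - damping) * restart
        converged = np.abs(r_new - r).sum() < tol
        r = r_new
        if converged:
            break
    return r


def select_genes(X, y, k, threshold=0.3):
    """Greedy MRMR-style selection with a PageRank redundancy term (sketch)."""
    rel = relevance(X, y)
    D = dependence_matrix(X)
    # ASSUMPTION: keep only strong links; each link yields edges in both directions.
    W = np.where(D >= threshold, D, 0.0)
    selected = [int(np.argmax(rel))]
    while len(selected) < k:
        red = personalized_pagerank(W, selected)
        red = red / red.max()        # scale redundancy to [0, 1]
        score = rel - red            # ASSUMPTION: difference form of MRMR
        score[selected] = -np.inf    # never re-select a gene
        selected.append(int(np.argmax(score)))
    return selected
```

Calling `select_genes(X, y, k)` on a samples-by-genes matrix `X` with class labels `y` returns `k` distinct gene indices. Because the PageRank scores are recomputed from the whole network at each step, a candidate's redundancy reflects indirect links to the selected subset as well as direct pairwise dependence, which is the global property the abstract points to.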