
Application Research Of Feature Selection Method Based On Localized Samples For Transcriptomic Data

Posted on: 2018-03-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Sheng
Full Text: PDF
GTID: 2310330515978272
Subject: Computer application technology
Abstract/Summary:
With the development of high-throughput gene chip techniques and next-generation sequencing (NGS) technology, researchers have measured a large amount of gene transcriptomic data in mRNA experiments. Such data typically contain tens of thousands of genes but only a small number of samples, which can hinder the extraction of useful information, so an efficient and robust feature selection method is needed to extract informative genes from them. In recent years, researchers have begun analyzing this type of data with feature selection methods. As the research deepened, they found that a model trained on all samples did not give the best result: noise samples, outlier samples and the distribution of samples can all lower classification accuracy. The study of localized samples is therefore particularly important. Cancer is well known to be a heterogeneous disease, and patients with the same genetic characteristics might share the same molecular mechanisms during the development and evolution of cancer. A more accurate feature selection model may therefore be obtained by using localized samples with the same genetic characteristics.

In this paper, we propose an efficient and easy-to-use feature selection method based on localized samples, which are a subset of the samples in the original dataset. The method trains feature selection models on localized samples and can achieve better classification performance, because it reduces the influence of outlier samples and of the sample distribution. Localized samples are obtained in three steps. First, we calculated the Euclidean distance between each central sample and its neighbor samples using gene expression values. Second, we established co-expression networks by selecting the four nearest samples for each central sample, and formed the sample-sample similarity network using the Random Walk with Restart (RWR) method. Third, we divided the similarity network using different cutoff values, compared five selection strategies, and obtained the localized samples that gave the best cancer classification.

Cancer transcriptomic data are the research object of this paper. We evaluated the proposed method using leave-one-out cross validation (LOOCV) on breast, gastric, pancreatic, lung, thyroid and prostate cancer datasets from the GEO and TCGA databases, and compared it with the t-test, the rank-sum test and the minimum redundancy maximum relevance (mRMR) method. The best accuracies of the proposed method on these datasets for the top 100 genes with SVM classifiers were 98.51%, 97.27%, 98.55%, 100%, 100% and 100%, respectively. These results show that the proposed method performs very well on these datasets and indicate that it is effective and applicable.
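The three steps for obtaining localized samples can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: only the Euclidean distance, the four nearest samples and the use of RWR come from the abstract. The function names, the symmetric 0/1 network, the restart probability of 0.5 and the cutoff of 0.01 are hypothetical choices made for the example, and the five selection strategies compared in the thesis are not reproduced here.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two expression vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_adjacency(samples, k=4):
    """Step 1 and 2: link each sample to its k nearest samples.

    Returns a symmetric 0/1 adjacency matrix (symmetry is an assumption).
    """
    n = len(samples)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (euclidean(samples[i], samples[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj


def rwr(adj, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Random Walk with Restart from one seed sample.

    Iterates p <- (1 - restart) * W p + restart * p0, where W is the
    column-normalized adjacency matrix, until the L1 change is below tol.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = []
        for i in range(n):
            walk = sum(
                adj[i][j] / col_sums[j] * p[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            nxt.append((1 - restart) * walk + restart * p0[i])
        diff = sum(abs(a - b) for a, b in zip(nxt, p))
        p = nxt
        if diff < tol:
            break
    return p


def localized_samples(samples, center, k=4, restart=0.5, cutoff=0.01):
    """Step 3: keep the samples whose RWR score from `center` reaches cutoff."""
    adj = knn_adjacency(samples, k)
    scores = rwr(adj, center, restart)
    return [i for i, s in enumerate(scores) if s >= cutoff]
```

Calling `localized_samples(expression_rows, center=0)` on a list of per-sample expression vectors returns the indices of the samples treated as local to sample 0. A feature selection model would then be trained on that subset only, and repeating the call with different `cutoff` values corresponds to dividing the similarity network at different thresholds.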
Keywords/Search Tags: Localized Samples, Feature Selection, Cancer Classification, Transcriptomic Data