A Cancer Classification Method Fusing Self-Training and Low-Rank Representation of Gene Expression Data

Posted on: 2019-01-06
Degree: Master
Type: Thesis
Country: China
Candidate: C Q Xia
GTID: 2434330551956343
Subject: Software engineering
Abstract/Summary:
Traditional diagnostic methods cannot determine the exact type of some diseases because their pathophysiology is unclear. Machine-learning methods that integrate molecular biology data have recently begun to draw interest in bioinformatics. Cancer is a genetic disease with more than 100 types, which are usually named for the tissues and organs in which they arise. Accurate identification of the cancer type is essential to diagnosis and treatment. Because cancer tissue and normal tissue differ in gene expression, gene expression data can serve as an efficient feature source for cancer classification. With the development of high-throughput sequencing technology, gene expression data can now be obtained across the whole genome. However, accurate cancer classification directly from the original gene expression profiles remains challenging because the data are intrinsically high-dimensional while the number of samples is small. The data also contain considerable noise and redundancy.

(1) To address these problems, we proposed a new semi-supervised self-training classification algorithm under low-rank representation, called SSC-LRR, for cancer classification on gene expression data. Low-rank representation (LRR) is first applied to extract discriminative features from the high-dimensional gene expression data; a semi-supervised self-training classification (SSC) method is then used to generate the cancer classification predictions. SSC-LRR was tested on two separate benchmark datasets against four state-of-the-art classification methods as controls. It achieved an overall accuracy of 89.7% and a general correlation of 0.920, which are 18.9% and 24.4% higher, respectively, than those of the best control method. Overall, the study demonstrates a new, sensitive approach to recognizing cancer classes from large-scale gene expression data.

(2) In addition, data visualization was performed based on the low-rank representation. We also proposed a key gene selection method to rank the discriminative ability of genes. Several genes (RNF114, HLA-DRB5, USP9Y and PTPN20) were identified by our method as new cancer identifiers that deserve further clinical investigation.

(3) A Flask-based web server was developed to provide an online cancer classification prediction service for other biomedical researchers.
Keywords/Search Tags:Cancer classification, Gene expression data, Low-rank representation, Self-training, Semi-supervised learning
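
The self-training step described in the abstract can be illustrated with a minimal sketch. This is not the thesis's SSC-LRR implementation: the abstract does not name the base classifier or the confidence rule, so the nearest-centroid classifier, the `min_margin` threshold, the function names and the toy data below are all assumptions made for illustration. The inputs are assumed to be feature vectors that have already been extracted (for example by low-rank representation); the LRR step itself is not shown.

```python
import math


def centroids(X, y):
    """Mean feature vector of each class (assumed base model: nearest centroid)."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_margin(cents, row):
    """Return (nearest class, margin), where margin is the gap between the
    distances to the two closest centroids; a larger gap means more confidence."""
    dists = sorted((math.dist(row, c), label) for label, c in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(X_lab, y_lab, X_unlab, min_margin=1.0, max_rounds=10):
    """Self-training: repeatedly fit on the labeled set, pseudo-label the
    unlabeled samples predicted with enough confidence, and add them to it."""
    X_lab, y_lab = [list(r) for r in X_lab], list(y_lab)
    pool = [list(r) for r in X_unlab]
    for _ in range(max_rounds):
        cents = centroids(X_lab, y_lab)
        remaining, added = [], 0
        for row in pool:
            label, margin = predict_with_margin(cents, row)
            if margin >= min_margin:
                X_lab.append(row)
                y_lab.append(label)
                added += 1
            else:
                remaining.append(row)
        pool = remaining
        if added == 0 or not pool:
            break  # nothing confident left to add, or every sample is labeled
    return centroids(X_lab, y_lab)


# Hypothetical toy features standing in for LRR-derived features.
X_lab = [[0.0, 0.0], [4.0, 4.0]]
y_lab = ["normal", "tumor"]
X_unlab = [[0.5, 0.2], [3.8, 4.1], [1.0, 1.2], [3.0, 3.1]]

model = self_train(X_lab, y_lab, X_unlab)
print(predict_with_margin(model, [0.8, 0.9])[0])
```

The margin threshold controls how aggressively pseudo-labels are accepted: a higher value adds fewer but more reliable samples per round, which matters when only a few labeled expression profiles are available.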