
Cancer Marker Research Based On RNA-Seq Data

Posted on: 2021-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: W Y Jiang
GTID: 2514306494995269
Subject: Software engineering
Abstract/Summary:
A cancer marker is a biological molecule found in the body that signals the presence of cancer. Identifying such markers is of great significance to cancer prevention and treatment. Gene expression profile data are a very important source of cancer markers. Mining markers of different cancer types from tens of thousands of gene expression profiles is of great significance for elucidating how cancer forms and for preventing its occurrence and development. The research in this thesis is divided into three parts:

1. A data standardization method is presented. The raw data are standardized, the multiplicative relationships between chemical molecules are transformed into linear relationships, the converted data are analyzed on the same scale, and the average expression level of the different RNAs is calculated. The average RNA expression level of each patient is then obtained. This standardization method overcomes the data inconsistency caused by different experiments, different experimental platforms, different units of measurement, and different data processing software.

2. A feature selection algorithm based on statistical significance is proposed. The algorithm has two steps: building the sample set and generating the p-value table. First, the data of the specific cancer type are randomly assigned, and the data of the other cancer types are preprocessed to keep the samples balanced. Second, the average sample rank is calculated from the assigned sample set. The sample rank is used as the test statistic: its distribution is simulated, and a statistical reference table is generated from it.

3. Three cancer types are selected to verify the proposed feature selection method. Comparison with other well-performing feature selection algorithms on the same classifier shows that the proposed algorithm gives more accurate results and selects more discriminative features. The feature selection method based on statistical significance can effectively screen cancer markers in gene expression profile data and assist cancer prevention and treatment.
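The standardization and feature selection steps described in the abstract might be sketched for a single gene as follows. This is only an illustrative sketch, not the thesis's implementation. The function names, the log2 transform with a pseudocount of 1, the random subsampling used to balance the two groups, the two-sided test, and the number of permutations are all assumptions made for the example.

```python
import math
import random
from statistics import mean


def log_standardize(values, pseudocount=1.0):
    """Log2-transform raw expression values (assumed transform).

    On a log scale, multiplicative differences become additive ones.
    """
    return [math.log2(v + pseudocount) for v in values]


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mean_rank_p_value(target, others, n_permutations=2000, seed=0):
    """Empirical two-sided p-value for the mean rank of the target samples.

    `target` holds one gene's values in the cancer type of interest and
    `others` holds its values in the remaining cancer types.
    """
    rng = random.Random(seed)
    target, others = list(target), list(others)
    # Assumed balancing step: subsample the larger "others" group.
    if len(others) > len(target):
        others = rng.sample(others, len(target))
    ranks = average_ranks(target + others)
    n_target = len(target)
    expected = (len(ranks) + 1) / 2  # mean rank if the gene is uninformative
    observed = abs(mean(ranks[:n_target]) - expected)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(ranks)  # simulate the null distribution of the mean rank
        if abs(mean(ranks[:n_target]) - expected) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical raw values for one gene; not data from the thesis.
target_raw = [900, 1200, 1500, 1100, 1300, 1000]
others_raw = [100, 150, 90, 200, 120, 80, 110, 95]
p = mean_rank_p_value(log_standardize(target_raw), log_standardize(others_raw))
print(f"p-value: {p:.4f}")
```

Repeating this calculation for every gene would give a table of p-values, and genes with small p-values would be the candidate markers. Because ranks are unchanged by a monotone transform, the log step here affects only the scale on which values are averaged and compared, not the rank statistic itself.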
Keywords/Search Tags:cancer markers, RNA-Seq, feature selection, based on significant statistics, gene expression profile data