
Cross-platform Gene Expression Data Classification Based On Rule Learning

Posted on: 2020-10-26
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Hou
Full Text: PDF
GTID: 2370330596495049
Subject: Computer Science and Technology
Abstract/Summary:
The Human Genome Project, launched to decode the human genome, has been essentially completed after nearly three decades of effort. Along the way, a large amount of biomolecular data has been generated. These high-dimensional data are rich in information and contain hidden knowledge that humans have not yet fully understood. The urgent need to help biologists discover valuable information in massive data sets, and to advance medical research with methods from mathematics, statistics, and computer science, has driven the rapid development of bioinformatics. High-throughput detection technologies such as cDNA microarrays and oligonucleotide chips have accumulated a large amount of data from different platforms. Obtaining classification patterns from existing data and applying them to new samples is of great significance for gene expression data mining.

Most gene expression data sets have few samples and very many features, so the number of samples available on any single platform is extremely limited relative to the feature dimension. This sample sparsity amounts to a curse of dimensionality and makes it difficult to extract robust key information from a single small data set. Combining gene expression data from multiple GEO platforms would increase the amount of sample data and improve the reliability of a study, which would greatly benefit related research. However, the biological experiments behind gene expression data are complex, multi-step processes. Because of differences in the source of the biological samples, the techniques used in chip fabrication, and the equipment standards, the systematic scale differences between gene expression data from multiple platforms are difficult to eliminate. Classical normalization methods rely on statistics determined by the platform information of the data.

To address the difficulty of migrating a classification model between platforms, this thesis uses the relative expression relationship between features as the basic classification pattern of the model, so that the model adapts to the diversity of cross-platform gene data and removes the differences in sample scale caused by platform characteristics. This broadens the applicability of related research. Furthermore, based on an asymmetry analysis of the feature-pair classification model, the partial order pattern is proposed as the basic classification pattern, and a hierarchical rule tree classification model that can introduce more rules is constructed to improve classification performance from the perspective of sample coverage. For the large-scale data problems brought by cross-platform features, and building on classification rules over the relative size relationship between genes, this thesis designs a corresponding data transformation and a rule pre-screening strategy based on relative offset to make rule mining fast.

Experimental results on real gene expression data sets verify the accuracy and stability of the algorithm and show an operational efficiency two orders of magnitude higher than that of existing methods, so the algorithm can effectively cope with the challenges of cross-platform gene expression data mining. Research on models for high-dimensional data that are independent of data scale applies not only to the classification of gene expression data but also to fields such as social networks, recommendation systems, and financial analysis. Cross-platform classification of gene expression data has much in common with the data characteristics and research problems in those fields. Therefore, the cross-platform gene expression data classification algorithm based on the partial order pattern can be extended to more application scenarios, which benefits algorithm research on cross-platform data.
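The idea of classifying by the relative expression of two features can be illustrated with a minimal sketch. This is not the thesis's partial order pattern or rule tree algorithm; it is a simplified feature-pair rule in the style of top-scoring-pair classifiers, and the function names (`pair_rule_score`, `best_pair_rule`, `predict`) and the toy data are assumptions made for illustration only. It shows why a rule of the form "gene i is expressed less than gene j" survives a platform-specific rescaling of the raw values.

```python
from itertools import combinations


def pair_rule_score(samples, labels, i, j):
    """Frequency of x[i] < x[j] in class 1 minus the same frequency in class 0."""
    freq = {}
    for cls in (0, 1):
        rows = [x for x, y in zip(samples, labels) if y == cls]
        freq[cls] = sum(x[i] < x[j] for x in rows) / len(rows)
    return freq[1] - freq[0]


def best_pair_rule(samples, labels):
    """Return (i, j, cls): predict cls when x[i] < x[j], the other class otherwise."""
    n_features = len(samples[0])
    # Exhaustive search over feature pairs; only feasible for toy-sized data.
    i, j = max(
        combinations(range(n_features), 2),
        key=lambda p: abs(pair_rule_score(samples, labels, *p)),
    )
    score = pair_rule_score(samples, labels, i, j)
    return i, j, 1 if score > 0 else 0


def predict(rule, x):
    i, j, cls_if_less = rule
    return cls_if_less if x[i] < x[j] else 1 - cls_if_less


# Hypothetical toy data: four samples, three genes, binary labels.
train = [
    [1.0, 2.0, 5.0],
    [1.2, 2.5, 4.0],
    [3.0, 1.0, 5.5],
    [2.8, 0.9, 4.2],
]
labels = [1, 1, 0, 0]

rule = best_pair_rule(train, labels)

# The same sample as it might look on another platform: a different scale
# and offset, but the same ordering of the genes within the sample.
other_platform = [v * 1000 + 50 for v in train[0]]
same_prediction = predict(rule, train[0]) == predict(rule, other_platform)
```

Because the rule only compares two values inside one sample, any strictly increasing transformation applied to the whole sample leaves the prediction unchanged. The exhaustive pair search here is quadratic in the number of genes, which is the kind of cost that the pre-screening strategy described in the abstract is meant to reduce.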
Keywords/Search Tags:Gene expression data, Classification, Feature selection, Cross-platform, Rule learning