
Research On Feature Engineering And Feature Selection Algorithm Of Biogenetic Data Based On CNN

Posted on: 2022-06-08; Degree: Master; Type: Thesis
Country: China; Candidate: H Y Wang; Full Text: PDF
GTID: 2480306332465464; Subject: Software engineering
Abstract/Summary:
Transcriptome and methylated gene sequences are two major sources of genomic data, shaped by both genetic information and environmental factors, and both have been widely used as biomarkers for disease diagnosis and prognosis. Current transcriptome and methylation analysis techniques can detect the status of tens of millions or even hundreds of millions of residues in the human genome. However, because sample sizes are limited, the resulting "large P, small N" pattern (far more features than samples) makes it difficult to apply transcriptome data to popular classification models. Traditional machine learning methods rely on feature selection, deep learning models require large amounts of data, and transfer learning methods are mostly applied to image data.

In view of this, this research proposes a feature construction method based on the original gene sequence. It uses a small convolutional neural network (CNN) to construct features and combines them with traditional machine learning methods, with the aim of further improving classification accuracy when only coarse feature selection has been applied. In this study, we customized a small CNN and used it to construct a small number of new features from the original features. Feature selection and classification experiments were then carried out on the constructed features. Correlation analysis of the data at each feature layer showed that differential expression was more pronounced in the convolutional-layer features. More importantly, our experiments confirmed the validity of feature construction by using the same original features and the same feature selection algorithms (including T-Rank, W-Rank, and McTwo). The new features abstracted by the CNN outperformed the original features, and accuracy improved with feature construction regardless of which feature selection method was used. We also propose a simple feature selection method that, combined with the constructed features, further improves accuracy. We named this combination of approaches BioTransfer.

To further explore feature selection, McOne, T-Rank, W-Rank, Pearson, Spearman, and random feature selection were used to determine which features should be passed to feature construction to obtain better results. The experiments showed that T-Rank was the best method for feature construction. Surprisingly, even randomly selected features achieved higher accuracy after feature construction than the original features used directly. Moreover, the stability of the feature construction method was close to that of the original data. To make the experiments more comprehensive and rigorous, classification performance was analyzed across different evaluation metrics and different classifiers, and the results showed that the feature construction method classified better than the original features.
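
The abstract names T-Rank as a feature selection step but does not define it. As a rough illustration only, the sketch below assumes T-Rank means ranking features by the absolute value of a two-sample t-statistic and keeping the top k. The function name `t_rank`, the choice of Welch's unequal-variance statistic, and the synthetic data are all assumptions made for this example, not details taken from the thesis.

```python
import numpy as np

def t_rank(X, y, k):
    """Return indices of the k features with the largest |t-statistic|.

    Hypothetical helper: X is (n_samples, n_features), y holds binary labels 0/1.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    a, b = X[y == 0], X[y == 1]
    # Per-feature means and unbiased variances for each class.
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    var_a, var_b = a.var(axis=0, ddof=1), b.var(axis=0, ddof=1)
    # Welch's t-statistic (assumed variant); epsilon guards against zero variance.
    se = np.sqrt(var_a / len(a) + var_b / len(b)) + 1e-12
    t = (mean_a - mean_b) / se
    # Sort by descending |t| and keep the top k.
    return np.argsort(-np.abs(t))[:k]

# Synthetic "large P, small N" data: 40 samples, 1000 features.
rng = np.random.default_rng(0)
n, p = 40, 1000
y = np.array([0] * 20 + [1] * 20)
X = rng.normal(size=(n, p))
X[y == 1, :5] += 4.0  # only the first five features carry class signal

selected = t_rank(X, y, k=5)
print(sorted(selected.tolist()))
```

In a pipeline like the one the abstract describes, the selected columns would then be passed on to feature construction and classification. To avoid leaking label information, the ranking should be computed on training data only.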
Keywords/Search Tags: feature construction, feature selection, convolutional neural network, recursive feature elimination (RFE)