Design Of Stream Feature Selection Algorithm And Its Application In Gene Expression Data

Posted on: 2019-08-03
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wu
GTID: 2438330551960866
Subject: Computer application technology
Abstract/Summary:
Bioinformatics has become an active research field in recent years. Gene expression data are high-dimensional, highly redundant, and class-imbalanced, which makes them a difficult challenge for machine learning algorithms. Dimensionality reduction is therefore a necessary step before training a machine learning model, and feature selection is an effective way to perform it. Studying feature selection and classification for gene expression data can provide a reliable and objective research method for the diagnosis and treatment of disease. Many scholars have worked on this problem in recent years and have made some progress, but the results are still far from ideal, so researchers continue to seek better feature selection algorithms for gene expression data. Earlier work relied on traditional static feature selection algorithms, whose models are relatively complex and have high time complexity. More recently, researchers have applied streaming feature selection to gene expression data and achieved good results.

Based on the characteristics of gene expression data, this thesis studies the feature selection problem for such data in depth. It constructs a robust and interpretable feature selection model based on an online learning algorithm and proposes a streaming feature selection algorithm that compresses the feature space and improves classification accuracy. Feature selection experiments were carried out on twelve high-dimensional gene expression data sets using the proposed streaming feature selection algorithm regularized by the L2,1-norm, and the results were compared with those of four other typical streaming feature selection algorithms. The experimental results show that the proposed algorithm has advantages in the classification accuracy of the best feature subset, in feature space compression, and in stability in streaming scenarios.

To deal with the class imbalance of gene expression data, this thesis also proposes a streaming feature selection algorithm for imbalanced data that incorporates an improved SMOTE oversampling algorithm for streaming features, and evaluates it on three kinds of imbalanced data. The experimental results show that the feature subset selected by the improved algorithm improves the recognition of minority-class samples without reducing the overall classification accuracy, making the algorithm better suited to streaming feature selection on class-imbalanced data.
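
The abstract names L2,1-norm regularization but does not give its form, so the following is only a minimal Python sketch of the standard definition, not the thesis's algorithm. It assumes a weight matrix `W` with one row per feature (gene) and one column per class; the function names `l21_norm` and `prox_l21`, the threshold, and the toy values are all hypothetical. The L2,1-norm sums the Euclidean norms of the rows, and its row-wise shrinkage step sets whole rows to zero, which is why this kind of penalty can discard entire features.

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm: sum of the Euclidean (L2) norms of the rows of W."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def prox_l21(W, threshold):
    """Row-wise shrinkage, the proximal operator of threshold * ||W||_{2,1}.

    Rows whose L2 norm is at most `threshold` become exactly zero;
    the remaining rows are scaled toward zero.
    """
    row_norms = np.linalg.norm(W, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    scale = np.maximum(0.0, 1.0 - threshold / np.maximum(row_norms, 1e-12))
    return scale * W

# Toy weights (assumed values): 3 features, 2 classes.
W = np.array([[3.0, 4.0],
              [0.1, 0.0],
              [0.0, 2.0]])

shrunk = prox_l21(W, threshold=0.5)
# Indices of features whose weight row survived the shrinkage.
kept = np.flatnonzero(np.linalg.norm(shrunk, axis=1) > 0)
print(l21_norm(W), kept)
```

In this toy case the second feature has a small weight row, so the shrinkage step removes it while the other two features are kept with reduced weights.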
Keywords/Search Tags: streaming feature, feature selection, L2,1-norm, gene expression data, imbalanced data