
Research And Application Of Integrated Feature Selection Algorithm Based On Extreme Learning Machine

Posted on: 2020-11-22
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Ji
GTID: 2438330602962394
Subject: Computer system architecture
Abstract/Summary:
Machine learning has been widely applied in biomedicine, particularly for the classification-based diagnosis of cancer patients and of patients with difficult diseases. High-throughput sequencing makes high-dimensional cancer gene expression data readily available. However, the sample size of such data is usually small, so the data are high-dimensional with few samples. The key first step in analyzing such data is feature selection, which removes irrelevant and redundant genes and retains cancer-causing genes to improve the accuracy of cancer diagnosis. In the diagnosis of difficult diseases, patients often present a variety of unrelated concurrent symptoms, which can interfere with the doctor's diagnosis. A correct diagnosis is the key to saving the patient's life. Feature selection can effectively identify a patient's key pathogenic factors and help doctors reach a correct judgment.

ELM (Extreme Learning Machine) is a machine learning algorithm based on feedforward neural networks. Its main characteristic is that the input weights and thresholds (hidden-layer biases) are assigned randomly and need no adjustment, so learning consists only of computing the output weights. ELM offers high learning efficiency and strong generalization ability, and is widely used in classification, regression, clustering, and other problems. It can therefore be introduced into the feature selection process to evaluate feature subsets and improve the efficiency of feature selection. The main work and innovations of this thesis are as follows:

(1) A feature selection algorithm based on an ensemble classifier of homogeneous ELMs is proposed, named the Ensemble ELM and G-score based Feature Selection (EEGFS) algorithm. In the Filter step, features are ranked by G-score. In the Wrapper step, an extended Sequential Forward Floating Selection (SFFS) strategy is adopted to search for the feature subset, and introducing ELM into the Wrapper process improves the efficiency of feature selection. In addition, the multiple feature subsets generated by the ELM-based Wrapper process are used to build different base classifiers, whose outputs are combined by ensemble learning into a comprehensive diagnosis, thus improving the final prediction accuracy.

(2) A feature selection algorithm based on an ensemble of feature subsets with K-ELM is proposed to address the instability of feature selection. Sampling is used to draw different training subsamples from the training dataset, and the K-ELM-based feature selection process is run on each subsample, producing different feature subsets. The final ensemble feature subset is obtained through a feature subset ensemble strategy. Experimental verification and analysis on the gene dataset showed that the algorithm improves the stability of the feature subset to a certain extent and is robust to changes in the data, while preserving the discriminative ability of the feature subset.

(3) Feature selection algorithms based on ELM, K-ELM, and EM-ELM, as well as a feature selection algorithm based on an ensemble classifier of heterogeneous ELMs, are proposed and applied to the diagnosis of erythemato-squamous diseases. ELM, EM-ELM, and K-ELM are each introduced into the feature selection process to evaluate feature subsets, giving three feature selection algorithms, and an extended SFS (Sequential Forward Selection) strategy is adopted to search for the feature subset. The influence of the parameters of the three learning algorithms on the feature selection results is explored by varying those parameters. In addition, the feature subsets selected by the three algorithms are each used to build a model with the corresponding classifier, to further improve the prediction results on erythemato-squamous diseases.
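The two ideas the abstract relies on, ELM training (random input weights and thresholds, output weights computed in one step) and a wrapper search that scores candidate feature subsets with ELM, can be illustrated with a minimal sketch. This is not the thesis implementation. The sigmoid activation, the ridge term `reg`, the hold-out accuracy criterion, and all names (`ELMClassifier`, `holdout_accuracy`, `forward_select`) are assumptions made for illustration. The search shown is plain forward selection, not the extended SFS/SFFS strategies of the thesis, and the G-score filter is omitted.

```python
import numpy as np


class ELMClassifier:
    """Minimal single-hidden-layer ELM (illustrative sketch only)."""

    def __init__(self, n_hidden=50, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # ridge term: an assumption, added for numerical stability
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid hidden layer; the activation choice is an assumption.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # One-hot target matrix, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Input weights and thresholds are drawn once and never adjusted.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights from regularized least squares: the only "learning" step.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


def holdout_accuracy(X_tr, y_tr, X_va, y_va, features, n_hidden=50):
    """Score a feature subset by ELM accuracy on a validation split."""
    cols = list(features)
    model = ELMClassifier(n_hidden=n_hidden).fit(X_tr[:, cols], y_tr)
    return float(np.mean(model.predict(X_va[:, cols]) == y_va))


def forward_select(X_tr, y_tr, X_va, y_va, max_features=5):
    """Greedy forward selection with ELM as the subset evaluator."""
    selected, best = [], 0.0
    remaining = list(range(X_tr.shape[1]))
    while remaining and len(selected) < max_features:
        # Try adding each remaining feature and keep the best candidate.
        scored = [
            (holdout_accuracy(X_tr, y_tr, X_va, y_va, selected + [f]), f)
            for f in remaining
        ]
        score, feature = max(scored)
        if score <= best:  # stop when no candidate improves the score
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best
```

Calling `forward_select(X_train, y_train, X_val, y_val)` on NumPy arrays returns the chosen column indices and the validation accuracy they reach. Because each candidate subset costs only one linear solve rather than an iterative training run, an ELM evaluator keeps this kind of wrapper search cheap, which is the efficiency argument the abstract makes.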
Keywords/Search Tags: Feature selection, Extreme learning machine, Gene dataset, Ensemble learning, Feature subset stability