
The Research On Feature Selection Algorithms Based On Multiple Viewpoints Fusion

Posted on: 2019-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: H H Song
GTID: 2428330566484188
Subject: Computer Science and Technology

Abstract/Summary:
With the rapid development of science and technology, biological data are growing explosively. Mining valuable information from complex, high-dimensional biological data is of great significance to the study of the nature of biological problems. Feature selection is an effective method for dealing with high-dimensional biological data. It can remove a large number of irrelevant features, redundant features and noise from the original feature set, and select feature subsets that are highly related to the biological problem. It is widely used in biomarker discovery and disease classification.

Biological activity is complex, and biological functions are accomplished through interactions among molecules. Therefore, the search for biomarkers should consider not only the classification performance of individual molecules but also the correlation between molecules. This paper proposes the FS-ODND algorithm, which measures the discriminative ability of features at both the molecular level and the network level. At the molecular level, FS-ODND calculates a feature weight from the overlapping degree of the feature's effective ranges in different categories. At the network level, FS-ODND constructs a network from the non-overlapping degree of the ratio variables and calculates a feature weight from the degree of the corresponding node in the network. The discriminative ability of a feature is then determined by combining these two weights. Experiments on eight public biological data sets show that FS-ODND is superior to Degree, ERGS, Relief-F and SVM-RFE in classification accuracy, number of selected features and stability in most cases.

Biological systems are complex. The difference between biological samples may be reflected in the distributions of some single features, and may also be reflected in changes in the relationships between features. In this paper, the FS-SVPV algorithm is proposed to analyze biological data by comprehensively evaluating individual features and feature pairs. The algorithm builds a classifier for each individual feature based on information gain, uses the M-k-TSP evaluation criterion to evaluate the discriminative information contained in individual features and feature pairs, and selects the most discriminative features and feature pairs to construct the classification model. FS-SVPV is compared with M-k-TSP and SVFS on eleven public biological data sets. The experimental results show that the combination of single features and feature pairs can improve classification performance and select more meaningful information.

In summary, this paper proposes two feature selection algorithms based on fusion patterns. FS-ODND measures feature importance at both the molecular level and the network level. FS-SVPV combines the evaluation of single features and feature pairs to identify the important information in complex biological data. The experimental results on public data show the effectiveness of feature selection based on fusion patterns.
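The abstract does not give formulas, so the following is only a minimal sketch of the two kinds of evidence it describes: a single-feature weight based on how much the class-wise effective ranges overlap, and a feature-pair score based on ordering. It assumes an ERGS-style effective range of mean ± gamma · standard deviation and uses the classic top-scoring-pair score as a stand-in for the M-k-TSP criterion. The function names, the `gamma` parameter and the toy data are hypothetical and are not taken from the thesis.

```python
from statistics import mean, pstdev


def effective_range(values, gamma=1.0):
    """Assumed effective range of one feature in one class: mean -/+ gamma * std."""
    m, s = mean(values), pstdev(values)
    return m - gamma * s, m + gamma * s


def overlap_degree(values_a, values_b, gamma=1.0):
    """Shared length of the two class ranges divided by their combined span.

    0.0 means the ranges are disjoint; values near 1.0 mean they nearly coincide.
    """
    lo_a, hi_a = effective_range(values_a, gamma)
    lo_b, hi_b = effective_range(values_b, gamma)
    overlap = max(0.0, min(hi_a, hi_b) - max(lo_a, lo_b))
    span = max(hi_a, hi_b) - min(lo_a, lo_b)
    return overlap / span if span > 0 else 1.0


def tsp_score(xi_a, xj_a, xi_b, xj_b):
    """Classic top-scoring-pair score: |P(xi < xj | A) - P(xi < xj | B)|."""
    p_a = sum(u < v for u, v in zip(xi_a, xj_a)) / len(xi_a)
    p_b = sum(u < v for u, v in zip(xi_b, xj_b)) / len(xi_b)
    return abs(p_a - p_b)


# Toy feature f1: the two classes occupy disjoint ranges, so it is useful alone.
f1_a = [1.0, 1.2, 0.9, 1.1]
f1_b = [3.0, 3.1, 2.9, 3.2]
f1_weight = 1.0 - overlap_degree(f1_a, f1_b)

# Toy pair (g1, g2): each feature overlaps heavily across classes,
# but the ordering g1 < g2 holds in class A and is reversed in class B.
g1_a, g2_a = [1, 5, 2, 6], [2, 6, 3, 7]
g1_b, g2_b = [2, 6, 3, 7], [1, 5, 2, 6]
g1_overlap = overlap_degree(g1_a, g1_b)
pair_score = tsp_score(g1_a, g2_a, g1_b, g2_b)
```

In this toy data `g1` looks weak when judged alone, while the pair `(g1, g2)` separates the classes by ordering, which is the kind of case where evaluating feature pairs alongside single features can help. The network-level weight of FS-ODND (node degree in a network built from ratio variables) is not sketched here because the abstract does not specify how edges are defined.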
Keywords/Search Tags: Feature Selection, Classification, Effective Range, Biological Network, M-k-TSP