
Researches On Feature Selection Based On Feature Relationships

Posted on: 2019-09-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Huang
GTID: 1368330548484725
Subject: Computer application technology

Abstract/Summary:
Extracting crucial features from high-dimensional data is a key topic in bioinformatics and can improve disease diagnosis and treatment. In living organisms, features interact with each other to carry out different functions, and their relationships carry information about physiological and pathological changes. Exploring how complex feature relationships change can therefore provide deep insight into disease mechanisms and biomarker discovery. This study develops novel feature selection methods based on feature relationships, which identify key information by means of networks and combinatorial features. The main contributions are as follows:

(1) A novel method is proposed that considers multiple patterns of feature relationships. Unlike other pairwise feature evaluation methods, which evaluate feature pairs by horizontal comparison only, the novel method identifies discriminative feature pairs by combining horizontal and vertical comparisons. The classifier is constructed from the comparison of the selected feature pair. The better performance of the method on genomics and metabolomics datasets implies that studying feature relationships through both vertical and horizontal comparisons can reveal more information, which can facilitate disease phenotype and biomarker discovery studies.

(2) A novel method is proposed that constructs the classifier from feature combinations defined by horizontal comparisons between features. Combinatorial features are built from an appropriate number of features selected by iterative analysis: in each iteration, new combinations are constructed and added to the feature set. Superior performance on genomics and metabolomics datasets validates that, by iteratively studying the relationships among features, the method can identify combinations of more than two features with strong discriminative ability and construct an efficient classifier.

(3) A new method for analyzing time-series data based on dynamic networks along a systematic time dimension is proposed. It uses a non-overlapping ratio to explore alterations in pathway reactions for network construction, and it develops dynamic concentration analysis and topological structure analysis to analyze the networks and extract early-warning information for complex disease diagnosis. Applied to a hepatocellular carcinoma (HCC) cohort dataset, the method identified the ratio of lysophosphatidylcholine 18:1 to free fatty acid 20:5 as a potential biomarker that discriminates well between HCC and non-HCC. The better performance of the method suggests that it can present time-series changes more completely and extract early-warning information effectively.

(4) A computational method for identifying potential biomarkers based on differential sub-networks is developed. The method constructs networks from the relationships between feature ratios, identifies relationships that differ between physiological and pathological states to infer differential sub-networks, and uses topological structure analysis to select important feature ratios. It can extract discriminative information for classifying disease groups from static datasets and identify warning signals from time-series datasets. Applied to a static genomics dataset and an HCC metabolomics cohort dataset, its better performance suggests that extracting differential sub-networks from feature-ratio correlation networks can provide novel insight into disease mechanisms and biomarker discovery.

To explore changes in feature relationships during disease development and to identify key feature relationships in complex omics data for improved clinical diagnosis, this thesis studies novel data analysis methods based on networks and feature combinations. The methods are applied to genomics and metabolomics datasets for disease phenotype studies and potential biomarker selection. The data processing methods developed in this thesis can provide an effective new tool for biomarker screening, which could improve disease diagnosis and treatment.
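The abstract does not give the scoring used in method (1). As a rough illustration only, the Python sketch below assumes that a horizontal comparison is the within-sample test x_i > x_j, scored by how differently often it holds in the two classes (a top-scoring-pair style score), and that a vertical comparison is the between-group difference in the mean of x_i - x_j, used here as a tie-breaker. The function names (`pair_scores`, `select_pair`, `fit_pair_classifier`) and the toy data are hypothetical, not the thesis's implementation.

```python
from itertools import combinations
from statistics import mean

# Hypothetical sketch: X is a list of samples (lists of feature values), y holds 0/1 labels.


def pair_scores(X, y, i, j):
    """Score feature pair (i, j) as (horizontal, vertical).

    horizontal: |P(x_i > x_j | y=0) - P(x_i > x_j | y=1)|, a within-sample comparison
    vertical:   |mean(x_i - x_j | y=0) - mean(x_i - x_j | y=1)|, a between-group comparison
    """
    groups = {c: [r for r, lab in zip(X, y) if lab == c] for c in (0, 1)}
    p = {c: mean(1.0 if r[i] > r[j] else 0.0 for r in rows) for c, rows in groups.items()}
    d = {c: mean(r[i] - r[j] for r in rows) for c, rows in groups.items()}
    return abs(p[0] - p[1]), abs(d[0] - d[1])


def select_pair(X, y):
    """Best pair by horizontal score; ties are broken by the vertical score (tuple order)."""
    n_features = len(X[0])
    return max(combinations(range(n_features), 2), key=lambda ij: pair_scores(X, y, *ij))


def fit_pair_classifier(X, y):
    """Return the selected pair and a rule that predicts a label from x_i > x_j."""
    i, j = select_pair(X, y)
    freq = {c: mean(1.0 if r[i] > r[j] else 0.0 for r, lab in zip(X, y) if lab == c)
            for c in (0, 1)}
    # Predict the class in which "x_i > x_j" occurs more often whenever it holds.
    label_if_greater = 1 if freq[1] > freq[0] else 0

    def predict(row):
        return label_if_greater if row[i] > row[j] else 1 - label_if_greater

    return (i, j), predict


# Toy data: in class 0 feature 0 is below feature 1, in class 1 it is above.
X = [[1.0, 2.0, 5.0], [1.5, 2.5, 4.0], [0.5, 3.0, 6.0],
     [3.0, 2.0, 5.5], [4.0, 1.0, 4.5], [3.5, 2.5, 5.0]]
y = [0, 0, 0, 1, 1, 1]
pair, predict = fit_pair_classifier(X, y)
print(pair, [predict(r) for r in X])
```

Using the vertical score only as a tie-breaker is one simple way to combine the two views; a weighted sum of the two scores would be an equally plausible assumption.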
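Method (2) grows combinations of more than two features iteratively. One hypothetical reading, sketched below, generalizes the pairwise test x_i > x_j to a comparison between two groups of features, sum(pos) > sum(neg), starting from the best ordered pair and greedily adding one feature per iteration while the class-separation score strictly improves. The greedy forward-selection strategy, the sum-based comparison, the stopping rule, and the names `grow_combination` and `make_predictor` are assumptions for illustration, not the thesis's algorithm.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical sketch: X is a list of samples (lists of feature values), y holds 0/1 labels.


def frac_true(X, y, label, pos, neg):
    """Fraction of class-`label` samples in which sum(x[pos]) > sum(x[neg])."""
    rows = [r for r, c in zip(X, y) if c == label]
    hits = sum(1 for r in rows if sum(r[k] for k in pos) > sum(r[k] for k in neg))
    return Fraction(hits, len(rows))  # exact fractions avoid float ties


def score(X, y, pos, neg):
    """Class separation of the comparison: |P(true | y=0) - P(true | y=1)|."""
    return abs(frac_true(X, y, 0, pos, neg) - frac_true(X, y, 1, pos, neg))


def grow_combination(X, y, max_size=5):
    """Start from the best ordered pair, then add one feature per iteration while the score improves."""
    n = len(X[0])
    i, j = max(permutations(range(n), 2), key=lambda ij: score(X, y, (ij[0],), (ij[1],)))
    pos, neg = (i,), (j,)
    best = score(X, y, pos, neg)
    while len(pos) + len(neg) < max_size:
        unused = [k for k in range(n) if k not in pos and k not in neg]
        # New candidate combinations: put one unused feature on either side.
        candidates = [(pos + (k,), neg) for k in unused] + [(pos, neg + (k,)) for k in unused]
        if not candidates:
            break
        cand = max(candidates, key=lambda pn: score(X, y, *pn))
        cand_score = score(X, y, *cand)
        if cand_score <= best:  # stop once no new combination strictly improves separation
            break
        (pos, neg), best = cand, cand_score
    return pos, neg, best


def make_predictor(X, y, pos, neg):
    """Predict the class in which the comparison sum(pos) > sum(neg) holds more often."""
    label_if_true = 1 if frac_true(X, y, 1, pos, neg) > frac_true(X, y, 0, pos, neg) else 0

    def predict(row):
        holds = sum(row[k] for k in pos) > sum(row[k] for k in neg)
        return label_if_true if holds else 1 - label_if_true

    return predict


X = [[1.0, 1.0, 3.0], [2.0, 0.5, 3.0], [0.5, 2.0, 3.0],
     [4.0, 0.5, 3.0], [2.0, 2.0, 3.0], [1.0, 3.0, 3.5]]
y = [0, 0, 0, 1, 1, 1]
pos, neg, best = grow_combination(X, y)
predict = make_predictor(X, y, pos, neg)
print(pos, neg, best, [predict(r) for r in X])
```

In the toy data no single pair separates the classes (the best pair scores 1/3), but the three-feature comparison x0 + x1 > x2 does, which is the kind of combination an iterative search is meant to find.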
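The abstract does not define the non-overlapping ratio used in method (3). Purely as an illustration, the sketch below assumes it is the fraction of samples at a later time point whose product/substrate ratio for a reaction falls outside the range observed at the preceding time point, and it keeps a reaction as an edge of that interval's network when the fraction reaches a threshold. The definition, the 0.5 threshold, the metabolite names, and the function names are hypothetical.

```python
# Hypothetical sketch of a "non-overlapping ratio" between consecutive time points.


def non_overlapping_ratio(ref, new):
    """Fraction of values in `new` lying outside the [min, max] range of `ref`."""
    lo, hi = min(ref), max(ref)
    return sum(1 for v in new if v < lo or v > hi) / len(new)


def altered_reactions(samples_by_time, reactions, threshold=0.5):
    """For each pair of consecutive time points, list the (substrate, product)
    reactions whose product/substrate ratio shifts out of its previous range."""
    times = sorted(samples_by_time)
    network = {}
    for t0, t1 in zip(times, times[1:]):
        network[(t0, t1)] = [
            (s, p) for s, p in reactions
            if non_overlapping_ratio(
                [row[p] / row[s] for row in samples_by_time[t0]],
                [row[p] / row[s] for row in samples_by_time[t1]],
            ) >= threshold
        ]
    return network


# Toy data: each sample maps a (hypothetical) metabolite name to a concentration.
samples_by_time = {
    0: [{"A": 2.0, "B": 2.0, "C": 1.0}, {"A": 4.0, "B": 5.0, "C": 2.5}],
    1: [{"A": 2.0, "B": 4.0, "C": 2.0}, {"A": 2.0, "B": 5.0, "C": 2.5}],
}
reactions = [("A", "B"), ("B", "C")]
print(altered_reactions(samples_by_time, reactions))
```

Here the B/A ratio moves from [1.0, 1.25] to [2.0, 2.5] and is flagged, while C/B stays at 0.5 and is not.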
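For method (4), the sketch below assumes one simple way to build and compare feature-ratio correlation networks: every feature pair i < j gives a ratio node x_i / x_j, two nodes are linked in the differential sub-network when their Pearson correlation differs between the two groups by at least `delta`, and node degree stands in for the topological analysis used to rank ratios. The threshold, the degree-based ranking, the zero-variance convention, and all function names are hypothetical choices, not the thesis's exact procedure.

```python
from itertools import combinations
from math import sqrt


def pearson(u, v):
    """Pearson correlation; returns 0.0 if either vector is constant (an assumed convention)."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    du = [a - mu for a in u]
    dv = [b - mv for b in v]
    su, sv = sum(a * a for a in du), sum(b * b for b in dv)
    if su == 0 or sv == 0:
        return 0.0
    return sum(a * b for a, b in zip(du, dv)) / sqrt(su * sv)


def ratio_features(rows):
    """Map each feature pair (i, j), i < j, to the ratios x_i / x_j across samples."""
    n = len(rows[0])
    return {(i, j): [r[i] / r[j] for r in rows] for i, j in combinations(range(n), 2)}


def differential_subnetwork(group_a, group_b, delta=1.0):
    """Link two ratio nodes when their correlation differs between groups by >= delta,
    then rank nodes by degree in the resulting sub-network."""
    ra, rb = ratio_features(group_a), ratio_features(group_b)
    nodes = list(ra)
    edges = [
        (p, q) for p, q in combinations(nodes, 2)
        if abs(pearson(ra[p], ra[q]) - pearson(rb[p], rb[q])) >= delta
    ]
    degree = {node: 0 for node in nodes}
    for p, q in edges:
        degree[p] += 1
        degree[q] += 1
    ranking = sorted(nodes, key=lambda node: (-degree[node], node))
    return edges, [(node, degree[node]) for node in ranking]


# Toy data: rows are samples, columns are features (all values positive).
group_a = [[1.0, 1.0, 1.0], [2.0, 1.0, 1.0], [3.0, 1.0, 1.0]]
group_b = [[1.0, 1.0, 1.0], [2.0, 4.0, 1.0], [3.0, 9.0, 1.0]]
edges, ranking = differential_subnetwork(group_a, group_b)
print(edges)
print(ranking)
```

With three samples per group the correlations are unstable, so the toy data only exercise the mechanics; degree is the simplest centrality, and other topological scores would be further assumptions.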
Keywords/Search Tags: Feature Selection, Classification, Network Construction, Network Analysis, Bioinformatics