Drug targets are biomolecules in vivo that interact with drugs to exert therapeutic effects. They play a crucial role in drug discovery, drug sensitivity, and disease mechanism research. Identifying the drug targets related to a specific disease is the foundation of modern drug development and a source of its innovation and breakthroughs. Owing to the rapid development of high-throughput omics technologies and the rapid accumulation of pharmaceutical information, bioinformatics methods have been widely applied in pharmacological information-based drug-target-potential analysis and omics-based biomarker identification, and thus play an irreplaceable role in drug target identification and discovery. Although bioinformatics can identify potential drug targets from massive data accurately, efficiently, and intelligently, the instability of the identification results has greatly weakened their credibility and severely limited the application of the related bioinformatics methods in disease treatment and drug development. Therefore, developing stable bioinformatics-based methods and tools for drug target identification is particularly important for improving the efficiency and reliability of drug target discovery.

This dissertation presents DrugMAP, a comprehensive and precise drug-molecule interaction network database that collects over 200,000 interactions between more than 30,000 drugs or drug candidates and over 5,000 molecules with pharmacological or pharmacokinetic effects. The database quantitatively describes the differential expression patterns of all interacting macromolecules and provides diverse information retrieval tools and abundant, detailed pharmaceutical information on drugs. DrugMAP provides a reliable data foundation for pharmacological information-based drug-target-potential analysis and enhances the stability and reliability of the analysis results. DrugMAP is accessible at: https://idrblab.org/drugmap/.

After completing a comprehensive collection of pharmaceutical information from the literature, this dissertation developed SSizer, a tool that evaluates sample sufficiency and determines the necessary sample size for omics studies, addressing the common problem of sample size sufficiency in omics research. For the first time, this tool integrates three complementary criteria for evaluating sample sufficiency (statistical power, classification accuracy, and robustness of biomarker identification) to establish a comprehensive evaluation of sample sufficiency for omics data. Moreover, SSizer expands the sample size of the original data through data simulation while maintaining the variation between the original data groups, and predicts the minimum sample size necessary for the omics study. SSizer provides sample size references and data support for omics research design and helps ensure the reliability and stability of omics data mining and of biomarker identification results. SSizer is freely accessible at: https://idrblab.org/ssizer/.

Thirdly, to address the inconsistent identification results produced by different feature selection methods in omics research, this dissertation constructed POSREG, an online tool for optimal biomarker identification based on stability and accuracy evaluation. To address the neglect of stability in current feature selection methods, POSREG combines stability evaluation with ensemble learning to optimize the stability of multiple filter feature selection methods, and uses an accuracy-based golden section search to efficiently discover the optimal feature subset. It thereby achieves a comprehensive evaluation of feature selection accuracy and stability, and improves the stability of traditional filter feature selection methods without reducing prediction accuracy. In addition, POSREG provides diverse protein functional enrichment analyses, which help users quickly discover the phenotypic relevance of features, and it is expected to become an important tool in biomarker discovery and precision medicine. POSREG is freely accessible at: https://idrblab.org/posreg/.

Finally, building on a comprehensive understanding of existing biomarker identification and feature selection methods and tools, this dissertation proposed a novel method for stable biomarker identification and developed an online tool based on it, named ConSIG. The method integrates multiple statistical strategies and multiple random sampling methods, and introduces a multi-step evaluation of rank stability into the biomarker identification process. It safeguards the stability of biomarker identification by avoiding, to the greatest extent possible, the erroneous elimination of effective features. The improvement in identification stability was further validated on several omics benchmark datasets. Moreover, the systematic evaluation of stability and accuracy, together with functional modules for disease ontology and gene ontology enrichment analysis, was integrated into ConSIG to jointly consider the accuracy, stability, and biological relevance of biomarker identification, and to provide new means for identifying novel biomarkers and discovering related drug targets. ConSIG is freely accessible at: https://idrblab.org/consig/.

In response to the instability of bioinformatics methods in drug target identification, this dissertation constructed a comprehensive drug-molecule interaction network database, DrugMAP, and developed SSizer, a tool for multi-criteria assessment of sample adequacy and prediction of the required sample size, to ensure the stability and reliability of bioinformatics-based drug target identification at the data level. Furthermore, this dissertation comprehensively evaluated and improved the stability of existing filter feature selection methods and developed a novel stable biomarker identification method based on multiple statistical strategies and a multi-step evaluation of rank stability. Based on these methods and analyses, the online tools POSREG and ConSIG were constructed to provide researchers with omics-based biomarker identification services. In conclusion, this dissertation conducted a series of studies on the bioinformatics-based stable discovery of drug targets, which not only provide the data basis for pharmacological information-based drug-target-potential analysis but also provide technical support for the stability of omics-based biomarker identification. This work could not only effectively improve the stability of drug target discovery but also promote the elucidation of complex disease mechanisms and the development of novel therapeutic strategies.