Research on Hypertension Biomarker Identification Based on Omics Data

Posted on: 2024-12-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Li
GTID: 1524307361486994
Subject: Computer Science and Technology
Abstract/Summary:
Hypertension is a crucial risk factor for cardiovascular diseases and a leading cause of death, posing a global public health challenge that imposes a significant burden on families and society. The pathogenesis of hypertension is still under exploration. With the continuous development of high-throughput sequencing technology, various omics data provide both opportunities and challenges for identifying hypertension biomarkers using computational methods. Hypertension biomarkers can objectively detect and evaluate the occurrence, development, and prognosis of hypertension in patients and distinguish different types of hypertension. A growing body of research has shown that molecules involved in hypertension onset and progression often do not act in isolation but interact with each other in a coordinated manner. However, existing methods cannot meet the current needs of hypertension biomarker identification. These factors have prompted researchers to focus more on effectively identifying hypertension biomarkers through computational methods and on gaining a deeper understanding of their potential roles and interactions in the pathogenesis of hypertension, thus providing vital support for elucidating the mechanisms of hypertension. This thesis studies bioinformatics computational methods to identify hypertension biomarkers more accurately and to explore their potential roles in the pathogenesis of hypertension, thereby providing a potential basis for the diagnosis and treatment of hypertension. The research contents and innovations include:

(1) We propose an Iteration Random Forest Sequential Forward Epistasis Detection (IRFSFED) algorithm to improve the accuracy of epistasis detection in single nucleotide polymorphism (SNP) data. By analyzing target region sequencing data, the algorithm effectively identified SNP biomarkers associated with hypertension in plateau areas. The experiments identified five hypertension-related SNP biomarkers, including a newly discovered locus (chr14:61734822) significantly associated with hypertension in plateau areas, and six loci with interactions influencing hypertension. These findings provide potential targets for pathological research on plateau area hypertension. The accuracy of the IRFSFED algorithm is 81.83%, higher than that of other similar algorithms.

(2) We propose a Genome-wide Association Hybrid Feature Selection (GAHFS) algorithm that integrates GWAS and feature selection methods, leveraging their respective advantages in data analysis and model building. It accurately identified biomarkers associated with hypertension from the genotyping data of the Qinghai Xining population. The experiments identified 12 gene biomarkers: nine genes already confirmed by third-party biological experiments to be related to hypertension, and three newly discovered genes associated with hypertension. The accuracy of the GAHFS algorithm is 99.16%, higher than that of other similar algorithms.

(3) We propose a new Deep Graph Clustering Feature Selection (Deep GCFS) algorithm. This algorithm first constructs a more stable and effective graph using prior interaction information between genes from the STRING database, then builds a new objective function for unsupervised learning and uses graph neural networks for gene node representation learning. Finally, it determines gene biomarkers through an integrated feature selection method. By analyzing the hypertension transcriptome dataset SRP447196, ten gene biomarkers were identified, all showing significant differences in t-test analyses. Classification performance validation indicated that the gene biomarkers classify well (AUC = 0.9750). With the GSE113439 dataset used as an external dataset, the identified gene biomarkers also demonstrated good classification performance (AUC = 0.9545). Third-party literature case analysis revealed that six of the identified genes (PTGS2, TBXA2R, ZNF101, KCNJ2, MSRA, and CMTM5) had been reported to be related to hypertension, verifying the effectiveness of the proposed algorithm.

(4) We propose a Dual-Index Nearest Neighbor Co-expression Module Analysis (DINNCMA) algorithm for identifying miRNA biomarkers from small-sample miRNA expression data. The algorithm overcomes the limitations of similarity measurement between miRNAs and designs an importance measurement strategy that integrates multiple aspects of information to measure the importance of modules and of the miRNAs within each module. It also proposes a probabilistic global miRNA ranking method that takes module importance into account. By analyzing the hypertension miRNA expression dataset GSE75670, ten miRNA biomarkers were identified, with eight miRNAs previously reported to be associated with hypertension, and four miRNAs (hsa-miR-107, hsa-miR-210, hsa-miR-665, and hsa-miR-449a) each reported in more than five articles. In comparisons with other methods, both the number of reported miRNAs related to blood pressure and the number of single miRNAs with more than five reports were higher than those of the comparison methods, verifying the superiority of the proposed algorithm. Additionally, these ten miRNA biomarkers were found to be interconnected through their corresponding target genes, suggesting that the identified miRNA biomarkers may jointly regulate target genes to influence the occurrence of hypertension.
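The AUC figures quoted for contribution (3) can be read as the probability that a randomly chosen hypertension sample receives a higher classifier score than a randomly chosen control sample. The sketch below computes AUC from that pairwise definition. It is only an illustration: the function name `auc_score` and the toy labels and scores are hypothetical and are not taken from the thesis or its datasets.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = case, 0 = control).
    scores: iterable of classifier scores, higher meaning "more likely a case".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from SRP447196 or GSE113439.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
toy_auc = auc_score(toy_labels, toy_scores)
print(f"AUC on toy data: {toy_auc:.4f}")
```

On this reading, an AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of cases from controls. The pairwise loop is quadratic in the number of samples, which is acceptable for small cohorts; a rank-based formulation would be the usual choice for larger ones.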
Keywords/Search Tags: hypertension, genome, transcriptome, bioinformatics, biomarker