| Background: Biliary atresia (BA) is a common, fatal obstructive biliary disease in neonates that manifests clinically as obstructive jaundice. Studies have shown that the pathogenesis of BA may be related to genetic defects, virus infection, abnormal immune morphology, gene polymorphism and other factors. Because the pathogenic factors of BA are complex, BA may comprise different molecular subtypes. At present, molecular subtyping of diseases from high-throughput data is mainly achieved through unsupervised clustering based on the similarity of gene expression levels across samples, but such methods often rely on complex statistical models that are difficult for biologists to understand. It is therefore necessary to develop a molecular subtype identification algorithm that is easy to understand and interpret, in order to explore the molecular subtypes of BA and provide a reference for further understanding of its pathogenesis. Objective: Using the relative expression rank relationship between genes within a sample, to propose a method for identifying potential molecular subtypes of disease, to apply the method to BA gene expression profile data in order to mine the potential molecular subtypes of BA, and to analyze the key pathways and genes related to the subtypes in order to explore the pathogenesis of BA. Methods: An algorithm for identifying potential molecular subtypes of disease was developed from the relative expression rank relationship between genes within a sample, based on three hypotheses: (1) the relative expression rank relationship between genes differs between samples of different molecular subtypes; (2) for gene pairs with the potential to identify molecular subtypes of disease, there are differentially expressed genes between the two groups of samples divided according to the pair's relative expression rank relationship, and these genes also involve changes in biological function; (3) on the cluster map, the identified gene pairs with the potential to identify molecular subtypes of disease cluster together. The BA gene expression profiling datasets (GSE46960, GSE15235) were then analyzed, potential gene pairs with subtype recognition ability were mined, and molecular subtypes were identified and verified by cluster analysis. Finally, weighted gene co-expression network analysis (WGCNA) was used to identify subtype-related gene modules in BA and to screen subtype-related core genes. Results: A method was established to identify molecular subtypes of disease based on the relative expression rank relationship between genes within a sample. Using this method, two major molecular subtypes were identified in the BA gene expression dataset GSE46960 and validated in the independent dataset GSE15235; the results showed that the method has good stability and transferability. Combining weighted gene co-expression network analysis with protein interaction network analysis, 10 genes (LUM, COL6A3, FBN1, SPARC, DCN, LAMA4, FAP, ANTXR1, LAMA2, COL1A2) associated with BA molecular subtypes were found, which may be key factors in determining whether BA leads to hepatitis or liver fibrosis. Conclusion: Potential molecular subtypes of disease can be identified based on the relative expression rank relationship between genes within a sample. There are two main potential molecular subtypes of BA. The 10 subtype-related key genes are associated with the outcome of BA disease progression and may be key factors in determining whether BA leads to hepatitis or liver fibrosis. The findings of this study contribute to a further understanding of the pathogenesis of BA and provide a reference for the choice of treatment strategies. |
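
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of subtyping by within-sample relative expression ranks, not the authors' implementation. It assumes a toy expression table, a hypothetical threshold `min_frac` for deciding which gene pairs are informative, and a simple two-seed Hamming-distance split as a stand-in for the cluster analysis mentioned in the Methods. All function names are illustrative.

```python
from itertools import combinations

def pair_orderings(expr):
    """Map each gene pair (a, b) to a 0/1 vector: 1 where a is expressed above b in that sample."""
    return {
        (a, b): [int(x > y) for x, y in zip(expr[a], expr[b])]
        for a, b in combinations(sorted(expr), 2)
    }

def informative_pairs(orderings, min_frac=0.3):
    """Keep pairs whose less common ordering occurs in at least min_frac of samples (assumed criterion)."""
    kept = {}
    for pair, bits in orderings.items():
        frac = sum(bits) / len(bits)
        if min(frac, 1 - frac) >= min_frac:
            kept[pair] = bits
    return kept

def hamming(p, q):
    """Number of gene pairs whose ordering differs between two samples."""
    return sum(u != v for u, v in zip(p, q))

def two_group_split(kept):
    """Split samples into two groups by Hamming distance to the two most dissimilar samples."""
    profiles = list(zip(*kept.values()))  # one 0/1 profile per sample
    s0, s1 = max(
        combinations(range(len(profiles)), 2),
        key=lambda ij: hamming(profiles[ij[0]], profiles[ij[1]]),
    )
    return [
        0 if hamming(p, profiles[s0]) <= hamming(p, profiles[s1]) else 1
        for p in profiles
    ]

# Toy data (invented for illustration): 4 genes, 6 samples.
expr = {
    "A": [9, 8, 9, 2, 1, 2],
    "B": [3, 2, 1, 8, 9, 7],
    "C": [5, 5, 5, 5, 5, 5],
    "D": [0, 0, 0, 0, 0, 0],
}
kept = informative_pairs(pair_orderings(expr))
labels = two_group_split(kept)
print(sorted(kept))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(labels)        # [0, 0, 0, 1, 1, 1]
```

Because only the ordering of two genes within the same sample is used, the pair vectors are unaffected by any monotone per-sample rescaling of expression values. This may be one reason such rank-based rules are easier to carry across datasets.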