Research On LncRNA-disease Association Prediction Based On Multivariate Statistical Analysis

Posted on: 2022-08-24  Degree: Doctor  Type: Dissertation
Country: China  Candidate: B Wang
GTID: 1484306353476034  Subject: Computer Science and Technology
Abstract/Summary:
Long non-coding RNAs (lncRNAs) are a class of RNA molecules that are longer than 200 nt and do not encode proteins. lncRNAs play an important role in chromatin modification, transcription and post-transcriptional regulation, and have important biological functions. Variation or dysfunction of lncRNAs can lead to many diseases. Research on lncRNA-disease association (LDA) prediction can therefore not only deepen the understanding of the molecular pathogenic mechanisms of complex diseases, but also allow lncRNAs to serve as biological targets for disease diagnosis and prediction and as drug targets for treatment and prevention. Existing methods can only predict whether an lncRNA is associated with a disease. They cannot characterize specific aspects of the associated disease: they focus on single lncRNA predictions and ignore clinical and pathological information such as clinical stage, pathological stage, lifetime, disease state and family history of genetic disease. In practice, however, predicting and analyzing the clinical and pathological information of lncRNA-related diseases is more practical and valuable. For this reason, this dissertation uses multivariate statistical analysis combined with lifetime data, clinical stage data, pathological stage data and double stage data to study lncRNA-disease association prediction in the following four aspects.

(1) Current lncRNA-disease association prediction methods lack lifetime prediction for diseases, rely too heavily on known associations and cannot accurately predict unknown associations. To address this, a prediction correlation factor cf, an attenuation coefficient, an iteration variation method and a corrected iteration variation method are proposed for lifetime prediction of lncRNA-disease association. Combining multivariate linear regression analysis with iteration variation, a method for lifetime prediction of lncRNA-disease association is proposed. In this method, significant differential expression and stepwise deletion under the AIC criterion are used instead of known associations to measure the relationship between lncRNAs and diseases, which avoids the algorithm's dependence on known associations. Further, the lncRNAs most closely related to the prognosis and lifetime of cancer patients can be identified by this lifetime prediction method, and a potential multiple linear regression model between the lifetime of cancer patients and lncRNAs is finally presented. Experimental results in terms of AUC show that the method performs well both in predicting patient lifetime and in predicting lncRNA-disease associations.

(2) Current lncRNA-disease association prediction methods lack clinical stage prediction for specific diseases and suffer from poor stability caused by noisy or irrelevant data. To address this, three kinds of circular allelism subregion operation (CASO), namely center, X-axis and Y-axis operations on a subregion [a,b], are proposed for the selection of characteristic variables. The CASO method trains candidate regions under each of the three operations for stability. During stability training, noisy or irrelevant data are added to a penalty set and discarded, so that they do not affect the result. Further, based on CASO and a random forest calculation of characteristic variable importance (CVSgC-RF), a selection algorithm for characteristic variables (CVSe-CS-CF) is proposed. Finally, a clinical stage prediction algorithm for lncRNA-disease association based on logistic regression analysis and circular allelism subregions (CSPA-PL) is proposed. The CVSe-CS-CF algorithm selects the lncRNAs most closely related to cancer, and CSPA-PL then predicts, from among these, the lncRNAs most closely related to the clinical stage of cancer, thus realizing clinical stage prediction. Experimental results show that the method has good prediction stability and good prediction performance.

(3) Current lncRNA-disease association prediction methods lack pathological stage prediction for specific diseases and suffer from abnormal equality loss. To address this, a coordinates reverse rotation method for pathological stage core variable selection (CRRC) is proposed, together with a harmonic importance ranking (HIR) calculation method for ranking pathological stage core variables. The CRRC method adopts an inequality strategy in which the main region has higher status than the subordinate region. The purpose of this strategy is to protect the data in the main region: abnormal data cannot enter the main region because of their low harmonic importance ranking, which avoids the impact of abnormal equality loss on prediction performance. Further, a cluster generating algorithm (ClGeA) based on CRRC is proposed, and, based on HIR, a selection algorithm of core variables for cancer pathological stage (SA-CV-CPS) is proposed. Finally, building on ClGeA and SA-CV-CPS, a pathological stage prediction algorithm for lncRNA-disease association based on principal component analysis (PSPA-LA-PCA) is proposed. PSPA-LA-PCA takes pathological stage data as the decision attribute and thereby realizes pathological stage prediction of lncRNA-disease association. Experimental results show that the method achieves good prediction performance in AUC, precision, recall and F1 value.

(4) Current lncRNA-disease association prediction methods lack double stage prediction for specific diseases, have incomplete coverage and make it difficult to determine optimal parameters. To address this, a double stage significance calculation algorithm (DSSCA) is proposed, based on a conditional weighting method that combines clinical stage and pathological stage. The inverted frequency shift among buckets (IFSB) method and variable-length dynamic buckets (VLDB) are proposed for double stage core variable selection. The IFSB method shifts in reverse from the subordinate area to the main area and updates comprehensively and dynamically. This comprehensive dynamic updating strategy avoids incomplete coverage and eliminates the drawbacks it causes. Further, for the mobile end in the IFSB method, a frequency shift update algorithm on the mobile end (FSUA-ME) is proposed. FSUA-ME eliminates elements with poor stability and higher risk and replaces them with elements with higher activity, thus improving the global optimization ability of the algorithm. Finally, a double stage prediction algorithm for lncRNA-disease association based on Bayesian discriminant analysis (DSPA-LA-BDA) is proposed. In the DSPA-LA-BDA algorithm, Bayesian discriminant analysis is performed on the core variable set Bcore, which realizes double stage prediction of lncRNA-disease association. Experimental results show that the method has a strong updating effect and good prediction performance.
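
The AIC-based variable deletion used in aspect (1) can be illustrated with a generic backward elimination for multiple linear regression. The sketch below is not the dissertation's iteration variation method, whose details are not given here. It assumes a Gaussian-error AIC computed up to an additive constant, and it runs on synthetic data. The function names `ols_aic` and `backward_aic` and the variable names `lnc0` to `lnc5` are hypothetical.

```python
import numpy as np


def ols_aic(X, y):
    """AIC (up to an additive constant) of a least-squares fit with intercept."""
    n = len(y)
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    # Gaussian log-likelihood term plus a penalty of 2 per fitted coefficient
    return n * np.log(rss / n) + 2 * design.shape[1]


def backward_aic(X, y, names):
    """Drop one variable at a time for as long as the AIC keeps decreasing."""
    keep = list(range(X.shape[1]))
    best = ols_aic(X[:, keep], y)
    while len(keep) > 1:
        # AIC of every model obtained by deleting exactly one remaining variable
        trials = [(ols_aic(X[:, [j for j in keep if j != d]], y), d) for d in keep]
        aic, drop = min(trials)
        if aic >= best:  # no single deletion improves the criterion: stop
            break
        best = aic
        keep.remove(drop)
    return [names[j] for j in keep], best


# Synthetic stand-in for expression data: only lnc0 and lnc3 drive the response.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=200)
names = [f"lnc{i}" for i in range(6)]

selected, aic = backward_aic(X, y, names)
print("retained variables:", selected)
```

In this toy setting the two informative variables are retained because deleting either one sharply increases the residual sum of squares. Whether any of the pure-noise variables survives depends on the random draw.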
Keywords/Search Tags: Machine learning, Multiple linear regression analysis, Principal component analysis, lncRNA, Disease association prediction