| Long non-coding RNA (lncRNA) is a type of non-coding RNA (ncRNA) that is more than 200 nucleotides (nt) long and is widely present in cells. In recent years, studies have found that lncRNAs play a significant role in disease processes, but the relationship between lncRNAs and diseases is not yet fully understood. Medical researchers usually study this relationship through "wet-lab" experiments. However, there are hundreds of thousands of lncRNAs and many complex diseases. Traditional experimental approaches are ill-suited to data sets of this size and consume considerable labor and money. Bioinformatics researchers therefore apply well-performing predictive models to lncRNA-disease association studies. These models quickly identify the lncRNAs most relevant to a given disease, so that medical researchers can focus their experiments on those candidates. Although current prediction models have achieved good results, there is still considerable room to improve their precision. As research on lncRNA-disease associations deepens, more and more kinds of lncRNA similarity and disease similarity are being used for association prediction. Commonly used models, however, have two problems: some rely on a single similarity measure for diseases and lncRNAs, and others lose information when integrating multiple lncRNA and disease similarities. To address these problems, we carried out the following research: (1) We studied multiple lncRNA and disease similarities, including lncRNA expression similarity, lncRNA similarity, disease semantic similarity, and disease cosine similarity. (2) We used the similarity kernel fusion method to integrate the lncRNA similarities and the disease similarities, applying neighbor constraints before fusion to refine every similarity matrix. We then constructed a cost function based on Laplacian regularized least squares to predict associations between lncRNAs and diseases. (3) To determine the five parameters of the similarity kernel fusion model, we used leave-one-out cross-validation and a five-fold cross-validation framework to find their optimal values. (4) To verify the performance of the similarity kernel fusion method, we compared several cases: the rationality of using multiple similarities, the difference between cosine similarity and Gaussian interaction profile kernel similarity, the rationality of neighbor constraints in the similarity fusion method, and the similarity kernel fusion model versus other methods. The similarity kernel fusion model achieved 0.9049 in leave-one-out cross-validation and 0.8743 ± 0.0050 in five-fold cross-validation, demonstrating superior performance. In addition, case studies of three diseases (hepatocellular carcinoma, lung cancer, and prostate cancer) show that the similarity kernel fusion model can accurately predict associations between lncRNAs and diseases. |
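The abstract compares disease cosine similarity with a Gaussian kernel interaction similarity but does not give formulas. The sketch below assumes both are computed, as is common in association prediction, from the columns of a known 0/1 lncRNA-disease association matrix `Y` (rows are lncRNAs, columns are diseases). It uses the usual Gaussian interaction profile (GIP) kernel, exp(-gamma * ||p_i - p_j||^2), where gamma = gamma_prime / mean(||p_i||^2). The function names, the `gamma_prime` default, and the toy matrix are hypothetical illustrations, not the thesis's code.

```python
import math


def columns(Y):
    """Return the columns of an association matrix Y (rows = lncRNAs, cols = diseases)."""
    return [list(col) for col in zip(*Y)]


def cosine_similarity(profiles):
    """Pairwise cosine similarity between interaction profiles (lists of 0/1)."""
    norms = [math.sqrt(sum(v * v for v in p)) for p in profiles]
    n = len(profiles)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Profiles with no known associations get similarity 0 to everything
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel: exp(-gamma * ||p_i - p_j||^2),
    with the bandwidth gamma = gamma_prime / mean(||p_i||^2)."""
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / len(profiles)
    gamma = gamma_prime / mean_sq_norm if mean_sq_norm > 0 else 0.0
    n = len(profiles)
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
             for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical toy data: 3 lncRNAs x 3 diseases
    Y = [[1, 0, 1],
         [1, 1, 0],
         [0, 1, 0]]
    disease_profiles = columns(Y)  # each disease is described by its lncRNA column
    print(cosine_similarity(disease_profiles))
    print(gip_kernel_similarity(disease_profiles))
```

The two measures treat absent associations differently. Cosine similarity ignores pairs of zeros and depends only on the angle between profiles. The GIP kernel penalizes every mismatched entry through the squared Euclidean distance.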
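The abstract states that neighbor constraints refine each similarity matrix before the matrices are fused, but it does not give the update rule. The following is a minimal sketch of one common way to do this, in the style of similarity-network-fusion methods. Each matrix is first reduced to a row-normalized k-nearest-neighbor kernel. That kernel then repeatedly diffuses the average of the other matrices, and the result is mixed with the average of the other original matrices using weight `alpha`. The parameters `k`, `alpha`, and `iters` and all function names are hypothetical. The thesis's five parameters and its exact update may differ.

```python
def mat_mul(A, B):
    """Plain-Python matrix product A @ B."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(r) for r in zip(*A)]


def mean_of(mats):
    """Element-wise mean of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(M[i][j] for M in mats) / len(mats) for j in range(n)] for i in range(n)]


def neighbor_constrain(S, k):
    """kNN neighbor constraint: keep each row's diagonal entry and its k most
    similar other items, zero everything else, then row-normalize."""
    n = len(S)
    out = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i), key=lambda j: S[i][j], reverse=True)
        keep = set(others[:k]) | {i}
        total = sum(S[i][j] for j in keep)
        out.append([S[i][j] / total if (j in keep and total > 0) else 0.0 for j in range(n)])
    return out


def fuse_similarities(sims, k=2, alpha=0.5, iters=10):
    """Fuse several n x n similarity matrices by cross-diffusion through
    neighbor-constrained kernels (a hypothetical SNF-style sketch)."""
    if len(sims) < 2:
        raise ValueError("need at least two similarity matrices to fuse")
    m = len(sims)
    local = [neighbor_constrain(S, k) for S in sims]  # sparse kNN kernels
    current = [[row[:] for row in S] for S in sims]
    for _ in range(iters):
        updated = []
        for a in range(m):
            others_now = mean_of([current[b] for b in range(m) if b != a])
            others_init = mean_of([sims[b] for b in range(m) if b != a])
            # Diffuse the other views' information only along view a's neighbor links
            diffused = mat_mul(mat_mul(local[a], others_now), transpose(local[a]))
            updated.append([[alpha * d + (1 - alpha) * o for d, o in zip(dr, orow)]
                            for dr, orow in zip(diffused, others_init)])
        current = updated
    fused = mean_of(current)
    n = len(fused)
    # Symmetrize so the fused matrix can serve as a similarity kernel
    return [[(fused[i][j] + fused[j][i]) / 2 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    # Hypothetical lncRNA expression and cosine similarity matrices
    expr = [[1.0, 0.8, 0.1], [0.8, 1.0, 0.3], [0.1, 0.3, 1.0]]
    cos = [[1.0, 0.5, 0.7], [0.5, 1.0, 0.0], [0.7, 0.0, 1.0]]
    for row in fuse_similarities([expr, cos], k=1, alpha=0.5, iters=5):
        print([round(v, 3) for v in row])
```

The neighbor constraint keeps weak, noisy similarities from spreading during diffusion. Only each item's strongest links carry information between views.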
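The abstract does not write out the Laplacian regularized least-squares cost function, so this is a sketch of one standard formulation, not the thesis's exact objective. In the lncRNA space, it fits a score matrix `F` to the known associations `Y` while penalizing score differences between similar lncRNAs. The objective is to minimize ||F - Y||^2 + lambda * tr(F^T L F), where L = I - D^(-1/2) S D^(-1/2) is the normalized Laplacian of the similarity matrix S. Setting the gradient to zero gives (I + lambda * L) F = Y. The same step is applied in the disease space, and the two results are averaged with a hypothetical weight `w`. All names and defaults are illustrative.

```python
import math


def normalized_laplacian(S):
    """L = I - D^(-1/2) S D^(-1/2), with D the diagonal matrix of row sums of S."""
    n = len(S)
    inv_sqrt = [1 / math.sqrt(d) if d > 0 else 0.0 for d in (sum(row) for row in S)]
    return [[(1.0 if i == j else 0.0) - inv_sqrt[i] * S[i][j] * inv_sqrt[j]
             for j in range(n)] for i in range(n)]


def solve(A, B):
    """Solve A X = B (A: n x n, B: n x m) by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]  # augmented matrix [A | B]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0.0:
                f = M[r][col]
                M[r] = [rv - f * cv for rv, cv in zip(M[r], M[col])]
    return [row[n:] for row in M]


def laprls_scores(Y, S, lam=1.0):
    """Minimize ||F - Y||^2 + lam * tr(F^T L F); closed form: (I + lam * L) F = Y."""
    n = len(S)
    L = normalized_laplacian(S)
    A = [[(1.0 if i == j else 0.0) + lam * L[i][j] for j in range(n)] for i in range(n)]
    return solve(A, Y)


def predict(Y, S_lnc, S_dis, lam=1.0, w=0.5):
    """Average lncRNA-space and disease-space scores (w is a hypothetical weight)."""
    F_l = laprls_scores(Y, S_lnc, lam)
    Y_t = [list(c) for c in zip(*Y)]
    F_d = [list(c) for c in zip(*laprls_scores(Y_t, S_dis, lam))]
    return [[w * a + (1 - w) * b for a, b in zip(ra, rb)] for ra, rb in zip(F_l, F_d)]


if __name__ == "__main__":
    # Hypothetical data: lncRNA 1 has no known disease but resembles lncRNA 0
    Y = [[1, 0], [0, 0], [0, 1]]
    S_lnc = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
    S_dis = [[1.0, 0.2], [0.2, 1.0]]
    print(predict(Y, S_lnc, S_dis))
```

With `lam = 0` the scores reproduce `Y` exactly. As `lam` grows, lncRNA 1 inherits a positive score for disease 0 from its similar neighbor, which is the behavior that makes this cost function useful for ranking unobserved pairs.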
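The abstract reports 0.9049 for leave-one-out cross-validation and 0.8743 ± 0.0050 for five-fold cross-validation but does not name the metric. Assuming these are AUC values, as is usual in this field, the sketch below shows one way to run the evaluation. Known associations are split into k folds. Each fold is hidden in turn, the remaining matrix is scored, and the hidden positives are ranked against all unknown pairs. `score_fn` stands for any predictor that maps a training association matrix to a score matrix. The function names and the seed are hypothetical.

```python
import random
import statistics


def kfold_masks(Y, k=5, seed=0):
    """Split the known associations (Y[i][j] == 1) into k folds. For each fold, yield
    the training matrix with that fold's positives hidden, plus the hidden pairs."""
    positives = [(i, j) for i, row in enumerate(Y) for j, v in enumerate(row) if v == 1]
    if len(positives) < k:
        raise ValueError("need at least k known associations")
    rng = random.Random(seed)
    rng.shuffle(positives)
    for f in range(k):
        test = positives[f::k]
        train = [row[:] for row in Y]
        for i, j in test:
            train[i][j] = 0
        yield train, test


def auc(pos_scores, neg_scores):
    """AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else (0.5 if p == q else 0.0)
    return wins / (len(pos_scores) * len(neg_scores))


def cross_validate(Y, score_fn, k=5, seed=0):
    """Return the mean and standard deviation of the per-fold AUC."""
    unknown = [(i, j) for i, row in enumerate(Y) for j, v in enumerate(row) if v == 0]
    aucs = []
    for train, test in kfold_masks(Y, k, seed):
        F = score_fn(train)
        aucs.append(auc([F[i][j] for i, j in test], [F[i][j] for i, j in unknown]))
    spread = statistics.stdev(aucs) if len(aucs) > 1 else 0.0
    return statistics.mean(aucs), spread
```

Setting `k` to the number of known associations turns this into leave-one-out cross-validation, with one hidden association per round. Running `cross_validate` over a grid of candidate values is one straightforward way to tune a model's parameters.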