Improvement Based On Matrix Factorization Algorithm And Application In Long Noncoding RNA Regulation Prediction

Posted on: 2020-12-22
Degree: Master
Type: Thesis
Country: China
Candidate: G F Ren
GTID: 2370330578450923
Subject: Computer application technology
Abstract/Summary:
With the advent of the big data era, much of the value in bioinformatics data remains unexplored. However, most biological data is currently validated through precise laboratory experiments, which are enormously costly in both money and effort. In recent years, with the popularization of artificial intelligence, more and more researchers have applied intelligent algorithms to the mining and analysis of biological big data. In particular, recent work has emphasized that long non-coding RNA is a kind of biological macromolecule that can regulate proteins and microRNAs, thereby affecting diseases. Applying intelligent algorithms to study the relationships between long non-coding RNA and other molecules is a current research hotspot.

In this thesis, we first propose an improved algorithm based on matrix factorization. Building on the latent factor model, the latent factor vectors produced by the matrix factorization are converted into a probability score through the logistic function; this score represents the relationship between the corresponding user and item. A latent factor model with the logistic function explains the recommended results better and also simplifies subsequent calculation and representation, but it does not exploit collaborative filtering over the neighbors among users and among items. We therefore introduce graph regularization to integrate the similarities of users and of items into the objective function. Logistic matrix factorization with graph regularization achieves collaborative filtering, but in practice it is the users with higher similarity who tend to prefer the same item. Following the idea of K-nearest neighbors, we therefore retain only the strongest similarities between samples, which increases the influence of the strongest similarities in collaborative filtering and thereby improves the accuracy of the sequence generation model.

Finally, we apply the improved matrix factorization algorithm to bioinformatics, specifically to the prediction of long non-coding RNA-protein interactions and of long non-coding RNA-microRNA interactions. We cast long non-coding RNA-protein and long non-coding RNA-microRNA pairs as user-item models, in which the interaction information is treated as the user's rating of the item, while the sequence similarities of long non-coding RNAs, proteins, and microRNAs serve as the collaborative filtering information. In leave-one-out cross-validation, the two models achieved AUC values of 0.9025 and 0.9319, respectively. We further ran two separate experiments to verify the validity of the model. All results show that although the neighborhood regularization term of collaborative filtering reduces computational efficiency, the improved algorithm is superior to other algorithms in prediction accuracy, demonstrating its good predictive ability and scalability in long non-coding RNA regulation prediction.
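
The three ingredients described in the abstract (logistic scoring of latent factors, graph regularization from similarity matrices, and K-nearest-neighbor pruning of those similarities) can be illustrated with a small NumPy sketch. The abstract does not state the exact objective function, optimizer, or hyperparameters, so everything below is an assumption made for illustration: the loss is a standard cross-entropy plus L2 and graph-Laplacian penalties, the optimizer is plain gradient descent, and the names `knn_sparsify`, `laplacian`, `fit`, `lam`, `alpha`, `beta`, and the toy matrices are hypothetical. It should not be read as the thesis's actual implementation.

```python
import numpy as np

def knn_sparsify(S, k):
    """Keep each row's k largest off-diagonal similarities, then symmetrize."""
    S = np.array(S, dtype=float)          # copy, so the input is not modified
    np.fill_diagonal(S, 0.0)              # ignore self-similarity
    top = np.argsort(-S, axis=1)[:, :k]   # column indices of the k strongest neighbors
    keep = np.zeros(S.shape, dtype=bool)
    keep[np.arange(S.shape[0])[:, None], top] = True
    W = np.where(keep, S, 0.0)
    return np.maximum(W, W.T)             # an edge survives if either endpoint kept it

def laplacian(W):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(W.sum(axis=1)) - W

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def loss(Y, U, V, L_u, L_v, lam, alpha, beta):
    """Cross-entropy + L2 penalty + graph regularization (assumed form)."""
    P = sigmoid(U @ V.T)
    eps = 1e-12                           # guards against log(0)
    nll = -np.sum(Y * np.log(P + eps) + (1 - Y) * np.log(1 - P + eps))
    l2 = 0.5 * lam * (np.sum(U ** 2) + np.sum(V ** 2))
    graph = 0.5 * alpha * np.trace(U.T @ L_u @ U) \
          + 0.5 * beta * np.trace(V.T @ L_v @ V)
    return nll + l2 + graph

def fit(Y, S_u, S_v, rank=2, k=1, lam=0.1, alpha=0.1, beta=0.1,
        lr=0.05, n_iter=200, seed=0):
    """Gradient descent on the assumed objective; returns factors and loss trace."""
    rng = np.random.default_rng(seed)
    U = 0.1 * rng.standard_normal((Y.shape[0], rank))   # "user" (lncRNA) factors
    V = 0.1 * rng.standard_normal((Y.shape[1], rank))   # "item" (protein) factors
    L_u = laplacian(knn_sparsify(S_u, k))
    L_v = laplacian(knn_sparsify(S_v, k))
    history = []
    for _ in range(n_iter):
        P = sigmoid(U @ V.T)
        # Both gradients are computed before either factor matrix is updated.
        grad_U = (P - Y) @ V + lam * U + alpha * (L_u @ U)
        grad_V = (P - Y).T @ U + lam * V + beta * (L_v @ V)
        U -= lr * grad_U
        V -= lr * grad_V
        history.append(loss(Y, U, V, L_u, L_v, lam, alpha, beta))
    return U, V, history

# Toy data (made up): 4 lncRNAs x 3 proteins, with sequence-style similarities.
Y = np.array([[1, 1, 0],
              [1, 1, 0],
              [0, 0, 1],
              [0, 0, 1]], dtype=float)
S_lnc = np.array([[1.0, 0.9, 0.1, 0.2],
                  [0.9, 1.0, 0.2, 0.1],
                  [0.1, 0.2, 1.0, 0.8],
                  [0.2, 0.1, 0.8, 1.0]])
S_prot = np.array([[1.0, 0.7, 0.1],
                   [0.7, 1.0, 0.2],
                   [0.1, 0.2, 1.0]])

U, V, history = fit(Y, S_lnc, S_prot)
scores = sigmoid(U @ V.T)   # predicted interaction probabilities
```

In this sketch the K-nearest-neighbor step simply zeroes all but each row's strongest similarities before the Laplacian is built, so weak similarities contribute nothing to the graph penalty. Evaluating such a model with leave-one-out cross-validation, as the abstract reports, would mean hiding one known interaction at a time, refitting, and ranking the hidden entry among the unknown ones.
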
Keywords/Search Tags: data mining, matrix factorization, neighborhood regularized, long noncoding RNA, regulatory prediction