With the successful application of modern bioinformatics methods to the analysis of associations between lncRNAs and diseases, a large volume of computational data with different structures and diverse sources has been derived from traditional biological experimental data. These data include structured relational databases of lncRNA regulation in human diseases, Semantic Web based linked data of biological information, and various serialized multi-source heterogeneous data. As a technique that uses graph models to describe knowledge and to model the relationships among things in the world, the knowledge graph offers a way to discover knowledge from multi-source heterogeneous data. Fusing such data and modeling it with knowledge graph construction techniques such as knowledge mining and knowledge representation, so as to support further computation and application, is therefore of considerable research significance. The research content of this thesis is divided into the following three parts:

(1) This thesis proposes a method for constructing an lncRNA and disease knowledge graph from multi-source heterogeneous data. The method is divided into two layers. First, in the schema layer, an ontology is constructed by defining the relevant terms and predicate expressions on the basis of verified multi-source knowledge in the domain. Second, in the data layer, heterogeneous data from different sources are homogenized, fused, and mapped to a standard form through the Resource Description Framework (RDF), so that data with different structures are transformed into a single data structure and then into knowledge. Finally, the lncRNA and disease knowledge graph is constructed. Compared with the general knowledge graph construction process, this method better retains the relevant data characteristics and captures semantic information, and thus better supports downstream applications of the knowledge graph.

(2) This thesis proposes an application model for the lncRNA and disease knowledge graph based on a two-layer cascaded embedding framework. The method cascades two embedding frameworks, knowledge graph embedding and graph embedding. It first classifies the relations in the fused multi-source heterogeneous data and represents each class as vectors with the corresponding embedding mode, knowledge graph embedding or graph embedding. A two-stage cascaded feature selection strategy then ensures that the feature set retains both the topological information of the graph structure and the semantic information at the knowledge level. Experiments are carried out on the multi-source heterogeneous data sets of Chapter 3, and the results show that the method performs well in link prediction tasks and is able to mine potential knowledge and relationships from multi-source heterogeneous data.

(3) This thesis designs and implements an lncRNA and disease knowledge graph system. Built on the three-layer MVC architecture, the system consists of several modules, including knowledge visualization and link prediction. It is implemented by writing the corresponding page code and encapsulating the data fusion interface and the link prediction interface. The resulting visualization system for lncRNA and disease relationships reflects the application value of the research.
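
As an illustration of the data-layer step in part (1), the following minimal sketch shows how records from two differently structured sources might be mapped to a single triple structure and fused. It is not the implementation used in this thesis: the namespace `http://example.org/lncdisease/`, the predicate `associatedWith`, the field names, and the placeholder entities (`lnc_a`, `disease x`, `disease y`) are all assumptions made for the example, and plain Python tuples stand in for an RDF library.

```python
from typing import NamedTuple

# Hypothetical namespace and predicate, chosen only for this illustration.
BASE = "http://example.org/lncdisease/"
ASSOCIATED_WITH = BASE + "associatedWith"


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def iri(kind: str, name: str) -> str:
    """Build a normalized IRI so spelling variants of one entity coincide."""
    return f"{BASE}{kind}/{name.strip().upper().replace(' ', '_')}"


def from_table_row(row: dict) -> Triple:
    # Source A (assumed): a relational row with columns "lncrna" and "disease".
    return Triple(iri("lncRNA", row["lncrna"]),
                  ASSOCIATED_WITH,
                  iri("disease", row["disease"]))


def from_json_record(rec: dict) -> list[Triple]:
    # Source B (assumed): a serialized record listing several diseases per lncRNA.
    subject = iri("lncRNA", rec["rna_name"])
    return [Triple(subject, ASSOCIATED_WITH, iri("disease", d))
            for d in rec["diseases"]]


def fuse(rows: list[dict], records: list[dict]) -> set[Triple]:
    """Map both sources to triples; the set removes duplicate statements."""
    triples = {from_table_row(r) for r in rows}
    for rec in records:
        triples.update(from_json_record(rec))
    return triples


# Placeholder data, not real biological associations.
rows = [{"lncrna": "lnc_a", "disease": "disease x"}]
records = [{"rna_name": "LNC_A ", "diseases": ["Disease X", "disease y"]}]

graph = fuse(rows, records)
for t in sorted(graph):
    # Print in an N-Triples-like form.
    print(f"<{t.subject}> <{t.predicate}> <{t.obj}> .")
```

Because both sources are normalized to the same identifiers before fusion, the statement that appears in both sources is stored once, and the fused set contains two triples. In a real pipeline an RDF library and the ontology's own vocabulary would replace the hand-built IRIs.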