With the development of the Internet, the digitization of data in medical institutions has steadily improved, and the resulting data resources contain a wealth of valuable information that can help researchers understand diseases more deeply. Complex networks, a routine tool in systems biomedicine, can provide a panoramic view of both the global and the local relationships between diseases and thereby better guide doctors in clinical practice. Many studies focus on common systemic diseases, whereas few address ophthalmic diseases or uncover the relationships between ophthalmic diseases and other diseases. The diagnosis of ophthalmic diseases is complicated because the eyes are closely related to other organs. Moreover, existing disease network construction algorithms do not adopt parallel execution strategies and therefore incur high time costs when processing large amounts of data. Efficiently mining and analyzing large-scale ophthalmic disease data is thus an urgent problem.

To address these problems, this thesis proposes an ophthalmic disease relationship analysis framework for exploring the relationships between diseases. The framework first preprocesses the data and then uses the Spark distributed computing framework to calculate the correlation coefficient matrix of the disease data set, build a disease symbiosis (comorbidity) network, and obtain the symbiotic relationships between diseases. Second, community detection on the symbiosis network yields clusters of symbiotic diseases. Finally, Spark is used to calculate the relative risk (RR) of each disease pair in parallel, and a disease trajectory network is constructed to obtain the development trajectories between diseases. The research content of this thesis covers the following three aspects.

(1) Symbiosis network construction and community detection within the ophthalmic disease analysis framework, to explore the symbiotic relationships between diseases. The correlation coefficient matrix of the diseases is obtained by computing statistics over 490,000 rows of inpatient medical record front sheets from the ophthalmic center, and the symbiosis network (82 nodes and 122 edges) is constructed by applying a threshold to the matrix values. Each node in the network represents a disease, each edge connects a pair of symbiotic diseases, and the edge weight gives the degree of symbiosis. The symbiosis network reveals disease pairs with a high degree of symbiosis, such as senile cataract and acute glaucoma, or traumatic cataract and penetrating eye injury. Community detection is performed on the symbiosis network from three different perspectives, using the BGLL algorithm, the weighted GN algorithm, and spectral clustering. The resulting communities have a clear structure and yield clusters of symbiotic ophthalmic diseases.

(2) Trajectory network construction within the ophthalmic disease analysis framework, to explore the development trajectories between diseases. The RR value of each disease pair is obtained through counting, sampling, and calculation on the inpatient medical record front sheets, and the trajectory network (105 nodes and 478 edges) is built by applying a threshold to the RR values. Nodes represent diseases, and a directed edge from one disease to another indicates that the second disease may be caused by the first. The trajectory network yields simple development paths for specific diseases: for example, refractive errors can cause alternating esotropia, and senile cataract can cause macular edema.

(3) Parallel network construction with the Spark distributed computing framework. Because of the large volume of data, constructing the ophthalmic disease symbiosis and trajectory networks is time-consuming. Taking advantage of Spark's in-memory computing and efficient batch processing, the symbiosis and trajectory network construction algorithms are parallelized and deployed on a Spark cluster, which speeds up network construction and improves the efficiency of the algorithms.

Analysis of the results shows that the proposed framework obtains more accurate and quantitative symbiotic relationships and development trajectories between ophthalmic diseases, and that the Spark distributed computing framework improves the efficiency of the network construction algorithms. The framework can help doctors adopt effective treatment strategies and prevent related diseases at an earlier stage.
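The abstract does not state which correlation coefficient the symbiosis network uses. The sketch below assumes the phi coefficient (the Pearson correlation of two 0/1 disease indicators), a common choice in comorbidity networks. It illustrates the two steps of symbiosis network construction: counting single diseases and co-occurring disease pairs per medical record, then keeping the pairs whose coefficient exceeds a threshold. It runs on one machine with the Python standard library. The per-record "map" and pairwise "merge" split only mirrors how such counting is typically distributed (in PySpark, roughly an RDD `flatMap` followed by `reduceByKey`). The function names, the input format (one list of diagnosis codes per record), and the threshold value are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations
from math import sqrt


def count_record(diagnoses):
    """'Map' step: disease counts and co-occurring pair counts for one record."""
    codes = sorted(set(diagnoses))  # each disease counted once per record
    return Counter(codes), Counter(combinations(codes, 2))


def merge_counts(left, right):
    """'Reduce' step: add two (disease counts, pair counts) tuples."""
    return left[0] + right[0], left[1] + right[1]


def phi(c_ij, p_i, p_j, n):
    """Phi coefficient of two binary disease indicators over n records.

    c_ij: records with both diseases; p_i, p_j: records with each disease.
    """
    den = sqrt(p_i * p_j * (n - p_i) * (n - p_j))
    return (c_ij * n - p_i * p_j) / den if den else 0.0


def symbiosis_edges(records, threshold):
    """Weighted undirected edges {(a, b): phi} with phi above the threshold."""
    n = len(records)
    diseases, pairs = reduce(merge_counts, map(count_record, records),
                             (Counter(), Counter()))
    edges = {}
    # Pairs that never co-occur have phi <= 0, so with a non-negative
    # threshold only the observed pairs need to be checked.
    for (a, b), c in pairs.items():
        w = phi(c, diseases[a], diseases[b], n)
        if w > threshold:
            edges[(a, b)] = w
    return edges


if __name__ == "__main__":
    toy_records = [["A", "B"], ["A", "B"], ["C"], ["A", "C"]]
    print(symbiosis_edges(toy_records, threshold=0.3))
```

For the toy records and a threshold of 0.3, only the pair `("A", "B")` is kept, with weight 1/sqrt(3) (about 0.577). The pair `("A", "C")` co-occurs once but has a negative coefficient.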
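BGLL (the Louvain method) detects communities by greedily moving nodes between communities, and then merging communities, whenever this increases the weighted modularity of the partition. The sketch below computes only that objective for a given partition. It is not an implementation of BGLL, weighted GN, or spectral clustering, but it can be used to compare the partitions those three algorithms produce on the same weighted symbiosis network. The function name and input format are hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Weighted modularity Q of a partition of an undirected graph.

    edges: {(u, v): weight}, each undirected edge listed once, no self-loops.
    communities: {node: community label} covering every node in `edges`.
    Q = sum over communities c of  L_c / m - (d_c / (2m))^2, where m is the
    total edge weight, L_c the edge weight inside c, and d_c the summed
    node strengths (weighted degrees) of the nodes in c.
    """
    m = sum(edges.values())
    internal = defaultdict(float)   # L_c
    strength = defaultdict(float)   # d_c
    for (u, v), w in edges.items():
        # Each edge adds its weight to the strength of both endpoints.
        strength[communities[u]] += w
        strength[communities[v]] += w
        if communities[u] == communities[v]:
            internal[communities[u]] += w
    return sum(internal[c] / m - (strength[c] / (2 * m)) ** 2 for c in strength)


if __name__ == "__main__":
    # Two triangles joined by a single edge, all weights 1.
    toy_edges = {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 1.0,
                 (3, 4): 1.0, (4, 5): 1.0, (3, 5): 1.0, (2, 3): 1.0}
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(toy_edges, split))
```

Splitting the toy graph into its two triangles gives Q = 5/14 (about 0.357). Putting all six nodes into one community gives Q = 0, so a modularity-maximizing method such as BGLL prefers the split.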
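The sketch below shows one simple way to turn RR values into a directed trajectory network. It assumes RR is the comorbidity relative risk RR = C_ij * N / (P_i * P_j). Here C_ij is the number of patients diagnosed with both diseases, P_i and P_j are the numbers of patients diagnosed with each disease, and N is the number of patients. Each kept edge points toward the disease that more often appears second. The thesis derives RR through sampling, and this sketch replaces that step with the analytic expected count. The input format (per-patient lists of `(admission time, diagnosis code)`), the function names, and both thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations


def first_diagnosis_times(admissions):
    """admissions: [(time, code), ...] for one patient -> {code: earliest time}."""
    first = {}
    for t, code in admissions:
        if code not in first or t < first[code]:
            first[code] = t
    return first


def trajectory_edges(patients, rr_threshold=1.0, min_count=1):
    """Directed edges {(earlier, later): RR} whose RR exceeds rr_threshold."""
    n = len(patients)
    prevalence = Counter()  # P_i: patients diagnosed with disease i
    together = Counter()    # C_ij: patients diagnosed with both i and j
    ordered = Counter()     # (i, j): patients whose first i precedes first j
    for admissions in patients:
        first = first_diagnosis_times(admissions)
        prevalence.update(first.keys())
        for a, b in combinations(sorted(first), 2):
            together[(a, b)] += 1
            if first[a] < first[b]:
                ordered[(a, b)] += 1
            elif first[b] < first[a]:
                ordered[(b, a)] += 1
            # Diagnoses at the same time count as co-occurrence only.
    edges = {}
    for (a, b), c in together.items():
        rr = c * n / (prevalence[a] * prevalence[b])
        if rr <= rr_threshold:
            continue
        # Orient the edge toward the disease that more often comes second.
        if ordered[(a, b)] > ordered[(b, a)] and ordered[(a, b)] >= min_count:
            edges[(a, b)] = rr
        elif ordered[(b, a)] > ordered[(a, b)] and ordered[(b, a)] >= min_count:
            edges[(b, a)] = rr
    return edges


if __name__ == "__main__":
    toy_patients = [
        [(1, "A"), (2, "B")],
        [(1, "A"), (3, "B")],
        [(1, "C")],
        [(2, "C"), (5, "D")],
    ]
    print(trajectory_edges(toy_patients))
```

On the toy cohort, both `A -> B` and `C -> D` have RR = 2.0. Raising `min_count` to 2 keeps only `A -> B`, because `C` precedes `D` in just one patient. A minimum ordered count like this is a simple guard against orienting edges on very little evidence.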