| Causation goes beyond correlation: it identifies how changes in one variable in the data cause changes in another. Causal inference is now widely studied in many fields and has become an effective way to understand and explore the intrinsic driving relationships of complex systems. In bioinformatics, the reconstruction of gene expression regulatory networks is one of the most typical research hotspots in causal inference. A living organism is a highly complex, integrated system. Its genes do not exist in isolation; they are linked by particularly complex regulatory interactions. Studying gene expression regulatory networks helps to explain the operating mechanisms of cellular life processes, sheds light on life processes more broadly, and can provide new ideas for the treatment of complex diseases. Understanding and exploring gene regulatory networks has been shown to play an important role in disease diagnosis and genomic drug design. In addition, gene expression regulatory networks can visualize the dynamics of gene transcriptional changes and biophysiological states, which is important for understanding the genetic basis of phenotypic traits. However, inferring and constructing effective and reliable gene expression regulatory networks remains difficult, and inferring larger-scale networks poses even greater challenges for existing construction methods. The main work and research results of this thesis are as follows: (1) In view of the limitations of current gene expression regulatory network construction methods, this thesis proposes GRNTSTE (Gene Regulatory Networks Based on Time Series Data and Transfer Entropy), a construction method based on large-scale time series gene expression data and transfer entropy. This method takes 
transfer entropy theory as its core and computationally analyzes large-scale time series gene expression data to infer the causal regulatory relationships between genes, from which a gene expression regulatory network is then constructed. To verify the performance and effectiveness of the GRNTSTE method, this thesis conducts experiments on the open DREAM3 challenge dataset and the IRMA OFF/ON dataset, and compares the method with the most effective gene expression regulatory network construction algorithm currently available. The experimental results show that the GRNTSTE method achieves better performance and sensitivity. (2) The pineal gland, as the center of biological rhythm, shows clearly periodic activity, and researchers have found that it transmits a "time signal" to the central nervous system by secreting hormones, thereby regulating the organism's biological clock. Current research shows that melatonin secreted by the pineal gland varies periodically over the day-night cycle and plays an important role in regulating sleep and wakefulness. Understanding and exploring the secretion mechanism of melatonin is therefore of great significance for treating people with sleep disorders. In this thesis, the GRNTSTE method was applied to the large time series gene expression dataset of rat pineal gland tissue collected by the Key Laboratory of Big Data Research and Application of Inner Mongolia Autonomous Region, and a complete framework for constructing the rat pineal rhythm gene expression regulatory network was proposed. The framework consists of six steps: time series gene expression data collection; preprocessing of the time series gene expression data; characteristic gene selection; calculation of transfer entropy between gene pairs; screening of gene regulatory relationships; and construction of the gene expression regulatory network. Based on this 
framework, the rhythm gene expression regulatory network in rat pineal gland tissue was constructed, which provides a valuable reference for biological verification experiments and is of great significance for in-depth exploration of the mechanism of melatonin secretion in the pineal gland. (3) To address the low efficiency and long running time of transfer entropy calculation when the GRNTSTE method constructs a gene expression regulatory network, this thesis proposes a scalable parallel transfer entropy calculation method based on Spark big data technology. Following the distributed computing approach of big data processing, the method divides the overall computing task into multiple independent subtasks and distributes them to different computing nodes of the Spark cluster for parallel execution, thereby greatly improving the efficiency of transfer entropy calculation. Finally, this thesis summarizes and analyzes the research results, and points out the parts of the current method that remain to be optimized as well as directions for future research. |
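
To make the pairwise transfer entropy step and the subsequent screening step concrete, the following is a minimal Python sketch. It is not the GRNTSTE implementation. It assumes a history length of one time step, equal-width discretization of expression values, a plug-in (frequency count) probability estimate, and a fixed screening threshold; the function names `discretize`, `transfer_entropy` and `infer_edges` and the toy gene names are hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import permutations
from math import log2


def discretize(series, bins=3):
    """Map a numeric series to equal-width bin indices 0..bins-1 (assumed scheme)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return [0] * len(series)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in series]


def transfer_entropy(source, target, bins=3):
    """Plug-in estimate of TE(source -> target) in bits, history length 1.

    TE = sum p(y_next, y_now, x_now)
             * log2( p(y_next | y_now, x_now) / p(y_next | y_now) )
    """
    x, y = discretize(source, bins), discretize(target, bins)
    triples = list(zip(y[1:], y[:-1], x[:-1]))  # (y_next, y_now, x_now)
    n = len(triples)
    c_full = Counter(triples)
    c_now = Counter((y_now, x_now) for _, y_now, x_now in triples)
    c_self = Counter((y_next, y_now) for y_next, y_now, _ in triples)
    c_y = Counter(y_now for _, y_now, _ in triples)
    te = 0.0
    for (y_next, y_now, x_now), count in c_full.items():
        p_given_both = count / c_now[(y_now, x_now)]
        p_given_self = c_self[(y_next, y_now)] / c_y[y_now]
        te += (count / n) * log2(p_given_both / p_given_self)
    return te


def infer_edges(expression, threshold, bins=3):
    """Score every ordered gene pair and keep those above the threshold.

    `expression` maps a gene name to its time series. Each ordered pair is
    scored independently of all the others.
    """
    edges = []
    for src, dst in permutations(expression, 2):
        score = transfer_entropy(expression[src], expression[dst], bins)
        if score > threshold:
            edges.append((src, dst, score))
    return sorted(edges, key=lambda e: -e[2])


if __name__ == "__main__":
    # Toy data: geneB repeats geneA with a one-step delay.
    gene_a = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
    gene_b = [0] + gene_a[:-1]
    for src, dst, score in infer_edges(
        {"geneA": gene_a, "geneB": gene_b}, threshold=0.5, bins=2
    ):
        print(f"{src} -> {dst}: {score:.3f} bits")
```

Because each ordered gene pair is scored independently, the list of pairs is a natural unit of work to split into subtasks, which is the property the parallel calculation in (3) relies on. How the thesis actually partitions the tasks on Spark is not specified here.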