The Research On Multi-modal Biological Data Analysis And Mining

Posted on: 2019-11-05
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z W Liu
GTID: 1368330572950134
Subject: Computer application technology
Abstract/Summary:
In recent years, with the rapid development of next-generation sequencing and neuroimaging techniques, a large amount of biological data has accumulated in different fields of life science. The rich information contained in these data makes it possible to understand the biological processes related to diseases or specific phenotypes from different perspectives. However, this information is converted into biological knowledge far more slowly than the data accumulate. An important reason is the lack of effective data mining algorithms. Biological data are typically high-dimensional, multimodal, and drawn from small samples, so directly applying traditional data mining algorithms leads to problems such as the curse of dimensionality and overfitting. Developing algorithms suited to specific biological problems can therefore help accelerate the conversion of biological data into biological knowledge. In this dissertation, we develop algorithms for specific biological problems that take the characteristics of the data used into account. We focus on two fields of life science, cancer and brain science, and develop algorithms that help reveal the biological mechanisms behind complex diseases and cognitive behaviors. The details are as follows:

1. We developed an algorithm that uses miRNA expression profiles to identify miRNA modules shared by multiple cancers, in order to explore how miRNAs work together to regulate the cancer hallmarks. miRNAs are small non-coding RNAs that have been shown to be closely related to the development of cancer. Although different types of cancer have their own characteristics, some common hallmarks are shared across cancers. However, very little is known about whether miRNAs are involved in regulating these hallmarks. We therefore proposed an algorithm that uses miRNA expression profiles to identify miRNA modules that are dysregulated in a variety of cancers. By integratively analyzing the expression profiles of 12 different cancers with our algorithm, we obtained 217 such miRNA modules shared by different cancers. We then ranked these modules and performed functional analysis on the top two. Both modules can regulate the cell cycle and contribute to hallmarks of cancer such as sustained growth signaling and insensitivity to anti-growth signals.

2. We proposed a disease/symptom bi-color network model to gain insight into the mechanisms of brain disease. The model helps explore how the neural circuits associated with mental illness affect the severity of patients' symptoms. The biomarkers identified in neuroimaging studies of brain diseases are usually the neuroimaging features that differ significantly between the patient group and the control group. However, these features are often not directly related to the patients' symptoms, and little is known about how disease-related neural circuits affect symptom severity. To address this problem, we proposed a novel disease/symptom bi-color network model to explore the relationship between disease-related features, symptom-related features, and patients' symptoms. In first-episode schizophrenia patients, we found that the symptom-related functional brain network can mediate the relationship between disease-related functional brain networks and symptoms, providing a new perspective for studying the pathology of schizophrenia.

3. We proposed a prediction method that incorporates multi-modal brain science data to predict the phenotype of individuals. It aims to help explore the multi-modal basis of a specific phenotype and to provide an objective evaluation model for that phenotype. Multi-modal data contain relevant and complementary information, which can provide a comprehensive description of a specific cognitive process. How to integrate data of different modalities to investigate and predict a specific phenotype is a hot topic in brain science. Our prediction method integrates multi-modal data based on a 'cross-validation' procedure. On the one hand, features obtained with methods based on the 'cross-validation' procedure have been shown to generalize better than those obtained with the 'correlation-based' method. On the other hand, the method provides a more objective evaluation model for a specific phenotype. We applied this method to the study of figural creativity. One aim of this work was to determine the possible neural and genetic basis of an individual's figural creativity. Using our multi-modal prediction model, we could predict the creativity score of new individuals with an accuracy of 78.4%.

4. We presented methods that provide functional and genetic annotation for neuroimaging findings and built a Matlab toolbox based on these methods. The aim was to use the multi-modal biological knowledge available in databases to provide reliable functional and genetic annotations for neuroimaging findings, which in turn helps to explain the biological mechanisms behind them. Non-invasive neuroimaging technologies make it possible to study the neural mechanisms of cognition and brain diseases in vivo. However, neuroimaging results have traditionally been interpreted through manual literature searches. Each individual study usually has a small sample size and a high false discovery rate, and therefore cannot provide reliable explanations for neuroimaging findings. Currently, there is no toolbox for annotating neuroimaging results based on large-scale databases of biological knowledge. Inspired by the enrichment analysis widely used in bioinformatics, and using voxel-level functional and genetic knowledge of the brain, we developed a series of statistical methods that provide reliable functional and genetic annotations for different forms of region-level neuroimaging results. By annotating the widely used functional atlas of the brain and neuroimaging results for mental diseases obtained from real analyses, we further confirmed the reliability of our statistical methods and toolbox.
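
The fourth contribution is described as inspired by enrichment analysis. For orientation only, the classical over-representation test used in bioinformatics can be sketched as below. This is not the dissertation's own statistic, which the abstract does not specify. The function name `enrichment_p_value` and the example counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population: int, annotated: int,
                       selected: int, overlap: int) -> float:
    """Upper-tail hypergeometric probability P(X >= overlap).

    population: total number of items (e.g. voxels or genes)
    annotated:  items in the population that carry the annotation
    selected:   items in the finding being annotated
    overlap:    selected items that carry the annotation
    """
    if not (0 <= annotated <= population and 0 <= selected <= population):
        raise ValueError("annotated and selected must lie in [0, population]")
    max_overlap = min(annotated, selected)
    if overlap > max_overlap:
        return 0.0
    # Smallest overlap that is possible given the counts.
    min_overlap = max(0, selected - (population - annotated))
    start = max(overlap, min_overlap)
    total = comb(population, selected)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, selected - k)
        for k in range(start, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10 items, 4 annotated, 3 selected, 2 of them annotated.
p = enrichment_p_value(population=10, annotated=4, selected=3, overlap=2)
print(round(p, 4))
```

A small p-value suggests that the annotation is over-represented in the selected set. In practice the p-values for many annotations would also need a multiple-comparison correction.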
Keywords/Search Tags:multi-modality, data mining, cancer multi-omics, brain science