
Research On Key Algorithm And Parallel Optimization Technology For Gene Expression Relationship Analysis

Posted on: 2017-06-10
Degree: Master
Type: Thesis
Country: China
Candidate: M Gao
Full Text: PDF
GTID: 2370330569998797
Subject: Computer Science and Technology
Abstract/Summary:
In the analysis of gene expression relationships, DNA methylation and miRNA play unique roles in gene-specific expression and in the occurrence of cancer. This thesis analyzes the key algorithms and parallel optimization problems in current gene expression relationship analysis. It designs and implements the WGBS (Whole Genome Bisulfite Sequencing) analysis software Hint-Hunt, P-Hint-Hunt and BSMAPOS, as well as MEGEAM, a method for analyzing miRNA activation of gene expression. Experiments show that the software and the method achieve their intended purpose and deliver significant parallel speedup. The software has been used in practical research at the Fudan University School of Medicine.

First, WGBS analysis is one of the most important aspects of research on whole-genome DNA methylation. The traditional paired-ends synchronous method in WGBS has a low matching rate, and the new paired-ends asynchronous method improves on it in theory. However, the paired-ends asynchronous data generated by the new method cannot be analyzed by existing software. This thesis therefore analyzes the structure of paired-ends asynchronous data and the functional and performance requirements of new software. An improved Smith-Waterman algorithm is used to map each sequence to its most similar location, and optimal screening, score sharing, false-positive recognition and other functions are implemented to develop Hint-Hunt, a software package for processing paired-ends asynchronous data. Performance tests showed that Hint-Hunt correctly calculates the whole-genome DNA methylation level and that mapping accuracy increased from about 75% to about 80%.

Second, the growth rate of sequencing data now exceeds Moore's Law, which means that serial software, or multi-threaded software running on a single machine, cannot meet the demand for rapid processing. On the one hand, this thesis optimizes Hint-Hunt into deeply parallel software that combines multiple threads and multiple processes. The resulting parallel software, P-Hint-Hunt, showed good stability, good scalability and a near-linear speedup in testing. On the other hand, to make full use of the paired-ends synchronous data already held in current databases, a multi-process optimization is implemented for BSMAP, which is widely used at present but still has some limitations and problems. In a practical test using 32 computing nodes, the optimized BSMAPOS reduced the processing time for one sample from nearly 43 hours to about 2.5 hours.

Finally, to study the role of miRNAs in the regulation of gene expression, this thesis designs and implements the MEGEAM analysis method. MEGEAM uses a co-expression network construction strategy and is built on the miRNA and gene expression data in the TCGA database. We used MEGEAM to analyze data from 370 lung cancer samples and obtained 6 high-value miRNAs and 5 important genes that are strongly correlated with lung cancer. Nearly half of these results are corroborated by existing medical or biological literature, and they provide a basis for combination therapy of lung cancer.
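For readers unfamiliar with the Smith-Waterman algorithm named in the abstract, the following is a minimal sketch of the textbook local-alignment recurrence with a linear gap penalty. It is not the improved variant used in Hint-Hunt, whose details the abstract does not give. The function name `smith_waterman` and the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are assumptions chosen for illustration only.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Textbook Smith-Waterman local alignment (linear gap penalty).

    Returns (best_score, aligned_a, aligned_b). The scoring values are
    illustrative assumptions, not parameters taken from the thesis.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + sub,  # match or mismatch
                H[i - 1][j] + gap,      # gap in b
                H[i][j - 1] + gap,      # gap in a
            )
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

Under these assumed scores, `smith_waterman("TTACGTTT", "GGACGTCC")` returns `(8, "ACGT", "ACGT")`: the shared substring is found regardless of the differing flanks, which is the property that makes local alignment suitable for mapping short reads against a longer reference.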
Keywords/Search Tags: Whole Genome DNA Methylation, Smith-Waterman Algorithm, TH-2, miRNA Positive Regulation, Random Walk, Gene Expression Relationship Analysis