Application Research On Data Integration In Bioinformatics

Posted on: 2013-09-22
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Ren
Full Text: PDF
GTID: 2248330371485122
Subject: Computer Science and Technology
Abstract/Summary:
Data integration is an open challenge in bioinformatics. The genomics era, epitomized by the Human Genome Project, has generated huge amounts of biological data and continues to do so at an accelerating pace. From DNA, RNA, SNPs and proteins to metabolic pathway networks, data from nearly every level of the organism are growing explosively. These data are a very useful resource for biologists, but opportunity and challenge come together: given the huge and growing amount of available biological data, the problem of biological data integration (that is, modelling, storing and querying) is now considered a bottleneck in many areas of biological research. A standard, widely accepted intermediate model is necessary to realize data integration.

COSBI-Model is a simple, programming-language-based modeling tool with a user-friendly interface. It makes it convenient for biologists to build models and to cooperate with other software for simulation, and it has broad prospects as a process-calculi-based modeling tool. It also has shortcomings, however, notably in integration: it cannot provide reference models or external information at runtime, so extending COSBI-Model with database-based integration capability is an urgent issue for its designers. Moreover, because collectors have different interests and databases have different schemas, data are scattered across the Internet in different formats; we call such data heterogeneous. Integrating heterogeneous data sources into a uniform and convenient data model and relational schema is a major task.

This thesis adopts a mediator approach, with mapping rules defined according to the global-as-view (GAV) method, to resolve database heterogeneity. The approach makes it much more convenient for users to search for and extract the information they need, and it adds database-assisted pathway model integration to COSBI-Model.
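As a rough illustration of the mediator idea with GAV mapping rules, the sketch below defines one global relation as views over two sources with different schemas and answers a query by unfolding those views. The source names, field names, the `component` relation and the records are all hypothetical and are not taken from the thesis or from any real database schema.

```python
# Two hypothetical sources that describe pathway components with different schemas.
SOURCE_A = [
    {"gene_id": "a-001", "symbol": "TP53", "pathway": "p53 signalling"},
]
SOURCE_B = [
    {"protein": "MDM2", "pathway_name": "p53 regulation"},
]


# GAV mapping rules: each global relation is defined as a view over the sources.
def view_from_a(rows):
    """Map source A records onto the global 'component' relation."""
    return [{"name": r["symbol"], "pathway": r["pathway"], "source": "A"} for r in rows]


def view_from_b(rows):
    """Map source B records onto the global 'component' relation."""
    return [{"name": r["protein"], "pathway": r["pathway_name"], "source": "B"} for r in rows]


# Global schema: relation name -> list of (source data, view) pairs.
GLOBAL_SCHEMA = {
    "component": [(SOURCE_A, view_from_a), (SOURCE_B, view_from_b)],
}


def query_global(relation, predicate=lambda row: True):
    """Mediator: answer a query on the global schema by unfolding each view."""
    results = []
    for rows, view in GLOBAL_SCHEMA[relation]:
        results.extend(row for row in view(rows) if predicate(row))
    return results


if __name__ == "__main__":
    for row in query_global("component", lambda r: r["pathway"].startswith("p53")):
        print(row)
```

The user queries only the global `component` relation; the differences between the source schemas stay hidden inside the view functions, which is the property a GAV mapping is meant to provide.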
The thesis first pre-processes the XML document: it obtains the .list file that maps the ID of every gene or protein in the pathway model to its real name, then compares the entries and adds the real names to the XML file. After its schema and content have been adjusted and filtered, the XML file is more compatible with COSBI-Model. We then investigate the relationship between the modeling tool and the pathway model and define suitable integration rules, including rules for resolving conflicts in names, sites, reaction types and so on. Because biological models are specialized and complex, the choice of the initial model is important: the more typical the model, the better. We therefore communicated constantly with biologists to obtain the most suitable model for completing the modeling tool. For the implementation, we use the modeling software package together with XML parser technology to integrate pathway models from the KEGG and PID databases into COSBI-Model according to the integration rules. Once the integration is nearly finished, the application will be made available on the Internet as a Web Service client for biologists. Here we use the p53 signalling pathway and the p53 regulation pathway because these biological data are of special significance. p53 is a very important tumour suppressor gene that plays a vital role in tumour growth, so its pathways, which are highly significant for cancer research, are representative and meaningful models. After the integration rules had been defined on the basis of the p53 pathway models, other pathway models were also integrated into COSBI-Model successfully.
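The pre-processing step described above (reading an ID-to-name list and adding the real names to the pathway XML) could look roughly like the following. This is a minimal sketch under stated assumptions: the tab-separated list layout, the `entry` element, its `name` attribute and the added `real_name` attribute are illustrative choices, not the format actually used in the thesis.

```python
import xml.etree.ElementTree as ET


def parse_id_name_list(text):
    """Parse lines of the assumed form '<id><TAB><real name>' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        identifier, real_name = line.split("\t", 1)
        mapping[identifier.strip()] = real_name.strip()
    return mapping


def add_real_names(xml_text, mapping, attr="real_name"):
    """Return the pathway XML with a real-name attribute on every known entry."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("entry"):
        real_name = mapping.get(entry.get("name"))
        if real_name is not None:
            entry.set(attr, real_name)  # entries with unknown IDs are left unchanged
    return ET.tostring(root, encoding="unicode")


# Illustrative inputs only; the third entry has no match in the list.
LIST_TEXT = "hsa:7157\tTP53\nhsa:4193\tMDM2\n"
PATHWAY_XML = (
    '<pathway name="example">'
    '<entry id="1" name="hsa:7157" type="gene"/>'
    '<entry id="2" name="hsa:4193" type="gene"/>'
    '<entry id="3" name="hsa:0000" type="gene"/>'
    "</pathway>"
)

if __name__ == "__main__":
    print(add_real_names(PATHWAY_XML, parse_id_name_list(LIST_TEXT)))
```

Leaving unmatched entries untouched, rather than raising an error, keeps the step usable on pathway files whose list is incomplete; a stricter pipeline might log or reject them instead.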
Keywords/Search Tags: Data integration, XML, Biological database, Biological pathway