
Research On Annotation System Of Sequences In Coding Region And Prediction System Of Sequences In Non-coding Region

Posted on: 2010-08-30
Degree: Master
Type: Thesis
Country: China
Candidate: D Y Zhao
Full Text: PDF
GTID: 2178360272495839
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of sequence data from the Human Genome Project, genomics and bioinformatics play an increasingly important role in biological research. One of the key applications of bioinformatics in genome science is the creation and maintenance of biological databases. These databases, which are designed to store, manage and retrieve biological data using computational technology, have become essential for life science research.
The mainstream public sequence databases, such as NCBI, EMBL and DDBJ, store data in a general pattern that does not include specialized types of information. To meet biologists' specific research needs, several smaller, specialized databases have been constructed. At present, identifying or predicting sequences through experimental methods is difficult, costly, relatively time-consuming and of low accuracy. However, some bioinformatic prediction methods based on sequence features and intelligent algorithms have been proposed. Some of these methods have been widely used in various species and validated by biological experiments. Based on these ideas, two systems were constructed: an annotation system for sequences in coding regions, named wikicell, and a prediction system for sequences in non-coding regions, named PMirP.
wikicell is a transcriptome annotation system and database with high relevance and practical applicability. Its logical structure is based on an anatomical map of the human body, which decomposes the body down to the cell level. The map contains thousands of nodes with complex relationships, and millions of data records are stored among these nodes. The data must be classified to organize the thousands of nodes, and then transformed into XML documents displayed in the wikicell pages. The bottleneck in data processing is whether the root path of each node that stores data can be found as quickly as possible.
Furthermore, an improved graph search algorithm is proposed. It stores the map as an adjacency matrix, searches all nodes of the map and locates the goal node. The parent of the current node is matched against the other information of the goal node and then pushed onto a stack, and this is repeated until the root node is reached. If the match fails, the search backtracks until it succeeds. This root-path search algorithm is intended to maximize the efficiency of constructing the wikicell system.
The system is based on the "wiki" concept, which supports collaborative writing in a hypertext community and also includes a set of auxiliary writing tools. It needs neither a dedicated digest area and hand-written FAQ nor an extra collation step; its natural advantage is that knowledge accumulates in the course of communication. In addition, it presents transcriptome data and annotation information in a layout modeled on an image of the human body. On the technical side, it is built on the MediaWiki engine running on the Windows operating system with PHP, MySQL and Apache. It not only helps biology researchers query and annotate transcriptome data, but also enriches researchers' knowledge.
PMirP is used to predict pre-microRNAs, the precursors of gene-regulating microRNAs, which have a hairpin (stem-loop) secondary structure. Several bioinformatics methods have been proposed to identify and predict pre-microRNAs based on the conserved sequences of microRNAs and the stem-loop structure of their precursors. However, most of these prediction methods have not been widely used, mainly because: (1) some methods work well on particular data but have low accuracy on other data; (2) some are published only as methods, without software or web servers; (3) some software packages are complicated and impractical. In view of this, PMirP was built on a new prediction method.
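The root-path search described above for wikicell can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the function name `root_path`, the integer node ids, and the convention that `adj[p][c] == 1` marks node `p` as a parent of node `c` are all hypothetical, and the matching of node information mentioned in the abstract is reduced here to following parent links.

```python
from typing import List, Optional


def root_path(adj: List[List[int]], root: int, goal: int) -> Optional[List[int]]:
    """Return one path root -> ... -> goal, or None if the root is unreachable.

    adj is an adjacency matrix; adj[p][c] == 1 means p is a parent of c
    (an assumed layout, chosen for this sketch).
    """
    n = len(adj)
    stack = [goal]            # current upward path, goal at the bottom
    next_parent = {goal: 0}   # next parent index to try; also marks visited nodes

    while stack:
        node = stack[-1]
        if node == root:
            return stack[::-1]  # reverse so the path reads root -> goal

        # Look for the next parent of `node` that has not been visited yet.
        p = next_parent[node]
        while p < n and (adj[p][node] == 0 or p in next_parent):
            p += 1

        if p < n:
            next_parent[node] = p + 1  # resume after p if we come back here
            next_parent[p] = 0
            stack.append(p)            # climb to the parent
        else:
            stack.pop()                # dead end: backtrack

    return None


# Example map: 0 is the root; node 2 is a parent of 4 but has no parent itself.
adj = [[0] * 5 for _ in range(5)]
for parent, child in [(0, 1), (1, 3), (3, 4), (2, 4)]:
    adj[parent][child] = 1

print(root_path(adj, root=0, goal=4))
```

In this example the search first climbs from node 4 to node 2, finds no parent there, backtracks, and then reaches the root through nodes 3 and 1, so the printed path is `[0, 1, 3, 4]`. Scanning a matrix column costs time proportional to the number of nodes per step; a parent list per node would be faster for sparse maps, but the matrix is kept here because the abstract names it.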
The prediction principle is to use a support vector machine (SVM) as the classifier: candidate pre-microRNA sequences in FASTA format are processed and analyzed, converted into the SVM input format, and then scored with a trained model to produce the prediction results. The advantages of PMirP are: (a) low computing time; (b) prediction of multiple sequences at one time; (c) all related software and source files are free. PMirP runs on a JSP + Java + Tomcat platform. The input page is simple and practical, and the output displays the prediction results in an orderly way.
A gene consists of coding regions and non-coding regions. The former encode proteins and the latter regulate gene expression. Sequences in gene coding regions are abundant, and most gene expression information is still unknown. As a platform, the transcriptome annotation system supports research on gene expression information and the discovery of new gene expression. Regulatory sequences in non-coding regions are short and their number is unknown, and they are difficult to identify through biological experiments. PMirP, which applies an intelligent algorithm, can search for new pre-microRNAs in a genome quickly and conveniently, and can be widely used. Among all human diseases, cancer is a dangerous killer and a serious threat to people's health. Studies show that gene expression and regulation are associated with cancer and closely related to its development. Therefore, research on an annotation system for sequences in coding regions and a prediction system for sequences in non-coding regions is significant for human health.
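The FASTA-to-SVM-input step of PMirP described above can be illustrated with a short sketch. The abstract does not state which features PMirP uses or which SVM package it calls, so everything here is an assumption: the feature set (GC content plus dinucleotide frequencies) is a placeholder, the sparse `label index:value` text layout is the one commonly used by LIBSVM-style tools, and the names `parse_fasta`, `features` and `to_libsvm_line` are hypothetical.

```python
from itertools import product
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text; DNA 'T' is mapped to RNA 'U'."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper().replace("T", "U"))
    if header is not None:
        yield header, "".join(chunks)


# All 16 dinucleotides in a fixed order: AA, AC, AG, AU, CA, ...
DINUCS = ["".join(p) for p in product("ACGU", repeat=2)]


def features(seq: str) -> List[float]:
    """Placeholder feature vector: GC content followed by 16 dinucleotide frequencies."""
    n = len(seq)
    gc = (seq.count("G") + seq.count("C")) / n if n else 0.0
    counts = {d: 0 for d in DINUCS}
    for i in range(n - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    total_pairs = max(n - 1, 1)
    return [gc] + [counts[d] / total_pairs for d in DINUCS]


def to_libsvm_line(label: int, vec: List[float]) -> str:
    """Format one sample as 'label index:value ...' with 1-based indices, zeros omitted."""
    items = [f"{i}:{v:.6g}" for i, v in enumerate(vec, start=1) if v != 0]
    return " ".join([str(label)] + items)


fasta_text = ">seq1\nGGCC\nAATT\n>seq2\nACGU\n"
for name, seq in parse_fasta(fasta_text):
    # Label 0 is a dummy value for sequences whose class is not yet known.
    print(name, to_libsvm_line(0, features(seq)))
```

Because `parse_fasta` yields every record in the input, several sequences can be converted in one pass, which matches advantage (b) above. The resulting lines would then be passed to a trained SVM model; that step is omitted because the abstract does not identify the model or its parameters.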
Keywords/Search Tags: Wiki, EST, Transcriptome, Annotation, Prediction