
Genetic Data Information Analysis Methods And Its Application

Posted on: 2013-08-05 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: R H Wu
GTID: 1260330425983959 | Subject: Computer application technology
Abstract/Summary:
In recent years, with the successful completion of the Human Genome Project (HGP), ever larger volumes of biological sequence data have been produced. The need to analyze, maintain, process and study these biomolecular data has directly or indirectly promoted collaboration among computer science, molecular biology and mathematics, and has given rise to new disciplines such as bioinformatics and computational biology, which have gradually become active research areas in the natural sciences. Bioinformatics has two major research objects, nucleic acids and proteins, and studies their sequences, structures and functions. Typical tasks include sequence alignment, sequence encoding, graphical representation of sequences, sequence comparison, feature selection, feature extraction, molecular evolution, similarity analysis, protein structure prediction, comparative genomics, computer-aided identification of protein-coding genes, RNA structure prediction, and computer-aided drug design.

The ultimate goal of bioinformatic data mining is a fundamental understanding of the pathogenic mechanisms of human diseases, so that diseases can be prevented and treated effectively, especially the complex diseases that cause high mortality. Analysis that starts from the basic human genetic material, the DNA sequence, is an effective way to understand the complexity of human diseases. Focusing on complex human diseases, this thesis uses gene expression profiles to study how computational methods can identify genes and detect important variants of feature genes. This work can provide more effective information at the DNA sequence level and lays a foundation for studying complex diseases in systems biology.

The main work is as follows. The thesis starts with a detailed literature review and a discussion of existing gene selection methods.
We then review current methods for representing sequence features and summarize the advantages and disadvantages of the existing types of methods. On this basis, we present the methodology in detail, covering gene selection methods, graphical representations of DNA sequences, and structure-based representations. The major research contributions of this thesis are as follows:

(1) We propose a new hybrid gene selection method for gene expression profiles. First, a filter method scores the genes in the expression data and the high-scoring genes are selected. Second, an ant colony algorithm searches over combinations of these genes, and a support vector machine (SVM) evaluates each candidate subset. Finally, experiments validate the effectiveness of the method on similar problems.

(2) Based on different classifications of nucleotides, we construct a DC-R curve and a DC-Y curve in the Cartesian coordinate system and then propose a new two-dimensional graphical representation of gene sequences, the DC-curve. We also establish favorable properties of this representation, such as freedom from circuits and degeneracy and a one-to-one correspondence with DNA sequences. In addition, we apply these representations to mutation analysis and similarity analysis of gene sequences.

(3) Based on the positional information of nucleotides and the composition of gene sequences, we propose a new structure-based DNA coding method. We also improve an information-theoretic sequence representation by combining it with statistical characteristics of the sequence. We use these methods to analyze sequence similarity and construct phylogenetic trees.
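The first contribution, a filter step followed by an ant colony search whose candidate gene subsets are scored by a classifier, can be illustrated with a small sketch. This is not the thesis's implementation: the signal-to-noise ratio used as the filter score, the pheromone update rule, the toy data, and all function names (`snr_scores`, `loo_centroid_accuracy`, `aco_select`, `hybrid_select`) are assumptions. A leave-one-out nearest-centroid classifier stands in for the SVM evaluator so that the example runs with the Python standard library alone.

```python
import math
import random


def snr_scores(X, y):
    """Filter step (assumed score): |mu0 - mu1| / (sd0 + sd1) for each gene.

    X is a list of samples (rows) by genes (columns); y holds binary labels 0/1.
    """
    scores = []
    for g in range(len(X[0])):
        groups = [[row[g] for row, lab in zip(X, y) if lab == c] for c in (0, 1)]
        means = [sum(v) / len(v) for v in groups]
        sds = [math.sqrt(sum((x - m) ** 2 for x in v) / len(v))
               for v, m in zip(groups, means)]
        scores.append(abs(means[0] - means[1]) / (sds[0] + sds[1] + 1e-9))
    return scores


def loo_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on a gene subset.

    Stand-in for the SVM evaluator; assumes each class has at least two samples.
    """
    correct = 0
    for i in range(len(X)):
        dist = {}
        for c in (0, 1):
            rows = [X[j] for j in range(len(X)) if j != i and y[j] == c]
            centroid = [sum(r[g] for r in rows) / len(rows) for g in genes]
            dist[c] = sum((X[i][g] - m) ** 2 for g, m in zip(genes, centroid))
        if min(dist, key=dist.get) == y[i]:
            correct += 1
    return correct / len(X)


def aco_select(X, y, candidates, subset_size, n_ants=10, n_iter=20, rho=0.2, seed=0):
    """Ant colony search over fixed-size subsets of the candidate genes."""
    rng = random.Random(seed)
    tau = {g: 1.0 for g in candidates}  # pheromone level per candidate gene
    best_subset, best_acc = None, -1.0
    for _ in range(n_iter):
        tours = []
        for _ in range(n_ants):
            pool, subset = list(candidates), []
            while len(subset) < subset_size:
                # Pick the next gene with probability proportional to its pheromone.
                g = rng.choices(pool, weights=[tau[p] for p in pool], k=1)[0]
                pool.remove(g)
                subset.append(g)
            acc = loo_centroid_accuracy(X, y, subset)
            tours.append((acc, subset))
            if acc > best_acc:
                best_subset, best_acc = sorted(subset), acc
        for g in tau:  # evaporation
            tau[g] *= 1.0 - rho
        for acc, subset in tours:  # deposit pheromone in proportion to accuracy
            for g in subset:
                tau[g] += acc
    return best_subset, best_acc


def hybrid_select(X, y, top_k, subset_size, **aco_kwargs):
    """Keep the top_k genes by filter score, then run the ant colony search on them."""
    scores = snr_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda g: scores[g], reverse=True)
    return aco_select(X, y, ranked[:top_k], subset_size, **aco_kwargs)


# Hypothetical toy expression matrix (8 samples x 6 genes);
# genes 0 and 2 separate the two classes, the others are noise.
TOY_X = [
    [1.0, 0.5, 2.0, 0.3, 0.9, 0.1],
    [1.2, 0.4, 2.1, 0.8, 0.2, 0.5],
    [0.9, 0.6, 1.9, 0.2, 0.6, 0.9],
    [1.1, 0.5, 2.2, 0.7, 0.4, 0.3],
    [3.0, 0.5, 0.1, 0.4, 0.8, 0.2],
    [3.2, 0.6, 0.2, 0.6, 0.3, 0.7],
    [2.9, 0.4, 0.0, 0.3, 0.5, 0.8],
    [3.1, 0.5, 0.3, 0.5, 0.7, 0.4],
]
TOY_Y = [0, 0, 0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    print(hybrid_select(TOY_X, TOY_Y, top_k=4, subset_size=2))
```

Running the module prints the best subset found and its leave-one-out accuracy on the toy data. The filter step keeps the ant colony search small; replacing `loo_centroid_accuracy` with cross-validated SVM accuracy (for instance scikit-learn's `SVC` combined with `cross_val_score`) would bring the evaluator closer to the one described above.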
Keywords/Search Tags: Gene sequence representation, Feature gene selection, Gene detection, Phylogenetic analysis, Phylogenetic tree construction, Mutation analysis, Sequence similarity analysis