
The Application Study Of Computational Intelligence In Bioinformatics

Posted on: 2005-11-12
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S H Peng
Full Text: PDF
GTID: 1118360122487904
Subject: Control Science and Engineering
Abstract/Summary:
Artificial intelligence (AI) has advanced rapidly in recent years and has found applications in many fields, such as pattern recognition, machine learning, knowledge discovery, and data mining. One important application area is a newly emerged branch of science: bioinformatics. With the completion of the Human Genome Project (HGP) and of a growing number of other genomes, AI will play a larger role in computational biology and bioinformatics. In this thesis, AI methods are developed and applied to the analysis of biological sequence data and microarray data. The main innovations are summarized as follows:

1. A new method for recognizing splicing sites in rice DNA sequences was designed. Based on the GT-AG intron organization principle, support vector machines (SVM) were used to predict the splicing sites. Through machine learning, a model was built on a test data set of true and pseudo splicing sites. The prediction accuracy obtained was 87.53% at true 5' splicing sites and 87.37% at true 3' splicing sites.

2. A new framework, the Information Entropy Model of Sliding Window (IEMSW), was proposed to analyze the structural information contained in the 3'-UTR sequences around the polyadenylation site.

3. Based on support vector machines, a Machine Learning Model of Sliding Window (MLMSW) was put forward. The results obtained with this method confirm, from a different angle, the validity of the Entropy Model of Sliding Window (EMSW).

4. For the two-class classification problem, a new strategy that integrates the Genetic Algorithm with LVQ artificial neural networks was adopted to reduce the dimensionality of the high-dimensional feature space. With this method, the classification accuracy is 100% on the Leukemia dataset and 91.27% on the colon cancer dataset.

5. By combining Genetic Algorithms and Support Vector Machines with a filtering method, a GA/SVM algorithm for feature selection in high-dimensional space was proposed to solve multi-class classification problems. With the GA/SVM algorithm, a classification accuracy of 86.55% was obtained on the NCI60 dataset and 91.23% on the GCM dataset.
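The abstract does not spell out how the sliding-window entropy model of item 2 is computed. As a rough illustration only, the sketch below slides a fixed-width window along a set of aligned sequences and reports the Shannon entropy of the nucleotides pooled in each window. The function names, the window width, the step size, and the pooling scheme are all assumptions made for this example and are not taken from the thesis.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the symbol distribution in `symbols`."""
    counts = Counter(symbols)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the observed symbols.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sliding_window_entropy(sequences, window=6, step=1):
    """Entropy of the pooled nucleotides in each window position.

    Assumption: `sequences` are equal-length strings that have already been
    aligned on a common reference point (for example the polyadenylation
    site). `window` and `step` are illustrative defaults, not values from
    the thesis.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    profile = []
    for start in range(0, length - window + 1, step):
        # Pool the window contents of every sequence into one string.
        pooled = "".join(s[start:start + window] for s in sequences)
        profile.append((start, shannon_entropy(pooled)))
    return profile


if __name__ == "__main__":
    # Toy aligned sequences: a uniform region followed by a mixed region.
    toy = ["AAAAACGT", "AAAAACGT"]
    for start, h in sliding_window_entropy(toy, window=4, step=4):
        print(start, h)
```

Under these assumptions, low-entropy windows mark positions where the nucleotide composition is strongly biased, which is one plausible way to flag candidate signal regions around a site of interest.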
Keywords/Search Tags: Computational intelligence, Bioinformatics, DNA sequence, microarray, Genetic Algorithms, Support Vector Machines, Classification, Cancer, Machine Learning, Oryza sativa L., Splicing site, Intron, cis-element, LVQ artificial networks, 3'-UTR