Research On Algorithms For Gene Recognition And Microarray Data Recognition

Posted on: 2009-05-07
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Zhou
Full Text: PDF
GTID: 1118360245463119
Subject: Computer application technology
Abstract/Summary:
Bioinformatics is an interdisciplinary field that draws on biology, computer science, statistics, mathematics, physics, and chemistry. It arose with the Human Genome Project at the end of the 1980s. From its beginnings to its present state, it has passed through three periods: the pre-genome era, the genome era, and the post-genome era. Current bioinformatics research has shifted from data storage, classification, and search toward genome analysis, proteome analysis, comparative analysis, and the integrative approaches of systems biology. Annotating the genome is a goal for the future, and gene recognition and gene function recognition are both parts of genome annotation. Microarrays are a powerful tool for research on gene function annotation, but microarray datasets typically have few samples, high dimensionality, and noise. An effective tool for recognizing noisy microarray data would help advance molecular biology. This dissertation aims to find good solutions for gene recognition, essential gene recognition, and mislabeled sample recognition.

Machine learning studies how to simulate human learning. It draws on several sources, including artificial intelligence, computational intelligence, statistics, mathematics, psychology, philosophy, adaptive control theory, informatics, and biology. It touches nearly all domains of human cognition and has been well regarded. Fusing related machine learning methods so that their strengths complement one another, and then proposing new models and algorithms, will effectively advance gene recognition and microarray data recognition. The task of this dissertation is gene recognition and microarray data recognition.
The proposed algorithms belong to the field of machine learning.

Based on a comprehensive analysis of the current research status, open topics, and development trends in gene recognition and microarray data recognition in bioinformatics, we focus on recognition algorithms for genes and for noisy data in microarrays. For gene recognition, we propose a novel algorithm model based on equitable weights; this model and machine learning algorithms are applied to problems in novel areas of gene recognition. For essential gene recognition, we use multiple artificial neural networks and support vector machines and obtain better results. After thorough analysis and study of microarrays, and focusing on the characteristics of noisy microarray data, we propose two algorithms, Generalized CL-stability and Generalized CL-stability with exclusion, for recognizing noisy data in microarrays. The main contributions and contents are as follows:

(1) Summarize the research on gene recognition and microarray noise data recognition in bioinformatics, introducing the background, applications, current research status, challenges, and development trends of each. This work lays the foundation for further research.

(2) Introduce the basic machine learning theory relevant to gene prediction and microarray noise data recognition, including the structure design and learning theory of RBF neural networks, the principles of evolutionary computation, and statistical theory.

(3) For protein-coding genes, propose an algorithm model based on equitable weights and an RBF neural network to recognize genes. The algorithm adopts a fusion strategy that integrates three well-known gene recognition programs. Three standard datasets are used to test it, and the results show that the proposed algorithm is feasible and effective.

(4) For essential genes, apply two machine learning algorithms, artificial neural networks (ANN) and support vector machines (SVM), and obtain good results. Six types of ANN and two types of SVM are used. The experimental results show that these algorithms can be used to recognize essential genes.

(5) For microarray noise data recognition, propose two algorithm models based on support vector machines to recognize and correct mislabeled samples and abnormal samples in microarrays: Generalized CL-stability and Generalized CL-stability with exclusion. Both are based on the stability of each sample. Their benefit lies not only in accuracy but also in showing which samples are mislabeled and which are abnormal. Microarray datasets and a synthetic dataset are used to test these algorithms. The experimental results show that they achieve good accuracy and outperform other existing algorithms.

The research in this dissertation enriches the study of applied machine learning theory. It is significant for applications such as combining evolutionary computation with neural networks, the design and parameter study of neural network structures, and the improvement and optimization of support vector machines. Furthermore, it provides significant methods and strategies for gene prediction and microarray noise data recognition. We hope these algorithms will benefit research in biology and medicine.
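
The abstract says only that the two CL-stability algorithms in item (5) are "based on the stability of each sample"; it does not define that stability. As a rough illustration, and not the dissertation's actual method, the Python sketch below assumes one plausible reading: a sample's label is stable if a classifier trained without that sample keeps agreeing with its given label while the labels of the other samples are flipped one at a time. A nearest-centroid classifier is swapped in for the support vector machine so that the example needs only the standard library. The function names `nearest_centroid_predict` and `label_stability` and the toy data are hypothetical.

```python
from math import dist


def nearest_centroid_predict(train_x, train_y, query):
    """Predict the label whose class centroid is closest to `query`.

    Stand-in classifier (assumption): the dissertation uses an SVM here.
    """
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda label: dist(centroids[label], query))


def label_stability(xs, ys):
    """Return one score per sample: the fraction of single-label flips under
    which a classifier trained without that sample still predicts its given
    (binary, 0/1) label."""
    n = len(xs)
    scores = []
    for i in range(n):
        rest = [k for k in range(n) if k != i]
        train_x = [xs[k] for k in rest]
        agree = 0
        for j in rest:
            # Flip the label of one other sample, then classify sample i.
            train_y = [1 - ys[k] if k == j else ys[k] for k in rest]
            if nearest_centroid_predict(train_x, train_y, xs[i]) == ys[i]:
                agree += 1
        scores.append(agree / len(rest))
    return scores


# Toy data (invented): two well-separated clusters; the last sample lies in
# the class-0 cluster but carries label 1, i.e. it is deliberately mislabeled.
xs = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
      (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (5.1, 5.0),
      (0.2, 0.2)]
ys = [0, 0, 0, 0, 1, 1, 1, 1, 1]

scores = label_stability(xs, ys)
suspect = scores.index(min(scores))  # index of the least stable sample
print(scores, suspect)
```

On this toy data the deliberately mislabeled sample receives the lowest score, which is the kind of per-sample information item (5) refers to. Telling a mislabeled sample apart from an abnormal one would need further criteria that the abstract does not give.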
Keywords/Search Tags: bioinformatics, gene, essential gene, gene recognition, essential gene recognition, microarray, mislabeled sample recognition, abnormal sample recognition, machine learning, neural network, genetic algorithm, support vector machine