Research On A Few Key Issues In Bioinformation Data Mining And Its Application

Posted on: 2005-03-15
Degree: Doctor
Type: Dissertation
Country: China
Candidate: R Li
GTID: 1118360212984597
Subject: Computer software and theory
Abstract/Summary:
Data mining has been an active field of study for more than a decade, and a number of algorithms have been developed to deal with all sorts of data mining problems. These algorithms employ a variety of methodologies, including statistics, artificial intelligence, machine learning, and digital signal processing. In recent years, data mining technologies have been successfully applied in various industries and have demonstrated their power for knowledge discovery. The focus is now on tailoring data mining techniques to specific applications.

A series of breakthroughs has been made in the life sciences recently. As a result of the successful execution of the Human Genome Project (HGP) and related advances in modern biology, a huge amount of data has accumulated, providing a strong foundation of data for uncovering the secrets of life. Biological data is rich in variety, high-throughput, and high-dimensional. In essence, it has heterogeneous and network-like characteristics that are far beyond the power of traditional analytical means. The analysis of this bio-data has become the bottleneck of biological research. Applying information technology to molecular biology has given rise to a brand new field, bioinformatics. As an effective way of finding the needle (biological knowledge) in a haystack (raw data), data mining has also become increasingly important in bioinformatics.

As a result of the progress of genome research and the advent of many high-throughput technologies, profound changes are taking place in the research methodology of modern biosciences. Even traditionally experiment-based disciplines increasingly use bioinformatics tools to interpret experimental data and to gather useful hints for experimental design. Therefore, finding effective analysis tools has become a pressing issue for the advancement of modern biological sciences.

Computer scientists have a strong interest in getting involved in bioinformatics data analysis.
As a major methodology, data mining has a promising role to play in this endeavor. However, research on data mining in biology is still in a preliminary phase and faces many challenges. How to apply various data mining techniques effectively to biological information analysis is a hot topic today. This includes finding data mining system architectures, new algorithms, and new methodologies that are suitable for bioinformatics data analysis.

This dissertation focuses on the application of data mining in bioinformatics. The main results of this study are summarized as follows:

1. Gene expression data analysis. After reviewing existing work on gene expression data mining, analysis models are proposed for gene expression similarity explanation, peculiar expression gene analysis, and pathway analysis, among others. Each model is mapped to a given analysis flow and a set of data mining algorithms. These models are practical for bio-data analysis.

2. Bioinformatics data mining system architecture. To build a better data mining application framework for bioinformatics data analysis, we developed a 4-tier data mining architecture, BDMAPA, consisting of a data tier, a data mining algorithm tier, an analytical logic tier, and an application tier. In this architecture, the data mining algorithms, analytical functions, and applications are layered logically, and the data mining algorithms and the analytical models are independent, reusable units. The benefits are that a bio-data mining system can be easily customized by selecting a set of analytical units, and that users can understand the system at the algorithm level as well as at the application level.

3. Integration and normalization of biological information. Gene expression data analysis must take into account the entire experimental process, including array design, sample preparation, experiment design, hybridization protocol, array scanning, and image processing.
In this dissertation, I describe a new MIAME-compliant microarray database, CBioDB. As shown in our practice of biological data mining, this database serves microarray analysis well.

4. A general design for bioinformatics software. Bioinformatics software design is an important part of bioinformatics research. I also designed and implemented a gene expression data mining system, CBioMiner. This system is built on BDMAPA and includes the data mining analysis models required for gene expression analysis. It provides a complete analytical flow and result visualization to satisfy the main requirements of gene expression analysis. It is shown to be scalable and entity-independent.
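
The four-tier layering described in result 2 can be illustrated with a minimal sketch. This is not the BDMAPA or CBioMiner implementation, which the abstract does not detail. Every name below (`load_expression_data`, `pearson`, `SimilarityModel`, `run_similarity_app`) and the toy expression values are hypothetical, chosen only to show how an algorithm and an analysis model might remain independent, swappable units.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical sketch: none of these names or values come from the dissertation.
ExpressionMatrix = Dict[str, List[float]]  # gene id -> expression value per sample


# Data tier: supplies expression data. A real system would query a database;
# here a small in-memory toy matrix stands in for it.
def load_expression_data() -> ExpressionMatrix:
    return {
        "geneA": [1.0, 2.0, 3.0],
        "geneB": [2.0, 4.0, 6.0],
        "geneC": [3.0, 1.0, 2.0],
    }


# Data mining algorithm tier: a reusable algorithm that knows nothing about
# the analysis model or the application that calls it.
def pearson(x: List[float], y: List[float]) -> float:
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Analytical logic tier: an analysis model maps a biological question
# ("which gene behaves most like this one?") onto an algorithm.
class SimilarityModel:
    def __init__(self, measure: Callable[[List[float], List[float]], float] = pearson):
        self.measure = measure  # the algorithm can be swapped without touching the model

    def most_similar(self, data: ExpressionMatrix, target: str) -> str:
        scores = {
            gene: self.measure(data[target], values)
            for gene, values in data.items()
            if gene != target
        }
        return max(scores, key=scores.get)


# Application tier: assembles a system by selecting analytical units.
def run_similarity_app(target: str, model: SimilarityModel = None) -> str:
    model = model or SimilarityModel()
    return model.most_similar(load_expression_data(), target)


if __name__ == "__main__":
    print(run_similarity_app("geneA"))
```

Under these assumptions, customizing the system amounts to passing a different `measure` or a different model object to the application function, which mirrors the customization-by-selection benefit claimed for the architecture.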
Keywords/Search Tags:Data mining, bioinformatics, microarray, gene expression, analysis model, pathway analysis