Research On A Few Key Issues In Bioinformation Data Mining And Its Application

Posted on:2005-03-15

Degree:Doctor

Type:Dissertation

Country:China

Candidate:R Li

Full Text:PDF

GTID:1118360212984597

Subject:Computer software and theory

Abstract/Summary:

Data mining has been an active field of study for more than a decade, and many algorithms have been developed to address a wide range of data mining problems. These algorithms draw on a variety of methodologies, including statistics, artificial intelligence, machine learning, and digital signal processing. In recent years, data mining technologies have been successfully applied in many industries and have demonstrated their power for knowledge discovery. The focus has now shifted to tailoring data mining techniques to specific applications.

A series of breakthroughs has recently been made in the life sciences. With the successful completion of the Human Genome Project (HGP) and related advances in modern biology, a huge amount of data has been accumulated, providing a rich data foundation for uncovering the secrets of life. Biological data are diverse, high-throughput, and high-dimensional. Above all, they are heterogeneous and networked, characteristics that are far beyond the reach of traditional analytical methods, so the analysis of these data has become a bottleneck in biological research. Applying information technology to molecular biology has given rise to a new field, bioinformatics. As an effective way of finding the needle (biological knowledge) in the haystack (raw data), data mining has become increasingly important in bioinformatics.

The progress of genome research and the advent of many high-throughput technologies are bringing profound changes to the research methodology of modern bioscience. Even traditionally experiment-based disciplines increasingly use bioinformatics tools to interpret experimental data and to obtain useful hints for experimental design. Finding effective analysis tools has therefore become a pressing issue for the advancement of modern biology. Computer scientists have shown considerable interest in bioinformatics data analysis.
As a major methodology, data mining has a promising role to play in this endeavor. However, research on data mining in biology is still at a preliminary stage and faces many challenges. How to apply data mining techniques effectively to biological information analysis is currently a hot topic, including the development of data mining system architectures, new algorithms, and new methodologies suited to bioinformatics data analysis.

This dissertation focuses on applications of data mining in bioinformatics. The main results of this study are summarized as follows:

1. Gene expression data analysis. After a review of existing work on gene expression data mining, several analysis models are proposed, including a model for interpreting gene expression similarity, a model for analyzing specifically expressed genes, and a pathway analysis model. Each model is mapped to a defined analysis flow and a set of data mining algorithms. These models are practical for biological data analysis.

2. Bioinformatics data mining system architecture. To provide a better application framework for bioinformatics data analysis, we developed a 4-tier data mining architecture, BDMAPA, consisting of a data tier, a data mining algorithm tier, an analytical logic tier, and an application tier. In this architecture, the data mining algorithms, analytical functions, and applications are logically layered, and the algorithms and analytical models are independent, reusable units. As a result, a biological data mining system can be easily customized by selecting a set of analytical units, and users can understand the system both at the algorithm level and at the application level.

3. Integration and normalization of biological information. Gene expression data analysis must take the entire experimental process into account, including array design, sample preparation, experimental design, hybridization protocol, array scanning, and image processing. In this dissertation, I describe a new MIAME-compliant microarray database, CBioDB. In our biological data mining practice, this database has served microarray analysis well.

4. A general design for bioinformatics software. Software design is an important part of bioinformatics research. I also designed and implemented a gene expression data mining system, CBioMiner, which is built on BDMAPA and includes the data mining analysis models required for gene expression analysis. It provides a complete analytical workflow and result visualization that meet the main requirements of gene expression analysis, and it has been shown to be scalable and to offer entity independence.
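The layering described in item 2, with an expression-similarity model from item 1 as one analytical unit, can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes a hypothetical in-memory data tier, uses Pearson correlation as a stand-in similarity algorithm, and treats an analysis model as a function composed from registered algorithm units. All names (ExpressionDataset, similar_genes, MiningApp, the 0.9 threshold) are illustrative assumptions, not part of BDMAPA or CBioMiner.

```python
# Hypothetical sketch of a 4-tier layering in the spirit of BDMAPA.
# Every class, function, and parameter name here is an illustrative assumption.
from dataclasses import dataclass, field
from math import sqrt
from typing import Callable, Dict, List, Tuple


# ---- Data tier: expression matrix plus experiment annotation ----
@dataclass
class ExpressionDataset:
    genes: List[str]
    samples: List[str]
    values: List[List[float]]  # values[i][j] = expression of gene i in sample j
    # Free-form experiment annotation (e.g. array design, protocols); hypothetical.
    annotation: Dict[str, str] = field(default_factory=dict)

    def profile(self, gene: str) -> List[float]:
        return self.values[self.genes.index(gene)]


# ---- Algorithm tier: reusable, application-agnostic functions ----
def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for constant profiles; treated as uncorrelated here
    return cov / (sx * sy)


ALGORITHMS: Dict[str, Callable[[List[float], List[float]], float]] = {"pearson": pearson}


# ---- Analytical logic tier: an analysis model built from algorithm units ----
def similar_genes(data: ExpressionDataset, query: str, metric: str = "pearson",
                  threshold: float = 0.9) -> List[Tuple[str, float]]:
    """Return genes whose profiles score >= threshold against the query gene."""
    score = ALGORITHMS[metric]
    target = data.profile(query)
    hits = []
    for gene in data.genes:
        if gene == query:
            continue
        s = score(target, data.profile(gene))
        if s >= threshold:
            hits.append((gene, round(s, 3)))
    return sorted(hits, key=lambda h: h[1], reverse=True)


MODELS = {"expression_similarity": similar_genes}


# ---- Application tier: a customized system is a chosen subset of models ----
class MiningApp:
    def __init__(self, model_names: List[str]):
        self.models = {name: MODELS[name] for name in model_names}

    def run(self, name: str, data: ExpressionDataset, **kwargs):
        return self.models[name](data, **kwargs)


# Usage with toy data: g2 rises with g1, g3 falls as g1 rises.
data = ExpressionDataset(
    genes=["g1", "g2", "g3"],
    samples=["s1", "s2", "s3", "s4"],
    values=[[1.0, 2.0, 3.0, 4.0],
            [2.0, 4.0, 6.0, 8.0],
            [4.0, 3.0, 2.0, 1.0]],
    annotation={"array_design": "hypothetical"},
)
app = MiningApp(["expression_similarity"])
result = app.run("expression_similarity", data, query="g1")
print(result)  # → [('g2', 1.0)]
```

The design choice mirrors the abstract's claim about reuse: the algorithm tier knows nothing about genes, the analysis model only looks algorithms up by name, and an application is assembled by listing the models it needs, so adding a new metric or model does not touch the other tiers.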

Keywords/Search Tags:

Data mining, bioinformatics, microarray, gene expression, analysis model, pathway analysis

Related items

1	Association Rules Mining And Its Applications In Microarray Gene Expression Data
2	Research On Relevant Problems Of DNA Microarray Expression Data Analysis
3	Research On Algorithms For The Cancer Differential Gene Expression In Gene Microarray
4	Several Studies On Of Feature Selection Algorithms That Incorporate Pathway Information To Identify Relevant Genes
5	Gaussian Mixture Model-based Clustering Analysis For Gene Microarray Expression Data
6	Data Analysis Of Expression With Gene Microarray And Investigation For Gene Regulatory Networks
7	Clustering Algorithm Research Based On Gene Expression Spectrum Data
8	Gene Microarray Data Analysis Based On Clustering Algorithms
9	Research On Analysis Of Gene Expression Profile Data In Bioinformatics
10	Construction And Application Of A Bioinformatics Analysis Platform Based On WEB Interfaces