Machine learning techniques for alleviating inherent difficulties in bioinformatics data

Posted on:2016-06-05

Degree:Ph.D.

Type:Dissertation

University:Florida Atlantic University

Candidate:Dittman, David J., II

GTID:1478390017981238

Subject:Computer Science

Abstract/Summary:

Many bioinformatics datasets contain massive amounts of data, so it has become increasingly necessary for researchers to use computers to aid them in their work. These datasets are challenging to work with because of high dimensionality, class imbalance, noisy data, and difficult-to-learn class boundaries. One potential source of assistance is the domain of data mining and machine learning, a field that focuses on working with large amounts of data and develops techniques to discover new trends and patterns hidden within the data and to increase the capability of researchers and practitioners to work with it. Within this domain there are techniques designed to eliminate irrelevant or redundant features, balance the membership of the classes, handle errors found in the data, and build predictive models for future data.

This dissertation is an in-depth analysis of how the domain of data mining and machine learning is uniquely suited to alleviating the inherent difficulties found within bioinformatics datasets. First, we examine a number of gene selection techniques in terms of their stability, or robustness. Next, we analyze the entire process of ensemble gene selection, including different approaches for implementing the ensemble and for aggregating ranked feature lists. We then provide a framework for using gene selection and classification that focuses on maximizing classification performance while simplifying the machine learning process. After that, we discuss two new approaches for incorporating ensemble learning alongside gene selection and compare them to the case in which no ensemble learning approach is applied. Lastly, we give a detailed analysis of the data sampling process for bioinformatics data, including which techniques should be used, when and how they should be applied, and to what extent data sampling should be performed. Overall, this dissertation presents a thorough analysis of how machine learning techniques can alleviate the inherent difficulties found in bioinformatics data.

Keywords/Search Tags:

Data, Bioinformatics, Machine learning, Inherent difficulties, Techniques, Gene selection

Related items

1	Study On SVMs-based Classification Of Gene Expression Data
2	Studies on several bioinformatics problems with machine learning techniques
3	Research On Algorithms For Gene Recognition And Microarray Data Recognition
4	Machine Learning Methods And Their Applications In Bioinformatics
5	Applications Of Data Mining Techniques To Text Classification And Bioinformatics
6	Machine learning and bioinformatics
7	Support Vector Machine And Its Application In Gene Expression Data
8	Scalable machine learning using applications in bioinformatics and cybercrime
9	Research On Feature Selection Algorithm Base On Gene Expression Data
10	Research On Several Key Technologies Of Gene Expression Data Mining