Boosting algorithms for mining biomedical and biological data

Posted on: 2010-05-11
Degree: M.S
Type: Thesis
University: Wayne State University
Candidate: Krishnaraj, Yazhene
Full Text: PDF
GTID: 2448390002974983
Subject: Biology
Abstract/Summary:
Biomedical informatics is an interdisciplinary field that uses information technologies to analyze and understand biological and biomedical data in order to improve the detection, prevention, and treatment of disease. Data obtained from these applications contain valuable information that requires advanced computational techniques for extraction and analysis. Machine learning and data mining techniques have proven to be excellent tools for extracting the knowledge encapsulated in the data as patterns. Boosting is an adaptive supervised machine learning technique that has been successfully applied in many different applications. It is a robust method that generates multiple classifiers from a base learner and combines them into an ensemble to build a stronger classifier. The base learner can be any weak learning algorithm; even when that learner is already optimized for accuracy, boosting can still improve on it. This thesis focuses on applying boosting algorithms to classification tasks on biomedical informatics data and compares their performance against other traditional machine learning algorithms. Two critical data mining problems are investigated in this thesis: early detection of breast cancer (which is critical for saving the lives of cancer patients) and prediction of the 3D structure of proteins (which is useful for functional classification). A model for early cancer detection is built to achieve a higher AUC in a clinically relevant region. Protein structure prediction is done with both flat classification and hierarchical classification approaches. In both approaches, boosting achieved better accuracy than the other successful algorithms in the literature. Boosting not only yields improved accuracy, but is also very efficient.
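
The way boosting builds multiple classifiers from one base learner and combines them can be illustrated with a minimal sketch. It assumes AdaBoost with one-dimensional decision stumps as the weak learner; the abstract does not say which boosting variant or base learner the thesis uses, and the function names and toy data below are purely illustrative, not taken from the thesis.

```python
import math


def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump.

    A stump predicts `polarity` when x >= threshold and `-polarity` otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity


def adaboost(xs, ys, rounds=10):
    """Train an AdaBoost ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform example weights
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)      # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)    # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified examples, decrease the rest.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted majority vote of all stumps in the ensemble."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


# Hypothetical toy data: no single stump can separate these labels.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 3):
    model = adaboost(xs, ys, rounds=rounds)
    correct = sum(predict(model, x) == y for x, y in zip(xs, ys))
    print(f"rounds={rounds}: {correct}/{len(xs)} training points correct")
```

On this toy set a single stump misclassifies three of the ten points, while the weighted vote of three stumps classifies all ten correctly. This is the sense in which an ensemble of weak classifiers can outperform its base learner.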
Keywords/Search Tags:Boosting, Data, Algorithms, Biomedical, Mining, Accuracy