Integrating Domain Knowledge in Supervised Machine Learning to Assess the Risk of Breast Cancer Using Genomic Data

Posted on: 2013-10-03
Degree: M.S.
Type: Thesis
University: University of Maryland, Baltimore County
Candidate: Bochare, Aniket
GTID: 2454390008972899
Subject: Biology
Abstract/Summary:
Breast cancer is the most common form of cancer in women, comprising 22.9% of invasive cancers in women and 16% of all female cancers. Currently, treatment decisions are based primarily on clinical parameters, with little use of genomic data. Our study uses single nucleotide polymorphism (SNP) data from postmenopausal women of European descent to assess the risk of developing breast cancer. We used various supervised machine learning and data mining techniques to generate a model for predicting the risk of breast cancer using only genomic data.

In this work, we explored and compared three different techniques for generating a prediction model from the SNPs associated with breast cancer. We propose an approach that uses the nine best SNPs obtained from various feature selection algorithms to improve binary classification accuracy, and we validate our results against the literature. We observed that a machine learning model generated without domain knowledge yields poor prediction results. Hence, we used both domain knowledge of 11 informative SNPs and feature selection to develop a classification model. The model generated using both domain knowledge and feature selection performed better than the naive approach to classification.
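The idea of combining data-driven feature selection with domain knowledge can be illustrated with a minimal Python sketch. The abstract does not name the feature selection algorithms or classifiers used, so everything here is an assumption made for illustration: the scoring rule (absolute difference in mean minor-allele count between cases and controls), the function names `score_snps` and `select_features`, and the toy genotype matrix are all hypothetical and are not the thesis's method or data.

```python
from typing import List, Sequence, Set


def score_snps(genotypes: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """Score each SNP column by the absolute difference in mean
    minor-allele count (0, 1, or 2) between cases and controls.
    This scoring rule is an illustrative assumption."""
    cases = [row for row, y in zip(genotypes, labels) if y == 1]
    controls = [row for row, y in zip(genotypes, labels) if y == 0]
    scores = []
    for j in range(len(genotypes[0])):
        mean_case = sum(row[j] for row in cases) / len(cases)
        mean_ctrl = sum(row[j] for row in controls) / len(controls)
        scores.append(abs(mean_case - mean_ctrl))
    return scores


def select_features(
    genotypes: Sequence[Sequence[int]],
    labels: Sequence[int],
    domain_snps: Set[int],
    k: int,
) -> List[int]:
    """Return sorted column indices: the top-k SNPs by score,
    unioned with SNPs flagged as informative by domain knowledge."""
    scores = score_snps(genotypes, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(set(ranked[:k]) | set(domain_snps))


# Toy data (hypothetical): 6 individuals x 4 SNPs, label 1 = case, 0 = control.
genotypes = [
    [2, 0, 1, 1],
    [2, 1, 1, 0],
    [1, 0, 1, 2],
    [0, 0, 1, 1],
    [0, 1, 1, 2],
    [1, 0, 1, 0],
]
labels = [1, 1, 1, 0, 0, 0]

# SNP 3 shows no marginal signal in this toy data, but is kept because
# (hypothetically) the literature flags it as informative.
selected = select_features(genotypes, labels, domain_snps={3}, k=1)
print(selected)
```

The selected columns would then be passed to whichever classifier is being evaluated. Taking the union rather than the intersection means that a SNP known from the literature is retained even when a purely statistical filter would discard it, which is one plausible reading of "using both domain knowledge and feature selection".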
Keywords/Search Tags:Breast cancer, Domain knowledge, Machine learning, Using, Feature selection, Data, Risk, Genomic