Genetic programming optimized neural networks for identifying gene-gene interactions

Posted on: 2005-03-08
Degree: Ph.D
Type: Dissertation
University: Vanderbilt University
Candidate: Ritchie, Marylyn DeRiggi
Full Text: PDF
GTID: 1450390008498827
Subject: Biology
Abstract/Summary:
The identification and characterization of susceptibility genes for common complex human diseases present several difficult challenges for human geneticists. Many disease susceptibility genes exhibit effects that depend partially or solely on interactions with other genes. These interactions, known as epistasis, are difficult to detect using traditional statistical methods [Templeton 2000]. The difficulty arises because, in high dimensions, many contingency table cells are empty, which leads to very large coefficient estimates and standard errors [Hosmer and Lemeshow 2000]. This is sometimes referred to as the curse of dimensionality. One way to deal with this issue is to collect a very large sample to reduce the number of empty cells. This can, however, be prohibitively expensive. The alternative is to develop new statistical methods that have improved power to identify high-order interactions in relatively small samples. Many groups have used neural networks (NN) as one such approach. Neural networks are a supervised pattern recognition method commonly used in many fields for data mining. Defining the NN architecture is crucial for success in data mining, and this can be challenging when the underlying model of the data is unknown. Therefore, we will use genetic programming (GP) to optimize the architecture of the NN, an approach referred to as GPNN. Through simulation studies, we will validate this new statistical approach and estimate its power for detecting interactions. We will then compare the performance of this approach with that of a traditional neural network methodology. Finally, we will analyze two different breast cancer case-control data sets with the optimal neural network approach to detect gene-gene interactions associated with sporadic breast cancer.
The goal of this study is to develop a new statistical methodology that has improved power for detecting gene-gene interactions in common, complex diseases and demonstrate its utility in both simulated data and real case-control data.
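The sparsity argument in the abstract (empty contingency table cells in high dimensions) can be illustrated with a small simulation. This is a minimal sketch and not the dissertation's method: it assumes three genotypes per locus drawn with equal probability, and the function name `empty_cell_fraction` and the sample size of 400 are hypothetical choices made only for illustration.

```python
import random
from collections import Counter


def empty_cell_fraction(n_samples, n_loci, seed=0):
    """Return the fraction of multi-locus genotype cells with no observations.

    Assumption (illustrative only): each locus has three genotypes,
    coded 0, 1, 2, drawn independently with equal probability.
    """
    rng = random.Random(seed)
    n_cells = 3 ** n_loci  # number of cells in the multi-locus genotype table
    # Count how many individuals fall into each genotype combination.
    observed = Counter(
        tuple(rng.randrange(3) for _ in range(n_loci))
        for _ in range(n_samples)
    )
    return 1 - len(observed) / n_cells


if __name__ == "__main__":
    # Hypothetical sample of 400 individuals, 1 to 8 interacting loci.
    for k in range(1, 9):
        print(k, 3 ** k, round(empty_cell_fraction(400, k), 3))
```

With a fixed sample size, the number of cells grows as 3^k while the number of occupied cells can never exceed the number of individuals, so the empty fraction approaches 1 as more loci are considered jointly. Real genotype frequencies are unequal, which would typically leave even more cells empty than this uniform sketch suggests.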
Keywords/Search Tags: Interactions, Neural networks, Gene-gene, Data, New statistical