The Study of DNA Microarray Analysis and Protein-Protein Interaction Networks

Posted on: 2011-01-28
Degree: Master
Type: Thesis
Country: China
Candidate: M L Hou
GTID: 2120360308955309
Subject: Bioinformatics
Abstract/Summary:
DNA microarray technology, a powerful tool in functional genomics, has been widely adopted for extracting disease-relevant genes and for the diagnosis and classification of human tumors. However, because sample sizes are small and the data are high-dimensional, current studies suffer from over-fitting and the curse of dimensionality in tumor classification, and from false positives in the identification of cancer biomarkers. Gene selection is an efficient way to address these problems. Identifying minimal gene subsets discards as much noise and redundancy from the dataset as possible, which both improves classification accuracy and lowers the cost of tumor diagnosis in clinical applications; it remains a key challenge in tumor classification based on gene expression profiles. In this study, we developed a novel gene ranking method based on neighborhood rough set reduction for molecular cancer classification from gene expression profiles. Furthermore, we searched the literature to determine whether our top-ranked genes are closely related to crucial processes of tumor development, and we built a protein-protein interaction network, in which nodes represent the proteins encoded by the corresponding genes and edges represent protein-protein interactions, to examine gene function and gene interaction pathways through the neighboring partners of the selected genes. The results show that the selected genes are closely related to tumors. However, some proteins encoded by the selected genes can interact with a very large number of proteins, from tens to hundreds. Since a single protein cannot interact with so many partners at the same time, this raises a question: which interactions can occur simultaneously, which are mutually exclusive, and how can a hub protein interact with many proteins with different affinities? Addressing this question adds a fourth dimension to interaction maps.
Therefore, we hope to construct a structural network that includes a time dimension by linking several kinds of data, such as structures of protein complexes and time series of mRNA expression data; this is a focus of our future research. So far, we have done some work on predicting protein binding affinities. The main contributions of this thesis are as follows: 1) A breadth-first heuristic search algorithm based on neighborhood rough set reduction was proposed to select numerous gene subsets. Previous studies showed that significant class-predictor genes, whose expression profile vectors discriminate remarkably well among samples of different classes of a specific cancer, may play a crucial role in the development of that cancer. We hypothesized that the occurrence probability of a gene in the final selected gene subsets reflects, to some extent, its power for tumor classification and its significance. The top-ranked genes by occurrence probability were used as model inputs for molecular cancer classification based on gene expression profiles. Compared with other methods such as PAM, ClaNC, the Kruskal-Wallis rank sum test and Relief-F, our method achieves higher tumor classification accuracy with only a few top-ranked genes. Moreover, although the selected genes are not typical known oncogenes, a search of the scientific literature and an analysis of their protein interaction partners show that they play a crucial role in the occurrence of tumors, so they may serve as candidate cancer biomarkers. 2) A simple knowledge-based statistical energy function at the residue level was presented to predict the affinity of protein-protein complexes by using 20 residue types and a distance-free reference state.
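The occurrence-probability ranking described in contribution 1) can be illustrated with a minimal sketch. It assumes the gene subsets have already been produced by some selection procedure (the breadth-first neighborhood rough set search itself is not reproduced here); the function name `rank_genes_by_occurrence` and the gene labels are hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_genes_by_occurrence(gene_subsets):
    """Rank genes by the fraction of selected subsets in which they occur.

    gene_subsets: iterable of gene-identifier collections, assumed to be the
    output of an upstream subset-selection step (not implemented here).
    Returns a list of (gene, occurrence_probability), highest first.
    """
    gene_subsets = list(gene_subsets)
    if not gene_subsets:
        return []
    counts = Counter()
    for subset in gene_subsets:
        counts.update(set(subset))  # count each gene at most once per subset
    total = len(gene_subsets)
    # Sort by descending count; break ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(gene, n / total) for gene, n in ranked]


# Hypothetical gene subsets, for illustration only.
subsets = [["G1", "G2"], ["G1", "G3"], ["G1", "G2", "G4"], ["G2", "G5"]]
ranking = rank_genes_by_occurrence(subsets)
top_genes = [gene for gene, _ in ranking[:2]]  # inputs for a downstream classifier
```

In this toy input, `G1` and `G2` each appear in three of the four subsets, so they lead the ranking and would be the genes passed to a classifier.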
The correlation coefficient between experimentally measured protein-protein binding affinities (PPIA) and the affinities predicted by our approach is 0.74 for 82 protein-protein (peptide) complexes, even though the affinities and structures of protein-protein (peptide) complexes are not used in constructing the energy function. The results of the distance-independent residue-level potential of mean force (DIRPMF) energy function on protein-protein complexes are compared with the published results of two other volume-corrected knowledge-based scoring functions at the atomic level. The proposed approach is not only the simplest but also yields a correlation between theoretical and experimental binding affinities on the test sets comparable to that of the best reported methods.
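To make the idea of a residue-level statistical potential with a distance-free reference state more concrete, the sketch below uses a generic inverse-Boltzmann contact potential. This is an assumption for illustration, not the DIRPMF function itself, whose exact form is not given in this abstract. The energy of a residue pair is taken as the negative log ratio of observed contacts to the number expected from residue frequencies alone; the pseudocount, the function names, and the toy contact list are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations_with_replacement


def pair_key(a, b):
    """Order-independent key for a residue pair."""
    return (a, b) if a <= b else (b, a)


def contact_potential(contacts, pseudocount=1.0):
    """Generic inverse-Boltzmann contact energies (in kT) from pooled contacts.

    The reference state depends only on residue frequencies, not on distance
    (an illustrative assumption, not the thesis's exact definition).
    """
    observed = Counter(pair_key(a, b) for a, b in contacts)
    residue_counts = Counter()
    for a, b in contacts:
        residue_counts[a] += 1
        residue_counts[b] += 1
    n_contacts = len(contacts)
    n_ends = 2 * n_contacts
    energies = {}
    for a, b in combinations_with_replacement(sorted(residue_counts), 2):
        fa = residue_counts[a] / n_ends
        fb = residue_counts[b] / n_ends
        # Unordered pairs of distinct types can form in two ways.
        expected = n_contacts * fa * fb * (1 if a == b else 2)
        ratio = (observed[(a, b)] + pseudocount) / (expected + pseudocount)
        energies[(a, b)] = -math.log(ratio)
    return energies


def score_complex(interface_contacts, energies):
    """Sum pair energies over the interface contacts of one complex."""
    return sum(energies.get(pair_key(a, b), 0.0) for a, b in interface_contacts)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Toy training contacts (hypothetical, not real structural data).
contacts = ([("LEU", "LEU")] * 6 + [("LYS", "GLU")] * 6
            + [("LEU", "LYS")] + [("LEU", "GLU")])
energies = contact_potential(contacts)
```

Pairs seen more often than their frequencies predict (here `LEU`-`LEU` and `LYS`-`GLU`) receive negative energies. Predicted scores for a set of complexes could then be compared with measured affinities using `pearson`, in the spirit of the correlation reported above.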
Keywords/Search Tags: DNA microarray, protein-protein interaction networks, affinity prediction, potential of mean force, tumor classification, gene selection, neighborhood rough set, breadth-first heuristic search