The invention of high-throughput technologies and the accumulation of genome-scale data have stimulated the rapid development of functional genomics. Numerous algorithms have been developed to analyze these large-scale data in order to predict gene functions, functional relationships, physical interactions, and the phenotypes associated with gene perturbations. My dissertation focuses on the development of such algorithms for higher organisms and on the evolutionary interpretation of functional genomic data across species. I first extended previous machine learning approaches and developed an ensemble method to predict protein functions in the laboratory mouse. Using genomic data collected in mouse, I carried out data integration and generated a functional relationship network. Furthermore, I demonstrated how this functional network can be applied to identify disease genes, complementing conventional quantitative genetics. I then developed an algorithm to direct the generation of such genomic data in poorly studied species. Based on experimental data generated under the direction of this algorithm, I designed a generalized quantification scheme for expression divergence between orthologs, using S. bayanus and S. cerevisiae as an example. I went on to show, through examples of duplicates, how divergence at the expression, function, and sequence levels may differ. Finally, I used combined changes in sequence and nucleosome occupancy to explain the diverged expression between the two yeast species.