
Computational Biology Approach Mining Key Genes From Microarray Data During Rat Liver Regeneration

Posted on: 2015-04-10
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Liu
GTID: 2180330431978489
Subject: Cell biology
Abstract/Summary:
Biological high-throughput techniques can detect the expression levels of multiple genes/proteins simultaneously. Gene expression is measured mainly by microarray and RNA-seq, while protein expression is measured mainly by two-dimensional electrophoresis combined with mass spectrometry, or by iTRAQ. These techniques give biologists and medical researchers the opportunity to measure gene/protein expression at the whole-genome level. However, interpreting and analyzing the large amounts of data they generate remains a considerable challenge. Therefore, this study systematically reviewed high-throughput data analysis methods, applied several of them to mine microarray data on liver regeneration, and identified some vital genes during liver regeneration.
At present, commonly used feature gene selection algorithms include filter, wrapper and embedded methods. Among them, the filter method has the advantages of high speed and efficiency. This study established a filter-based integrated statistical method that combines 12 filtering methods using two criteria. The first sums each gene's ranks across all methods and sorts the genes by this sum in ascending order; the second counts how often each gene appears across the methods and sorts the genes by this frequency in descending order. Building on these methods, the sequential forward selection method and a genetic algorithm (GA) were each combined with four classifiers (decision tree, support vector machine, naive Bayesian network, and artificial neural network) to further filter the feature genes. To further analyze the interactions between feature genes, this study used Bayesian inference to construct static networks of key genes at each time point, dynamic networks for each of three stages, and the overall regulatory network during rat liver regeneration.
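The two ranking criteria of the integrated statistical method can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `aggregate_rankings`, the `top_k` cutoff used to decide whether a gene "appears" in a method, the alphabetical tie-breaking, and the placeholder gene and method names are all assumptions made for illustration.

```python
from collections import Counter


def aggregate_rankings(rankings, top_k):
    """Combine several filter-method rankings into two consensus orderings.

    rankings: dict mapping a method name to a list of genes, best first.
              Every method is assumed to rank the same set of genes.
    top_k:    assumed cutoff; a gene "appears" in a method if that method
              places it among its first top_k genes.
    """
    genes = set(next(iter(rankings.values())))

    # Criterion 1: sum each gene's rank over all methods (smaller is better).
    rank_sum = {gene: 0 for gene in genes}
    for order in rankings.values():
        for position, gene in enumerate(order, start=1):
            rank_sum[gene] += position
    by_rank_sum = sorted(genes, key=lambda g: (rank_sum[g], g))

    # Criterion 2: count in how many methods the gene reaches the top_k
    # (larger is better). Counter returns 0 for genes never selected.
    frequency = Counter(
        gene for order in rankings.values() for gene in order[:top_k]
    )
    by_frequency = sorted(genes, key=lambda g: (-frequency[g], g))

    return by_rank_sum, by_frequency


# Placeholder rankings from three hypothetical filter methods.
toy_rankings = {
    "method_1": ["geneA", "geneB", "geneC", "geneD"],
    "method_2": ["geneB", "geneA", "geneD", "geneC"],
    "method_3": ["geneA", "geneC", "geneB", "geneD"],
}
by_rank_sum, by_frequency = aggregate_rankings(toy_rankings, top_k=2)
print(by_rank_sum)
print(by_frequency)
```

Sorting by rank sum rewards genes that every method ranks reasonably well, while sorting by frequency rewards genes that many methods place near the top; comparing the two orderings is one way to see where the filter methods disagree.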
Pathway Studio was then used to obtain the real interactions between genes. Afterwards, several network parameters were used to analyze the above-described networks. The results of the integrated statistical method were consistent with the majority of the filter methods and effectively avoided the bias that comes from relying on a single fixed method. Among them, the method based on the correlation coefficient and t-test was consistent with the integrated statistical method. Among the 1000 feature genes in rat liver regeneration, 135 genes were previously reported to be associated with liver regeneration. These genes are involved in cell proliferation, cell differentiation, immune response and various other physiological activities. The results of the wrapper approach showed that both methods can achieve high classification accuracy while relying on a relatively small number of genes. With only 4-5 genes included by the sequential forward selection method, the classification accuracy of three classifiers reached 100%. The genetic algorithm reached about 99% after several iterations, meaning that the selected genes discriminated well between the PH and SO groups. A literature search showed that the genes selected by the sequential forward selection method were closely associated with metabolism, whereas the genes selected by the genetic algorithm (GA) were more closely related to liver regeneration, such as the Myc proto-oncogene, which is involved in cell proliferation, and Glod5, which is up-regulated in liver tumor tissues and participates in the negative regulation of liver tumors. The Bayesian network was sparse in the early phase and the termination phase but more complex in the progression stage of rat liver regeneration, which is consistent with the actual network. Among the 18 highest-scoring relationships established by Bayesian network inference, 7 had previously been confirmed in the literature.
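The wrapper idea behind the forward-selection results can be sketched as a greedy loop that adds one gene at a time and keeps it only if classification accuracy improves. The sketch below is an illustration under stated assumptions, not the thesis's code: it swaps in a simple nearest-centroid classifier with leave-one-out evaluation in place of the four classifiers named in the abstract, and the helper names and the tiny expression matrix are invented for the example.

```python
def nearest_centroid_predict(train_rows, train_labels, row, features):
    """Return the label whose class centroid (over `features`) is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, l in zip(train_rows, train_labels) if l == label]
        dist = 0.0
        for f in features:
            centroid = sum(m[f] for m in members) / len(members)
            dist += (row[f] - centroid) ** 2
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of the stand-in classifier on `features`."""
    correct = 0
    for i in range(len(rows)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = nearest_centroid_predict(
            train_rows, train_labels, rows[i], features
        )
        correct += predicted == labels[i]
    return correct / len(rows)


def sequential_forward_selection(rows, labels, max_features):
    """Greedily add the feature that most improves leave-one-out accuracy."""
    selected, best_acc = [], 0.0
    while len(selected) < max_features:
        candidates = [f for f in range(len(rows[0])) if f not in selected]
        if not candidates:
            break
        # Ties on accuracy are broken in favour of the lower feature index.
        acc, neg_f = max(
            (loo_accuracy(rows, labels, selected + [f]), -f)
            for f in candidates
        )
        if acc <= best_acc:  # stop when no candidate improves accuracy
            break
        selected.append(-neg_f)
        best_acc = acc
    return selected, best_acc


# Invented toy matrix: rows are samples, columns are three hypothetical genes.
toy_rows = [
    [5.0, 1.0, 3.0], [2.0, 1.2, 3.1], [4.0, 0.9, 2.9],  # labelled "PH"
    [3.0, 3.0, 3.0], [5.5, 3.1, 3.2], [2.5, 2.9, 2.8],  # labelled "SO"
]
toy_labels = ["PH", "PH", "PH", "SO", "SO", "SO"]
selected, accuracy = sequential_forward_selection(toy_rows, toy_labels, 3)
print(selected, accuracy)
```

In this toy matrix only the second column separates the two groups, so the loop stops after a single feature. A real analysis would typically use an established classifier and cross-validation scheme rather than this stand-in.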
The time-point knockout experiment found that 12 h was the most important time point in the whole process of liver regeneration, and some studies have also reported that 12 h is a turning point between the initiation and progression stages of rat liver regeneration. Through network analysis, many important genes were identified as node genes, such as Tec, which may be involved in hepatocyte activation in early liver regeneration, and Lyn, which is involved in the negative regulation of mitochondria-mediated apoptosis in the early stage and then in the inhibition of hepatocyte apoptosis in early liver damage.
In summary, key genes in rat liver regeneration can be effectively mined by combining a variety of methods. This study thus formed an integrated analysis method for high-throughput data.
Keywords/Search Tags: biological high-throughput detection data, filter method, wrapper approach, Bayesian network analysis, integrated statistical method