Tumors are among the major diseases threatening human health. Tumorigenesis is a complex, multi-stage process, usually caused by mutations or abnormal gene expression, which result in changes to protein molecules within cells. With the development of DNA microarray technology, research on gastric cancer at the molecular level is advancing. The information and knowledge derived from gene expression data allow the nature of gastric cancer to be understood better, and this understanding plays a pivotal role in advancing its clinical diagnosis and treatment. Decision Forests, proposed in the 1990s, is a method for classification and feature selection. Combining decision tree and classifier fusion techniques, it is widely applied in tumor studies. This paper focuses on using Decision Forests to analyze tumor gene expression profiling data and to select feature genes. The main contributions of this research are summarized as follows: (1) The performance of Decision Forests (DF) on feature selection is evaluated. DF is a supervised learning approach based on recursive partition trees. To validate classification performance, I used an SVM classifier and compared DF with three feature selection algorithms: SAM, ReliefF and PCA. The experimental results show that the classification accuracy obtained with DF is higher than that of the other methods. (2) The class weight in DF is studied. Tumor expression profiling data have few samples and a large number of variables, with large differences between categories. When one class of samples accounts for only a small proportion of all samples, its influence on the final result is reduced. For this situation, I discuss how to set the class weight in DF. 
The experimental results indicate that both classification performance and feature selection accuracy on small-sample, imbalanced data are improved. (3) I studied the marker genes selected by DF and applied DF to gene signaling pathway analysis. Microarray gene expression profile analysis provides an extremely important tool for biological studies. At present, however, most analysis methods operate at the single-gene level; they are easily affected by sample size and noise, and they neglect interactions among genes. Gene signaling pathway analysis is an effective method for analyzing gastric cancer microarray data. In this paper, I used MAS (Molecule Annotation System), a software tool for analyzing relationships between genes, to perform the gene signaling pathway analysis. The study is a preliminary attempt to integrate machine learning algorithms with biological methods. I also discuss the relations among genes and identify some cancer-related genes and signaling pathways, which may serve as a reference for similar studies in the future.
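
The class-weight idea in contribution (2) can be illustrated with a minimal Python sketch. It assumes inverse-frequency ("balanced") weights and a class-weighted Gini impurity as the node-splitting criterion; both are common heuristics and not necessarily the weighting scheme used in this study. The function names and the toy labels (18 normal samples, 2 tumor samples) are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Assumed heuristic: weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


def weighted_gini(labels, weights):
    """Gini impurity of a tree node where each sample counts with its class weight."""
    totals = Counter()
    for y in labels:
        totals[y] += weights[y]
    total = sum(totals.values())
    return 1.0 - sum((w / total) ** 2 for w in totals.values())


# Hypothetical imbalanced toy data: the minority class is 10% of the samples.
labels = ["normal"] * 18 + ["tumor"] * 2

weights = balanced_class_weights(labels)
uniform = {"normal": 1.0, "tumor": 1.0}

unweighted_impurity = weighted_gini(labels, uniform)
weighted_impurity = weighted_gini(labels, weights)
```

With uniform weights the node looks almost pure, so a tree has little incentive to separate the two tumor samples. With balanced weights both classes contribute equally to the impurity, so splits that isolate the minority class are rewarded.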