With the rapid development of microarray technologies based on high-throughput screening, large volumes of data have emerged that challenge statisticians because they combine small sample sizes with high dimensionality. The Boosting algorithm, one of the ensemble methods, has attracted many researchers with its nearly "perfect" classification performance.

In this research, we first introduced the idea of Boosting and described two fundamental procedures, AdaBoost and LogitBoost. Based on these two procedures, we constructed discriminant models for simulated data and traditional data. We also compared the predictive performance of Boosting, Bagging, Random Forest, Fisher's linear discrimination, Fisher's quadratic discrimination, and logistic discrimination.

Taking into account the specific characteristics of microarray data, we analyzed two public data sets: leukaemia data and breast cancer data. The approach is as follows: (1) Use the FDR procedure to correct the P-values and screen the gene variables with a criterion of P ≤ 0.05 or P ≤ 0.01, so that the dimensionality is smaller than the sample size; then construct the discriminant models and compare Boosting with the other two ensemble methods and the three traditional methods. (2) Construct different discriminant models with different sets of gene predictor variables, chosen according to the order of their P-values, and identify the advantages of Boosting (including precision and sensitivity). (3) Identify the advantages of Boosting by comparing it with principal component discriminant analysis. The predictive performance of all of the above methods is checked by cross-validation to ensure that the results are stable.
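
The screening and boosting steps described above can be illustrated with a minimal Python sketch. It rests on assumptions that the abstract does not confirm: the FDR procedure is taken to be the Benjamini-Hochberg step-up adjustment, the boosted classifier is discrete AdaBoost with one-variable decision stumps as the base learner, and class labels are coded as -1 and +1. The function names (`bh_adjust`, `screen_genes`, `adaboost_fit`, `adaboost_predict`) are hypothetical and chosen only for illustration; LogitBoost and the cross-validation step are left out.

```python
import math


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def screen_genes(pvalues, alpha=0.05):
    """Indices of genes with adjusted p <= alpha, most significant first."""
    adjusted = bh_adjust(pvalues)
    keep = [i for i in range(len(pvalues)) if adjusted[i] <= alpha]
    return sorted(keep, key=lambda i: pvalues[i])


def stump_predict(row, feature, threshold, sign):
    """Decision stump: predicts `sign` above the threshold, `-sign` otherwise."""
    return sign if row[feature] > threshold else -sign


def fit_stump(X, y, w):
    """Exhaustive search for the stump with the smallest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, feature, threshold, sign) != yi)
                if best is None or err < best[0]:
                    best = (err, feature, threshold, sign)
    return best


def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost; y must be coded as -1 / +1 (an assumption here)."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform observation weights
    model = []
    for _ in range(rounds):
        err, feature, threshold, sign = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # weight of this stump
        model.append((alpha, feature, threshold, sign))
        # Up-weight misclassified observations, down-weight correct ones.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, feature, threshold, sign))
             for row, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, feature, threshold, sign)
                for alpha, feature, threshold, sign in model)
    return 1 if score >= 0 else -1
```

In a workflow like the one outlined in step (1), `screen_genes` would be applied to the per-gene P-values first, and `adaboost_fit` would then be trained only on the retained columns of the expression matrix. The exhaustive stump search is kept for readability and would be slow on full microarray data without prior screening.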