| As people come into contact with substances that are harmful to the body more and more frequently in daily life, the incidence of cancer is higher than before. In this era of big data, selecting the valid part of complex data has become quite important. Statistical learning methods can help us extract useful information more effectively, so statistical learning has become a significant research field. This article focuses on reverse phase protein array (RPPA) and cell proliferation data measured on a group of MDA-MB-231 breast cancer cells from MD Anderson. Based on these data, three models (linear regression, support vector machines (SVM), and random forest (RF)) are trained in order to find the key proteins that control breast cancer cell proliferation. These key proteins can then be treated as potential targets for cancer drugs. Because the data are highly volatile, we first preprocess the RPPA data to reduce the effect of this volatility on statistical efficiency. Then, taking the preprocessed RPPA data as input and cell proliferation as output, we train the linear regression, SVM, and RF models in turn. For the linear regression model, we propose and use a method that combines principal component analysis (PCA) with linear regression. Finally, by comparing the results of the three models, we obtain a model that has higher accuracy and can screen for the protein combinations with a pivotal influence. The results show that the linear regression model has high accuracy, and that the SVM model can screen for the protein combinations that play a key role in breast cancer cell proliferation. RF performs well in both respects. Finally, by analyzing the RPPA data with RF, we obtain 28 proteins that strongly influence breast cancer cells. By checking the related literature, we found that 21 of the 28 do have a great impact on breast cell proliferation. |
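
The combination of PCA with linear regression mentioned above can be illustrated with a minimal sketch of principal component regression. This is not necessarily the exact procedure used in the paper: the function names (`fit_pcr`, `predict_pcr`, `r_squared`), the number of components, and the synthetic data standing in for the RPPA matrix and proliferation values are all assumptions made for illustration.

```python
import numpy as np

def fit_pcr(X, y, n_components):
    """Principal component regression: PCA on X, then least squares on the scores."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]            # shape (k, n_features)
    scores = Xc @ components.T                # shape (n_samples, k)
    gamma, *_ = np.linalg.lstsq(scores, y - y_mean, rcond=None)
    # Map the coefficients back to the original features (here: proteins).
    beta = components.T @ gamma
    return {"x_mean": x_mean, "y_mean": y_mean, "beta": beta}

def predict_pcr(model, X):
    return (X - model["x_mean"]) @ model["beta"] + model["y_mean"]

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in for the data (assumption): 60 samples, 40 "proteins"
# driven by 3 latent factors, and a response that depends on those factors.
rng = np.random.default_rng(0)
n_samples, n_proteins, n_latent = 60, 40, 3
latent = rng.normal(size=(n_samples, n_latent))
loadings = rng.normal(size=(n_latent, n_proteins))
X = latent @ loadings + 0.05 * rng.normal(size=(n_samples, n_proteins))
y = latent @ np.array([1.0, -2.0, 0.5]) + 0.05 * rng.normal(size=n_samples)

# Fit on the first 45 samples, evaluate on the remaining 15.
model = fit_pcr(X[:45], y[:45], n_components=3)
r2_test = r_squared(y[45:], predict_pcr(model, X[45:]))

# Columns with the largest absolute back-projected coefficients.
top_proteins = np.argsort(-np.abs(model["beta"]))[:5]
print(f"held-out R^2: {r2_test:.3f}")
print("largest |coefficient| columns:", top_proteins)
```

Regressing on a few principal components instead of all protein columns keeps the least-squares problem well conditioned when there are many correlated features and few samples. Ranking the back-projected coefficients is only one rough way to flag influential features, and it is not the RF-based screening reported in the results.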