| Objective: Patients with metastatic melanoma have a poor prognosis. In this study, artificial intelligence algorithms were used to build a prediction model that classifies melanoma as metastatic or primary, with the aim of improving the prognosis of melanoma patients and supporting the choice of clinical treatment. Methods: 1. Reverse-phase protein array expression profiles of 354 melanoma patients were collected from the TCPA proteomics database and curated, and protein features with missing values were removed. t-SNE and PCA were used to reduce the dimensionality of the protein features and visualize them. The protein biomarkers most significant to the model were screened with an embedded feature-selection method. The selected feature subset was imported into Python, and the Scikit-learn package was used to construct the classification models. 2. Six kinds of artificial intelligence algorithms were used; each melanoma patient was treated as one sample, and the corresponding protein features were used as that sample's attribute values. In Python, the train_test_split method was used with a fixed random seed. 80% of the 354 cases were input as the training set into the six algorithms, which were treated as black boxes; each algorithm learned from the data and generated a machine learning model. The remaining 20% of the samples were used as the test set to verify the accuracy and generalization ability of the models, yielding the final classification models for metastatic versus primary melanoma. 3. On the same data set, the tunable hyperparameters of the six algorithms were optimized to find the optimal solution of the prediction model 
formed by the data set under each algorithm. To handle the imbalance between sample labels, either SMOTE or the algorithm's own class-imbalance parameters were used; the models were then rebuilt on the rebalanced data and their parameters optimized again to find the optimal solution for each algorithm after imbalance handling. Results: 1. The t-SNE and PCA analyses showed that the reverse-phase protein array expression profiles of the 354 melanoma patients were not completely linear, so a model based on linear fitting could not achieve high accuracy. 2. Using the embedded method, the 16 protein biomarkers most significant to the model were selected from 213 protein features: X4EBP1_pT37T46, CKIT, CAVEOLIN1, ECADHERIN, EIF4E, FIBRONECTIN, PR, YAP, EIF4G, NRAS, NDRG1_pT346, RAB25, EPPK1, ANNEXIN1, MSH6, and BRAF_pS445. 3. For logistic regression, the highest test-set accuracy before imbalance handling was 85.92%, with a sensitivity of 100%, a specificity of 50%, and an AUC of 0.947. After SMOTE, the highest accuracy was 88.73%, with a sensitivity of 94.11%, a specificity of 75%, and an AUC of 0.944. For the decision tree, the highest test-set accuracy before imbalance handling was 84.50%, with a sensitivity of 98.04%, a specificity of 50%, and an AUC of 0.870. After imbalance handling, the highest accuracy was 84.50%, with a sensitivity of 94.12%, a specificity of 60%, and an AUC of 0.767. For the random forest, the highest test-set accuracy before imbalance handling was 85.92%, with a sensitivity of 100%, a specificity of 50%, and an AUC of 0.923. After imbalance handling, the highest accuracy was 83.10%, the sensitivity was 100%, the specificity was 40%, and the AUC 
value was 0.862. For the linear-kernel support vector machine (SVM), the highest test-set accuracy before imbalance handling was 80.28%, with a sensitivity of 100%, a specificity of 30%, and an AUC of 0.943; after imbalance handling, the highest accuracy was 91.55%, with a sensitivity of 100%, a specificity of 70%, and an AUC of 0.943. For the polynomial-kernel SVM, the highest test-set accuracy before imbalance handling was 90.14%, with a sensitivity of 100%, a specificity of 65%, and an AUC of 0.878; after imbalance handling, the highest accuracy was 91.55%, with a sensitivity of 96.07%, a specificity of 80%, and an AUC of 0.918. For the hyperbolic tangent kernel SVM, the highest test-set accuracy before imbalance handling was 84.51%, with a sensitivity of 100%, a specificity of 45%, and an AUC of 0.903; after SMOTE, the highest accuracy was 88.73%, with a sensitivity of 96.07%, a specificity of 70%, and an AUC of 0.933. For the Gaussian radial basis function kernel SVM, the test-set accuracy before imbalance handling was 88.73%, with a sensitivity of 100%, a specificity of 60%, and an AUC of 0.860. After imbalance handling, the accuracy was 92.96%, with a sensitivity of 100.00%, a specificity of 75%, and an AUC of 0.907. For naive Bayes, the test-set accuracy before imbalance handling was 85.92%, with a sensitivity of 98.04%, a specificity of 55%, and an AUC of 0.922. After SMOTE, the highest accuracy was 87.32%, with a sensitivity of 96.08%, a specificity of 65%, and an AUC of 0.921. For the extreme gradient boosting algorithm, the 
highest test-set accuracy before imbalance handling was 84.51%, with a sensitivity of 100%, a specificity of 45%, and an AUC of 0.933; after SMOTE, the highest accuracy was 88.70%, with a sensitivity of 98.04%, a specificity of 60%, and an AUC of 0.910. Conclusion: Applying artificial intelligence algorithms to the classification and recognition of melanoma can distinguish metastatic melanoma from primary melanoma well. |
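
The accuracy, sensitivity, and specificity figures reported above all derive from the test-set confusion matrix, and a minimal Python sketch may make the relationship concrete. The class counts used here (51 metastatic and 20 primary cases in a 71-sample test set) are an assumption inferred from the reported percentages, not a figure stated in the abstract; treating metastatic melanoma as the positive class is likewise an assumption, and the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false negatives, true negatives, false positives."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, tn, fp


def summarize(tp, fn, tn, fp):
    """Derive accuracy, sensitivity, and specificity from the four counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # recall on the positive class
        "specificity": tn / (tn + fp),   # recall on the negative class
    }


# ASSUMPTION: 51 metastatic (label 1) and 20 primary (label 0) test samples.
y_true = [1] * 51 + [0] * 20
# Illustrative predictions: every metastatic case is caught,
# but only half of the primary cases are recognized as primary.
y_pred = [1] * 51 + [0] * 10 + [1] * 10

metrics = summarize(*confusion_counts(y_true, y_pred))
print({name: round(100 * value, 2) for name, value in metrics.items()})
# prints {'accuracy': 85.92, 'sensitivity': 100.0, 'specificity': 50.0}
```

Under these assumed counts, a sensitivity of 100% combined with a specificity of 50% yields an accuracy of 61/71, or about 85.92%, which illustrates how a model can score well on accuracy while misclassifying half of the minority class, and why imbalance handling is evaluated separately.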