
Efficient Tumor Traceability Prediction Based On Hybrid Machine Learning

Posted on: 2022-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: X Liang
GTID: 2518306539975749
Subject: Applied Mathematics
Abstract/Summary:
Tumors can arise in any tissue of the body and can then spread to any other part of it. In general, a primary tumor and its metastatic sites can be detected readily, and the primary source of a metastatic cancer can be determined quickly by clinical evaluation. However, some tumors present with one or more metastases while their primary site remains unknown: even after a comprehensive examination and evaluation of the patient with standard methods, doctors cannot find the main source of the tumor. This kind of tumor is called cancer of unknown primary (CUP).

When planning treatment, managing its course, and choosing medication for cancer patients, doctors rely on knowing where the tumor arose and how it has spread. Recognizing whether tumor tissue is primary or metastatic is therefore essential for developing precise treatment plans, and when the primary site is unknown it is challenging to design a specific treatment plan for the patient. Machine learning methods have been widely used in prediction and classification and have shown significant advantages in clinical practice.

This thesis focuses on the accuracy of three hybrid machine learning methods for tracing tumor origin. All three follow the same pattern: a random forest combined with a second classification algorithm.

For the random forest + naive Bayes method, data downloaded from The Cancer Genome Atlas (TCGA) and the Gene Expression Omnibus (GEO) database are integrated. After preprocessing, the Gini impurity of the random forest is used as the scoring criterion to select genes and obtain the input matrix. The input matrix is then fed into the naive Bayes classifier to obtain the prediction accuracy. The AUC (area under the ROC curve) is a common index for evaluating the quality of a model: it summarizes the effect of all possible classification thresholds and gives an intuitive measure of a classifier's performance. The larger the AUC, the better. Under 10-fold cross-validation, the AUC was 0.91.

For the random forest + decision tree method, the data are processed in the same way to obtain the input matrix, which is then fed into the decision tree classifier. The prediction accuracy is 0.889 and the AUC is 0.886.

For the random forest + K-nearest neighbor (K=5) method, the data are again processed in the same way to obtain the input matrix, which is then fed into the K-nearest neighbor (K=5) classifier. The prediction accuracy is 0.886 and the AUC is 0.886.
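The three hybrid methods described in the abstract share one shape: rank genes by random forest Gini importance, keep the top-scoring ones as the input matrix, then train a second classifier and score it with 10-fold cross-validation. The following is a minimal sketch of that shape, not the thesis's implementation. It assumes scikit-learn (the abstract does not name its software), uses synthetic data in place of the TCGA/GEO expression matrix, and the names `hybrid` and `N_GENES`, the forest size, and the number of genes kept are all hypothetical. It also places gene selection inside the cross-validation pipeline, so genes are chosen from the training folds only; the abstract does not say whether the thesis did the same.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an expression matrix: rows are samples, columns are
# genes, labels are tissue-of-origin classes. This is NOT the TCGA/GEO data.
X, y = make_classification(
    n_samples=300, n_features=200, n_informative=15,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)

N_GENES = 20  # hypothetical number of genes to keep


def hybrid(classifier):
    """Random forest gene selection followed by a second-stage classifier."""
    forest = RandomForestClassifier(
        n_estimators=50, criterion="gini", random_state=0
    )
    # threshold=-np.inf makes max_features the only criterion, so the
    # N_GENES columns with the highest Gini importance are kept.
    selector = SelectFromModel(forest, threshold=-np.inf, max_features=N_GENES)
    return make_pipeline(selector, classifier)


models = {
    "naive Bayes": hybrid(GaussianNB()),
    "decision tree": hybrid(DecisionTreeClassifier(random_state=0)),
    "5-nearest neighbor": hybrid(KNeighborsClassifier(n_neighbors=5)),
}

# 10-fold stratified cross-validation; the AUC is one-vs-rest because the
# synthetic problem has more than two classes.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = {}
for name, model in models.items():
    scores = cross_validate(
        model, X, y, cv=cv, scoring=("accuracy", "roc_auc_ovr")
    )
    acc = scores["test_accuracy"].mean()
    auc = scores["test_roc_auc_ovr"].mean()
    results[name] = (acc, auc)
    print(f"{name}: accuracy={acc:.3f}, AUC={auc:.3f}")
```

The printed numbers describe the synthetic data only and are not comparable to the values reported in the abstract.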
Keywords/Search Tags: tissue traceability, random forest, machine learning, naive Bayes, decision tree, k-nearest neighbor