
The Research Of Tumor Tracing With Unknown Primary Origin Based On Supervised Learning Method

Posted on: 2021-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: X J Liu
GTID: 2404330602493694
Subject: Computer software and theory
Abstract/Summary:
As environmental pollution increases, people are more likely to be exposed to carcinogenic factors and to develop cancer, and the incidence of cancer is rising year by year. Cancer can spread from one tissue to another, producing metastatic cancers. Many metastatic cancers require further diagnosis to determine the primary site, yet the primary origin remains unknown for many patients. Patients with cancer of unknown primary origin (CUP) account for 3 to 5% of all cancer diagnoses. In previous research, immunohistochemistry and medical imaging were used to help predict the primary site of a tumor. However, these approaches depend heavily on the experience and ability of the expert and have many limitations in practice.

First, techniques for detecting molecular profiles and the related supervised learning theory are introduced. The finding that enriching low-frequency mutation fragments of circulating tumor DNA can improve the detection of low-frequency mutations is presented. With the rapid development of Next-Generation Sequencing (NGS) and droplet digital PCR (ddPCR), it has become possible to detect gene mutations or obtain gene expression values in a single experiment, and supervised learning algorithms can train a model on such molecular profiling data. This study therefore proposes that combining supervised learning algorithms with molecular profiling data helps trace the primary site of a tumor. The relevant feature selection and classification algorithms in supervised learning, together with their theoretical background, are described to support the tumor tracing methods that follow.

Second, this study proposes a new method for predicting the primary site of a tumor based on logistic regression. As an important kind of molecular profile of tissue-specific genes, somatic mutation data can be used to identify the primary site effectively, because it differs between the primary and metastatic sites of a tumor. In this method, the raw data were first preprocessed. Feature selection with the Pearson correlation coefficient was then used to obtain the most suitable number of genes, and logistic regression classifiers were trained on the resulting data. Experimental results show that the proposed method achieves higher prediction accuracy than the traditional method.

Third, this study presents a novel method for predicting the primary site of a tumor based on random forest. Building on the work on tumor tracing with mutation data, the method identifies the primary site by using both mutation data and gene expression profiles. The data for these two molecular profiles were first collected and preprocessed. Because ensemble learning trains quickly and fits well on large-scale and discrete data, the random forest algorithm was used for both feature selection and classifier construction, and the classifier was then trained on the preprocessed data. Experimental results show that this improved method has higher prediction accuracy and better robustness than the method based on logistic regression.

Identifying the primary site of a tumor is very important for the diagnosis and treatment of cancer, and traditional methods cannot meet current medical needs. Because molecular profiles differ between the primary site and the metastatic site of a tumor, a supervised learning method can train a classification model on them. Somatic mutation data, gene expression profiles, and their combination were used as feature data, with different feature selection and classification algorithms. Experimental results show that combining somatic mutation data and gene expression profiles with the random forest algorithm performs better than using either data type alone. This provides new research ideas and methods for tumor tracing and has practical value for the diagnosis and treatment of cancer.
Keywords/Search Tags: tumor tracing, supervised learning, random forest algorithm, logistic regression algorithm, molecular expression profiling
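
The first pipeline summarized in the abstract (Pearson-correlation feature selection followed by a logistic regression classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: the toy binary mutation matrix, the two-class label standing in for two hypothetical primary sites, the choice of k, the learning rate, and the plain gradient-descent training loop. The thesis itself presumably handles many tumor types and real sequencing data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def select_top_k(X, y, k):
    """Indices of the k features with the largest |Pearson r| against the label."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Binary logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) - label
            for j in range(d):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    """Predict class 1 when the estimated probability is at least 0.5."""
    return [1 if sigmoid(sum(wi * xi for wi, xi in zip(w, row)) + b) >= 0.5 else 0
            for row in X]

# Toy data (hypothetical): rows are samples, columns are genes,
# 1 means the gene is mutated. Labels 1/0 stand for two primary sites.
X = [
    [1, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 0, 1],
]
y = [1, 1, 1, 0, 0, 0]

selected = select_top_k(X, y, k=2)                   # feature selection step
X_sel = [[row[j] for j in selected] for row in X]    # keep only selected genes
w, b = fit_logistic(X_sel, y)                        # classifier training step
preds = predict(X_sel, w, b)
print("selected gene indices:", selected)
print("predictions:", preds)
```

In practice the number of selected genes would be chosen by validation rather than fixed by hand, and accuracy would be measured on held-out samples, not on the training data as in this toy run.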