Pancreatic ductal adenocarcinoma (PDAC) is a highly malignant tumor of the digestive tract and the most common type of pancreatic cancer, accounting for 95% of all pancreatic cancer cases. PDAC has a very high fatality rate and is the seventh leading cause of cancer death worldwide. Identifying biomarkers associated with PDAC can support its early diagnosis, improve patient survival, and deepen our understanding of its pathogenesis. Because the biomarkers discovered so far fall far short of what accurate diagnosis and drug development for this cancer require, there is an urgent need for biomarkers that adequately characterize PDAC.

In this study, gene expression data from 537 PDAC samples and 75 normal samples served as the baseline dataset. First, 35,430 reverse gene pairs were identified from the relative ordering of gene expression values within each tissue sample: these are gene pairs whose expression order is stable in more than 75% of PDAC samples and in more than 75% of normal samples, but opposite between the two sample types. Next, Pearson correlation coefficients and partial correlation coefficients were calculated between every pair of genes, yielding 11,337 differential partial correlation gene pairs, that is, pairs that are significantly partially correlated in PDAC samples but not correlated in normal samples, or not correlated in PDAC samples but significantly partially correlated in normal samples. Intersecting the reverse gene pairs with the differential partial correlation gene pairs gave 31 reverse differential partial correlation gene pairs involving 60 genes. Using the expression values of the 60 genes and the 31 gene pairs as two separate feature sets, the Minimum Redundancy Maximum Relevance (mRMR) algorithm and the Incremental Feature Selection (IFS) method were applied for feature selection, and Support Vector Machine (SVM), Random Forest, and Logistic Regression models were built. This process finally yielded 10 gene pairs (20 genes) as gene markers of PDAC.

Among the three machine learning methods, SVM achieved the best classification performance. With the 10 gene pairs as the feature set, five-fold cross-validation of the SVM on the training dataset gave an accuracy of 0.9606, a precision of 0.9693, a recall of 0.9876, an AUROC of 0.9960, and an AUPRC of 1. In addition, 177 PDAC samples from the TCGA database, together with 45 PDAC samples and 8 normal samples from the GEO database, were used as independent test sets. The 10 gene pairs combined with SVM correctly identified 99% of the TCGA PDAC samples and achieved an accuracy of 0.87 on the GEO samples. These results show that the 10 gene pairs obtained in this study can effectively distinguish PDAC samples from normal samples and can be used as diagnostic markers for early PDAC.
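The reverse-gene-pair screen relies only on the relative order of two genes within each sample, so it can be illustrated in a few lines. The following is a minimal Python sketch under stated assumptions, not the study's implementation: expression data are given as lists of samples, each a list of per-gene values; "more than 75%" is read as a strict inequality; and tied values are assumed to count toward neither ordering (the abstract does not say how ties were handled). The function names `order_fractions` and `reverse_gene_pairs` are hypothetical. Each returned tuple `(a, b)` means gene `a` is expressed above gene `b` in more than 75% of tumor samples, while `b` is above `a` in more than 75% of normal samples.

```python
from itertools import combinations


def order_fractions(samples, i, j):
    """Return (fraction with gene i > gene j, fraction with gene j > gene i).

    Assumption: tied expression values count toward neither ordering.
    """
    n = len(samples)
    i_above = sum(1 for s in samples if s[i] > s[j])
    j_above = sum(1 for s in samples if s[j] > s[i])
    return i_above / n, j_above / n


def reverse_gene_pairs(tumor, normal, threshold=0.75):
    """Find gene pairs with a stable order in each group that flips between groups.

    tumor, normal: lists of samples; each sample is a list of expression values
    indexed by gene. Returns tuples (a, b): gene a is above gene b in more than
    `threshold` of tumor samples, and b is above a in more than `threshold` of
    normal samples. "More than" is assumed to be a strict inequality.
    """
    n_genes = len(tumor[0])
    pairs = []
    for i, j in combinations(range(n_genes), 2):
        t_ij, t_ji = order_fractions(tumor, i, j)
        n_ij, n_ji = order_fractions(normal, i, j)
        if t_ij > threshold and n_ji > threshold:
            pairs.append((i, j))  # i above j in tumor, reversed in normal
        elif t_ji > threshold and n_ij > threshold:
            pairs.append((j, i))  # j above i in tumor, reversed in normal
    return pairs


# Toy data (made up for illustration): 4 samples x 3 genes per group.
tumor = [[5, 1, 3], [6, 2, 3], [7, 1, 4], [5, 2, 2]]
normal = [[1, 5, 3], [2, 6, 3], [1, 7, 4], [2, 5, 2]]

print(reverse_gene_pairs(tumor, normal))       # [(0, 1)]
print(reverse_gene_pairs(tumor, normal, 0.7))  # [(0, 1), (0, 2), (2, 1)]
```

In the toy data, the pair of genes 0 and 2 holds its order in only 3 of 4 normal samples (one tie), so it passes at a 0.7 threshold but not at the strict 0.75 cut-off. With hundreds of samples and thousands of genes, this quadratic pure-Python loop would likely need to be vectorized (for example with NumPy) before being applied at the scale of the study.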