| Background and purpose: Wheat intolerance occurs when the immune system recognizes certain components of wheat as harmful and mounts an excessive protective immune response, inducing a type III allergic reaction and causing clinicopathological symptoms in multiple systems. At present, most studies of wheat intolerance either measure the positive rate of food-specific antibodies in the population or screen for candidate differentially expressed proteins in plants. No study has examined the pathogenesis of wheat intolerance from the perspective of human serum proteomics. sigFeature combines univariate and multivariate screening: it accounts for the correlation between variables, is better at eliminating noise variables from small-sample data, and yields fewer variables with higher classification accuracy and more meaningful biological functions. EFS integrates the feature rankings obtained from each sample subset using complete linear aggregation, which improves the stability of variable screening. On this basis, this study compared the classification performance of sigFeature with that of other variable screening methods and evaluated the stability of sigFeature combined with EFS. sigFeature and EFS were then used to screen for differentially expressed proteins in wheat intolerance and to analyze the functions and biological pathways of those proteins, providing a basis for studying the molecular mechanism of wheat intolerance. Methods: Simulation study: 128 sets of simulated data were generated under different parameter settings. Balanced accuracy was used to compare the classification performance of sigFeature with that of SVM, the t-test, RF and PLS, and the Kuncheva index (KI) was used to compare the stability of variable screening by EFS combined with sigFeature (EFS-sigFeature) with that of sigFeature alone. Example analysis: Wheat intolerance serum samples were collected, and proteins were labeled using the TMT technique. sigFeature and EFS were used to screen for differentially expressed proteins in wheat intolerance. GO and KEGG enrichment analyses were used to characterize the functions and biological pathways of the differentially expressed proteins. ROC curves were used to assess the classification performance of the differentially expressed proteins in the variable screening dataset and the validation dataset. Results: Simulation study: When the sample size was less than 30, the balanced accuracy of sigFeature was higher than that of SVM, the t-test, PLS and RF. Balanced accuracy increased with sample size for all five methods; at a sample size of 200, the balanced accuracy of sigFeature, PLS and the t-test approached 1. The KI of EFS-sigFeature was higher than that of sigFeature alone when the number of differential variables selected was small. As the number of selected differential variables increased, KI increased for both methods. When the number of selected differential variables equaled the number specified in the simulation parameters, the KI of sigFeature was close to that of EFS-sigFeature. Example analysis: 14 serum samples were collected for the wheat intolerance variable screening dataset and 21 for the validation dataset. TMT yielded 849 proteins in the variable screening dataset and 1132 proteins in the validation dataset. Variable screening with sigFeature combined with EFS identified 18 differentially expressed proteins in wheat intolerance. GO enrichment analysis showed that these proteins were involved in biological processes such as platelet degranulation and acute response. KEGG enrichment analysis showed that they were involved in the complement and coagulation cascade. ROC curve analysis showed that the AUC of ITIH2 was 0.816 in the variable screening dataset and 0.867 in the validation dataset. Conclusions: The differential proteins obtained by variable screening with sigFeature and EFS showed good classification performance and stability, so this approach is suggested for variable screening of small-sample omics data. The occurrence of wheat intolerance is related to an excessive immune response and to dysregulation of the complement and coagulation pathway. In treatment, regulating the activity and function of serine protease inhibitors in patients, or raising patients' ITIH2 protein levels, may help the complement and coagulation pathway perform its normal immune defense function and reduce excessive immune responses and inflammation. |
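
The three quantities the simulation study relies on (balanced accuracy, the Kuncheva index, and rank aggregation across sample subsets) can be illustrated with a minimal Python sketch. This is not the study's code, and it does not call the sigFeature package. The function names are hypothetical. It assumes binary labels coded 0/1, assumes that "complete linear aggregation" means summing each feature's rank position over all subsets with equal weight, and averages the Kuncheva index over all pairs of selected subsets.

```python
from itertools import combinations


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = sum(1 for t in y_true if t == 0)
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present in y_true")
    return 0.5 * (tp / pos + tn / neg)


def kuncheva_index(subset_a, subset_b, n_features):
    """Kuncheva index of two equal-size feature subsets.

    KI = (r * n - k^2) / (k * (n - k)), where r is the overlap size,
    k the subset size and n the total number of features.
    """
    a, b = set(subset_a), set(subset_b)
    k = len(a)
    if len(b) != k or not 0 < k < n_features:
        raise ValueError("subsets must share a size k with 0 < k < n_features")
    r = len(a & b)
    return (r * n_features - k * k) / (k * (n_features - k))


def mean_kuncheva_index(subsets, n_features):
    """Average KI over all pairs of selected subsets (needs at least two)."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("at least two subsets are required")
    return sum(kuncheva_index(a, b, n_features) for a, b in pairs) / len(pairs)


def aggregate_rankings(rankings, k):
    """Assumed form of linear aggregation: sum rank positions, keep the top k.

    Each ranking lists every feature id once, best first. Ties in the
    summed rank are broken by feature id so the result is deterministic.
    """
    totals = {}
    for ranking in rankings:
        for position, feature in enumerate(ranking):
            totals[feature] = totals.get(feature, 0) + position
    return sorted(totals, key=lambda f: (totals[f], f))[:k]
```

In an ensemble workflow of this kind, a base ranker (sigFeature in the study) would be run on each resampled subset of the samples, the resulting rankings passed to `aggregate_rankings`, and the selected subsets from repeated runs compared with `mean_kuncheva_index`. A KI of 1 indicates identical subsets, while values near 0 indicate overlap no better than chance.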