| Digestive tract cancer is the second most common type of cancer in the world. Moreover, it is typically diagnosed at an advanced stage, which limits treatment options and leads to a poor prognosis. Early detection, early diagnosis and early treatment of patients with digestive tract cancer are therefore very important. The combined measurement of multiple tumor markers is valuable for auxiliary screening for digestive tract cancer because it is cost-efficient, simple and has no contraindications. The reference values of digestive tract tumor markers such as serum carbohydrate antigen 199 (CA199), carbohydrate antigen 125 (CA125) and tumor-specific growth factor (TSGF) in healthy Chinese people show significant regional differences. However, little work has investigated the relationship between geographical factors and the reference values of serum CA199, CA125 and TSGF using quantitative analysis and qualitative simulation methods. Following the classical medical principle that "the unique features of a local environment always give special characteristics to its inhabitants", this study explores the relationship between geographical environmental factors and the reference values of serum CA199, CA125 and TSGF in healthy people, and derives the spatial distribution patterns of these three reference values. The results can provide scientific evidence and standards for developing regional medical reference values. A total of 243,132 reference values of serum CA199, CA125 and TSGF in healthy Chinese people were collected from 549 units in 256 cities across 31 provinces by combining database searches with field survey data. The geographical factors selected include spatial, topographic, climate and soil indicators, comprising 16 sub-indicators for 2,317 counties and cities in China. A comprehensive database was established, integrating both the attributes and the spatial distribution
characteristics of serum CA199, CA125 and TSGF and of the geographical environmental factors. The original serum CA199, CA125 and TSGF reference-value data were tested for normality and intergroup differences, and the results were used to define six study groups for the three reference values. The six groups of original reference values were examined by trend analysis and by presenting their sample distributions. Spatial autocorrelation, geographical detector analysis and other methods were then used to explore the relationship between the reference values of serum CA199, CA125 and TSGF and geographical factors. Next, the consistency between the preliminary results of the qualitative analysis and the quantitative simulation was verified by ArcGIS spatial correlation, and the spatial distribution trends were explored. Finally, spatial distribution maps and hotspot maps of the predicted medical reference values were generated by geostatistical spatial interpolation. These maps were used to verify the consistency of the qualitative analysis and quantitative simulation results, and the spatial distribution of the six groups of values was summarized. The main results were as follows: (1) For all six groups of sample data (serum CA199 in healthy adults, serum CA125 in male and female adults, and serum TSGF in young, middle-aged and elderly individuals), the latitudinal variation trend was more significant than the longitudinal trend, and the latitudinal trend was most pronounced for serum TSGF in healthy young, middle-aged and elderly individuals. Serum TSGF reference values in healthy young, middle-aged and elderly individuals were high in the north and low in the south. Serum CA199 reference values in healthy adults and serum CA125 reference values in male and female adults were high in the west and low in the east. Serum CA125 reference values were higher in healthy females than in healthy males. The reference
values of serum TSGF gradually increased with age. (2) All six groups of reference values showed spatially aggregated distribution patterns. Latitude was significantly positively correlated with all six groups of sample data (serum CA199 in healthy adults, serum CA125 in male and female adults, and serum TSGF in young, middle-aged and elderly individuals), whereas annual mean relative humidity and annual precipitation were significantly negatively correlated with all six groups. Longitude was significantly negatively correlated with serum CA125 reference values in healthy adults. Altitude was significantly positively correlated with serum CA199 reference values in adults and with serum TSGF reference values in young adults. Annual mean temperature was negatively correlated with the reference values of serum CA199 in healthy adults and of serum TSGF in young, middle-aged and elderly individuals. Serum CA125 reference values in healthy males and females were positively correlated with both topsoil calcium carbonate content and topsoil salinity. The serum TSGF reference value in healthy young people was positively correlated with topsoil sand percentage and negatively correlated with topsoil silt percentage. Serum CA125 reference values in female adults were best simulated by a genetic-algorithm-optimized support vector regression (GA-SVR) model, while those of the remaining five groups were best simulated by combined models. The Moran's I of the predicted reference values was 0.5754 for serum CA199 in adults, 0.5990 and 0.7419 for serum CA125 in healthy male and female adults, respectively, and 0.7115, 0.7590 and 0.7443 for serum TSGF in young, middle-aged and elderly individuals, respectively. These values indicate strong positive spatial autocorrelation. Scatter plots of the six groups of predicted values against geographical factors gave results similar to those of the correlation analysis, and the
predicted values were consistent with the distribution trend of the sampled reference values. (3) The predicted reference values of all six groups were higher in the north than in the south, and the predicted reference values of serum CA199 in adults and serum CA125 in males and females were also higher in the west than in the east. The spatial distribution of the six groups of predicted reference values matched that of the sampled reference values well. However, a few high sampled serum TSGF reference values in elderly individuals were found in Hainan, Yunnan and other southern provinces but did not appear in the spatial distribution of the predicted reference values, indicating that geographical factors alone cannot fully simulate the local differences reflected in the reference values. The distribution of predicted serum CA199 reference values in healthy adults and of predicted serum CA125 reference values in healthy male and female adults was highly consistent with the distribution trends of annual precipitation, humidity and temperature. The distribution of predicted serum TSGF reference values in healthy young, middle-aged and elderly individuals was also consistent with the distribution trends of annual precipitation, humidity and temperature. The coldspots coincided with the predicted distributions of low annual precipitation, humidity and temperature, while the hotspots of the six groups of medical reference values coincided with the predicted distributions of medium to high annual precipitation, humidity and temperature. Non-significant points were distributed in bands or sheets between the high and low values, consistent with the overall spatial distribution. (4) The predicted reference values of all six groups fell within the range of the sampled reference values. The mean predicted values of four groups (serum CA199 in healthy adults, serum CA125 in male adults, and serum TSGF in middle-aged and elderly individuals) were close to the sampled means, while
the mean predicted values of serum CA125 in healthy females and serum TSGF in healthy young people were slightly higher than the sampled means. The predicted data were highly consistent with the sampled data with respect to sex and age. The mean interpolated predicted value in southern China was lower than the national mean, whereas the opposite held in the other regions. Taking the maximum interpolated predicted value as the upper limit, the reference ranges of all six interpolated groups were narrowest in southern China. The reference range of serum CA199 in healthy adults was widest in the Qinghai-Tibet region (≤30.81 kU/L). For the other five groups, the predicted reference ranges were widest in northwestern China: serum CA125 ≤25.98 kU/L in healthy males, serum CA125 ≤26.46 kU/L in healthy females, and serum TSGF ≤57.61 U/mL, ≤58.14 U/mL and ≤61.83 U/mL in healthy young, middle-aged and elderly individuals, respectively. In summary, this study investigated the relationship between geographical environmental factors and the reference values of serum CA199, CA125 and TSGF in healthy people in China, together with the spatial distribution of these values. The results can provide guidance and a basis for regional reference values in clinical medicine, preventive medicine, public health and other fields. To strengthen the application value and replicability of these results, we expect to obtain more data and cooperate with more data sources to promote the development and application of this research. |
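The spatial autocorrelation results above are summarized with Moran's I. As a minimal pure-Python illustration (not the study's ArcGIS workflow), the sketch below computes the standard statistic I = (n / S0) · Σᵢ Σⱼ wᵢⱼ zᵢ zⱼ / Σᵢ zᵢ², where zᵢ is the deviation of value i from the mean and S0 is the sum of all weights. The function name `morans_i` and the user-supplied weight matrix are assumptions for illustration.

```python
from typing import Sequence


def morans_i(values: Sequence[float], weights: Sequence[Sequence[float]]) -> float:
    """Moran's I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2.

    `weights` is an n x n spatial weight matrix with zeros on the diagonal
    (hypothetical input format chosen for this sketch).
    """
    n = len(values)
    mean = sum(values) / n
    z = [x - mean for x in values]  # deviations from the mean
    s0 = sum(sum(row) for row in weights)  # total weight S0
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    var_sum = sum(zi * zi for zi in z)
    if s0 == 0 or var_sum == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    return (n / s0) * (cross / var_sum)
```

Values near +1 indicate that similar values cluster in space (as with the reported 0.5754 to 0.7590), values near −1 indicate a checkerboard-like alternation, and the expected value under spatial randomness is −1/(n − 1). A significance test (analytic z-score or permutation) would still be needed before calling the clustering significant; it is omitted here.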
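The geographical detector step can be illustrated with its factor detector, whose q-statistic measures how much of the spatial variance of a reference value is explained by a stratified geographical factor: q = 1 − Σₕ Nₕ σₕ² / (N σ²). The sketch below assumes the continuous factor (for example annual precipitation) is first split into equal-interval classes; the abstract does not say which classification the study used, and both helper names are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def equal_interval_strata(factor: Sequence[float], k: int) -> List[int]:
    """Assign each factor value to one of k equal-width classes (hypothetical helper)."""
    lo, hi = min(factor), max(factor)
    width = (hi - lo) / k
    if width == 0:
        return [0] * len(factor)  # constant factor: a single stratum
    # The maximum value would land in class k, so clamp it into class k - 1.
    return [min(int((v - lo) / width), k - 1) for v in factor]


def geodetector_q(values: Sequence[float], strata: Sequence[Hashable]) -> float:
    """Factor-detector q = 1 - sum_h N_h * var_h / (N * var)."""
    n = len(values)
    mean = sum(values) / n
    sst = sum((y - mean) ** 2 for y in values)  # equals N * population variance
    if sst == 0:
        raise ValueError("values have zero variance; q is undefined")
    groups = defaultdict(list)
    for y, h in zip(values, strata):
        groups[h].append(y)
    ssw = 0.0  # within-stratum sum of squares, i.e. sum_h N_h * var_h
    for ys in groups.values():
        m = sum(ys) / len(ys)
        ssw += sum((y - m) ** 2 for y in ys)
    return 1.0 - ssw / sst
```

q ranges from 0 (the factor's strata explain none of the spatial variance) to 1 (they explain all of it). The geographical detector also pairs q with an F-based significance test, which this sketch leaves out.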
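Hotspot and coldspot maps like those described in result (3) are commonly produced with the Getis-Ord Gi* statistic, the statistic behind ArcGIS's Hot Spot Analysis tool. The abstract does not name the statistic, so treating it as Gi* is an assumption. The sketch below computes Gi* z-scores using binary fixed-distance-band weights that include each location itself; `distance_band_weights`, the threshold and planar Euclidean distance on raw coordinates are illustrative choices, not the study's settings.

```python
import math
from typing import List, Sequence, Tuple


def distance_band_weights(coords: Sequence[Tuple[float, float]], threshold: float) -> List[List[float]]:
    """Binary weights: 1 if within `threshold` (including the point itself), else 0."""
    return [[1.0 if math.dist(p, q) <= threshold else 0.0 for q in coords] for p in coords]


def getis_ord_gi_star(values: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Gi* z-score per location: positive suggests a hotspot, negative a coldspot."""
    n = len(values)
    mean = sum(values) / n
    var = max(sum(x * x for x in values) / n - mean ** 2, 0.0)  # population variance
    if var == 0:
        raise ValueError("Gi* is undefined for constant values")
    s = math.sqrt(var)
    scores = []
    for i in range(n):
        w = weights[i]
        sw = sum(w)  # sum of weights for location i
        sw2 = sum(wij * wij for wij in w)  # sum of squared weights
        num = sum(w[j] * values[j] for j in range(n)) - mean * sw
        den = s * math.sqrt((n * sw2 - sw ** 2) / (n - 1))
        scores.append(num / den)
    return scores
```

For latitude/longitude data, a projected coordinate system or great-circle distance would be more appropriate than the planar distance used here, and z-scores are usually read against thresholds such as ±1.96 (about 95% confidence).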
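The predicted-value maps were generated by geostatistical spatial interpolation (most likely kriging, although the abstract does not name the method). Kriging requires fitting a variogram, so the sketch below deliberately swaps in the simpler inverse distance weighting (IDW) only to show how point reference values become a continuous surface; it is not the study's method. The function names, the power parameter of 2 and planar distance are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def idw(samples: Sequence[Point], values: Sequence[float], target: Point, power: float = 2.0) -> float:
    """Inverse-distance-weighted estimate at `target` from sampled points."""
    num = 0.0
    den = 0.0
    for p, v in zip(samples, values):
        d = math.dist(p, target)
        if d == 0.0:
            return v  # target coincides with a sample: return the sampled value
        w = 1.0 / d ** power  # nearer samples receive larger weights
        num += w * v
        den += w
    return num / den


def idw_grid(samples: Sequence[Point], values: Sequence[float],
             xs: Sequence[float], ys: Sequence[float], power: float = 2.0) -> List[List[float]]:
    """Evaluate IDW on a regular grid (rows follow ys, columns follow xs)."""
    return [[idw(samples, values, (x, y), power) for x in xs] for y in ys]
```

Because IDW weights are positive and normalized, every estimate is a weighted average of the samples and therefore stays within the sampled range, which parallels result (4); ordinary kriging can assign negative weights and does not guarantee this in general.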