| As attention to healthy living grows, more and more people exercise to improve their health, but unscientific exercise may cause irreversible damage and even sudden death. Exercise-induced sudden death is more common among people in their 20s, most of whom are college students. For this group, exercise monitoring supports scientific exercise. Among the many physiological indicators, five are used to evaluate and monitor exercise: oxygen uptake, energy consumption, metabolic substrate, maximal oxygen uptake, and anaerobic threshold (gas exchange threshold). These five indicators, however, are not easy to obtain in daily life. This study therefore aims to establish prediction models of the five indicators for college students, so that they can be used more conveniently and widely. The study recruited 266 healthy college students (males, females, and male athletes) to take a graded exercise test, and collected cardiopulmonary data and basic physical data during the test. Analysis of the data structure shows that the data are longitudinal but not aligned in time; this paper calls them quasi-longitudinal data. Given the advantages of longitudinal data and longitudinal studies, the analysis and modeling are carried out from the perspective of longitudinal data. The specific research contents and innovations are as follows.
1. Exploring whether the prediction model needs to account for intra-individual and inter-individual differences. A longitudinal study is mainly concerned with two questions: whether intra-individual differences exist and whether inter-individual differences exist. Before modeling, k-means was therefore used to cluster the samples, and training with clustering was compared against training without clustering. The comparison showed that inter-individual differences did not need to be considered. Comparing training on each individual data record with training on each person showed that intra-individual differences exist.
2. Machine learning methods were used to model the five indicators. According to whether an indicator changes over time, the models are divided into dynamic data prediction models and static data prediction models. Among the dynamic indicators (oxygen uptake, energy consumption, and the metabolic substrates: carbohydrate, lipid, and protein consumption), all except lipid consumption follow a trend very similar to that of oxygen uptake. Lipid consumption was handled separately: multiple linear regression with oxygen uptake and carbon dioxide production as features was used to fit lipid consumption. The measured values, which contained a large number of zeros, were replaced by the fitted values, which include negative values, and the fitted values were used as labels for subsequent modeling. For the dynamic indicators, multiple linear regression, random forest regression, decision tree regression, support vector regression, and multilayer perceptron regression were tried; random forest regression performed best (R² > 0.795). For the static indicators (maximal oxygen uptake and gas exchange threshold), the sample size was only 180 (179 for maximal oxygen uptake), so 5-fold cross-validation was used for model selection and parameter tuning. Multiple linear regression, lasso, decision tree regression, gradient boosting decision tree regression, support vector regression, and multilayer perceptron regression were tried in model selection. For maximal oxygen uptake (V’O2max), lasso performed best, with R² up to 0.63. For the gas exchange threshold (GET: the V’O2 corresponding to GET), support vector regression with a linear kernel performed best (R² = 0.68). A paired t-test showed no significant difference between predicted and measured values (V’O2max: p = 0.12; GET: p = 0.76).
3. An empirical model, the heart rate deflection point model, was used to model the anaerobic threshold. This paper presents a robust and economical method for finding the heart rate deflection point using only heart rate data. Starting from a piecewise fit that locates the intersection point, an iterative best-fitting broken-line method is used to find the best turning point. In the end, 8 samples (8/200, 4%) fell outside the consistency interval; across the three groups, about 30% of samples had errors within ±10 bpm, with the highest proportion among ordinary men (34/86, 39.5%). |
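The broken-line idea behind the heart rate deflection point can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `fit_line` and `heart_rate_deflection_point`, the `min_pts` parameter, and the exhaustive search over split positions are all assumptions made for illustration. The sketch fits one straight line to each side of every candidate split, keeps the split with the smallest total squared error, and reports the intersection of the two lines as the deflection point.

```python
import numpy as np


def fit_line(x, y):
    """Least-squares line y = a*x + b; returns (a, b, sum of squared errors)."""
    a, b = np.polyfit(x, y, 1)
    resid = y - (a * x + b)
    return a, b, float(np.sum(resid ** 2))


def heart_rate_deflection_point(t, hr, min_pts=3):
    """Hypothetical helper: two-segment broken-line fit of heart rate vs. time.

    Tries every split that leaves at least `min_pts` samples per segment,
    keeps the split with the lowest total squared error, and returns the
    intersection (t*, hr*) of the two fitted lines.
    """
    t = np.asarray(t, dtype=float)
    hr = np.asarray(hr, dtype=float)
    best = None  # (total_sse, t_star, hr_star)
    for k in range(min_pts, len(t) - min_pts + 1):
        a1, b1, s1 = fit_line(t[:k], hr[:k])
        a2, b2, s2 = fit_line(t[k:], hr[k:])
        if np.isclose(a1, a2):
            continue  # parallel segments have no usable intersection
        total = s1 + s2
        if best is None or total < best[0]:
            t_star = (b2 - b1) / (a1 - a2)
            best = (total, t_star, a1 * t_star + b1)
    if best is None:
        raise ValueError("no valid two-segment fit found")
    return best[1], best[2]
```

Calling `heart_rate_deflection_point(t, hr)` with the time stamps and heart rate samples of one graded exercise test returns the estimated time and heart rate at the turning point. The exhaustive search is chosen here only because it is easy to verify; an iterative refinement, as described in the abstract, would replace the loop.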