Total organic carbon (TOC) content is a key parameter for screening potential source rocks and sweet spots of shale oil and gas. Traditional laboratory methods for measuring TOC content in shale, such as carbon and sulfur analysis, are costly and time consuming and yield discontinuous results, while empirical mathematical models offer only moderate prediction accuracy, poor generalization and weak applicability. Predicting TOC content at low cost, with high efficiency and high precision, has therefore become an important task in shale oil and gas exploration and development. As a key means of big data analysis and mining, machine learning has been applied with good results in many fields, such as biology, the chemical industry, medicine, transportation, finance and industrial manufacturing. In oil and gas exploration and development, however, its application is still limited and its effectiveness is not yet clear. In this paper, multiple types of logging data and known TOC content data are taken as the starting point, and three machine learning algorithms, random forest (RF), support vector regression (SVR) and XGBoost, are used to establish TOC content prediction models that give continuous, high-precision predictions of TOC content in shale; their prediction performance is compared systematically. First, a decision tree algorithm is used to determine the optimal set of logging parameters from a total of 15 commonly used logging features. The hyperparameters of the RF, SVR and XGBoost algorithms are then optimized, and the three predictive TOC models are trained and tested on a total of 816 data points of well logs and TOC content from five different shale formations. Finally, the three models are used to predict TOC content in the Shahejie shale, data that the models have not seen before. The results show that RF gives the best prediction of TOC content (R² = 0.9141, RMSE = 0.329, MAE = 0.252), followed by XGBoost, while SVR has the lowest prediction accuracy. Nevertheless, all three models outperform the traditional Schmoker gamma-ray logging method, the multiple linear regression method and the ΔlgR method, which verifies the reliability of the three machine learning methods.
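The three scores used above to compare the models (R², RMSE and MAE) can be illustrated with a minimal Python sketch. It assumes the common definitions, with R² taken as the coefficient of determination; the paper may define R² differently, for example as a squared correlation coefficient. The function name `regression_metrics` and the sample values are hypothetical and are not data from this study.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R^2, RMSE and MAE for measured vs. predicted TOC values.

    Assumption: R^2 is the coefficient of determination,
    1 - SS_res / SS_tot, which may differ from the paper's definition.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    n = len(y_true)
    mean_true = sum(y_true) / n

    # Residuals between measured and predicted values.
    residuals = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measured values are equal")

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(r) for r in residuals) / n,
    }


# Hypothetical measured and predicted TOC values (wt%), for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = regression_metrics(measured, predicted)
print(scores)
```

A higher R² together with lower RMSE and MAE indicates a closer fit, which is the sense in which RF is ranked above XGBoost and SVR here.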