In the exploration and development of oil and gas reservoirs, lithology identification based on conventional logging data helps to characterize the geology and is of great significance for predicting oil and gas in reservoirs. In actual logging data, however, the lithology classes are unevenly represented. As a result, traditional lithology identification methods achieve low accuracy and are difficult to apply in practice.

To address the classification of imbalanced data, this paper proposes UWBagging, a Bagging ensemble classification algorithm based on undersampling. First, multiple training subsets are drawn from the original data set by Bootstrap sampling. Second, the distribution structure of the majority class is determined from its density peaks, and each training subset is balanced by combining this structure with neighborhood features. Third, a base classifier is trained on each balanced subset, and its weight is constructed from the out-of-bag data. Finally, the final model is generated by weighted voting. Experiments verify the effectiveness of the UWBagging algorithm.

Resampling can balance the data set effectively, but the recognition rate of the minority class remains low. This paper therefore proposes HCBoost, a Boosting ensemble classification algorithm based on mixed sampling. First, the minority class samples are clustered, and new minority class samples are synthesized according to cluster density and sample weight. Second, the majority class is randomly undersampled, taking the sample weights into account, to obtain a balanced data set. Third, the balanced data set is used to train the C4.5 base classifier, and the sample error rates are updated according to a cost function so that more attention is paid to minority class samples. Finally, the final model is obtained by repeating these steps iteratively. Experiments verify the effectiveness of the HCBoost algorithm.

In this paper, a lithology sample set is constructed from logging data and logging lithology data, and two lithology identification models are built with the UWBagging and HCBoost algorithms. Experimental results show that both models effectively improve the accuracy of lithology identification.
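The general shape of the UWBagging pipeline described above (Bootstrap subsets, undersampling, out-of-bag weighting, weighted voting) can be illustrated with a minimal Python sketch. It is not the paper's implementation, and several parts are assumptions made for illustration: plain random undersampling stands in for the density-peak and neighborhood-based selection, a nearest-centroid rule stands in for the base classifier, and out-of-bag accuracy is assumed as the classifier weight. All function names, the class labels, and the synthetic data are hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_centroids(X, y):
    """Stand-in base classifier: one mean vector (centroid) per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def predict_one(centroids, x):
    """Return the class whose centroid is closest to x (squared distance)."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))


def undersample(indices, y, rng):
    """Randomly cut every class down to the size of the smallest class.

    Assumption: this replaces the density-peak-based selection of the paper.
    """
    by_class = defaultdict(list)
    for i in indices:
        by_class[y[i]].append(i)
    n_min = min(len(v) for v in by_class.values())
    balanced = []
    for c in sorted(by_class):
        balanced.extend(rng.sample(by_class[c], n_min))
    return balanced


def fit_weighted_bagging(X, y, n_estimators=15, seed=0):
    """Train base classifiers on balanced Bootstrap subsets with OOB weights."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_estimators):
        boot = [rng.randrange(n) for _ in range(n)]       # Bootstrap sample
        in_bag = set(boot)
        oob = [i for i in range(n) if i not in in_bag]    # out-of-bag samples
        bal = undersample(boot, y, rng)
        model = fit_centroids([X[i] for i in bal], [y[i] for i in bal])
        # Assumed weighting rule: accuracy on the out-of-bag samples.
        if oob:
            weight = sum(predict_one(model, X[i]) == y[i] for i in oob) / len(oob)
        else:
            weight = 1.0
        ensemble.append((model, weight))
    return ensemble


def predict(ensemble, x):
    """Weighted vote over all base classifiers."""
    votes = defaultdict(float)
    for model, w in ensemble:
        votes[predict_one(model, x)] += w
    return max(votes, key=votes.get)


def make_demo_data(seed=1):
    """Synthetic imbalanced data: 90 majority and 10 minority samples."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(90)]
    X += [[rng.gauss(4, 1), rng.gauss(4, 1)] for _ in range(10)]
    y = ["sandstone"] * 90 + ["limestone"] * 10
    return X, y


X_demo, y_demo = make_demo_data()
demo_ensemble = fit_weighted_bagging(X_demo, y_demo)
print(predict(demo_ensemble, [4.0, 4.0]), predict(demo_ensemble, [0.0, 0.0]))
```

Because each subset is balanced before training, the minority class contributes as many training samples as the majority class to every base classifier. Out-of-bag accuracy is only one possible weighting rule; on strongly imbalanced data, a per-class measure may be a better choice.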