| Imbalanced sample categories make it difficult for a model to learn the overall distribution of the minority class, which lowers the model's recognition rate. Multi-class imbalanced data is more common than binary imbalanced data, and simply decomposing it into binary problems can cause information loss and class overlap. Seismic facies lithology recognition is a key technology in geology and is widely used in oil and gas exploration and reservoir prediction. Because stratum movement is complex, lithology data suffer from class imbalance, which reduces the accuracy of seismic facies lithology recognition. To address these problems, this thesis proposes a label-balancing method based on sample distribution (LBMSD) and a multi-class oversampling algorithm based on LBMSD (Multi_LBMSD). LBMSD operates on the data itself and balances it by oversampling the minority class. First, the nearest-neighbor method and a one-class support vector machine are used to clean the data. Then a density-based clustering algorithm divides the minority-class samples into regions, and the number of samples to synthesize for each cluster is determined adaptively from the cluster's density and its number of boundary samples. Finally, seed samples for synthesis are selected according to the distribution probabilities of the boundary and non-boundary samples in each cluster. Experiments on UCI datasets show that the algorithm improves the recognition accuracy of the classification model to a certain extent and is more effective than other oversampling algorithms. Multi_LBMSD builds on LBMSD and achieves multi-class data balance through class conversion. It first takes each category in turn as the minority class, selects samples from the other categories as the majority class according to the weighted feature information of those remaining categories, and thereby converts the multi-class problem into binary classification problems. Then, in each binary subproblem, LBMSD oversamples the minority class to balance the samples. In the training phase, samples are drawn with replacement from each class to form balanced training subsets, multiple classifiers are trained, and the final class is determined by weighted voting, following the idea of ensemble learning. Experiments on UCI and KEEL datasets show that the algorithm improves the performance of the classification model to a certain extent and is more effective than the other algorithms compared. Both algorithms are then used to balance a seismic facies lithology dataset. Comparisons with various algorithms show that they achieve excellent performance on imbalanced seismic facies lithology data, and they have been successfully applied to seismic facies lithology recognition. |
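The LBMSD cleaning, clustering, and allocation steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: DBSCAN stands in for the unspecified density-based clustering algorithm, the one-class SVM cleaning step is omitted, a sample counts as a boundary sample if any of its k nearest neighbors belongs to another class, and the per-cluster weight `(n_boundary + 1) * mean pairwise distance` is a hypothetical formula, since the abstract does not give the exact rule. The function names `clean_noise`, `dbscan`, and `allocate` are likewise hypothetical.

```python
import math

def k_nearest(i, X, k):
    """Indices of the k nearest neighbours of X[i] (Euclidean), excluding i."""
    others = [j for j in range(len(X)) if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]

def clean_noise(X, y, minority, k=5):
    """kNN cleaning only (the one-class SVM filter is omitted in this sketch):
    drop minority samples whose k nearest neighbours are all majority samples."""
    keep = [i for i in range(len(X))
            if y[i] != minority
            or any(y[j] == minority for j in k_nearest(i, X, k))]
    return [X[i] for i in keep], [y[i] for i in keep]

def dbscan(P, eps, min_pts):
    """Plain DBSCAN (assumed clustering method); one label per point, -1 = noise."""
    labels = [None] * len(P)
    region = lambda i: [j for j in range(len(P)) if math.dist(P[i], P[j]) <= eps]
    cluster = -1
    for i in range(len(P)):
        if labels[i] is not None:
            continue
        seeds = region(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:            # former noise point becomes a border point
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = region(j)
            if len(nb) >= min_pts:         # core point: keep expanding the cluster
                queue.extend(nb)
    return labels

def allocate(X, y, minority, eps=1.0, min_pts=3, k=5):
    """Synthetic-sample count per minority cluster (hypothetical weighting rule)."""
    X, y = clean_noise(X, y, minority, k)
    idx = [i for i in range(len(X)) if y[i] == minority]
    labels = dbscan([X[i] for i in idx], eps, min_pts)
    n_needed = max((len(X) - len(idx)) - len(idx), 0)  # majority count minus minority count
    weights = {}
    for c in sorted(set(labels) - {-1}):
        members = [idx[t] for t, lab in enumerate(labels) if lab == c]
        pairs = [(a, b) for p, a in enumerate(members) for b in members[p + 1:]]
        # mean pairwise distance: a larger spread means a sparser (less dense) cluster
        spread = sum(math.dist(X[a], X[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
        n_boundary = sum(any(y[j] != minority for j in k_nearest(i, X, k)) for i in members)
        # assumption: sparse and boundary-rich clusters receive more synthetic samples
        weights[c] = (n_boundary + 1) * max(spread, 1e-12)
    total = sum(weights.values())
    return {c: round(n_needed * w / total) for c, w in weights.items()}
```

Because of rounding, the per-cluster counts may differ from the exact majority/minority gap by at most one per cluster. Minority points labelled as DBSCAN noise stay in the data but receive no synthetic samples.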
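The seed-selection and synthesis step can be illustrated the same way. In this sketch, which is assumed rather than taken from the thesis, a seed is drawn from the cluster's boundary samples with a hypothetical probability `p_boundary` and otherwise from its non-boundary samples, and each new sample is a SMOTE-style linear interpolation between the seed and a random member of the same cluster. The function name `synthesize` and its parameters are hypothetical.

```python
import random

def synthesize(cluster, is_boundary, n_new, p_boundary=0.7, rng=None):
    """Create n_new synthetic samples from one minority cluster.

    cluster     -- list of feature tuples belonging to the cluster
    is_boundary -- parallel list of bools marking boundary samples
    p_boundary  -- hypothetical probability of drawing the seed from the boundary set
    """
    if rng is None:
        rng = random.Random(0)
    boundary = [p for p, b in zip(cluster, is_boundary) if b]
    inner = [p for p, b in zip(cluster, is_boundary) if not b]
    synthetic = []
    for _ in range(n_new):
        # pick the seed's pool; fall back to the other pool if one is empty
        if boundary and (not inner or rng.random() < p_boundary):
            seed = rng.choice(boundary)
        else:
            seed = rng.choice(inner)
        partner = rng.choice(cluster)   # interpolation partner from the same cluster
        gap = rng.random()              # position along the segment seed -> partner
        synthetic.append(tuple(s + gap * (q - s) for s, q in zip(seed, partner)))
    return synthetic
```

Restricting the partner to the same cluster keeps every synthetic sample on a segment between two existing minority samples of that cluster, that is, inside a region the minority class already occupies.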
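The training phase of Multi_LBMSD (balanced subsets drawn with replacement, several classifiers, weighted voting) can be sketched as follows. Assumptions: each subset draws as many samples from every class as the largest class contains, a nearest-centroid classifier is a hypothetical stand-in for the unspecified base learner, and each classifier's vote is weighted by its accuracy on the training data. The class-conversion step and the LBMSD oversampling inside each binary subproblem are not reproduced here, and all names (`balanced_bootstrap`, `NearestCentroid`, `train_ensemble`, `predict`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

def balanced_bootstrap(X, y, rng):
    """Draw, with replacement, the same number of samples from every class."""
    by_class = defaultdict(list)
    for xi, yi in zip(X, y):
        by_class[yi].append(xi)
    size = max(len(rows) for rows in by_class.values())  # assumption: match the largest class
    Xs, ys = [], []
    for label, rows in by_class.items():
        Xs.extend(rng.choice(rows) for _ in range(size))
        ys.extend([label] * size)
    return Xs, ys

class NearestCentroid:
    """Hypothetical stand-in base learner: predicts the class with the closest mean."""
    def fit(self, X, y):
        groups = defaultdict(list)
        for xi, yi in zip(X, y):
            groups[yi].append(xi)
        self.centroids = {c: tuple(sum(col) / len(rows) for col in zip(*rows))
                          for c, rows in groups.items()}
        return self

    def predict_one(self, x):
        return min(self.centroids, key=lambda c: math.dist(x, self.centroids[c]))

def train_ensemble(X, y, n_models=5, seed=0):
    """Train one base learner per balanced bootstrap subset."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        Xs, ys = balanced_bootstrap(X, y, rng)
        clf = NearestCentroid().fit(Xs, ys)
        # vote weight: accuracy on the original training data (assumption)
        weight = sum(clf.predict_one(xi) == yi for xi, yi in zip(X, y)) / len(y)
        models.append((clf, weight))
    return models

def predict(models, x):
    """Weighted vote: each classifier adds its weight to the class it predicts."""
    votes = Counter()
    for clf, weight in models:
        votes[clf.predict_one(x)] += weight
    return votes.most_common(1)[0][0]
```

Sampling every class up to the size of the largest one means each base learner sees a balanced subset, while the bootstrap draws make the learners differ from one another, which is what gives the weighted vote something to combine.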