
Research On GPU-Accelerated Multi-Class Imbalance Classification

Posted on: 2021-03-06
Degree: Master
Type: Thesis
Country: China
Candidate: S X Xu
GTID: 2428330605954313
Subject: Engineering
Abstract/Summary:
Imbalanced learning is an active topic in machine learning. Imbalanced data classification plays an important role in many applications, such as disease prediction, credit card fraud detection, and network attack detection, where accurate prediction of the minority classes is often more meaningful than accurate prediction of the majority classes. However, the scale of data is growing rapidly in many scenarios. At the same time, in order to improve accuracy, imbalance classification algorithms are becoming more complicated and time-consuming, especially ensemble-based algorithms. As a result, the classification time for imbalanced data increases significantly in many cases, and a single classification task can take several days. There is therefore a strong need to improve the efficiency of existing imbalance learning algorithms, and GPU parallel computing is an effective and common approach. As a professional master's thesis, this work uses GPU parallel computing to speed up multi-class imbalance classification algorithms while preserving the accuracy of the classification results. Experiments show that, with the approach of this thesis, the processing time for large-scale datasets that may otherwise need more than 5 days can be reduced to only 9 hours. The main contributions of this thesis are as follows.

First, we implement 5 recently published multi-class imbalance classification algorithms with GPU acceleration: DECOC, DOVO, AdaBoost.M1, SAMME, and imECOC. We also propose a new algorithm named FocalBoost, which applies focal loss within a boosting algorithm to classify multi-class imbalanced data. We find that most of the running time of these multi-class imbalance classification algorithms is spent in the base classifiers, so the base classifiers must be accelerated first. This thesis parallelizes KNN and logistic regression on the GPU, optimizing the distance calculation and sorting method in KNN, and the matrix multiplication and parameter update in the logistic regression iterations.

Second, we conduct a series of experiments to verify the classification accuracy of the GPU-accelerated multi-class imbalance classification algorithms and to show that the method processes data efficiently. This stage uses 18 classic multi-class imbalanced datasets to evaluate classification performance, and we also record the time each algorithm spends on data processing. The results show that the GPU-accelerated algorithms significantly improve the efficiency of data processing while maintaining classification accuracy. In addition, we analyse experimentally how the sample size, dimension, and number of categories of a dataset influence the speedup. The results show that the acceleration is more pronounced with a larger sample size, a higher dimension, and an appropriate number of categories.

Third, we open-source GPU-imLearn, a software package containing all the classification algorithms implemented in this thesis, that is, 6 GPU-accelerated multi-class imbalance classification algorithms. We package the algorithms as a third-party Python library and upload all the resources to GitHub. We also provide a simple installation method for researchers who want to use it.
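
The abstract does not describe how FocalBoost incorporates focal loss into the boosting procedure. As a minimal illustration of the focal loss idea alone, the Python sketch below assumes the standard formulation FL(p_t) = -(1 - p_t)^gamma * log(p_t), with no class-weighting term. The function names, the default gamma of 2.0, and the sample probabilities are assumptions made for illustration and are not taken from the thesis or from GPU-imLearn.

```python
import math


def cross_entropy(probs, true_class, eps=1e-12):
    """Plain cross-entropy for one sample: -log(p_t)."""
    # Clamp p_t away from zero so the logarithm stays finite.
    p_t = min(max(probs[true_class], eps), 1.0)
    return -math.log(p_t)


def focal_loss(probs, true_class, gamma=2.0, eps=1e-12):
    """Focal loss for one sample: -(1 - p_t)**gamma * log(p_t).

    gamma = 0 reduces to plain cross-entropy; a larger gamma shrinks the
    loss of samples that are already classified with high confidence.
    """
    p_t = min(max(probs[true_class], eps), 1.0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Hypothetical predicted class probabilities; the true class is index 0.
easy = [0.9, 0.05, 0.05]  # confident and correct
hard = [0.2, 0.5, 0.3]    # true class receives low probability

for name, probs in (("easy", easy), ("hard", hard)):
    ce = cross_entropy(probs, 0)
    fl = focal_loss(probs, 0)
    print(f"{name}: cross-entropy={ce:.4f}, focal={fl:.4f}, ratio={fl / ce:.2f}")
```

With gamma set to 2, the well-classified sample keeps only (1 - 0.9)^2 of its cross-entropy loss, while the poorly classified one keeps (1 - 0.2)^2. This is why focal loss shifts the training emphasis toward hard examples, which in imbalanced data often come from minority classes.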
Keywords/Search Tags:Classification algorithm, Multi-class imbalanced data, GPU acceleration, Open-source software, CUDA