The Imbalanced Data Classification Algorithm Based On Integrated Learning And Its Application In Product Quality Discrimination

Posted on: 2021-08-07
Degree: Master
Type: Thesis
Country: China
Candidate: C H Zhou
GTID: 2518306308971629
Subject: Management Science and Engineering
Abstract/Summary:
Manufacturing companies have developed rapidly by adopting Internet and big data technologies, which is a typical feature of modern manufacturing. In this process, improving product quality is essential for companies seeking to strengthen their competitiveness, and machine learning can be used to analyze industrial big data in order to improve production efficiency and product quality. However, industrial big data is typically imbalanced. In product quality data, for example, defective products are extremely rare, yet misclassifying them causes greater loss. It is therefore important to improve the classification accuracy on the positive samples so as to reduce the loss caused by misclassification. Although standard classification algorithms in machine learning can explore the relationships between data features in depth, they optimize overall performance and therefore show clear limitations on imbalanced data. Ensemble learning (called integrated learning in the title) is an important algorithm-level approach to this problem: it achieves better classification by combining many different classifiers. This research therefore applies ensemble learning to imbalanced classification.

This paper first analyzes the problem of product quality discrimination from the perspective of machine learning, including its background and significance, and reviews the domestic and international research status of imbalanced data classification. It then discusses in depth the principle and influence of data imbalance, studies the common existing classification methods for imbalanced data, and analyzes the principles, advantages, and disadvantages of Boosting-based ensemble learning methods and cost-sensitive learning.

The research combines modification of the sample weight update function with the selection of performance measures. Using quality data for German Bosch home appliance products, the AdaBoost framework is used to study the influence of different processing methods on classification performance. First, an exploratory analysis of the target variable is carried out, and missing values are analyzed and handled along two dimensions, sample and feature. The data are then cleaned into a uniform format, and different encoding schemes are adopted for different types of features; feature engineering is performed at the same time, with numerical features discretized and categorical features vectorized. Finally, to address the data imbalance in product quality discrimination, the idea of cost-sensitive learning is transferred to the AdaBoost ensemble learning framework to optimize classification performance by adjusting the weights of the different classes of samples, and a model based on a corrected sample weight update function is proposed.

Comparative experiments were carried out with AUC and missed detection rate as evaluation criteria. AdaBoost and CS-AdaBoost ensemble models were constructed with three different base classifiers and compared using ten-fold cross-validation. Compared with a single classifier and with AdaBoost ensemble learning, the cost-sensitive CS-AdaBoost model shows clear advantages in the accuracy and stability of product quality discrimination, which indicates good applicability to this task.
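The abstract does not state the corrected weight update function, so the following is only a minimal sketch of one common way to make AdaBoost cost-sensitive, not the thesis's actual CS-AdaBoost model. It assumes labels in {+1, -1} with +1 as the rare defective class, uses decision stumps as a stand-in base classifier, and introduces a hypothetical parameter `miss_cost` that multiplies the weight of every missed positive after each round. All function names are illustrative. With `miss_cost=1.0` the update reduces to standard AdaBoost.

```python
import math

def stump_predict(row, j, t, pol):
    """Decision stump: predict pol when feature j >= threshold t, else -pol."""
    return pol if row[j] >= t else -pol

def fit_stump(X, y, w):
    """Exhaustively search for the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for row, yi, wi in zip(X, y, w)
                          if stump_predict(row, j, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best

def cs_adaboost(X, y, rounds=10, miss_cost=2.0):
    """AdaBoost with an assumed cost factor on missed positives (y=+1, h=-1)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        err, j, t, pol = fit_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        new_w = []
        for row, yi, wi in zip(X, y, w):
            h = stump_predict(row, j, t, pol)
            # Assumed cost-sensitive correction: missed defects gain extra weight.
            c = miss_cost if (yi == 1 and h == -1) else 1.0
            new_w.append(wi * c * math.exp(-alpha * yi * h))
        z = sum(new_w)
        w = [v / z for v in new_w]  # renormalize to a distribution
    return model

def predict(model, row):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(a * stump_predict(row, j, t, pol) for a, j, t, pol in model)
    return 1 if score >= 0 else -1

def miss_rate(y_true, y_pred):
    """Missed detection rate: share of true positives predicted as negative."""
    pos = [p for yt, p in zip(y_true, y_pred) if yt == 1]
    return sum(1 for p in pos if p == -1) / len(pos)
```

A model is trained with `cs_adaboost(X, y)` and scored by applying `predict` to each row and passing the results to `miss_rate`. In a comparison like the one described in the abstract, this would be repeated per cross-validation fold and reported together with AUC.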
Keywords/Search Tags: imbalanced data classification, integrated learning algorithm, AdaBoost, cost sensitive, quality discrimination