Dynamic Granular Support Vector Machine Model For Classification

Posted on: 2021-01-05
Degree: Master
Type: Thesis
Country: China
Candidate: S Q Zhao
Full Text: PDF
GTID: 2428330626955568
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the information age, all kinds of data have grown explosively. Data have also become complex and diverse in form, which poses great challenges for data mining, whose core technology is machine learning. Massive scale and imbalance are two important characteristics of such data. Large-scale data are characterized by large volume or many categories. Traditional machine learning methods are time-consuming and inefficient when processing large-scale data, especially when computing resources are limited. Imbalanced data are marked mainly by a significant disparity in the number of samples per class. Some traditional models cannot effectively distinguish minority samples, let alone identify them efficiently. Scholars have proposed a number of solutions for large-scale and imbalanced data, but limitations remain. First, classifying large-scale data is still time-consuming. Second, the overall performance of imbalanced data classification may be poor: the classification accuracy on minority samples is low, which also affects the classification accuracy on majority samples. It is therefore of great value to study modeling methods for large-scale and imbalanced data. Based on the support vector machine (SVM) model and the dynamic granular dividing method, this thesis studies modeling approaches for large-scale and imbalanced data. The main work is as follows.

(1) A granular support vector machine with bidirectional control of division and fusion is proposed. The model first divides the data set into several information granules, and then determines the importance of each granule from its distance to the classification hyperplane obtained by the SVM. Information granules closer to the hyperplane may have an important impact on classification and are defined as strong information granules. Information granules farther from the hyperplane have less impact on classification and are defined as weak information granules. On this basis, combined with dynamic information processing technology, strong information granules are divided further into finer granules, and weak information granules are selectively fused. The training set is thereby kept at a small scale throughout. This method can significantly improve the learning efficiency of the support vector machine while preserving the generalization ability of the model.

(2) An imbalanced granular support vector machine method combined with SMOTE (Synthetic Minority Over-sampling Technique) oversampling is presented. This method analyzes the distribution characteristics of the majority and minority samples, and applies the dynamic granular support vector machine model and the SMOTE sampling method to the different kinds of samples. Because the G-means index evaluates the result of imbalanced classification comprehensively and has high reference value, it is adopted as the measure for dynamically selecting between the over-sampling and under-sampling processes. The G-means value is optimized by constantly adjusting the classification hyperplane. The two processes are carried out iteratively until an imbalanced classification model with strong generalization ability is obtained.

In this thesis, dynamic granular support vector machine models are proposed, in combination with the dynamic granular dividing method, to address the long processing time of large-scale data classification and the poor performance of imbalanced data classification. They can improve the classification efficiency for large-scale data and the overall performance of imbalanced data classification. The research results could enrich the study of granular support vector machine algorithms and have some practical application value.
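
Two quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The function names, the use of a granule's center as its representative point, and the fixed distance threshold are all assumptions made here for illustration, since the abstract does not state how the strong/weak boundary is chosen. The hyperplane (w, b) is assumed to come from an already trained linear SVM. The G-means index is computed with its usual definition, the geometric mean of sensitivity and specificity.

```python
import math


def distance_to_hyperplane(center, w, b):
    """Geometric distance from a granule center to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * ci for wi, ci in zip(w, center)) + b) / norm


def label_granules(centers, w, b, threshold):
    """Label each granule 'strong' (near the hyperplane) or 'weak' (far from it).

    `threshold` is a hypothetical cut-off; the abstract does not specify one.
    """
    return [
        "strong" if distance_to_hyperplane(c, w, b) <= threshold else "weak"
        for c in centers
    ]


def g_mean(y_true, y_pred, positive=1):
    """G-means index: sqrt(sensitivity * specificity) for a binary problem.

    `positive` marks the minority class; every other label counts as negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


if __name__ == "__main__":
    # Hypothetical granule centers and hyperplane x1 = 0.
    centers = [(0.1, 0.0), (3.0, 4.0), (-0.5, 0.2)]
    print(label_granules(centers, w=(1.0, 0.0), b=0.0, threshold=1.0))
    # Toy predictions: 2 minority samples, 4 majority samples.
    print(g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1]))
```

In a loop of the kind the abstract describes, granules labeled strong would be divided further, granules labeled weak would be candidates for fusion, and the G-means value would be recomputed after each adjustment of the hyperplane to decide which sampling step to apply next.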
Keywords/Search Tags:Granular Support Vector Machine, Large-scale Data, Dynamic Granular Dividing, Imbalance Data, Mixed Sampling