Research on Imbalanced Dataset Classification Based on Ensemble Learning

Posted on: 2021-02-11
Degree: Master
Type: Thesis
Country: China
Candidate: M L Wang
GTID: 2428330614461448
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet and the large increase in the storage capacity of physical media, industrial production and everyday life now generate vast amounts of data. The need to remove redundant information from big data and extract useful information has driven the development of data mining, a research field closely related to machine learning. Classification is an important research area of machine learning. Traditional classification algorithms perform well on balanced datasets, because they were originally designed for them. However, class imbalance is widespread in practical applications such as medical diagnosis, spam filtering, and the detection of abnormal users in social networks. An imbalanced dataset consists of a majority class with many samples and a minority class with few samples. When traditional classification algorithms are applied to such data, it is often difficult to guarantee that the minority-class samples are classified correctly. Solving the classification problem for imbalanced datasets therefore has important research significance and practical value.

This thesis first proposes an edge-based clustering algorithm for network-structured data, called the collective behavior inference learning algorithm. Social networks exhibit a power-law degree distribution: most users in the network are sparsely connected, while a small number of users are densely connected. Because of this structural imbalance, the clustering algorithm can be applied to the classification of imbalanced datasets. The performance of the clustering algorithm depends on the initial input graph, which usually contains noise and redundancy. The thesis therefore proposes an algorithm that learns a new, less redundant graph from the initial input graph, allowing the clustering algorithm to perform clustering tasks more efficiently.

Building on ensemble learning strategies, the thesis then designs an ensemble-learning-based classification algorithm for imbalanced datasets. The algorithm uses the collective behavior inference learning algorithm and a deep belief network as the individual learners, trains them with the Boosting algorithm, and combines their predictions by majority voting to classify imbalanced datasets. From the perspectives of classification, clustering, deep learning, and ensemble learning, the thesis proposes algorithms and strategies that help improve classification performance on imbalanced datasets.
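The abstract does not spell out how the collective behavior inference learning algorithm clusters a network by its edges. As a hedged illustration only, the Python sketch below shows one generic edge-centric approach: each edge becomes a sparse vector over its two endpoint nodes, k-means groups the edges, and each node is then described by the set of edge clusters its edges fall into, so a node that bridges two communities can belong to both. The functions `sq_dist`, `edge_clusters` and `social_dimensions`, the first-k-edges initialization, and the toy graph are assumptions for illustration, not the thesis's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]
SparseVec = Dict[int, float]


def sq_dist(x: SparseVec, c: SparseVec) -> float:
    """Squared Euclidean distance between two sparse vectors stored as dicts."""
    keys = set(x) | set(c)
    return sum((x.get(i, 0.0) - c.get(i, 0.0)) ** 2 for i in keys)


def edge_clusters(edges: List[Edge], k: int, iters: int = 20) -> List[int]:
    """Hypothetical edge-centric k-means: each edge is a 0/1 vector over its two endpoints."""
    if not 0 < k <= len(edges) or iters < 1:
        raise ValueError("need 0 < k <= len(edges) and iters >= 1")
    vectors: List[SparseVec] = [{u: 1.0, v: 1.0} for u, v in edges]
    # Deterministic initialisation from the first k edges (an assumption of this sketch).
    centroids: List[SparseVec] = [dict(vectors[i]) for i in range(k)]
    assign: List[int] = []
    for _ in range(iters):
        # Assign every edge to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(x, centroids[j]))
                      for x in vectors]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        # Recompute each centroid as the mean of its member edge vectors.
        for j in range(k):
            members = [vectors[i] for i, a in enumerate(assign) if a == j]
            if not members:  # keep the old centroid for an empty cluster
                continue
            total: Dict[int, float] = defaultdict(float)
            for vec in members:
                for node, val in vec.items():
                    total[node] += val
            centroids[j] = {node: val / len(members) for node, val in total.items()}
    return assign


def social_dimensions(edges: List[Edge], assign: List[int]) -> Dict[int, Set[int]]:
    """Map each node to the set of edge clusters its incident edges belong to."""
    dims: Dict[int, Set[int]] = defaultdict(set)
    for (u, v), c in zip(edges, assign):
        dims[u].add(c)
        dims[v].add(c)
    return dict(dims)


if __name__ == "__main__":
    # Toy graph (assumption): triangles {0,1,2} and {3,4,5} joined by the edge (2, 3).
    toy_edges = [(0, 1), (3, 4), (0, 2), (1, 2), (3, 5), (4, 5), (2, 3)]
    labels = edge_clusters(toy_edges, k=2)
    print(social_dimensions(toy_edges, labels))
```

On this toy graph the bridge node 2 ends up in both edge clusters, while every other node belongs to exactly one. Letting a node carry several memberships is one reason an edge-centric view can suit networks where a few users are densely connected and most are not.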
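The ensemble strategy in the abstract (individual learners trained by Boosting, predictions combined by majority voting) can be sketched in Python under several stated assumptions. The abstract names the individual learners (the collective behavior inference learning algorithm and a deep belief network) but not their interfaces, so this sketch substitutes a hypothetical weighted decision stump as the weak learner, uses the AdaBoost re-weighting rule as one concrete form of Boosting, and starts from class-balanced sample weights so the minority class is not ignored. The names `balanced_weights`, `train_stump`, `boost` and `majority_vote`, and the tie-breaking toward the minority class, are illustrative choices rather than the thesis's method.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Classifier = Callable[[Point], int]  # predicts 1 (minority) or 0 (majority)


def balanced_weights(y: Sequence[int]) -> List[float]:
    """Initial weights giving each class an equal share of the total (assumption)."""
    counts = Counter(y)
    return [1.0 / (len(counts) * counts[label]) for label in y]


def train_stump(X: Sequence[Point], y: Sequence[int], w: Sequence[float]) -> Classifier:
    """Hypothetical weak learner: the one-feature threshold rule with lowest weighted error."""
    best = None  # (weighted error, feature index, threshold, polarity)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for polarity in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if int(polarity * (xi[f] - t) >= 0) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: int(polarity * (x[f] - t) >= 0)


def boost(X: Sequence[Point], y: Sequence[int], rounds: int = 10) -> List[Classifier]:
    """Train learners sequentially, up-weighting samples the previous learner got wrong."""
    w = balanced_weights(y)
    learners: List[Classifier] = []
    for _ in range(rounds):
        h = train_stump(X, y, w)
        wrong = [h(xi) != yi for xi, yi in zip(X, y)]
        err = sum(wi for wi, bad in zip(w, wrong) if bad)
        if err >= 0.5:  # no better than chance on the current weights
            break
        learners.append(h)
        if err == 0:  # training data fitted perfectly; nothing left to emphasise
            break
        alpha = 0.5 * math.log((1 - err) / err)  # AdaBoost update (swapped-in rule)
        w = [wi * math.exp(alpha if bad else -alpha) for wi, bad in zip(w, wrong)]
        total = sum(w)
        w = [wi / total for wi in w]
    return learners


def majority_vote(learners: Sequence[Classifier], x: Point) -> int:
    """Unweighted majority vote; a tie goes to the minority class 1 (assumption)."""
    if not learners:
        raise ValueError("no learners to vote")
    ones = sum(h(x) for h in learners)
    return int(2 * ones >= len(learners))


if __name__ == "__main__":
    # Toy imbalanced data (assumption): 17 majority samples, 3 minority samples.
    X = [(float(i),) for i in range(20)]
    y = [1 if i >= 17 else 0 for i in range(20)]
    ensemble = boost(X, y, rounds=5)
    print([majority_vote(ensemble, x) for x in X])
```

Standard AdaBoost weights each learner's vote by its accuracy; the final combination here is a plain majority vote instead, following the abstract. Swapping in other base learners only requires functions with the same `(X, y, w) -> classifier` shape as `train_stump`.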
Keywords: classification, clustering, deep belief network, ensemble learning, imbalanced data