Ensemble Learning Based On Cost-sensitive On Cloud Computing Platform

Posted on: 2014-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: L W Zhang
Full Text: PDF
GTID: 2248330395983799
Subject: Computer application technology
Abstract/Summary:
With the continuous expansion of data availability in many large-scale, complex, and networked systems, it becomes critical to advance the fundamental understanding of knowledge discovery and analysis from raw data to support decision-making processes. Although existing knowledge discovery and data engineering techniques have shown great success in many real-world applications, the problem of learning from imbalanced data (the imbalanced learning problem) is a relatively new challenge that has attracted growing attention from both academia and industry.

PAC learning is an appropriate model for studying bounds on classification performance. The traditional PAC learning model answers the question of how many examples are sufficient to guarantee a low total error rate. However, when the classes are imbalanced, the rare class, which is usually the more important one in real-world applications, tends to be neglected. In this thesis, we use the cost error to evaluate classification performance on imbalanced data, and propose a new PAC learning model based on the cost-sensitive idea.

For the classification of large-scale imbalanced data, a distributed cost-sensitive ensemble learning algorithm based on a cloud computing platform is proposed. Large-scale data is partitioned on the Hadoop cloud computing platform and learned in parallel. Based on the cost-sensitive idea, a weighted ensemble classifier is obtained, and a distributed cost-sensitive ensemble learning model based on the cloud computing platform is developed. Experiments show that the recall rate of the minority class is improved significantly, and that the computation time is shortened by running the ensemble learning on the cloud computing platform, owing to the Hadoop parallel mechanism. As a result, the classification efficiency for large-scale imbalanced problems is greatly improved.
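
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it describes: evaluate with a cost-weighted error, train one base classifier per data partition, and combine them by weighted vote. Everything specific here is an assumption made for illustration and is not taken from the thesis: the function names, the cost values `c_fn` and `c_fp`, the one-feature threshold classifier, the weight formula `1 / (1 + cost_error)`, and the sequential loop that stands in for Hadoop's parallel map tasks.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label); label 1 = minority class


def cost_error(y_true: List[int], y_pred: List[int],
               c_fn: float = 5.0, c_fp: float = 1.0) -> float:
    """Average misclassification cost (assumed costs, not from the thesis).

    A missed minority sample (false negative) costs c_fn,
    a false alarm (false positive) costs c_fp.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 0:
            total += c_fn
        elif t == 0 and p == 1:
            total += c_fp
    return total / len(y_true)


def train_stump(chunk: List[Sample], c_fn: float = 5.0,
                c_fp: float = 1.0) -> Tuple[float, float]:
    """Pick the threshold that minimizes cost_error on one partition.

    The stump predicts 1 when x >= threshold. An infinite threshold
    is included so that "always predict 0" is also a candidate.
    """
    labels = [y for _, y in chunk]
    candidates = sorted({x for x, _ in chunk}) + [float("inf")]
    best_thr, best_err = candidates[0], float("inf")
    for thr in candidates:
        preds = [1 if x >= thr else 0 for x, _ in chunk]
        err = cost_error(labels, preds, c_fn, c_fp)
        if err < best_err:
            best_thr, best_err = thr, err
    return best_thr, best_err


def train_ensemble(data: List[Sample], n_parts: int = 3,
                   c_fn: float = 5.0, c_fp: float = 1.0
                   ) -> List[Tuple[float, float]]:
    """Train one stump per partition; lower cost error gives higher weight.

    The loop is sequential; on a cluster each partition would be
    handled by a separate worker.
    """
    chunks = [data[i::n_parts] for i in range(n_parts)]
    members = []
    for chunk in chunks:
        if not chunk:  # skip empty partitions
            continue
        thr, err = train_stump(chunk, c_fn, c_fp)
        members.append((thr, 1.0 / (1.0 + err)))
    return members


def predict(members: List[Tuple[float, float]], x: float) -> int:
    """Weighted vote over all partition classifiers."""
    vote_pos = sum(w for thr, w in members if x >= thr)
    vote_neg = sum(w for thr, w in members if x < thr)
    return 1 if vote_pos >= vote_neg else 0
```

Raising `c_fn` relative to `c_fp` pushes each base classifier toward catching minority samples at the price of more false alarms, which is one way recall on the minority class could be traded against overall error.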
Keywords/Search Tags: imbalanced pattern classification, PAC learning, cost-sensitive learning, ensemble learning, cloud computing platform