
Study Of Active Learning Algorithms On Imbalanced Data Using Extreme Learning Machine

Posted on: 2018-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: Q W Li
Full Text: PDF
GTID: 2348330536477438
Subject: Engineering
Abstract/Summary:
In recent years, the rapid development of data acquisition and storage technologies has led many fields to accumulate large amounts of data. How to analyze these data has become a major problem in machine learning and data mining. Labeling a large amount of data and then building a classification model on it undoubtedly consumes considerable human effort, money, and time. Active learning is an effective tool for reducing this cost. Researchers have proposed a variety of effective active learning algorithms, but almost all of them ignore one important question: whether these algorithms remain effective on class-imbalanced data. This thesis therefore studies how to maintain the efficiency and performance of active learning on class-imbalanced data, and how to improve active learning algorithms in this setting. The main research contents cover two aspects:

1) When active learning is conducted on class-imbalanced data, the classification hyperplane tends to be biased toward the majority class, which can make active learning lose its effectiveness. To address this, instance sampling is used as the balance control strategy for active learning. First, the characteristics of various sampling algorithms are investigated. Then a novel boundary oversampling algorithm is proposed. Both the existing sampling algorithms and the proposed one are used as balance control strategies for active learning. The Extreme Learning Machine (ELM) is used as the base classifier for two reasons: (1) it has strong generalization ability and (2) it trains quickly. Experiments on 12 benchmark data sets indicate the effectiveness and feasibility of the proposed improved active learning algorithm. The results also show that active learning can indeed be negatively affected by a skewed data distribution, and that active learning algorithms combined with instance sampling produce better performance.

2) To further speed up training, online sequential learning is introduced, and an online sequential weighted extreme learning machine algorithm, OS-W-ELM, is proposed. Cost-sensitive learning is used as the balance control strategy and combined with active learning. The experiments again use ELM as the base classifier and compare the performance of the AL-OS-W-ELM, AL-OS-ELM, and RS-OS-W-ELM algorithms on the 12 benchmark data sets. They also compare the running time of AL-OS-W-ELM and AL-OS-ELM with that of the active learning algorithms based on instance sampling. The results show that active learning algorithms with cost-sensitive learning produce better performance on class-imbalanced data.
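
The abstract gives no implementation details, so the following is only a minimal sketch of how cost-sensitive active learning with an ELM base classifier might look. It assumes binary labels in {-1, +1}, a sigmoid hidden layer, inverse-class-frequency sample weights, and uncertainty sampling that queries the pool instance with the smallest absolute output. None of these choices is confirmed by the thesis, and the names `WeightedELM`, `balanced_weights`, and `active_learning` are hypothetical. For brevity the model is retrained from scratch each round; it does not implement the online sequential update of OS-W-ELM.

```python
import numpy as np


class WeightedELM:
    """Single-hidden-layer ELM with optional per-sample weights (labels in {-1, +1})."""

    def __init__(self, n_features, n_hidden=40, reg=1.0, seed=0):
        rng = np.random.default_rng(seed)
        # Hidden-layer parameters are random and never trained (core ELM idea).
        self.W = rng.normal(size=(n_features, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.reg = reg  # ridge term, an assumed hyperparameter
        self.beta = None

    def _hidden(self, X):
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, sample_weight=None):
        H = self._hidden(X)
        w = np.ones(len(y)) if sample_weight is None else np.asarray(sample_weight)
        HtW = H.T * w  # equals H^T diag(w)
        A = HtW @ H + self.reg * np.eye(H.shape[1])
        # Weighted ridge least squares for the output weights.
        self.beta = np.linalg.solve(A, HtW @ y)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta


def balanced_weights(y):
    """Inverse-class-frequency weights: every class gets the same total weight."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    return len(y) / (len(classes) * counts[inverse])


def active_learning(model, X_pool, y_pool, init_idx, n_queries):
    """Pool-based loop: query the instance closest to the decision boundary."""
    labeled = list(init_idx)
    seen = set(labeled)
    unlabeled = [i for i in range(len(X_pool)) if i not in seen]
    for _ in range(n_queries):
        y_lab = y_pool[labeled]
        model.fit(X_pool[labeled], y_lab, balanced_weights(y_lab))
        margin = np.abs(model.decision_function(X_pool[unlabeled]))
        pick = unlabeled.pop(int(np.argmin(margin)))
        labeled.append(pick)  # y_pool[pick] stands in for the human annotator
    y_lab = y_pool[labeled]
    model.fit(X_pool[labeled], y_lab, balanced_weights(y_lab))
    return model, labeled


# Synthetic imbalanced pool (illustrative data only): 190 majority, 10 minority.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, size=(190, 2)),
               rng.normal(2.5, 1.0, size=(10, 2))])
y = np.concatenate([-np.ones(190), np.ones(10)])

model = WeightedELM(n_features=2)
model, labeled = active_learning(model, X, y, init_idx=[0, 1, 2, 3, 190, 191],
                                 n_queries=20)
print("labeled instances:", len(labeled))
print("minority instances among them:", int((y[labeled] == 1).sum()))
```

Passing `sample_weight=None` to `fit` gives an unweighted ELM, which makes it easy to compare the weighted and unweighted variants on the same pool.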
Keywords/Search Tags:class imbalance, active learning, extreme learning machine, instance sampling, cost-sensitive learning