Research On Large-Scale Instance Selection Algorithms Based On Multi-Objective Evolutionary Optimization

Posted on: 2022-08-13
Degree: Master
Type: Thesis
Country: China
Candidate: F X Chu
GTID: 2518306542963709
Subject: Computer Science and Technology
Abstract/Summary:
With the development of information technology, the scale of data in many fields is growing rapidly. This provides more training information for machine learning, data mining, and related tasks, but it also makes the data harder to process. Instance selection (IS) is an important data pre-processing task that removes redundant and noisy instances and extracts a high-quality subset from the training set, and it is widely used in machine learning and data mining. Because of its importance, many instance selection algorithms have been proposed over the last decades. Recently, evolutionary optimization based instance selection algorithms have become a research hotspot because of their good global search ability and because they make no assumptions about the objective function. However, when the number of instances is large, the size of the search space and the required computing time prevent existing evolutionary algorithms from obtaining good results.

To address this, a length reduction based multi-objective evolutionary algorithm, termed LR-MOEA, is first proposed for large-scale instance selection. Its length reduction strategy recursively shortens each individual in the population. Specifically, each gene in an individual represents one instance and has a probability of being deleted; this probability is derived from the importance of the corresponding instance in the instance set and the importance of the corresponding gene in the population. A tailored crossover and mutation operator is then developed to generate the offspring population from the reduced population. In addition, an individual repairing operator is designed to repair the length of over-reduced individuals. Experiments on 12 commonly used large-scale classification datasets show that, compared with existing evolutionary optimization based instance selection algorithms, LR-MOEA obtains instance subsets with high accuracy and a high reduction rate in less computational time.

Next, to solve the instance selection problem on even larger datasets, a clustering encoding based multi-objective evolutionary algorithm, named CE-MOEA, is proposed. Unlike the length reduction idea of LR-MOEA, the clustering encoding strategy of CE-MOEA first clusters the training set and then encodes and searches each cluster as a whole, which greatly reduces the search space. During the search, instead of using all instances in a cluster, only the instance closest to the cluster center is selected as the cluster's representative in individual evaluation, which effectively reduces the computational time of evaluation. To further improve performance, a clustering importance based local search operator is used to search the other instances in each selected cluster. Considering the unselected clusters, a reduction preserving evolution operator is used to search all instances as a whole. Experiments on 12 commonly used large-scale classification datasets show that CE-MOEA achieves better results in less computing time than LR-MOEA and other evolutionary optimization based algorithms.
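The clustering encoding idea described for CE-MOEA can be illustrated with a minimal sketch. It is not the thesis implementation, and it rests on assumptions the abstract does not state: plain k-means as the clustering method, a 1-NN classifier, accuracy measured on the training set itself, and toy two-class data. All function names (`kmeans`, `representatives`, `evaluate`) are hypothetical, and the evolutionary loop, the local search operator, and the reduction preserving operator are left out. The sketch only shows how an individual with one bit per cluster can be scored on the two objectives using cluster representatives alone.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
            for p in points]

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice of clustering method)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # an emptied cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)

def representatives(points, centers, labels):
    """Map each non-empty cluster to the index of its member nearest the center."""
    reps = {}
    for c, center in enumerate(centers):
        members = [i for i, l in enumerate(labels) if l == c]
        if members:
            reps[c] = min(members, key=lambda i: sq_dist(points[i], center))
    return reps

def evaluate(bits, cluster_ids, reps, X, y):
    """Score an individual (one bit per cluster) as (accuracy, reduction rate).

    Only the representatives of the selected clusters form the candidate
    subset; accuracy is 1-NN accuracy over X (an assumption of this sketch).
    """
    chosen = [reps[c] for c, b in zip(cluster_ids, bits) if b]
    if not chosen:
        return 0.0, 1.0  # nothing selected: no classifier, full reduction
    correct = 0
    for i, p in enumerate(X):
        nearest = min(chosen, key=lambda j: sq_dist(p, X[j]))
        correct += int(y[nearest] == y[i])
    return correct / len(X), 1.0 - len(chosen) / len(X)

# Toy data (hypothetical): two well-separated 4x4 grids, one per class.
grid = [(0.1 * i, 0.1 * j) for i in range(4) for j in range(4)]
X = grid + [(a + 10.0, b + 10.0) for a, b in grid]
y = [0] * len(grid) + [1] * len(grid)

centers, labels = kmeans(X, k=4)
reps = representatives(X, centers, labels)
cluster_ids = sorted(reps)

# An individual that selects every cluster.
acc, red = evaluate([1] * len(cluster_ids), cluster_ids, reps, X, y)
print(f"clusters={len(cluster_ids)} accuracy={acc:.2f} reduction={red:.2f}")
```

In this sketch an individual has as many genes as there are clusters rather than as many as there are instances, and each evaluation compares every point only against the selected representatives. This corresponds to the two savings the abstract attributes to the clustering encoding strategy: a smaller search space and cheaper evaluation.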
Keywords/Search Tags: Instance selection, Evolutionary algorithm, Multi-objective optimization, Large-scale data, Classification learning