An Instance Selection Method Based On The Convex Hull And The Nearest Enemy

Posted on: 2020-04-12
Degree: Master
Type: Thesis
Country: China
Candidate: S H Wang
GTID: 2518306518461924
Subject: Management Science and Engineering
Abstract/Summary:
With the development of e-commerce and Internet technology, the scale of data keeps growing. How to process and analyze data quickly and accurately with limited storage resources is a problem that must be solved in many big data application fields. An instance selection algorithm removes redundant instances, outliers, and noise from a dataset and retains the typical instances that contribute to the classifier's classification process. It reduces the computing resources and time required without affecting the performance of data analysis, and it has become one of the important data processing methods in the context of big data. This paper proposes an instance selection method for SVM (Support Vector Machine), because traditional instance selection methods cannot effectively select concave instances and cause the classification decision surface to shift when a large-scale dataset arrives in batches. The paper first reviews research progress on instance selection algorithms. Second, it proposes an instance selection method based on the convex hull and the nearest enemy. The algorithm divides the original dataset into several subsets by determining each sample's nearest enemy and placing instances that share the same nearest enemy in the same subset. Instance selection is then executed in each subset in parallel. On this basis, an instance selection method is designed to select instances close to the classification decision surface. In addition, the paper proposes an instance selection method based on the nearest enemy that is suitable for data arriving in batches. This method considers both the instances close to the classification decision surface and those far from it, so it can determine and retain the complete boundary instances of the dataset. When the dataset is so large that it must be imported in batches, the method determines the classification decision surface better than the other instance selection methods. Experimental results show that the proposed algorithm achieves higher classification accuracy with a smaller sample size and has better sample selection performance.
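The partitioning step described in the abstract can be illustrated with a minimal sketch. It assumes the usual definition of an instance's nearest enemy (the closest instance carrying a different class label) and Euclidean distance; the abstract states neither, and the function names `nearest_enemy` and `partition_by_nearest_enemy` are hypothetical. It is not the thesis's implementation, and it does not cover the convex hull step or the per-subset selection.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def nearest_enemy(X, y):
    """Return, for each instance, the index of its closest differently-labeled instance.

    Assumption: Euclidean distance; the abstract does not name a metric.
    """
    result = []
    for xi, yi in zip(X, y):
        # Candidate enemies are all instances whose label differs from yi.
        enemies = [j for j, yj in enumerate(y) if yj != yi]
        if not enemies:
            raise ValueError("at least two classes are required")
        result.append(min(enemies, key=lambda j: dist(xi, X[j])))
    return result


def partition_by_nearest_enemy(X, y):
    """Group instance indices into subsets keyed by their shared nearest enemy."""
    groups = defaultdict(list)
    for i, enemy in enumerate(nearest_enemy(X, y)):
        groups[enemy].append(i)
    return dict(groups)


# Toy data: two classes lying on a line.
X = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (5.0, 0.0)]
y = [0, 0, 1, 1]

print(nearest_enemy(X, y))               # [2, 2, 1, 1]
print(partition_by_nearest_enemy(X, y))  # {2: [0, 1], 1: [2, 3]}
```

Each resulting subset is independent of the others, which is what would allow the selection step to run on the subsets in parallel. The brute-force search here is quadratic in the number of instances; a spatial index would be the natural replacement for large datasets.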
Keywords/Search Tags: instance selection, nearest enemy, convex hull, SVM, classification