
Research On Natural Neighbor-based Non-parameter Instance Reduction

Posted on: 2018-09-30  Degree: Doctor  Type: Dissertation
Country: China  Candidate: L J Yang  Full Text: PDF
GTID: 1318330536969308  Subject: Computer Science and Technology

Abstract/Summary:
Instance reduction is a crucial task in instance-based learning. It reduces the prohibitive computational cost and storage requirements of classifying patterns while maintaining classification accuracy. Instance-based learning algorithms build their inference structures from the whole set of prototypes in the training set; when the training set contains too many instances (i.e., data prototypes), the entire set must be stored, which demands an excessive amount of memory for large datasets and leads to long computation times in the classification stage. The k-nearest neighbor classifier is a well-known instance-based learning algorithm. It is widely used in machine learning, data mining and pattern recognition because of its intuitiveness, simplicity and effectiveness, and it has become one of the top ten algorithms in the field of data mining.

Instance reduction comprises three main families of methods: editing, condensation and hybrid methods. Editing methods improve classification accuracy by removing harmful data (noise). The basic idea of condensation is that patterns located near the decision boundary are crucial to classification, while those far from the boundary have little effect. The Condensed Nearest Neighbor algorithm therefore reduces the training set with the nearest neighbor decision rule: it removes every pattern that the nearest neighbor rule classifies correctly and obtains a consistent subset that does not affect the classification accuracy achieved on the whole training set. Editing is commonly used within condensation methods as a pre-processing filter that eliminates noisy patterns.

Although current instance reduction algorithms have achieved many results, they still suffer from three main problems: parameter dependency, low noise tolerance and limited reduction rate. To overcome these problems, the natural neighbor is introduced into instance reduction. The natural neighbor is a new, scale-free form of neighborhood: the natural neighbors of each point are obtained automatically by the natural neighbor search algorithm. Natural neighbors were originally proposed to solve the parameter selection problem of nearest neighbor methods, but they are also well suited to the instance reduction problem of nearest neighbor. The main innovations and contributions of this dissertation are as follows:

(1) An adaptive edited natural neighbor algorithm, ENaN, is proposed. The scale-free character of natural neighbors, and the fact that each point's natural neighbor count differs, are used to address the parameter dependency and low noise tolerance of instance-based learning. ENN is the classic editing-based instance reduction algorithm; it removes noisy points that are not correctly classified by their nearest neighbors in order to improve classification accuracy. However, ENN suffers from weaknesses such as high time complexity, parameter dependency and low noise tolerance. To remedy these shortcomings, the adaptive edited natural neighbor algorithm is proposed. It requires no parameters, weakens the interference of noise with normal patterns and thereby preserves better class boundaries, and also removes global outliers. These characteristics allow the proposed algorithm to be applied easily as a noise filter within other reduction algorithms.

(2) A hybrid instance reduction algorithm based on the natural neighbor, IRNN, is proposed. Like other condensation methods, it rests on the idea that patterns near the decision boundary are crucial to classification; it removes every instance that the nearest neighbor rule classifies correctly and obtains a consistent subset that does not affect the prediction accuracy on the whole training set. However, experiments have shown that retaining suitable internal core instances can improve accuracy greatly, especially for datasets with complex distributions. IRNN therefore uses the density information implied by natural neighbors to select core points and a constrained nearest neighbor chain to search for border points; the reduced set consists of the border points and a few core points. Because the algorithm needs no parameters, it avoids the parameter selection problem. Moreover, it significantly improves the reduction rate while maintaining or even improving classification accuracy, and its reduction rate fluctuates little across different datasets.

(3) A non-parametric instance reduction algorithm based on the natural neighborhood graph, NNGIR, is proposed. A graph is a common way to represent the relations between the points of a dataset and is widely applied in clustering and manifold learning; a labeled graph carries decision information once a class label is attached to each node. The natural neighborhood graph is an extended nearest neighbor graph obtained adaptively by the natural neighbor search algorithm. NNGIR applies the labeled natural neighborhood graph to instance reduction and defines two types of edge to reduce the training set: an edge is homogeneous (hoe) if its vertices belong to the same class, and heterogeneous (hee) otherwise. Hoe and hee edges determine the position of data points relative to the decision surface, from which border and core sets are obtained; the reduced set consists of the border points and the core points. The efficiency of the algorithm is supported by positive experimental results on both synthetic and real datasets.
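As background for the algorithms above, the natural neighbor search can be sketched as follows. This is a minimal illustration, not the dissertation's exact procedure: the stopping rule here (stop when the number of points with no reverse neighbor stabilizes) is one common formulation, and the brute-force distance computation is only for clarity.

```python
import numpy as np

def natural_neighbor_search(X):
    """Adaptively find each point's natural neighbors without a user-set k.

    In round r, every point claims its r-th nearest neighbor; two points
    are natural neighbors when each appears in the other's r-NN list.
    Stop when the count of points that nobody has claimed (points with no
    reverse neighbor) stops changing between rounds.
    """
    n = len(X)
    # pairwise distances; column 0 of the ranking is the point itself
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    order = np.argsort(d, axis=1)

    reverse_count = np.zeros(n, dtype=int)   # times each point is claimed
    chosen = [set() for _ in range(n)]       # i's first r nearest neighbors
    prev_orphans = -1
    for r in range(1, n):
        for i in range(n):
            j = order[i, r]                  # i's r-th nearest neighbor
            chosen[i].add(j)
            reverse_count[j] += 1
        orphans = int(np.sum(reverse_count == 0))
        if orphans == prev_orphans:          # orphan count stabilized
            break
        prev_orphans = orphans
    # natural neighbors are the mutual choices at the final round
    nan = {i: {j for j in chosen[i] if i in chosen[j]} for i in range(n)}
    return nan, r
```

The returned neighborhood is symmetric by construction, and the final round index `r` plays the role of an adaptively determined "natural" neighborhood scale; no parameter is supplied by the user.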
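The editing rule underlying ENN, which ENaN builds on, can be illustrated with a short sketch. Note that this uses a fixed `k` purely to keep the example compact; ENaN itself replaces the fixed `k` with each point's automatically obtained natural neighbors.

```python
import numpy as np

def edit_misclassified(X, y, k=3):
    """ENN-style editing: drop every point that the majority vote of its
    k nearest neighbors (excluding the point itself) misclassifies.
    A fixed k is an assumption of this sketch; ENaN instead votes over
    each point's parameter-free natural neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    keep = []
    for i in range(len(X)):
        nn = np.argsort(d[i])[1:k + 1]       # skip self at rank 0
        majority = np.bincount(y[nn]).argmax()
        if majority == y[i]:                 # correctly classified -> keep
            keep.append(i)
    return np.array(keep)
```

A mislabeled point sitting inside the opposite class is outvoted by its neighbors and removed, which is exactly the noise-filtering role editing plays before condensation.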
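The hoe/hee edge-labeling step described for NNGIR can be demonstrated on any labeled neighborhood graph. The graph construction and the treatment of interior points below are simplifications for illustration, not the dissertation's exact selection rules:

```python
def split_border_interior(edges, labels):
    """Classify edges of a labeled neighborhood graph and derive point roles.

    An edge is homogeneous (hoe) when both endpoints share a class label,
    heterogeneous (hee) otherwise.  Endpoints of hee edges lie near the
    decision surface (border points); points touched only by hoe edges
    are treated as interior in this sketch.
    """
    border, interior = set(), set()
    for u, v in edges:
        if labels[u] != labels[v]:   # heterogeneous edge (hee)
            border.update((u, v))
        else:                        # homogeneous edge (hoe)
            interior.update((u, v))
    interior -= border               # border status dominates
    return border, interior
```

For example, on a four-node chain 0-1-2-3 where nodes 0, 1 carry one label and 2, 3 another, the single hee edge (1, 2) marks nodes 1 and 2 as border points, leaving 0 and 3 interior; a condensation step would then keep the border set plus selected core points.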
Keywords/Search Tags: K-nearest neighbor, Natural neighbor, Natural neighborhood graph, Instance reduction, Instance-based learning