
Natural Neighbor: The Concepts and Applications in Data Mining

Posted on: 2017-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Feng
Full Text: PDF
GTID: 1318330536950894
Subject: Computer Science and Technology
Abstract/Summary:
Over the last decade, nearest-neighbor methods have received considerable attention from the data mining, image processing, and pattern recognition communities. Traditionally, the definition of the neighborhood plays an important role: a reasonable definition can effectively improve performance. Despite their simplicity, k-nearest neighbors (KNN) and reverse k-nearest neighbors (RkNN) have proven to be among the most useful and effective algorithms. A fundamental task in KNN and RkNN is therefore determining the optimal value of the parameter k, and the best choice of k depends on the data. In general, larger values of k reduce the effect of noise on classification but make the boundaries between classes less distinct. In particular, if the neighborhood is too large with respect to the folds of the manifold on which the data points lie, large values of k may cause "short-circuit" errors. Conversely, small values of k reduce the correlation within the neighborhood, or separate data points that belong to the same class. Parameter selection has in fact become the most important part of nearest-neighbor algorithms. To solve this problem, a new concept called Natural Neighbor (NaN) is presented in this dissertation.

First, the dissertation presents the basic concept of the NaN method. The method is inspired by friendship in human society and can be regarded as a "scale-free" nearest-neighbor method. It has three advantages: 1) it creates an applicable neighborhood graph based on the local characteristics of each data set, and this graph can identify the basic clusters in the data, especially manifold clusters and noise; 2) it provides a numeric result, the Natural Neighbor Eigenvalue (NaNE), which replaces the parameter k in the traditional KNN method and is chosen dynamically for different data sets; 3) the number of natural neighbors of each point is flexible, a dynamic value that reflects the relationship between that point and the data set.

The dissertation then presents a solution to the parameter selection problem based on the NaNE. The NaNE lets the traditional KNN method avoid parameter selection and strongly influences the quality of the result. A new algorithm is therefore proposed to estimate the optimal NaNE effectively and exactly, and the NaNE is then used as the parameter k in traditional outlier detection and clustering algorithms. Simulations on both synthetic and real-world data demonstrate the effectiveness of the proposed method.

Finally, the dissertation proposes a novel clustering and outlier detection method based on a weighted natural neighborhood graph, which is constructed by the natural neighbor method. This algorithm can identify clusters, global outliers, local outliers, and outlier clusters without any parameter (e.g., the parameter k). Experimental results show that the mining results intuitively reflect the characteristics of the data sets.
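The abstract does not detail the search procedure behind the NaNE, so the following is only a minimal sketch of the general natural-neighbor idea it describes, assuming Euclidean distance and scikit-learn's NearestNeighbors for the k-NN queries. The function name natural_neighbor_search, the max_r cap, and the exact stopping rule (stop when the count of points without a reverse neighbor stops changing) are our assumptions, not the dissertation's algorithm.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors  # assumed helper, not part of the dissertation

def natural_neighbor_search(X, max_r=None):
    """Rough sketch: expand the neighborhood round by round (r = 1, 2, ...)
    until the number of points that no other point has reached stops
    changing; that r is taken as the Natural Neighbor Eigenvalue (NaNE)."""
    n = len(X)
    max_r = max_r or (n - 1)
    _, knn = NearestNeighbors(n_neighbors=max_r + 1).fit(X).kneighbors(X)
    # knn[i, 0] is point i itself; knn[i, r] is its r-th nearest neighbor.

    reverse_count = np.zeros(n, dtype=int)  # times each point has been chosen as a neighbor
    prev_orphans = n
    for r in range(1, max_r + 1):
        np.add.at(reverse_count, knn[:, r], 1)      # every point reaches its r-th neighbor
        orphans = int(np.sum(reverse_count == 0))   # points still without a reverse neighbor
        if orphans == 0 or orphans == prev_orphans:
            break                                   # stable state reached: r is the NaNE
        prev_orphans = orphans
    nane = r

    # Natural neighbors of i: mutual neighbors within the first nane neighbors.
    neigh_sets = [set(row[1:nane + 1]) for row in knn]
    natural = [{j for j in neigh_sets[i] if i in neigh_sets[j]} for i in range(n)]
    return nane, natural
```

The returned nane could then stand in for k in a conventional neighbor-based detector, for example scikit-learn's LocalOutlierFactor(n_neighbors=nane); this pairing only illustrates the idea of replacing k with the NaNE and is not the dissertation's weighted-graph method.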
Keywords/Search Tags: nearest neighbor, natural neighbor, clustering, outlier detection