
Outlier Detection Algorithm And Application For Hubness Phenomenon

Posted on: 2021-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: W Q Ma
Full Text: PDF
GTID: 2518306095975799
Subject: Computer technology
Abstract/Summary:
Reverse nearest neighbor queries are an important tool for outlier detection. However, as data sets grow rapidly, reverse nearest neighbor queries exhibit the hubness phenomenon, which strongly affects the performance of detection algorithms. With outlier detection performance as the goal, this thesis studies the hubness phenomenon in reverse nearest neighbor queries. The main results are as follows:

(1) Two outlier detection algorithms for the hubness phenomenon, based on two-way nearest neighbors, are proposed: HPOD and HPOD2. First, the influence space of an object is introduced and redefined so that it fuses the effects of the object's k-nearest neighbors and its reverse nearest neighbors; this redefined influence space improves the accuracy of the algorithm. Second, heuristic information is used that considers not only the outlier degree of the object itself but also the outlier degrees of its k-nearest neighbors. Using this information substantially lowers the value of k that is required, which reduces the computational complexity and running time of the algorithm. Finally, experiments on real data sets show that the proposed algorithms are more efficient and more accurate than other outlier detection algorithms.

(2) To speed up the above algorithms on high-dimensional data, the thesis analyzes them and proposes an outlier detection algorithm based on a pruning strategy, which significantly improves efficiency without affecting the accuracy of HPOD and HPOD2. Experiments on artificial data sets, UCI data sets and spectral data sets verify that the pruning-based outlier detection algorithm for the hubness phenomenon reduces the computational cost on high-dimensional data and improves the efficiency of the algorithm.

(3) On the basis of the above research, a prototype system for detecting outliers in astronomical spectral data, based on the pruning strategy, is designed and implemented in Java with a Java GUI, using IntelliJ IDEA. Running the system shows that the prototype can effectively find abnormal data in celestial spectra and provides a sound basis for analyzing anomalies in celestial spectral data.
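
As a rough illustration of why reverse nearest neighbor counts are informative for outlier detection, the following minimal sketch counts how often each point appears in other points' k-nearest-neighbor lists and treats rarely chosen points as outlier candidates. It is not the HPOD or HPOD2 algorithm, whose influence space and heuristic information are not specified in this abstract; the function names, the brute-force neighbor search and the score `1 / (1 + count)` are assumptions made only for this example.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (brute force)."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        neighbors.append(others[:k])
    return neighbors


def reverse_knn_counts(neighbors):
    """Count how often each point occurs in other points' k-NN lists."""
    counts = [0] * len(neighbors)
    for nn in neighbors:
        for j in nn:
            counts[j] += 1
    return counts


def antihub_scores(points, k):
    """Hypothetical score: points that few others choose as a neighbor score high."""
    counts = reverse_knn_counts(knn_indices(points, k))
    return [1.0 / (1.0 + c) for c in counts]


if __name__ == "__main__":
    # Four clustered points and one distant point (illustrative data only).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(reverse_knn_counts(knn_indices(data, 2)))
    print(antihub_scores(data, 2))
```

In this toy data the distant point is in no other point's neighbor list, so its count is 0 and its score is the largest. Hubs are the opposite case: points with unusually high counts, which become more common in high-dimensional data and skew scores of this kind.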
Keywords/Search Tags:outlier detection, influence space, Hubness phenomenon, reverse k nearest neighbor, pruning