
Research On Outlier Detection Algorithm Based On Hash Coding

Posted on: 2020-01-21
Degree: Master
Type: Thesis
Country: China
Candidate: M Shu
Full Text: PDF
GTID: 2428330578461339
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the volume of data has exploded. However, because of factors such as instrument failure, the natural environment, and human error, the collected data may contain deviations or abnormalities. Detecting and eliminating anomalous data is one of the main tasks of data mining. Because it can identify abnormal data and noise, outlier detection is widely used in practice, for example in fraud detection, abnormal behavior detection, and medical analysis. Many outlier detection algorithms have been proposed. According to their underlying assumptions, they can be roughly divided into statistical, neighborhood-based, cluster-based, subspace-based, classification-based, and isolation-based outlier detection algorithms. As datasets in various fields continue to grow, the time overhead of neighborhood-based outlier detection algorithms increases sharply, making them less efficient. A common solution is to use an efficient neighbor search method so that abnormal data can be detected within an acceptable time. Hash coding is favored for neighbor search because it offers high search efficiency and low storage requirements while preserving the original neighbor relationships of the data during encoding. This thesis therefore studies outlier detection algorithms based on hash coding. The research work includes the following:

(1) This thesis proposes an outlier detection algorithm for large-scale data that combines locality sensitive hashing and random walks (LSH-RWOD). Locality sensitive hashing is used to process high-dimensional data efficiently; the similarity between data points is then obtained from the distance between them and transformed into the transition probabilities of a random walk. On this basis, the random walk is used to compute the transition probabilities between data points: as the walk proceeds, the transition probability between normal data points becomes progressively higher while that of anomalous points becomes progressively lower, and this property is used to distinguish the anomalous data. Experimental results show that the proposed method detects outliers effectively and generally outperforms other outlier detection algorithms.

(2) The characteristics of abnormal data are often determined by only some feature subspaces, yet many outlier detection algorithms consider the full feature space. As the data dimension increases, the features contain a large amount of redundant information, which can easily mask abnormal data and degrade detection performance. To solve this problem, a hash forest-based outlier detection algorithm (HFOD) is proposed. The algorithm uses hash coding to partition the data into neighborhoods within feature subspaces, building hash trees that together form a hash forest. Anomalies are then identified by the average density of the leaf nodes into which a test point falls across the hash forest: the smaller the average leaf node density, the more likely the point is abnormal. This method removes the need to consider the full feature space while improving generalization. The algorithm is compared experimentally with other outlier detection algorithms, and the experiments show that it can effectively detect anomalies in the data.
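
The LSH-plus-random-walk idea described in (1) can be illustrated with a minimal Python sketch. This is not the thesis's LSH-RWOD implementation: the sign-random-projection hash, the Gaussian similarity, the damped power iteration, and all names and parameters (`lsh_random_walk_scores`, `n_tables`, `n_bits`, `sigma`, `damping`) are assumptions chosen for illustration. Similarities are computed only between points that share an LSH bucket, then row-normalized into transition probabilities; points that end up with a low visiting probability are treated as outlier candidates.

```python
import numpy as np


def lsh_codes(X, n_bits, rng):
    """Sign-random-projection LSH (an assumed hash family): one integer code per point."""
    planes = rng.normal(size=(X.shape[1], n_bits))
    bits = ((X - X.mean(axis=0)) @ planes > 0).astype(np.int64)
    return bits @ (1 << np.arange(n_bits))


def lsh_random_walk_scores(X, n_tables=8, n_bits=4, sigma=1.0,
                           damping=0.9, n_iter=50, seed=0):
    """Return a visiting probability per point; low values suggest outliers."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    W = np.zeros((n, n))  # dense for clarity; a sparse matrix would suit large data

    # Step 1: only points that share a bucket in some table become neighbors.
    for _ in range(n_tables):
        codes = lsh_codes(X, n_bits, rng)
        for code in np.unique(codes):
            idx = np.flatnonzero(codes == code)
            if idx.size < 2:
                continue
            diff = X[idx][:, None, :] - X[idx][None, :, :]
            d2 = (diff ** 2).sum(axis=-1)
            # Step 2: turn distances into similarities (Gaussian kernel, assumed).
            W[np.ix_(idx, idx)] = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Step 3: row-normalize similarities into transition probabilities.
    row_sums = W.sum(axis=1, keepdims=True)
    safe = np.where(row_sums > 0, row_sums, 1.0)
    P = np.where(row_sums > 0, W / safe, 1.0 / n)  # isolated points jump uniformly

    # Step 4: damped random walk; repeated steps concentrate mass on normal points.
    pi = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        pi = damping * (pi @ P) + (1.0 - damping) / n
    return pi
```

Calling `lsh_random_walk_scores(X)` on an array of shape `(n_samples, n_features)` returns one score per row, and sorting the scores in ascending order ranks the most suspicious points first. The damping term is a design choice borrowed from PageRank-style walks so that the iteration converges even when the neighbor graph is disconnected.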
Keywords/Search Tags: outlier detection, locality sensitive hashing, random walks, hash forest, leaf node density
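
To make the hash forest and leaf node density keywords concrete, here is a minimal Python sketch of scoring points by average leaf density, as described for HFOD in the abstract. It is an illustration under stated assumptions, not the thesis's algorithm: each "tree" is flattened to a single level of hash buckets, the hash is a quantized random projection (p-stable style) over a random feature subset, and the names and parameters (`build_hash_forest`, `average_leaf_density`, `subspace_size`, `n_hashes`, `width`) are hypothetical.

```python
from collections import Counter

import numpy as np


def _leaf_keys(X, feats, A, b, width):
    """Hash each point to a leaf: quantized random projections of the chosen features."""
    proj = (X[:, feats] @ A + b) / width
    return [tuple(row) for row in np.floor(proj).astype(np.int64).tolist()]


def build_hash_forest(X, n_trees=100, subspace_size=2, n_hashes=2,
                      width=1.0, seed=0):
    """Build one bucket table ("tree") per random feature subspace."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    forest = []
    for _ in range(n_trees):
        # Each tree looks at a random subset of features, not the full space.
        feats = rng.choice(d, size=min(subspace_size, d), replace=False)
        A = rng.normal(size=(feats.size, n_hashes))
        b = rng.uniform(0.0, width, size=n_hashes)
        counts = Counter(_leaf_keys(X, feats, A, b, width))
        # Leaf density = fraction of training points that fall in the leaf.
        density = {leaf: c / n for leaf, c in counts.items()}
        forest.append((feats, A, b, width, density))
    return forest


def average_leaf_density(forest, X):
    """Average, over all trees, the density of the leaf each point lands in."""
    total = np.zeros(X.shape[0])
    for feats, A, b, width, density in forest:
        keys = _leaf_keys(X, feats, A, b, width)
        # A point landing in a leaf unseen during training gets density 0.
        total += np.array([density.get(k, 0.0) for k in keys])
    return total / len(forest)
```

A typical use would be `forest = build_hash_forest(X_train)` followed by `scores = average_leaf_density(forest, X_test)`; the smaller a point's score, the more likely it is abnormal. The bucket `width` controls leaf granularity and would need tuning to the scale of the data.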