Research On The Outliers Detection Algorithm

Posted on: 2019-06-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2348330569480190
Subject: Computer application technology
Abstract/Summary:
One person's noise may be another person's signal. Outlier mining is an important research direction in the field of data mining, and using data mining techniques to detect outliers accurately is of great significance in data analysis and mining. After analyzing the different types of existing outlier detection methods and their advantages and disadvantages, this thesis improves and optimizes them. The research work can be summarized as follows:

1) A clustering-based outlier detection algorithm is proposed. The deficiencies of classical clustering algorithms such as K-means and K-medoids are analyzed, i.e., the number of clusters cannot be identified accurately and the results are sensitive to parameter selection. The clustering by fast search and find of density peaks algorithm is adopted instead. It accurately differentiates high-density points from low-density points and has significant advantages in clustering accuracy over traditional clustering methods. Its parameter selection is robust and is not affected by the distribution of the data clusters. A new outlier detection algorithm is proposed that combines this clustering algorithm with a newly proposed outlier measure. The effectiveness of the algorithm is verified on both artificial and real datasets.

2) An outlier detection algorithm based on the density of reverse nearest neighbors is proposed. The LOF (Local Outlier Factor) algorithm has two shortcomings in outlier detection. First, it easily misjudges outliers under some specific data density distribution patterns. Second, its computational complexity is relatively high. A model inspired by real-world "friend relationships" is proposed. Following this model, the influence of a node's reverse neighbors on its outlier degree is taken into account, which effectively addresses the shortcomings of the LOF method. A new measure of outlier degree named "NLOF" is proposed. Compared with some traditional methods, the proposed algorithm performs well on both artificial and real datasets.

3) A data-weighted outlier detection algorithm based on PageRank is proposed. The algorithm based on the density of reverse nearest neighbors considers only the number of neighboring nodes when measuring node density and does not consider the influence of nodes of different density on the outlier degree. Therefore, the PageRank algorithm is applied to outlier detection. During detection, each point in the dataset is treated as a page in PageRank, and PageRank is used to calculate the influence factor of each point. The candidate outlier set is determined according to the size of the influence factor. The NLOF outlier degree is then calculated for each point in the candidate set, and the outliers are output. The effectiveness of the algorithm is verified on both artificial and real datasets.
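The candidate-selection step of the PageRank-based algorithm can be illustrated with a minimal sketch. The abstract does not say how the graph between data points is built, so everything concrete here is an assumption made for illustration: each point links to its k nearest neighbours (Euclidean distance), PageRank is computed by plain power iteration with a damping factor of 0.85, and the lowest-ranked fraction of points is taken as the candidate set. The function names `knn_indices`, `pagerank_scores`, and `candidate_outliers` are hypothetical, and the NLOF measure is not implemented because the abstract does not define it.

```python
import math


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def pagerank_scores(neighbours, damping=0.85, iterations=100):
    """Power-iteration PageRank on the directed graph i -> each neighbour of i.

    Assumes every node has at least one outgoing edge (true for a kNN graph
    with 1 <= k < number of points), so no dangling-node handling is needed.
    """
    n = len(neighbours)
    rank = [1.0 / n] * n
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i, outs in enumerate(neighbours):
            share = damping * rank[i] / len(outs)
            for j in outs:
                new_rank[j] += share
        rank = new_rank
    return rank


def candidate_outliers(points, k=3, fraction=0.2):
    """Return (indices of the lowest-ranked points, all PageRank scores).

    A point that few others choose as a neighbour (few reverse neighbours)
    receives little rank, so it lands in the candidate set.
    """
    rank = pagerank_scores(knn_indices(points, k))
    n_candidates = max(1, int(len(points) * fraction))
    order = sorted(range(len(points)), key=lambda i: rank[i])
    return order[:n_candidates], rank


# Five clustered points plus one isolated point (index 5).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (10.0, 10.0)]
candidates, rank = candidate_outliers(points, k=3, fraction=0.2)
print(candidates)  # [5]
```

In this toy dataset no clustered point selects the isolated point as a neighbour, so it receives only the baseline rank and is the sole candidate. In the thesis's pipeline, an outlier degree would then be computed only for such candidates, which is where the undefined NLOF measure would come in.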
Keywords/Search Tags: outliers mining, outliers detection, outliers, clustering, PageRank