Research On Efficient Outlier Detection Methods Supporting Multiple Queries

Posted on: 2021-05-28 | Degree: Master | Type: Thesis
Country: China | Candidate: J F Li
GTID: 2428330602989062 | Subject: Software engineering
Abstract/Summary:
With the rapid growth of the Internet, data mining continues to develop as an academic field. Outlier detection, one of its important components, aims to discover abnormal data, and it is now widely used in network security, social network analysis, and other areas. As the upper-layer services of a detection system keep expanding, the system receives more and more query requests within the same time period, while the demand for timely processing keeps rising. This places higher performance requirements on outlier detection algorithms. However, most existing outlier detection algorithms are designed for a single query, so a system performs poorly when it must process a large number of query requests in a short time, which degrades the user experience. To address this problem, this thesis studies outlier detection algorithms that support multiple queries. The main contributions are as follows:

(1) An efficient algorithm named R-tree Outlier Detection Algorithm-Single Query (RODA_SQ) is proposed. First, the algorithm improves the traditional spatial index R-tree by adding a density attribute to each node, and it introduces a new method for estimating the degree of outlierness. Second, based on the spatial distribution of outliers, it selects the nodes with the smallest density in the R-tree and preferentially computes the data points in those nodes, which are more likely to be outliers. This quickly establishes a large filtering threshold. To further improve filtering, a split-new batch filtering theorem is proposed, which reduces the amount of computation.

(2) Building on RODA_SQ, the R-tree Outlier Detection Algorithm-Multiple Query (RODA_MQ) is proposed to support multiple queries. The first step groups the query tasks specified by the user so that the tasks within each group share as much content as possible. This helps ensure the execution efficiency of multi-query outlier detection and reduces memory waste. The second step processes the query tasks of each group in turn until all groups have been processed. Through this analysis, the algorithm implements a sharing mechanism among multiple queries and can complete several detection tasks in a single pass, which improves detection efficiency and meets users' needs.

(3) Using real and synthetic data sets, the two proposed algorithms are compared with the existing iORCA and SOP algorithms. The experimental results show that, for both single and multiple queries, the two proposed algorithms run more efficiently than the existing outlier detection algorithms and have good applicability and practical significance.
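The abstract describes the two algorithms only at a high level, so the following Python sketch illustrates the general ideas under several assumptions that are not taken from the thesis: a point's outlier score is taken to be the distance to its k-th nearest neighbour, a uniform grid (hypothetical parameter `cell_size`) stands in for the density-annotated R-tree, ORCA-style cutoff pruning stands in for the split-new batch filtering theorem, and all queries are treated as one group. `detect_outliers` is a hypothetical name. Visiting sparse cells first mimics selecting low-density nodes to raise the filtering threshold early, and one shared neighbour scan per point serves every `(k, n)` query in the group.

```python
import heapq
import math
from collections import defaultdict


def detect_outliers(points, queries, cell_size=1.0):
    """Answer several top-n outlier queries, given as (k, n) tuples, in one pass.

    Assumptions (not from the thesis): a point's score is the distance to its
    k-th nearest neighbour, and a uniform grid stands in for the R-tree, with
    a cell's point count playing the role of node density.
    """
    cells = defaultdict(list)
    for i, p in enumerate(points):
        cells[tuple(math.floor(c / cell_size) for c in p)].append(i)

    # Visit sparse cells first: their points are the likeliest outliers, so
    # every query's cutoff becomes large early and prunes more later points.
    visit = sorted(cells, key=lambda key: len(cells[key]))

    k_max = max(k for k, _ in queries)
    top = {q: [] for q in queries}  # per-query min-heap of (score, index)

    def cutoff(q):
        # A point must beat this score to enter q's top-n (0.0 until the heap is full).
        return top[q][0][0] if len(top[q]) == q[1] else 0.0

    for home in visit:
        # Scan neighbour cells nearest-first so running k-NN distances shrink quickly.
        scan = sorted(cells, key=lambda key: math.dist(key, home))
        for i in cells[home]:
            near = []  # negated max-heap: the k_max smallest distances seen so far
            pruned = False
            for key in scan:
                for j in cells[key]:
                    if j == i:
                        continue
                    d = math.dist(points[i], points[j])
                    if len(near) < k_max:
                        heapq.heappush(near, -d)
                    elif d < -near[0]:
                        heapq.heapreplace(near, -d)
                    else:
                        continue
                    # The running k-th distance only shrinks and cutoffs only grow, so once
                    # it is below every query's cutoff the point cannot enter any top-n.
                    dists = sorted(-x for x in near)
                    if all(len(dists) >= k and dists[k - 1] < cutoff((k, n))
                           for k, n in queries):
                        pruned = True
                        break
                if pruned:
                    break
            if pruned:
                continue
            # Full scan done: one shared neighbour list gives the exact score for every k.
            dists = sorted(-x for x in near)
            for k, n in queries:
                if len(dists) < k:
                    continue
                item = (dists[k - 1], i)
                if len(top[(k, n)]) < n:
                    heapq.heappush(top[(k, n)], item)
                else:
                    heapq.heappushpop(top[(k, n)], item)

    # Indices per query, highest score first.
    return {q: [i for _, i in sorted(top[q], reverse=True)] for q in queries}
```

A single query is just a one-element list such as `detect_outliers(points, [(5, 10)])`. Passing several `(k, n)` pairs lets them share each neighbour scan, and a point is abandoned only once it is pruned for every query in the group, which is the kind of sharing the multi-query setting aims for.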
Keywords/Search Tags: Outlier Detection, Multiple Query, R-tree, Filtering Threshold, Data Mining