Research On Outlier Detection Based On Neighborhood Rough Sets

Posted on: 2022-12-09   Degree: Master   Type: Thesis
Country: China   Candidate: X Duan
GTID: 2518306770995479   Subject: Computer Software and Application of Computer
Abstract/Summary:
Outlier detection is an important research direction in data mining. Its goal is to find data objects whose behavior differs significantly from that of the other objects in a dataset. Outlier detection has substantial research and practical value in fields such as intrusion detection, credit card fraud detection, and medical diagnosis, and it has attracted considerable attention in recent years. Many outlier detection methods have been proposed, but many of them do not account for the uncertainty and incompleteness of data. Rough set theory, which is designed to handle uncertain and incomplete information, has therefore been widely applied to outlier detection. However, classical rough set methods require the numerical attributes of numerical and mixed datasets to be discretized, and discretization loses information, which easily degrades outlier detection performance. How to address these problems is currently an active research topic in the field.

This thesis first introduces the main outlier detection algorithms and discusses the basic concepts of neighborhood rough sets. Second, because existing rough set-based outlier detection methods cannot effectively handle numerical and mixed datasets, it proposes a neighborhood granularity entropy model and, on top of it, an outlier detection algorithm based on neighborhood granularity entropy that can effectively detect outliers in both numerical and mixed datasets. Third, it proposes a neighborhood granularity discrimination index and combines it with distance-based outlier detection. This simultaneously addresses two limitations: rough set-based outlier detection methods cannot effectively handle numerical and mixed datasets, and distance-based outlier detection methods cannot effectively handle symbolic and mixed datasets. The main contributions of this thesis are as follows.

(1) Neighborhood granularity entropy model. This thesis proposes a new information entropy model, neighborhood granularity entropy, which provides a more comprehensive uncertainty measure by integrating two concepts: neighborhood information entropy, which characterizes the completeness of neighborhood knowledge, and neighborhood knowledge granularity, which characterizes the granule size of neighborhood knowledge.

(2) Outlier detection algorithm based on neighborhood granularity entropy. Traditional rough set-based outlier detection algorithms cannot effectively handle numerical and mixed datasets. This thesis therefore adopts a neighborhood rough set-based outlier detection approach, introduces neighborhood granularity entropy into it, and proposes a neighborhood granularity entropy-based outlier detection algorithm (OD-NGE). OD-NGE computes an outlier factor for each data object from neighborhood granularity entropy, so that outliers can be detected effectively in numerical and mixed datasets. Its effectiveness is demonstrated by comparison with various other algorithms on publicly available data.

(3) Neighborhood granularity discrimination index. This thesis proposes a new information discrimination measure based on neighborhood relations, called the neighborhood granularity discrimination index. Its properties are similar to those of Shannon entropy, but it is defined directly on the neighborhood relation and can be computed more efficiently, from the cardinality of the neighborhood relation rather than from neighborhood similarity classes. Because it also incorporates neighborhood granularity, it can effectively measure the discriminating ability of feature subsets.

(4) Outlier detection algorithm based on neighborhood granularity discrimination index and distance. Traditional distance-based outlier detection algorithms cannot effectively handle symbolic and mixed datasets, while classical rough set-based outlier detection algorithms cannot effectively handle numerical and mixed data. This thesis therefore introduces the neighborhood granularity discrimination index into distance-based outlier detection and proposes an outlier detection algorithm based on the neighborhood granularity discrimination index and distance (OD-NGDID), which can effectively detect outliers in numerical, symbolic, and mixed datasets. Its validity is verified by comparison with other algorithms on open datasets.
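The abstract does not give the formulas behind neighborhood granularity entropy or the OD-NGE outlier factor. As a hedged illustration of the two ingredients named in contribution (1), the Python sketch below computes δ-neighborhoods on mixed data together with assumed, commonly used forms of neighborhood information entropy, NE = -(1/|U|) Σ log2(|δ_B(x_i)| / |U|), and neighborhood knowledge granularity, NG = (1/|U|²) Σ |δ_B(x_i)|. The neighborhood rule (numeric attributes pre-scaled to [0, 1] and compared within δ, symbolic attributes required to match), all function names, and the per-object `toy_outlier_score` are hypothetical choices for illustration; this is not the thesis's OD-NGE outlier factor.

```python
import math
from typing import List, Sequence, Set, Union

Value = Union[float, str]
Table = Sequence[Sequence[Value]]


def _close(u: Value, v: Value, is_numeric: bool, delta: float) -> bool:
    """Assumed per-attribute neighbor test: numeric values (scaled to [0, 1])
    within delta, symbolic values equal."""
    if is_numeric:
        return abs(float(u) - float(v)) <= delta
    return u == v


def neighborhood(data: Table, i: int, attrs: Sequence[int],
                 numeric: Sequence[bool], delta: float) -> Set[int]:
    """Indices of objects in the delta-neighborhood of object i on attributes attrs.
    Object i is always its own neighbor, so the set is never empty."""
    return {j for j, row in enumerate(data)
            if all(_close(data[i][a], row[a], numeric[a], delta) for a in attrs)}


def neighborhood_entropy(data: Table, attrs: Sequence[int],
                         numeric: Sequence[bool], delta: float) -> float:
    """Assumed form: NE = -(1/|U|) * sum_i log2(|delta_B(x_i)| / |U|)."""
    n = len(data)
    return -sum(math.log2(len(neighborhood(data, i, attrs, numeric, delta)) / n)
                for i in range(n)) / n


def neighborhood_granularity(data: Table, attrs: Sequence[int],
                             numeric: Sequence[bool], delta: float) -> float:
    """Assumed form: NG = (1/|U|^2) * sum_i |delta_B(x_i)|."""
    n = len(data)
    return sum(len(neighborhood(data, i, attrs, numeric, delta))
               for i in range(n)) / n ** 2


def toy_outlier_score(data: Table, numeric: Sequence[bool], delta: float) -> List[float]:
    """Hypothetical score (not OD-NGE): average over single attributes of
    1 - |delta_a(x)| / |U|, so objects with small neighborhoods score higher."""
    n, m = len(data), len(data[0])
    return [sum(1 - len(neighborhood(data, i, [a], numeric, delta)) / n
                for a in range(m)) / m
            for i in range(n)]


# Toy mixed table: one numeric attribute in [0, 1] and one symbolic attribute.
data = [[0.10, "red"], [0.12, "red"], [0.15, "red"], [0.11, "red"], [0.95, "blue"]]
numeric = [True, False]

print(neighborhood_entropy(data, [0, 1], numeric, 0.1))      # about 0.722
print(neighborhood_granularity(data, [0, 1], numeric, 0.1))  # 17/25 = 0.68
print(toy_outlier_score(data, numeric, 0.1))                 # last object scores highest
```

On this five-row toy table the last object, which is isolated on both the numeric and the symbolic attribute, receives the highest toy score. Neighborhoods are computed by brute force (O(|U|²) comparisons), which is fine for a sketch but not for large datasets.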
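Contribution (3) states that the discrimination index is computed from the cardinality of the neighborhood relation rather than from neighborhood similarity classes. The Python sketch below assumes the form H_δ(B) = -log2(|R_B^δ| / |U|²), where |R_B^δ| counts the ordered pairs of objects that are δ-neighbors on attribute subset B, and then uses per-attribute index values as weights in a k-nearest-neighbor distance score. The weighting scheme, the mixed distance (absolute difference for numeric attributes scaled to [0, 1], 0/1 mismatch for symbolic ones), the default k, and all function names are hypothetical; the thesis's neighborhood granularity discrimination index and OD-NGDID may be defined differently.

```python
import math
from typing import List, Sequence, Union

Value = Union[float, str]
Table = Sequence[Sequence[Value]]


def _close(u: Value, v: Value, is_numeric: bool, delta: float) -> bool:
    """Assumed neighbor test: numeric values (scaled to [0, 1]) within delta,
    symbolic values equal."""
    if is_numeric:
        return abs(float(u) - float(v)) <= delta
    return u == v


def relation_cardinality(data: Table, attrs: Sequence[int],
                         numeric: Sequence[bool], delta: float) -> int:
    """|R_B^delta|: number of ordered pairs (x, y) that are delta-neighbors on attrs."""
    return sum(1 for x in data for y in data
               if all(_close(x[a], y[a], numeric[a], delta) for a in attrs))


def discrimination_index(data: Table, attrs: Sequence[int],
                         numeric: Sequence[bool], delta: float) -> float:
    """Assumed form: H_delta(B) = -log2(|R_B^delta| / |U|^2)."""
    n = len(data)
    return -math.log2(relation_cardinality(data, attrs, numeric, delta) / n ** 2)


def mixed_distance(x: Sequence[Value], y: Sequence[Value],
                   weights: Sequence[float], numeric: Sequence[bool]) -> float:
    """Weighted Euclidean-style distance: |difference| for numeric attributes,
    0/1 mismatch for symbolic attributes."""
    total = 0.0
    for a, w in enumerate(weights):
        d = abs(float(x[a]) - float(y[a])) if numeric[a] else float(x[a] != y[a])
        total += w * d * d
    return math.sqrt(total)


def knn_outlier_scores(data: Table, numeric: Sequence[bool],
                       delta: float, k: int = 2) -> List[float]:
    """Hypothetical combination (not OD-NGDID): weight each attribute by its
    discrimination index, then score each object by its mean distance to its
    k nearest neighbors."""
    m = len(data[0])
    index = [discrimination_index(data, [a], numeric, delta) for a in range(m)]
    total = sum(index)
    weights = [v / total for v in index] if total > 0 else [1.0 / m] * m
    scores = []
    for i, x in enumerate(data):
        dists = sorted(mixed_distance(x, y, weights, numeric)
                       for j, y in enumerate(data) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy mixed table: one numeric attribute in [0, 1] and one symbolic attribute.
data = [[0.10, "red"], [0.12, "red"], [0.15, "red"], [0.11, "red"], [0.95, "blue"]]
numeric = [True, False]

print(discrimination_index(data, [0], numeric, 0.1))  # log2(25/17), about 0.556
print(knn_outlier_scores(data, numeric, 0.1))         # last object scores highest
```

Because the index only needs a count of neighbor pairs, it never materializes the similarity classes themselves, which matches the efficiency argument in contribution (3). Attributes that split the data more finely get larger index values and therefore more weight in the distance.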
Keywords/Search Tags: outlier detection, neighborhood rough set, neighborhood granularity entropy, neighborhood granularity differentiation index, outlier factor