Research On Outlier Detection Algorithm Based On The Change Of The Center Of Gravity In Neighborhood

Posted on: 2016-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: X Y Huang
GTID: 2308330464956883
Subject: Computer application technology

Abstract/Summary:
In recent years, data mining technology has advanced continuously and now plays an important role in many fields. It has many branches that merit in-depth study, and outlier detection is one of them. Outlier detection finds objects whose behavior or characteristics differ from those of most other objects in the same data set. Many industries already use it, for example to detect financial fraud and to analyze how abnormal climate events form.

The thesis first introduces the concept and definition of an outlier and points out the advantages and shortcomings of typical outlier detection algorithms. It then describes its main contributions, which cover the following two aspects.

First, the recent and widely used robust outlier detection algorithm based on the instability factor has a weakness: its result may be distorted when the tested point lies between a sparse region and a dense region. To solve this problem, the thesis proposes an outlier detection algorithm based on the change of the center of gravity in the neighborhood. It defines a related center of gravity in the neighborhood and an unrelated center of gravity in the neighborhood. Comparing the change of the related center of gravity with the change of the unrelated center of gravity avoids the influence of unevenly distributed neighbors. The instability value is the ratio of the total change of the related center of gravity to the total change of the unrelated center of gravity. The greater the instability value of a tested point, the more likely the point is an outlier.

Second, a data set can be updated by deleting or adding data. Current incremental outlier detection algorithms based on neighborhood information have to search the whole data set to find the affected objects and the neighbors of a new object, which is inefficient. In fact, in a huge data set only a small part of the objects is affected by an update, and only a small part will become neighbors of the new objects, so there is no need to search the whole data set. To improve search efficiency, the thesis extends the proposed algorithm with an incremental version that reduces search time. The data set is clustered on the basis of the proposed algorithm, and the clusters are then searched for affected objects. Before a cluster is searched, it is checked to determine whether it is worth searching; if it is not, it is skipped. This method reduces the search space and improves detection efficiency.

Finally, experiments on artificial and real data sets compared the proposed method with robust outlier detection using the instability factor and with LOF to verify its effectiveness. The results showed that the proposed outlier detection algorithm based on the change of the center of gravity in the neighborhood is more accurate. Further experiments on a real data set compared the proposed incremental algorithm with IncLOF to verify its efficiency. The results showed that the proposed incremental algorithm is more efficient.
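
The abstract does not give formal definitions of the related and unrelated centers of gravity, so the Python sketch below is only one hypothetical reading of the instability value, not the thesis's actual algorithm. It assumes that the related center of gravity is the centroid of the tested point together with its i nearest neighbors, that the unrelated center of gravity is the centroid of those i neighbors alone, and that the "total change" is the summed distance between successive centroids as i grows from 1 to k. The names `instability`, `instability_scores`, and the parameter `k` are illustrative.

```python
import numpy as np


def instability(data, idx, k):
    """Hypothetical instability value of data[idx] using its k nearest neighbors.

    Assumed definitions (not taken from the thesis):
      related centroid   = mean of the tested point and its i nearest neighbors
      unrelated centroid = mean of the i nearest neighbors only
      total change       = summed distance between successive centroids, i = 1..k
    """
    p = data[idx]
    dists = np.linalg.norm(data - p, axis=1)
    order = np.argsort(dists, kind="stable")
    order = order[order != idx][:k]          # k nearest neighbors, excluding p itself
    neighbors = data[order]

    counts = np.arange(1, k + 1)[:, None]    # 1, 2, ..., k as a column
    cumulative = np.cumsum(neighbors, axis=0)
    unrelated = cumulative / counts          # centroids without the tested point
    related = (cumulative + p) / (counts + 1)  # centroids including the tested point

    related_change = np.linalg.norm(np.diff(related, axis=0), axis=1).sum()
    unrelated_change = np.linalg.norm(np.diff(unrelated, axis=0), axis=1).sum()

    # Small constant guards against division by zero for degenerate neighborhoods.
    return related_change / (unrelated_change + 1e-12)


def instability_scores(data, k):
    """Instability value for every point; larger values suggest outliers."""
    data = np.asarray(data, dtype=float)
    return np.array([instability(data, i, k) for i in range(len(data))])
```

Under these assumptions, a point far from every cluster drags the related centroid a long way as neighbors are added, while the unrelated centroid barely moves, so the ratio is large. For a point inside a cluster, both centroids shift by similar amounts and the ratio stays small. The brute-force neighbor search here scans the whole data set for each point, which is the cost the incremental, cluster-based variant described above is meant to avoid.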
Keywords/Search Tags: data mining, outlier detection, center of gravity in neighborhood, incremental