
Research And Improvement Of Local Outlier Detecting Algorithm Based On Density

Posted on: 2015-02-12; Degree: Master; Type: Thesis
Country: China; Candidate: X X Zhao; Full Text: PDF
GTID: 2268330428967683; Subject: Computer software and theory
Abstract/Summary:
Outliers are objects that do not conform to the general pattern of normal data or that deviate from it. They are generated by a mechanism quite different from that of normal objects, yet they may carry important information that is often neglected. In fields such as credit card fraud detection and mobile communication, outliers are the main objects of study. They allow us to consider a problem from a new angle and to discover new theories and applications.

Outlier detection technology has important research value and can be widely applied in drug research, user behavior analysis, network intrusion detection, stock trading, industrial damage detection, finance and other fields. In financial analysis, fraud is detected by analyzing financial transaction data. In market analysis, outlier detection algorithms can be used to identify the consumption behavior of customers with very high or very low incomes, so that customers can be analyzed and classified and the market can then be targeted and predicted. In medical analysis, outlier detection algorithms can be used to find unusual reactions to various treatments. At present, finding and processing outliers quickly and effectively in large-scale, high-dimensional datasets remains a very challenging problem.

Outlier detection algorithms can be divided into several categories: statistical-based, distance-based, density-based and deviation-based. With the rapid development of artificial intelligence, machine learning and pattern recognition, more and more effective, novel outlier detection methods have appeared, including techniques based on self-organizing maps, artificial neural networks, fuzzy rough sets and partitioning.
However, most existing outlier detection algorithms have shortcomings to some extent: low detection accuracy, high time complexity, strong dependence on user-set parameters, poor scalability and so on. Because the detection accuracy, time complexity and flexibility of most local outlier detection algorithms are not ideal, an improved density-based local outlier detection algorithm is proposed:

(1) The DBSCAN algorithm is run with multiple parameter settings to obtain different cluster models. These models are then integrated, the clustering results are pruned preliminarily, and a preliminary abnormal data set is obtained. This avoids wrongly pruning outliers located at the edges of clusters because of improper DBSCAN parameter settings, and so reduces time complexity while preserving detection accuracy;

(2) The concept of leave-one partition information gain is introduced and used to determine the weight of each attribute, so that an attribute's weight reflects its degree of contribution. In most detection algorithms the attribute weights are set by experts, which introduces many human factors and strongly influences the detection results. Leave-one partition information gain is a good solution to this problem. Its introduction can also reduce the dimensionality of high-dimensional data sets, giving the improved algorithm good scalability on high-dimensional data sets;...
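
The pruning idea in step (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not say how the cluster models are integrated, so the rule used here (a point is pruned as normal only if every parameter setting places it in a cluster, and is kept as a candidate outlier if any setting marks it as noise) is an assumption. The function names `dbscan_labels` and `candidate_outliers`, the toy data and the parameter values are all hypothetical.

```python
from math import dist


def dbscan_labels(points, eps, min_pts):
    """Plain DBSCAN: return one cluster id per point, -1 meaning noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point counts as its own neighbor).
    neighbors = [
        [j for j in range(n) if dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # j is core: keep expanding
        cluster += 1
    return labels


def candidate_outliers(points, param_grid):
    """Assumed integration rule: keep a point as a candidate outlier if
    at least one (eps, min_pts) setting labels it as noise."""
    noise_votes = [0] * len(points)
    for eps, min_pts in param_grid:
        for i, label in enumerate(dbscan_labels(points, eps, min_pts)):
            if label == -1:
                noise_votes[i] += 1
    return [i for i, votes in enumerate(noise_votes) if votes > 0]


# Toy data: a 3x3 grid cluster, one point near its edge, one far point.
points = [(float(x), float(y)) for x in range(3) for y in range(3)]
points += [(3.2, 1.0), (10.0, 10.0)]
param_grid = [(1.1, 4), (1.5, 4)]  # hypothetical (eps, min_pts) settings

candidates = candidate_outliers(points, param_grid)
print(candidates)  # [9, 10]
```

With `eps = 1.5` alone, the edge point (index 9) is absorbed into the cluster as a border point and would be pruned; combining it with the `eps = 1.1` run keeps that point in the candidate set, so a later density-based scoring step only has to examine two points instead of eleven.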
Keywords/Search Tags: Data mining, outlier detection, density, degree of outlier