Research On Outlier Detection Based On Density Difference

Posted on: 2016-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xin
Full Text: PDF
GTID: 2308330467972790
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the rapid development of information technology and of data acquisition and storage technology, organizations such as enterprises, research institutions, and government agencies have been collecting ever larger volumes of data with complex structures. How to extract valuable knowledge from this data quickly and efficiently is an active research question, and data mining is a useful technique for doing so. Outlier mining in particular has received growing attention because abnormal data often needs to be found. In many applications, such as network intrusion detection, case studies, and business analysis, rare events are more valuable than common ones. Outliers are not necessarily errors, and they often carry meaningful knowledge.

Existing outlier detection algorithms can be divided into distribution-based, distance-based, depth-based, density-based, and clustering-based methods. Density-based local outlier detection quantifies the degree to which an object is an outlier, which gives it broader application prospects.

This thesis focuses on local outlier and collective outlier detection and studies density-based outlier detection algorithms. Based on an analysis of existing density-based local outlier detection methods, we propose an improved algorithm based on the mountain method, called IMMOD. In addition, we introduce the idea of clustering, improve the method of computing outlier factors, and propose a collective outlier detection algorithm based on density difference. The contents are as follows.

For detecting local outliers, we introduce the algorithm IMMOD. IMMOD takes the differences among attributes into account and uses entropy to determine the significance of each attribute. The resulting attribute weights are used to calculate the weighted distance between objects, so that more important attributes receive larger weights.
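The entropy-based weighting and weighted distance described above can be sketched as follows. This is a minimal illustration, not the thesis's formulation: the abstract does not give the formulas, so the sketch assumes the commonly used entropy weight method (min-max scaling, column-wise proportions, normalized Shannon entropy) and a weighted Euclidean distance. The function names `entropy_weights` and `weighted_distance` are hypothetical.

```python
import math

def entropy_weights(rows):
    """Assumed entropy weight method: lower column entropy -> larger weight.

    rows is a list of equal-length numeric records; at least two records
    are required because the entropy is normalized by log(n).
    """
    n = len(rows)
    d = len(rows[0])
    divergences = []
    for j in range(d):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        span = hi - lo
        # Min-max scale so values are non-negative; a constant column
        # becomes all ones and therefore has maximal entropy.
        scaled = [(v - lo) / span if span > 0 else 1.0 for v in column]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        divergences.append(max(0.0, 1.0 - entropy))
    total_div = sum(divergences)
    if total_div == 0:
        # All attributes are equally uninformative: fall back to uniform weights.
        return [1.0 / d] * d
    return [g / total_div for g in divergences]

def weighted_distance(x, y, weights):
    """Weighted Euclidean distance between two records."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))

# Hypothetical toy data: the second attribute is constant and carries no information.
data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]]
weights = entropy_weights(data)
print(weights)
print(weighted_distance(data[0], data[3], weights))
```

Under these assumptions, an attribute that is constant across all objects ends up with a weight near zero, which is one way the "secondary" attributes mentioned in the abstract could be identified and dropped.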
In addition, identifying and removing secondary attributes preserves precision while lowering the computational complexity on high-dimensional datasets. Both theoretical analysis and empirical study show that IMMOD works well on high-dimensional datasets with few parameters and high accuracy, and that it performs better than the other algorithms considered.

For detecting collective outliers, we introduce the idea of clustering in order to learn the structure of the dataset. First, we use IMMOD to estimate the initial cluster centers. Second, we use fuzzy c-means (FCM) to partition the dataset into clusters. Third, we separate the clusters into large clusters and small clusters according to certain rules. Finally, the outlier factor is computed using an improved DBLOF. DBLOF treats objects in small clusters as candidate collective outliers, so the main purpose of clustering is to obtain knowledge about the clusters. Experiments show that DBLOF is more stable when detecting collective outliers.
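The large-cluster and small-cluster step can be illustrated with a short sketch. It is not the thesis's improved DBLOF, whose rules and outlier factor are not given in the abstract. The sketch assumes hard cluster labels are already available (for example, the highest-membership cluster from FCM), assumes a simple coverage rule in which the largest clusters that together hold a fraction `alpha` of the data count as large, and assumes a plain distance-to-centroid score. The names `split_clusters`, `outlier_scores`, and `alpha` are hypothetical.

```python
import math
from collections import defaultdict

def split_clusters(labels, alpha=0.9):
    """Assumed rule: the biggest clusters covering a fraction alpha of the data are 'large'."""
    sizes = defaultdict(int)
    for lab in labels:
        sizes[lab] += 1
    # Sort by descending size, breaking ties by label for determinism.
    ordered = sorted(sizes, key=lambda lab: (-sizes[lab], lab))
    large, covered = set(), 0
    for lab in ordered:
        if covered >= alpha * len(labels):
            break
        large.add(lab)
        covered += sizes[lab]
    small = set(ordered) - large
    return large, small

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    d = len(points[0])
    return [sum(p[j] for p in points) / len(points) for j in range(d)]

def outlier_scores(points, labels, alpha=0.9):
    """Score each point by its distance to a large-cluster centroid.

    Points in a large cluster are measured against their own centroid;
    points in a small cluster are measured against the nearest large one.
    """
    large, _ = split_clusters(labels, alpha)
    members = defaultdict(list)
    for p, lab in zip(points, labels):
        members[lab].append(p)
    centers = {lab: centroid(members[lab]) for lab in large}
    scores = []
    for p, lab in zip(points, labels):
        if lab in large:
            scores.append(math.dist(p, centers[lab]))
        else:
            scores.append(min(math.dist(p, c) for c in centers.values()))
    return scores

# Hypothetical toy data: two dense groups and one small, distant group.
points = (
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [-0.1, 0.0], [0.0, -0.1]]
    + [[10.0, 10.0], [10.1, 10.0], [10.0, 10.1], [9.9, 10.0], [10.0, 9.9]]
    + [[5.0, -20.0], [5.1, -20.0]]
)
labels = [0] * 5 + [1] * 5 + [2] * 2
print(split_clusters(labels, alpha=0.8))
print(outlier_scores(points, labels, alpha=0.8))
```

In this toy setup the two-point group falls into the small clusters and receives the highest scores as a whole, which mirrors the idea that objects in small clusters may be collective outliers.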
Keywords/Search Tags: Information Entropy, Weighted Distance, Attribute Reduction, Clustering, Local Outlier, Collective Outlier