Study On Local Outlier Detection Algorithm Based On Multi-clustering

Posted on: 2014-08-11  Degree: Master  Type: Thesis
Country: China  Candidate: H B Liu  Full Text: PDF
GTID: 2268330392971509  Subject: Computer software and theory
Abstract/Summary:
In recent years, with the rapid development of computer and database technology, the volume of data has grown explosively. Data mining emerged to make full use of this data. Its main function is to extract previously unknown but potentially valuable knowledge hidden in huge datasets. Outlier detection is an important area of data mining research. It is mainly used to find, in large and complex datasets, the data that do not fit the general pattern or that deviate seriously from the mainstream (normal) data. Outlier detection has many practical applications, such as network intrusion detection, web mining, analysis of market transactions, medical diagnostics, and meteorological research.

Scholars have put forward various types of outlier detection algorithms, such as density-based, distance-based, and deviation-based algorithms. However, most of them share the same drawback: high time complexity. Researchers have therefore also developed a series of techniques to improve the performance of outlier detection algorithms, for example, pruning techniques for data sets. To overcome the shortcomings of LDOF, this paper proposes PMLDOF, an outlier detection algorithm based on multi-clustering. The new algorithm reduces the time complexity of outlier detection, reduces the sensitivity to the nearest-neighbor parameter K, and also overcomes the shortcomings of relying on a single DBSCAN clustering. Specifically, the main research of this paper is as follows:

① Introduces the research background and the state of research on outlier detection in China and abroad.
② Gives a detailed analysis of outlier detection and summarizes the core idea of each algorithm and its scope of application. It also systematically and comprehensively analyzes ensemble learning and discusses its core ideas and related techniques.
③ To reduce the time complexity of LDOF and its sensitivity to the k-nearest-neighbor (KNN) parameter, proposes PLDOF, an outlier detection algorithm based on DBSCAN pruning. However, PLDOF has the shortcoming that it sometimes prunes outliers. To overcome this, the paper introduces the idea of multi-clustering and proposes PMLDOF, a local outlier detection algorithm based on multi-clustering pruning.
④ Before the cluster partitions are integrated, the equivalent clusters in the partitions must be matched. All the ways in which equivalent clusters in different partitions can be mismatched are analyzed, and a cluster matching method is proposed and described in detail.
⑤ Gives a theoretical analysis of PMLDOF and verifies the effectiveness of the algorithm on simulated and real data sets.
⑥ Finally, summarizes the work done in this paper and proposes directions for future research on outlier detection.
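The DBSCAN-pruning idea behind PLDOF can be illustrated with a minimal Python sketch. The abstract does not give the algorithm's details, so everything below is an assumption made for illustration: the function names (`ldof`, `dbscan`, `pruned_ldof`), the parameters (`eps`, `min_pts`, `top_n`), and the pruning rule itself, which here simply drops every point that DBSCAN assigns to a cluster and scores only the remaining noise points. The LDOF score follows its usual definition: the mean distance from a point to its k nearest neighbors, divided by the mean pairwise distance among those neighbors. This is not the thesis's PLDOF or PMLDOF implementation, and it omits the multi-clustering and cluster-matching steps entirely.

```python
import math
from itertools import combinations

def ldof(points, i, k):
    """LDOF of points[i]: mean distance to its k nearest neighbours divided by
    the mean pairwise distance among those neighbours (requires k >= 2 and
    neighbours that are not all identical)."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    knn = others[:k]
    d_knn = sum(math.dist(points[i], points[j]) for j in knn) / k
    pairs = list(combinations(knn, 2))
    d_inner = sum(math.dist(points[a], points[b]) for a, b in pairs) / len(pairs)
    return d_knn / d_inner

def dbscan(points, eps, min_pts):
    """Minimal brute-force DBSCAN; returns one label per point, -1 = noise.
    A point counts itself among its eps-neighbours."""
    n = len(points)
    neighbours = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1              # provisionally noise
            continue
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster     # noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])   # j is a core point, keep expanding
        cluster += 1
    return labels

def pruned_ldof(points, k, eps, min_pts, top_n):
    """Assumed pruning rule: discard clustered points, then rank the
    remaining candidates by LDOF (neighbours come from the full data set)."""
    labels = dbscan(points, eps, min_pts)
    candidates = [i for i, label in enumerate(labels) if label == -1]
    scored = sorted(((ldof(points, i, k), i) for i in candidates), reverse=True)
    return [i for _, i in scored[:top_n]]

# Hypothetical data: a dense 5x5 grid plus one far-away point.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
data = grid + [(20.0, 20.0)]
print(pruned_ldof(data, k=3, eps=1.5, min_pts=4, top_n=1))
```

Because the expensive LDOF computation runs only on the points that survive pruning, the cost falls when most of the data lies in dense clusters. The sketch also shows the weakness the abstract mentions: if a single DBSCAN run absorbs a true outlier into a cluster, that point is pruned and never scored.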
Keywords/Search Tags:Data mining, Local outlier detection, Multiple clustering, Pruning