
Study And Improvement Of Local Outliers Mining Based On Density

Posted on: 2015-03-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Liu
GTID: 2268330422471568
Subject: Computer software and theory
Abstract/Summary:
Data mining is an active topic in computer science research. Its purpose is to extract valid, novel and potentially useful knowledge from large amounts of data. Traditional data mining focuses on the general behavior of a data set, through tasks such as association, classification and cluster analysis. Outlier mining, in contrast, searches for relatively abnormal data patterns in large data sets. It plays an important role in many application domains, such as financial fraud detection, network intrusion detection, finding rare elements of a set and testing for abnormal reactions to new therapies.

Outlier mining comprises two main problems: outlier detection and outlier interpretation. This thesis addresses how to detect outliers in a data set effectively. Among the relevant algorithms, density-based outlier mining is an effective and practical method: it analyzes the abnormal behavior of an object within a local scope and uses an outlier factor to measure the object's degree of outlyingness. This thesis studies and improves density-based outlier mining. The main work is as follows:

① The thesis describes the background and significance of outlier mining and its research status in China and abroad. It presents the framework for outlier detection, introduces the basic concepts and relevant technologies, and explains the evaluation criteria for outlier detection algorithms.

② It summarizes the typical algorithms for outlier mining, introducing their motivations and operating principles and analyzing their advantages and disadvantages.

③ Based on an in-depth analysis of existing density-based algorithms, it proposes a novel algorithm, ISSDOF, built on an improved outlier factor, to make outlier mining more effective. ISSDOF uses the concept of Influenced Space from the influenced-outlierness-based algorithm (INFLO) to find the neighborhood of every object. It then computes the outlier factor with a new method based on the Similar K_Distance Neighbor Series (SKDNS), which develops the idea of chaining distance from the Connectivity-based Outlier Factor (COF).

④ It verifies the effectiveness of the proposed algorithm through experiments. Experiments on synthetic data sets demonstrate that the proposed algorithm can effectively detect outliers in data sets with complex distributions, and experiments on real data sets from the UCI Machine Learning Repository further validate its advantages. Finally, the proposed algorithm and other methods are applied to statistical data on basketball players, which shows the generality and versatility of the proposed algorithm.
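As an illustration of the Influenced Space idea mentioned in item ③, the following is a minimal sketch of the published INFLO score, not of ISSDOF itself: the abstract does not define SKDNS or the improved outlier factor, so neither is reproduced here. The function names `k_nearest`, `influence_space` and `inflo_scores` are hypothetical. The sketch assumes Euclidean distance, no duplicate points, and exactly k neighbors per object (ties at the k-distance are cut off rather than included).

```python
import math

def k_nearest(points, k):
    """For each point, return its k nearest neighbor indices and its k-distance."""
    knn, kdist = [], []
    for i, p in enumerate(points):
        # Rank every other point by Euclidean distance to p.
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        knn.append([j for _, j in ranked[:k]])
        kdist.append(ranked[k - 1][0])
    return knn, kdist

def influence_space(knn):
    """Influenced space: k nearest neighbors plus reverse k nearest neighbors."""
    space = [set(neigh) for neigh in knn]
    for i, neigh in enumerate(knn):
        for j in neigh:
            # i lists j as a neighbor, so i is a reverse neighbor of j.
            space[j].add(i)
    return space

def inflo_scores(points, k):
    """INFLO-style score: mean density of the influenced space over own density."""
    knn, kdist = k_nearest(points, k)
    # Density is the inverse k-distance (assumes no duplicates, so k-distance > 0).
    density = [1.0 / d for d in kdist]
    space = influence_space(knn)
    return [
        (sum(density[j] for j in space[i]) / len(space[i])) / density[i]
        for i in range(len(points))
    ]

if __name__ == "__main__":
    # A small cluster plus one distant point (index 5).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (5, 5)]
    scores = inflo_scores(data, k=2)
    print(max(range(len(data)), key=scores.__getitem__))
```

A score well above 1 means the object is much sparser than the objects in its influenced space. In this toy data set the distant point at index 5 receives the largest score.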
Keywords/Search Tags: Outliers Detection, Influenced Space, Similar K_Distance Neighbor Series, Outlier Factor