
Research on Outlier Mining Methods Based on the Gini Index and Subspaces

Posted on: 2013-02-13
Degree: Master
Type: Thesis
Country: China
Candidate: W W Sun
Full Text: PDF
GTID: 2218330374963627
Subject: Computer software and theory
Abstract/Summary:
Outlier detection and analysis is one of the main research topics in data mining and is widely applied in areas such as fraud analysis and network intrusion detection. Because the parameters of existing outlier mining algorithms are set manually, this thesis studies outlier mining algorithms that use the Gini index as the outlier measure factor, based on the classification characteristics of the attributes and the features of the data set. The main research work is as follows:
1) An outlier mining algorithm based on the Gini index is presented. The algorithm uses the Gini index to measure the impurity of the data set in order to describe the outlier degree. Because no parameters are input manually, the mining results are objective. Experimental results on UCI and spectrum data sets validate the feasibility and efficiency of the algorithm.
2) An outlier subspace and an outlier mining algorithm based on the weighted Gini index are presented. The outlier subspace and the attribute weighted vectors of the data set are obtained from the Gini index value on every dimension, and outliers are then mined using a statistical approach. Because no parameters are input manually, the influence of human factors on the mining result is avoided, and the algorithm can cope effectively with high-dimensional outlier mining. Experimental results on UCI and spectrum data sets validate the feasibility and efficiency of the algorithm.
Keywords/Search Tags: Outlier, Gini index, Outlier measure factor, Attribute weighted vectors, Outlier subspace
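
The abstract does not give the thesis's formulas, so the following is only a minimal sketch of the general idea, assuming the standard Gini impurity 1 - sum(p_k^2) computed over the distinct values of each attribute. The function names and the sum-to-one weighting are hypothetical illustrations, not the author's outlier measure factor or weighting scheme.

```python
from collections import Counter


def gini_index(values):
    """Standard Gini impurity 1 - sum(p_k^2) over the distinct values of one attribute."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def gini_per_attribute(rows):
    """Gini index of every column in a list of equal-length records."""
    columns = list(zip(*rows))
    return [gini_index(col) for col in columns]


def normalized_weights(ginis):
    """Hypothetical weighting (an assumption): scale per-attribute Gini values to sum to 1."""
    total = sum(ginis)
    if total == 0:
        return [0.0 for _ in ginis]
    return [g / total for g in ginis]


# Toy categorical data set with two attributes (illustrative only).
rows = [
    ("a", "x"),
    ("a", "x"),
    ("a", "y"),
    ("b", "y"),
]
ginis = gini_per_attribute(rows)      # one impurity value per dimension
weights = normalized_weights(ginis)   # illustrative attribute weight vector
```

Nothing here requires a user-supplied threshold or neighborhood size, which reflects the parameter-free motivation stated in the abstract. How the thesis turns these per-dimension values into an outlier subspace and a final outlier score is not specified there.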