
Analysis And Research On Density-based Local Outlier Detection

Posted on: 2013-05-26
Degree: Master
Type: Thesis
Country: China
Candidate: C M Jie
GTID: 2248330362974269
Subject: Computer software and theory
Abstract/Summary:
In recent years, computer science and database technology have advanced rapidly, and data mining techniques have developed quickly and are now widely used in many fields. Data mining has been defined as the non-trivial process of identifying valid, novel, potentially useful, previously unknown and ultimately understandable information and knowledge from large amounts of incomplete and noisy data. From a knowledge-discovery standpoint, outliers are often more interesting than ordinary objects, because they carry useful information about the abnormal behavior behind them. As an important branch of data mining, outlier detection aims to find exceptional objects that do not fit the common patterns of a dataset, or that deviate greatly from the rest of its objects under some measure. Outlier detection currently has many important applications, such as fraud analysis of telecommunication bills, credit card fraud detection, network attack detection, pharmaceutical testing, video surveillance and extreme weather prediction.

Outlier mining was first used in statistical analysis, which gave rise to outlier detection algorithms based on statistical distributions. Since then, researchers have proposed a variety of classic outlier detection algorithms, including deviation-based, distance-based, cluster-based and density-based algorithms. To a certain extent, these algorithms achieve good results in the domains they suit; however, they have shortcomings in some respects, such as low detection efficiency, low detection accuracy, parameter settings that depend on the user's prior knowledge, and poor applicability to high-dimensional datasets. In this dissertation, many existing algorithms are studied and discussed in detail, and an improved density-based local outlier detection algorithm named SSMOD is proposed to address the shortcomings of existing methods.
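The distance-based family mentioned above can be illustrated with one common formulation, which scores each object by the distance to its k-th nearest neighbour. The following is a minimal brute-force Python sketch under that assumption. The function name `knn_distance_scores`, the parameter `k` and the toy data are illustrative choices and are not taken from the dissertation.

```python
import math


def knn_distance_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbour.

    One common distance-based outlier score (an illustrative choice, not the
    dissertation's method): a larger score means the point is farther from
    its neighbours. Brute force, O(n^2) distance computations.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, in ascending order.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical 1-D toy data: a dense cluster (0..3), a point just outside it
# (10), and a sparse cluster (100..160).
data = [(0,), (1,), (2,), (3,), (10,), (100,), (120,), (140,), (160,)]
scores = knn_distance_scores(data, k=2)
print(scores)  # [2.0, 1.0, 1.0, 2.0, 8.0, 40.0, 20.0, 20.0, 40.0]
```

Because this score uses a single global scale, the point at 10 is clearly isolated relative to its dense neighbours. Even so, it scores lower than every member of the sparse cluster, so no single threshold flags it without also flagging them. This is the kind of accuracy problem that density-based local methods are designed to address.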
Specifically, the work of this dissertation mainly covers the following aspects:
① The background and significance of outlier mining are analyzed in depth, and research advances in China and abroad are surveyed.
② Several classic outlier mining algorithms are analyzed systematically and comprehensively, with emphasis on statistical-distribution-based, depth-based, cluster-based and distance-based algorithms, and the advantages and disadvantages of these classical algorithms are compared. Hotspots and development trends in outlier mining are briefly introduced in the last part of Chapter 2.
③ Building on existing classical algorithms such as LOF and NDOD, a new improved outlier detection algorithm based on a square symmetric neighborhood and a memory effect is proposed. The new algorithm adopts a new measure of the outlier-ness of objects, which greatly reduces the algorithm's complexity.
④ A comprehensive performance evaluation of the new algorithm is carried out through theoretical analysis and experimental tests. The dissertation also explores the impact of the initial parameters on the outlier detection results, and Chapter 4 gives a preliminary discussion of how the algorithm performs on low-dimensional and high-dimensional datasets.
⑤ Based on this work, some brief predictions about future research on outlier mining techniques are made.

The new improved density-based outlier detection algorithm is evaluated experimentally on a synthetic dataset and the KDD CUP 1999 dataset. Experimental results indicate that SSMOD is not only computationally efficient but also achieves higher detection accuracy than the LOF and NDOD algorithms.
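LOF, one of the baselines named above, measures how much lower an object's local density is than that of its neighbours. The abstract does not define SSMOD's square symmetric neighborhood or its memory-effect measure. For that reason, the sketch below implements only standard LOF (Breunig et al., 2000) as a brute-force Python illustration. The function name `lof_scores` and the toy data are hypothetical, and the sketch assumes there are no duplicate points.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor, brute-force O(n^2) sketch.

    Assumes no duplicate points (duplicates can make the local reachability
    density infinite). A score near 1 means "about as dense as its
    neighbours"; scores well above 1 indicate local outliers.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < number of points")
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance, neighbours = [], []
    for i in range(n):
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        kd = others[k - 1][0]  # distance to the k-th nearest neighbour
        k_distance.append(kd)
        # k-distance neighbourhood: all points within kd (ties included).
        neighbours.append([j for d, j in others if d <= kd])

    def reach_dist(i, j):
        # Reachability distance of point i with respect to neighbour j.
        return max(k_distance[j], dist[i][j])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        nbrs = neighbours[i]
        lrd.append(len(nbrs) / sum(reach_dist(i, j) for j in nbrs))

    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
            for i in range(n)]


# Hypothetical 1-D toy data: a dense cluster, a nearby isolated point (10),
# and a sparse cluster.
data = [(0,), (1,), (2,), (3,), (10,), (100,), (120,), (140,), (160,)]
print([round(s, 3) for s in lof_scores(data, k=2)])
# [1.0, 1.0, 1.0, 1.0, 5.0, 1.0, 1.0, 1.0, 1.0]
```

On this data, the point at 10 receives an LOF of about 5, while every cluster member receives 1, even though the sparse cluster is far less dense in absolute terms. Rerunning with a different `k` changes the neighbourhoods and therefore the scores. This is the kind of parameter sensitivity the dissertation examines.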
Keywords/Search Tags:Outliers Detection, Square Symmetric Neighborhood, Memory-effect, Density, Local Outlier Degree