
K-distance-based Outliers And Clustering Algorithm

Posted on: 2007-08-22
Degree: Master
Type: Thesis
Country: China
Candidate: C K Jia
Full Text: PDF
GTID: 2208360185971217
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of discovering interesting, useful and previously unknown knowledge from very large databases. Also known as knowledge discovery in databases (KDD), it is one of the most active fields of database research. It aims to extract trustworthy, novel, useful and readable knowledge, rules or abstract information from very large databases, which gives stored data a significant new role in the information age.
With the rapid development of data mining techniques, clustering analysis and outlier detection, as important parts of data mining, are widely applied in fields such as pattern recognition, data analysis, image processing and market research. Research on clustering analysis and outlier detection algorithms has become a highly active topic in data mining research.
In this thesis, the author introduces the theory of data mining and analyzes clustering and outlier detection algorithms in depth. Based on an analysis of density-based clustering and outlier algorithms, we present the Local Outlier Coefficient (LOC), K-Distance Factor (KDF) and Enhanced K-Distance Factor (EKDF) outlier algorithms and the Local Outlier Coefficient-Based Clustering (LOCBC) algorithm.
We implemented the LOC, KDF, EKDF, LOCBC, LOF and RDNKNN algorithms in Visual C++ 6.0 and conducted a series of experiments. The correctness of the outlier algorithms was verified on synthetic datasets and a real database, and their efficiency on synthetic datasets. The correctness of the clustering algorithm was verified on a synthetic dataset, a real database and a dataset with varying density, and its efficiency on synthetic datasets. The experimental results show that the LOCBC, LOC, KDF and EKDF algorithms cluster correctly and discover outliers.
The proposed clustering and outlier algorithms outperform RDBKNN and LOF in response time, clustering precision and outlier detection. To sum up, the LOCBC algorithm can cluster not only databases with uniform density but also databases with multiple densities. The LOCBC algorithm can not...
Keywords/Search Tags: data mining, cluster algorithms, outliers detection, k-distance of p, k-distance neighbourhood
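The keywords "k-distance of p" and "k-distance neighbourhood" are the building blocks of density-based outlier measures such as LOF. The sketch below illustrates them as they are commonly defined in the LOF literature. It is not the thesis's LOC, KDF or EKDF algorithm, whose definitions are not given in this abstract. The function names, the brute-force search and the sample points are assumptions made for illustration only.

```python
import math


def k_distance(points, p_index, k):
    """Distance from points[p_index] to its k-th nearest other point.

    Brute-force illustration; names and approach are assumptions,
    not taken from the thesis.
    """
    p = points[p_index]
    # Distances from p to every other point, smallest first.
    dists = sorted(
        math.dist(p, q) for i, q in enumerate(points) if i != p_index
    )
    if not 1 <= k <= len(dists):
        raise ValueError("k must be between 1 and len(points) - 1")
    return dists[k - 1]


def k_distance_neighbourhood(points, p_index, k):
    """Indices of all other points no farther from p than its k-distance.

    Ties are kept, so the result can hold more than k points.
    """
    kd = k_distance(points, p_index, k)
    p = points[p_index]
    return [
        i for i, q in enumerate(points)
        if i != p_index and math.dist(p, q) <= kd
    ]


# Hypothetical sample: four points on a unit square plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
```

For the sample above, the distant point at index 4 has a much larger 1-distance than the points on the square, which is the kind of signal density-based outlier scores build on. The neighbourhood of point 0 for k = 1 contains two points because both lie at exactly the same distance.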