Study And Implementation Of Clustering And Outlier Detection Algorithms

Posted on: 2007-04-01
Degree: Master
Type: Thesis
Country: China
Candidate: J L Liu
Full Text: PDF
GTID: 2178360212467750
Subject: Computer application technology
Abstract/Summary:
Data mining is a decision-support approach that extracts hidden, previously unknown, and potentially useful knowledge and patterns from huge volumes of data. Clustering and outlier detection are important areas of data mining. To discover knowledge efficiently in very large datasets, data mining algorithms need excellent scalability and high clustering accuracy. Grid-based approaches handle massive low-dimensional datasets efficiently, but their efficiency is low on high-dimensional datasets. This thesis studies previous grid-based clustering approaches, analyzes their characteristics and applicability, and then proposes a clustering algorithm based on the CD-Tree, called CDT. Two pruning strategies are developed to further improve the efficiency of CDT. Extensive experiments on real and synthetic datasets show that CDT performs better than other grid-based clustering algorithms.

A new density-based algorithm is also proposed. It uses linear regression to find the boundaries where density changes, and obtains multi-level clusters by applying DBSCAN to the data objects within each region of similar density. Furthermore, by integrating DBSCAN with the outlier detection algorithm LOF, the algorithm can identify outliers during clustering. Experiments on real and synthetic datasets show the validity of the algorithms.

The clustering and outlier detection algorithms are integrated into the data mining system Scopeminer. The thesis introduces the data structures used in the system and the flow charts of the algorithms, and demonstrates how to use the system on synthetic datasets.
Keywords/Search Tags: Data Mining, Clustering Analysis, Outlier Detection, CD-Tree
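
The abstract does not describe the CD-Tree structure or the pruning strategies of CDT, so the following is not the thesis's algorithm. It is only a minimal sketch of the generic grid-based clustering idea the abstract builds on: partition the space into cells, keep the cells that hold enough points, and merge adjacent dense cells into clusters. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are hypothetical, chosen for illustration.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for points in sparse cells.

    Illustrative sketch only; cell_size and min_pts are assumed parameters,
    not values or names taken from the thesis.
    """
    # 1. Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # 2. A cell is "dense" if it holds at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # 3. Merge adjacent dense cells into clusters with a breadth-first search.
    cell_label = {}
    next_id = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            # Visit the 3^d cells around (and including) the current cell.
            for offset in product((-1, 0, 1), repeat=len(cell)):
                nb = tuple(c + o for c, o in zip(cell, offset))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_id
                    queue.append(nb)
        next_id += 1

    # 4. Points inherit the label of their cell; sparse cells stay at -1.
    labels = [-1] * len(points)
    for key, cid in cell_label.items():
        for idx in cells[key]:
            labels[idx] = cid
    return labels


points = [
    (0.1, 0.2), (0.3, 0.4), (0.2, 0.1),   # dense cell (0, 0)
    (1.1, 0.3), (1.2, 0.2), (1.4, 0.1),   # dense cell (1, 0), adjacent to (0, 0)
    (5.1, 5.2), (5.3, 5.4), (5.2, 5.1),   # dense cell (5, 5), isolated
    (9.5, 0.5),                           # lone point in a sparse cell
]
labels = grid_cluster(points, cell_size=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

In this sketch the neighbor search in step 3 visits 3^d cells for d-dimensional data, which gives one intuition for why plain grid methods become inefficient as dimensionality grows. Points left at -1 are merely in sparse cells; the thesis instead identifies outliers by combining DBSCAN with LOF, which this sketch does not attempt.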