Study And Implementation Of Clustering And Outlier Detection Algorithms

Posted on:2007-04-01

Degree:Master

Type:Thesis

Country:China

Candidate:J L Liu

GTID:2178360212467750

Subject:Computer application technology

Abstract/Summary:

Data mining is a decision-support approach that extracts hidden, previously unknown, and potentially useful knowledge and patterns from huge volumes of data. Clustering and outlier detection are important areas of data mining. To discover knowledge efficiently in very large datasets, data mining algorithms need excellent scalability and high clustering accuracy. Grid-based approaches handle massive low-dimensional datasets efficiently, but their efficiency is low on high-dimensional datasets. This thesis studies previous grid-based clustering approaches, analyzes their characteristics and suitability, and then proposes a clustering algorithm based on the CD-Tree, called CDT. Two pruning strategies are developed to further improve the efficiency of CDT. Extensive experiments on real and synthetic datasets show that CDT performs better than other grid-based clustering algorithms.

A new density-based algorithm is also proposed. It uses linear regression to find the boundaries where density changes, and obtains multi-level clusters by applying DBSCAN to the data objects within areas of the same density. Furthermore, by integrating DBSCAN with the outlier detection algorithm LOF, the algorithm can identify outliers while clustering. Experiments on real and synthetic datasets show the validity of the algorithms.

The clustering and outlier detection algorithms are integrated into the data mining system Scopeminer. The thesis introduces the data structures used in the system and the flow charts of the algorithms, and demonstrates the usage of the system with synthetic datasets.
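
To make the grid-based idea concrete, here is a minimal sketch of generic grid-based clustering in Python. It is not the thesis's CDT algorithm and does not build a CD-Tree; the function name grid_cluster and the parameters cell_size and min_pts are assumptions introduced only for illustration. Points are binned into equal-sized cells, cells holding at least min_pts points count as dense, adjacent dense cells are merged into clusters, and points left in sparse cells are flagged as outlier candidates (a crude stand-in for a proper outlier score such as LOF).

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_pts):
    """Cluster points by connecting adjacent dense grid cells.

    Returns one label per point: 0, 1, ... for clusters, and -1 for
    points in sparse cells (treated here as outlier candidates).
    All names and parameters are illustrative assumptions.
    """
    # Assign every point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # A cell is "dense" when it holds at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    # Offsets to all neighbouring cells (3^d - 1 of them in d dimensions).
    dims = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dims) if any(o)]

    labels = [-1] * len(points)
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        # Breadth-first search over adjacent dense cells forms one cluster.
        seen.add(start)
        queue = deque([start])
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    sample = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
              (10.1, 10.2), (10.3, 10.4), (10.2, 10.1),
              (5.5, 5.5)]
    print(grid_cluster(sample, cell_size=1.0, min_pts=3))
```

The number of neighbouring cells in this sketch grows as 3^d - 1 with the dimension d, which is one simple way to see why plain grid-based methods lose efficiency on high-dimensional data.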

Keywords/Search Tags:

Data Mining, Clustering Analysis, Outlier Detection, CD-Tree

Related items

1	Study On The Algorithms Of Clustering And Outlier Detection Based On Neighborhood
2	Study Of Clustering And Outlier Detection Algorithm In Data Mining
3	Research And Implementation Of Clustering And Outlier Detection Algorithms
4	Research And Application Of Outlier Detection Algorithm
5	Study On Outlier Mining Algorithms Based On Clustering
6	Based On Clustering Analysis Of The Outlier Detection Research And Its Application In The Audit
7	Outlier Detection Based Medicare Anomalous Data Mining
8	Patent Value Mining Based On Deep Clustering And Outlier Detection
9	Research And Application On Outlier Data Mining Algorithm In Large Data Set
10	Research On Data Preprocessing Methods Based On Clustering And Outlier Detection