Multi-Density Clustering And Outlier Recognition Algorithm Based On Grid Adjacency Relation

Posted on:2011-08-27

Degree:Master

Type:Thesis

Country:China

Candidate:G X Li


GTID:2178360305961073

Subject:Computer technology

Abstract/Summary:


Cluster analysis and outlier recognition are important branches of data mining. As these techniques are applied ever more widely in scientific research, market analysis, the life sciences, and many other disciplines, their importance is becoming increasingly evident. By studying the adjacency relations between grid units in the data space, this thesis proposes a novel clustering and outlier recognition method based on those relations. The main research work is as follows:

By analyzing the relation between grid division and the projection diversity of uniformly distributed data, the thesis presents a theorem relating grid division to data projection diversity, together with a diversity grid division method. The method handles the case in which the number of grid divisions is not an integer. Because it takes the data distribution into account and reduces redundant grid cells, the method is simple and practical. To determine the relationship between adjacent units, a diversity function based on the distance between centers of mass and on relative density is defined.

Outliers are data objects that deviate from the rest of the data. The thesis presents an outlier recognition algorithm based on grid adjacency relations (GAO), which compares whether the density of a unit is higher or lower than that of its neighborhood. Outliers and outlier units are identified by their degree of deviation, which is measured by the relative density of units and the distance between their centers of mass. Experimental results show that the algorithm effectively recognizes outliers in multi-density and large data sets, and that it is more efficient than the Cell-based algorithm.

The thesis also proposes a multi-density clustering algorithm based on grid adjacency relations (GAMD), which uses the distribution characteristics of the data within each unit, as reflected by the unit density and the center of mass. To determine unit boundaries, the algorithm measures the similarity between units by their relative density and the relative distance between their centers of mass. Outliers are recognized during the clustering process itself. A goodness-of-fit measure is proposed for evaluating clustering validity. Experimental results show that the algorithm effectively clusters data sets of arbitrary shape and multiple densities, and that the clustering results are independent of the order of the input data and of the units.
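The abstract does not state the diversity function, the merging rule, or the outlier criterion in closed form, so the Python sketch below only illustrates the general idea of clustering by grid adjacency; it is not GAMD or GAO themselves. It assumes equal-width cells (not the thesis's diversity grid division), a hypothetical `diversity` function that averages a relative-density term and a normalized center-of-mass distance term with weight `alpha`, and a simplified outlier rule: a cell that merges with no neighbor and holds fewer than `min_count` points is treated as an outlier unit. The names `build_grid`, `diversity`, `grid_cluster`, `max_div`, `min_count`, and `alpha`, and all default values, are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(points, cells_per_dim):
    """Assign points to equal-width cells; return members, (count, centroid) stats, widths."""
    dims = len(points[0])
    lo = [min(p[d] for p in points) for d in range(dims)]
    hi = [max(p[d] for p in points) for d in range(dims)]
    # Fall back to width 1.0 when a dimension has zero range.
    width = [(hi[d] - lo[d]) / cells_per_dim or 1.0 for d in range(dims)]
    members = defaultdict(list)
    for i, p in enumerate(points):
        key = tuple(min(int((p[d] - lo[d]) / width[d]), cells_per_dim - 1)
                    for d in range(dims))
        members[key].append(i)
    stats = {}
    for key, idx in members.items():
        centroid = tuple(sum(points[i][d] for i in idx) / len(idx) for d in range(dims))
        stats[key] = (len(idx), centroid)
    return members, stats, width


def diversity(a, b, diag, alpha=0.5):
    """Hypothetical diversity of two adjacent cells, in [0, 1]; 0 means very similar."""
    (n_a, c_a), (n_b, c_b) = a, b
    density_term = 1.0 - min(n_a, n_b) / max(n_a, n_b)
    # Centroids of two adjacent cells are at most 2 * diag apart.
    distance_term = math.dist(c_a, c_b) / (2.0 * diag)
    return alpha * density_term + (1.0 - alpha) * distance_term


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def grid_cluster(points, cells_per_dim=5, max_div=0.5, min_count=2):
    """Return one label per point; -1 marks points in outlier cells."""
    members, stats, width = build_grid(points, cells_per_dim)
    diag = math.sqrt(sum(w * w for w in width))
    parent = {key: key for key in stats}
    # All offsets to face-, edge- and corner-adjacent cells.
    offsets = [o for o in product((-1, 0, 1), repeat=len(width)) if any(o)]
    for key in stats:
        for off in offsets:
            nb = tuple(k + o for k, o in zip(key, off))
            if nb in stats and diversity(stats[key], stats[nb], diag) <= max_div:
                ra, rb = find(parent, key), find(parent, nb)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)
    components = defaultdict(list)
    for key in stats:
        components[find(parent, key)].append(key)
    labels = [-1] * len(points)
    next_label = 0
    # Label components in order of their smallest cell key, so labels
    # do not depend on the order in which points were supplied.
    for cells in sorted(components.values(), key=min):
        if len(cells) == 1 and stats[cells[0]][0] < min_count:
            continue  # sparse cell with no similar neighbour: outlier unit
        for c in cells:
            for i in members[c]:
                labels[i] = next_label
        next_label += 1
    return labels


pts = [(0, 0), (1, 1), (1.5, 0.5), (0.5, 1.5),      # cell (0, 0)
       (2.5, 0.5), (3, 1), (2.5, 1.5), (3.5, 1),    # cell (1, 0)
       (9, 9), (10, 10), (9.5, 8.5), (8.5, 9.5),    # cell (4, 4)
       (5, 9)]                                      # isolated cell (2, 4)
print(grid_cluster(pts))  # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, -1]
```

Because the merge step computes connected components of the cell adjacency graph, reordering the input points only reorders the output labels along with them. In this toy setting that mirrors the abstract's claim that the clustering result does not depend on input or unit order. The thresholds here were picked only to make the example readable.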

Keywords/Search Tags:

Clustering analysis, Grid division, Adjacent cells, Diversity function, Outlier, Goodness of fit


Related items

1	Research On Smart Grid Big Data Outlier Detection And Analysis Of Electricity Behavior Based On Density Peaks Clustering Algorithm
2	Research On Local Outlier Detection Algorithm
3	Based On Grid Clustering Algorithm And Isolated Points
4	Clustering-based And Density Outlier Detection Method
5	Based On NMF And Similarity Metric Function Outlier Detection
6	An Outlier-Analysis Algorithm Based On The Grid Model
7	Grid-based Clustering Algorithm Analysis And Research
8	Research On Data Stream Clustering Algorithm Based On Density Grid
9	Research On Intrusion Detection Based On Clustering And Outlier Detection
10	Distributed Grid Environment Outlier Mining Method