Based On Grid Clustering Algorithm And Isolated Points

Posted on:2008-03-02

Degree:Master

Type:Thesis

Country:China

Candidate:C Zhang

Full Text:PDF

GTID:2208360215961240

Subject:Computer software and theory

Abstract/Summary:

The process of discovering interesting, useful, and previously unknown knowledge from very large databases is known as data mining. In response to specific user requests, data mining retrieves the information users need from massive datasets. Experts forecast that, with the development of computer technology and the accumulation of data, data mining will become a new industry in China within the next five to ten years.
Clustering analysis has become a highly active topic in data mining research. Its task is to group data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Outlier detection looks at the same problem from the opposite side: its objects of study are the small number of data points that deviate from the rest of a large dataset. In applications such as detecting criminal activities of various kinds (e.g. in electronic commerce), rare events, deviations from the majority, or exceptional cases may be more interesting and useful than the common cases.
Clustering and outlier detection complement each other: when clustering, we must decide how to handle outliers, and detecting outliers sometimes requires knowledge of the clusters. Clustering and outlier detection techniques are used to identify dense and isolated regions and, ultimately, to discover the overall distribution pattern of the data and interesting relationships among data attributes.
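To make the idea of separating dense regions from isolated ones concrete, here is a minimal sketch of a conventional single-grid clustering. It is not the overlapping-grid algorithm (OGBC) proposed in this thesis, whose details the abstract does not give; it only illustrates the traditional grid approach. The function name `grid_cluster` and the parameters `cell_size` and `density_threshold` are hypothetical, and the sketch assumes 2-D points, square cells, and 8-connectivity between neighbouring dense cells.

```python
from collections import defaultdict


def grid_cluster(points, cell_size, density_threshold):
    """Conventional single-grid clustering (illustrative sketch, not OGBC).

    Points are binned into square cells of width `cell_size`. Cells with at
    least `density_threshold` points are dense; dense cells that touch by an
    edge or corner are merged into one cluster. Points in sparse cells get
    label -1 and can be treated as candidate isolated points.
    """
    # Map each 2-D point to the integer coordinates of its grid cell.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)

    dense = {c for c, members in cells.items() if len(members) >= density_threshold}

    labels = [-1] * len(points)  # -1 marks a point in a sparse cell
    cluster_id = 0
    visited = set()
    for start in sorted(dense):  # sorted so cluster ids are deterministic
        if start in visited:
            continue
        # Flood-fill across neighbouring dense cells (8-connectivity).
        stack = [start]
        visited.add(start)
        while stack:
            cx, cy = stack.pop()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in visited:
                        visited.add(nb)
                        stack.append(nb)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [
        (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),  # cell (0, 0)
        (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),  # cell (1, 0), adjacent to (0, 0)
        (3.1, 3.1), (3.2, 3.5), (3.7, 3.3),  # cell (3, 3), separate region
        (5.5, 5.5),                          # lone point in a sparse cell
    ]
    print(grid_cluster(pts, cell_size=1.0, density_threshold=3))
    # prints [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

The two adjacent dense cells merge into one cluster, the distant dense cell forms a second, and the lone point is labelled -1. Because cluster boundaries are forced onto fixed cell edges, the result of a single grid depends on `cell_size` and on where the grid happens to be placed.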
Today, clustering and outlier detection are widely applied in fields such as pattern recognition, data mining, machine learning, spatial database technology, biology, intrusion detection, and weather forecasting, where they have achieved great success and created considerable value.
Based on an analysis of traditional grid-based clustering algorithms, and in order to remedy their defects, we propose a clustering algorithm based on overlapping grids (OGBC) and conduct a series of experiments, including experiments on the correctness of grid clustering and experiments on synthetic and real datasets. The experimental results show that OGBC identifies clusters of arbitrary shape and size accurately and efficiently, and outperforms the other algorithms compared in both performance and precision.
At the same time, after analyzing the shortcomings in performance and accuracy of existing density-based outlier detection algorithms, this thesis presents a new outlier detection algorithm based on a local deviation coefficient factor. The results show that, compared with density-based outlier detection algorithms of the same type, the proposed algorithm has a significant advantage in both performance and quality.
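The abstract compares the proposed local deviation coefficient factor against existing density-based outlier detectors without defining either. For reference, the following is a minimal brute-force sketch of the standard Local Outlier Factor (LOF), a widely used density-based method; it is not the thesis's algorithm. The function names are hypothetical, and the sketch assumes distinct points (so no reachability distance is zero) and computes all pairwise distances, which is only practical for small datasets.

```python
import math


def k_distance_neighbors(points, i, k):
    """Return (k-distance of point i, indices of its k-nearest neighbours).

    Points tied at the k-distance are all included, as in the LOF definition.
    """
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    k_dist = dists[k - 1][0]
    neighbors = [j for d, j in dists if d <= k_dist]
    return k_dist, neighbors


def local_outlier_factor(points, k):
    """Brute-force LOF sketch; assumes distinct points and len(points) > k."""
    n = len(points)
    info = [k_distance_neighbors(points, i, k) for i in range(n)]
    k_dist = [kd for kd, _ in info]
    neigh = [nb for _, nb in info]

    def reach_dist(a, b):
        # reach-dist_k(a, b) = max(k-distance(b), d(a, b))
        return max(k_dist[b], math.dist(points[a], points[b]))

    # Local reachability density: inverse of the mean reachability distance
    # from a point to its neighbours.
    lrd = [
        1.0 / (sum(reach_dist(i, j) for j in neigh[i]) / len(neigh[i]))
        for i in range(n)
    ]

    # LOF: mean of the neighbours' lrd divided by the point's own lrd.
    # Values near 1 mean "as dense as its neighbourhood"; much larger
    # values mean the point sits in a sparser region than its neighbours.
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i]) for i in range(n)]


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    scores = local_outlier_factor(data, k=2)
    # The four square corners score 1.0; (10, 10) scores well above 1.
    print([round(s, 2) for s in scores])
```

Plain LOF compares each point's density only with that of its k nearest neighbours, and the brute-force neighbour search here is quadratic in the number of points, so it serves only as a baseline for the kind of density-based method the abstract describes.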

Keywords/Search Tags:

Data Mining, Clustering Analysis, Outlier Detection, Overlapping Grid, Local Deviation Ratio, Local Deviation Coefficient

Related items

1	Research On Outlier Mining Method Based On Deviation Characteristic
2	Study On Local Outlier Detection Algorithm Based On Multi-clustering
3	Study On Overlapping Community Detection Based On Local Expansion And Optimization
4	Property Analysis-based Local Outlier Mining Algorithm And Its Application
5	Fog-degraded Image Restoration Using Local Mean Value And Standard Deviation
6	Research On Local Outlier Detection Algorithm
7	The Local Outlier Mining Algorithm Based On Conditional Cumulative Holoentropy And Global Neighbourhood
8	Research Of Local Image Enhancement Algorithm
9	Outlier Mining Algorithm Research And Application
10	An Outlier Detection Of Hubness Algorithm Based On Density Deviation