Grid-Based Clustering Algorithms and Outlier Detection

Posted on: 2008-03-02
Degree: Master
Type: Thesis
Country: China
Candidate: C Zhang
GTID: 2208360215961240
Subject: Computer software and theory
Abstract/Summary:
Data mining is the process of discovering interesting, useful, and previously unknown knowledge from very large databases. Given a specific request, data mining can retrieve the required information from a large dataset. Experts forecast that, with the development of computer technology and the accumulation of data, data mining will become a new industry in China within five to ten years.
Clustering analysis has become a highly active topic in data mining research. Its task is to group data into classes or clusters so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Viewed from the opposite direction, clustering analysis becomes outlier detection, whose objects of study are the small number of data points that deviate from the rest of a large dataset. For applications such as detecting criminal activities of various kinds (e.g. in electronic commerce), rare events, deviations from the majority, or exceptional cases may be more interesting and useful than the common ones.
Clustering and outlier detection complement each other: clustering must decide how to deal with outliers, and outlier detection sometimes needs knowledge about the clusters. Both techniques are used to identify dense or isolated regions and, ultimately, to find distribution patterns and interesting relationships among data attributes. Today, clustering and outlier detection are widely applied in fields such as pattern recognition, data mining, machine learning, spatial database technology, biology, intrusion detection, and weather forecasting, where they have been highly successful and created great value.
Based on an analysis of traditional grid clustering algorithms, and in order to resolve their defects, we propose an algorithm based on overlapping grids and have conducted a series of experiments, including an experiment on the correctness of grid clustering and experiments on synthetic and real datasets. The experimental results show that the OGBC algorithm identifies clusters of arbitrary shape and size well and efficiently, and outperforms other algorithms in both performance and precision.
In addition, after analyzing existing density-based outlier detection algorithms and their shortcomings in performance and accuracy, this thesis presents a new outlier detection algorithm based on the local deviation coefficient. The results show that this algorithm has a clear advantage in performance and quality over density-based outlier detection algorithms of the same type.
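The abstract does not describe how the OGBC algorithm or the local deviation coefficient work, so the following is only a minimal sketch of plain (non-overlapping) grid-based clustering, to illustrate the general idea the thesis builds on. The function name `grid_cluster` and the parameters `cell_size` and `min_pts` are assumptions made for illustration and do not come from the thesis. Points are binned into grid cells, cells holding at least `min_pts` points are treated as dense, adjacent dense cells are merged into clusters, and points in sparse cells are labelled -1 as outlier candidates.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_pts):
    """Label each point with a cluster id, or -1 if it lies in a sparse cell.

    Illustrative sketch only: a plain fixed grid, not the overlapping-grid
    scheme proposed in the thesis.
    """
    # Group point indices by the integer coordinates of their grid cell.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # A cell is "dense" when it holds at least min_pts points.
    dense = {key for key, members in cells.items() if len(members) >= min_pts}

    labels = [-1] * len(points)
    dim = len(points[0]) if points else 0
    # Offsets to every neighbouring cell (excluding the cell itself).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    cluster_id = 0
    visited = set()
    for start in sorted(dense):
        if start in visited:
            continue
        # Breadth-first search over adjacent dense cells forms one cluster.
        queue = deque([start])
        visited.add(start)
        while queue:
            cell = queue.popleft()
            for idx in cells[cell]:
                labels[idx] = cluster_id
            for off in offsets:
                nb = tuple(c + o for c, o in zip(cell, off))
                if nb in dense and nb not in visited:
                    visited.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
           (1.1, 0.2), (1.3, 0.4), (1.2, 0.1),
           (5.0, 5.0), (5.2, 5.1), (5.1, 5.3),
           (9.5, 0.5)]
    print(grid_cluster(pts, cell_size=1.0, min_pts=3))
    # prints [0, 0, 0, 0, 0, 0, 1, 1, 1, -1]
```

A fixed grid like this can split a cluster whose points straddle a cell boundary into cells that each fall below `min_pts`. That is one commonly cited weakness of grid methods, though the abstract does not say which defects the thesis targets.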
Keywords/Search Tags: Data Mining, Clustering Analysis, Outlier Detection, Overlapping Grid, Local Deviation Ratio, Local Deviation Coefficient