
Grid-Based DBSCAN and Cluster Boundary Detection Techniques

Posted on: 2008-04-20
Degree: Master
Type: Thesis
Country: China
Candidate: F Zhang
Full Text: PDF
GTID: 2208360215460485
Subject: Computer software and theory
Abstract/Summary:
With the widespread use of information technology, information systems generate ever more data. Using this mass of raw data to analyze the current situation and make predictions effectively has become a great challenge, and data mining emerged to meet that demand. Data mining, also known as knowledge discovery in databases (KDD), is the process of discovering interesting, useful, and previously unknown knowledge from very large databases, and it is one of the most active fields in database research. It aims to extract trustworthy, novel, useful, and readable knowledge, rules, or abstract information from very large databases, which gives stored data a significant new role in the information age. With the rapid development of data mining techniques, clustering analysis and boundary pattern detection, both important parts of data mining, are widely applied in fields such as pattern recognition, data analysis, image processing, and market research. Research on clustering analysis and boundary pattern detection algorithms has become a highly active topic in data mining.

Clustering is an important data mining task. As an unsupervised classification method, it groups similar multi-dimensional data vectors into a number of clusters, aiming to maximize the similarity between objects within the same cluster and minimize the similarity between objects in different clusters. The traditional density-based clustering method DBSCAN adapts to datasets of arbitrary shape but has high computational complexity; traditional grid-based clustering methods are efficient but have low precision. This paper proposes GbDBSCAN (Grid-based DBSCAN), which combines the merits of both approaches and adopts grid and data binning techniques to improve efficiency. It can also identify and handle border points.
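The abstract does not give the details of GbDBSCAN, so the following is only a minimal sketch of the general idea of combining a grid with DBSCAN, not the thesis's algorithm. It assumes 2-D points and a square cell whose side equals the neighborhood radius `eps`, so that every neighbor of a point lies in the 3x3 block of cells around it. The function names (`build_grid`, `region_query`, `grid_dbscan`) are hypothetical.

```python
from collections import defaultdict, deque
from math import floor

NOISE = -1

def build_grid(points, eps):
    """Bin each 2-D point index into a square cell of side eps (assumed cell size)."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / eps), floor(y / eps))].append(i)
    return grid

def region_query(points, grid, eps, i):
    """Return indices within eps of point i, scanning only the 3x3 block of cells."""
    x, y = points[i]
    cx, cy = floor(x / eps), floor(y / eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for j in grid.get((cx + dx, cy + dy), ()):
                px, py = points[j]
                if (px - x) ** 2 + (py - y) ** 2 <= eps * eps:
                    found.append(j)
    return found

def grid_dbscan(points, eps, min_pts):
    """Standard DBSCAN labeling, with the neighbor search restricted by the grid."""
    grid = build_grid(points, eps)
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, grid, eps, i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, grid, eps, j)
            if len(j_neighbors) >= min_pts:  # j is a core point, keep expanding
                queue.extend(j_neighbors)
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = grid_dbscan(points, eps=1.5, min_pts=3)
```

In this sketch the grid only limits which points are compared during each neighborhood query; the clustering rule itself is plain DBSCAN, which is why the labels should match those of an unaccelerated run.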
Experimental results show that GbDBSCAN is much more efficient than DBSCAN in low-dimensional data spaces, without lowering the clustering quality of DBSCAN.

Boundary pattern detection plays an important role in everyday applications and is important for data mining. To detect the boundary points of clusters effectively, we propose an algorithm named BOURN (Boundary Pattern Detection based on Statistics Information). BOURN sets the neighborhood radius based on the k-dist statistics of the objects in the whole dataset, and searches for boundary points based on the k-dist statistics of the neighbors in the neighborhood around each object. Experiments show that BOURN can find the boundary points of clusters of arbitrary shape, different sizes, and different densities, and can remove noise effectively.
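The abstract does not state which k-dist statistics BOURN uses or how it decides that a point is a boundary point, so the sketch below only illustrates the two ingredients it names. It assumes that k-dist means the distance from a point to its k-th nearest other point, that the global radius is the mean k-dist over the dataset, and that a point is compared with its neighbors through the ratio of its own k-dist to their mean k-dist. The radius rule, the ratio, and the function names are all hypothetical and are not taken from the thesis.

```python
from math import dist
from statistics import mean

def k_dist(points, k):
    """Distance from each point to its k-th nearest other point (brute force)."""
    result = []
    for i, p in enumerate(points):
        others = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(others[k - 1])
    return result

def neighborhood_ratio(points, k):
    """Return an assumed global radius and a per-point k-dist ratio."""
    kd = k_dist(points, k)
    radius = mean(kd)  # assumption: radius = mean k-dist of the whole dataset
    ratios = []
    for i, p in enumerate(points):
        # k-dist values of the neighbors that fall inside the radius around p
        nbr_kd = [kd[j] for j, q in enumerate(points)
                  if j != i and dist(p, q) <= radius]
        # assumption: a point with no neighbors gets an infinite ratio
        ratios.append(kd[i] / mean(nbr_kd) if nbr_kd else float("inf"))
    return radius, ratios

points = [(0, 0), (1, 0), (2, 0), (10, 0)]
radius, ratios = neighborhood_ratio(points, k=1)
```

Under these assumptions, a point whose ratio is far above 1 sits in a sparser region than its neighbors and could be treated as a boundary or noise candidate; any threshold for that decision would have to come from the thesis itself.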
Keywords/Search Tags:Clustering, density, boundary