Data Mining, Cluster Analysis Algorithm

Posted on:2007-10-24

Degree:Master

Type:Thesis

Country:China

Candidate:X Wang

Full Text:PDF

GTID:2208360182996977

Subject:Computer software and theory

Abstract/Summary:


With the widespread use of information technology, information systems generate more and more data. How to use this huge volume of raw data effectively to analyse the current situation and predict future trends has become a major challenge. Data mining emerged in response and has developed rapidly, a natural consequence of the growing gap between rapidly increasing data and scarce usable information.

Data mining, also called knowledge discovery in databases (KDD), is the process of extracting credible, novel, effective and understandable patterns from databases. It is a relatively young research and application area built on database techniques, and it draws on results from several disciplines, such as logic statistics, machine learning, fuzzy theory and visual computing, in order to acquire usable information from databases. It has attracted increasing attention in recent years and has been applied in finance, insurance, public utilities, government, education, telecommunications, banking software development, transportation, etc.

Cluster analysis is an important technology in data mining. Clustering, an unsupervised classification method, is the process of grouping similar multi-dimensional data vectors into a number of clusters or bins. Clustering is always carried out without prior knowledge, so the main research task is how to obtain a clustering result under this premise. Most research on clustering focuses on clustering algorithms, with the main purpose of producing practical algorithms with better performance. Many clustering algorithms have been presented so far, but each suits only particular problems and users. Furthermore, they are imperfect both theoretically and methodologically, and some have severe faults. Further optimizing clustering algorithms will help not only to perfect their theory but also to promote their popularization and application.

This dissertation systematically studies and analyses data mining techniques, especially cluster analysis. The main contents are as follows:

(1) A brief description of data mining techniques. This paper introduces the basic concepts, classes, main functions, key techniques and typical applications.

(2) Research on data mining tools. The paper introduces the common data mining tools, analyses and compares the functions of leading overseas data mining tools (SPSS and DBMiner) on real examples, and gives a conclusion.

(3) A brief description of cluster analysis. The paper analyses the clustering methods and the representative clustering algorithms, puts forward the typical requirements of clustering and compares the common clustering algorithms, so that readers can easily find a clustering method that suits a particular problem.

(4) An improved K-Means algorithm is proposed. This paper analyses the Nearest Neighbors Absorbed First (NNAF) clustering algorithm. This algorithm can cluster noisy data quickly. However, clustering quality degrades when cluster density and the distances between clusters are uneven. In this paper a Nearest-Neighbors-First clustering algorithm based on data partitioning is proposed. Theoretical analysis and experiments show that the new algorithm improves the quality of clustering.
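
For readers unfamiliar with the K-Means baseline mentioned in item (4), the following is a minimal sketch of the standard (Lloyd's) K-Means procedure, written in Python. It is not the NNAF algorithm or the data-partitioning variant proposed in the thesis, whose details the abstract does not give. The function name `kmeans`, its parameters, and the choice to pass the initial centroids explicitly are assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def kmeans(points, centroids, max_iter=100):
    """Standard Lloyd's K-Means (illustrative sketch, not the thesis's method).

    points:    list of equal-length numeric tuples
    centroids: list of initial centroid tuples (one per cluster)
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # converged: centroids no longer change
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    labels, centers = kmeans(data, centroids=[(0, 0), (10, 10)])
    print(labels)
    print(centers)
```

Because every point is assigned by distance to a centroid alone, a sketch like this is sensitive to noise and to the initial centroids, which is the kind of limitation that motivates nearest-neighbor-based alternatives.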

Keywords/Search Tags:

Data Mining, Clustering Analysis, data partitioning, nearest neighbor first


Related items

1	Study On Continuous Attributes Discretization Algorithm Based On The Nearest Neighbor-clustering
2	Study On Space Partitioning-based Optimized Clustering Algorithms And Related Techniques
3	An Uncertain Continuous Nearest-Neighbor Query Based On The Conceptual Partitioning
4	Improved K-nearest Neighbor Algorithm And Its Application In Text Analysis
5	Natural Neighbor:The Concepts And Applications In Data Mining
6	Customer Transaction Data Clustering Analysis And Parallelism Based On Shared Nearest Neighbor
7	Research On K Nearest Neighbor Query Over Encrypted Data
8	Clustering Incomplete Data Using Pseudo Nearest Neighbor And Interval-valued Distance
9	Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets
10	Research On Fast Density Clustering Algorithm Based On Nearest Neighbor Query Technology