Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets

Posted on:2005-02-22

Degree:Master

Type:Thesis

Country:China

Candidate:H Wang

Full Text:PDF

GTID:2168360122497723

Subject:Computer application technology

Abstract/Summary:

Cluster analysis plays an important role in both the theory and the applications of data mining, and it is a fundamental method for partitioning or grouping data. Many data clustering algorithms have been proposed to date, and many have been applied successfully. Most of them, however, mainly extend results from multivariate statistical analysis and fuzzy mathematics; that is, they are algorithms based on distance or on a threshold. In practice, most clustering algorithms must be modified or supplemented with domain knowledge, and many are unsuitable for clustering multivariate mixed data. Clustering algorithms for multivariate mixed data are therefore promising, especially in fields such as commercial decision-making, market analysis, criminal investigation, knowledge discovery, biology, and Web document classification. This thesis is written against that background.

The thesis primarily studies three representative multidimensional data clustering algorithms: hard partition clustering, fuzzy C-means clustering, and possibilistic clustering. All of them require an appropriate threshold as a prerequisite. In fact, no uniform criterion for the threshold exists, which gives users more flexibility but also makes the results more arbitrary. This is a defect of these kinds of clustering algorithms.

The thesis proposes a new clustering algorithm based on partitioning clustering. Its core consists of the following basic principles. First, the "Concurrence Maximization Theory": two valid data sets are very similar if they share most of their central characteristics. Second, the "Class Label Minimization Theory": valid data sets are possibly similar if they share one vital or crucial attribute value. Third, the "Membership Theory": the thesis uses the frequent-itemset algorithm from association rule mining to judge which class a data set most probably belongs to. Lastly, the "Inherited Attribute Theory": an inheritance relation between two data sets can reflect some similarity. On this basis the thesis achieves semi-fuzzy C-means clustering.

Theoretical proof and experimental testing indicate that the new algorithm is potentially successful and can obtain more valid classes than traditional K-means methods. The partitioning clustering algorithm clusters over the whole scope of the data; its performance is insensitive to the input order of the data sets; it obtains dynamic clustering; it is efficient for clustering data with multivariate attributes; and it minimizes the domain knowledge required. This thesis is an exploratory contribution to the clustering theory of data mining, especially for multivariate mixed variables.
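The "Concurrence Maximization Theory" can be illustrated with a small sketch. The abstract does not give the thesis's actual similarity measure, so the code below is only one possible reading: it treats each record as a tuple of mixed-type attribute values and counts the fraction of positions where two records agree exactly. The function names, the sample records, and the interpretation of "most" as "more than half" (the `min_fraction` default) are all assumptions made for illustration.

```python
from typing import Hashable, Sequence


def shared_attribute_fraction(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of attribute positions at which two records hold the same value.

    Hypothetical helper: works on mixed categorical/numeric values because it
    only tests equality and never computes a distance.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    if not a:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def are_similar(a: Sequence[Hashable], b: Sequence[Hashable],
                min_fraction: float = 0.5) -> bool:
    """Assumed reading of 'share most characteristics': more than half agree."""
    return shared_attribute_fraction(a, b) > min_fraction


# Made-up example records: (colour, size, count, region)
r1 = ("red", "large", 3, "north")
r2 = ("red", "large", 5, "north")
r3 = ("blue", "small", 3, "south")

fraction_12 = shared_attribute_fraction(r1, r2)  # three of four attributes agree
fraction_13 = shared_attribute_fraction(r1, r3)  # one of four attributes agrees
```

Because the comparison uses equality only, it applies to mixed categorical and numeric attributes without a distance function, which is the kind of data the abstract targets. It still depends on a cutoff, so it should not be taken as the thesis's method.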

Keywords/Search Tags:

Data Mining, Clustering Analysis, Partitioning, Possibility Clustering, Semi-Fuzzy C-mean Clustering

Related items

1	Semi Supervised Clustering Algorithm And Its Application And Research
2	Technology Research, Data Mining Based On Fuzzy Clustering
3	Novel Fuzzy Clustering Algorithms And Applications
4	Research On Clustering Ensemble And Semi-Supervised Clustering In Data Mining
5	Research On Subspace Fuzzy Clustering Algorithm Driven By Viewpoint
6	Study On Space Partitioning-based Optimized Clustering Algorithms And Related Techniques
7	Research On Fuzzy Clustering Analysis In Data Mining
8	Research On Blocking Fuzzy Clustering Algorithm Based On Density Of Samples
9	Research On Outlier Detection Based On Possibilistic Fuzzy Clustering
10	Research On Clustering Algorithms In Traffic Domain