Research And Implementation Of Clustering Algorithm For Multidimensional Data Sets

Posted on: 2005-02-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Wang
Full Text: PDF
GTID: 2168360122497723
Subject: Computer application technology
Abstract/Summary:
Clustering analysis plays a very important role in the theory and applications of data mining and is a fundamental method for partitioning or grouping data. A large number of data clustering algorithms have been proposed so far, and many have been applied successfully. Most of them, however, chiefly extend results from multivariate statistical analysis and fuzzy mathematics; that is, they are based on distances or thresholds. In practice, most clustering algorithms must be modified, or must incorporate some domain knowledge, before they can be applied, and many are not suitable for clustering multivariate mixed data. Clustering algorithms for multivariate mixed data are therefore potentially promising, especially in fields such as commercial decision-making, market analysis, criminal investigation, knowledge discovery, biology, and Web document classification. This thesis is a response to that challenge.
This thesis primarily studies three representative multidimensional data clustering algorithms: hard partition clustering, fuzzy C-means clustering, and possibilistic clustering. All of them require an appropriate threshold as a prerequisite. In practice no uniform criterion for choosing the threshold exists, which gives users more flexibility but also makes the results more arbitrary. This is the main defect of these kinds of clustering algorithms.
This thesis proposes a new clustering algorithm based on partitioning clustering. Its core consists of the following basic principles. Firstly, the "Concurrence Maximization Theory": two valid data sets are very similar if they share most of their central characteristics. Secondly, the "Class Label Minimization Theory": valid data sets are possibly similar if they share one vital or crucial attribute value. Thirdly, the "Membership Theory": the thesis uses the frequent itemset algorithm from association rule mining to judge which class a data set most likely belongs to. Lastly, the "Inherited Attribute Theory": an inheritance relation between two data sets can reflect some similarity. On this basis the thesis achieves semi-fuzzy C-means clustering.
Theoretical proof and experimental testing suggest that the new algorithm is potentially successful: it can obtain more valid classes than traditional K-means methods. The partitioning clustering algorithm clusters over the whole scope of the data, its performance is insensitive to the input order of the data sets, it clusters dynamically, it is efficient for data with multivariate attributes, and it minimizes the need for domain knowledge. This thesis is an exploratory contribution to the clustering theory of data mining, especially for multivariate mixed variables.
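For readers unfamiliar with the baseline named in the abstract, the following is a minimal sketch of standard textbook fuzzy C-means, not the semi-fuzzy algorithm proposed in the thesis, whose details the abstract does not give. The function name `fuzzy_c_means`, its parameters (`c`, `m`, `iters`, `tol`, `seed`), and the sample points are illustrative assumptions. It alternates the usual weighted-center update with the usual membership update, and the final line shows how soft memberships can be hardened into a partition.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy C-means (illustrative); returns (centers, memberships)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Center update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            weights = [u[i][k] ** m for i in range(n)]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij) ** (2 / (m - 1)).
        shift = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, ctr) for ctr in centers]
            if min(dists) == 0.0:
                # The point coincides with a center: give it full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [h / sum(hits) for h in hits]
            else:
                new_row = [
                    1.0 / sum((dk / dj) ** exponent for dj in dists)
                    for dk in dists
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row

        # Stop once no membership changed by more than the tolerance.
        if shift < tol:
            break
    return centers, u


# Hypothetical sample data: two well-separated groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.0), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, u = fuzzy_c_means(points, c=2)

# Hardening: assign each point to the cluster with its largest membership.
labels = [row.index(max(row)) for row in u]
print(labels)
```

The fuzzifier `m` controls how soft the memberships are: values close to 1 approach a hard partition, while larger values spread membership more evenly across clusters.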
Keywords/Search Tags: Data Mining, Clustering Analysis, Partitioning, Possibility Clustering, Semi-Fuzzy C-mean Clustering