
Research On Clustering And Classification Algorithm Based On Rough Set And Inclusion Degree

Posted on: 2016-09-17
Degree: Master
Type: Thesis
Country: China
Candidate: M Zhang
Full Text: PDF
GTID: 2308330461975295
Subject: Computer technology
Abstract/Summary:
Cluster analysis is an important data mining technique. It partitions a dataset of data objects into classes such that objects in the same class are highly similar to one another and objects in different classes are dissimilar. Cluster analysis is now widely used in machine learning, document classification, image processing, pattern recognition, decision-making, data compression, and other fields. The main families of clustering algorithms are: partitional clustering, represented by K-means and Rough K-means; hierarchical clustering, represented by CURE, AGNES, DIANA, ROCK, Chameleon, and BIRCH; hybrid clustering; and density-based clustering, represented by DBSCAN and OPTICS. Among these, K-means is the classical partitional algorithm. It is widely used because it is easy to understand, efficient, and able to handle large datasets. It also has disadvantages: the result depends heavily on the initial choice of cluster centers, the results are unstable, the algorithm easily falls into a local optimum, and only globular clusters can be found. In addition, it does not account for the fuzzy class membership of objects on cluster boundaries, nor does it handle datasets that combine numerical and categorical (classification) attributes, which limits its practical value. Many specific improvements to K-means have been proposed in the literature, and a number of them have proved useful, but each has some limitations.
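The dependence of K-means on its initial centers, noted above, can be illustrated with a minimal sketch. This is plain textbook K-means in standard-library Python, not the thesis's improved algorithm; the four data points and both sets of starting centers are hypothetical values chosen only for illustration.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain K-means (Lloyd-style iteration) from the given initial centers."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        new_centers = []
        for center, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centers.append(center)  # leave an empty cluster's center in place
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return centers, clusters

def sse(centers, clusters):
    """Sum of squared distances from every point to its cluster center."""
    return sum(math.dist(p, c) ** 2
               for c, members in zip(centers, clusters)
               for p in members)

# Hypothetical data: two tight pairs of points, far apart along the x axis.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Start A: one center near each pair. Converges to the left/right split.
good = kmeans(points, [(0, 0), (4, 1)])

# Start B: both centers between the pairs. Converges immediately to a
# top/bottom split, which is a stable but much worse solution.
bad = kmeans(points, [(2, 0), (2, 1)])

print("SSE from start A:", sse(*good))
print("SSE from start B:", sse(*bad))
```

Both runs converge, but the second stops at a local optimum with a much larger within-cluster error, which is the kind of behavior that motivates better initialization strategies.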
At the same time, for datasets that have a classification attribute, decision tree classification such as ID3 is more useful than traditional clustering. Traditional ID3 also has disadvantages: a bias toward attributes with many values, overly complicated decision trees, and low efficiency. Improving clustering and classification algorithms is therefore significant. This thesis covers the following work:
1. To address the limitation that partitional clustering algorithms can cluster only datasets with numerical attributes, a K-means algorithm for datasets with mixed attributes is proposed. It takes into account the influence of both numerical and categorical attributes on clustering, and it introduces a new frequency-based distance measure across attribute dimensions.
2. A density-weighted Rough K-means algorithm is proposed. It addresses the problem of falling into a local optimum caused by random selection of the initial centers, as well as the deviation between the cluster mean and the actual distribution of the data objects caused by outliers, so that the results conform better to the actual distribution and are more precise.
3. A new attribute selection measure for building decision trees, the inclusion important degree, is proposed. The approach is based on rough set theory and inclusion degree theory. It considers not only the classification ability of each attribute but also the attribute's comprehensive contribution to classification. Experiments showed that the improved ID3 algorithm solves the bias toward multi-valued attributes and outperforms traditional ID3 in both classification efficiency and decision tree size.
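The multi-valued attribute bias of traditional ID3 mentioned above can be seen in a small sketch of the standard information gain criterion. This shows only the textbook ID3 measure, not the inclusion important degree proposed in the thesis, whose definition is not given in this abstract. The four-row dataset and its attribute names (`id`, `outlook`, `play`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Standard ID3 information gain of splitting `rows` on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    n = len(rows)
    # Weighted entropy of the class labels after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical toy data. "id" is unique per row and carries no real
# predictive information; "outlook" is a genuine two-valued attribute.
rows = [
    {"id": 1, "outlook": "sunny", "play": "no"},
    {"id": 2, "outlook": "sunny", "play": "no"},
    {"id": 3, "outlook": "rain",  "play": "yes"},
    {"id": 4, "outlook": "sunny", "play": "yes"},
]

gain_id = information_gain(rows, "id", "play")
gain_outlook = information_gain(rows, "outlook", "play")

# Splitting on "id" leaves every subset pure, so its gain equals the full
# label entropy, and ID3 would prefer it over "outlook".
print("gain(id)      =", gain_id)
print("gain(outlook) =", gain_outlook)
```

Because an attribute with one distinct value per row always yields pure subsets, information gain rewards it maximally even though such a split does not generalize. Alternative attribute selection measures are designed to avoid this preference.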
Keywords/Search Tags: Cluster analysis, K-means, Rough sets, Inclusion degree, ID3 algorithm