
Research On Clustering Algorithm Based On Approximation Set Theory Of Rough Set

Posted on: 2020-02-25
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Wang
GTID: 2518306305995839
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of AI technology, data mining, one of the most important fields, has attracted great attention and has been applied in many areas, such as pattern recognition, biological science, and natural language processing. Cluster analysis is an important technique for data analysis and processing in data mining, and it can be divided into unsupervised clustering and semi-supervised clustering. Unsupervised clustering groups unlabeled data objects according to their similarity. Semi-supervised clustering uses a small amount of labeled data to partition unlabeled data, and has a wide range of applications. Rough set theory uses a known knowledge base to describe inaccurate or uncertain knowledge. In this dissertation, we combine the approximation set theory of rough sets with the unsupervised fuzzy c-means (FCM) algorithm and the semi-supervised K-means algorithm to improve the original algorithms, and both improved algorithms perform better than the originals. The main work is as follows.

(1) The traditional FCM algorithm is stable and easy to implement. However, when FCM is used for clustering, data near cluster boundaries are easily assigned to the wrong class, which reduces accuracy. To solve this problem, we take advantage of rough set approximation theory and propose an improved algorithm, Rough-FCM. First, Rough-FCM assigns each data object to the lower approximation set or the boundary region set of each class according to threshold values. A data object can belong to the boundary region sets of multiple classes, but to the lower approximation set of only one class. Second, we use a new equation to update the cluster centers and the membership matrix. Finally, we perform a second clustering pass to obtain the final clustering result. In experiments comparing Rough-FCM with three other clustering algorithms, Rough-FCM shows certain advantages.

(2) Because many attribute values of high-dimensional sparse data are zero, we combine the approximation set theory of rough sets with the semi-supervised K-means algorithm to propose the Rough-kmeans algorithm. First, for each attribute of the labeled data set, we calculate the ratio of the number of non-zero values belonging to one class to the number of non-zero values belonging to the other classes, and the key attributes of each class are selected and placed in the key attribute set. Then, based on these key attributes, the K-means algorithm is used to cluster the unlabeled data set, and the cluster centers are calculated. After that, we use the approximation sets of rough set theory to calculate the information gain of each attribute in the unlabeled data set. The information gain of each attribute is compared with the upper approximation threshold and the boundary threshold, and the attribute is assigned to the corresponding approximation set. Next, we increase the number of attributes relevant to the clustering and update the cluster centers to improve clustering accuracy. The experimental results show that the Rough-kmeans algorithm can select the important attributes of each class, filter out invalid information, and improve accuracy significantly.
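As a rough illustration of the first step of Rough-FCM in (1), the sketch below splits objects into lower approximation sets and boundary region sets from a fuzzy membership matrix. The abstract does not state the threshold rule, so the rule used here is an assumption: an object goes to the boundary regions of every cluster whose membership is within a threshold of its best membership, and to a lower approximation only when a single cluster qualifies. The names rough_partition, memberships, and threshold, and the sample matrix U, are hypothetical.

```python
def rough_partition(memberships, threshold=0.2):
    """Split objects into per-cluster lower-approximation and boundary sets.

    memberships[k][i] is the fuzzy membership of object k in cluster i.
    ASSUMPTION: the threshold rule below is illustrative only; the
    abstract does not give the rule actually used by Rough-FCM.
    """
    n_clusters = len(memberships[0])
    lower = [[] for _ in range(n_clusters)]
    boundary = [[] for _ in range(n_clusters)]
    for k, row in enumerate(memberships):
        best = max(row)
        # Clusters whose membership is within `threshold` of the best one.
        close = [i for i, u in enumerate(row) if best - u <= threshold]
        if len(close) == 1:
            # Unambiguous object: lower approximation of exactly one cluster.
            lower[close[0]].append(k)
        else:
            # Ambiguous object: boundary region of every close cluster.
            for i in close:
                boundary[i].append(k)
    return lower, boundary


# Hypothetical membership matrix: 4 objects, 3 clusters.
U = [
    [0.90, 0.05, 0.05],
    [0.45, 0.40, 0.15],
    [0.10, 0.80, 0.10],
    [0.34, 0.33, 0.33],
]
lower, boundary = rough_partition(U, threshold=0.2)
print("lower:", lower)
print("boundary:", boundary)
```

With this sample matrix, objects 0 and 2 land in the lower approximations of clusters 0 and 1, while objects 1 and 3 land only in boundary regions. This matches the property stated in (1): an object may appear in several boundary sets but in at most one lower approximation set.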
Keywords/Search Tags: Rough set, Approximation set, Clustering, Semi-supervised clustering, High-dimensional sparse data