Research On Semi-supervised Clustering Algorithms With Pairwise Constraints

Posted on: 2021-05-05
Degree: Master
Type: Thesis
Country: China
Candidate: Y Qin
Full Text: PDF
GTID: 2428330626458573
Subject: Computer application technology
Abstract/Summary:
Semi-supervised clustering is a new type of algorithm formed by introducing semi-supervised learning into traditional clustering, so that supervised information can guide the clustering process. Supervised information takes two forms: pairwise constraints and independent class labels. In practice, independent class labels often require a lot of extra work to obtain, whereas determining the pairwise relationship between two samples is comparatively simple. This paper therefore aims to improve clustering performance through the supervision information carried by pairwise constraints.

However, the shortcomings of traditional semi-supervised clustering cannot be ignored. First, the initial prior set is selected at random. Second, the number of sample points with supervision information in a data set is far smaller than the number of unlabeled sample points. Active learning can be used to exploit the unlabeled samples in this situation, but existing semi-supervised frameworks combined with active learning have a long iteration time. Finally, when pairwise constraints are built into soft-partition methods such as fuzzy clustering, the algorithm may still converge to a local optimum. To address these problems, this paper studies semi-supervised clustering algorithms with pairwise constraints. The details are as follows.

To address the unstable iteration and the random selection of prior information in existing active semi-supervised frameworks and models, this paper uses a density criterion to determine the prior set. Through active learning, the points with the greatest uncertainty among the unlabeled samples are labeled with active constraints, and the constraints are redefined. Building on traditional semi-supervised clustering, an improved stable COP-KMeans clustering algorithm based on active learning (ISSCC-AL) is proposed. ISSCC-AL consists of two parts: building a stable prior set and building an active iterative framework. Compared with traditional semi-supervised clustering, the algorithm performs better in both clustering results and iteration time.

In real life, many data sets are fuzzy. To reduce the misclassification caused by fuzziness while incorporating pairwise constraints, an improved active semi-supervised FCM based on cross entropy (ASFCM-CE) is proposed. In this algorithm, weights and cross entropy are added to improve the objective function, and in the subsequent process the fuzzy points on the boundary are actively labeled to make the division of cluster boundaries clearer. Finally, experiments show that the proposed algorithm achieves higher accuracy.

This paper has 28 figures, 14 tables and 111 references.
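To illustrate how pairwise constraints can guide cluster assignment, the following is a minimal Python sketch of the classic COP-KMeans assignment rule, the method the abstract names as the basis of ISSCC-AL. It is not the ISSCC-AL algorithm itself: the density-based prior set and the active-learning loop are not shown, the initial centers are simply drawn at random, and the transitive closure of the constraints is omitted for brevity. The function names (`violates`, `cop_kmeans`), their parameters and the toy data are assumptions made for illustration only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def violates(i, cluster, labels, must_link, cannot_link):
    """True if assigning point i to `cluster` breaks a constraint
    against a point that has already been assigned in this pass."""
    for a, b in must_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] not in (None, cluster):
            return True
    for a, b in cannot_link:
        other = b if a == i else a if b == i else None
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link, cannot_link, iters=20, seed=0):
    """Hypothetical minimal COP-KMeans: random initial centers,
    constraint-aware assignment, mean update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(iters):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest; keep the first feasible one.
            order = sorted(range(k), key=lambda c: sq_dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                raise ValueError(f"no feasible cluster for point {i}")
        # Recompute each center as the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [points[i] for i, lab in enumerate(new_labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels, centers


# Toy data: point 4 lies near the first group but must be linked with point 2.
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (1.0, 1.0)]
labels, centers = cop_kmeans(points, k=2,
                             must_link=[(4, 2)], cannot_link=[(0, 2)])
print(labels)
```

In this toy run the must-link pair keeps point 4 in the same cluster as point 2 even though it lies closer to the other group, and the cannot-link pair keeps points 0 and 2 apart. Because the assignment is greedy and order-dependent, plain COP-KMeans can fail on constraint sets that are in fact satisfiable, which is one reason the choice of the prior set matters.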
Keywords/Search Tags:Pairwise constraints, Clustering, Semi-supervised clustering, Active learning, Cross entropy