
Semi-supervised Clustering Algorithm And Implementation Based On Seeds Set And Pairwise Constraints

Posted on: 2022-09-18
Degree: Master
Type: Thesis
Country: China
Candidate: X Li
GTID: 2518306509465104
Subject: Computer technology
Abstract/Summary:
Semi-supervised clustering combines semi-supervised learning with clustering: existing prior information is used to guide the clustering process and thereby improve the algorithm's performance. It is widely used in fields such as biomedicine, image processing, and Chinese information processing. Prior information falls mainly into two types: a small labeled sample set (the seed set) and pairwise constraints. At present, most semi-supervised clustering algorithms use only one kind of supervised information to guide clustering, which wastes part of the available prior information. This thesis systematically studies how to use both kinds of prior information to guide a clustering algorithm, and how to expand the pairwise constraint information to improve the algorithm's performance. The main work is as follows.

(1) A semi-supervised clustering algorithm with attribute weights based on a seed set and pairwise constraints is proposed. The algorithm first optimizes the labeled sample set; second, it expands the pairwise constraints using the optimized labeled sample set; third, it computes the weight contribution rate of each attribute of the data set and incorporates it into the similarity measure; and fourth, it uses the violation or satisfaction of cannot-link constraints to guide the clustering process. The algorithm therefore not only uses both kinds of prior information but also optimizes them. Finally, its effectiveness is verified by comparison with other algorithms on real-world UCI data.

(2) A neighborhood-based pairwise constraint expansion algorithm is proposed. The algorithm first uses the pairwise constraints to construct pairwise closures. It then defines the shortest distance between two closures and locates the indices of the samples that attain it; the sample at that index serves as a core point for a neighborhood test against the other closure, and closures that satisfy the condition are merged. Finally, the pairwise constraint information is expanded over all merged closures. The algorithm effectively expands the pairwise constraints and can be used in different pairwise-constrained semi-supervised clustering algorithms; its feasibility is demonstrated experimentally.

(3) A MATLAB-based semi-supervised clustering system is designed and developed. The system provides data set loading, algorithm selection, and visualization of comparison results, making these algorithms more convenient for other researchers to use.

The results of this thesis further enrich research on semi-supervised clustering algorithms, have practical application value, and are expected to help solve more practical problems in the future.
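The steps of algorithm (1) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the author's code. Deriving must-link and cannot-link pairs from seed labels follows directly from the labels, and the assignment step borrows the COP-KMeans idea of rejecting any cluster that would violate a cannot-link constraint. The attribute-weighting rule (inverse within-class variance among the seeds), all function names, and the fallback to the nearest cluster when every cluster violates a constraint are hypothetical choices, since the abstract does not specify them. The seed-set optimization step is omitted.

```python
import itertools


def expand_seed_constraints(seed_labels):
    """Seeds {index: label} -> (must_link, cannot_link) sets of index pairs."""
    must_link, cannot_link = set(), set()
    for (i, li), (j, lj) in itertools.combinations(sorted(seed_labels.items()), 2):
        # Same label -> must-link; different labels -> cannot-link.
        (must_link if li == lj else cannot_link).add((i, j))
    return must_link, cannot_link


def attribute_weights(X, seed_labels, eps=1e-6):
    """HYPOTHETICAL weighting: attributes that vary little inside each seed
    class get larger weights. Weights are normalized to sum to 1."""
    classes = {}
    for i, lab in seed_labels.items():
        classes.setdefault(lab, []).append(X[i])
    raw = []
    for j in range(len(X[0])):
        within = 0.0
        for rows in classes.values():
            mean = sum(r[j] for r in rows) / len(rows)
            within += sum((r[j] - mean) ** 2 for r in rows) / len(rows)
        raw.append(1.0 / (within / len(classes) + eps))
    total = sum(raw)
    return [r / total for r in raw]


def weighted_sq_dist(a, b, w):
    """Attribute-weighted squared Euclidean distance."""
    return sum(wj * (x - y) ** 2 for x, y, wj in zip(a, b, w))


def seeded_cop_kmeans(X, seed_labels, cannot_link, w, n_iter=20):
    """Seeds keep their labels; every other point joins the nearest cluster
    that violates none of its cannot-link constraints (or the nearest cluster
    overall if all would violate one, a relaxation of COP-KMeans)."""
    clusters = sorted(set(seed_labels.values()))
    partners = {}
    for i, j in cannot_link:
        partners.setdefault(i, set()).add(j)
        partners.setdefault(j, set()).add(i)
    assign = dict(seed_labels)  # first centroids = mean of each seed class
    for _ in range(n_iter):
        centroids = {}
        for c in clusters:
            members = [X[i] for i, lab in assign.items() if lab == c]
            centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        new = dict(seed_labels)
        for i in range(len(X)):
            if i in new:
                continue
            order = sorted(clusters, key=lambda c: weighted_sq_dist(X[i], centroids[c], w))
            ok = [c for c in order if all(new.get(j) != c for j in partners.get(i, ()))]
            new[i] = ok[0] if ok else order[0]
        if new == assign:  # assignments stable: converged
            break
        assign = new
    return [assign[i] for i in range(len(X))]
```

Because seeds never change label, must-link pairs derived from the seeds are satisfied automatically; only cannot-link pairs need checking during assignment.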
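Algorithm (2) can be sketched in the same way, again as an illustrative assumption rather than the author's method. Must-link closures are built with union-find, which yields the transitive closure of the must-link relation. The neighborhood test is an assumed stand-in because the abstract does not give the exact criterion: the point of one closure nearest to the other closure acts as the core point, and the closures merge when at least `min_pts` points of the other closure lie within a hypothetical `radius` of it. Closures joined by a cannot-link are never merged. The final expansion applies the standard closure rules: every pair inside a closure is must-link, and if any cannot-link joins two closures, every cross pair between them is cannot-link.

```python
import itertools
import math


def expand_pairwise_constraints(X, must_link, cannot_link, radius, min_pts=1):
    """Sketch: must-link closures -> neighborhood merge -> expanded constraints.

    X                : list of coordinate tuples
    must_link        : set of (i, j) index pairs
    cannot_link      : set of (i, j) index pairs
    radius, min_pts  : HYPOTHETICAL neighborhood parameters (not from the thesis)
    """
    involved = {i for pair in must_link | cannot_link for i in pair}
    parent = {i: i for i in involved}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # 1. Must-link transitive closures via union-find.
    for a, b in must_link:
        parent[find(a)] = find(b)
    groups = {}
    for i in involved:
        groups.setdefault(find(i), set()).add(i)
    closures = list(groups.values())

    cl_sym = cannot_link | {(j, i) for i, j in cannot_link}

    def conflict(A, B):
        return any((i, j) in cl_sym for i in A for j in B)

    # 2. Merge closures whose nearest points pass the neighborhood test.
    merged = True
    while merged:
        merged = False
        for a, b in itertools.combinations(range(len(closures)), 2):
            A, B = closures[a], closures[b]
            if conflict(A, B):
                continue  # never merge across a cannot-link
            core, _ = min(itertools.product(A, B),
                          key=lambda p: math.dist(X[p[0]], X[p[1]]))
            hits = sum(math.dist(X[core], X[j]) <= radius for j in B)
            if hits >= min_pts:
                closures[a] = A | B
                del closures[b]
                merged = True
                break  # closure list changed: restart the pair scan

    # 3. Expand: pairs inside a closure are must-link; all pairs across two
    #    closures joined by any cannot-link are cannot-link.
    ml = {p for C in closures for p in itertools.combinations(sorted(C), 2)}
    cl = set()
    for A, B in itertools.combinations(closures, 2):
        if conflict(A, B):
            cl |= {(min(i, j), max(i, j)) for i in A for j in B}
    return ml, cl
```

For example, with points on a line at 0, 0.5, 1.0, 1.4, 10 and 10.5, must-links (0, 1), (2, 3), (4, 5), a cannot-link (0, 4) and `radius=0.6`, the first two closures merge and the expansion yields 7 must-link and 8 cannot-link pairs from the original 3 and 1.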
Keywords/Search Tags: Clustering, Semi-supervised clustering, Seed sets, Pairwise constraints, Transitive closures