
Multi-task Clustering Algorithm By Partition Matrix

Posted on: 2021-10-18
Degree: Master
Type: Thesis
Country: China
Candidate: W W Zhu
GTID: 2518306293980499
Subject: Statistics
Abstract/Summary:
Clustering has a long history and wide applications in machine learning. It aims to partition data points into groups so that points in the same group are relatively similar, while points in different groups are relatively dissimilar. Most current clustering methods, however, are limited to a single task. When the data samples of a single task satisfy the independent and identically distributed (i.i.d.) assumption, we call the problem single-task clustering. Sometimes a single task has too few samples to reveal a good cluster structure. Simply pooling several single-task datasets and then applying a traditional single-task clustering method does not necessarily improve performance, because the datasets follow different distributions, which violates the i.i.d. assumption that single-task clustering relies on. To address this problem, clustering is extended from the single-task setting to the multi-task setting, known as multi-task clustering. Multi-task clustering exploits the relations among tasks and transfers relevant knowledge across related tasks to improve the clustering performance of each task.

This thesis focuses on clustering multi-task datasets, which requires discovering the correlation between tasks and using it to improve clustering performance. An in-depth study of a large number of multi-task clustering algorithms revealed several problems and shortcomings. This thesis therefore combines the advantages of multi-task learning with classical single-task clustering and, building on the classical LSSMTC algorithm, proposes two multi-task clustering algorithms: a shared subspace multi-task clustering algorithm with a partition matrix summation constraint, and a shared subspace multi-task clustering algorithm with a self-paced partition matrix summation constraint.

(1) The first algorithm improves on LSSMTC. In LSSMTC, the partition matrix is constrained only to be non-negative, which does not match the physical meaning of a clustering partition matrix. This thesis therefore strengthens the non-negativity constraint into a summation constraint, which emphasizes the physical meaning of the partition matrix and improves clustering performance. The resulting shared subspace multi-task clustering algorithm with a partition matrix summation constraint optimizes the partition matrix with a new optimization method. Experiments on several cross-domain text datasets show that the summation constraint effectively improves the clustering performance of the algorithm.

(2) Although the algorithm in (1) achieves good clustering performance, it faces a non-convexity problem in the unsupervised setting. To address this, we propose a shared subspace multi-task clustering algorithm with a self-paced partition matrix summation constraint. The PSSMTC algorithm performs within-task clustering and cross-task clustering simultaneously and adds a self-paced learning framework, which trains on samples in order from easy to difficult, to optimize the model. Experimental results show that clustering performance improves to some extent while the non-convexity problem is addressed.
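The abstract does not give the exact form of the summation constraint or of the new optimization method. Assuming the constraint asks each row of a soft partition matrix to be non-negative and to sum to one (so row i reads as the cluster memberships of data point i), one generic way to enforce it inside a projected-gradient loop is the standard sort-based Euclidean projection onto the probability simplex, sketched below. The function names `project_to_simplex` and `project_partition_matrix` are hypothetical, and this is not the thesis's optimization method.

```python
from typing import List


def project_to_simplex(v: List[float]) -> List[float]:
    """Euclidean projection of v onto {p : p_j >= 0, sum_j p_j = 1}.

    Sort v in descending order as u, find the largest j with
    u_j - (u_1 + ... + u_j - 1) / j > 0, then subtract that shift
    theta from every entry and clip at zero.
    """
    u = sorted(v, reverse=True)
    cumsum = 0.0
    theta = 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # u_j stays positive after shifting by t
            theta = t    # remember the shift for the largest such j
    return [max(x - theta, 0.0) for x in v]


def project_partition_matrix(G: List[List[float]]) -> List[List[float]]:
    """Hypothetical helper: project every row of a soft partition matrix."""
    return [project_to_simplex(row) for row in G]


# Usage: rows that violate the assumed constraint are mapped back onto it.
G = [[0.9, 0.4, -0.1], [0.2, 0.2, 0.2]]
print(project_partition_matrix(G))  # each row is now non-negative and sums to 1 (up to rounding)
```

In a projected-gradient scheme, an update would then take the form `G = project_partition_matrix(G_minus_step_times_gradient)`, where the gradient comes from whatever clustering objective is being minimized.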
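The abstract names a self-paced learning framework that trains from easy to difficult samples but gives no objective. As an illustration only, the classic hard-weighting scheme of self-paced learning (Kumar et al., 2010) keeps sample i when its current loss is below an age parameter `lam`, and `lam` grows each round so that harder samples join later. The sketch below plugs that rule into plain single-task k-means on 2-D points. The function names, the initial `lam`, the growth factor `mu`, and the farthest-first initialization are hypothetical choices; this is not the thesis's multi-task PSSMTC algorithm.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def self_paced_weights(losses: List[float], lam: float) -> List[int]:
    """Hard self-paced weights: v_i = 1 if loss_i < lam else 0.

    This is the closed-form minimizer over v_i in {0, 1} of
    sum_i v_i * loss_i - lam * sum_i v_i.
    """
    return [1 if loss < lam else 0 for loss in losses]


def self_paced_kmeans(points: List[Point], k: int, lam: float = 0.5,
                      mu: float = 2.0, rounds: int = 10):
    """Hypothetical single-task k-means with hard self-paced sample selection."""
    # Deterministic farthest-first initialization of the k centers.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    v = [1] * len(points)
    for _ in range(rounds):
        # Assignment step: a point's loss is its squared distance to the nearest center.
        losses = []
        for i, p in enumerate(points):
            d = [sq_dist(p, c) for c in centers]
            labels[i] = min(range(k), key=d.__getitem__)
            losses.append(d[labels[i]])
        # Self-paced step: keep only the currently "easy" points.
        v = self_paced_weights(losses, lam)
        # Update step: each center moves to the mean of its selected points only.
        for j in range(k):
            members = [p for p, lab, vi in zip(points, labels, v) if lab == j and vi]
            if members:
                centers[j] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
        # Grow the age parameter so that harder points are admitted in later rounds.
        lam *= mu
    return centers, labels, v


# Usage: two tight groups plus one point halfway between them.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1), (2.5, 2.5)]
centers, labels, v = self_paced_kmeans(points, k=2, rounds=1)
print(v)  # → [1, 1, 1, 1, 1, 1, 0]
```

With a small initial `lam`, the in-between point is held out of the center updates in the first round and only contributes once `lam` exceeds its loss. Ordering the samples this way is the easy-to-difficult idea the abstract describes, here shown for a single task.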
Keywords/Search Tags: Multi-task learning, Clustering, Partition matrix, Shared subspace, Summation constraint, Self-paced learning