Research On Multi-task Clustering Based On Multi Domain Data

Posted on: 2023-04-14  Degree: Master  Type: Thesis
Country: China  Candidate: H Wen  Full Text: PDF
GTID: 2558307073483094  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid advancement of information technology, large amounts of unlabeled data are constantly being produced. How to process this massive unlabeled data for scientific research and other fields that need high-quality labeled data remains an open question. Clustering, an unsupervised machine learning method that groups data into separate clusters according to implicit features, has attracted much attention. Traditional single-task clustering methods group the samples of each task into clusters independently. In practice, however, the data of different tasks are often correlated, so it is valuable to cluster multiple tasks simultaneously by exploiting the knowledge shared among related tasks. This thesis therefore studies multitask clustering on multi-domain data along three lines of knowledge transfer: transferring features, transferring instances, and transferring model parameters.

Based on the idea of transferring model parameters, this thesis proposes a multitask clustering algorithm that combines spectral clustering with a linear regression model. Specifically, the combination is used to cluster each single task and to learn the model parameters of each cluster in each task. Then, following the intuition that similar tasks should have similar model parameters, the l2-norm is introduced to learn the correlation between each pair of clusters across each pair of tasks, which yields a cluster correlation matrix for each pair of tasks. To solve the resulting problem, an alternating optimization method is proposed that iteratively updates the cluster indicator matrix, the model parameter matrix, and the cluster correlation matrix of each pair of tasks. Experiments on several real text datasets demonstrate the effectiveness of the algorithm.

Existing multitask clustering algorithms separate feature extraction from clustering and lack semantic knowledge. To address these shortcomings, this thesis turns to multitask image clustering and proposes a deep multitask clustering model that can transfer knowledge of features, instances, and model parameters. First, the model uses a convolutional neural network to extract features for the samples of all tasks and mines similar pairs of instances according to the resulting cluster indicator matrix. To make the mined features and similar pairs more stable and consistent, a fully connected network is designed to process the shallow information, and the distribution of the information learned by the fully connected network is constrained to be similar to that of the deep information extracted by the convolutional neural network. Finally, the model parameters trained on all tasks are used as the initial parameters of each single-task network, and each task is then trained independently to obtain the final clustering results. Experimental results on several multitask image datasets show the effectiveness of the model.

In addition to the algorithmic work, this thesis designs a clustering visualization and demonstration system based on the two proposed models. The system clusters each task, visualizes the clustering results, and presents them to the user.
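The parameter-transfer idea in the abstract (cluster each task, fit one linear model per cluster, then compare cluster parameters across tasks with an l2 distance) can be illustrated with a rough sketch. This is not the thesis's algorithm: it runs a single pass instead of the alternating optimization, and the function names, the Gaussian affinity, the ridge penalty `lam`, the farthest-point k-means initialization, and the `exp(-distance)` mapping from distance to correlation are all assumptions made for illustration.

```python
import numpy as np


def spectral_embedding(X, k, sigma=1.0):
    """Row-normalized top-k eigenvectors of a normalized Gaussian affinity."""
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1) + 1e-12)
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(S)  # eigenvalues come back in ascending order
    U = vecs[:, -k:]
    return U / (np.linalg.norm(U, axis=1, keepdims=True) + 1e-12)


def kmeans(U, k, iters=50):
    """Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [U[0]]
    for _ in range(1, k):
        gap = np.min([((U - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(U[gap.argmax()])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels


def cluster_parameters(X, labels, k, lam=1e-2):
    """Ridge regression from features to one-hot cluster indicators.

    Column j of the returned (d, k) matrix is the parameter vector of cluster j.
    """
    Y = np.eye(k)[labels]
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y)


def cluster_correlation(P_a, P_b):
    """Entry (i, j) is exp(-||p_i - q_j||_2) for cluster i of task a, j of task b."""
    dist = np.linalg.norm(P_a[:, :, None] - P_b[:, None, :], axis=0)
    return np.exp(-dist)


# Two toy tasks drawn from the same two well-separated blobs (synthetic data).
rng = np.random.default_rng(0)
means = np.array([[4.0, 0.0], [0.0, 4.0]])


def make_task(n_per_cluster=20):
    return np.vstack(
        [m + 0.3 * rng.standard_normal((n_per_cluster, 2)) for m in means]
    )


k = 2
X_a, X_b = make_task(), make_task()
labels_a = kmeans(spectral_embedding(X_a, k), k)
labels_b = kmeans(spectral_embedding(X_b, k), k)
P_a = cluster_parameters(X_a, labels_a, k)
P_b = cluster_parameters(X_b, labels_b, k)
corr = cluster_correlation(P_a, P_b)
print(corr)
```

In this sketch, a larger entry of `corr` means the corresponding pair of clusters has closer regression parameters. In an alternating scheme such as the one the abstract describes, a matrix of this kind would feed back into the next update of the indicators and parameters rather than being computed once.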
Keywords/Search Tags:multitask clustering, task relationship learning, knowledge transfer, convolutional neural network, deep multitask clustering