
Research Of Heterogeneous Multi-task Learning And Efficiency Of Task Grouping

Posted on: 2019-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: X B Li    Full Text: PDF
GTID: 2428330566488848    Subject: Computer Science and Technology
Abstract/Summary:
Multi-task learning (MTL) is a promising area of machine learning that aims to improve the performance of multiple related learning tasks by leveraging the useful information shared among them. In the era of big data there are many related learning tasks, but we often handle them individually, ignoring the information they share. MTL is dedicated to solving this information-sharing problem for related tasks. This thesis studies two issues. First, most existing MTL methods assume that the tasks to be learned have the same feature representation; this assumption does not hold for many real-world applications. Second, different tasks perform differently when trained together, which makes task grouping a research hotspot in multi-task learning. The CFSFDP clustering algorithm (Clustering by Fast Search and Find of Density Peaks) can be used to solve task-grouping problems, but it is computationally expensive and therefore difficult to apply at scale. The main contributions are as follows:

First, for heterogeneous multi-task learning problems, we propose IMTNMF, an improved version of the MTNMF algorithm. MTNMF assumes that multiple tasks share the same output space (i.e., the class labels of different tasks are identical or largely overlap). A bipartite graph is constructed for each task to capture the relationship between instances and classes, and the relationships among tasks are built through the class-label layer. The non-negative matrix factorization-based multi-task (MTNMF) method is then used to learn a common semantic feature space shared by the tasks with heterogeneous features, and information from unlabeled instances is incorporated into training. Finally, the heterogeneous MTL problem is modeled as a multi-task multi-view learning (MTMVL) problem. The proposed IMTNMF algorithm decomposes the data feature matrices directly, without constructing a correlation matrix between features and class labels, thereby avoiding information loss.

Second, to address the high computational complexity of the CFSFDP clustering algorithm, we propose a parallel CFSFDP algorithm based on Spark. CFSFDP is an innovative clustering algorithm proposed in recent years; it performs well on both spherical and irregularly shaped distributions and can remove noise. Finally, the validity of the proposed method is demonstrated on three real-world datasets, and the efficiency and effectiveness of the parallel CFSFDP algorithm are demonstrated on four clustering datasets by measuring speedup, sizeup, and scaleup.
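To make the factorization primitive concrete: both MTNMF and IMTNMF build on non-negative matrix factorization, X ≈ WH with W, H ≥ 0. The sketch below is not the thesis algorithm (which couples such factorizations across tasks through a shared semantic space); it only shows the basic NMF building block, using the classic multiplicative update rules, on an illustrative toy matrix.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix X (m x n) as X ~ W @ H,
    with W (m x k) and H (k x n) non-negative, via the classic
    multiplicative update rules (eps avoids division by zero)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative data with an obvious rank-2 structure
# (rows 1 and 2 are proportional).
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])
W, H = nmf(X, k=2)
err = np.linalg.norm(X - W @ H)   # reconstruction error, small after 200 iters
```

The multiplicative updates keep W and H non-negative by construction, which is why the learned factors can be read as parts-based semantic components.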
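The core of CFSFDP can be stated in two quantities per point: a local density ρ (neighbors within a cutoff distance d_c) and δ, the distance to the nearest point of strictly higher density; cluster centers are points where both are large. The following is a minimal single-machine sketch of those two quantities on a hypothetical toy dataset — the thesis parallelizes this computation on Spark, which is not shown here. Its O(n²) distance matrix is exactly the cost the parallel version targets.

```python
import numpy as np

def cfsfdp_scores(X, dc):
    """Compute the two CFSFDP quantities for each point:
    rho   -- local density: number of neighbors within cutoff dc
    delta -- distance to the nearest point of strictly higher density
             (for the densest points, the maximum distance instead)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < dc).sum(axis=1) - 1          # subtract the self-match
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return rho, delta

# Two tight blobs; the first point of each blob is its densest center.
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1]])
X = np.vstack([offsets, offsets + 5.0])
rho, delta = cfsfdp_scores(X, dc=0.15)
gamma = rho * delta                         # decision score
centers = set(np.argsort(gamma)[-2:])       # -> {0, 5}, the two blob centers
```

Picking the points with the largest γ = ρ·δ recovers exactly the two blob centers, which is the "decision graph" step of the original algorithm.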
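The three scalability metrics named above have standard definitions, sketched below; the function names and timing values are illustrative, not taken from the thesis experiments.

```python
def speedup(t_1core, t_pcores):
    """Same workload, 1 core vs p cores; the ideal value is p."""
    return t_1core / t_pcores

def sizeup(t_small, t_large):
    """Same number of cores, data grown m-fold; measures how
    execution time grows with data size (the ideal value is m)."""
    return t_large / t_small

def scaleup(t_base, t_scaled):
    """Data and cores both grown m-fold; the ideal value is 1.0,
    i.e., time stays constant as the system scales."""
    return t_base / t_scaled

s = speedup(120.0, 30.0)      # -> 4.0 on 4 cores (ideal linear speedup)
c = scaleup(50.0, 50.0)       # -> 1.0 (perfect scaleup)
```

A parallel algorithm is judged efficient when speedup stays close to the core count, sizeup close to the data growth factor, and scaleup close to 1.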
Keywords/Search Tags:Machine Learning, Transfer Learning, Multi-task Learning, Clustering Algorithm, Distributed Computing, Spark