
Research Of Heterogeneous Multi-task Learning And Efficiency Of Task Grouping

Posted on: 2019-02-10    Degree: Master    Type: Thesis
Country: China    Candidate: X B Li    Full Text: PDF
GTID: 2428330566488848    Subject: Computer Science and Technology
Abstract/Summary:
Multi-task learning (MTL) is a promising area of machine learning that aims to improve the performance of multiple related learning tasks by leveraging the useful information shared among them. In the era of big data there are many related learning tasks, but we often handle them individually, ignoring the information they share. MTL is dedicated to solving this information-sharing problem for related tasks. This thesis studies two issues. First, most existing MTL methods assume that the tasks to be learned have the same feature representation; this assumption does not hold for many real-world applications. Second, different tasks perform differently when trained together, which makes task grouping a research hotspot in multi-task learning. The CFSFDP clustering algorithm (Clustering by Fast Search and Find of Density Peaks) can be used to solve task-grouping problems, but it is computationally expensive and therefore difficult to apply at scale. The main contributions are as follows:

First, for heterogeneous multi-task learning problems, we propose IMTNMF, an improved version of the MTNMF algorithm. MTNMF assumes that multiple tasks share the same output space (i.e., the class labels of different tasks are identical or largely overlap). A bipartite graph is constructed for each task to capture the relationship between instances and classes, and the relationships among tasks are built through the class-label layer. The non-negative matrix factorization-based multi-task (MTNMF) method is then used to learn a common semantic feature space shared by the tasks with heterogeneous features, and information from unlabeled instances is incorporated into training. Finally, the heterogeneous MTL problem is modeled as a multi-task multi-view learning (MTMVL) problem. The proposed IMTNMF algorithm decomposes the data feature matrices directly, without constructing a correlation matrix between features and class labels, thereby avoiding information loss.

Second, to address the high computational complexity of the CFSFDP clustering algorithm, we propose a parallel CFSFDP algorithm based on Spark. CFSFDP is an innovative clustering algorithm proposed in recent years; it performs well on both spherical and irregularly shaped distributions and can remove noise. Finally, the validity of the proposed method is demonstrated on three real-world datasets, and the efficiency and effectiveness of the parallel CFSFDP algorithm are demonstrated on four clustering datasets by measuring speedup, sizeup, and scaleup.
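To make the factorization primitive concrete: both MTNMF and IMTNMF build on non-negative matrix factorization, X ≈ WH with W, H ≥ 0. The sketch below is not the thesis algorithm (which couples such factorizations across tasks through a shared semantic space); it only shows the basic NMF building block, using the classic multiplicative update rules, on an illustrative toy matrix.

```python
import numpy as np

def nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative matrix X (m x n) as X ~ W @ H,
    with W (m x k) and H (k x n) non-negative, via the classic
    multiplicative update rules (eps avoids division by zero)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy non-negative data with an obvious rank-2 structure
# (rows 1 and 2 are proportional).
X = np.array([[1.0, 2.0, 0.0],
              [2.0, 4.0, 0.0],
              [0.0, 0.0, 3.0]])
W, H = nmf(X, k=2)
err = np.linalg.norm(X - W @ H)   # reconstruction error, small after 200 iters
```

The multiplicative updates keep W and H non-negative by construction, which is why the learned factors can be read as parts-based semantic components.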
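The core of CFSFDP can be stated in two quantities per point: a local density ρ (neighbors within a cutoff distance d_c) and δ, the distance to the nearest point of strictly higher density; cluster centers are points where both are large. The following is a minimal single-machine sketch of those two quantities on a hypothetical toy dataset — the thesis parallelizes this computation on Spark, which is not shown here. Its O(n²) distance matrix is exactly the cost the parallel version targets.

```python
import numpy as np

def cfsfdp_scores(X, dc):
    """Compute the two CFSFDP quantities for each point:
    rho   -- local density: number of neighbors within cutoff dc
    delta -- distance to the nearest point of strictly higher density
             (for the densest points, the maximum distance instead)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    rho = (D < dc).sum(axis=1) - 1          # subtract the self-match
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.flatnonzero(rho > rho[i])
        delta[i] = D[i].max() if higher.size == 0 else D[i, higher].min()
    return rho, delta

# Two tight blobs; the first point of each blob is its densest center.
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [-0.1, 0], [0, -0.1]])
X = np.vstack([offsets, offsets + 5.0])
rho, delta = cfsfdp_scores(X, dc=0.15)
gamma = rho * delta                         # decision score
centers = set(np.argsort(gamma)[-2:])       # -> {0, 5}, the two blob centers
```

Picking the points with the largest γ = ρ·δ recovers exactly the two blob centers, which is the "decision graph" step of the original algorithm.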
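The three scalability metrics named above have standard definitions, sketched below; the function names and timing values are illustrative, not taken from the thesis experiments.

```python
def speedup(t_1core, t_pcores):
    """Same workload, 1 core vs p cores; the ideal value is p."""
    return t_1core / t_pcores

def sizeup(t_small, t_large):
    """Same number of cores, data grown m-fold; measures how
    execution time grows with data size (the ideal value is m)."""
    return t_large / t_small

def scaleup(t_base, t_scaled):
    """Data and cores both grown m-fold; the ideal value is 1.0,
    i.e., time stays constant as the system scales."""
    return t_base / t_scaled

s = speedup(120.0, 30.0)      # -> 4.0 on 4 cores (ideal linear speedup)
c = scaleup(50.0, 50.0)       # -> 1.0 (perfect scaleup)
```

A parallel algorithm is judged efficient when speedup stays close to the core count, sizeup close to the data growth factor, and scaleup close to 1.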
Keywords/Search Tags:Machine Learning, Transfer Learning, Multi-task Learning, Clustering Algorithm, Distributed Computing, Spark