Research On Extreme Multi-label Classification Based On Parallel Label Trees

Posted on: 2021-02-10
Degree: Master
Type: Thesis
Country: China
Candidate: J Q Lu
Full Text: PDF
GTID: 2428330620968130
Subject: Computer Science and Technology
Abstract/Summary:
Extreme multi-label classification is an emerging research direction in machine learning and is widely used in practical applications such as recommendation systems and text classification. These applications typically involve large volumes of data and place high demands on training speed, so accelerating model training has become a main research direction in the field and is the focus of this thesis.

Label trees have become a mainstream solution for extreme multi-label classification because of their good interpretability and fast training, and they are the research object of this thesis. To address the difficulty of parallelizing label trees, a two-stage thread-level parallel method is proposed that exploits the data independence among nodes and the exponential relationship between node depth and training time. The splitting of nodes on the same layer and the balanced k-means algorithm within a single node are both accelerated in parallel, reducing training time from 27 hours to 1 hour. The method runs on a single machine, which keeps hardware cost low.

In extreme multi-label classification, a large amount of training data is used to learn an enormous number of parameters, and insufficient memory on a single machine becomes a major obstacle to model training. This thesis first divides the model's parameter matrix into several smaller parameter matrices at the algorithm level. It then uses MPI to distribute them to different nodes for distributed training, and optimizes MPI sending and receiving so that model training and inter-process communication overlap as much as possible. Experiments show that the model reaches a prediction precision of 81.30%, essentially the same as other models, with a speedup of up to 23. Training time is thus greatly reduced while prediction precision is maintained.
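
The layer-wise parallel splitting of label-tree nodes mentioned in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the two-way split (k = 2), the dot-product similarity, and the names `balanced_two_means`, `build_label_tree`, `max_leaf`, and `workers` are all assumptions made for illustration. It covers only the first stage (nodes on the same layer split concurrently), not the parallel k-means inside a single node. In CPython the thread pool shows the structure only, since pure-Python threads do not run in parallel.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def mean(vectors):
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def balanced_two_means(label_ids, features, iters=10, seed=0):
    """Split label_ids into two groups whose sizes differ by at most one.

    Assumption: k = 2 and dot-product similarity; the abstract only says
    'balanced k-means'.
    """
    rng = random.Random(seed)
    c0, c1 = (features[i] for i in rng.sample(label_ids, 2))
    half = len(label_ids) // 2
    left = right = None
    for _ in range(iters):
        # Rank labels by how much more similar they are to c0 than to c1,
        # then cut the ranking in the middle to enforce balance.
        ranked = sorted(
            label_ids,
            key=lambda i: dot(features[i], c0) - dot(features[i], c1),
            reverse=True,
        )
        new_left = ranked[:half]
        converged = left is not None and set(new_left) == set(left)
        left, right = new_left, ranked[half:]
        if converged:
            break
        c0 = mean([features[i] for i in left])
        c1 = mean([features[i] for i in right])
    return left, right


def build_label_tree(features, max_leaf=2, workers=4):
    """Build the tree layer by layer and return the leaf label groups."""
    frontier = [list(range(len(features)))]
    leaves = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            leaves.extend(n for n in frontier if len(n) <= max_leaf)
            to_split = [n for n in frontier if len(n) > max_leaf]
            # Nodes on one layer own disjoint label sets, so their splits
            # are independent and can be submitted concurrently.
            pairs = pool.map(lambda n: balanced_two_means(n, features), to_split)
            frontier = [child for pair in pairs for child in pair]
    return leaves
```

Because each layer doubles the number of nodes while halving their size, the deeper layers offer many small independent tasks, which is the property the layer-level stage relies on.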
Keywords/Search Tags:Extreme Multi-label Classification, Machine Learning, Label Tree, Parallel, MPI