Research On Extreme Multi-label Classification Based On Parallel Label Trees

Posted on:2021-02-10

Degree:Master

Type:Thesis

Country:China

Candidate:J Q Lu

GTID:2428330620968130

Subject:Computer Science and Technology

Abstract/Summary:

Extreme multi-label classification is an emerging research direction in machine learning. It is widely used in practical applications such as recommendation systems and text classification, which typically involve large volumes of data and place high demands on model training speed. Accelerating model training has therefore become a main research direction in extreme multi-label classification, and it is also the main focus of this thesis.

Label trees have become the main approach to extreme multi-label classification because of their good interpretability and fast training, and this thesis takes the label tree as its object of study. Label trees are difficult to parallelize. To address this, the thesis proposes a two-stage, thread-level parallel method that exploits the data independence among nodes and the exponential relationship between node depth and training time. Both the splitting of nodes on the same layer and the balanced k-means algorithm within a single node are parallelized, reducing the training time from 27 hours to 1 hour. The method also runs on a single machine, which minimizes hardware cost.

In extreme multi-label classification, a large amount of training data is used to learn an enormous number of parameters, and the limited memory of a single machine becomes a major obstacle to model training. At the algorithmic level, this thesis first divides the model's parameter matrix into several smaller parameter matrices. It then uses MPI to distribute them to different compute nodes for distributed parameter training, and optimizes MPI sending and receiving to maximize the overlap between model training and inter-process communication. Experiments show that the model achieves a prediction precision of 81.30%, essentially the same as other models, with a speedup of up to 23. Training time is thus greatly reduced while prediction precision is maintained.
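The level-by-level node splitting mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: each label is assumed to be represented by a dense feature vector (for example, the mean of the training points carrying that label), the tree is assumed to be binary, and balanced 2-means is done in the Parabel style by ranking labels on their preference for one centroid over the other and cutting at the median. The names `balanced_2means` and `build_label_tree` and the parameters `max_leaf` and `n_threads` are hypothetical. Python threads are used only to show the structure of the parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def balanced_2means(x, n_iter=10, seed=0):
    """Split the rows of x into two clusters whose sizes differ by at most one.

    Hypothetical Parabel-style balanced 2-means: rank each point by how much
    more similar it is to centroid 0 than to centroid 1, then cut at the median.
    Assumes len(x) >= 2 and no all-zero rows.
    """
    rng = np.random.default_rng(seed)
    # L2-normalise rows so dot products are cosine similarities.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    c = x[rng.choice(len(x), size=2, replace=False)]
    half = len(x) // 2
    for _ in range(n_iter):
        # Preference of each point for centroid 0 over centroid 1.
        score = x @ c[0] - x @ c[1]
        order = np.argsort(-score)
        left, right = order[:half], order[half:]
        # Recompute and re-normalise both centroids.
        c = np.stack([x[left].mean(axis=0), x[right].mean(axis=0)])
        c /= np.linalg.norm(c, axis=1, keepdims=True)
    return left, right


def build_label_tree(label_vecs, max_leaf=4, n_threads=4):
    """Build a binary label tree one level at a time (hypothetical helper).

    Nodes on the same level own disjoint label sets, so each level's splits
    are submitted to a thread pool as independent tasks.
    Returns the leaves as arrays of label ids.
    """
    frontier = [np.arange(len(label_vecs))]
    leaves = []
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        while frontier:
            to_split = [ids for ids in frontier if len(ids) > max_leaf]
            leaves += [ids for ids in frontier if len(ids) <= max_leaf]
            # One independent task per node on this level.
            results = pool.map(lambda ids: balanced_2means(label_vecs[ids]), to_split)
            frontier = []
            for ids, (l, r) in zip(to_split, results):
                frontier += [ids[l], ids[r]]
    return leaves


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    label_vecs = rng.random((100, 16))  # 100 synthetic labels, 16 features each
    leaves = build_label_tree(label_vecs, max_leaf=8)
    print(len(leaves), sorted(len(leaf) for leaf in leaves))
```

The number of independent tasks per level doubles as the tree deepens, while near the root there are only one or two nodes to split. One plausible reason for the two-stage design in the abstract is that parallelizing inside a single node's k-means covers the top levels, where parallelism across nodes is scarce.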
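The idea of dividing the parameter matrix into smaller matrices can be illustrated with a small simulation. This sketch assumes a linear one-vs-rest model whose weight matrix `W` has one column per label, and uses ridge regression as a stand-in for whatever per-label classifier the thesis actually trains. The names `shard_columns`, `train_shard` and `train_distributed` are hypothetical, and the MPI ranks are simulated with a plain loop so the example runs without an MPI installation. Because each label's column is trained independently, each worker only needs to hold its own `d × |block|` slice of `W`.

```python
import numpy as np


def shard_columns(n_labels, n_ranks):
    """Split label ids 0..n_labels-1 into n_ranks contiguous, near-equal blocks."""
    return np.array_split(np.arange(n_labels), n_ranks)


def train_shard(X, Y_block, lam=1.0):
    """Ridge-regression one-vs-rest weights for one block of labels.

    Stand-in classifier (assumption). Each output column depends only on its
    own label column of Y_block, so blocks need no communication while training.
    """
    d = X.shape[1]
    A = X.T @ X + lam * np.eye(d)
    return np.linalg.solve(A, X.T @ Y_block)


def train_distributed(X, Y, n_ranks, lam=1.0):
    """Simulate n_ranks workers, each owning a d x |block| slice of W."""
    blocks = shard_columns(Y.shape[1], n_ranks)
    # In a real MPI run, each iteration would execute on its own rank.
    shards = [train_shard(X, Y[:, b], lam) for b in blocks]
    # Simulated gather: stitch the column blocks back into the full d x L matrix.
    return np.hstack(shards)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((50, 8))                  # 50 samples, 8 features
    Y = (rng.random((50, 20)) < 0.2).astype(float)    # 20 binary labels
    W_full = train_shard(X, Y)                        # single-machine reference
    W_dist = train_distributed(X, Y, n_ranks=4)
    print(W_dist.shape, np.allclose(W_full, W_dist))
```

In an actual MPI program (for example with mpi4py), each rank would train only its own block and the slices could be collected with `comm.gather`. Overlapping that communication with training would typically rely on non-blocking operations such as `Isend`/`Irecv`.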

Keywords/Search Tags:

Extreme Multi-label Classification, Machine Learning, Label Tree, Parallel, MPI

Related items

1	Research On Label Coding Algorithms For Multi-label Classification
2	Multi-label Learning Based On Label Weight And Weighted Kernel Extreme Learning Machine
3	Research On Multi-label Data Classification Based On Extreme Learning Machine
4	Research On Multi-label Data Stream Classification Method Based On Kernel Extreme Learning Machine
5	Imbalanced Multi-label Learning Algorithm Based On Density Label Space
6	Study Of Multi-label Class Imbalance Classification Based On Extreme Learning Machine
7	Research On Multi-label Classification Algorithms Based On Samples And Property Analysis
8	Research Of Calibrated Label Ranking Multi-label Algorithm Based On Spark
9	Research On Multi-label Classification Algorithm Based On Label Relationship
10	Exploiting Label Relationships In Multi-label Classification