
Efficient Deep Ensemble Inference Via Base Model Scheduling

Posted on: 2024-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Li
Full Text: PDF
GTID: 2568306932454814
Subject: Data science
Abstract/Summary:
The rapid advancement of deep learning has spawned widespread applications of deep ensemble learning, which combines ensemble learning methods with deep models. These applications permeate domains such as pattern recognition, autonomous driving, and search recommendation. Deep ensemble models improve overall accuracy and generalization by integrating the outputs of multiple deep models. However, despite these accuracy gains, deep ensemble models incur a significant increase in computational overhead. The additional computation and memory required during inference can cause excessive latency and high deadline miss rates, especially in resource-constrained scenarios. Such deficiencies are untenable in real-time business and other latency-sensitive tasks. As a result, accelerating the inference of deep ensemble models so that their benefits can be realized in latency-sensitive tasks has become a salient research focus in both academia and industry.

This dissertation addresses the redundancy inherent in the inference process of deep ensemble models through the selection and scheduling of base models. First, we introduce a novel sample difficulty measure grounded in model output distances, operationalized in real time by a lightweight neural network, which differentiates samples across difficulty levels. We then design a dynamic-programming-based base model selection and scheduling algorithm that considers both sample difficulty and the state of the inference queue. Finally, we develop and implement Schemble, a framework that accelerates the inference of deep ensemble models by autonomously selecting and scheduling base models for real-time queries, and we thoroughly evaluate its performance in three real-time business scenarios. The salient contributions of this dissertation are as follows:

(1) We propose a sample difficulty measure, the Discrepancy Score, based on model output distances. By computing the average distance between the outputs of the individual base models and the output of the full ensemble, we estimate the difficulty of the current sample. This addresses shortcomings of traditional difficulty measures, such as substantial additional computational overhead and inadequate handling of the heterogeneity and accuracy variance among base models. Leveraging historical data, we train a lightweight neural network to estimate the difficulty of incoming samples in real time; this estimate forms the basis for subsequent base model selection and scheduling.

(2) We introduce an online base model selection and scheduling algorithm based on dynamic programming. Upon the arrival of a query, the algorithm uses dynamic programming to construct a base model selection and scheduling plan, taking into account the current queue state and the query's difficulty level, and dramatically reduces the search space via pruning. Our theoretical analysis shows that the algorithm yields near-optimal results for the sub-problem at any given time and achieves a competitive ratio of twice the number of base models for the online scheduling problem. Empirical results confirm that the dynamic-programming-based algorithm generates more accurate selection and scheduling plans than conventional scheduling algorithms.

(3) Combining the above components, we designed and implemented the complete Schemble framework, strengthening its usability with a sample caching queue and a missing-value imputation module. We carried out simulation tests in three real-time business scenarios: text matching, vehicle counting, and image retrieval, using three real datasets and three distinct deep ensemble models. The experimental results underscore Schemble's effectiveness in reducing both the deadline miss rate and the inference latency of queries. For instance, in the text matching task, Schemble reduced the deadline miss rate by a factor of five compared with the original ensemble model while improving inference accuracy by 30.8%. In the vehicle counting task, Schemble reduced latency by more than 50x while maintaining an accuracy of 96.2%.
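The Discrepancy Score of contribution (1) can be sketched as follows. The abstract only states that difficulty is the average distance between each base model's output and the ensemble output; the Euclidean distance and the mean-averaged ensemble used here are illustrative assumptions, not the thesis's fixed choices.

```python
import numpy as np

def discrepancy_score(base_outputs):
    """Sketch of the Discrepancy Score: average distance between each
    base model's output and the ensemble output, used as a proxy for
    sample difficulty (larger score = harder sample).

    Assumptions (not fixed by the abstract): the ensemble output is the
    mean of the base outputs, and the distance is Euclidean.
    """
    outputs = np.asarray(base_outputs, dtype=float)  # (n_models, n_classes)
    ensemble = outputs.mean(axis=0)                  # assumed ensemble rule
    dists = np.linalg.norm(outputs - ensemble, axis=1)
    return float(dists.mean())
```

When all base models agree the score is zero, and it grows with disagreement, which is the property the lightweight difficulty-prediction network is trained to approximate from historical data.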
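The selection step of contribution (2) resembles a budgeted subset choice. As a rough sketch only, assuming integer per-model inference costs, additive accuracy-gain scores, and a latency budget derived from the query's deadline and queue state (the thesis's actual objective, queue model, and pruning rules are not reproduced here), a knapsack-style dynamic program could look like:

```python
def select_models(costs, gains, budget):
    """Knapsack-style DP sketch: choose a subset of base models whose
    total inference cost fits the remaining latency budget while
    maximizing an (assumed additive) accuracy-gain score.

    costs  -- hypothetical integer inference cost per base model
    gains  -- hypothetical accuracy-gain score per base model
    budget -- hypothetical integer latency budget for this query
    """
    # best[b] = (best total gain achievable within budget b, chosen models)
    best = [(0.0, [])] * (budget + 1)
    for i in range(len(costs)):
        # iterate budgets downward so each model is used at most once
        for b in range(budget, costs[i] - 1, -1):
            g, subset = best[b - costs[i]]
            if g + gains[i] > best[b][0]:
                best[b] = (g + gains[i], subset + [i])
    return best[budget][1]
```

With a tight budget the sketch falls back to the few most valuable base models, which mirrors the idea of running the full ensemble only on difficult queries when the queue allows it.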
Keywords/Search Tags:Ensemble learning, Deep learning, Efficient inference, Task scheduling