
Efficient Deep Ensemble Inference Via Base Model Scheduling

Posted on: 2024-07-10
Degree: Master
Type: Thesis
Country: China
Candidate: Z C Li
Full Text: PDF
GTID: 2568306932454814
Subject: Data science
Abstract/Summary:
The rapid advancement of deep learning has spawned widespread applications of deep ensemble learning, which combines ensemble learning methods with deep models. These applications permeate domains such as pattern recognition, autonomous driving, and search recommendation. Deep ensemble models improve overall accuracy and generalization by integrating the outputs of multiple deep models. However, despite these accuracy gains, deep ensemble models incur a significant increase in computational overhead. The additional computation and memory required during inference can cause excessive latency and high deadline miss rates, especially in resource-constrained scenarios. Such deficiencies are untenable in real-time business and other latency-sensitive tasks. As a result, accelerating the inference of deep ensemble models so that their benefits can be realized in latency-sensitive tasks has become a salient research focus in both academia and industry.

This dissertation addresses the redundancy inherent in the inference process of deep ensemble models through the selection and scheduling of base models. First, we introduce a novel sample difficulty measure grounded in model output distances, operationalized in real time by a lightweight neural network, which differentiates samples across difficulty levels. We then design a dynamic-programming-based base model selection and scheduling algorithm that considers both sample difficulty and the state of the inference queue. Finally, we develop and implement Schemble, a framework that accelerates the inference of deep ensemble models by autonomously selecting and scheduling base models for real-time queries, and we thoroughly evaluate its performance in three real-time business scenarios. The salient contributions of this dissertation are as follows:

(1) We propose a sample difficulty measure, the Discrepancy Score, based on model output distances. By computing the average distance between the outputs of the individual base models and the output of the full ensemble, we estimate the difficulty of the current sample. This addresses shortcomings of traditional difficulty measures, such as substantial additional computational overhead and inadequate handling of the heterogeneity and accuracy variance among base models. Leveraging historical data, we train a lightweight neural network to estimate the difficulty of incoming samples in real time; this estimate forms the basis for subsequent base model selection and scheduling.

(2) We introduce an online base model selection and scheduling algorithm based on dynamic programming. Upon the arrival of a query, the algorithm uses dynamic programming to construct a base model selection and scheduling plan, taking into account the current queue state and the query's difficulty level, and dramatically reduces the search space via pruning. Our theoretical analysis shows that the algorithm yields near-optimal results for the sub-problem at any given time and achieves a competitive ratio of twice the number of base models for the online scheduling problem. Empirical results confirm that the dynamic-programming-based algorithm generates more accurate selection and scheduling plans than conventional scheduling algorithms.

(3) Combining the above components, we designed and implemented the complete Schemble framework, strengthening its usability with a sample caching queue and a missing-value imputation module. We carried out simulation tests in three real-time business scenarios: text matching, vehicle counting, and image retrieval, using three real datasets and three distinct deep ensemble models. The experimental results underscore Schemble's effectiveness in reducing both the deadline miss rate and the inference latency of queries. For instance, in the text matching task, Schemble reduced the deadline miss rate by a factor of five compared with the original ensemble model while improving inference accuracy by 30.8%. In the vehicle counting task, Schemble reduced latency by more than 50x while maintaining an accuracy of 96.2%.
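The Discrepancy Score of contribution (1) can be sketched as follows. The abstract only states that difficulty is the average distance between each base model's output and the ensemble output; the Euclidean distance and the mean-averaged ensemble used here are illustrative assumptions, not the thesis's fixed choices.

```python
import numpy as np

def discrepancy_score(base_outputs):
    """Sketch of the Discrepancy Score: average distance between each
    base model's output and the ensemble output, used as a proxy for
    sample difficulty (larger score = harder sample).

    Assumptions (not fixed by the abstract): the ensemble output is the
    mean of the base outputs, and the distance is Euclidean.
    """
    outputs = np.asarray(base_outputs, dtype=float)  # (n_models, n_classes)
    ensemble = outputs.mean(axis=0)                  # assumed ensemble rule
    dists = np.linalg.norm(outputs - ensemble, axis=1)
    return float(dists.mean())
```

When all base models agree the score is zero, and it grows with disagreement, which is the property the lightweight difficulty-prediction network is trained to approximate from historical data.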
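The selection step of contribution (2) resembles a budgeted subset choice. As a rough sketch only, assuming integer per-model inference costs, additive accuracy-gain scores, and a latency budget derived from the query's deadline and queue state (the thesis's actual objective, queue model, and pruning rules are not reproduced here), a knapsack-style dynamic program could look like:

```python
def select_models(costs, gains, budget):
    """Knapsack-style DP sketch: choose a subset of base models whose
    total inference cost fits the remaining latency budget while
    maximizing an (assumed additive) accuracy-gain score.

    costs  -- hypothetical integer inference cost per base model
    gains  -- hypothetical accuracy-gain score per base model
    budget -- hypothetical integer latency budget for this query
    """
    # best[b] = (best total gain achievable within budget b, chosen models)
    best = [(0.0, [])] * (budget + 1)
    for i in range(len(costs)):
        # iterate budgets downward so each model is used at most once
        for b in range(budget, costs[i] - 1, -1):
            g, subset = best[b - costs[i]]
            if g + gains[i] > best[b][0]:
                best[b] = (g + gains[i], subset + [i])
    return best[budget][1]
```

With a tight budget the sketch falls back to the few most valuable base models, which mirrors the idea of running the full ensemble only on difficult queries when the queue allows it.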
Keywords/Search Tags:Ensemble learning, Deep learning, Efficient inference, Task scheduling