In recent years, artificial intelligence technology based on deep learning has been widely applied in intelligent manufacturing, medical image analysis, smart grids, and smart cities, bringing profound changes to traditional industries. However, when deep learning is applied to practical scenarios, especially in complex open environments with multiple scenes, difficult tasks, and high demands on cross-scenario adaptability, model deployment still faces many problems and challenges: (1) multiple models cannot be flexibly combined and reused across task scenarios, leading to poor system maintainability and scalability; (2) efficient and convenient model quantization schemes are lacking: most hardware platforms provide only the simplest low-bit quantization methods, leaving little room for optimization and causing serious accuracy loss after quantization; (3) existing AI service platforms mainly support the training stage of neural network models, with weak support for the model lightweighting and serving stages. To address these issues and challenges, this paper starts from the application scenarios and builds a lightweight model service platform for open environments, supporting the lightweighting, serving, and platformization of deep learning technology in practical applications. The main research contents are as follows:

1. Starting from model service deployment, this paper investigates more efficient model service generation and proposes a model service workflow orchestration and deployment method based on a Directed Acyclic Graph (DAG) structure. By modularizing components such as model training, preprocessing, and postprocessing, model service workflows can be composed serially or in parallel, enabling rapid orchestration and recombination of models in complex task scenarios and improving the scalability and maintainability of multi-model, multi-component combinations in open environments.

2. Starting from the critical nodes and optimizable space of quantization, this paper investigates and implements a lightweight end-to-end neural network model quantization technique. Before quantization, the parameter distribution is adjusted across layers via inverse-ratio decomposition, making the model more amenable to quantization; during quantization, weights and quantization parameters are optimized jointly, layer by layer, improving post-quantization accuracy; after quantization, an operator scheduling algorithm based on layer-wise error analysis is combined with the hardware platform's operator fusion strategies, accelerating quantized inference through hardware-software co-design. Compared with the original INT8 quantization implementation, the proposed quantization algorithm improves accuracy by about 2% on average, providing highly available model quantization services for resource-constrained edge platforms.

3. Finally, starting from the platform's service capability, this paper designs and implements a lightweight model service platform for open environments. Building on the two research topics above, it combines container technology with a microservice architecture to construct an end-to-end lightweight model service platform that provides services such as model quantization, model service workflow orchestration, and model deployment. The aim is to provide technical support for the rapid deployment of deep learning models in open environments with multiple scenarios and tasks.
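The DAG-based workflow orchestration in point 1 can be illustrated with a minimal sketch. This is not the platform's actual implementation; the component names (`preprocess`, `infer`, `postprocess`) and the `ServiceDAG` class are hypothetical, and real systems would execute independent branches in parallel rather than sequentially:

```python
from collections import defaultdict, deque

class ServiceDAG:
    """Toy DAG of model-service components (illustrative only)."""

    def __init__(self):
        self.edges = defaultdict(list)   # parent component -> child components
        self.funcs = {}                  # component name -> callable(data) -> data

    def add(self, name, func, after=()):
        """Register a component, optionally declaring its upstream dependencies."""
        self.funcs[name] = func
        for parent in after:
            self.edges[parent].append(name)
        return self

    def run(self, data):
        """Kahn topological sort, then execute components in dependency order."""
        indeg = {n: 0 for n in self.funcs}
        for children in self.edges.values():
            for child in children:
                indeg[child] += 1
        queue = deque(n for n, d in indeg.items() if d == 0)
        order = []
        while queue:
            node = queue.popleft()
            order.append(node)
            for child in self.edges[node]:
                indeg[child] -= 1
                if indeg[child] == 0:
                    queue.append(child)
        for node in order:
            data = self.funcs[node](data)
        return data

# Compose a serial preprocess -> infer -> postprocess pipeline.
dag = (ServiceDAG()
       .add("preprocess", lambda xs: [v / 255.0 for v in xs])
       .add("infer", lambda xs: sum(xs), after=("preprocess",))
       .add("postprocess", lambda s: round(s, 3), after=("infer",)))
```

Because each component is a self-contained node, swapping a model or a postprocessing step only replaces one node, which is the maintainability benefit the abstract claims for open environments.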
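The per-layer quantization-parameter optimization in point 2 can be sketched as follows. This is a simplified stand-in, not the thesis's algorithm: it uses symmetric INT8 quantization and a plain grid search for the scale that minimizes per-layer reconstruction error, in place of the joint weight/parameter optimization described above; the function names are hypothetical:

```python
def quantize_int8(weights, scale):
    """Symmetric INT8 quantize-dequantize: clamp(round(w / scale)) * scale."""
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return [v * scale for v in q]

def mse(a, b):
    """Layer-wise reconstruction error between float and dequantized weights."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def best_scale(weights, n_steps=100):
    """Grid-search the scale minimizing reconstruction error, instead of the
    naive max(|w|) / 127 choice that outliers can distort."""
    base = max(abs(w) for w in weights) / 127
    best_err, best_s = float("inf"), base
    for i in range(1, n_steps + 1):
        s = base * i / n_steps       # candidate scales in (0, base]
        err = mse(weights, quantize_int8(weights, s))
        if err < best_err:
            best_err, best_s = err, s
    return best_s
```

Since the naive scale is itself one of the candidates, the searched scale can never do worse per layer, which mirrors the abstract's claim that optimizing quantization parameters layer by layer recovers accuracy lost to the simplest low-bit schemes.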