Research on Key Technologies of Edge-Intelligence-Oriented Human Behavior Recognition

Posted on: 2022-03-20
Degree: Doctor
Type: Dissertation
Country: China
Candidate: C Dai
GTID: 1488306524470914
Subject: Signal and Information Processing
Abstract/Summary:
Video human behavior recognition based on edge intelligence is a hot topic in computer vision research, and it directly promotes the development of multimedia IoT and industrial IoT applications. Its main idea is to build a lightweight deep network for video behavior recognition that can be deployed on edge devices to support edge multimedia IoT application systems, laying a technical foundation for real-time video behavior recognition on edge intelligence systems. Existing research on deep learning-based video behavior recognition mainly emphasizes building hybrid deep networks based on spatial and temporal features in order to improve recognition accuracy. However, the non-convex parameter optimization mechanism generates a large number of parameters during training. The number of parameters far exceeds the computing and storage capabilities of edge devices, which makes deployment on such devices very challenging.

To this end, we focus on two aspects of these problems. On the one hand, we study more targeted networks for encoding long-term temporal features of video, in order to improve the performance of the overall network model. On the other hand, we study the problem of excessive memory consumption and realize deep model compression, aiming to lay the research foundation for deploying deep learning models on edge devices. The main content and innovations of this dissertation are as follows:

1. Considering the difficulty of learning to encode long-term temporal features, this dissertation proposes an end-to-end two-stream attention-based LSTM network. It can selectively focus on the effective features of the original input images and pay different levels of attention to the outputs of each deep feature map. Moreover, considering the correlation between the two deep feature streams, a deep feature correlation layer is proposed to adjust the network parameters based on a correlation judgement. The experimental results show that the improved model can effectively extract long-term features and achieves more competitive results than other algorithms of the same type.

2. A two-stream deep learning behavior recognition network is proposed for human skeleton feature learning, in which the extracted features are represented as pseudo-images that participate in training to improve recognition accuracy. In the feature fusion stage, a global average pooling strategy is proposed, which avoids the isolation of temporal and spatial features found in the traditional late fusion strategy and improves recognition accuracy to a certain extent. In addition, considering the deployment challenge posed by the large number of parameters, a Tucker decomposition-based knowledge distillation algorithm is proposed to improve the learning ability of the student network. The experimental results show that the proposed network achieves a better recognition rate compared with traditional algorithms, and that knowledge distillation based on the Tucker decomposition of the teacher model improves the performance of the student network.

3. A lightweight network model based on an improved Faster R-CNN is proposed for scene segmentation of human behavior videos. On the one hand, the improved Faster R-CNN algorithm is used to identify and locate the background content of each video frame and to accurately extract the content of the background area. On the other hand, an improved image similarity measure is proposed to determine whether video frames belong to the same segment. Furthermore, to reduce the number of network parameters, a naive Bayesian inference algorithm is used to optimize the channel pruning ratio. The experimental results show that the proposed algorithm outperforms the scene segmentation algorithm based on fixed boundaries, even when the number of parameters is reduced by 30%.

4. To build a lightweight deep learning model that can describe human behavior in videos, this dissertation proposes a hierarchical, multi-modal architecture for video-based human behavior understanding. It uses a deep learning model to encode the spatial and temporal information in the video and generates the language description by deducing the context with a deep reinforcement learning algorithm. To improve training efficiency, we propose a general and efficient computation scheme that uses Tensor-Train decomposition to factorize the input-to-hidden weight matrix. In addition, an adaptive genetic algorithm is proposed to automatically search for a suitable rank for the tensor decomposition.
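The compression effect of the Tensor-Train factorization mentioned in item 4 can be illustrated with a simple parameter count. The following is a minimal sketch, assuming the input-to-hidden weight matrix is stored in the standard TT-matrix format, where core k has shape (r_{k-1}, m_k, n_k, r_k) and the boundary ranks are 1. The function names, mode sizes, and ranks are hypothetical values chosen for illustration and are not taken from the dissertation.

```python
from math import prod


def dense_param_count(in_modes, out_modes):
    """Parameters in the unfactorized input-to-hidden weight matrix."""
    return prod(in_modes) * prod(out_modes)


def tt_param_count(in_modes, out_modes, ranks):
    """Parameters in a TT-matrix factorization of the same weight matrix.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]).
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the mode lists")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary TT ranks must be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical sizes: a 1024-dimensional input mapped to 256 hidden units.
IN_MODES = (4, 8, 8, 4)    # 4 * 8 * 8 * 4 = 1024
OUT_MODES = (4, 4, 4, 4)   # 4 * 4 * 4 * 4 = 256
RANKS = (1, 4, 4, 4, 1)    # assumed TT ranks, not taken from the dissertation

dense = dense_param_count(IN_MODES, OUT_MODES)
tt = tt_param_count(IN_MODES, OUT_MODES, RANKS)
print(f"dense: {dense}, tensor-train: {tt}, ratio: {dense / tt:.1f}x")
```

With these assumed sizes, the count drops from 262,144 parameters to 1,152. Larger ranks increase the count and the capacity of the factorized layer, so the choice of rank sets the trade-off between model size and accuracy; this is the quantity that the rank search in item 4 would tune.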
Keywords/Search Tags: Deep model compression, Human behavior recognition, Knowledge distillation, Channel pruning, Tensor decomposition