Video Facial Expression Recognition Based On Deeply Compressed Spatiotemporal Model On Edge Devices

Posted on: 2021-04-06
Degree: Master
Type: Thesis
Country: China
Candidate: B Liu
GTID: 2428330611999324
Subject: Integrated circuit engineering
Abstract/Summary:
Recently, deep learning has achieved remarkable results in computer vision, speech processing, and natural language processing. Facial expression recognition (FER) is one of the most important applications of computer vision and can be applied in many fields, such as human-computer interaction, polygraph testing, and fatigue detection. Video facial expression recognition can be regarded as FER on time-series data. Using video improves the accuracy of FER, but it requires complicated neural networks to capture the essential features of video data effectively. Such complicated models, with their redundant parameters, place high demands on computation and storage resources. Cloud computing can satisfy these requirements, but usually at high expense, whereas edge computing is a viable alternative. However, edge devices usually have limited computation and storage resources, which seriously hinders the deployment of deep neural networks on them. As a result, optimization is crucial for deep learning-based applications on edge devices, especially for video-based FER. In this dissertation, we propose a deeply compressed spatio-temporal model for FER with video input; with the optimization techniques we develop, it is more suitable for edge devices than other deep learning-based methods. The system consists of two basic parts: a convolutional neural network-based feature extractor and a long short-term memory-based expression classifier. The feature extractor uses a general matrix multiplication library and the Winograd algorithm to accelerate inference, and the expression classifier employs tensorization and tensor train decomposition for compression, which yields further acceleration. In experiments on the Extended Cohn-Kanade dataset, the Acted Facial Expression in Wild dataset 7.0, and the MMI dataset, the proposed method achieves classification accuracies of 97.96%, 55.60%, and 97.33%, respectively, and the compression rate of the tensorized spatiotemporal model reaches nearly 8.4%. In addition, the model is deployed on various mobile platforms with specific computing units, including an ARM CPU, a neural processing unit, and the Huawei Da-Vinci AI core. In our performance evaluations, the proposed framework attains acceleration ratios of up to 1.20, 2.70, and 7.92, respectively.
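To illustrate why tensorization with tensor train decomposition compresses a weight matrix, the sketch below counts parameters for a dense matrix and for its tensor-train factorization. It uses the standard tensor-train matrix layout, in which core k has shape (r_{k-1}, m_k, n_k, r_k) with boundary ranks equal to 1. The mode sizes and ranks are hypothetical values chosen only for illustration; they are not the thesis's configuration and are not meant to reproduce the reported compression rate. The function names are likewise assumptions.

```python
from math import prod


def dense_param_count(in_modes, out_modes):
    """Parameters of a dense weight matrix of shape (prod(in_modes), prod(out_modes))."""
    return prod(in_modes) * prod(out_modes)


def tt_param_count(in_modes, out_modes, ranks):
    """Parameters of a tensor-train factorization of the same matrix.

    Core k has shape (ranks[k], in_modes[k], out_modes[k], ranks[k + 1]),
    so the total is the sum of those four-way products over all cores.
    """
    if len(in_modes) != len(out_modes):
        raise ValueError("in_modes and out_modes must have the same length")
    if len(ranks) != len(in_modes) + 1:
        raise ValueError("ranks must have one more entry than the mode lists")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must both be 1")
    return sum(
        ranks[k] * in_modes[k] * out_modes[k] * ranks[k + 1]
        for k in range(len(in_modes))
    )


# Hypothetical example: a 1024 x 256 weight matrix split into four modes.
in_modes = (4, 8, 8, 4)      # 4 * 8 * 8 * 4 = 1024 input features
out_modes = (4, 4, 4, 4)     # 4 * 4 * 4 * 4 = 256 output features
ranks = (1, 4, 4, 4, 1)      # assumed tensor-train ranks

dense = dense_param_count(in_modes, out_modes)
compressed = tt_param_count(in_modes, out_modes, ranks)
ratio = compressed / dense
print(dense, compressed, ratio)
```

With these assumed shapes, the dense matrix has 262144 parameters and the tensor-train cores have 1152 in total. Smaller ranks give stronger compression, usually at some cost in accuracy, so the ranks act as the main tuning knob.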
Keywords/Search Tags: deep learning, edge computing, video facial expression recognition, tensor compression, spatio-temporal network model