With the rapid development of cloud computing, mobile intelligence, and IoT technology, the volume of business data in various application areas is exploding. Deep neural networks have recently achieved remarkable results in fields such as image, speech, video, and natural language processing, and have become the core technology for big data analysis. However, the price of this excellent performance is the expansion of network scale, manifested in massive numbers of network parameters and training samples. Large networks place higher demands on computational resources and also lead to high data labeling costs. At the same time, due to intense competitive pressure and the need for fast response, most applications require efficient utilization of data resources and deep learning processing technology with high throughput and low latency. Therefore, research on high-precision, high-performance deep learning processing technology for industrial applications has become a popular topic in both academia and industry.

Data, models, and hardware are the three core elements that determine the processing performance of deep learning technology. This paper studies efficient deep learning processing techniques and applications from three directions: data-efficient learning, efficient algorithms, and efficient architecture design. Data-efficient learning techniques aim to train deep neural networks with a limited number of samples to improve data utilization. Efficient deep learning algorithms aim to reduce the required computational resources through lightweight models and workload partitioning strategies. Efficient deep learning architectures aim to design specialized hardware that accelerates neural network computation and addresses the high latency and energy cost of general-purpose architectures such as CPUs and GPUs. The research content and main
contributions are as follows.

1. A few-shot point cloud learning method based on cascaded neural networks is proposed to address the difficulty of training models when labeled point cloud samples are scarce. The method learns topological features in irregular point cloud data with graph neural networks, and learns the relationships between classes from a small number of point cloud samples, so that new classes can be discriminated through relational reasoning. Meanwhile, a discriminative edge label is proposed to model the channel-wise similarity of point cloud object features and assist relational inference. In addition, a few-shot circle loss function is proposed to maximize the difference between point cloud features of different classes.

2. An efficient time series prediction method based on dilated convolution is proposed to address the difficulty of achieving both high speed and high accuracy in multivariate time series analysis. The method introduces a position-aware dilated convolution, which convolves elements separated by a hop factor to model periodic patterns, and uses an autoregressive model to predict scale changes in the series. Meanwhile, a multi-span spatio-temporal feature aggregation scheme is proposed to enable convolutional neural networks to learn positional information. In addition, unlike traditional recurrent neural network-based time series prediction methods, the purely convolutional operations used here have no sequential dependence in their computational pattern and can efficiently exploit parallel computing units to speed up computation.

3. An efficient edge-cloud collaborative framework for deep dilated convolutional neural networks is proposed to increase system throughput and reduce computational latency for click-through rate (CTR) prediction in large-scale recommendation systems. The method analyzes users' long- and
short-term interests from their behaviors through convolutional neural networks, making CTR predictions that are both accurate and fast. At the same time, an interest supervision loss function is proposed to learn better user interest representations during training by distinguishing whether the next target is a positive sample or a randomly selected noisy negative sample. In addition, a novel edge-cloud collaboration strategy is proposed that computes users' long-term interests offline and caches them in the cloud, while edge devices analyze short-term user behavior online and combine it with the cached long-term interests to recommend items, reducing online recommendation service latency.

4. To reduce redundant computation and communication in vision self-attention on traditional computing platforms, an algorithm-architecture co-design for a vision Transformer based on differential self-attention is proposed. The differential self-attention technique reuses the features of adjacent patches and thus reduces computational cost. In addition, the approach designs a specialized architecture that dynamically eliminates redundant computation and communication with a highly parallel differential self-attention engine, improving the speedup and energy efficiency of vision Transformer inference with minimal hardware resources.

This paper presents significant research results in the field of efficient deep learning processing technology and its applications, which have important research and application value and complement frontier exploration in this field.
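To illustrate the parallelism argument behind the dilated-convolution prediction method in contribution 2, the following is a minimal sketch of a plain dilated causal 1-D convolution, the standard building block that the proposed position-aware variant extends. The function name and toy kernel are illustrative, not taken from the paper.

```python
import numpy as np

def dilated_causal_conv1d(x, w, dilation):
    """Causal 1-D convolution with a dilation (hop) factor.

    x : (T,) input series, w : (K,) kernel.
    y[t] = sum_k w[k] * x[t - k*dilation], treating indices
    before the start of the series as zero (causal padding).
    """
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):           # every y[t] is independent of every other,
        for k in range(K):       # so all t can be computed in parallel
            idx = t - k * dilation
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

# Toy example: kernel [1, 1] with dilation 2 sums each element with
# the element two steps earlier, i.e. y[t] = x[t] + x[t-2].
x = np.arange(8, dtype=float)
y = dilated_causal_conv1d(x, np.array([1.0, 1.0]), dilation=2)
# y -> [0, 1, 2, 4, 6, 8, 10, 12]
```

Stacking such layers with dilations 1, 2, 4, ... grows the receptive field exponentially with depth, and because no step depends on the previous output (unlike an RNN), all time steps can be computed concurrently on parallel hardware.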