
Research On Key Problems And Implementation Technology Of Image Understanding Based On Deep Learning

Posted on: 2019-03-29  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Y Zhang  Full Text: PDF
GTID: 1368330611493003  Subject: Electronic Science and Technology
Abstract/Summary:
Computers and information technology, represented by the mobile internet, the internet of things and smart mobile devices, have transformed how people work and live in just a few years, and humanity has entered the era of big data and intelligence. With the spread of electronic devices such as cameras and mobile phones, video and images are becoming an increasingly important source of big data. How to analyze massive image data intelligently is a major challenge in current computer vision research. The ultimate goal of computer-based image understanding is to analyze and understand real-world images as human beings do. Image understanding extracts the rich semantic information contained in an image and supports visual tasks such as target classification, object detection, instance segmentation, relationship reasoning, image description and image retrieval. Deep learning has had an important impact in many fields, including computer vision, natural language processing, text translation, human-computer interaction and autonomous driving. It is considered the most promising solution to the "semantic gap" in the current big data era and has received strong attention from industry and academia.

This dissertation studies the application of deep learning to image understanding and its efficient implementation on edge devices. To address shortcomings in the structural design of existing models, it proposes several network architectures for different image understanding tasks. On the implementation side, current embedded microprocessors have difficulty running large-scale deep neural networks efficiently. Targeting a domestically developed multi-core vector processor for edge devices, the dissertation proposes several efficient implementation schemes, a useful attempt at realizing embedded artificial intelligence. The specific work is summarized as follows:

1. A single-stage multi-object detection and recognition model based on deep learning is proposed. The model uses a large-scale convolutional neural network with good transfer learning ability as the backbone network. First, output feature maps of different scales are generated at different stages of the backbone network. To fuse the detection information across these scales, the output feature map with high-level semantic information is merged with the lower-level output feature maps by transposed convolution, so that the hierarchical structure features of the image are learned effectively. Inspired by the human visual receptive field, an Inc-mod module is constructed that fuses different visual receptive fields. The module fuses output feature maps carrying different receptive field information by using convolution kernels of different scales together with dilated (atrous) convolution. A shortcut connection is introduced to alleviate vanishing gradients. Two parameters are introduced into the classification loss function to weight the loss with respect to category imbalance and class probability, respectively, which improves the accuracy of target detection. Experimental results on multiple datasets demonstrate the effectiveness of the proposed model. Compared with classical two-stage and single-stage target detection networks, the model achieves a good trade-off between detection accuracy and running speed.

2. A joint multi-task image semantic understanding model based on deep learning is proposed. Current neural network models can typically accomplish only a single image understanding task. To overcome this limitation, the method uses a deep residual network as the backbone network for feature extraction and constructs a feature pyramid with rich semantic information through reduction and sampling operations on feature maps of different sizes. Candidate anchors are then generated on the different input feature maps by setting parameters such as aspect ratio, anchor size and moving stride. A region proposal network produces, for each input feature map, as many anchor box probability values and anchor regression values as there are candidate anchors on that map, and these values are used to fine-tune the candidate anchors. Bilinear interpolation over each region of interest then generates fixed-size feature maps for the subsequent stages, on which a target detection and classification module, a target instance segmentation module and a human pose estimation module are built. Finally, a joint multi-task deep learning model is constructed from these modules and trained through supervised fine-tuning. Experiments on challenging image datasets and generalization datasets demonstrate that the proposed model achieves comparable or even better performance on multiple image understanding tasks than single-task models.

3. An image caption generation model based on multiple attention mechanisms is proposed. The way human attention focuses on an image is introduced into image captioning, and several attention modules are constructed. First, an attention module based on image feature encoding is constructed to generate a weight for each feature map along the channel direction, explicitly modeling the importance of feature channels. Second, a spatial attention module is constructed to focus, during the decoding phase, on specific regions of the output feature map of the image feature extraction module. Third, a text attention module is constructed to attend to the correlation between the generated statements in the decoding stage. The contribution of the three attention modules to the final model is evaluated through ablation experiments. Finally, a complete multi-attention model is constructed from the three attention modules and learned through supervised training. Experimental results on multiple classical datasets show that the proposed model is good at modeling the relationships between the objects in an image and the correlation between each object and the corresponding text.

4. A series of deep neural network optimization algorithms for edge computing are proposed, and a low-level algorithm library and a large-scale real-time target recognition system are built on the embedded development platform. First, the parallelism of fully connected neural networks, recurrent neural networks and convolutional neural networks is analyzed in depth. Then an efficient deep neural network mapping algorithm is designed and implemented for the edge processor's architecture and software programming framework. A layout scheme is proposed in which the input feature map is placed in DDR and the reordered convolution kernel matrix is placed in the core memory. For the multi-dimensional matrix convolution, multi-dimensional pooling and local response normalization found in deep convolutional neural network models, corresponding efficient vectorization mapping schemes are designed, so that the utilization rate of the MAC components in the kernel loop reaches 100%. An efficient multi-core task-partitioning scheme is designed for multi-branch convolution computation. To verify the actual performance of the model, a large-scale real-time target recognition system based on the microprocessor is constructed. Based on the proposed optimization scheme, five commonly used deep neural network models, namely AlexNet, VGG16/19, GoogLeNet and ResNet, are implemented on this platform, and their performance is measured and analyzed. Experimental comparisons with CPU and GPU implementations show that the proposed implementation achieves better computational efficiency.
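
The anchor-generation step described in contribution 2 (candidate anchors produced from an aspect ratio, an anchor size and a stride on each input feature map) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function name `generate_anchors`, the convention that a ratio means height divided by width, the placement of anchors at cell centres, and the three pyramid levels are all assumptions chosen for illustration.

```python
import math
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in input-image pixels
Box = Tuple[float, float, float, float]


def generate_anchors(
    feature_height: int,
    feature_width: int,
    stride: int,
    anchor_size: float,
    aspect_ratios: Tuple[float, ...] = (0.5, 1.0, 2.0),
) -> List[Box]:
    """Place one anchor per aspect ratio at the centre of every feature-map cell.

    Hypothetical helper: ratio is taken as height / width, and the anchor
    area is held at anchor_size ** 2 for every ratio.
    """
    anchors: List[Box] = []
    for row in range(feature_height):
        for col in range(feature_width):
            # Centre of this cell, mapped back to input-image coordinates.
            center_x = (col + 0.5) * stride
            center_y = (row + 0.5) * stride
            for ratio in aspect_ratios:
                width = anchor_size / math.sqrt(ratio)
                height = anchor_size * math.sqrt(ratio)
                anchors.append((
                    center_x - width / 2, center_y - height / 2,
                    center_x + width / 2, center_y + height / 2,
                ))
    return anchors


# Assumed pyramid configuration: (feature-map side, stride, anchor size).
pyramid_levels = [(8, 8, 32.0), (4, 16, 64.0), (2, 32, 128.0)]
all_anchors = [
    box
    for side, stride, size in pyramid_levels
    for box in generate_anchors(side, side, stride, size)
]
print(len(all_anchors))  # 252 = (64 + 16 + 4) cells * 3 ratios
```

In a pipeline of the kind the abstract describes, a proposal network would then emit one probability value and one set of regression offsets per anchor, which is why the number of predictions on each feature map equals the number of candidate anchors on it.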
Keywords/Search Tags: Image understanding, Deep neural network, Edge computing, Multi-core acceleration, Vectorization