
Research On Key Technologies Of Deep Neural Networks For Object Recognition And Object Detection

Posted on: 2019-10-25
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Li
GTID: 1368330551950043
Subject: Electronic Science and Technology

Abstract/Summary:
Recently, the field of artificial intelligence has once again come into the view of researchers, drawing the attention of both academia and industry, and has become a new focus of international competition. Among the applications of artificial intelligence, object recognition and object detection are key technologies and active research topics in computer vision. Current object detection frameworks usually decompose the detection problem into two sub-problems, object localization and object recognition. For the latter, machine learning systems trained on image features are typically used to identify the objects in an image. Consequently, the design of robust and expressive features is an important research area, and a good feature extractor has become a key technology shared across many computer vision tasks. With the breakthrough in supervised training of deep convolutional neural networks (CNNs) in 2012, the landscape of computer vision changed drastically, moving from hand-engineered features towards representation learning via convolutional neural networks. Since then, convolutional neural networks have become the commonly accepted approach to object recognition and object detection. However, new challenges have also emerged, most notably designing more powerful network architectures, developing more effective optimization techniques, and encoding transformation-invariant features. In view of these challenges, this thesis studies representation learning with deep convolutional neural networks, focusing on the design of new network modules, weight-initialization techniques, and transformation-invariant features. The main contributions are as follows:

1) A novel parametric exponential linear unit that bridges the gap in representation space between rectified and exponential linear units. Existing non-linear units fall into two classes, piecewise linear and piecewise exponential. The former can model only a linear relationship for negative inputs, whereas the latter can model only an exponential one. This difference implies a gap in representation space between the two classes, which limits their non-linear modelling ability. To bridge the gap, we propose a parametric exponential linear unit in which two learnable parameters control, respectively, the value to which and the point at which the unit saturates. By optimizing the shape of the function, a single model can cover both the linear and the exponential behaviour on the negative side (see the first code sketch below). We evaluated the proposed unit on several datasets and observed improved generalization. Beyond image recognition and object detection, it can also be applied to other vision tasks.

2) A weight-initialization technique for convolutional neural networks that removes the restriction of current initialization methods to particular types of non-linear units. The existing theory of initialization is tied to the type of non-linear unit, and for networks with higher-order non-linear units it is very difficult to obtain an analytic solution for the weight initialization. We therefore propose a Taylor initialization that works with a wider range of non-linear units: the higher-order terms of the unit are removed and the original function is approximated by its Taylor polynomial of degree 1 (see the second code sketch below). Although the approximation introduces residual error, theoretical analysis and experiments show that the residuals are negligible at the initialization stage. The method allows more freedom in the design of network architectures and provides a reference point for further extensions of weight-initialization theory.

3) A framework for learning transformation-invariant representations with convolutional neural networks. For handling large transformations, the commonly adopted approaches are data augmentation and transformation-invariant pooling. The former generalizes well to unfamiliar variations in a data-driven manner but requires longer training and more parameters; the latter is a more parameter-efficient, learning-based framework that achieves better performance on familiar variations with fewer parameters but runs a higher risk of overfitting to a predefined, fixed set of transformations. Combining the two offers the best of both worlds, which motivates this work. The proposed method builds transformation-invariant representations by applying multiple random transformations to the input while keeping only one output according to a given dropout policy (see the third code sketch below). We evaluated the method on several transformed datasets and demonstrated its effectiveness. Although designed primarily for transformation invariance, it also offers an alternative way to approach visual recognition problems.
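First code sketch. The abstract does not give the exact functional form of the unit in contribution 1), so the following is only a minimal NumPy sketch of one plausible parameterization consistent with its description: identity for positive inputs and a scaled exponential for negative inputs. The function name parametric_elu and the specific negative branch alpha*(exp(beta*x)-1) are illustrative assumptions, not the thesis's definition.

    import numpy as np

    def parametric_elu(x, alpha=1.0, beta=1.0):
        # Identity for positive inputs; alpha * (exp(beta * x) - 1) for
        # negative inputs.  alpha sets the value the unit saturates to
        # (-alpha as x -> -inf) and beta how quickly it saturates; both are
        # intended to be learned jointly with the network weights.
        x = np.asarray(x, dtype=np.float64)
        neg = alpha * (np.exp(beta * np.minimum(x, 0.0)) - 1.0)
        return np.where(x > 0.0, x, neg)

    # alpha = beta = 1 recovers the standard ELU; a small alpha * beta makes
    # the negative branch nearly linear (ReLU/PReLU-like), so one
    # parameterization spans both families.
    print(parametric_elu([-3.0, -0.5, 0.0, 2.0], alpha=1.0, beta=1.0))
    print(parametric_elu([-3.0, -0.5, 0.0, 2.0], alpha=0.1, beta=0.1))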
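Second code sketch. Contribution 2) describes initializing weights by replacing the non-linear unit with its degree-1 Taylor polynomial. The sketch below illustrates that idea under the assumption that the unit is the parametric exponential function sketched above; the resulting variance formula simply reuses the He/PReLU variance argument with the linearized negative slope and is an assumption for illustration, not the thesis's exact derivation.

    import numpy as np

    def taylor_init(fan_in, fan_out, alpha=1.0, beta=1.0, rng=None):
        # Assume the layer is followed by the parametric exponential unit
        # above.  Near zero its negative branch alpha * (exp(beta * x) - 1)
        # is approximated by the degree-1 Taylor polynomial alpha * beta * x,
        # i.e. a leaky-linear unit with negative slope a = alpha * beta.
        # The He/PReLU variance argument with that slope then gives
        # Var(W) = 2 / ((1 + a^2) * fan_in).
        rng = np.random.default_rng() if rng is None else rng
        a = alpha * beta
        std = np.sqrt(2.0 / ((1.0 + a ** 2) * fan_in))
        return rng.normal(0.0, std, size=(fan_out, fan_in))

    W = taylor_init(fan_in=512, fan_out=256, alpha=1.0, beta=1.0)
    print(W.std())  # close to sqrt(2 / (2 * 512)) ~= 0.044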
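Third code sketch. Contribution 3) describes applying multiple random transformations to the input and keeping only one output under a dropout policy. The toy sketch below illustrates that training-time policy; the transformation family (flips and 90-degree rotations), the random keep-one rule, and all function names are assumptions made for illustration rather than the thesis's actual framework.

    import random
    import numpy as np

    def random_transforms(image, num_branches=4, rng=None):
        # Produce several randomly transformed copies of one input.
        # Horizontal flips and 90-degree rotations stand in for whatever
        # transformation family the representation should tolerate.
        rng = rng or random.Random(0)
        copies = []
        for _ in range(num_branches):
            t = np.fliplr(image) if rng.random() < 0.5 else image
            t = np.rot90(t, k=rng.randrange(4))
            copies.append(t)
        return copies

    def forward_with_branch_dropout(image, feature_fn, num_branches=4, rng=None):
        # Training-time policy assumed here: run the shared-weight feature
        # extractor on every transformed copy, then keep only one randomly
        # chosen output, so each step sees a different transformation while
        # the parameter count stays that of a single branch.
        rng = rng or random.Random(0)
        outputs = [feature_fn(t) for t in random_transforms(image, num_branches, rng)]
        return outputs[rng.randrange(len(outputs))]

    # Toy usage with a stand-in "feature extractor".
    image = np.arange(16.0).reshape(4, 4)
    print(forward_with_branch_dropout(image, feature_fn=lambda t: float(t[0, 0])))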
Keywords/Search Tags: deep learning, convolutional neural network, image recognition, object detection, representation learning