
Research On Some Key Technologies Of Deep Learning In The Field Of Computer Vision

Posted on: 2018-05-01
Degree: Doctor
Type: Dissertation
Country: China
Candidate: S C Pang
GTID: 1318330542452725
Subject: Computer application technology
Abstract/Summary:
In recent years, computer vision has attracted rapid attention and development, and it represents one of the most advanced research directions of deep learning. From the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) to the recent matches between humans and AlphaGo at Go, these events show the strong vitality and potential of computers in the field of computer vision. On ILSVRC, the error rate of image classification algorithms has fallen well below the human recognition error rate, and the AlphaGo program has successively defeated the Go grandmasters Lee Sedol and Ke Jie. Behind these superhuman results lies the development and improvement of deep learning and deep neural networks. Deep learning is driving the rapid development of computer vision and artificial intelligence and is changing our lives at the same time. However, because of the complexity of deep neural network models and the lack of labelled training data, extending deep learning into more areas of computer vision still faces enormous challenges. Building on recent research in deep learning, we have therefore studied several key technologies of deep learning in the field of computer vision. The main contribution of deep learning is that it has changed the traditional way of approaching computer vision problems. This thesis focuses on three shifts: how feature representation learning driven by big data replaces hand-crafted feature design based on human experience and expert knowledge; how the strategy of dividing a complex problem into multiple subproblems is upgraded to integrated end-to-end autonomous learning; and why the design philosophy shifts from simple models to complex models for handling big data. To this end, several innovative learning theories, feature representations and
end-to-end visual systems are proposed in this thesis for computer vision. We also propose new research methods and system frameworks for technical bottlenecks in several areas of computer vision, such as face recognition, single-object and multi-object tracking, and biomedical image retrieval and classification. In addition, preference learning is introduced for the first time into these computer vision problems, which provides a new research perspective and direction. The research ideas and the links between the chapters can be summarized as follows: the thesis not only connects many visual research directions at the technical level, but also raises the difficulty of the research unit progressively. Taking the image as the basic unit of computer vision, the thesis moves from unsupervised feature representation and learning for general images, to the specific task of face image recognition, and then to the harder problems of video analysis and object tracking, which fuse information along the time dimension; from supervised feature learning and model training on ordinary images, to high-precision, high-demand biomedical image analysis and processing; and from large-scale image data in one field to limited visual data in another. With deep learning technology as the main thread and visual data of increasing difficulty as the research object, the thesis links these visual directions together in series.

The contributions of this thesis are summarized as follows:

1) Proposed a novel face recognition algorithm based on deep learning. To overcome the limitations of hand-crafted image features, the algorithm uses an unsupervised deep mechanism, driven by big data, to learn image feature representations that describe and represent different face images well. The algorithm has two stages: one is the
offline training procedure, which builds a stacked denoising autoencoder to learn generic image features from auxiliary training images taken from the Tiny Images dataset; the other is the online supervised face recognition procedure, which adds a logistic regression layer on top of the encoder part of the stacked denoising autoencoder to form a deep neural network for face classification, and fine-tunes the generic image features from the first stage on the specific face database. Comparisons with state-of-the-art face recognition algorithms on several public face recognition databases show that our method achieves a higher recognition rate at a lower computational cost, and that it is well suited to handling big data.

2) Presented a new idea and algorithm for single-object tracking. Single-object tracking is still a popular research direction in computer vision; its main purpose is to locate the moving object in each frame of a video sequence. To track the object accurately in each frame, we have created a novel visual cognitive system that combines knowledge guidance with a data-driven deep learning architecture. In other words, we fuse two types of techniques, deep learning and preference learning, to achieve accurate tracking. On the one hand, deep learning can automatically extract multi-level image feature representations from large numbers of raw images, and its success has been demonstrated in other computer vision tasks. On the other hand, we propose for the first time that object tracking can be regarded as a preference learning problem. We compute the overlap area between the tracked block and positive and negative image blocks extracted around the center of the tracked block from the previous frame, use these overlaps to build preference relationships that give the tracking system a degree of guiding knowledge, and then output the sorting
results of these candidate blocks and determine the position and size of the tracked object from the image block with the largest overlap area. Thus, for each frame, the algorithm produces a series of candidate image blocks, and preference learning is used to learn the object preference relationships and establish a ranking function. In comprehensive experiments, our proposed DPL2 tracker outperforms other state-of-the-art trackers.

3) Proposed a new multi-object tracking approach based on a deep stacked denoising autoencoder framework. For any object tracking algorithm, representing the appearance of each target remains a critical problem. We use more compact, high-level deep features to describe target appearance and build our multi-object tracking system on them. A novel technical scheme is proposed that combines unsupervised learning on large data with supervised learning on small data: after generic image features are learned from offline auxiliary data, the multi-object tracking system is fine-tuned to adapt to changes in target appearance during tracking. In addition, temporal information that records the duration of each target's presence guides the system to handle various interference factors correctly and to follow each target accurately. With the help of this temporal information and a particle filter, the algorithm automatically learns target features and updates the system model, so that it can accurately identify each target under external interference in the tracking environment. Extensive experiments show that the tracker accurately tracks multiple objects in indoor and outdoor scenes and copes with interference such as large pose changes, scale changes, occlusion, illumination changes, and interaction
between multiple targets. In addition, the algorithm can provide effective technical support for abnormal event detection and early warning.

4) Developed a new deep preference learning retrieval system for biomedical images. To help doctors diagnose patients' health conditions, we design this deep preference learning system for biomedical image indexing. Based on a study of two types of deep neural network, fully connected networks and convolutional neural networks, we have developed a general feature representation algorithm for different visual problems, which greatly reduces the difficulty of solving new and interdisciplinary visual problems and makes it possible to extract more accurate deep features for biomedical images. Then, to design the similar-image matching framework, we incorporate the preference learning philosophy into the construction of the biomedical image retrieval system: a preference model is established and trained according to the type of the query image, and the relevant biomedical images are then retrieved. Tested on a widely used public biomedical image database, our two proposed algorithms obtain better retrieval performance than other popular biomedical image indexing algorithms and general image retrieval methods.

5) Presented a highly reliable and accurate end-to-end biomedical image classification algorithm. To further explore how deep learning research in medical diagnosis can advance with small data, this thesis focuses on integrated end-to-end autonomous learning, which replaces the traditional pattern recognition approach of dividing a complex problem into several subproblems. Traditional biomedical image classification algorithms either combine hand-crafted features with a machine learning classifier, or directly use well-trained deep neural networks to
extract features from biomedical images, or even train a deep classifier on limited biomedical images. However, the classification accuracy and stability of these traditional models still fall short of the needs of real applications. We therefore propose a classification framework built on a domain-transferred deep convolutional neural network: the parameters of a deep model well trained in another domain are transferred to biomedical image classification, the model is fine-tuned on biomedical images, and the end-to-end classification of biomedical images is then completed.

This thesis is devoted to researching and exploring neural network structures in deep learning and analyzes in depth the design of different types of deep neural networks. By proposing research on several key deep learning technologies that address various technical problems in computer vision, it further enriches research on deep learning in computer vision and has theoretical value and practical significance.
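The overlap-based preference construction described in contribution 2) can be illustrated with a minimal Python sketch. It assumes boxes in (x, y, w, h) form and measures overlap as raw intersection area, as the abstract describes. A simple pairwise perceptron stands in for the learned ranking function; the DPL2 tracker itself learns its ranking over deep features, whose details the abstract does not give. The helpers `sample_blocks` and `train_ranker` and the toy offset features are hypothetical, chosen only to make the idea concrete.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h): assumed box convention


def overlap_area(a: Box, b: Box) -> float:
    """Intersection area of two axis-aligned boxes; 0.0 if they do not overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return dx * dy if dx > 0 and dy > 0 else 0.0


def sample_blocks(prev: Box, step: float, radius: int) -> List[Box]:
    """Hypothetical sampler: shift the previous box on a grid around its position."""
    x, y, w, h = prev
    return [(x + i * step, y + j * step, w, h)
            for i in range(-radius, radius + 1)
            for j in range(-radius, radius + 1)]


def preference_pairs(ref: Box, blocks: Sequence[Box]) -> List[Tuple[int, int]]:
    """(i, j) means block i is preferred to block j because it overlaps ref more.
    Ties produce no pair."""
    scores = [overlap_area(ref, b) for b in blocks]
    pairs = []
    for i, j in combinations(range(len(blocks)), 2):
        if scores[i] > scores[j]:
            pairs.append((i, j))
        elif scores[j] > scores[i]:
            pairs.append((j, i))
    return pairs


def train_ranker(features: Sequence[Sequence[float]],
                 pairs: Sequence[Tuple[int, int]],
                 epochs: int = 50, lr: float = 0.1) -> List[float]:
    """Pairwise perceptron (stand-in ranker): learn w with w.f_i > w.f_j for each (i, j)."""
    w = [0.0] * len(features[0])
    for _ in range(epochs):
        for i, j in pairs:
            diff = [fi - fj for fi, fj in zip(features[i], features[j])]
            if sum(wk * dk for wk, dk in zip(w, diff)) <= 0:  # misordered or tied
                w = [wk + lr * dk for wk, dk in zip(w, diff)]
    return w


def rank(w: Sequence[float], features: Sequence[Sequence[float]]) -> List[int]:
    """Candidate indices sorted from most to least preferred under the learned score."""
    def score(k: int) -> float:
        return sum(wk * fk for wk, fk in zip(w, features[k]))
    return sorted(range(len(features)), key=score, reverse=True)


if __name__ == "__main__":
    prev = (10.0, 10.0, 20.0, 20.0)                   # tracked box from the previous frame
    blocks = sample_blocks(prev, step=4.0, radius=1)  # 9 candidate blocks
    pairs = preference_pairs(prev, blocks)
    # Toy 2-D features standing in for deep appearance features (hypothetical).
    feats = [[-abs(b[0] - prev[0]), -abs(b[1] - prev[1])] for b in blocks]
    w = train_ranker(feats, pairs)
    print(blocks[rank(w, feats)[0]])  # (10.0, 10.0, 20.0, 20.0)
```

The ranker is trained on features rather than on overlap itself because, in a real tracker, the reference box is known only for training frames; in a new frame, only the learned score can order the candidate blocks.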
Keywords: Deep learning, Computer vision, Deep neural network, Preference learning, Transfer learning