Convolutional Neural Network Based Research On Image Understanding

Posted on: 2018-02-20 | Degree: Doctor | Type: Dissertation
Country: China | Candidate: Y D Li | Full Text: PDF
GTID: 1318330512488200 | Subject: Software engineering
Abstract/Summary:
In recent years, deep learning has made a series of breakthroughs in artificial intelligence research, attracting wide attention from both research institutes and industry. As a crucial member of the deep learning family, the Convolutional Neural Network (ConvNet) is closely connected with computer vision research. With the rapid development of network structures and the emergence of large-scale datasets, ConvNets have contributed many breakthroughs in the past years. However, many open problems remain to be solved. In this thesis, we categorize computer vision research into three main levels: low-level research on feature representation, mid-level research on semantic representation, and high-level research on semantic understanding. We investigate a representative technology at each level: object recognition, scene labeling, and scene recognition. The details are listed below.

First, since overfitting in ConvNets hinders accurate object recognition, we investigate the use of heterogeneous multi-column ConvNets for object recognition. Results on public object recognition datasets show that heterogeneous multi-column ConvNets yield better performance than a single-column ConvNet. In addition, as traditional fusion methods lack generalization ability, we propose a Sliding Window Fusion (SWF) strategy for the fusion process. SWF is more general than traditional fusion methods: it fuses the predictions from the ConvNets selectively and provides better accuracy for object recognition.

Second, we propose a ConvNet-based framework for scene labeling and achieve superior performance compared with existing scene labeling methods on both indoor and outdoor scene labeling datasets. We investigate both trained and general ConvNet features in the proposed framework. In addition, as visual consistency is a common problem in scene labeling, we propose an algorithm called Region Consistency Activation (RCA) to improve the visual consistency of labeling results. RCA uses the global probability of boundaries and iteratively activates the unary potentials of a scene. Results on popular outdoor and indoor scene labeling datasets show that our proposed method produces better accuracy and visual consistency than the state-of-the-art methods.

Third, we propose a scene recognition method that uses a ConvNet to learn features on multi-scale salient regions of a scene. Since the semantic information of a scene is very complex, we first propose a strategy to find the salient region of a scene. Then, we use multi-scale bounding boxes to cover a wide range of context information. In addition, as traditional handcrafted features show weak discriminative ability for scene recognition, we investigate the use of a ConvNet as a general feature extractor to produce the feature representation of a scene. Results on popular scene recognition datasets show that our proposed method produces better scene recognition accuracy than the benchmark methods.

In short, this thesis investigates three representative computer vision technologies based on ConvNets. For each task, we design the structure and the application mode of the ConvNet. We also propose strategies and algorithms to solve problems specific to each field. Experiments on public datasets show that our proposed methods provide superior performance in each field compared with related existing methods.
Keywords/Search Tags: Convolutional Neural Networks, Object Recognition, Scene Labeling, Scene Recognition, Deep Learning