
Research On Compact And Efficient Deep Models For Fine-grained Image Recognition

Posted on: 2021-03-05
Degree: Master
Type: Thesis
Country: China
Candidate: X X Liu
GTID: 2428330647950742
Subject: Computer technology
Abstract/Summary:
Fine-grained image recognition is a challenging research topic in computer vision. Benefiting from the development of deep convolutional neural networks (CNNs) in recent years, it has made great progress. At present, methods based on deep CNNs and bilinear pooling are the mainstream approaches to fine-grained image recognition. However, both require significant computational and storage resources, which makes these models cumbersome to deploy on real-world tasks in resource-constrained environments. Studying efficient and compact fine-grained image recognition models is therefore of both scientific significance and practical value. This dissertation tackles the problem by designing compact backbone networks and efficient bilinear features. The main contributions are summarized as follows.

A deep CNN is formed by stacking many convolutional layers together with other linear and non-linear layers, so it requires a lot of resources. Current mainstream compact fine-grained models mostly focus on designing compact bilinear features and pay little attention to what kind of backbone network suits a compact fine-grained model. In this dissertation, we propose Truncated ThiNet, which builds on the state-of-the-art compact network ThiNet to fully exploit its acceleration potential. To address the heavy resource requirements of bilinear pooling, we propose Global Weighted Pooling, which adopts a self-attention mechanism to encode second-order information and form efficient, compact fine-grained features. Experiments show that, with comparable accuracy, the proposed method is about 1.5× faster on mobile phones, about 4× faster on computer GPUs, and about 2× faster on computer CPUs than the state-of-the-art lightweight network MobileNetV2. Analysis shows that the proposed Global Weighted Pooling requires far fewer extra resources than most mainstream compact bilinear pooling methods. In addition, experiments show that the proposed model works well on modern mobile phones equipped with AI coprocessors, which means the proposed methods can adapt to the future trend of mobile AI applications.

Face recognition is a kind of fine-grained image recognition task, but it is much more difficult than general fine-grained tasks because of the complexity of face data. Mainstream face recognition models therefore rely on carefully designed loss functions to boost performance. Since a loss function is used only during training, the speed bottleneck of deep face models is the backbone network. Considering this fact, this dissertation proposes a Sandbox-shaped Convolutional Block for constructing a compact and efficient deep face model. The proposed method adopts a suppress-expand operation pair to form a compact convolutional block: the suppress operation reduces the computational complexity within the block, and the expand operation reduces the information loss in the block. Experiments show that after applying the proposed block to the state-of-the-art deep face model, the accuracy loss is no more than 0.1% on each of 3 benchmark datasets while the speed is significantly improved.
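
To illustrate why plain bilinear pooling is resource-hungry, the following minimal sketch computes a standard bilinear (second-order) feature as the average outer product of the per-location descriptors, then counts the size of that feature and of a linear classifier on top of it. It is not the Global Weighted Pooling proposed in the dissertation, whose details the abstract does not give. The function names, the toy input, and the sizes of 512 channels and 200 classes are assumptions chosen for illustration only.

```python
def bilinear_pool(features):
    """Average of outer products x x^T over all spatial locations.

    `features` is a list of per-location descriptors, each a list of C floats.
    Returns the flattened C*C second-order feature.
    """
    c = len(features[0])
    pooled = [0.0] * (c * c)
    for x in features:
        for i in range(c):
            for j in range(c):
                pooled[i * c + j] += x[i] * x[j]
    n = len(features)
    return [v / n for v in pooled]


def bilinear_classifier_params(channels, num_classes):
    """Weight count of a linear layer on the C*C feature (bias ignored)."""
    return channels * channels * num_classes


# Toy input (hypothetical): 2 spatial locations, 3 channels each.
toy = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
feature = bilinear_pool(toy)
feature_dim = len(feature)  # grows as C squared

# Illustrative sizes (assumptions, not taken from the dissertation).
dim_512 = 512 * 512
params_512_200 = bilinear_classifier_params(512, 200)

print(feature_dim, dim_512, params_512_200)
```

Because the feature dimension grows quadratically with the channel count, even a single linear classifier on such a feature can hold tens of millions of weights, which is the kind of cost that compact bilinear features aim to avoid.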
Keywords/Search Tags: deep learning, convolutional neural networks, fine-grained image recognition, compact network design