A Study On Fine-grained Image Recognition With Deep Learning Methods

Posted on: 2020-07-27
Degree: Master
Type: Thesis
Country: China
Candidate: P Q Zhuang
Full Text: PDF
GTID: 2428330596964238
Subject: Control engineering
Abstract/Summary:
Fine-grained image recognition is a challenging problem because of its inherent characteristics, namely small inter-class variance and large intra-class variance, which demand more informative features for the recognition task. Although most existing methods focus on finding discriminative regions in the image, image information alone may be insufficient for building a high-capacity recognition system. To address this problem, we aim to mimic the human cognitive process, leveraging textual information as visual guidance to localize the most distinctive parts of an image. We therefore propose two novel approaches in this thesis to demonstrate the effectiveness of textual descriptions in the recognition task. First, we introduce pairwise text descriptions, which mainly describe the visual differences between a pair of images, and design a multi-modal fish network (MMFN) for distinguishing highly confusable species. More specifically, we use the textual descriptions as visual attention guidance to discover the most discriminative regions in the image. With the aid of these texts, CNNs can extract features from these regions, which then contribute to the final prediction. Second, we further propose adding individual text descriptions to increase the representation power of images. In addition, we design a multi-task training mechanism that combines the Image Classification and Image Caption tasks, and use the text-generation constraints to improve the quality of the image features in a top-down way. As a result, the model can not only generate words that accurately depict the details of the image content, but also enhance the capacity of the classifier. Finally, we conduct extensive experiments that validate our designs on the recognition task with multi-modal data. The results illustrate that textual data can supply information that is scarce in visual data and consequently boost performance on the fine-grained recognition task.
Keywords/Search Tags: Multi-modal data, Fine-grained, Image recognition, Deep learning
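
The abstract describes using textual descriptions as attention guidance over image regions. The following is a minimal sketch of that general idea, not the MMFN architecture itself: it assumes each image region and the text description have already been encoded as vectors of the same dimension, and it uses scaled dot-product attention as a stand-in for whatever scoring function the thesis actually uses. The names `text_guided_attention`, `region_feats`, and `text_emb` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def text_guided_attention(region_feats, text_emb):
    """Weight image regions by their similarity to a text embedding.

    region_feats: list of region feature vectors (assumed precomputed, e.g. by a CNN).
    text_emb: embedding of the text description (assumed same dimension).
    Returns the attention weights and the attention-pooled feature vector.
    """
    d = len(text_emb)
    # Scaled dot-product score between each region and the text (an assumed scoring choice).
    scores = [
        sum(r[i] * text_emb[i] for i in range(d)) / math.sqrt(d)
        for r in region_feats
    ]
    weights = softmax(scores)
    # Weighted sum of region features: regions matching the text dominate.
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(d)
    ]
    return weights, attended


# Toy example: three regions, with the text embedding aligned to the first one.
regions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
text = [2.0, 0.0]
weights, attended = text_guided_attention(regions, text)
```

In this toy input the first region receives the largest weight because it is most similar to the text embedding, so the pooled feature is pulled toward it. In a real system the pooled feature would be passed to a classifier.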