
Dual-modality Modulation Model For Fine-grained Image Recognition

Posted on: 2022-10-17
Degree: Master
Type: Thesis
Country: China
Candidate: Y C Jiang
GTID: 2518306524976459
Subject: Signal and Information Processing
Abstract/Summary:
Image recognition is one of the most fundamental research directions in computer vision. With the rapid development of deep learning and the wide application of convolutional neural networks, methods for general image recognition tasks are becoming increasingly mature. As a result, more researchers are turning to its subfields, among which fine-grained image recognition is one of the most active. General neural network models can no longer meet the requirements of fine-grained image recognition tasks. Meanwhile, the field is still developing, and its research topics are moving closer to practical, real-world problems. This thesis proposes a new dual-modality modulation algorithm for fine-grained image recognition in the fashion domain, which uses both the image and the attribute information of the data. The main work of the thesis is as follows:

1. We review the research background and significance of fine-grained image recognition, as well as the state of research in China and abroad. We also introduce in detail the principles and applications of the related theoretical foundations, including convolutional neural networks, graph convolutional networks, attention mechanisms, and multi-modality learning.

2. We study a new type of fine-grained image recognition problem. Unlike the traditional fine-grained image recognition problem, the dataset studied here is multi-label. This makes the problem considerably harder than single-label classification and closer to real-life scenarios. Few such datasets are publicly available, so there are few relevant studies on this kind of problem; this thesis carries out exploratory research on it.

3. We explore different attribute feature extraction methods. Two methods are designed, one based on a graph convolutional network and one based on DeepWalk, and their advantages and disadvantages are compared in detail using experimental results. We also explore methods for multi-modal feature interaction: following the idea of cross-modal information fusion, we combine information from two different feature spaces, the attribute feature space and the image feature space. Building on these two points, we propose the dual-modality modulation model for fine-grained image recognition.

4. We conduct ablation experiments on an open-source dataset, which demonstrate the effectiveness of both branches of the dual-modality modulation model. We also compare our architecture with mainstream methods and analyze the experimental results. The experiments show that the proposed algorithm achieves better performance.
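
The abstract does not spell out the model's equations, so the following is only a generic illustration of how attribute features from a graph convolution can interact with image features in a multi-label setting. It is a minimal NumPy sketch, not the thesis's dual-modality modulation model: the attribute graph, the dimensions, the random inputs, and the dot-product scoring are all assumptions chosen for illustration.

```python
import numpy as np

def normalize_adjacency(adj):
    """Symmetrically normalize (A + I), as in a standard GCN layer."""
    a_hat = adj + np.eye(adj.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(a_norm, h, w):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    return np.maximum(a_norm @ h @ w, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def multilabel_bce(logits, targets):
    """Mean binary cross-entropy, one independent sigmoid per label."""
    probs = sigmoid(logits)
    eps = 1e-12
    return -np.mean(targets * np.log(probs + eps)
                    + (1.0 - targets) * np.log(1.0 - probs + eps))

# Hypothetical sizes and random data, for illustration only.
rng = np.random.default_rng(0)
num_attrs, attr_dim, img_dim, batch = 5, 8, 16, 4

# Assumed undirected attribute graph (e.g. attribute co-occurrence).
adj = (rng.random((num_attrs, num_attrs)) > 0.5).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0.0)

attr_init = rng.normal(size=(num_attrs, attr_dim))    # initial attribute features
w = rng.normal(size=(attr_dim, img_dim)) * 0.1        # layer weights
image_feats = rng.normal(size=(batch, img_dim))       # stand-in for CNN features
targets = rng.integers(0, 2, size=(batch, num_attrs)).astype(float)

# Attribute branch: map attribute nodes into the image feature space.
attr_emb = gcn_layer(normalize_adjacency(adj), attr_init, w)

# Interaction: score each image against each attribute embedding.
logits = image_feats @ attr_emb.T                     # shape (batch, num_attrs)
loss = multilabel_bce(logits, targets)
```

Each label gets its own sigmoid rather than sharing a softmax, which is what lets several attributes be active for the same image. A DeepWalk-based variant would replace the graph-convolution step with embeddings learned from random walks over the same graph.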
Keywords/Search Tags: Deep Learning, Fine-grained Image Recognition, Graph Neural Network, Attention Mechanism, Multi-modality