
Attention-Enriched Spatial Semantic Mechanism For Specification Recognition Research

Posted on: 2022-02-14
Degree: Master
Type: Thesis
Country: China
Candidate: J Du
Full Text: PDF
GTID: 2518306554458494
Subject: Computer software and theory
Abstract/Summary:
Intelligent goods recognition is a practical and important task for supermarket retail. Across application scenarios, accurately locating and identifying goods matters to the customer, the shop, and the manufacturer. In real scenes, however, models are often not robust enough to meet user demands when faced with thousands of commodities. The main difficulties are the complexity of the scene, the display location of the goods, the shooting angle and lighting, and the similarity between commodities. This thesis proposes algorithms for distinguishing highly similar goods, a task referred to as the specification recognition problem.

Working within deep learning and computer vision, we propose approaches that enrich high-level semantic features. To extract crucial spatial features, we propose the MHL module, an information fusion model in which high-level semantic features supervise low-level position features through multi-head attention. High-level features are task-relevant, but they lack diverse expression of channel and spatial information. We therefore propose a frequency-domain spatial attention model (FSA) that builds on the channel information modeling of SENet and FCA. Combining high-level feature maps with low-level feature maps rich in position information further improves the expression of key regional features and helps with classification among highly similar classes. Both modules are designed to mine key regional features, and both are shown to improve the performance of the recognition model on the specification dataset and the CUB dataset.

The research work of this thesis is as follows.

The MHL module is proposed to fuse context features, based on the feature hierarchy of the convolutional neural network and the multi-head attention mechanism. Semantic information in the high-level feature map benefits classification and recognition, while position information is abundant in the low-level, larger feature map. The main idea in the specification task is to dig out key local features, so the MHL module adopts multi-head attention to integrate high-level and low-level information. In this module, the semantic feature serves as the keys of the attention module, and the position feature serves as the query and the values. Fusing the results of the multiple branches enriches the position feature representation.

The FSA module is proposed to address the insufficient expression of position features from backbone networks. Frequency spatial attention is a variant of frequency channel attention (FCA). Its main idea is to fill the feature gaps left by global average pooling, the preprocessing operation used when attention mechanisms are applied in the visual domain. The FCA module enriches the channel feature, while the FSA module enriches the spatial feature. A 1D-DCT recovers the frequency components lost in the preprocessing stage. Position features are then enriched by multiple frequency branches, in the manner of CBAM spatial attention. The validity of the module is verified on the specification dataset and the CUB dataset.

In conclusion, this thesis focuses on the characteristics of convolutional neural networks. The MHL module enriches position features with the help of semantic features, and the main idea of the FSA module is the diversity of frequency components.
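The frequency-domain idea behind FSA can be illustrated with a small sketch. The abstract does not give the layer details, so the following rests on stated assumptions: the 1D-DCT is taken along the channel axis at each spatial position (the axis that CBAM spatial attention pools over), the frequency indices `(0, 1, 2)` are arbitrary, and a plain mean over the frequency branches stands in for the learned convolution that CBAM applies. The function names are hypothetical and not taken from the thesis. The sketch shows that the frequency-0 branch is channel-average pooling up to a constant factor, while higher frequencies carry information that averaging discards.

```python
import math


def dct_basis(freq, length):
    """Unnormalized 1D DCT-II basis vector for one frequency index."""
    return [math.cos(math.pi * freq * (i + 0.5) / length) for i in range(length)]


def channel_dct_pool(feature, freq):
    """Reduce a C x H x W feature map to H x W with a 1D DCT over channels."""
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    basis = dct_basis(freq, channels)
    return [
        [sum(basis[c] * feature[c][y][x] for c in range(channels)) for x in range(width)]
        for y in range(height)
    ]


def frequency_spatial_attention(feature, freqs=(0, 1, 2)):
    """Hypothetical FSA-style gate: one spatial map per frequency, then a sigmoid."""
    height, width = len(feature[0]), len(feature[0][0])
    maps = [channel_dct_pool(feature, f) for f in freqs]
    # Assumption: a plain mean over frequency branches stands in for the
    # learned convolution that CBAM applies to its pooled maps.
    gate = [
        [1.0 / (1.0 + math.exp(-sum(m[y][x] for m in maps) / len(maps))) for x in range(width)]
        for y in range(height)
    ]
    # Reweight every channel by the same H x W gate.
    out = [
        [[channel[y][x] * gate[y][x] for x in range(width)] for y in range(height)]
        for channel in feature
    ]
    return gate, out


# Toy 4 x 2 x 2 feature map (values chosen only for illustration).
feature = [[[float(c + y + x) for x in range(2)] for y in range(2)] for c in range(4)]
gate, out = frequency_spatial_attention(feature)
print(channel_dct_pool(feature, 0))  # channel sums, i.e. 4 x the channel average
print(gate)
```

With `freqs=(0,)` the gate depends only on the channel average, which is the pooling step the abstract describes as losing frequency information. Adding further indices is one plausible reading of the "multi-frequency branch".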
Keywords/Search Tags:specification recognition, fine-grained classification, attention mechanism, high-level semantic feature, low-level position feature, frequency-domain attention representation, smart retail, key regional features