
Research On The Method Of Commodity Image Retrieval By Fusing Multimodal Data

Posted on: 2022-11-17
Degree: Master
Type: Thesis
Country: China
Candidate: H J Zhang
GTID: 2518306614467604
Subject: Automation Technology
Abstract/Summary:
With the continued growth of the Internet, online shopping has gradually become the main way people buy goods. Shopping websites apply image retrieval so that users facing a huge amount of commodity information can find satisfactory products more easily and more accurately. Traditional commodity image retrieval uses only a single modality of the commodity as the retrieval object. It therefore misses the correlated and complementary information shared between the image and text modalities, as well as the retrieval advantages offered by the commodity itself, and its retrieval performance is limited. Retrieving commodity images by fusing the multimodal data of images and text can better meet users' retrieval needs.

To realize multimodal commodity image retrieval, it is necessary to extract features from each modality, fuse the modalities in the feature space to obtain multimodal commodity features, and apply an efficient indexing method for retrieval in that space. For the two modalities involved in this study, commodity images and commodity text, this thesis uses deep learning techniques to construct a commodity image retrieval model that takes multimodal data as input and produces commodity image retrieval results as output. Image features are extracted by convolutional neural networks and text features are extracted by neural networks, and the robustness of the multimodal retrieval model is further improved by making rational use of label information at training time. The image and text modalities are fused by vector concatenation. Neural networks determine appropriate weights for each commodity image and commodity text, which yields a more effective fusion of the two modalities.

To improve the retrieval performance of the multimodal commodity image retrieval model, this thesis proposes an objective function that combines cross entropy with the triplet loss function. This function ensures the correct spatial distribution of the fused commodity features while pulling fused features of the same commodity category closer together and pushing fused features of different categories further apart. It compensates for the lack of relationships between features in the multimodal retrieval model and improves the model's ability to represent multimodal data.

Comparative experiments on commodity image retrieval with unimodal and multimodal data are conducted on commodity datasets collected from e-commerce websites. Retrieval using commodity images alone and commodity text alone is compared with retrieval using multimodal commodity data, and separate experiments compare different objective functions and different modality fusion methods within the multimodal retrieval model. The results show that commodity image retrieval with multimodal data performs better, and that the objective function proposed in this thesis gives better results. Under the same experimental environment, the average accuracy of retrieval using multimodal data on the commodity test set is 70.68%, which is 5.91% and 6.01% higher than the unimodal retrieval results, and 2.99% and 5.88% higher than the results obtained with cross entropy alone and with the triplet loss function alone, respectively. This verifies the effectiveness of the proposed method.
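The fusion and objective function described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the abstract does not give the exact formulation, so the function names, the scalar fusion weights (which the thesis learns with neural networks but which are passed in as plain numbers here), the Euclidean distance, the margin, and the balancing factor `lam` are all assumptions made for illustration.

```python
import math


def fuse(image_feat, text_feat, w_image, w_text):
    """Weighted concatenation of an image vector and a text vector.

    Assumption: the weights are plain scalars supplied by the caller;
    in the thesis they are determined by neural networks.
    """
    return [w_image * x for x in image_feat] + [w_text * x for x in text_feat]


def cross_entropy(logits, label):
    """Softmax cross entropy for one example (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: the same-category sample should be
    closer to the anchor than the different-category sample by `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def joint_loss(logits, label, anchor, positive, negative, lam=1.0, margin=1.0):
    """Cross entropy plus a weighted triplet term (lam is an assumed factor)."""
    return cross_entropy(logits, label) + lam * triplet_loss(anchor, positive, negative, margin)
```

In this sketch the cross entropy term drives the category separation of the fused features, while the triplet term shapes the distances between them, which is the division of labour the abstract attributes to the combined objective.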
Keywords/Search Tags: E-commerce, Commodity image retrieval, Multimodal deep learning, Similarity measurement