
Research On Image Description Generation For Ethnic Minority Clothing

Posted on: 2024-03-28
Degree: Master
Type: Thesis
Country: China
Candidate: X H Zhang
GTID: 2531307112452494
Subject: Computer application technology
Abstract/Summary:
With the advancement of the "Digital China" initiative, there is growing demand for cross-modal description generation for ethnic clothing images in application scenarios such as digital ethnic clothing museums, ethnic clothing product retrieval, and digital ethnic clothing customization. Such descriptions also play an important role in the digital protection of ethnic costume images and the inheritance of ethnic culture. Driven by applications such as virtual fitting, digital clothing, and online shopping, description generation for ethnic clothing images has attracted the attention of many researchers. However, the task has its own specific difficulties: datasets are lacking, attribute information is complex, and different classes are highly similar. It is therefore necessary to define and learn the key attribute information and output text that represent the traditional culture embodied in ethnic costumes, which existing methods find difficult to achieve. Based on an ethnic clothing image dataset, an attribute vocabulary, and visual feature information, this thesis carries out the following research on ethnic clothing image description generation.

First, a dataset for ethnic costume image description generation is constructed. Existing ethnic clothing image datasets are scarce, and little of the available data can be used directly for description generation, which prevents cross-modal research on ethnic clothing images. This thesis therefore constructs a dataset of 30,000 clothing images covering 55 ethnic minorities. Local attribute information and text descriptions unique to ethnic clothing are defined according to the different regions of each ethnic group's clothing, and the images are preprocessed to obtain an image set with a single background color and prominent clothing regions. This addresses the problem that existing ethnic clothing images and text are difficult to apply to image description generation.

Second, a local attribute learning method for ethnic clothing that combines regional features and text embeddings is proposed. The local attribute information in ethnic clothing images is complex: salient clothing regions are recognized inaccurately and local attributes are hard to learn, so the generated descriptions fail to reflect the characteristics of ethnic clothing. To address this, the method first extracts features from the salient regions to obtain visual feature vectors. Combining these with the encodings of ethnic clothing attribute words and the word embedding vectors of the text, it applies multi-instance learning to obtain local attribute learning results, including key local attributes of the input image such as ethnic category, clothing color, style, and shape. The local attribute information is then combined with the visual features to produce attribute-based visual features for ethnic clothing. Finally, an attention perception module built on these attribute-based visual features and the word embeddings of the text participates in decoding, identifying the salient regions of ethnic clothing images and learning the key local attribute information.

Third, an attention mechanism for ethnic clothing based on a two-layer LSTM model is proposed. Existing methods have difficulty determining the category of ethnic clothing images and associating and matching image and text information, which results in low accuracy of the generated descriptions and makes it hard to meet application requirements. First, features are extracted from the salient regions to obtain visual feature vectors; combined with ethnic clothing attribute word encodings and text word embedding vectors, multi-instance learning is used to generate local attribute learning results. Next, the local attribute information is combined with the visual features to obtain attribute-based visual features for ethnic clothing. Finally, these attribute-based visual features and the word embeddings of the text are input into the attention perception module for decoding, yielding the visual attention and context results.

Fourth, a clothing image description generation method combining local attribute learning and attention perception is proposed. To address the complex attribute information of ethnic clothing images and the low correlation between semantic attributes and visual information caused by similar categories, an ethnic clothing image description generation network with two core modules, local attribute learning and attention perception, is proposed. The local attribute learning module consists of visual feature extraction, attribute word encoding, and text word embedding; multi-instance learning is performed on the local attribute word vectors and the visual feature vectors of salient regions to obtain the local attribute learning results. The attention perception module consists of semantic attention, visual attention, and gating attention; semantic attention and visual attention based on local attribute information decode the image-text encoding of ethnic clothing. Finally, a gating unit evaluates the relevance of the decoded semantic and visual information, and a normalized output produces ethnic clothing image descriptions with accurate syntax and rich attribute information.
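The abstract states that multi-instance learning over salient-region features and attribute word vectors yields the local attribute results, but it does not give the formulation. A minimal sketch, assuming the noisy-OR model commonly used for weakly supervised word detection in captioning: each salient region is an instance, the whole image is the bag, and an attribute word is predicted if at least one region supports it, so P(w | image) = 1 - prod over regions r of (1 - sigmoid(v_r · e_w + b)). The function names, the fixed bias b, and the toy vectors are hypothetical and not taken from the thesis.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def noisy_or_attributes(
    region_features: List[Vector],
    attribute_embeddings: Dict[str, Vector],
    bias: float = -3.0,  # hypothetical fixed bias; would normally be learned
) -> Dict[str, float]:
    """Image-level (bag) attribute probabilities from region-level evidence.

    P(word | image) = 1 - prod_r (1 - sigmoid(region_r . emb_word + bias))
    """
    probs: Dict[str, float] = {}
    for word, emb in attribute_embeddings.items():
        p_none = 1.0  # probability that no region expresses this attribute
        for feat in region_features:
            score = sum(f * e for f, e in zip(feat, emb)) + bias
            p_none *= 1.0 - sigmoid(score)
        probs[word] = 1.0 - p_none
    return probs


def bag_bce_loss(
    probs: Dict[str, float], labels: Dict[str, int], eps: float = 1e-7
) -> float:
    """Mean binary cross-entropy against image-level attribute labels only."""
    total = 0.0
    for word, p in probs.items():
        y = labels.get(word, 0)
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)


# Toy example: two salient regions, two attribute words (values are made up).
regions = [[2.0, 0.0], [0.0, 0.1]]
attributes = {"red": [3.0, 0.0], "embroidered": [0.0, 3.0]}
probs = noisy_or_attributes(regions, attributes)
loss = bag_bce_loss(probs, {"red": 1, "embroidered": 0})
print(probs, loss)
```

Because the loss needs only image-level attribute labels, no per-region annotation is required. The negative bias matters under noisy-OR: without it, a region with a zero score would contribute a probability of 0.5 and inflate every attribute.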
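The third contribution builds the attention mechanism on a two-layer LSTM, but the abstract does not say how the two layers are connected. The sketch below assumes the arrangement widely used in attention-based captioning (a "top-down" decoder): the first LSTM reads the second layer's previous hidden state, the mean region feature, and the current word embedding; its hidden state queries dot-product attention over the salient-region features; and the second LSTM turns the attended context into the state used for word prediction. The weights are random placeholders, and the class and function names, the sizes, and the requirement that hidden size equal region feature size are assumptions of this sketch, not details from the thesis.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
State = Tuple[Vector, Vector]  # (hidden h, cell c)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LSTMCell:
    """Plain LSTM cell with gates i (input), f (forget), o (output), g (candidate)."""

    def __init__(self, input_size: int, hidden_size: int, rng: random.Random):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        self.w = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.b = {gate: [0.0] * hidden_size for gate in "ifog"}

    def zero_state(self) -> State:
        return [0.0] * self.hidden_size, [0.0] * self.hidden_size

    def step(self, x: Vector, state: State) -> State:
        h, c = state
        z = x + h  # concatenate the input and the previous hidden state
        pre = {
            gate: [sum(w * v for w, v in zip(row, z)) + b
                   for row, b in zip(self.w[gate], self.b[gate])]
            for gate in "ifog"
        }
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
        return h_new, c_new


def decoder_step(word_emb, regions, state1, state2, attn_lstm, lang_lstm):
    """One step of a two-layer (attention LSTM + language LSTM) decoder."""
    dim = len(regions[0])
    mean_region = [sum(r[k] for r in regions) / len(regions) for k in range(dim)]
    # Layer 1 sees layer 2's previous output, the global image feature,
    # and the current word embedding.
    state1 = attn_lstm.step(state2[0] + mean_region + word_emb, state1)
    h1 = state1[0]
    # Dot-product attention over salient regions, queried by h1
    # (assumes hidden_size == region feature size).
    weights = softmax([sum(a * b for a, b in zip(h1, r)) for r in regions])
    context = [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]
    # Layer 2 (language LSTM) consumes the attended context and h1.
    state2 = lang_lstm.step(context + h1, state2)
    return weights, state1, state2


rng = random.Random(0)
dim, emb_size = 4, 3
attn_lstm = LSTMCell(dim + dim + emb_size, dim, rng)
lang_lstm = LSTMCell(dim + dim, dim, rng)
regions = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]
word = [0.1, -0.2, 0.3]
weights, s1, s2 = decoder_step(
    word, regions, attn_lstm.zero_state(), lang_lstm.zero_state(), attn_lstm, lang_lstm
)
print([round(w, 3) for w in weights])
```

Splitting the work this way lets the first layer focus on where to look and the second on what to say; the returned attention weights are the per-region visual attention for the current word.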
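For the fourth contribution, the abstract lists semantic attention, visual attention, and a gating unit that evaluates semantic-visual relevance before a normalized output, without giving the equations. A minimal sketch, assuming dot-product attention queried by the decoder hidden state h, a scalar gate g = sigmoid(w_g · [h; c_v; c_s] + b_g) that mixes the visual context c_v and the semantic (attribute) context c_s as g·c_v + (1 - g)·c_s, and a softmax over a toy vocabulary as the normalized output. All names, dimensions, and the choice of a scalar gate are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, keys: List[Vector]) -> Tuple[Vector, Vector]:
    """Dot-product attention: returns (weights, weighted sum of keys)."""
    weights = softmax([sum(q * k for q, k in zip(query, key)) for key in keys])
    context = [sum(w * key[i] for w, key in zip(weights, keys))
               for i in range(len(keys[0]))]
    return weights, context


def gated_semantic_visual(
    hidden: Vector,
    region_feats: List[Vector],
    attribute_embs: List[Vector],
    gate_w: Vector,
    gate_b: float,
) -> Tuple[float, Vector]:
    """Fuse visual and semantic contexts with a scalar relevance gate.

    g = sigmoid(gate_w . [hidden; c_visual; c_semantic] + gate_b)
    fused = g * c_visual + (1 - g) * c_semantic
    """
    _, c_visual = attend(hidden, region_feats)      # visual attention
    _, c_semantic = attend(hidden, attribute_embs)  # semantic attention
    gate_in = hidden + c_visual + c_semantic        # list concatenation
    g = sigmoid(sum(w * x for w, x in zip(gate_w, gate_in)) + gate_b)
    fused = [g * v + (1.0 - g) * s for v, s in zip(c_visual, c_semantic)]
    return g, fused


def word_distribution(fused: Vector, vocab_w: List[Vector]) -> Vector:
    """Normalized (softmax) distribution over a toy vocabulary."""
    return softmax([sum(w * f for w, f in zip(row, fused)) for row in vocab_w])


hidden = [0.5, -0.1, 0.2]
regions = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
attrs = [[0.0, 0.0, 1.0], [0.3, 0.3, 0.3]]
g, fused = gated_semantic_visual(hidden, regions, attrs, [0.0] * 9, 0.0)
probs = word_distribution(fused, [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(g, fused, probs)
```

A per-dimension gate (one sigmoid per feature) is a common alternative; the scalar version is used here only to keep the example short. With zero gate weights the two contexts are averaged, while a large positive bias makes the output rely almost entirely on visual attention.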
Keywords/Search Tags: Minority clothing image, Image caption generation, Text embedding, Local attribute learning, Attention-Aware