Semantic Attribute Prediction Based On Deep Learning

Posted on: 2021-05-01  Degree: Master  Type: Thesis
Country: China  Candidate: M L Shen
GTID: 2518306308968399  Subject: digital media technology
Abstract/Summary:
Image attribute prediction is one of the basic tasks in computer vision. Most early studies relied on hand-crafted features, but the semantic gap between visual images and attributes led to unsatisfactory results. With the rapid development of deep learning, predicting image attributes from deep features extracted by deep networks has become a research hotspot in academia. This thesis focuses on semantic multi-attribute prediction and proposes three deep neural networks to improve the accuracy of image multi-attribute prediction. The main work and innovations of this thesis are as follows:

(1) A bi-directional LSTM network based on the attention mechanism is proposed to improve the accuracy of image semantic attribute prediction. On top of the classic deep convolutional neural network VGG-16, we add an attention module based on the residual mechanism to suppress interference from irrelevant backgrounds, and use a bi-directional LSTM to learn from the extracted features so as to mine the correlations between semantic attributes and image features. Experiments on DeepFashion, currently the largest open-source clothing image database, show that the proposed network improves the predictive recall rates on four attribute groups (Texture, Fabric, Part, and Style) by 1% to 6%, with the largest gain on the Texture group.

(2) An ABLSTM network based on a multi-task mechanism is proposed, which further improves the prediction performance of image semantic attributes. Building on the previous network, the ABLSTM network adds a landmark prediction sub-network, uses a redesigned multi-loss function, and further improves performance on high-level semantic attributes through joint optimization of the two sub-networks. Experiments on the DeepFashion-C dataset show that the ABLSTM network's predictive recall rates on four attribute groups (Texture, Fabric, Shape, and Part) increase by about 2% to 5% compared with the previous network, with the largest improvement on the Shape group. Since the ABLSTM network does not significantly improve performance on the Style group, a regression model is proposed, which raises the predictive recall rate of Style attributes by about 6%.

(3) An Advanced ABLSTM network is proposed. It builds on the ABLSTM network, replaces its landmark prediction network with a new one based on a simulated hourglass network, and uses a newly designed multi-loss function. A spatial transformation module is introduced into the new landmark prediction network to learn transformations of image features autonomously, and a "down-sampling, up-sampling" simulated hourglass network is designed to learn and predict landmarks. The new multi-loss function updates the losses of the two sub-networks to better train and optimize the entire network. Experiments on DeepFashion-C show that the new landmark prediction network outperforms the other networks it is compared against, and that the predictive recall rates of the Advanced ABLSTM network are 4% to 11% higher than those of the ABLSTM network on the five attribute groups, especially on Shape and Part attributes, which are closely related to landmarks. In addition, its predictive recall rate on Style attributes also exceeds that of the regression model for high-level semantic attribute prediction.
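The abstract states that networks (2) and (3) are trained by jointly optimizing an attribute branch and a landmark branch with a redesigned multi-loss function, but it does not give the function's form. The following is a minimal, framework-free sketch under stated assumptions: the attribute branch is scored with multi-label binary cross-entropy on logits, the landmark branch with a mean squared error that ignores invisible landmarks, and the two are combined as a weighted sum. The names `joint_loss`, `landmark_weight`, and the visibility mask are hypothetical and are not taken from the thesis.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def attribute_bce(logits: Sequence[float], labels: Sequence[int]) -> float:
    """Mean binary cross-entropy over independent attribute logits (multi-label)."""
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))]
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def landmark_mse(pred: Sequence[Point], target: Sequence[Point],
                 visible: Sequence[bool]) -> float:
    """Mean squared coordinate error, averaged over visible landmarks only."""
    sq_err, count = 0.0, 0
    for (px, py), (tx, ty), v in zip(pred, target, visible):
        if v:  # assumption: invisible landmarks contribute nothing to the loss
            sq_err += (px - tx) ** 2 + (py - ty) ** 2
            count += 1
    return sq_err / count if count else 0.0


def joint_loss(attr_logits, attr_labels, lm_pred, lm_target, lm_visible,
               landmark_weight: float = 1.0) -> float:
    """Hypothetical multi-task objective: attribute loss + weighted landmark loss."""
    return (attribute_bce(attr_logits, attr_labels)
            + landmark_weight * landmark_mse(lm_pred, lm_target, lm_visible))


# Usage: one image, three attributes, two landmarks (the second is not visible)
loss = joint_loss([2.0, -1.0, 0.0], [1, 0, 1],
                  [(0.5, 0.5), (0.9, 0.1)], [(0.4, 0.5), (0.0, 0.0)], [True, False],
                  landmark_weight=0.5)
print(round(loss, 4))
```

A weighted sum is only one common way to combine the two losses; the thesis's redesigned multi-loss function may weight, schedule, or normalize the terms differently. A practical implementation would compute the same quantities over mini-batches of tensors in a deep learning framework rather than over Python lists.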
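Each result in the abstract is reported as a predictive recall rate per attribute group, but the exact protocol is not stated. This sketch assumes top-k recall, the metric commonly reported on the DeepFashion attribute benchmark: for each image, attributes are ranked by predicted score, and the metric is the fraction of ground-truth attributes that fall within the top k. The functions `top_k_recall` and `group_columns`, the value of k, and the toy scores are illustrative assumptions.

```python
from typing import Sequence


def top_k_recall(scores: Sequence[Sequence[float]],
                 labels: Sequence[Sequence[int]], k: int) -> float:
    """Share of ground-truth positive attributes found in each image's top-k predictions.

    scores[i][j]: predicted score of attribute j for image i
    labels[i][j]: 1 if image i truly has attribute j, else 0
    """
    hits, positives = 0, 0
    for img_scores, img_labels in zip(scores, labels):
        # Attribute indices sorted from highest to lowest predicted score
        ranked = sorted(range(len(img_scores)), key=lambda j: img_scores[j], reverse=True)
        hits += sum(img_labels[j] for j in ranked[:k])
        positives += sum(img_labels)
    return hits / positives if positives else 0.0


def group_columns(rows, columns):
    """Keep only the columns of one attribute group (hypothetical group layout)."""
    return [[row[j] for j in columns] for row in rows]


scores = [[0.9, 0.1, 0.8, 0.3],
          [0.2, 0.7, 0.1, 0.6]]
labels = [[1, 0, 0, 1],
          [0, 1, 0, 0]]
print(top_k_recall(scores, labels, k=2))  # 2 of 3 positives recovered -> 0.6666666666666666
```

Ranking within one group's columns, as `group_columns` allows, is an assumption. Some evaluations rank over all attributes and then tally recall per group, which can give different numbers for the same predictions.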
Keywords/Search Tags: deep learning, attribute prediction, landmark prediction, multi-task mechanism, attention mechanism