
Research On Joint Perception Of Aesthetics And Emotion In Images Based On Deep Multi-task Learning

Posted on: 2021-02-04  Degree: Master  Type: Thesis
Country: China  Candidate: J Yu  Full Text: PDF
GTID: 2428330602483771  Subject: Computer Science and Technology
Abstract/Summary:
In recent years, with the application of deep learning technology, research on image processing and analysis has achieved great success. In image classification and object recognition applications such as license plate recognition, face recognition, and object detection, the progress can be described as a breakthrough. However, high-level abstract understanding of images still faces severe challenges. Two representative problems worthy of research are assessing an image's aesthetic quality (high or low) and recognizing its emotional information (happiness, anger, fear, sadness, etc.). Both problems have been studied and some progress has been made, but until now aesthetic quality assessment and emotional information recognition have been treated as separate tasks.

Intuitively, the emotional information expressed by an image may be related to its aesthetic quality. For example, images that make people feel happy are more likely to be considered highly aesthetic, while images that are frightening or anger-inducing are relatively unlikely to be considered beautiful. Objectively, neuro-aesthetics researchers have shown that aesthetic judgment and emotional cognition activate the same response area of the brain, suggesting a connection between the two. A natural idea thus emerges: the two tasks of aesthetic quality assessment and emotional information recognition are interrelated and share certain features, so modeling the two problems jointly may improve the performance of both tasks simultaneously.

In this thesis, a multi-task learning method is used to solve aesthetic quality assessment and emotional information recognition simultaneously, modeling both problems in a single deep convolutional neural network framework. Specifically, we propose a new model called the Aesthetics-Emotion hybrid Network (AENet). Like traditional deep-learning
networks, AENet is trained end-to-end: given any input image, the model directly outputs both the aesthetic quality assessment and the emotional information recognition results. In this model, task-specific and task-shared features are extracted by a three-way network branch, then fused and separated through a feature fusion layer. After integration by multi-scale regional average pooling, the model uses features at different levels to improve the accuracy of image aesthetic quality assessment and emotional information recognition.

In addition, because datasets of images labeled with both aesthetics and emotion are scarce, and to ensure that the experiments could be conducted properly and achieve the expected results, we also built the first large-scale image dataset with both aesthetic and emotional tags (Images with tags of Aesthetics and Emotion, IAE), collected from Flickr and Instagram. Crowdsourcing was used to label the data, and reliable experts were then selected to score it, giving the dataset three advantages: a reasonable data distribution, high label confidence, and strong category balance.

Finally, to verify the performance of the proposed AENet model, systematic experiments were conducted on the constructed large-scale IAE dataset. (1) Comparison with existing methods shows that the proposed model increases the accuracy of image aesthetic quality assessment and emotional information recognition by 3% and 4%, respectively, effectively verifying the idea that jointly modeling the two problems can improve the performance of both tasks simultaneously. The comparative experiments consist of two parts: single-task baselines and multi-task baselines. Comparison with the single-task models proves
the rationality and effectiveness of the multi-task idea, while comparison with the multi-task models proves the novelty and superiority of the AENet model. (2) An ablation study was conducted by removing the feature fusion units and the multi-scale regional pooling units from AENet and comparing against the performance of the original AENet. This verifies the contribution of the main components to the performance of the network, suggests that parts of the network structure can be reused separately by other researchers, and enhances the reliability of the experimental results. (3) The well-trained AENet model was used for cross-dataset performance tests on the aesthetic datasets AVA and CUHK-PQ and the emotion datasets FI (weakly labeled) and ArtPhoto. The results show that AENet has good generalization ability, and they also confirm the confidence and reliability of the labels attached to the IAE dataset constructed in this thesis.
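The joint architecture described above can be illustrated with a minimal sketch. This is not the thesis's actual AENet implementation; it is a hypothetical PyTorch toy model assuming a shared branch plus two task-specific branches whose features are fused by concatenation, with one classification head per task and a summed cross-entropy loss. All layer sizes and names (`AENetSketch`, `feat_dim`, etc.) are illustrative assumptions; the real model's multi-scale regional average pooling is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AENetSketch(nn.Module):
    """Toy three-branch multi-task network (illustrative only):
    a task-shared branch and two task-specific branches; each task
    head sees its own features fused (concatenated) with the shared ones."""

    def __init__(self, feat_dim=64, n_aesthetic=2, n_emotion=8):
        super().__init__()

        def branch():
            # Small conv stack ending in global average pooling.
            return nn.Sequential(
                nn.Conv2d(3, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(feat_dim, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten())

        self.shared = branch()       # task-shared features
        self.aes_branch = branch()   # aesthetics-specific features
        self.emo_branch = branch()   # emotion-specific features
        # Fusion by concatenation: shared + task-specific -> task head.
        self.aes_head = nn.Linear(2 * feat_dim, n_aesthetic)
        self.emo_head = nn.Linear(2 * feat_dim, n_emotion)

    def forward(self, x):
        s = self.shared(x)
        a = torch.cat([s, self.aes_branch(x)], dim=1)
        e = torch.cat([s, self.emo_branch(x)], dim=1)
        return self.aes_head(a), self.emo_head(e)


model = AENetSketch()
imgs = torch.randn(4, 3, 64, 64)                  # dummy batch of images
aes_logits, emo_logits = model(imgs)

# Joint training objective: sum of the two per-task cross-entropy losses,
# so one backward pass updates shared and task-specific parameters together.
aes_labels = torch.randint(0, 2, (4,))
emo_labels = torch.randint(0, 8, (4,))
loss = F.cross_entropy(aes_logits, aes_labels) + \
       F.cross_entropy(emo_logits, emo_labels)
```

Because the shared branch feeds both heads, gradients from both tasks flow through it, which is the mechanism by which joint modeling lets the two tasks inform each other.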
Keywords/Search Tags:multi-task learning, deep convolutional neural networks, feature fusion, aesthetic quality assessment in images, emotional information perception in images