| Images carry a great deal of semantic information. How to mine this information and explore the relationship between an image and its semantics has become a hot research topic, with both theoretical significance and practical application value. Many factors affect the emotion that an image conveys, and many researchers have approached the problem through low-level visual features such as color, texture, and shape. However, because emotion is subjective and complex, traditional low-level visual features often yield low accuracy when applied to large datasets. With the development of deep convolutional neural networks (CNNs), more researchers have begun to build deeper and more efficient network structures for image semantic recognition tasks and have made major breakthroughs. Compared with traditional methods, deep learning methods greatly improve recognition accuracy, which demonstrates their superiority and reliability. Building on deep learning methods, this paper first expands the small datasets used in the study through data augmentation to meet the sample-size requirement of network training, and then fuses four types of features to generate a feature vector. The feature vector is input into a network classifier to recognize the high-level semantics of the image, and a high-level semantic descriptive phrase is finally generated. On this basis, the emotion recognition classifier is improved: the classifier is retrained by constructing a stacked sparse autoencoder network, which further improves both emotion recognition accuracy and network training. The main work of this paper includes: (1) Based on the publicly available sentiment datasets IAPS, GAPED, ArtPhoto, and Abstract, data augmentation is used to expand the dataset, and a “Part_expansion” data expansion method is proposed. This method not only brings the amount of data up to what network training requires, but also keeps the number of samples in each category balanced, reducing the poor classification that results from large differences in category sizes.
(2) A method for recognizing the high-level semantics of images based on multi-feature fusion is proposed. First, in the feature extraction stage, the color and texture features of the image are extracted, and deep networks extract the object-category feature and the deep emotion feature; the four features are then fused to generate a feature vector. The feature vector is input into the three connected network to recognize the high-level semantic information of the image, and semantic descriptive phrases covering the image's emotion and objects are finally generated. (3) A stacked sparse autoencoder network is proposed to improve the training of the emotion recognition network. Traditional random initialization can easily cause the network to fall into a poor local extremum or fail to converge. To alleviate this problem, the network is first trained layer by layer and then fine-tuned as a whole. In this way, the initial weights are placed in a better position, so the network converges faster and reaches a better local extremum. Experiments show that, compared with random initialization, this method achieves higher recognition accuracy on the problem studied in this paper and improves network training. |
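
The abstract does not say how “Part_expansion” works internally, so the following is only a minimal sketch of the stated goal: expanding a dataset so that every category ends up with the same number of samples. It assumes that smaller categories are topped up with augmented copies until they match the largest one. The names `expansion_plan`, `expand_balanced`, and the `augment` callback are hypothetical and not taken from the paper.

```python
from collections import Counter


def expansion_plan(labels, target=None):
    """Return, per class, how many augmented samples are needed.

    Assumption: every class is topped up to `target` samples; by default
    the target is the size of the largest class.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    return {cls: max(target - n, 0) for cls, n in counts.items()}


def expand_balanced(samples, labels, augment, target=None):
    """Append augmented samples so that all classes reach the same size.

    `augment(sample, i)` is a caller-supplied (hypothetical) function that
    returns the i-th augmented variant of a sample, e.g. a flip or a crop.
    """
    plan = expansion_plan(labels, target)

    # Group the original samples by class so they can be reused as sources.
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)

    out_samples, out_labels = list(samples), list(labels)
    for cls, needed in plan.items():
        pool = by_class[cls]
        for i in range(needed):
            # Cycle through the originals of this class as augmentation sources.
            out_samples.append(augment(pool[i % len(pool)], i))
            out_labels.append(cls)
    return out_samples, out_labels
```

Under this assumption, a class that already has the target number of samples receives no augmented copies, while under-represented classes receive proportionally more. A real pipeline would replace `augment` with actual image transformations.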