| With the continuous development of digital media, practical applications place increasingly high demands on image quality. Digital images may be distorted during capture, compression, transmission, storage, and other processing stages; the distortions differ in type and severity, degrade image quality, and directly affect the user experience. Perception-based image quality assessment (IQA) models help humans monitor and improve image quality, and they play an important role in image processing algorithms and systems such as image acquisition, image compression, and image enhancement. In recent years, researchers have proposed many IQA methods that achieve high performance on existing databases. At present, most of these studies focus on images whose relative visual quality can be easily determined by humans; they belong to the field of coarse-grained image quality assessment (CG-IQA). However, in many practical applications the quality differences between images are subtle, and the performance of CG-IQA methods in such real-world settings is still unsatisfactory, which prevents their wide adoption. This thesis addresses this problem through an in-depth study of fine-grained image quality assessment (FG-IQA). The details of the research are as follows. (1) Bilinear CNNs for blind quality assessment of fine-grained images. This model, named BFG, consists of three modules: a feature extraction module, a squeeze-and-excitation module, and a bilinear pooling module. First, the feature extraction module is built from a sequence of convolutional layers; it captures content sensitive to fine-grained differences, extracts quality-aware features, and preserves as much quality-aware information as possible. Second, the squeeze-and-excitation module processes the features in three steps: 
compression (squeeze), excitation, and weighting. By modeling the dependencies between channels, it effectively improves the representation capability of the features. Third, the bilinear pooling module aggregates feature information across all locations in the image, which further enhances the discriminability of the features and allows the model to better distinguish fine-grained differences between images. (2) A blind quality assessment model for fine-grained images that fuses local and global information. This model, named BFG-LG, is an improved version of the BFG model with four modules: a feature extraction module, a fine-grained feature extraction module, a self-attention boosting module, and a bilinear pooling module. The feature extraction module and the bilinear pooling module largely follow the design of the BFG model. The fine-grained feature extraction module uses depthwise separable convolution, which separates channel information from spatial information, so that the output features carry a large amount of local fine-grained information. The self-attention boosting module is designed as a special multi-head self-attention mechanism: it extracts high-frequency image information using depthwise separable convolution and fuses global information using the Talking Head operation, effectively improving the representation capability of the features. In addition, drawing on the idea of residual networks, the fine-grained feature extraction module and the self-attention boosting module are designed as two special residual blocks to further improve model performance. Comparative experiments are conducted on the FG-IQA database. Comparisons with twelve full-reference IQA methods and six blind IQA methods show that the BFG model can effectively discriminate the quality of fine-grained images, and the 
optimized BFG-LG model achieves the best performance on all metrics. In addition, the consistency and correlation between the predictions of the BFG-LG model and the subjective scores are significantly improved, with an average increase of 1.85% in the correlation coefficient, narrowing the gap between objective quality scores and practical applications. The ablation studies validate the effectiveness of the main modules of both models: the squeeze-and-excitation module and the bilinear pooling module in the BFG model, and the fine-grained feature extraction module and the self-attention boosting module in the BFG-LG model. |
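
The squeeze-and-excitation and bilinear pooling steps described for the BFG model can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the thesis implementation: the function names, the bias-free two-layer gating weights `w1` and `w2`, the toy feature values, and the signed-square-root plus L2 normalisation after pooling are all hypothetical choices made for the example.

```python
import math


def squeeze_excite(feats, w1, w2):
    """Reweight channels of a feature map (hypothetical helper).

    feats: C channels, each a list of N spatial responses.
    w1:    (C/r) x C weights of the bottleneck layer (assumed, no bias).
    w2:    C x (C/r) weights of the gating layer (assumed, no bias).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in feats]
    # Excitation: bottleneck with ReLU, then sigmoid gates in (0, 1).
    hidden = [max(0.0, sum(w * v for w, v in zip(row, z))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Weighting: rescale every channel by its gate.
    return [[g * v for v in ch] for g, ch in zip(gates, feats)]


def bilinear_pool(feats):
    """Average the outer product of channel responses over all locations."""
    c, n = len(feats), len(feats[0])
    pooled = [
        sum(feats[i][k] * feats[j][k] for k in range(n)) / n
        for i in range(c)
        for j in range(c)
    ]
    # Signed square root and L2 normalisation: an assumed post-processing
    # step that is common with bilinear features, not taken from the thesis.
    pooled = [math.copysign(math.sqrt(abs(v)), v) for v in pooled]
    norm = math.sqrt(sum(v * v for v in pooled)) or 1.0
    return [v / norm for v in pooled]


# Toy input: 2 channels, 4 spatial locations (made-up values).
feats = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
w1 = [[0.5, -0.5]]      # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]    # expands back to 2 channel gates

weighted = squeeze_excite(feats, w1, w2)
descriptor = bilinear_pool(weighted)
```

Here `descriptor` has C × C entries, one per channel pair. Because every spatial location contributes to each entry, the pooled vector summarises pairwise channel interactions over the whole image, which is the property the abstract relies on for separating subtle quality differences.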