| With the rise of image processing and machine vision technology, a large number of online image editors have emerged, and ordinary users can easily edit images and upload them to social networks. However, digital image inpainting can be used not only to repair damaged images but also to erase specific characters, tamper with image semantics, and so on. If inpainted images propagate through the Internet, they can cause serious harm to individuals and society. Therefore, forensic research on inpainted images is of great significance to maintaining social stability and has become a hotspot in blind image forensics. At present, image inpainting detection is still in its infancy. Many researchers design detection algorithms only for specific inpainting methods; for unknown inpainting methods, these algorithms suffer from poor detection accuracy and a lack of generalization. To address these issues, this paper presents two deep inpainting detection approaches based on convolutional neural networks, which fully exploit the latent features of the inpainted area within an image and improve the ability to detect unknown inpainting algorithms. The main contributions of this paper can be summarized as follows. (1) This paper proposes a pixel-level deep inpainting detection approach that uses an encoder-decoder architecture based on a convolutional neural network and a vision Transformer, which extracts low-level features while capturing long-range dependencies inside the inpainted image. The feature enhancement module uses three high-pass filters to extract noise residuals, effectively enhancing the inpainting traces. The decoder combines residual features from different levels to supervise the upsampling of the feature map output by the encoder, guiding the model to accurately locate the inpainted area. (2) To improve detection of small-area or irregularly shaped inpainted regions, an image inpainting detection algorithm based
on multi-level gate units is proposed. Each gate unit consists of standard convolution, global average pooling, and element-wise multiplication operations, prompting the model to weigh the contribution of each encoder output and highlight useful features. The algorithm cascades the encoder outputs with the multi-level gate units to provide discriminative features for the decoder while minimizing useless information. In addition, this paper improves the model's feature learning on difficult samples by adding, during training, inpainted samples that are visually harder to detect. The proposed models are trained on a dataset synthesized by a single inpainting method, and their performance is tested on datasets synthesized by both traditional and deep-learning-based inpainting methods and compared with multiple existing methods. Extensive experiments on ten challenging datasets demonstrate that the two proposed models perform favorably against other competitors under different evaluation metrics. |
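
The gate unit in contribution (2) is described only by its three operations (convolution, global average pooling, element-wise multiplication). The sketch below shows one plausible way to combine them; it is not the paper's implementation. It assumes a 1x1 (pointwise) convolution, a sigmoid that squashes each pooled value into a gate in (0, 1), and that the gate rescales the original encoder feature channel by channel. The names `conv1x1`, `global_average_pool`, and `gate_unit` are hypothetical, and the weights are supplied by hand rather than learned.

```python
import math


def conv1x1(feature, weights):
    """Pointwise convolution: mix channels independently at every pixel.

    feature: nested list shaped [C_in][H][W]
    weights: nested list shaped [C_out][C_in] (assumed, not from the paper)
    """
    c_in = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    return [
        [
            [sum(w_row[c] * feature[c][y][x] for c in range(c_in))
             for x in range(width)]
            for y in range(height)
        ]
        for w_row in weights
    ]


def global_average_pool(feature):
    """Collapse a [C][H][W] feature map to one mean value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature
    ]


def gate_unit(feature, weights):
    """Rescale each channel of an encoder feature by a learned-style gate.

    Assumed pipeline: conv -> global average pooling -> sigmoid -> multiply.
    The sigmoid and the choice to gate the *input* feature are assumptions.
    """
    if len(weights) != len(feature):
        raise ValueError("this sketch needs as many output as input channels")
    projected = conv1x1(feature, weights)
    pooled = global_average_pool(projected)
    gates = [1.0 / (1.0 + math.exp(-p)) for p in pooled]
    # Element-wise multiplication: every pixel of channel c is scaled by gates[c].
    return [
        [[gate * value for value in row] for row in channel]
        for gate, channel in zip(gates, feature)
    ]


# Toy usage: two 2x2 channels and identity mixing weights.
feature = [[[2.0, 2.0], [2.0, 2.0]], [[-2.0, -2.0], [-2.0, -2.0]]]
identity = [[1.0, 0.0], [0.0, 1.0]]
gated = gate_unit(feature, identity)
```

In this toy case the first channel pools to a positive value and keeps most of its magnitude, while the second pools to a negative value and is suppressed, which is the "weigh the contribution of each encoder output" behaviour the abstract describes. A real model would learn the convolution weights and would typically use a deep learning framework rather than nested lists.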