Moving object detection is an essential research topic in the field of computer vision, with broad application prospects in human-computer interaction, video surveillance, military reconnaissance, medical image processing, and other areas. As an image preprocessing task, moving object detection aims to extract moving objects from input video frames. Its accuracy directly affects subsequent high-level tasks such as object tracking, object recognition, and behavior analysis. However, complex scenes encountered in practical applications, such as illumination changes, dynamic backgrounds, bad weather, shadows, and camera jitter, pose great challenges to traditional moving object detection methods. In recent years, the continuous development of deep learning has provided more comprehensive solutions for moving object detection in complex scenes. In particular, convolutional neural networks have proved their reliability in many complex fields through their powerful feature extraction ability. On this basis, this thesis conducts an in-depth study of moving object detection methods in complex scenes, inspired by existing deep learning frameworks. The main works of this thesis are as follows:

(1) To address the insufficient focus on objects in traditional convolutional neural networks, a symmetric pyramid attention mechanism-based method is proposed for moving object detection. On the one hand, a symmetric pyramid attention module is designed to obtain pivotal target information and establish correlations among sub-regions, while the connections between different levels of knowledge are strengthened by skip connections. On the other hand, a Dilated Convolution Block (DCB) is constructed to acquire multi-scale features, which provides sufficient semantic information and geometric detail for the network. In this way, more attention is given to the target, and contextual information is tightly linked to obtain more valuable cues, which helps the network acquire the foreground accurately. Experiments on
publicly available datasets demonstrate that the proposed method effectively improves the accuracy of moving object detection.

(2) To solve the problems of incomplete objects and blurred object boundaries in detection results, a dual-branch enhanced network is proposed for moving object detection, which can simultaneously extract sufficient spatial features and contextual information. First, to extract advanced contextual information, a Recurrent Gated Bottleneck Module (RGBM) is designed, and a Global Attention Module (GAM) is constructed as an auxiliary branch to obtain fine-resolution details. Then, a Gated Residual Dense Module (GRDM) is proposed to enhance the feature representation by reconstructing the fused information. Meanwhile, a weighted loss function is designed to optimize the network. Finally, experimental results on mainstream moving object detection datasets show that the proposed dual-branch structure and the RGBM, GAM, and GRDM modules are reasonably designed, and that the network performs well across several evaluation metrics.

(3) To address the degradation of detection accuracy caused by the lack of spatio-temporal difference information, an Interactive Spatio-temporal Feature Learning Network (ISFLN) is proposed. First, deep and shallow spatio-temporal information is obtained from two paths at multiple levels and scales; the deep features help improve feature recognition, while the shallow features are devoted to fine boundary segmentation. Then, an interactive multi-scale feature extraction module is designed to facilitate information transmission between different types of features. Next, a multi-level feature enhancement module, which provides precise object knowledge for the decoder, is proposed to guide the encoded information of each layer with the fused spatio-temporal difference characteristics. Finally, experimental results demonstrate that the proposed ISFLN achieves favorable and competitive performance.

(4) To address the problem of poor
network adaptability caused by invalid reference frames and excessive reliance on scene diversity in moving object detection, a Motion-Appearance-Aware Network (MAAN) is proposed to learn robust feature representations. Specifically, a module for mining motion information at multiple time scales, which can adaptively adjust its information elements, is designed to refine the motion features, and salient object features are obtained by an appearance feature extraction module. Subsequently, to enhance semantic consistency and reduce redundant connections, a multi-view feature evolution module is constructed, which effectively fuses motion and appearance information through global communication and local guidance, respectively. Moreover, two strategies are proposed to obtain uniform and consistent change objects during information propagation: one feeds the predicted mask of the previous frame into the decoder to provide prior information for the network, and the other matches different levels of motion cues at multiple time scales to the decoder. Experimental results on several moving object detection datasets show that the proposed MAAN achieves good performance and improves the network's adaptability.
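The Dilated Convolution Block in contribution (1) builds on the standard dilated (atrous) convolution: applying the same small kernel with increasing dilation rates enlarges the receptive field without adding parameters, which is what lets the block gather multi-scale context. The following is a minimal, stdlib-only 1D sketch of that mechanism; the function names and the toy kernel are illustrative assumptions, not the thesis's actual DCB implementation.

```python
# Toy 1D dilated convolution: with a fixed 3-tap kernel, dilation rates
# 1, 2, 4 cover receptive fields of 3, 5, 9 inputs respectively, which is
# the multi-scale effect a DCB-style block exploits. (Names are hypothetical.)

def dilated_conv1d(signal, kernel, dilation):
    """Valid-mode 1D convolution with the given dilation rate."""
    k = len(kernel)
    span = (k - 1) * dilation + 1  # receptive field of one output sample
    return [
        sum(kernel[j] * signal[i + j * dilation] for j in range(k))
        for i in range(len(signal) - span + 1)
    ]

def receptive_field(kernel_size, dilation):
    """Inputs covered by one output of a dilated convolution."""
    return (kernel_size - 1) * dilation + 1

if __name__ == "__main__":
    x = list(range(16))          # toy 1D "feature map"
    kern = [1.0, 1.0, 1.0]       # same kernel shared across branches
    for d in (1, 2, 4):          # parallel branches with growing dilation
        y = dilated_conv1d(x, kern, d)
        print(f"dilation={d}  rf={receptive_field(3, d)}  out[:3]={y[:3]}")
```

In a DCB-style design, the outputs of such parallel branches would then be fused (e.g., concatenated), so that fine geometric detail from small dilations and broad semantic context from large dilations are available to the network at once.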