The event camera is a new type of sensor that captures per-pixel brightness changes and outputs an asynchronous event stream. Compared with traditional cameras, event cameras offer advantages such as low latency, high dynamic range, and high temporal resolution. However, because the event stream is discontinuous and high-speed, it is more challenging to process and analyze than conventional video. The development and application of event stream reconstruction technology has therefore become a necessary foundation for event camera technology. This work focuses on the research and application of high-quality asynchronous event stream reconstruction algorithms; the main contributions and improvements are as follows.

First, traditional event stream reconstruction algorithms rely on prior assumptions, which can produce unrealistic blur and ghosting, while most reconstruction algorithms based on deep convolutional neural networks require event stream data paired with ground-truth images to train the network parameters. To address these issues, this paper proposes a self-supervised event stream reconstruction algorithm based on brightness constancy. By combining the reconstruction network with an optical flow estimation network, the algorithm computes the brightness-increment difference between the reconstructed image and the input event stream to drive network learning, assisted by a temporal consistency loss and a total variation loss. In addition, a multi-scale denoising module and a sub-pixel decoding module are designed to improve reconstruction quality. Experimental results on public datasets and real captured datasets show that the proposed algorithm effectively improves self-supervised event stream reconstruction.

Second, to address the shortcomings of event stream reconstruction in representing complex spatio-temporal content and modeling large-scale correlations in video scenes, this paper designs a deep feature extraction module based on a cross-scale
attention Transformer, which improves the learning of long-range, cross-scale dependencies during reconstruction. This paper also designs a convolutional weighted feature fusion method and, on top of it, constructs a more discriminative multi-level dense connection scheme to improve information utilization and strengthen the network's ability to learn both high-level and low-level features. The proposed cross-scale attention Transformer-based deep feature extraction module and dense connection scheme are integrated into the self-supervised reconstruction architecture and verified experimentally on public datasets and real captured datasets. The experiments show that both components effectively improve reconstruction quality.

Finally, this paper uses the reconstructed intensity frames as an intermediate representation of the event stream to extend its applicability to downstream computer vision tasks. Four representative single-object tracking algorithms are selected and run on event stream reconstruction data, and the algorithms are evaluated comprehensively in terms of tracking performance, tracking speed, and memory usage. With the best-performing algorithm, target tracking is then demonstrated and compared on traditional video data and event stream reconstruction data in high-speed motion and dim scenes.
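As a minimal illustration of the brightness-constancy idea in the first contribution, the NumPy sketch below combines a photometric term, which compares the log-intensity change between two reconstructed frames against the brightness increment accumulated from the input events, with a total variation regularizer. The function name, signature, and exact loss formulation are assumptions for illustration, not the thesis's implementation, and the optical-flow warping and temporal consistency terms are omitted.

```python
import numpy as np

def brightness_constancy_loss(prev_log, next_log, event_increment, tv_weight=0.1):
    """Hypothetical self-supervised loss sketch: the change in log intensity
    between consecutive reconstructed frames should match the brightness
    increment accumulated from the event stream over the same interval."""
    # photometric term: penalize disagreement with the event-measured change
    photo = np.mean((next_log - prev_log - event_increment) ** 2)
    # total-variation term: encourage piecewise-smooth reconstructions
    tv = (np.mean(np.abs(np.diff(next_log, axis=0))) +
          np.mean(np.abs(np.diff(next_log, axis=1))))
    return photo + tv_weight * tv
```

When the reconstructed frame change exactly matches the event increment and the frame is smooth, the loss is zero; any mismatch contributes quadratically through the photometric term.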
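The sub-pixel decoding module mentioned in the first contribution is commonly realized as a depth-to-space (pixel-shuffle) rearrangement that trades channel depth for spatial resolution. Below is a NumPy sketch of that standard operation; the function name and channel-first layout are illustrative choices, not details taken from the thesis.

```python
import numpy as np

def pixel_shuffle(x, r):
    """Depth-to-space: rearrange a (C*r*r, H, W) tensor into (C, H*r, W*r),
    so each group of r*r channels fills an r-by-r spatial block."""
    crr, h, w = x.shape
    c = crr // (r * r)
    x = x.reshape(c, r, r, h, w)          # split channels into (c, i, j)
    x = x.transpose(0, 3, 1, 4, 2)        # interleave: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)
```

Here `out[c, h*r + i, w*r + j]` comes from input channel `c*r*r + i*r + j` at position `(h, w)`, which is the usual sub-pixel convolution layout.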
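For the convolutional weighted feature fusion in the second contribution, one plausible reading is a weighted sum of same-shape feature maps under normalized non-negative weights, in the spirit of fast normalized fusion. The sketch below reflects that assumption only; in the actual method the weights would be learned parameters and the fused result would pass through a convolution, both omitted here.

```python
import numpy as np

def weighted_fusion(features, weights, eps=1e-4):
    """Illustrative fusion sketch: clamp weights to be non-negative,
    normalize them to (approximately) sum to one, and take a weighted
    sum of same-shape feature maps."""
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    w = w / (w.sum() + eps)               # eps avoids division by zero
    return sum(wi * f for wi, f in zip(w, features))
```

Because the weights are normalized, the fused map stays on the same scale as its inputs regardless of how many feature levels are combined, which keeps multi-level dense connections numerically stable.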