
Research On The Encoding Of Vehicles In Multisource Surveillance Videos

Posted on: 2020-02-15    Degree: Doctor    Type: Dissertation
Country: China    Candidate: Y Chen    Full Text: PDF
GTID: 1362330620452205    Subject: Communication and Information System
Abstract/Summary:
With the continued advance of the "Safe China" (Ping'an China) initiative, the stored surveillance video in typical large Chinese cities has reached the petabyte scale, and the total volume of surveillance video data is growing far faster than video coding efficiency is improving. To reduce storage costs, efficient surveillance video coding methods are urgently needed. Cameras in urban surveillance networks mainly capture the three elements of public security: people, vehicles, and objects. Compared with the static background, compression of dynamic foreground objects, and of vehicle objects in particular, offers greater potential and poses greater challenges.

A moving object is recorded repeatedly by different cameras across the urban space, and the resulting collection of captured videos is called multi-source surveillance video. The data generated by the same moving object at different times and places is highly similar, and the resulting redundancy is called moving-object redundancy. As objects move through time and space their number grows sharply, and massive numbers of moving objects generate large amounts of moving-object redundancy, which becomes the dominant component of redundancy in multi-source surveillance video. Removing it is the key to efficient compression of multi-source surveillance video. Moving-object redundancy exists both within a single video segment and across the different videos that contain the object, so it has both global and local characteristics. Existing single-source and multi-source video coding methods lack organic integration and struggle to remove it effectively. Based on the idea of fusion coding, the moving object can be predicted from both global and local aspects, and a synthesized reference frame can be generated and used as a reference when encoding the object. This offers a new opportunity for removing moving-object redundancy, but it also faces the following challenges.

For global redundancy removal, existing methods usually adopt feature-based prediction, using feature matching to relate objects across videos and determine the prediction structure. This requires substantial shooting overlap. In multi-source surveillance networks deployed with sparse, almost non-overlapping coverage, the same object appears with different poses in different videos, and matching features are often too few or entirely absent, so feature-based prediction models fail. For local redundancy removal, existing methods generally use motion compensation on the two-dimensional image plane, searching adjacent frames for the region most similar to the current object region and using it as the prediction. In the real world, however, objects move in three-dimensional space, and modeling this motion with only two-dimensional translation is imprecise; under complex motion such as three-dimensional rotation and scaling, the optimal matching region is hard to find and prediction accuracy drops sharply. For reference frame fusion, existing image fusion methods are generally oriented toward image enhancement: they decompose images into frequency bands by a transform and fuse content band by band according to certain criteria, aiming for better subjective visual quality. For video coding, by contrast, we want the fused reference frame to be as similar as possible to the frame to be encoded, so as to reduce the prediction residual and improve coding efficiency. Existing image fusion methods impose no similarity constraint between the fused image and the target image, so the two differ substantially, and using the fused image as a reference generates large prediction residuals.

To address these difficulties, this dissertation studies efficient coding of multi-source surveillance video and makes the following contributions.

(1) A global object prediction method based on knowledge representation. Across non-overlapping videos, an object's appearance and pose change, its pixel distribution differs, and matching features are scarce, making it hard to establish a prediction relationship. This dissertation studies hierarchical knowledge extraction for objects, proposes representing objects with high-level knowledge that has strong spatio-temporal consistency, and builds a global prediction model based on knowledge representation. Shared knowledge is used to associate objects across videos and mine cross-video object similarity, improving the model's global redundancy removal. Experiments show that the proposed method still performs well when object pose changes drastically and scene appearance differs: on a simulated surveillance video dataset, prediction error is reduced by 8.18% and 16.34% relative to the feature-based prediction method and HEVC inter-frame prediction, respectively.

(2) A local object prediction method based on 3D transformation. The two-dimensional translation model cannot express the three-dimensional rotation and scaling of objects under complex conditions, so prediction accuracy drops sharply. This dissertation proposes a local prediction model based on three-dimensional transformation, using a perspective transformation model to describe the real motion of the object and enhance the model's local redundancy removal. With the object's 3D model, estimating the perspective transformation parameters reduces to two perspective projections with known parameters, which simplifies parameter solving and makes it practical to apply complex higher-order motion models in prediction. Experiments show that the proposed local prediction model predicts well in most scenarios and effectively removes local redundancy: on a simulated surveillance video dataset, prediction error is reduced by 11.11% and 21.17% relative to an affine-transformation-based prediction method and HEVC inter-frame prediction, respectively.

(3) A content-adaptive reference frame fusion method. Fusion methods oriented toward visual quality lack similarity constraints, so the fused image differs from the frame to be encoded and is hard to use as a reference. In the proposed fusion process, the absolute error between the fused image and the frame to be encoded is used as a constraint to minimize the difference between the two, making the fused reference frame suitable for coding tasks. In addition, this dissertation analyzes the characteristics of global and local reference frames and constructs a content prior constraint to guide the fusion process and further improve fusion accuracy. Experiments show that the proposed method outperforms the comparison methods both subjectively and objectively: compared with existing image fusion networks, the proposed reference frame fusion network produces higher-quality images whose color is closer to the target image while retaining sharper edge contours and other structural information.

(4) A multi-source surveillance video coding method based on fusion reference. Building on the above, this dissertation constructs a fusion-reference-based multi-source surveillance video codec framework that removes moving-object redundancy and achieves efficient coding of multi-source surveillance video. It also proposes a rate-distortion estimation method based on motion reasoning and a reference frame queue management method based on rate-distortion cost to optimize overall coding performance. Experiments show that the proposed framework is more efficient than the comparison methods in simulated environments and in both simple and complex real scenes, effectively removing moving-object redundancy.

In summary, this dissertation focuses on removing moving-object redundancy in multi-source surveillance video, proposes new global and local prediction models, and, combining the advantages of both, builds a fusion-reference-based multi-source surveillance video coding framework, providing theoretical and technical support for studying the generation mechanism and removal of compound redundancy.
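For context, the conventional local prediction that the dissertation improves upon is 2D block-based motion compensation: search a reference frame for the translated block most similar to the current block. A minimal full-search sketch (illustrative only; the function name, SAD criterion, and search range are assumptions, not the dissertation's implementation):

```python
import numpy as np

def block_match(ref, cur, top, left, bsize=8, search=4):
    """Full-search block matching: find the (dy, dx) translation into `ref`
    that minimises the sum of absolute differences (SAD) against the
    block of `cur` at (top, left)."""
    block = cur[top:top + bsize, left:left + bsize].astype(np.int32)
    best, best_sad = (0, 0), np.inf
    h, w = ref.shape
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > h or x + bsize > w:
                continue  # candidate block falls outside the frame
            cand = ref[y:y + bsize, x:x + bsize].astype(np.int32)
            sad = int(np.abs(cand - block).sum())
            if sad < best_sad:
                best_sad, best = sad, (dy, dx)
    return best, best_sad
```

For purely translational motion this recovers the shift exactly; as the abstract notes, it degrades under 3D rotation and scaling, which no single (dy, dx) can express.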
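The perspective (projective) transformation underlying the proposed local prediction model can be sketched as an inverse warp that maps each output pixel through a 3x3 homography back into the reference image. This is a generic illustration with nearest-neighbour sampling; the dissertation's actual parameter estimation via two perspective projections of the object's 3D model is not shown:

```python
import numpy as np

def warp_perspective(img, H, out_shape):
    """Inverse-warp `img` with homography `H` (maps output coords to
    source coords); nearest-neighbour sampling, out-of-range pixels = 0."""
    hout, wout = out_shape
    ys, xs = np.mgrid[0:hout, 0:wout]
    # Homogeneous output coordinates, one column per pixel (3 x N)
    pts = np.stack([xs, ys, np.ones_like(xs)], axis=-1).reshape(-1, 3).T
    src = H @ pts
    sx = np.round(src[0] / src[2]).astype(int)
    sy = np.round(src[1] / src[2]).astype(int)
    h, w = img.shape
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(out_shape, dtype=img.dtype)
    out_flat = out.reshape(-1)
    out_flat[valid] = img[sy[valid], sx[valid]]
    return out
```

Unlike pure translation, the eight free parameters of `H` can encode rotation, scaling, and foreshortening of a planar object surface, which is what lets a perspective model track 3D object motion more closely.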
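The similarity-constrained fusion idea, choosing the fused reference to minimise its absolute error against the frame to be encoded, can be illustrated with a toy blend of the global and local reference frames. The dissertation uses a learned fusion network with a content prior constraint; the single-scalar-weight grid search below is purely a stand-in for that network:

```python
import numpy as np

def fuse_references(ref_g, ref_l, target, steps=101):
    """Toy similarity-constrained fusion: blend global and local reference
    frames with one scalar weight chosen to minimise the mean absolute
    error (MAE) against the frame to be encoded."""
    g = ref_g.astype(np.float64)
    l = ref_l.astype(np.float64)
    t = target.astype(np.float64)
    best_a, best_mae = 0.0, np.inf
    for a in np.linspace(0.0, 1.0, steps):
        mae = np.abs(a * g + (1.0 - a) * l - t).mean()
        if mae < best_mae:
            best_mae, best_a = mae, a
    return best_a * g + (1.0 - best_a) * l, best_a, best_mae
```

The point of the constraint is visible even in this toy form: the objective is closeness to the frame being coded (small prediction residual), not subjective visual quality as in enhancement-oriented fusion.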
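The rate-distortion-cost-based reference queue management mentioned in contribution (4) can be sketched as a bounded queue that, when full, evicts the reference frame with the highest RD cost J = D + lambda * R. The tuple layout and eviction policy here are assumptions for illustration, not the dissertation's algorithm:

```python
def manage_ref_queue(queue, candidate, max_size=4):
    """Insert `candidate` into the reference-frame queue, then evict the
    entry with the highest rate-distortion cost if the queue overflows.
    Entries are (frame_id, rd_cost) pairs."""
    queue = queue + [candidate]
    if len(queue) > max_size:
        worst = max(queue, key=lambda entry: entry[1])  # highest J
        queue.remove(worst)
    return queue
```

Keeping only the lowest-cost references bounds decoder memory while retaining the frames most useful for prediction.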
Keywords/Search Tags:multi-source surveillance video, moving object redundancy, reference frame fusion, video coding