Object Counting In Surveillance Video

Posted on: 2019-04-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Liu
GTID: 1318330542494139
Subject: Control Science and Engineering
Abstract/Summary:
With the rapid development of the social economy, public safety has become an issue of national concern, and video surveillance technology has emerged and become widespread. Intelligent video surveillance systems are now deployed at traffic intersections, railway stations, airports, shopping malls, plazas, and other public places. As an important topic in intelligent video surveillance, object counting has many real-world applications. For instance, accurately counting the pedestrians and vehicles in surveillance videos could greatly benefit public security and traffic management. In addition, estimating customer flow in shopping malls could guide the adjustment of business hours and staffing. Object counting is a classic visual recognition task that aims to estimate the number of specific objects within an image. According to the source of the data, existing object counting approaches fall into two categories: counting in video frames and counting in a single image. This work addresses both, proposing specific counting methods after analysing the shortcomings of existing methods and the characteristics of the actual application scenarios. The main contributions of this work are as follows:

1) For counting in video frames, we mainly consider vehicle counting in intelligent transportation systems. Video analysis algorithms can be implemented in either the pixel domain or the compressed domain. Pixel-domain algorithms first decode the compressed surveillance videos into raw frames and then operate on the frame pixels, so their computational complexity is usually very high. Therefore, in this dissertation, we propose to address vehicle counting in the compressed domain. Specifically, we first develop new low-level features that capture the information crucial for counting vehicles. These features can be easily computed from the provided motion vectors (MVs) and block partition modes, and they cover the size, shape, motion, and texture information of traffic scenes. Combining these features effectively mitigates the lack of information available for counting vehicles in the compressed domain. We then propose a Hierarchical Classification based Regression (HCR) model for counting vehicles in a video frame. HCR divides traffic scenes into multiple cases according to vehicle density and then adopts the most suitable regression model for each case. Finally, we extensively evaluate the proposed method on real highway surveillance videos. The results consistently show that the proposed method is very competitive with pixel-domain methods, reaching similar performance at much lower computational cost.

2) Video analysis in the compressed domain relies mainly on the encoding metadata. However, the critical metadata in compressed videos (i.e., motion estimation and compensation vectors) are designed for compression efficiency rather than video analysis. Consequently, the features extracted from video bitstreams are noisy and describe moving vehicles less accurately than the raw frames do, which makes it difficult to model realistic traffic scenes accurately for vehicle counting. To improve robustness, we propose to address vehicle counting in the compressed domain by combining spatial and temporal regression. Specifically, in addition to the spatial regression, we propose a locally temporal regression method that refines the per-frame counting results by exploiting the continuity of the traffic flow. By combining spatial and temporal regression, the proposed method produces robust and accurate vehicle counting results. Experimental results on real highway surveillance videos show the effectiveness of the proposed method.

3) For counting in a single image, real scenes from surveillance video inherently suffer from various kinds of interference, e.g., object deformations and camera perspective distortions, so intrinsic robustness is especially important for object counting. In this dissertation, we treat object counting as a unified learning problem of feature extraction and pixel-wise density estimation, and propose a novel counting model named pyramid object counting network (POCNet). Specifically, we first propose to hierarchically exploit the global information of images when estimating the pixel-wise densities. This idea is implemented by embedding spatial pyramid pooling into the counting model, for the sake of simplicity and robustness to various kinds of interference. We then propose a pyramidal hierarchy counting (PHC) module for density estimation. PHC lets object counting incorporate multi-scale information from the inherent feature hierarchy of deep convolutional networks, so that more accurate density maps can be generated. Finally, we experimentally evaluate the proposed method on several challenging benchmarks, and the results show that the proposed POCNet outperforms recent work and achieves state-of-the-art performance.

In summary, this dissertation studies object counting in surveillance video in depth. According to the different characteristics of the actual application scenarios, we propose two counting frameworks, i.e., counting in the compressed domain and the pyramid object counting network (POCNet). Experimental results on several representative benchmarks show that the proposed methods outperform recent work and achieve state-of-the-art performance in both accuracy and computational efficiency, which demonstrates their potential for application in intelligent video surveillance systems.
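The two-stage idea behind contributions 1) and 2) can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: it assumes a single scalar feature per frame (the abstract describes several features derived from MVs and block partition modes), replaces the learned density classifier with hypothetical fixed thresholds, and uses ordinary least-squares lines both as the per-class regressors and as the local temporal model. All names (`HierarchicalCountRegressor`, `temporal_refine`, `thresholds`, `radius`) are illustrative.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D data."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:  # all x equal: fall back to a constant model
        return 0.0, my
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx


class HierarchicalCountRegressor:
    """Toy stand-in for HCR: choose a density class, then apply that class's regressor."""

    def __init__(self, thresholds):
        # Hypothetical cut points on the feature that separate density classes.
        self.thresholds = sorted(thresholds)
        self.models = {}

    def density_class(self, feature):
        # Class index = number of thresholds the feature reaches.
        return sum(feature >= t for t in self.thresholds)

    def fit(self, features, counts):
        groups = {}
        for f, c in zip(features, counts):
            groups.setdefault(self.density_class(f), []).append((f, c))
        for k, pairs in groups.items():
            xs, ys = zip(*pairs)
            self.models[k] = fit_line(xs, ys)
        return self

    def predict(self, feature):
        # Assumes the density class of `feature` was seen during fit().
        a, b = self.models[self.density_class(feature)]
        return a * feature + b


def temporal_refine(counts, radius=2):
    """Refine per-frame counts by fitting a line over a local time window
    and evaluating it at the centre frame."""
    refined = []
    for t in range(len(counts)):
        lo, hi = max(0, t - radius), min(len(counts), t + radius + 1)
        a, b = fit_line(list(range(lo, hi)), counts[lo:hi])
        refined.append(a * t + b)
    return refined


if __name__ == "__main__":
    model = HierarchicalCountRegressor(thresholds=[5]).fit(
        [1, 2, 3, 10, 11, 12], [1, 2, 3, 20, 22, 24]
    )
    per_frame = [model.predict(f) for f in (2, 2, 11, 2, 2)]
    print(per_frame)
    print(temporal_refine(per_frame, radius=2))
```

In this sketch the sparse and dense classes each get their own line, which mirrors the idea of adopting the most suitable regression model per density case, and the temporal pass pulls an isolated per-frame spike back towards its neighbours, which mirrors the use of traffic-flow continuity.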
Keywords/Search Tags:Intelligent Surveillance System, Object Counting, Compressed Domain, Deep Learning, Convolutional Neural Networks, Spatial Pyramid, Feature Learning