
Research Of Object Detection Algorithm Based On Video Surveillance

Posted on: 2010-08-10
Degree: Master
Type: Thesis
Country: China
Candidate: S Zheng
Full Text: PDF
GTID: 2178360272995797
Subject: Computer application technology
Abstract/Summary:
Computer vision is the technology of using computers to analyze the structure and content of images in order to extract information, giving the computer a capability similar to human vision. Shape analysis is an important part of computer vision research, and its techniques are widely applied in industry, medicine, transportation, the military, and other fields. Since the 1960s, research on shape and image analysis has developed rapidly. Researchers have proposed many shape analysis methods, and some of them have played a role in scientific research; however, many problems remain unsolved. When a three-dimensional object in the real world is projected onto a two-dimensional image plane, information is inevitably lost and geometric invariance becomes a problem. To obtain the shape data needed for high-level shape analysis and understanding, image segmentation should yield regions that are visually meaningful. Image segmentation and shape feature description therefore call for further research on pattern classification algorithms.
With the rapid development of computer and network technology, the information that computers process is no longer limited to text, tables, and graphics; today's computers also process audio and video, which involve large volumes of data and high computational complexity. Dynamic video, as a new media form, has its own characteristics in representation, processing, and management, and it is harder to handle; research on it has become an important topic in the information field.
The international standard MPEG-4, developed by MPEG, introduces the concept of the video object, which organizes a scene into independent and meaningful video objects. The MPEG-7 standard provides a unified, standardized description for each kind of media object. However, neither MPEG-4 nor MPEG-7 specifies concrete algorithms for semantic video segmentation; this is left open for further research. The segmentation of moving objects in video is both a difficulty and a key point in MPEG-4. The value of video segmentation is reflected in two aspects. First, encoding each segmented video object independently improves compression efficiency, so that better video quality can be obtained at low network bandwidth. Second, organizing the structure of video content around segmented video objects enables video access, interaction, search, and indexing.
Object detection is an important yet challenging vision task. It is a critical part of many applications such as image search and scene understanding; however, it is still an open problem due to the complexity of object classes and images.
Current approaches to object detection can be categorized as top-down, bottom-up, or a combination of the two. Top-down approaches often include a training stage to obtain class-specific model features or to define object configurations. Hypotheses are found by matching models to the image features. Bottom-up approaches start from low-level or mid-level image features, i.e. edges or segments. These methods build up hypotheses from such features, extend them by construction rules, and then evaluate them with certain cost functions. The third category, which combines top-down and bottom-up methods, has become prevalent because it takes advantage of both aspects.
Although top-down approaches can quickly drive attention to promising hypotheses, they are prone to producing many false positives when features are locally extracted and matched. Features within the same hypothesis may not be consistent with respect to low-level image segmentation. Bottom-up approaches, on the other hand, try to keep consistency in low-level image segmentation, but usually need much more effort in searching and grouping. Wisely combining the two can avoid exhaustive searching and grouping while maintaining consistency in object hypotheses.
Our detection method falls into this last category of combining top-down recognition and bottom-up segmentation, with two major improvements over existing approaches. First, we design an improved Shape Context (SC) for the top-down recognition. Our improved SC is more robust to small deformations of object shapes and to background clutter. Second, by utilizing bottom-up segmentation, we introduce a novel False Positive Pruning (FPP) method to improve detection precision. Our framework can be generalized to many other object classes because we impose no specific constraints on any object class.
Our method contains three major parts: codebook building, top-down recognition using matching and voting, and hypothesis verification. The object models are learned by building a codebook of local features. We extract the improved SC as local image features and record the geometrical information together with object figure-ground masks. The improved SC is designed to be robust to shape variance and background clutter. For rigid objects and objects with slight articulation, our experiments show that only a few training examples suffice to encode the local shape information of objects.
We generate recognition hypotheses by matching local image SC features to the codebook and use the SC features to vote for object centers. A similar top-down voting scheme is described in [4], which uses SIFT point features for pedestrian detection.
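The matching and voting step can be illustrated with a minimal sketch. It uses the standard log-polar Shape Context histogram, not the improved SC of this thesis, whose details are not given in the abstract. The function names, bin counts, radius limits, chi-square matching cost, and cost threshold are all assumptions chosen for illustration. Points and offsets are in (row, column) order.

```python
import numpy as np

def shape_context(points, index, n_r=5, n_theta=12, r_min=0.125, r_max=2.0):
    """Log-polar histogram of all other points as seen from points[index]."""
    pts = np.asarray(points, dtype=float)
    # Normalizing radii by the mean pairwise distance gives scale invariance.
    pair_d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)
    mean_d = pair_d[np.triu_indices(len(pts), k=1)].mean()
    diffs = np.delete(pts, index, axis=0) - pts[index]
    r = np.hypot(diffs[:, 0], diffs[:, 1]) / mean_d
    theta = np.arctan2(diffs[:, 0], diffs[:, 1]) % (2 * np.pi)
    r_edges = np.logspace(np.log10(r_min), np.log10(r_max), n_r + 1)
    # Points closer than r_min fall into the innermost ring.
    r_bin = np.clip(np.searchsorted(r_edges, r, side="right") - 1, 0, None)
    t_bin = np.floor(theta / (2 * np.pi / n_theta)).astype(int) % n_theta
    keep = r_bin < n_r  # points farther than r_max are ignored
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (r_bin[keep], t_bin[keep]), 1)
    return hist.ravel()

def chi2_cost(a, b):
    """Chi-square distance between two histograms after normalization."""
    a = a / max(a.sum(), 1.0)
    b = b / max(b.sum(), 1.0)
    denom = a + b
    nz = denom > 0
    return 0.5 * np.sum((a[nz] - b[nz]) ** 2 / denom[nz])

def vote_for_centers(descs, positions, cb_descs, cb_offsets, image_shape,
                     max_cost=0.3):
    """Each feature votes for an object center via its best codebook match."""
    acc = np.zeros(image_shape)
    for desc, pos in zip(descs, positions):
        costs = np.array([chi2_cost(desc, c) for c in cb_descs])
        best = int(np.argmin(costs))
        if costs[best] > max_cost:  # assumed threshold: skip poor matches
            continue
        row, col = np.round(np.asarray(pos) + cb_offsets[best]).astype(int)
        if 0 <= row < image_shape[0] and 0 <= col < image_shape[1]:
            acc[row, col] += 1.0 - costs[best]
    return acc
```

In this sketch a codebook entry is a descriptor paired with the offset from the feature to the object center, so peaks in the returned accumulator correspond to candidate object centers.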
The voting result may include many false positives because local SC features cover only a small context. Therefore, we combine top-down recognition with bottom-up segmentation in the verification stage to improve detection precision. We propose a new False Positive Pruning (FPP) approach to prune many of the false hypotheses generated by top-down recognition. The intuition behind this approach is that many false positives are caused by local mismatches. These local features usually lack segmentation consistency, which requires that pixels in the same segment belong to the same object. True positives are often composed of several connected segments, while false positives tend to break large segments into pieces.
Our experiments test different object classes, including pedestrian, bike, human riding bike, umbrella, and car. The pictures were taken from scenes around campus and urban streets. Objects in the images are roughly at the same scale. For pedestrians, heights range from 186 to 390 pixels. Results show that our detection algorithm achieves both high recall and high precision.
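The segmentation-consistency intuition behind FPP can be sketched as follows. This is not the pruning rule of the thesis, which the abstract does not specify. It is an assumed scoring scheme in which a hypothesis mask is trusted to the extent that it is built from segments it mostly covers. The function names and both thresholds are hypothetical.

```python
import numpy as np

def segment_consistency(mask, segments, min_cover=0.5):
    """Fraction of mask pixels lying in segments that the mask mostly covers."""
    mask = np.asarray(mask, dtype=bool)
    segments = np.asarray(segments)
    if not mask.any():
        return 0.0
    consistent = 0
    for label in np.unique(segments[mask]):
        seg = segments == label
        inside = np.count_nonzero(seg & mask)
        # A segment supports the hypothesis only if it is not cut into pieces.
        if inside / np.count_nonzero(seg) >= min_cover:
            consistent += inside
    return consistent / np.count_nonzero(mask)

def prune_false_positives(hypotheses, segments, threshold=0.7):
    """Keep only the hypothesis masks whose consistency reaches the threshold."""
    return [h for h in hypotheses
            if segment_consistency(h, segments) >= threshold]
```

A mask that coincides with whole segments scores close to 1, while a mask that slices thin strips out of several large segments scores close to 0 and is pruned.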
Keywords/Search Tags:computer vision, Shape Context, False Positive Pruning, recall