The Study Of Object Detection Based On Context

Posted on: 2017-01-22
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Li
Full Text: PDF
GTID: 1108330485488393
Subject: Computer application technology
Abstract/Summary:
Object detection is an important yet challenging problem in computer vision. It is widely used in fields such as intelligent surveillance, image search, and human-computer interaction. Several factors make object representation unreliable and can reduce detection accuracy: large and unknown appearance variation within an object category, strong similarity between different categories, and uncontrolled background conditions (e.g., illumination, occlusion, viewpoint). It is therefore desirable to seek additional information sources for improving object detection. Context captures the inherent correlations between objects, and it can serve as an auxiliary information source for resolving ambiguity in object appearance and unreliable inference, leading to more robust detection. Despite extensive research effort, it remains an open problem, mainly because of the inherent difficulty of bridging the large gap between low-level object representation and high-level semantics.

Following this line of research, this thesis explores the further potential of context for improving object detection accuracy. Specifically, it investigates several different issues in object detection in order to fully discover and exploit the available context information, each from a different point of view. In addition, several novel object detection models are formulated to boost detection performance, based on machine learning algorithms such as convolutional neural networks. The main studies and contributions of this thesis are:

(1) For object detection in static images, a generic object detection model based on Hough context is proposed. Specifically, we construct a novel ellipse-shaped object context in a polar coordinate system by treating every image pixel as a potential object center, in contrast to existing methods in which visual features vote for the object center. With this context, we design a novel local context representation through sparse sampling and by exploiting the structural information of multiple features. Moreover, a weighted voting algorithm is presented that can vote for the object location both independently and jointly. Extensive evaluations validate the effectiveness of the proposed approach in enriching object representation and improving detection accuracy.

(2) For detecting co-occurring objects in images, a new local context representation is introduced to encode the semantic relationships between objects, and a multi-layer object detection model is built on top of it. Specifically, we first organize all image data into multiple scenes by unsupervised clustering on global context information. For each scene, we first design consistent object pairs that capture the local spatial position, scale, and angle configuration among objects; we then construct scene-dependent trees from the consistent object pairs to build the multi-layer object detection models. Our comparative experiments demonstrate the efficacy of the proposed method in encoding semantic constraints between objects and improving detection reliability.

(3) For object detection in videos, a novel spatio-temporal context representation based on changes in motion dynamics is introduced and applied to fire detection and people counting. For fire detection, we build a dynamic context representation that encodes the spatio-temporal characteristics of fire motion orientation. With a kernel SVM as the detection model, the experiments show superior fire detection performance with the proposed context representation. For people counting, the proposed context captures the optical flow dynamics of localized block motion and the block size over time. Further, a new tracking algorithm based on this dynamic context is formulated, specially tailored for people counting. The final linear regression model is built on an SVM, and comparative evaluations show that it counts people accurately.

(4) For object detection in particular situations, an adaptive scene-specific context representation approach is formulated by leveraging the abstract representation extracted from a Convolutional Neural Network (CNN). In particular, given a scene and a scale, we propose to extract a corresponding context representation by comparing context feature maps between different groups and preserving the correlated location indices. This allows us to obtain effective context information. By combining it with object appearance information, we build an adaptive context-based deep CNN object detector. Our extensive experiments validate the representation power and complementary effect of this deep-learning-based context for object detection.
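As a point of reference for contribution (1), the following is a minimal sketch of generic weighted Hough voting, in which each local feature casts a weighted vote for an object center. This is the conventional feature-votes-for-center scheme that the abstract contrasts its method against, not the thesis's ellipse-shaped polar context or its actual voting algorithm. The function names, the grid size, and the sample points, offsets, and weights are all hypothetical and chosen only for illustration.

```python
import numpy as np


def weighted_hough_votes(shape, points, offsets, weights):
    """Accumulate weighted votes for an object center on an image grid.

    shape   -- (height, width) of the vote accumulator (assumed grid)
    points  -- (row, col) positions of local features (hypothetical data)
    offsets -- (d_row, d_col) displacement from each feature to the center
    weights -- confidence of each vote
    """
    height, width = shape
    acc = np.zeros((height, width), dtype=float)
    # Each feature votes for the location: feature position + offset.
    centers = np.asarray(points) + np.asarray(offsets)
    for (row, col), w in zip(centers, weights):
        # Ignore votes that fall outside the image.
        if 0 <= row < height and 0 <= col < width:
            acc[row, col] += w
    return acc


def best_center(acc):
    """Return the (row, col) cell with the highest accumulated vote."""
    row, col = np.unravel_index(np.argmax(acc), acc.shape)
    return int(row), int(col)


# Toy data: three features agree on one center, one weak outlier disagrees.
points = [(2, 2), (8, 3), (5, 9), (1, 7)]
offsets = [(3, 3), (-3, 2), (0, -4), (2, 2)]
weights = [0.9, 0.8, 0.7, 0.3]

acc = weighted_hough_votes((10, 10), points, offsets, weights)
center = best_center(acc)
```

In this toy setup the three consistent votes accumulate in one cell and outweigh the single low-weight outlier, which is the basic reason weighting the votes can make the estimated location more robust.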
Keywords/Search Tags: Object detection, Context, Hough voting, Mixture of experts, Convolutional neural network