Learning Simple Local Features For Object Detection

Posted on: 2011-12-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: W Z Zhang
Full Text: PDF
GTID: 1118330332478385
Subject: Computer Science and Technology
Abstract/Summary:
This thesis is concerned with object detection, one of the most challenging problems in Computer Vision: recognizing objects of a given category and localizing them in cluttered real-world images. This capability is a core competency of the human visual system, yet computer vision systems are still far from reaching a comparable level of performance. The main difficulty lies in finding an effective object representation that tolerates intra-class variations in appearance and geometry, remains distinctive with respect to inter-class variations, and is robust to image clutter, illumination changes, partial occlusion and similar disturbances.

The thesis reviews object detection approaches and analyzes the object models and features in common use. Based on this investigation, the research on how to build an accurate and robust object model is carried out on two levels: simple local features and the learning algorithm.

On the level of local features, an appearance-based feature named Scattered Rectangle Feature (SRF) and a shape-based feature called Hough Transformed Line Segment (HTLS) are proposed. SRF is a variant of the Haar-like feature (HLF). It is also template-based, but the rectangles in the template are not required to be adjacent or aligned horizontally or vertically. The rectangles can therefore not only explore more orientation cues but also encode misaligned, detached and overlapped shape information, resulting in a more flexible and distinctive feature. Meanwhile, since SRF uses a rectangle template as HLF does, it can make full use of the integral image and be computed in constant time regardless of its scale or location. Moreover, the thesis proves by construction that any non-degenerate SRF is equivalent to several HLFs constrained by some geometric relationship. The cue of the object part represented by such an SRF is therefore equal to the combined cues of several HLFs, which makes the detector based on SRF more robust.
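The constant-time evaluation mentioned above can be illustrated with a minimal sketch. It assumes that an SRF, like an HLF, is a signed combination of pixel sums over axis-aligned rectangles; the abstract does not give the exact definition, so the function names and the `(weight, top, left, height, width)` rectangle format are hypothetical.

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero row and column prepended."""
    img = np.asarray(img, dtype=np.int64)
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, top, left, height, width):
    """Pixel sum of one rectangle from four table lookups (constant time)."""
    bottom, right = top + height, left + width
    return ii[bottom, right] - ii[top, right] - ii[bottom, left] + ii[top, left]

def scattered_feature(ii, rects):
    """Signed sum over rectangles that may be detached, misaligned or overlapping.

    rects: iterable of (weight, top, left, height, width); this layout is an
    assumption made for illustration, not the thesis's definition.
    """
    return sum(weight * rect_sum(ii, top, left, height, width)
               for weight, top, left, height, width in rects)

# Toy 6x6 image and a template of two detached, misaligned rectangles.
img = np.arange(36, dtype=np.int64).reshape(6, 6)
ii = integral_image(img)
rects = [(+1, 0, 0, 2, 3), (-1, 3, 2, 2, 3)]
value = scattered_feature(ii, rects)
print(value)
```

Each rectangle costs four lookups however large it is, so the cost of a feature depends only on the number of rectangles in its template, not on their size or position.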
Experiments on the MIT and CMU face test set show that the detector based on SRF outperforms the one based on HLF.

HTLS is a simple shape feature, motivated by the fact that a line drawing conveys most of the information. An HTLS is represented not by its end points but by a quadruple: the inclination of the normal, the distance from the origin, the shift of the segment center from the foot of the normal, and the length. The quadruple not only uniquely defines any HTLS but also handles rotation, translation and scaling conveniently. Given a local coordinate system at the object centroid, the quadruple implicitly incorporates the geometric relationship between the HTLS and the centroid, resulting in a compact Implicit Shape Model, a model that has proved effective in object detection tasks. To further enhance the distinctiveness of HTLS, connectivity is used to form HTLS groups. Accordingly, a weighted Euclidean distance in HTLS space is introduced to measure the similarity between HTLS (groups); with carefully chosen weights, it handles partial matches and other noise caused by unreliable edge detection. Using this distance, distinctive HTLS (groups) from the training set are collected into a codebook. Experiments on the motorbike and cow categories show that shape cues are not only important but also sufficient for object detection.

On the level of the learning algorithm, a variant of AdaBoost named 2-threshold AdaBoost is proposed. It shares the same framework as AdaBoost except for the 2-threshold weak hypothesis and the 2-threshold weak learner. The motivation is that selecting better weak hypotheses makes the final strong hypothesis more robust and efficient. The 2-threshold weak hypothesis guarantees a smaller (or, in the degenerate case, equal) classification error, since the extra threshold enables a finer split over the feature values.
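The error guarantee can be checked on a toy example. The sketch below assumes that a 2-threshold weak hypothesis labels an interval `lo <= x < hi` of feature values with one class and everything outside it with the other; the abstract does not spell out the form, so this interval reading and all names are assumptions. Under that reading the ordinary one-threshold stump is the degenerate case `hi = inf`, which is why the best 2-threshold error can never be larger.

```python
import numpy as np

def one_threshold(x, theta, polarity):
    """Classic stump: +polarity when x >= theta, -polarity otherwise."""
    return np.where(x >= theta, polarity, -polarity)

def two_threshold(x, lo, hi, polarity):
    """Interval hypothesis (assumed form): +polarity when lo <= x < hi."""
    return np.where((x >= lo) & (x < hi), polarity, -polarity)

def weighted_error(pred, y, w):
    """Total weight of the misclassified samples."""
    return float(np.sum(w[pred != y]))

def best_errors(x, y, w):
    """Brute-force minimum weighted error of both hypothesis families."""
    cuts = np.concatenate(([-np.inf], np.unique(x), [np.inf]))
    best1 = min(weighted_error(one_threshold(x, t, s), y, w)
                for t in cuts for s in (+1, -1))
    best2 = min(weighted_error(two_threshold(x, lo, hi, s), y, w)
                for lo in cuts for hi in cuts if lo <= hi
                for s in (+1, -1))
    return best1, best2

# Toy data: the positives occupy a band in the middle of the feature range.
x = np.array([0.1, 0.2, 0.4, 0.5, 0.7, 0.9])
y = np.array([-1, -1, +1, +1, -1, -1])
w = np.full(len(x), 1.0 / len(x))
e1, e2 = best_errors(x, y, w)
print(e1, e2)
```

The exhaustive search over threshold pairs is quadratic in the number of distinct feature values and is used here only to make the comparison explicit.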
The weak learner transforms the selection of the optimal values for the two thresholds into a maximum-sum consecutive subsequence problem, which can be solved by Dynamic Programming. Experiments applying the learning algorithm to HLF and SRF demonstrate that, given the same training goal, the variant converges faster, with fewer stages and features in the cascade, and that in general the cascade learned by the variant outperforms the one learned by AdaBoost on the MIT and CMU face test set.

Besides these results, a face training set of 19×19 resolution is built for the experiments in the thesis. An objective detection criterion is also introduced for the MIT and CMU face test set; it relies on the minimal face rectangle extracted from the ground truth information of the test set.
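One way the reduction of threshold selection to a maximum-sum consecutive subsequence problem might look is sketched below. It is a reconstruction under stated assumptions, not the thesis's algorithm: the weak hypothesis is taken to label an interval `lo <= x < hi` with `polarity` and everything else with `-polarity`, and the function names are hypothetical. With samples sorted by feature value, the weighted error of such a hypothesis is the total weight of the `polarity` class minus the sum of `polarity * w * y` over the samples inside the interval, so minimizing the error means maximizing a sum over a consecutive run, which Kadane's dynamic program finds in linear time.

```python
import numpy as np

def max_sum_run(values):
    """Kadane's algorithm: (best sum, start, end) of a consecutive run.

    `end` is exclusive and an empty run with sum 0 is allowed.
    """
    best, best_start, best_end = 0.0, 0, 0
    current, start = 0.0, 0
    for i, v in enumerate(values):
        if current <= 0.0:          # a non-positive prefix never helps
            current, start = 0.0, i
        current += v
        if current > best:
            best, best_start, best_end = current, start, i + 1
    return best, best_start, best_end

def learn_two_thresholds(x, y, w):
    """Return (error, lo, hi, polarity) minimizing the weighted error."""
    # Samples with equal feature values cannot be separated by a threshold,
    # so their signed weights are pooled per distinct value.
    values, inverse = np.unique(x, return_inverse=True)
    signed = np.zeros(len(values))
    np.add.at(signed, inverse, w * y)
    edges = np.concatenate((values, [np.inf]))
    best = None
    for polarity in (+1, -1):
        gain, i, j = max_sum_run(polarity * signed)
        error = float(np.sum(w[y == polarity])) - gain
        if best is None or error < best[0]:
            best = (error, float(edges[i]), float(edges[j]), polarity)
    return best

# Toy data: the positives occupy a band in the middle of the feature range.
x = np.array([0.1, 0.2, 0.4, 0.5, 0.7, 0.9])
y = np.array([-1, -1, +1, +1, -1, -1])
w = np.full(len(x), 1.0 / len(x))
error, lo, hi, polarity = learn_two_thresholds(x, y, w)
print(error, lo, hi, polarity)
```

After sorting, each polarity needs a single linear pass, in contrast to trying every pair of thresholds explicitly.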
Keywords/Search Tags: Object Detection, Computer Vision, Intra-Class Variation, Inter-Class Variation, Local Feature, Haar-Like Feature, Template, Integral Image, Hough Transform, Implicit Shape Model, Edge Detection, Weak Hypothesis, Strong Hypothesis