Research On Object Detection Based On Bag Of Visual Words Model

Posted on: 2015-06-16
Degree: Master
Type: Thesis
Country: China
Candidate: F Y Zhang
GTID: 2298330422470642
Subject: Communication and Information System
Abstract/Summary:
Object detection separates a target of interest from a complex background by learning a classifier that distinguishes target regions from background regions. It underpins research in scene classification and target tracking. However, occlusion, illumination variation, and large intra-class variation limit the performance of current detection algorithms. Building on the bag of visual words model, a mid-level semantic description, this thesis studies visual word training, visual word representation, and object detection, as follows.

First, the traditional bag of visual words model ignores the spatial location and scale of local features, while a license plate number consists of a fixed number of fixed-size characters arranged according to set rules. We therefore construct local visual words for license plate characters that encode orientation, relative position, and feature scale. Using feature matching, we determine the location of each character, and we then locate the whole license plate region by combining the relative positions of the visual words with prior knowledge of license plate layout. Compared with traditional license plate localization algorithms, the method locates plates robustly under various changes, and it also offers real-time performance and high accuracy.

Second, because the spatial structure of local features and the clustering algorithm used to train visual words both play an important role in bag-of-words image representation, we adopt a sparse coding optimization algorithm to construct bags of visual words for multiple feature attributes, namely shape and color.
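The abstract does not name the sparse coding solver, so the following is only a minimal sketch of the coding step under stated assumptions: the visual-word dictionary `D` is already learned (its columns are the visual words), and each local descriptor `x` is encoded by approximately minimizing 0.5*||x - D a||^2 + lam*||a||_1 with ISTA (iterative shrinkage-thresholding), a standard solver swapped in here for illustration. The names `soft_threshold` and `sparse_code`, and the toy dictionary, are hypothetical. Full dictionary learning would alternate this coding step with a dictionary update, which is omitted.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape (d, K), row-major; column k is visual word k


def soft_threshold(v: float, t: float) -> float:
    """Shrink v toward zero by t (proximal operator of t*|v|)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def sparse_code(x: Vector, D: Matrix, lam: float = 0.1, iters: int = 500) -> Vector:
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 with ISTA."""
    d, K = len(D), len(D[0])
    # Step size 1/L with L = ||D||_F^2, an upper bound on the Lipschitz
    # constant of the quadratic term's gradient (largest eigenvalue of D^T D).
    L = sum(D[i][k] ** 2 for i in range(d) for k in range(K))
    a = [0.0] * K
    for _ in range(iters):
        # Residual of the current reconstruction: r = x - D a
        r = [x[i] - sum(D[i][k] * a[k] for k in range(K)) for i in range(d)]
        # Gradient step on the quadratic term: a + (1/L) * D^T r
        g = [a[k] + sum(D[i][k] * r[i] for i in range(d)) / L for k in range(K)]
        # Shrinkage step on the L1 term drives small coefficients to exactly 0
        a = [soft_threshold(gk, lam / L) for gk in g]
    return a


# Usage: a 2-D descriptor and a dictionary of three hypothetical visual words
D = [[1.0, 0.0, 0.7071],
     [0.0, 1.0, 0.7071]]
code = sparse_code([1.0, 0.0], D)
```

Here the descriptor `[1, 0]` is explained by the first visual word alone, so the code converges to roughly `[0.9, 0, 0]`: the L1 penalty shrinks the active coefficient slightly and keeps the other two entries at exactly zero, which is what makes the resulting bag-of-words code sparse.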
Using max pooling, we compute spatial pyramid descriptors for color and shape, describe all training samples by late fusion of the local features, and train a multi-class linear SVM. The target region is then located with a sliding-window detector. Simulation results show that this method robustly locates faces and pedestrians whose local parts differ considerably in position.

Finally, to further eliminate the ambiguity of visual words, we introduce semantic information into the traditional bag of words model. The visual words corresponding to each semantic component of the target are trained by semi-supervised clustering, and the local features of a test image are classified by computing the similarity between feature vectors. Based on this classification, sliding-window statistics find the regions most similar to each object part, and the precise target position is then determined from the relative positions of all object parts. Experiments show that this method effectively eliminates visual word ambiguity and, using the relative position constraint, achieves better results in face detection.
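As a minimal sketch of the max pooling and spatial pyramid step described in the second contribution, the following pools the sparse codes of local features over a 1x1, 2x2, and 4x4 grid and concatenates the per-cell results into one descriptor. Assumptions not stated in the abstract: feature locations are normalized to [0, 1), pooling takes the maximum absolute code value per cell (as in the common ScSPM formulation), and the function name `spatial_pyramid_max_pool` is hypothetical. Color and shape codes would each be pooled this way before fusion; the thesis's late fusion and SVM training are not reproduced here.

```python
from typing import List, Sequence, Tuple


def spatial_pyramid_max_pool(
    points: Sequence[Tuple[float, float]],  # normalized (x, y) in [0, 1)
    codes: Sequence[Sequence[float]],       # one K-dim sparse code per point
    levels: Sequence[int] = (1, 2, 4),      # grid size g for each pyramid level
) -> List[float]:
    """Concatenate per-cell max-pooled codes over a spatial pyramid."""
    K = len(codes[0])
    descriptor: List[float] = []
    for g in levels:
        # One pooled K-vector per cell of the g x g grid; empty cells stay 0
        cells = [[0.0] * K for _ in range(g * g)]
        for (x, y), code in zip(points, codes):
            # Map the feature location to its grid cell (clamp the x, y = 1 edge)
            cx = min(int(x * g), g - 1)
            cy = min(int(y * g), g - 1)
            cell = cells[cy * g + cx]
            for k in range(K):
                # Max pooling: keep the strongest response of visual word k
                cell[k] = max(cell[k], abs(code[k]))
        for cell in cells:
            descriptor.extend(cell)
    return descriptor


# Usage: two features with 2-word codes, pooled over 1x1 and 2x2 grids
pooled = spatial_pyramid_max_pool(
    [(0.1, 0.1), (0.9, 0.9)],
    [[0.5, 0.0], [0.0, -0.8]],
    levels=(1, 2),
)
```

With K-dimensional codes and the default levels (1, 2, 4), the descriptor has 21*K entries (1 + 4 + 16 cells). Max pooling keeps each visual word's strongest response per cell, so, unlike average pooling, a strong response is not diluted by many weak ones in the same cell, while the finer grids record roughly where in the window each word fires.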
Keywords/Search Tags: object detection, bag of visual words, feature matching, sparse coding, feature fusion, semantic information, part model