Towards Associated Hierarchical Structure Analysis In Vision Tasks

Posted on: 2017-05-11
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T T Yao
GTID: 1318330512468669
Subject: Signal and Information Processing
Abstract/Summary:
The main aim of computer vision is to understand input multimedia information (such as images and videos) automatically, using effective representations and sound learning and inference methods. Vision tasks are naturally connected and can generally be organized into three levels. Low-level tasks usually operate directly on the input pixel matrix, and their results lay the foundation for mid- and high-level tasks. Mid-level tasks mainly analyze the intrinsic properties or the motion state of the objects of interest in images and videos. Mid-level analysis bridges the levels of this hierarchy: it provides additional guidance for low-level processing and effective cues for high-level understanding. High-level tasks further explore the interrelationships among the objects in a scene to obtain a holistic description and understanding of the visual input. Through top-down feedback, they can guide the analysis and improve the computational efficiency of tasks at the other levels.

Most existing methods for these tasks assume that the input data are independent and identically distributed, and obtain their results by applying a suitable machine learning algorithm to represent and model the visual information. However, because the extracted features contain a lot of noise and redundant information, it is difficult to establish a robust distribution of the input data. Moreover, the intrinsic structure of the features and the relationships between them are ignored, so it is hard to represent visual information accurately with low-order statistics alone. As a result, these methods often reach only locally optimal solutions, which leads to misinterpretation of the input multimedia information.
Analyzing the higher-order statistical properties of the input data and characterizing the structure of the original features with prior or constraint information helps to address these problems. In this way, relationships and semantic knowledge among the original features can be established, yielding better performance on hierarchical computer vision tasks.

This dissertation concentrates on characterizing the structure among different visual components and integrating it into the computation of hierarchical computer vision tasks. With the help of structure analysis, the interrelationships and constraints among visual entities can be established, which results in better performance at each level.

The main contents of this dissertation can be summarized as follows:

For the low-level task of image segmentation, which groups similar pixels under the same label, we propose top-down inference with relabeling and mapping rules under a hierarchical Markov random field (MRF) model to solve the over-segmentation caused by ignoring prior structure constraints. A correspondence is constructed between the visual feature of each pixel and a node in the undirected graphical model. In this way, the intrinsic structure of the original data is transformed into prior information for the computation. This prior information is then integrated into the label calculation at each scale by analyzing feature consistency within the neighborhood at the same scale and similarity across adjacent scales.
The experimental results demonstrate that incorporating prior information into low-level vision processing by analyzing the interrelationships among pixels improves the accuracy, robustness, and generality of the method.

For the mid-level task of object recognition, we propose a novel statistical model, discriminative sequential association latent Dirichlet allocation, to solve the misrecognition of similar objects caused by the lack of an appropriate description of object structure. By constructing a correspondence between visual entities and nodes in a directed probabilistic graphical model, a generative structural description of the object of interest is obtained. Furthermore, a relevance determination mechanism is established by employing the sequential associations as spatio-temporal constraints, with additional posterior discriminative and switching variables. The experimental results demonstrate that the proposed model achieves better performance and converges efficiently by constructing a more appropriate object structure to guide mid-level inference.

For the mid-level task of human action recognition, we propose a novel discriminative dictionary learning framework that formulates a universal multi-view dictionary to solve the misrecognition caused by common patterns shared among different action classes. The proposed dictionary consists of a shared sub-dictionary and a set of class-specific sub-dictionaries, which characterize the inter-class differences between actions more efficiently. Additionally, group sparsity and locality constraints are used to preserve the relationships and structure among features. Furthermore, multiple descriptors are fused to obtain more robust action descriptions.
The experimental results demonstrate that more discriminative characterization and better performance can be achieved by analyzing the internal relationships and structure among action classes.

For the high-level task of scene classification, we propose a hybrid discriminative approach with a Bayesian prior constraint to address the poor generalization and classification performance of discriminative models trained on limited samples. The generative structure of a scene composed of a number of constituent objects is established by introducing a generative prior into the discriminative approach. Moreover, an effective fusion decision with feedback inference is defined to select the most confident test samples, along with their estimated labels, from different classifiers. In this way, the model can be updated effectively with the automatically enlarged training set. Experimental results demonstrate that fusing different classifiers at the decision stage to guide high-level knowledge inference improves both the discriminative power of the model and the accuracy of scene classification.
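
The feedback step described for scene classification (pick the most confident test samples from several classifiers, then add them with their estimated labels to the training set) resembles a self-training loop. The following is only a minimal sketch of one selection round, not the dissertation's actual method. It assumes that each classifier outputs class probabilities, that fusion is a plain average, and that confidence is the largest fused probability. The function name `select_confident`, the threshold, and the sample values are all hypothetical.

```python
def select_confident(prob_list, threshold=0.9):
    """Fuse classifier outputs and keep only confident samples.

    prob_list: one entry per classifier; each entry is a list of
               per-sample class-probability lists.
    Returns a list of (sample_index, estimated_label) pairs.
    """
    n_clf = len(prob_list)
    n_samples = len(prob_list[0])
    selected = []
    for i in range(n_samples):
        n_classes = len(prob_list[0][i])
        # Average fusion across classifiers (an assumption of this sketch).
        fused = [sum(p[i][c] for p in prob_list) / n_clf
                 for c in range(n_classes)]
        label = max(range(n_classes), key=fused.__getitem__)
        # Keep the sample only if the fused decision is confident enough.
        if fused[label] >= threshold:
            selected.append((i, label))
    return selected


# Two hypothetical classifiers, three unlabeled samples, two classes.
p_a = [[0.95, 0.05], [0.60, 0.40], [0.10, 0.90]]
p_b = [[0.99, 0.01], [0.30, 0.70], [0.02, 0.98]]
selected = select_confident([p_a, p_b], threshold=0.9)
print(selected)  # [(0, 0), (2, 1)]
```

In a full loop, the selected pairs would be appended to the training set and the classifiers retrained. The sample on which the two classifiers disagree stays unlabeled.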
Keywords/Search Tags: Structure Analysis, Priori Information, Structure Constraint, Computer Vision, Machine Learning