Domain Adaptation And Semantic Correlation In Video Concept Detection

Posted on: 2017-03-22  Degree: Doctor  Type: Dissertation
Country: China  Candidate: J Geng
GTID: 1108330482979519  Subject: Signal and Information Processing
Abstract/Summary:
Smart devices are spreading at an unprecedented pace, and the Internet and mobile networks now connect information from all over the world. People have become creators and disseminators of information rather than mere receivers. Video, a common carrier of information, is widely used in daily life. It is intuitive and vivid because it combines multiple modalities such as images, audio and temporal information. However, the complex semantic content of videos makes it hard to manage and retrieve them effectively and precisely, and manual annotation of video content is expensive. Content-based video concept detection has therefore been developed to extract semantic concepts such as objects, people and scenes directly from videos.

Current video concept detection systems still have difficulties that prevent satisfactory use in practice. For example, domain differences between the training and testing datasets degrade the performance of concept detectors; it is unclear how to fuse multiple features flexibly and effectively; and the "semantic gap" that widely exists between low-level and high-level features remains to be bridged. Addressing these three challenges, this thesis contributes feature-level domain adaptation, domain adaptation for multi-feature fusion, and semantic video concept combination. The main contributions are as follows.

(1) To address the differing sample distributions of different domains at the feature level, we propose the domain adaptive boosting (DAB) algorithm, based on AdaBoost. It mainly targets the unreasonable allocation of data between the target domain (testing dataset) and the source domain (training dataset) in TrAdaBoost.
DAB involves two main steps: first, unsupervised clustering is performed in the source-domain feature space; second, in each iteration, samples selected from the target-domain validation set are mapped into the clustered space to match source-domain samples. The selected samples from both domains are then used together to train the weak learner. As an extension of TrAdaBoost, DAB suits domains with large-scale definitions and small validation sets. DAB has two notable advantages: it emphasizes the importance of target-domain samples, yielding improved results in experiments; and it reduces computation, because source-domain samples do not need to be judged and weighted.

(2) Because current multi-feature fusion models lack domain adaptation, we propose the domain adaptive linear combination (DALC). It is a late-fusion model that operates on the scores output by classifiers trained on multiple features. Built on the linear combination (LC) model, DALC updates the LC fusion parameters by analyzing the differences between the source and target domains. The basic idea is to find a correlation between the LC fusion parameters and the domain samples, and then to use the correlation built in the source domain to guide the one in the target domain. The update searches for fusion parameters that minimize the distance between the correlations in the two domains. DALC is a generic unsupervised method that requires no training, so it is fast. It performs better than other multi-feature fusion methods that do not consider domain differences.

(3) For the combination of video semantic concepts, we propose a concept combination model based on node equilibrium (NE). The DALC and NE models together form a two-stage semantic model.
The NE model is based on a mechanical analogy: each shot-concept pair is a physical node, its score is the node's position, and correlations among concepts are attractive and repulsive forces. Different kinds of forces are defined for different kinds of concept correlations. Several forces can act on the same node, so an original score is updated by multiple kinds of concept correlation at once. Compared with the state of the art, the NE model can use multiple kinds of concept correlations simultaneously to build a more complex semantic structure. This thesis uses three kinds of correlation: co-occurrence, hierarchical and temporal. The NE model is a heuristic that simulates human prior knowledge, which makes it intuitive. It also has a concise combination formula that is fast to solve. Its combination parameters can be supervised or unsupervised, depending on whether a training dataset or only prior knowledge is provided.
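Contribution (1) describes DAB only at the level of its two steps. The following is a minimal Python sketch of that flow under stated assumptions, not the thesis's implementation. It assumes k-means for the unsupervised clustering, decision stumps as weak learners, and standard AdaBoost reweighting applied to the target validation samples only. It also uses a hypothetical selection rule that picks target samples whose weight is at least the mean. All function names (`dab`, `kmeans`, `nearest`, `train_stump`) are illustrative.

```python
import math
import random


def nearest(centroids, p):
    """Index of the centroid closest to point p (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(centroids[j], p)))


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means; using k-means for the 'unsupervised clustering' is an assumption."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for j, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(g) for col in zip(*g)]
    return centroids


def train_stump(X, y):
    """Weak learner: the single-feature threshold rule with the fewest errors on (X, y)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum((sign if x[f] >= t else -sign) != lab for x, lab in zip(X, y))
                if best is None or err < best[0]:
                    best = (err, f, t, sign)
    _, f, t, sign = best
    return lambda x: sign if x[f] >= t else -sign


def dab(Xs, ys, Xt, yt, k=2, rounds=5):
    """Hypothetical DAB sketch. (Xs, ys): labelled source data; (Xt, yt): target
    validation set. Labels are -1/+1. Only target samples carry boosting weights."""
    centroids = kmeans(Xs, k)                      # step 1: cluster the source space once
    src_cluster = [nearest(centroids, x) for x in Xs]
    w = [1.0 / len(Xt)] * len(Xt)
    ensemble = []
    for _ in range(rounds):
        # Step 2 (assumed rule): select target samples whose weight is at least the mean.
        mean = sum(w) / len(w)
        picked = [i for i in range(len(Xt)) if w[i] >= mean - 1e-12]
        hit = {nearest(centroids, Xt[i]) for i in picked}
        # Matched source samples: those in the clusters the picked target samples map to.
        X = [Xt[i] for i in picked] + [x for x, c in zip(Xs, src_cluster) if c in hit]
        y = [yt[i] for i in picked] + [lab for lab, c in zip(ys, src_cluster) if c in hit]
        h = train_stump(X, y)
        err = sum(wi for wi, x, lab in zip(w, Xt, yt) if h(x) != lab)
        if err >= 0.5:
            break
        err = max(err, 1e-10)                      # avoid division by zero for a perfect learner
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        w = [wi * math.exp(-alpha * lab * h(x)) for wi, x, lab in zip(w, Xt, yt)]
        total = sum(w)
        w = [wi / total for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1
```

`clf = dab(Xs, ys, Xt, yt, k=2, rounds=3)` returns a classifier with `clf(x)` in {-1, +1}. In this sketch, source samples enter each round only through the clusters they share with the selected target samples. No per-sample source weights are kept, which mirrors the computational saving the abstract attributes to DAB.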
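For contribution (2), the abstract does not define the "correlation between LC fusion parameters and the domain samples". The sketch below makes one possible choice explicit and labels it as an assumption. The correlation is taken to be the vector of Pearson correlations between the fused LC score and each feature's score within a domain. Target weights are then found by grid search over weight vectors that sum to 1, minimizing the squared distance to the source vector. `w_src` is assumed to be an LC weight vector already tuned on the labelled source domain. No target labels are used, and the function names are hypothetical.

```python
import itertools
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa and sb else 0.0


def fuse(scores, w):
    """LC late fusion: scores[i][m] is the feature-m classifier's score for sample i."""
    return [sum(wm * s for wm, s in zip(w, row)) for row in scores]


def signature(scores, w):
    """Assumed 'correlation' between weights w and a domain's samples: the Pearson
    correlation of the fused score with each per-feature score column."""
    fused = fuse(scores, w)
    return [pearson(fused, col) for col in zip(*scores)]


def simplex_grid(m, steps):
    """All m-dimensional non-negative weight vectors summing to 1 in steps of 1/steps,
    enumerated by stars and bars."""
    for bars in itertools.combinations(range(steps + m - 1), m - 1):
        edges = (-1,) + bars + (steps + m - 1,)
        yield tuple((edges[i + 1] - edges[i] - 1) / steps for i in range(m))


def dalc(src_scores, w_src, tgt_scores, steps=20):
    """Hypothetical DALC sketch: choose the target weights whose signature on the
    unlabelled target scores is closest to the source signature under w_src."""
    ref = signature(src_scores, w_src)

    def distance(w):
        return sum((a - b) ** 2 for a, b in zip(signature(tgt_scores, w), ref))

    return min(simplex_grid(len(w_src), steps), key=distance)
```

When the target scores are identical to the source scores, the sketch returns `w_src` itself (provided it lies on the grid). Otherwise it picks the grid weights that best reproduce the source correlation pattern on the target data. Grid search keeps the example short, but the grid has C(steps + m - 1, m - 1) points for m features, so a continuous optimizer would be preferable beyond a handful of features.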
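Contribution (3) describes the NE model only in prose. The sketch below is a minimal instance of a node-equilibrium update under assumed linear springs. Each node `(shot, concept)` is anchored to its detector score `s` by a spring of stiffness `anchor`. An attractive edge pulls a node toward another node's position `x_j`. A repulsive edge pulls it toward `1 - x_j`, so a confident neighbour pushes it down. Setting the net force on node i to zero gives `anchor*(s_i - x_i) + sum_j k_j*(t_j - x_i) = 0`, i.e. `x_i = (anchor*s_i + sum_j k_j*t_j) / (anchor + sum_j k_j)`. The sketch solves this by Gauss-Seidel iteration. The force laws, parameter names and the function `node_equilibrium` are assumptions, not the thesis's formula.

```python
from collections import defaultdict


def node_equilibrium(scores, edges, anchor=1.0, iters=500, tol=1e-10):
    """Hypothetical NE sketch.

    scores: {node: original detector score in [0, 1]}, with node = (shot_id, concept).
    edges:  [(node, other, strength, kind)]; the force acts on `node` only, so a
            symmetric correlation needs one edge in each direction.
            kind == "attract": pull node toward other's position
            kind == "repel":   pull node toward 1 - other's position
    Returns the equilibrium positions, i.e. the updated scores.
    """
    incoming = defaultdict(list)
    for node, other, strength, kind in edges:
        incoming[node].append((other, strength, kind))
    x = dict(scores)
    for _ in range(iters):
        delta = 0.0
        for n in x:
            # Zero net force => the position is a weighted mean of the anchor and targets.
            num, den = anchor * scores[n], anchor
            for other, k, kind in incoming[n]:
                target = x[other] if kind == "attract" else 1.0 - x[other]
                num += k * target
                den += k
            new = num / den
            delta = max(delta, abs(new - x[n]))
            x[n] = new  # Gauss-Seidel: later nodes see this update immediately
        if delta < tol:
            break
    return x
```

For example, two mutually attracting nodes with scores 0.8 and 0.2 and unit stiffness settle at 0.6 and 0.4. The thesis's three correlation kinds could be encoded as edges in this scheme, although the encoding is an assumption. Co-occurrence could be a symmetric attraction between concepts in the same shot. Hierarchy could be a one-way edge that pulls a parent concept toward its child. Temporal correlation could be an attraction between the same concept in adjacent shots. Because each updated position is a weighted average of values in [0, 1], the scores stay in [0, 1]. The system is also strictly diagonally dominant whenever `anchor > 0`, so the iteration converges.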
Keywords/Search Tags: Video annotation, concept detection, domain adaptation, multi-feature fusion, concept combination, semantic model