
Research On Video Annotation With Machine Learning Techniques

Posted on: 2009-08-31
Degree: Doctor
Type: Dissertation
Country: China
Candidate: M Wang
Full Text: PDF
GTID: 1118360242995814
Subject: Signal and Information Processing
Abstract/Summary:
With the advances in storage devices, networks, and compression techniques, large-scale video data have become available to more and more ordinary users, and managing and accessing these data has become a challenging task. Video semantic annotation is a technique that attempts to detect semantic concepts in video clips according to their content, and it can facilitate high-level applications such as video indexing, retrieval, and summarization. The most intuitive approach to this task is manual annotation. However, manual annotation is a labor-intensive and time-consuming process, and it can hardly be applied to large-scale data sets or concept sets. Thus, learning-based video annotation becomes an attractive alternative. In this thesis, we propose several learning-based video annotation methods, which aim to obtain accurate annotation results automatically or semi-automatically (with minimal interactive manual operations). The main contributions are as follows:

1. We incorporate unlabeled data into the traditional kernel density estimation (KDE) algorithm and obtain two semi-supervised methods, SSKDE and SSAKDE, which are able to tackle the training data insufficiency problem. The traditional KDE method is simple yet efficient, but its performance depends heavily on the size of the training set. On the other hand, training data insufficiency is frequently encountered in video annotation due to the high labor cost of manual annotation. By exploiting unlabeled data, this difficulty can be alleviated and annotation performance can be significantly improved.

2. We propose a unified automatic video annotation scheme. Besides the training data insufficiency problem, there are several other difficulties in video annotation, including the curse of dimensionality, the choice of distance metric, and the utilization of temporal consistency. Our analysis shows that these problems all correspond to either similarity estimation or semi-supervised learning, and thus they can be tackled within a multi-graph semi-supervised learning scheme. We propose an optimized multi-graph semi-supervised learning (OMG-SSL) method, which integrates multiple graphs into a unified regularization framework in which the graph weights are adjusted automatically according to a given criterion.

3. We propose a multi-concept multi-modality active learning method for semi-automatic video annotation. Active learning is an iterative learning approach that involves both human and computer. Through iterations of learning and sample selection, the obtained training set can be more effective than one gathered randomly, so applying active learning is another way to tackle the training data insufficiency problem. However, most existing active learning methods applied to video annotation do not consider two key properties of the task: multiple concepts and multiple modalities. We therefore propose a multi-concept multi-modality active learning method that addresses both issues simultaneously. In each round, the concept that is expected to yield the highest performance gain is selected, and the numbers of samples selected for the individual modalities are set proportional to their corresponding performance gains. A graph-based semi-supervised learning method is then applied for each modality. In this way, human labeling effort can be fully exploited.

4. We propose a video shot size annotation scheme. The concepts annotated in existing work mainly belong to the scene, event, and object categories, while video shot size patterns are ignored. Unlike these general concepts, shot size patterns have their own properties: the patterns are mutually exclusive, and there is a natural order among them. In addition, the commonly applied low-level features alone can hardly yield satisfactory results, since the patterns are correlated with several mid-level features, such as the numbers and sizes of the regions in the frames. We therefore propose a video shot size annotation scheme based on co-training between a low-level feature set and a mid-level feature set. Furthermore, based on the order among the shot size patterns, a cost function is introduced, and the final decision is made according to a cost-minimization criterion.

A noteworthy point is that although the methods in this thesis are proposed for video annotation, several of them (such as SSKDE and OMG-SSL) can be applied in many other domains as well. Video annotation is closely related to many fields, including machine learning, computer vision, and cognitive science, and we hope that our work can also provide inspiration or methods for these communities.
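To make the semi-supervised KDE idea of contribution 1 concrete, the sketch below shows one common way to fold unlabeled data into class-conditional kernel density estimates: unlabeled points carry soft labels that are refined iteratively and fed back into the densities. This is an illustrative sketch, not the thesis's exact SSKDE formulation; the binary-class setup, the fixed Gaussian bandwidth `h`, and the simple fixed-point iteration are all assumptions made for brevity.

```python
import numpy as np

def gaussian_kernel(x, xi, h):
    """Isotropic Gaussian kernel with bandwidth h."""
    d = x - xi
    return np.exp(-np.dot(d, d) / (2.0 * h * h))

def sskde(X_l, y_l, X_u, h=1.0, n_iter=20):
    """Semi-supervised KDE sketch (binary case, y in {0, 1}).

    Unlabeled points receive soft labels p_u, which are iteratively
    re-estimated from class-conditional densities built over both the
    labeled data and the (softly weighted) unlabeled data."""
    n_u = len(X_u)
    p_u = np.full(n_u, 0.5)  # start from uninformative soft labels
    for _ in range(n_iter):
        new_p = np.empty(n_u)
        for i, x in enumerate(X_u):
            # Density contributions from labeled data, per class.
            s1 = sum(gaussian_kernel(x, xi, h) for xi, yi in zip(X_l, y_l) if yi == 1)
            s0 = sum(gaussian_kernel(x, xi, h) for xi, yi in zip(X_l, y_l) if yi == 0)
            # Soft contributions from the other unlabeled points.
            for j, xj in enumerate(X_u):
                if j == i:
                    continue
                k = gaussian_kernel(x, xj, h)
                s1 += p_u[j] * k
                s0 += (1.0 - p_u[j]) * k
            new_p[i] = s1 / (s1 + s0 + 1e-12)
        p_u = new_p
    return p_u
```

With well-separated clusters, an unlabeled point near the class-0 examples converges to a soft label near 0 and one near the class-1 examples to a soft label near 1, even when few labeled samples are available.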
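The multi-graph scheme of contribution 2 can be sketched with standard graph-based label propagation over a weighted fusion of several affinity graphs. Note the simplification: OMG-SSL adjusts the graph weights automatically inside its regularization framework, whereas this sketch takes the fusion weights as fixed inputs; the specific update rule (symmetric normalization plus a clamped iteration) is a common textbook variant, assumed here for illustration.

```python
import numpy as np

def propagate(W_list, weights, Y, alpha=0.9, n_iter=100):
    """Label propagation over a weighted fusion of affinity graphs.

    W_list  : list of n x n affinity matrices (one per graph/modality)
    weights : fusion coefficients for the graphs (fixed in this sketch)
    Y       : n x k initial label matrix; unlabeled rows are all zeros
    """
    # Fuse the graphs into a single affinity matrix.
    W = sum(w * Wk for w, Wk in zip(weights, W_list))
    d = W.sum(axis=1)
    # Symmetric normalization: S = D^{-1/2} W D^{-1/2}.
    S = W / np.sqrt(np.outer(d, d))
    F = Y.astype(float).copy()
    for _ in range(n_iter):
        # Diffuse labels along edges while staying anchored to Y.
        F = alpha * (S @ F) + (1.0 - alpha) * Y
    return F.argmax(axis=1)
```

Each unlabeled node ends up with the label that flows to it most strongly through the fused graph, which is the behavior the per-modality graph-based step in contribution 3 also relies on.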
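The cost-minimization decision rule for ordered shot size patterns (contribution 4) can be illustrated in a few lines. The absolute-difference cost between pattern indices is an assumed stand-in for the thesis's own cost function; the point is only that an expected-cost decision over ordered classes can differ from picking the single most probable class.

```python
def cost_min_label(probs):
    """Pick the shot-size label minimizing expected ordinal cost.

    probs: posterior probabilities over shot size patterns, ordered
    (e.g. close-up < medium shot < long shot). The cost |i - j| between
    pattern indices is an illustrative assumption, not the thesis's
    actual cost function."""
    n = len(probs)
    expected = [sum(p * abs(i - j) for i, p in enumerate(probs)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])
```

For example, with probabilities [0.4, 0.0, 0.35, 0.25] over four ordered patterns, the plain argmax rule picks pattern 0, but the expected-cost rule picks pattern 2, because most of the probability mass lies near the upper end of the order.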
Keywords/Search Tags: Video Annotation, Video Retrieval, Machine Learning, Semi-Supervised Learning, Active Learning, Multi-Concept, Multi-Modality, Shot Size