Research On Key Technologies Of Mobile Multimedia Annotation And Management

Posted on: 2016-12-16
Degree: Doctor
Type: Dissertation
Country: China
Candidate: D B Duan
Full Text: PDF
GTID: 1108330482457817
Subject: Computer Science and Technology

Abstract/Summary:
With fast transmission rates, improved compression algorithms and decreased storage costs, and especially with the popularity of smartphones and the emergence of social networks, digital visual data including video and images have grown explosively in recent years. The demand for effective and efficient retrieval of these data is ever increasing. A common scheme is to annotate the data so that they can later be accessed using text retrieval techniques. As manual annotation is known to be expensive, inefficient and overly personalized, automatic annotation by machine is now preferred.

Among automatic techniques, concept-based annotation is one of the most popular. Although much progress has been made, several problems still hinder its further development, including heavy dependence on training samples and the limited semantics recoverable from visual data alone. In this thesis, we approach the task of automatic video annotation from a new perspective. Visual data are, in themselves, digital descriptions of entities and events in the world recorded by visual sensors. Annotation tries to restore the original semantics of visual data and rephrase them in textual words for ease of management. Technically, the capability of visual sensors is limited, and a large amount of semantically rich contextual data is discarded during recording. Most researchers still focus on the analysis of visual data, whereas we pay attention to the process of data generation. With the development of the Internet of Things (IoT), wearable devices equipped with powerful sensors are commonly used in everyday practice. This thesis conducts in-depth research on labeling video data with the help of wearable sensors, and makes the following contributions:

· Conventional face detection and tracking must process every frame of a video. We propose a fast face detection and tracking algorithm in which a large number of faceless frames are filtered out based on contextual data collected by wearable sensors. Detection and tracking time, as well as false positive and false negative rates, are substantially reduced, as shown in our experiments.

· Building on the sensor-assisted fast face tracking above, we propose a frontal face recognition algorithm that exploits the correlation between body orientation and face direction, captured respectively by an orientation sensor and the camera. Like our person identification approach, this method requires no sample faces for training and thus achieves better performance, as demonstrated by extensive experiments.

· To identify a person in video, conventional methods need to collect as many representative training samples as possible. We propose a person identification method that exploits the motion consistency of the target person as sensed by different sensors. The method frees itself from dependence on training data and outperforms conventional approaches in terms of both computational complexity and accuracy.

· We propose a method for automatic video annotation. The method performs activity recognition on the camera's video data and on the accelerometer's acceleration data respectively. The person's identity is disclosed by fusing the recognition results, and the video can then be labeled in terms of when, where, who and what.
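The frontal-face idea above can be illustrated with a minimal sketch: the subject is likely facing the camera when their body heading (from the wearable orientation sensor) points roughly back toward the camera's heading. The function name and the 30° tolerance are illustrative assumptions, not the thesis's actual parameters.

```python
import math

def is_frontal(body_heading_deg, camera_heading_deg, tol_deg=30.0):
    """Decide whether a frontal face is likely, by comparing the subject's
    body heading (orientation sensor) with the camera's heading.

    A frontal face is expected when the body points roughly opposite to
    the camera's viewing direction. tol_deg is an assumed threshold.
    """
    # Heading the body would have if facing straight back at the camera
    facing_camera = (camera_heading_deg + 180.0) % 360.0
    # Smallest angular difference between the two headings, in [0, 180]
    diff = abs((body_heading_deg - facing_camera + 180.0) % 360.0 - 180.0)
    return diff <= tol_deg
```

Frames failing this check can be skipped entirely, which is one plausible way the sensor-assisted filtering of faceless or non-frontal frames could cut detection time.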
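The motion-consistency idea in the person identification contribution can likewise be sketched: correlate a per-window motion-energy series from the target's accelerometer with the motion energy of each tracked person in the video (e.g. from optical-flow magnitude), and pick the best-matching track. This is a hedged illustration using Pearson correlation; the thesis's actual matching criterion and features are not specified in the abstract.

```python
import numpy as np

def identify_person(accel_energy, track_energies):
    """Match a wearable accelerometer stream to one of several video tracks
    by motion consistency.

    accel_energy   : 1-D array of per-window motion energy from the sensor
    track_energies : dict mapping track id -> same-length 1-D array of
                     motion energy estimated from the video
    Returns (best track id, its Pearson correlation with the sensor stream).
    """
    best_id, best_corr = None, -2.0
    for tid, video_energy in track_energies.items():
        # Pearson correlation between the two motion-energy series
        corr = np.corrcoef(accel_energy, video_energy)[0, 1]
        if corr > best_corr:
            best_id, best_corr = tid, corr
    return best_id, best_corr
```

Because the wearer's own motion serves as the query, no labeled training samples are needed, which mirrors the training-free property claimed above.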
Keywords/Search Tags: Context, Sensor Fusion, Person Identification, Frontal Face Recognition