| Numerous web videos associated with rich metadata are available on the Internet today. Here metadata means title, tags and description, these information can help people better understanding the content of video. Among them, tag which is the semantic annotation of video is an important indexing form for videos. Most of the video retrieval depends on tags. But, tags in video is not specific and accurate. Especially, tag is the description of full video now, but in fact, most of the tags only the description of certain parts of the video. In this paper, by learning precise tag localization on web videos, more specific and accurate video tagging can be realized, and it can benefit not only to video retrieval, but also to video related applications and research issues.In this paper, we first make the meaning of tag localization clear and introduce its significance, related field and basic methods. Then, we introduce our proposed approach, our method is based on topic model and kernel density estimation. Because video consists of keyframes, tag localization can be treated as from video level tagging to keyframe level tagging. Our method includes three steps, first, kernel density estimation model is used to calculate the relevance of keyframes, keyframes with high relevance will be selected as validation set. Then LDA is used for mining the underling semantic information of keyframes and semantic topics can be learned, then validation set is used to calculate the relevance of each topic. Finally, relevance of keyframes is calculated again based on their topic distribution and relevance score, the final relevant keyframe set is obtained. Our method is tested on a real dataset called YouTube22, experiment results validate the effectiveness of our method. YouTube22is not designed for tag localization, as tag localization is gaining increasing research interests, a benchmark dataset for the fair evaluation of tag localization algorithms is highly desired. In this paper we published and introduces a data set called DUT-WEBV for evaluation tag localization algorithms, our dataset contains50semantic concepts, a total of4000video, we carefully annotated each video and our proposed approach is also verified in this dataset. Finally, we introduce tag localization methods for object related semantic concepts based on general object detection. |